
OPERATIONAL DEFECT DATABASE
Symptom:
After configuring 100 IPSLA ICMP probes with a 1 s frequency, the tracked objects continuously flap (UP/DOWN) and the raw_ip process crashes:

RP/0/RP0/CPU0:2019 Sep 25 11:01:58.801 : resmon[431]: %HA-HA_WD-4-FD_MIN_LIM : Process raw_ip, pid 9561 is using 48626 file descriptors out of a max of 54800. Minor Limit (80%) crossed.
RP/0/RP0/CPU0:2019 Sep 25 11:02:13.838 : resmon[431]: %HA-HA_WD-3-FD_CRIT_LIM : Process raw_ip, pid 9561 is using 53702 file descriptors out of a max of 54800. Critical limit (95%) crossed.
RP/0/RP0/CPU0:2019 Sep 25 11:02:14.017 : /pkg/sbin/sysmgr_log[69457]: %OS-SYSMGR-7-CHECK_LOG : /pkg/bin/resmon_debug_fd_script invoked by pid 2842 (resmon) for pid 9561, jid 241. Output is in /misc/scratch//resmon_debug/debug_filedesc.node0_RP0_CPU0.9561.log
RP/0/RP0/CPU0:2019 Sep 25 11:02:32.052 : ipsla_sa[219]: %MGBL-IPSLA-5-INTERNAL : IOS_MSG_SOCKET_EVENT cannot attach events:448:'ipsla' detected the 'resource not available' condition 'Error on socket api': Transport endpoint is not con
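To catch this condition before the critical limit is crossed, the resmon FD-limit syslog lines above can be parsed externally. A minimal sketch (the helper name `fd_usage` and the alerting logic are illustrative, not part of any Cisco tooling; the regex is derived only from the message format shown above):

```python
import re

# Match resmon FD-limit messages, e.g. "%HA-HA_WD-3-FD_CRIT_LIM : Process
# raw_ip, pid 9561 is using 53702 file descriptors out of a max of 54800."
FD_LIMIT_RE = re.compile(
    r"%HA-HA_WD-\d-FD_(?P<level>MIN|CRIT)_LIM : Process (?P<proc>\S+), "
    r"pid (?P<pid>\d+) is using (?P<used>\d+) file descriptors "
    r"out of a max of (?P<max>\d+)"
)

def fd_usage(line):
    """Return (process, used/max ratio) for an FD-limit syslog line, else None."""
    m = FD_LIMIT_RE.search(line)
    if not m:
        return None
    return m.group("proc"), int(m.group("used")) / int(m.group("max"))

log = ("RP/0/RP0/CPU0:2019 Sep 25 11:02:13.838 : resmon[431]: "
       "%HA-HA_WD-3-FD_CRIT_LIM : Process raw_ip, pid 9561 is using 53702 "
       "file descriptors out of a max of 54800. Critical limit (95%) crossed.")

proc, ratio = fd_usage(log)
print(proc, round(ratio * 100, 1))  # raw_ip 98.0
```

Feeding the router's syslog stream through such a filter gives an early warning while raw_ip is still between the 80% minor and 95% critical thresholds.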
Conditions:
100 IPSLA ICMP probes configured with an aggressive frequency (1 s).
Workaround:
Configure fewer IPSLA ICMP probes, or use a less aggressive frequency.
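As an illustration of the workaround, a configuration sketch that relaxes the probe frequency from 1 s to 60 s (the operation number and destination address are placeholders; verify the exact syntax against your IOS XR release):

```
ipsla
 operation 1
  type icmp echo
   destination address 192.0.2.1
   frequency 60
  !
 !
 schedule operation 1
  start-time now
  life forever
 !
!
```

Reducing either the probe count or the probe rate lowers the churn on the raw_ip XIPC queues described below.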
Further Problem Description:
The watchdog (WD) detects that something is wrong with the raw_ip process:

RP/0/RP0/CPU0:NCS5500# sh watchdog trace | i raw
Wed Sep 25 11:55:14.383 CEST
Sep 25 03:41:20.152 watchdog/fd 0/RP0/CPU0 t3126 wd_filedes_add_proc: raw_ip, Pid 23248 added to FD ring at Sep 25 03:41:20.152
Sep 25 03:44:20.382 watchdog/fd 0/RP0/CPU0 t3126 wd_filedes_add_proc: raw_ip, Pid 24471 added to FD ring at Sep 25 03:44:20.382
Sep 25 03:47:20.609 watchdog/fd 0/RP0/CPU0 t3126 wd_filedes_add_proc: raw_ip, Pid 25510 added to FD ring at Sep 25 03:47:20.609

After a certain number of FDs, the process is restarted to keep the router and the overall system safe, and all tracked objects then go down at the same time. Looking deeper, we can see the process has restarted several times since the IPSLA configuration was applied:

RP/0/RP0/CPU0:NCS5500#sh processes raw_ip
Wed Sep 25 11:22:07.182 CEST
                  Job Id: 241
                     PID: 17536
            Process name: raw_ip
         Executable path: /opt/cisco/XR/packages/ncs5500-infra-5.0.0.0-r6625/all/bin/raw_ip
              Instance #: 1
              Version ID: 00.00.0000
                 Respawn: ON
           Respawn count: 398   <<<

The generated dump shows a lot of XIPC entries:

----------------------------------------------------------------
Output from show process file details:
Time 11:02:30.289 CEST Wed Sep 25 2019
----------------------------------------------------------------
Jid: 241
Total open files: 53726
Name: raw_ip_main
-------------------------------------------------------------
File Descriptor     File Name
---------------     ---------
802                 /XIPC_xipcq_2_9_1_219_6667946_i (deleted
803                 /XIPC_xipcq_2_9_1_219_6667952_i (deleted

RP/0/RP0/CPU0:NCS5500#sh xipcq jid 241
Wed Sep 25 11:32:34.936 CEST
Id     Name                            Size   Cur Size  Produced    Dropped     HWM
------ ------------------------------- ------ --------- ----------- ----------- --------
1532   XIPC_xipcq_2_9_1_219_6820967_o  4000   0         0           0           0
12     XIPC_raw_124                    2000   0         0           0           0
11     XIPC_raw_125                    6000   1         1500        0           8
10     XIPC_raw_iq_6                   2666   0         185         0           2
8      XIPC_raw_iq_7                   2666   0         183         0           2
9      XIPC_raw_iq_3                   2666   0         170         0           2
6      XIPC_raw_iq_5                   2666   0         187         0           3
7      XIPC_raw_iq_4                   2666   0         187         0           2
3      XIPC_raw_iq_1                   2666   0         201         0           2
5      XIPC_raw_iq_0                   2666   0         201         0           1
4      XIPC_raw_iq_2                   2666   0         186         0           2
2      XIPC_raw_i1                     8000   0         0           0           0
1      XIPC_raw_i0                     8000   0         0           0           0

We can see the number of FDs increasing very fast, especially on queue XIPC_raw_125, until the raw_ip process is ultimately restarted:

RP/0/RP0/CPU0:NCS5500#sh xipcq jid 241 | i XIPC_raw_
Wed Sep 25 11:46:26.249 CEST
12     XIPC_raw_124                    2000   0         0           0           0
11     XIPC_raw_125                    6000   0         13617       0           15   <<<<
10     XIPC_raw_iq_7                   2666   0         1683        0           3
9      XIPC_raw_iq_5                   2666   0         1683        0           3
8      XIPC_raw_iq_6                   2666   0         1683        0           3
7      XIPC_raw_iq_4                   2666   0         1683        0           2
3      XIPC_raw_iq_3                   2666   0         1530        0           2
5      XIPC_raw_iq_2                   2666   0         1683        0           2
6      XIPC_raw_iq_1                   2666   0         1836        0           2
4      XIPC_raw_iq_0                   2666   0         1836        0           3
2      XIPC_raw_i1                     8000   0         0           0           0
1      XIPC_raw_i0                     8000   0         0           0           0

RP/0/RP0/CPU0:NCS5500#sh xipcq jid 241 | i XIPC_raw_
Wed Sep 25 11:46:27.592 CEST
12     XIPC_raw_124                    2000   0         0           0           0
11     XIPC_raw_125                    6000   0         0           0           0
10     XIPC_raw_iq_6                   2666   0         0           0           0
8      XIPC_raw_iq_5                   2666   0         0           0           0
9      XIPC_raw_iq_7                   2666   0         0           0           0
6      XIPC_raw_iq_4                   2666   0         0           0           0
5      XIPC_raw_iq_2                   2666   0         0           0           0
4      XIPC_raw_iq_0                   2666   0         0           0           0
3      XIPC_raw_iq_1                   2666   0         0           0           0
7      XIPC_raw_iq_3                   2666   0         0           0           0
2      XIPC_raw_i1                     8000   0         0           0           0
1      XIPC_raw_i0                     8000   0         0           0           0

If we do the maths, that is 13.6k + 8 x ~1.7k ≈ 27k. In addition, we have all those embryonic FDs, which drop back to 0 once the process is restarted:

11348  XIPC_xipcq_2_9_1_219_6928052_o  4000   0         0           0           0

RP/0/RP0/CPU0:NCS5500#sh xipcq jid 241 | i XIPC_xipcq | utility wc -l
Wed Sep 25 11:49:29.723 CEST
13535
RP/0/RP0/CPU0:NCS5500#sh xipcq jid 241 | i XIPC_xipcq | utility wc -l
Wed Sep 25 11:49:30.810 CEST
0

That's a total of roughly 40k file descriptors, approaching the 54800 limit enforced on the process.
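The FD accounting above can be checked with a few lines of arithmetic over the "Produced" counters taken from the xipcq snapshots (the totals are rough, since produced-message counts only approximate open descriptors):

```python
# Counters copied from the "sh xipcq jid 241" output above.
raw_125 = 13617                                         # XIPC_raw_125 produced
iq = [1683, 1683, 1683, 1683, 1530, 1683, 1836, 1836]   # XIPC_raw_iq_7..iq_0
embryonic = 13535                                       # XIPC_xipcq_* lines (wc -l)

queues_total = raw_125 + sum(iq)
print(queues_total)              # 27234: the "13.6k + 8 x ~1.7k" estimate
print(queues_total + embryonic)  # 40769: the ~40k running total
```

The sum of the eight XIPC_raw_iq queues (13617) matches the XIPC_raw_125 count exactly, which is consistent with XIPC_raw_125 fanning work out to the per-queue workers.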