
OPERATIONAL DEFECT DATABASE
VSM Card is silently reloaded/restarted due to any of the following observations:
1. VSM LC-XR QNX Kernel detects an exception and triggers an explicit LC-XR crash.
2. WDSYSMON process in the VSM LC-XR VM internally calls a reboot of the LC-XR VM upon detecting a persistent CPU hog lasting more than 30 seconds.
3. RSP CANB Server triggers a watchdog reset on the VSM LC upon observing that the watchdog toggle is not happening on the LC.

#1 could be observed & checked through the following:
---------------------------------------------
As a result of the VSM Card crash/reload, the 'crashinfo' file in the LC-XR VM /lcdisk0:/dumper directory is expected to be generated and to indicate the crash reason as:
"Crash Reason: Kernel Crash Exception at 0xfe6c33aa signal 5 c=2 f=0"

Expected signature of syslog messages:
In the following syslogs, 'envmon_lc' is reported as the top user of CPU. In other instances, processes such as 'ntpdc' or 'dev-ahci' could also be reported as the top user of CPU.

LC/0/2/CPU0:Jan 30 17:30:43.464 : wdsysmon[372]: Process envmon_lc pid 176215 prio 10 using 25 percent is the top user of CPU
RP/0/RSP1/CPU0:Jan 30 17:30:57.120 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/2/CPU0 , Power Cycle (0x05000000)
RP/0/RSP0/CPU0:Jan 30 17:30:57.120 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_CPU_RESET : Node 0/2/CPU0 CPU reset detected.
RP/0/RSP0/CPU0:Jan 30 17:30:57.121 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/2/CPU0 A9K-VSM-500 state:BRINGDOWN
RP/0/RSP0/CPU0:Jan 30 17:30:57.141 : invmgr[255]: %PLATFORM-INV-6-OIROUT : OIR: Node 0/2/1 Sn: N/A removed
RP/0/RSP0/CPU0:Jan 30 17:30:57.151 : invmgr[255]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/2/CPU0, state: BRINGDOWN
RP/0/RSP0/CPU0:Jan 30 17:31:04.295 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_POST_RESET_NOTIFICATION : Node 0/2/CPU0 , Power Cycle (0x05000000)
RP/0/RSP0/CPU0:Jan 30 17:31:04.296 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/2/CPU0 A9K-VSM-500 state:ROMMON
---------------------------------------------

#2 could be observed & checked through the following:
----------------------------------------------
As a result of the VSM Card crash/reload, the 'crashinfo' file in the LC-XR VM /lcdisk0:/dumper directory is expected to be generated and to indicate the crash reason as:
"Crash Reason: Cause code 0x2c000008 Cause: wdsysmon: persistent hog detected"

Expected signature of syslog messages:

LC/0/2/CPU0:Jan 22 11:11:30.981 : wdsysmon[372]: Persistent Hog detected for more than 20 seconds
LC/0/2/CPU0:Jan 22 11:11:31.583 : wdsysmon[372]: Persistent Hog detected for more than 20 seconds
LC/0/2/CPU0:Jan 22 11:11:32.185 : wdsysmon[372]: Persistent Hog detected for more than 30 seconds
LC/0/2/CPU0:Jan 22 11:11:32.787 : wdsysmon[372]: Persistent hog (lasting more than 30 seconds) detected by wdsysmon on CPU3. Resetting node soon
LC/0/2/CPU0:Jan 22 11:11:32.787 : wdsysmon[372]: Process: , Pid 0, Tid 0, Priority 0, Util 0.0 % is the top user of the CPU
LC/0/2/CPU0:Jan 22 11:11:32.787 : wdsysmon[372]: Process: , Pid 0, Tid 0, Priority 0, Util 0.0 % is the top user of the CPU
LC/0/2/CPU0:Jan 22 11:11:32.846 : syslog_dev[87]: wdsysmon[372] PID-592994450: Fri Jan 22 11:11:32 ISR 2016
LC/0/2/CPU0:Jan 22 11:11:34.818 : wdsysmon[372]: reboot_internal: Incomplete graceful reboot cleanup (Connection timed out)
LC/0/2/CPU0:Jan 22 11:11:34.818 : wdsysmon[372]: Fri Jan 22 11:11:32 2016:sync start
LC/0/2/CPU0:Jan 22 11:11:34.818 : wdsysmon[372]: Fri Jan 22 11:11:32 2016:sync end
LC/0/2/CPU0:Jan 22 11:11:34.818 : wdsysmon[372]: Fri Jan 22 11:11:32 2016:platform_reboot_op start
RP/0/RSP1/CPU0:Jan 22 11:11:43.755 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/2/CPU0 , Power Cycle (0x05000000)
RP/0/RSP0/CPU0:Jan 22 11:11:43.757 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_CPU_RESET : Node 0/2/CPU0 CPU reset detected.
RP/0/RSP0/CPU0:Jan 22 11:11:43.758 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/2/CPU0 A9K-VSM-500 state:BRINGDOWN
RP/0/RSP0/CPU0:Jan 22 11:11:43.775 : invmgr[255]: %PLATFORM-INV-6-OIROUT : OIR: Node 0/2/1 Sn: N/A removed
RP/0/RSP0/CPU0:Jan 22 11:11:43.784 : invmgr[255]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/2/CPU0, state: BRINGDOWN
RP/0/RSP1/CPU0:Jan 22 11:11:50.798 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_POST_RESET_NOTIFICATION : Node 0/2/CPU0 , Power Cycle (0x05000000)
RP/0/RSP0/CPU0:Jan 22 11:11:50.800 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/2/CPU0 A9K-VSM-500 state:ROMMON
----------------------------------------------

#3 could be observed & checked through the following:
Expected signature of syslog messages:
---------------------------------------------
RP/0/RSP0/CPU0:Jan 3 01:21:12.035 : envmon[207]: %PLATFORM-ENVMON-4-CBC_WDOG_EXCEED_THRESHOLD : CBC on node 0/2/CPU0 has not seen watchdog toggle in at least 22 seconds
RP/0/RSP1/CPU0:Jan 3 01:23:20.187 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/2/CPU0 , WDOG SReset (0x06000000)
RP/0/RSP0/CPU0:Jan 3 01:23:20.194 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_CPU_RESET : Node 0/2/CPU0 CPU reset detected.
RP/0/RSP0/CPU0:Jan 3 01:23:20.195 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/2/CPU0 A9K-VSM-500 state:BRINGDOWN
RP/0/RSP0/CPU0:Jan 3 01:23:20.235 : invmgr[255]: %PLATFORM-INV-6-OIROUT : OIR: Node 0/2/1 Sn: N/A removed
RP/0/RSP0/CPU0:Jan 3 01:23:20.237 : invmgr[255]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/2/CPU0, state: BRINGDOWN
RP/0/RSP1/CPU0:Jan 3 01:23:36.183 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/2/CPU0 , WDOG HReset (0x07000000)
RP/0/RSP0/CPU0:Jan 3 01:23:37.187 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/2/CPU0 , WDOG Power Cycle (0x08000000)
RP/0/RSP0/CPU0:Jan 3 01:23:37.307 : ce_switch_srv[54]: %PLATFORM-CE_SWITCH-6-UPDN : Interface 6 (LC_Slot_2) is down
RP/0/RSP1/CPU0:Jan 3 01:23:37.330 : ce_switch_srv[54]: %PLATFORM-CE_SWITCH-6-UPDN : Interface 6 (LC_Slot_2) is down
RP/0/RSP0/CPU0:Jan 3 01:23:43.715 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_POST_RESET_NOTIFICATION : Node 0/2/CPU0 , WDOG Power Cycle (0x08000000)
RP/0/RSP0/CPU0:Jan 3 01:23:43.716 : shelfmgr[403]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/2/CPU0 A9K-VSM-500 state:ROMMON
RP/0/RSP1/CPU0:Jan 3 01:23:43.718 : canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_POST_RESET_NOTIFICATION : Node 0/2/CPU0 , WDOG Power Cycle (0x08000000)
---------------------------------------------
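
The three signatures above can be told apart mechanically when triaging collected logs. The following is a minimal sketch, not a Cisco tool: the function name classify_reload and its arguments are hypothetical, and the matching relies only on the substrings quoted in this report (the crashinfo "Crash Reason" text, the wdsysmon "Persistent Hog" messages, and the WDOG reset notifications). Other software releases may word these messages differently.

```python
import re

# Substrings taken from the signatures quoted in this report (assumed stable).
KERNEL_CRASH = "kernel crash exception"
PERSISTENT_HOG = "persistent hog"
WDOG_THRESHOLD = "CBC_WDOG_EXCEED_THRESHOLD"
# canb-server reset notification whose reset type starts with "WDOG".
WDOG_RESET = re.compile(
    r"CBC_(?:PRE|POST)_RESET_NOTIFICATION\s*:\s*Node\s+\S+\s*,\s*WDOG\b"
)


def classify_reload(syslog_lines, crash_reason=""):
    """Return 1, 2 or 3 for the matching observation, or None if unknown.

    syslog_lines: iterable of syslog message strings.
    crash_reason: the 'Crash Reason' line from the crashinfo file, if any.
    """
    text = "\n".join(syslog_lines)
    reason = crash_reason.lower()
    # 3: RSP CANB server watchdog reset (no crashinfo signature is listed).
    if WDOG_THRESHOLD in text or WDOG_RESET.search(text):
        return 3
    # 2: wdsysmon persistent CPU hog, seen in crashinfo or in syslog.
    if PERSISTENT_HOG in reason or PERSISTENT_HOG in text.lower():
        return 2
    # 1: kernel exception, identified from the crashinfo reason only.
    if KERNEL_CRASH in reason:
        return 1
    return None
```

For example, passing the crashinfo line "Crash Reason: Cause code 0x2c000008 Cause: wdsysmon: persistent hog detected" as crash_reason yields 2. Observation #1 is matched on the crashinfo reason alone, because its syslog lines (top user of CPU followed by a plain Power Cycle) are not distinctive on their own.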
It is observed that one of the VSM LC-XR's logical/virtual CPU cores (0 to 3) appears to get stuck and stops responding to IPI, which results in either a kernel exception or a CPU-hog-like condition. It is also observed that, in almost all reload/restart instances, the stuck CPU core is running the 'procnto-smp-instr' process (the idle process) at the time of the VSM Card reload/restart.
None