...
An N9K switch may face an unexpected reload on one or more supervisor modules due to a kernel panic resulting from an Out of Memory (OOM) state:

`show system reset-reason`

```
----- reset reason for module 28 (from Supervisor in slot 28) ---
1) At 125065 usecs after Mon Jan 01 01:02:03 2024
    Reason: Kernel Panic   <<<-------------------
    Service:
    Version: 10.2(2)
```

`show logging onboard internal reset-reason`

```
----------------------------
 Module: 28
----------------------------
    Reset Reason for this card:
    Image Version : 10.2(2)
    Reset Reason (LCM): Unknown (0) at time Mon Jan 01 01:08:52 2024
    Reset Reason (SW): Kernel Panic (19) at time Mon Jan 01 01:02:03 2024   <<<-------------------
    Reset Reason (HW): Watchdog Timeout (32) at time Mon Jan 01 01:08:52 2024
```
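For quick triage, a minimal on-box sketch like the one below (assuming the NX-OS Python interpreter and its `cli` module are available) simply wraps the same command shown above and flags any kernel-panic reset reason; it is an illustration, not part of the defect or its fix.

```python
# Minimal on-box check (assumes the NX-OS Python interpreter and the "cli"
# module are available): flag any line of "show system reset-reason" that
# reports a kernel panic.
from cli import cli  # NX-OS on-box CLI wrapper

output = cli("show system reset-reason")
for line in output.splitlines():
    if "Kernel Panic" in line:
        print("Possible OOM-triggered reload detected: " + line.strip())
```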
The kernel panic is the result of an Out of Memory (OOM) state caused by excessive memory utilization in the svc_ifc_eventmgr process. A high volume of BGP updates/churn can trigger this memory leak.
There are no known workarounds available for this issue. If users notice a significant amount of memory being held by the svc_ifc_eventmgr process, they can try a reload to temporarily free up that memory.

Users can track any potential memory growth in the svc_ifc_eventmgr process by checking the size of HEAP memory in the following commands:

`show system internal kernel memory service svc_ifc_eventmgr`

OR

`show system internal kernel memory uuid 1319`   ! 1319 is the UUID for the svc_ifc_eventmgr process
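As a rough illustration of tracking that growth over time, the sketch below polls the first command from a management host and records only the HEAP-related lines so successive runs can be compared. The host address, credentials, and the exact output format are assumptions, and netmiko is simply one common SSH library; capture the output however is convenient in your environment.

```python
# Hypothetical off-box polling sketch (not part of the defect or its fix):
# capture the HEAP-related lines of the command above so successive runs
# can be compared for growth over time.
import time
from netmiko import ConnectHandler  # third-party SSH library, assumed installed

DEVICE = {
    "device_type": "cisco_nxos",
    "host": "192.0.2.10",   # placeholder management IP
    "username": "admin",    # placeholder credentials
    "password": "example",
}
CMD = "show system internal kernel memory service svc_ifc_eventmgr"

def heap_lines(output):
    """Keep only the lines that mention HEAP (output format assumed)."""
    return [line for line in output.splitlines() if "HEAP" in line.upper()]

if __name__ == "__main__":
    with ConnectHandler(**DEVICE) as conn:
        output = conn.send_command(CMD)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for line in heap_lines(output):
        # Run periodically (for example from cron) and compare the values.
        print(stamp + "  " + line.strip())
```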
This issue was marked as a duplicate of defect CSCwb32663. Please refer to the Release Notes for CSCwb32663 for information regarding the software fix.

There are no process logs, core files, or exception logs generated by the event. However, there is data from the kernel panic in the stack-trace outputs. In particular, users can look for call traces pointing to "out_of_memory" and "page_fault" (also memory-related):

`show logging onboard stack-trace`

```
[41332678.110981] Call Trace:
[41332678.110990]  dump_stack+0x6d/0x8b
[41332678.110994]  dump_header+0x6a/0x274
[41332678.110999]  out_of_memory+0x253/0x2e0   <<<-------------------
[41332678.111002]  __alloc_pages_slowpath+0xa0f/0xe30
[41332678.111007]  __alloc_pages_nodemask+0x249/0x280
[41332678.111010]  filemap_fault+0x302/0x6c0
[41332678.111013]  ? __check_object_size+0x45/0x200
[41332678.111017]  ? filemap_map_pages+0x126/0x300
[41332678.111021]  __do_fault+0x3e/0x100
[41332678.111023]  __handle_mm_fault+0x5c1/0xc80
[41332678.111027]  handle_mm_fault+0x100/0x230
[41332678.111031]  __do_page_fault+0x291/0x4b0
[41332678.111035]  do_page_fault+0x2e/0xf0
[41332678.111038]  ? page_fault+0x5/0x20
[41332678.111040]  page_fault+0x1b/0x20   <<<-------------------

[41332678.203105] Call Trace:
[41332678.203114]  dump_stack+0x6d/0x8b
[41332678.203118]  ? prandom_reseed+0x170/0x170
[41332678.203122]  ? panic+0x1/0x247
[41332678.203129]  nxos_panic+0xf2/0x530 [klm_obfl]   <<<-------------------
[41332678.203131]  ? panic+0x1/0x247
[41332678.203135]  kprobe_ftrace_handler+0x8f/0xf0
[41332678.203138]  ? set_ti_thread_flag+0xe/0xe
[41332678.203141]  ? out_of_memory+0x278/0x2e0
[41332678.203146]  ftrace_ops_assist_func+0x97/0x140
[41332678.203154]  0xffffffffc01550da
[41332678.203157] RIP: 0010:panic+0x1/0x247
```

Further down in the stack-trace output, the task-state dump shows that svc_ifc_eventmg (tied to the event manager) was holding a significant amount of memory:

```
[41332678.111158] Tasks state (memory values in pages):
[41332678.111158] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[41332678.111232] [   1778]     0  1778  6082319  5804483       47399852        0             0 svc_ifc_eventmg   <<<-------------------
[41332678.111648] [   6492]     0  6492  1513017    52200        1046430        0             0 bgp
[41332678.111632] [   6482]     0  6482  1438398    40815         907160        0             0 mrib
[41332678.111640] [   6487]     0  6487  1421992    40786         891838        0             0 m6rib
[41332678.111646] [   6490]     0  6490  1308639    19827         766842        0             0 hmm
[41332678.115546] nxos_panic: Kernel panic - not syncing: fatal exception
```

Users may see other information in the syslogs (`show logging nvram` and `show logging logfile`) that points to a memory-exhausted state as well:

```
2024 Jan 01 01:00:58 n9kSW %DAEMON-3-SYSTEM_MSG: error: do_exec_no_pty: fork: Cannot allocate memory - dcos_sshd[XXXX]
2024 Jan 01 01:00:59 n9kSW %DAEMON-3-SYSTEM_MSG: error: do_exec_no_pty: fork: Cannot allocate memory - dcos_sshd[XXXX]
2024 Jan 01 01:01:00 n9kSW %DAEMON-2-SYSTEM_MSG: fatal: fork of unprivileged child failed - dcos_sshd[XXXX]
2024 Jan 01 01:01:00 n9kSW %LOCAL7-3-SYSTEM_MSG: ssh: fork failed: Cannot allocate memory (errno = 12) - dcos-xinetd[XXXXX]
2024 Jan 01 01:01:12 n9kSW %LOCAL7-3-SYSTEM_MSG: ssh: fork failed: Cannot allocate memory (errno = 12) - dcos-xinetd[XXXXX] (message repeated 1 time)
2024 Jan 01 01:01:12 n9kSW %DAEMON-3-SYSTEM_MSG: error: do_exec_no_pty: fork: Cannot allocate memory - dcos_sshd[XXXX]
2024 Jan 01 01:01:12 n9kSW %DAEMON-2-SYSTEM_MSG: fatal: fork of unprivileged child failed - dcos_sshd[XXXX]
2024 Jan 01 01:02:03 n9kSW %SYSMGR-2-CORE_SAVE_FAILED: master_core_client_try_spawn: PID XXXXX with message Unable to start core client. Cannot allocate memory.
```
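If the stack-trace and syslog outputs have been saved to text files, a minimal sketch like the one below can scan them for the memory-exhaustion signatures shown above. The file names are assumptions; only the search strings come from the captures in this section.

```python
# Minimal sketch: scan saved copies of "show logging onboard stack-trace" and
# syslog output for the memory-exhaustion signatures shown above.
import re
import sys

# Signatures taken from the call traces and syslogs in this section.
SIGNATURES = [
    r"out_of_memory",               # kernel OOM path in the call trace
    r"page_fault",                  # memory-related fault handling
    r"Kernel panic - not syncing",  # the panic itself
    r"Cannot allocate memory",      # fork failures once memory is exhausted
]
PATTERN = re.compile("|".join(SIGNATURES))

def scan(path):
    """Print any line in the capture that matches a known OOM signature."""
    with open(path, errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if PATTERN.search(line):
                print("{}:{}: {}".format(path, lineno, line.rstrip()))

if __name__ == "__main__":
    for capture in sys.argv[1:]:   # e.g. stack-trace.txt syslog.txt
        scan(capture)
```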
Users may see high memory utilization in svc_ifc_eventmg elsewhere in the `show tech detail` outputs as well:

`show pie envmon mem-usage detail count 0`

```
2024-01-01 00:00:02
Event Id: xxxxxxxx
Event Class: MEM usage insights
Source Id: 0x1c01
Mod: 28
Memory_Health : Severe Alert   <<<-------------------

MODULE 28:
****** Memory usage ******
Memory Total : 32822690 KB
Memory Used  : 32591742 KB
Memory Free  : 230948 KB
VmallocTotal : 34359738367 KB
VmallocUsed  : 0 KB
Memory_Health : Severe Alert

******* Top users of Memory *********
 PID     VIRT(KB)    RES(KB)   %CPU   %MEM   COMMAND
1768     24265434   23205056   1.30   70.60  svc_ifc_eventmg   <<<-------------------
2670      3666046     368296   0.00    1.10  urib
2353       832582     362756   0.00    1.10  vpx1
2148      1123450     302224   0.00    0.90  clis
```
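As a final illustration, the rough sketch below parses the "Top users of Memory" table shown above and flags any process above a %MEM threshold. The column layout is assumed to match this sample; adjust the pattern if your output differs.

```python
# Rough parsing sketch for the "Top users of Memory" table shown above.
# The column layout (PID, VIRT, RES, %CPU, %MEM, COMMAND) is assumed.
import re

SAMPLE = """\
 PID     VIRT(KB)    RES(KB)   %CPU   %MEM   COMMAND
1768     24265434   23205056   1.30   70.60  svc_ifc_eventmg
2670      3666046     368296   0.00    1.10  urib
"""

ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<virt>\d+)\s+(?P<res>\d+)\s+"
    r"(?P<cpu>[\d.]+)\s+(?P<mem>[\d.]+)\s+(?P<cmd>\S+)"
)

def flag_heavy_users(text, threshold=50.0):
    """Return (process, %MEM) pairs exceeding the given %MEM threshold."""
    hits = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and float(m.group("mem")) >= threshold:
            hits.append((m.group("cmd"), float(m.group("mem"))))
    return hits

if __name__ == "__main__":
    for cmd, mem in flag_heavy_users(SAMPLE):
        print("{} is using {}% of memory".format(cmd, mem))  # svc_ifc_eventmg at 70.6
```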