This defect tracks the investigation of, and a fix for, a control-plane memory leak on ACI leaf nodes running Kwai 5.2(6x). The issue can result in unexpected leaf reloads caused by an out-of-memory (OOM) kernel panic.
- Leaf switches report high memory utilization (95-98%).
- Kernel logs show "Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled."
- System reboots unexpectedly.
- Crash backtrace includes: panic → check_panic_on_oom → out_of_memory → __alloc_pages_nodemask.
- Spines remain stable and unaffected.
- Observed on fixed-leaf platforms (e.g., N9K-C9336C-FX2) running ACI 5.2(6e).
- Occurs after extended uptime (weeks/months) as control-plane memory gradually increases.
- Commonly associated with LCM and svc_ifc processes consuming large resident memory.
- panic_on_oom=1 is enabled by default, converting an OOM condition into a full kernel panic.
No direct workaround exists to recover memory once leaked. Mitigation options:
1. Monitor process memory growth with "show processes memory sorted" and "show system internal sysmgr service memory detail".
2. Proactively reload affected nodes before they reach critical memory thresholds.
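The monitoring step above could be turned into a rough reload-planning estimate. The following is a minimal sketch, assuming resident-memory readings for one process (for example LCM or svc_ifc) have already been collected by hand or by script as (hours since first sample, resident KB) pairs. The function names, the sample values, and the threshold are hypothetical illustrations, not part of any Cisco tooling, and the linear-growth model is itself an assumption.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (hours since first sample, resident memory in KB)


def growth_rate_kb_per_hour(samples: List[Sample]) -> float:
    """Return the least-squares slope of resident memory over time."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def hours_until_threshold(samples: List[Sample], threshold_kb: float) -> Optional[float]:
    """Project hours from the latest sample until memory reaches threshold_kb.

    Returns 0.0 if the threshold is already reached, and None if memory
    is flat or shrinking (no projected crossing under a linear model).
    """
    rate = growth_rate_kb_per_hour(samples)
    _, latest_kb = max(samples, key=lambda s: s[0])
    if latest_kb >= threshold_kb:
        return 0.0
    if rate <= 0:
        return None
    return (threshold_kb - latest_kb) / rate


if __name__ == "__main__":
    # Hypothetical readings taken once a day for one process.
    readings = [(0.0, 1000.0), (24.0, 1240.0), (48.0, 1480.0)]
    print(growth_rate_kb_per_hour(readings))
    print(hours_until_threshold(readings, threshold_kb=2480.0))
```

A steadily positive slope over days or weeks is consistent with the gradual growth described above, and the projected time can help schedule a proactive reload inside a maintenance window. A real leak may not grow linearly, so treat the projection as an early-warning hint only.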
Crash and dmesg analysis confirm complete depletion of all page allocator zones (DMA, DMA32, Normal). Active anonymous pages dominate total memory usage with minimal reclaimable cache, indicating a long-term leak in user-space control-plane daemons rather than transient workload spikes. The panic is triggered by panic_on_oom=1, leading to an immediate system reboot.

Identical behavior observed across multiple leaves at the same timestamp suggests a systemic leak affecting the control-plane processes (LCM, svc_ifc, or related modules). Engineering to investigate heap allocation and message buffer handling in the LCM/svc_ifc daemons and confirm whether existing fixes in 5.2(8x) or 6.0(4x) resolve the issue.
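The "anonymous pages dominate, little reclaimable cache" pattern described above can be checked numerically from a Linux /proc/meminfo snapshot. The sketch below assumes such a snapshot is available as text (whether and how it can be captured on a given leaf is not stated in this defect). The helper names and the sample figures are hypothetical, and the reclaimable figure is only a rough approximation because Cached also includes shared memory that cannot be freely dropped.

```python
from typing import Dict, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def anon_and_reclaimable_fractions(info: Dict[str, int]) -> Tuple[float, float]:
    """Return (anonymous fraction, roughly reclaimable fraction) of MemTotal."""
    total = info["MemTotal"]
    anon = info.get("Active(anon)", 0) + info.get("Inactive(anon)", 0)
    reclaimable = (
        info.get("Cached", 0) + info.get("Buffers", 0) + info.get("SReclaimable", 0)
    )
    return anon / total, reclaimable / total


if __name__ == "__main__":
    # Hypothetical snapshot, not taken from a real device.
    snapshot = """\
MemTotal:       16000000 kB
Buffers:           40000 kB
Cached:           200000 kB
Active(anon):   14000000 kB
Inactive(anon):   400000 kB
SReclaimable:      80000 kB
"""
    anon_frac, reclaim_frac = anon_and_reclaimable_fractions(parse_meminfo(snapshot))
    print(f"anonymous: {anon_frac:.1%}, reclaimable: {reclaim_frac:.1%}")
```

A high anonymous fraction together with a very small reclaimable fraction points toward process heap growth rather than page cache pressure, which matches the user-space leak hypothesis in the analysis.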