An EX linecard or E fabric module crashes and resets when it encounters a PCIe uncorrectable error. To confirm that the card failed, powered off, and was then removed and reinserted, view the events in the Cisco APIC GUI under Fabric > Inventory > Pod x > Spine > History > Events. The problem can cause unicast and multicast traffic loss. See CSCvg80698.
show logging onboard stack-trace should show:

pcieport 0000:00:0f.0: AER: Uncorrected (Non-Fatal) error received: id=00a0
Kernel panic - not syncing: PCIE Uncorrectable error encountered
Pid: 11, comm: kworker/0:1 Tainted: P W O 3.4.91.0.0insieme-0 #1
Call Trace:
[] panic+0x129/0x233
[] aer_isr+0x22b/0x2e0
[] ? finish_task_switch+0x6d/0xf0
[] process_one_work+0x294/0x480
[] worker_thread+0x1e9/0x330
[] ? manage_workers.isra.20+0x200/0x200
[] kthread+0x8e/0xa0
[] kernel_thread_helper+0x4/0x10
[] ? retint_restore_args+0x13/0x13
[] ? __init_kthread_worker+0x40/0x40
[] ? gs_change+0x13/0x13
cctrl: pre kgdb notifier
obfl_klm writing reset reason 19, system crash
Heartbeat missed from sup
Heartbeat missed from sup
Heartbeat missed from sup
write_mtd_flash_panic: successfully wrote 88 bytes at address 0x0 to RR Iter: 0.
__kgdb_notify: Trying to fall info kgdb
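When checking saved command output against this signature, a small script can help. The following is a minimal sketch, assuming the output of show logging onboard stack-trace has already been captured as text. The function name match_pcie_panic and the returned dictionary are illustrative assumptions and not part of any Cisco tool. The patterns use only strings that appear in the trace above, plus an assumed "Fatal" variant of the AER severity.

```python
import re

# Patterns built from strings in the trace above. The "Fatal" alternative
# is an assumption; the trace in this bug shows "Non-Fatal".
AER_RE = re.compile(
    r"AER: Uncorrected \((Non-Fatal|Fatal)\) error received: id=([0-9a-fA-F]+)"
)
PANIC_RE = re.compile(
    r"Kernel panic - not syncing: PCIE Uncorrectable error encountered"
)
RESET_RE = re.compile(r"writing reset reason (\d+)")


def match_pcie_panic(text):
    """Report whether captured stack-trace text matches this bug's signature.

    Searches the whole text rather than individual lines, so it works
    whether or not the capture preserved line breaks.
    """
    aer = AER_RE.search(text)
    panic = PANIC_RE.search(text)
    reset = RESET_RE.search(text)
    return {
        # Both the AER message and the panic string must be present.
        "matches": bool(aer and panic),
        "severity": aer.group(1) if aer else None,
        "aer_id": aer.group(2) if aer else None,
        "reset_reason": int(reset.group(1)) if reset else None,
    }
```

A match only indicates that the text resembles the signature; confirming the bug still requires TAC.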
Contact Cisco TAC, who can provide root access to apply a workaround. The workaround does not persist across a reload.
In the current version, the fabric module (FM) crashes and resets as soon as it encounters a PCIe uncorrectable error. The suggested change is to count the errors against a threshold and reset only when the threshold is reached, instead of resetting after a single error.
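The threshold idea can be illustrated with a short sketch. This is not the vendor's implementation: the class name ErrorThreshold, the parameter names, and the sliding time window are all assumptions made for illustration. The bug text mentions only a threshold and an error count.

```python
from collections import deque


class ErrorThreshold:
    """Count errors and signal a reset only once a threshold is reached.

    Hypothetical policy: reset when `max_errors` errors fall within the
    last `window_seconds`. The time window is an assumption; the bug
    report asks only for a threshold and a count.
    """

    def __init__(self, max_errors, window_seconds):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._stamps = deque()  # timestamps of recent errors, oldest first

    def record(self, now):
        """Record one error at time `now` (seconds); return True if a reset is warranted."""
        self._stamps.append(now)
        # Drop errors that have aged out of the window.
        while now - self._stamps[0] > self.window_seconds:
            self._stamps.popleft()
        return len(self._stamps) >= self.max_errors
```

With max_errors=3 and window_seconds=60, three errors at 0, 10, and 20 seconds trigger a reset on the third, while errors spaced 100 seconds apart never do. Setting max_errors=1 reproduces the current behavior of resetting on the first error.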