Symptom
On N9K-X9408PC-CFP2 linecards and N9K-M4PC-CFP2 modules, the bcm_usd process may crash and log the following messages, after which either the affected linecard or the entire switch reloads:
2018 Jul 15 15:52:19.524 N9500 %SYSMGR-SLOT8-2-SERVICE_CRASHED: Service "bcm_usd" (PID 7466) hasn't caught signal 11 (core will be saved).
2018 Jul 15 15:52:19.840 N9500 %SYSMGR-SLOT8-2-HAP_FAILURE_SUP_RESET: Service "bcm_usd" in vdc 1 has had a hap failure
2018 Jul 15 15:53:09.071 N9500 %MODULE-2-MOD_DIAG_FAIL: Module 8 (Serial number: ) reported failure due to Service on linecard had a hap-reset in device DEV_SYSMGR (device error 0x30b)
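As a hedged illustration, the Python sketch below scans collected syslog text for the first two messages shown above. It flags slots whose logs show both a bcm_usd signal 11 crash and a bcm_usd HAP failure. The function name find_bcm_usd_crash and the regular expressions are assumptions derived only from the sample messages. They are not an official Cisco signature, and message formats may differ between NX-OS releases.

```python
import re

# Hypothetical patterns derived from the sample messages above; not an official Cisco signature.
CRASH_RE = re.compile(r'%SYSMGR-SLOT(\d+)-2-SERVICE_CRASHED: Service "bcm_usd" .*signal 11')
HAP_RE = re.compile(r'%SYSMGR-SLOT(\d+)-2-HAP_FAILURE_SUP_RESET: Service "bcm_usd"')


def find_bcm_usd_crash(log_text: str) -> set:
    """Return slot numbers whose logs show both a bcm_usd signal 11 crash and a HAP failure."""
    crashed = {int(m.group(1)) for m in CRASH_RE.finditer(log_text)}
    hap_failed = {int(m.group(1)) for m in HAP_RE.finditer(log_text)}
    return crashed & hap_failed


if __name__ == "__main__":
    sample = """2018 Jul 15 15:52:19.524 N9500 %SYSMGR-SLOT8-2-SERVICE_CRASHED: Service "bcm_usd" (PID 7466) hasn't caught signal 11 (core will be saved).
2018 Jul 15 15:52:19.840 N9500 %SYSMGR-SLOT8-2-HAP_FAILURE_SUP_RESET: Service "bcm_usd" in vdc 1 has had a hap failure"""
    print(find_bcm_usd_crash(sample))  # {8}
```

Requiring both messages for the same slot keeps a single unrelated crash message from being reported as this defect. A match only suggests this defect and does not confirm it.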
After the reload, the output of the show logging onboard module slot-number command for the affected module indicates that bcm_usd experienced a hap-reset:
Exception Log Record : Sun Jul 15 15:53:08 2018 (454396 us)
Device Id : 134
Device Name : System Manager
Device Error Code : 30b(H)
Device Error Type : ERR_TYPE_DIAG
Device Error Name : NULL
Device Instance : 0
Sys Error : Service on linecard had a hap-reset
Errtype : CATASTROPHIC
PhyPortLayer : 0x0
Port(s) Affected :
Error Description : bcm_usd hap reset
DSAP : 0
UUID : 1
Time : Sun Jul 15 15:53:08 2018
(454395 usecs 5B4B43B4(H) jiffies)
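The exception log record above is a list of "Key : Value" lines. The following is a minimal Python sketch for checking a copied record for this defect's signature. It assumes the record has been pasted as plain text from the command output. The helpers parse_exception_record and is_bcm_usd_hap_reset are hypothetical names and are not part of any Cisco tool.

```python
def parse_exception_record(text: str) -> dict:
    """Split 'Key : Value' lines of an onboard exception log record into a dict (hypothetical helper)."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only; keys contain no colons, but timestamps in values do.
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. the jiffies continuation line) are skipped
            record[key.strip()] = value.strip()
    return record


def is_bcm_usd_hap_reset(record: dict) -> bool:
    """True if the record matches the bcm_usd hap-reset signature shown in this defect."""
    return (
        record.get("Error Description") == "bcm_usd hap reset"
        and record.get("Device Error Code") == "30b(H)"
    )
```

For example, parsing the record above yields "30b(H)" for "Device Error Code" and an empty string for "Port(s) Affected". is_bcm_usd_hap_reset then returns True.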
A core file is also generated for the bcm_usd process and is listed in the output of the show cores command.
Conditions
This defect applies only to N9K-X9408PC-CFP2 and N9K-M4PC-CFP2 modules. It occurs when a data parity error is detected in the runtime memory contents of the Ranger+ (Triumph3) MAC ASIC.
Further Problem Description
The software driver incorrectly handles a parity error event on the Ranger+ MAC ASICs that drive the 100Gbps interfaces on N9K-X9408PC-CFP2 linecards and N9K-M4PC-CFP2 modules. The parity error itself does not indicate a hardware failure and may occur spuriously during normal linecard operation. In the absence of a hardware failure, the linecard operates normally after the reload.