
OPERATIONAL DEFECT DATABASE
...

...
Certain DIMMs from a specific manufacturing lot (specific date codes only) fail at a higher rate than expected. The most common failure symptom is a significant number of single-bit (correctable) errors. If left untreated, the DIMM may be at higher risk of multi-bit (uncorrectable) errors during runtime.

On NXOS devices, single-bit correctable errors are reported with one of the following logs:

  %DEVICE_TEST-3-MCE_24HR_FAIL: Module 1 has exceeded MCE 24 hour correctable threshold of 100 with ##### correctable errors within 24 hours.

or

  %DAEMON-3-SYSTEM_MSG: corrected Socket memory error count exceeded threshold: ####### in 24h - mcelog

On ACI devices, the impacted DIMM can be found in /mnt/pss/bootlogs/current/dmesg or in the output of the "dmesg" command. For example, the logs below confirm a DIMM fault and show that DIMM-0 is the bad one (CPU_SrcID#0_Channel#1_DIMM#0):

  [ 167.751610] sbridge: HANDLING MCE MEMORY ERROR
  [ 167.751614] CPU 0: Machine Check Exception: 0 Bank 7: 8c00004000010091
  [ 168.415928] EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Channel#1_DIMM#0 (channel:1 slot:0 page:0x53232 offset:0xfc0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0091 socket:0 channel_mask:2 rank:0)
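When a dmesg capture holds many such entries, tallying correctable errors per DIMM can make the faulty module easier to spot. The following is a minimal sketch in Python, assuming every EDAC correctable-error line follows the same layout as the single sample line above; other kernel versions or platforms may format these lines differently. The function name count_correctable_errors and the regular expression are illustrative and are not part of any Cisco tooling.

```python
import re
from collections import Counter

# Assumption: EDAC correctable-error (CE) lines look like the sample in this
# entry, e.g. "EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Channel#1_DIMM#0 (...)".
EDAC_CE_PATTERN = re.compile(
    r"EDAC MC\d+: (?P<count>\d+) CE .*? on "
    r"CPU_SrcID#(?P<socket>\d+)_Channel#(?P<channel>\d+)_DIMM#(?P<dimm>\d+)"
)


def count_correctable_errors(dmesg_text):
    """Return a Counter mapping (socket, channel, dimm) to the total CE count."""
    totals = Counter()
    for line in dmesg_text.splitlines():
        match = EDAC_CE_PATTERN.search(line)
        if match:
            key = (
                int(match["socket"]),
                int(match["channel"]),
                int(match["dimm"]),
            )
            totals[key] += int(match["count"])
    return totals


if __name__ == "__main__":
    # Hypothetical usage: read a saved copy of the dmesg output from disk.
    with open("dmesg.txt", encoding="utf-8", errors="replace") as handle:
        per_dimm = count_correctable_errors(handle.read())
    for (socket, channel, dimm), count in per_dimm.most_common():
        print(f"socket {socket} channel {channel} DIMM {dimm}: {count} CE")
```

The script only summarizes what the log already reports; whether a given count warrants replacement should still be judged against the field notice referenced in this entry.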
This issue impacts a subset of DIMMs within a certain date range. Even inside this date range, not all DIMMs are impacted.
This is a hardware fault. No software workarounds are available to address this issue.
Impacted devices:
  N9K family of switches running NXOS or ACI
  APIC family: APIC-SERVER-L3, APIC-SERVER-M3

Please see the following document for additional information: https://www.cisco.com/c/en/us/support/docs/field-notices/724/fn72464.html