Data Domain (DD) systems monitor the status of system memory hardware (DIMMs). If any DIMM-related errors are encountered, an appropriate Alert notification is posted.

Applies to:
All Data Domain systems
All software versions of Data Domain Operating System (DDOS)

Possible Alert Notifications posted by DDOS:
DIMM-00001: Correctable ECC logging limit reached
DIMM-00002: Multibit Uncorrectable ECC error
DIMM-00003: A memory card has failed
ENVIRONMENT-00009: Memory correctable ECC errors exceed warning threshold
ENVIRONMENT-00013: Memory uncorrectable ECC error alert.
ENVIRONMENT-00044: Memory riser fault has been detected
MEM-00001: DIMM failure detected after install. DDFS will not be started.
MEM-00002: Memory size(nnnnnnnnKB) goes below the configured size(nnnnnnnnKB)
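When reviewing saved alert or log text, it can help to check quickly whether any of the memory-related alert IDs listed above appear in it. The following is a minimal sketch, not a Dell-provided tool: the function name find_memory_alerts and the sample text are hypothetical, and it assumes only that the alert IDs appear verbatim somewhere in the text. It makes no assumption about the real layout of DDOS command output.

```python
import re

# Alert IDs and descriptions copied from the list in this article.
MEMORY_ALERTS = {
    "DIMM-00001": "Correctable ECC logging limit reached",
    "DIMM-00002": "Multibit Uncorrectable ECC error",
    "DIMM-00003": "A memory card has failed",
    "ENVIRONMENT-00009": "Memory correctable ECC errors exceed warning threshold",
    "ENVIRONMENT-00013": "Memory uncorrectable ECC error alert",
    "ENVIRONMENT-00044": "Memory riser fault has been detected",
    "MEM-00001": "DIMM failure detected after install",
    "MEM-00002": "Memory size goes below the configured size",
}

# Match any known ID as a whole token, so "DIMM-000011" does not count.
ALERT_ID_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(alert_id) for alert_id in MEMORY_ALERTS) + r")\b"
)


def find_memory_alerts(text):
    """Return known memory alert IDs found in text, in first-seen order, without duplicates."""
    found = []
    for match in ALERT_ID_PATTERN.finditer(text):
        alert_id = match.group(0)
        if alert_id not in found:
            found.append(alert_id)
    return found


if __name__ == "__main__":
    # Hypothetical sample text, not real DDOS output.
    sample = "alert DIMM-00002 posted; later MEM-00001 posted; DIMM-00002 repeated"
    for alert_id in find_memory_alerts(sample):
        print(alert_id, "-", MEMORY_ALERTS[alert_id])
```

The lookup table doubles as a quick reference: any ID returned by the function can be mapped straight back to its description.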
The DIMMs installed on Data Domain systems use Error-Correcting Code (ECC), which allows Correctable Memory Errors to be fixed on-the-fly. If an error threshold is breached, DDOS identifies the fault and generates an appropriate Alert on the system. Uncorrectable memory errors may cause a system reboot and are considered hard memory faults. Total failure of any single DIMM or Memory Riser may result in a System Down event and prevent the Filesystem from being enabled. This is because the Data Domain File System (DDFS) process uses most of the physical memory.

🛠️ NOTE: Other symptoms or alerts may mask memory errors, for example a CPU Machine Check Error. A reboot may address the underlying memory issue; otherwise, deeper log analysis and troubleshooting may be required.
✅ NOTE: If a DIMM error is reported on Dell PowerEdge based systems, the first recovery action is to reboot the Data Domain unit. This initiates PPR (Post Package Repair) to recover the DIMM.

Efforts must be made to determine the cause of the alert, identify the affected component (DIMMs, CPU, or Motherboard), and replace parts as needed. If possible, gather a Support Bundle and create a Service Request with your contracted Service Provider.

The following video shows how to gather a Support Bundle: Gather a Support Bundle

Resolution Guidelines:

For Dell PowerEdge based systems, initiate a system reboot to facilitate automatic Post Package Repair (PPR) for the recovery of the DIMM. Improvements in BIOS Firmware allow PPR to recover DIMM correctable and uncorrectable errors (Reference).

Compare the current system state with an Auto-Support from BEFORE the DIMM failure or alert.

Useful DD-CLI (SSH) commands for checking memory:
# alerts show current
# system show meminfo
# enclosure show memory
# log view debug/messages.engineering ('q' to quit)

Use DDOS Offline Diagnostics to test and determine the fault. Go to Dell Support to access the Dell EMC Data Domain Operating System 6.x Offline Diagnostics Suite User Guide.

If possible, perform physical troubleshooting to isolate the faulty component (using documented replacement guides and procedures):
Reseat the DIMM and ensure that both sides are latched properly.
Swap it with a known good DIMM from another slot, channel, bank, or controller.
If a system is down (no boot) due to a suspected memory/DIMM fault, try a minimal boot option (remove peripheral devices or cards and leave 1x DIMM in slot '0').
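The guideline to compare the current system state with an Auto-Support from before the failure can be illustrated with a small sketch. It assumes the DIMM inventory from each point in time has already been transcribed by hand into a mapping of slot name to size. The function name compare_dimm_inventory, the slot names, and the sizes are all hypothetical, and the code does not parse real Auto-Support or DD-CLI output.

```python
def compare_dimm_inventory(before, after):
    """Compare two {slot_name: size_in_MiB} mappings.

    Returns a tuple (missing, changed):
      missing: sorted list of slots present before but absent now
      changed: {slot: (old_size, new_size)} for slots whose size differs
    """
    missing = sorted(slot for slot in before if slot not in after)
    changed = {
        slot: (before[slot], after[slot])
        for slot in before
        if slot in after and before[slot] != after[slot]
    }
    return missing, changed


if __name__ == "__main__":
    # Hypothetical inventories transcribed by hand; not real system data.
    before = {"A1": 16384, "A2": 16384, "B1": 16384}
    after = {"A1": 16384, "B1": 8192}

    missing, changed = compare_dimm_inventory(before, after)
    print("Slots no longer reported:", missing)
    print("Slots with a different size:", changed)
```

A slot that disappears or reports a smaller size than before is a reasonable first candidate for the reseat and swap steps described above.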