Understanding uncorrectable sectors and parity errors on a CLARiiON, VNX, or Unity array.

Event log messages such as the following may also appear as Dial Homes:

VNX1
Error code: 0x953 Uncorrectable Parity Sector
Error code: 0x957 Uncorrectable Data Sector
Error code: 0x68A Uncorrectable Parity Sector
Error code: 0x695 Uncorrectable Data Sector
Error code: 0x840 Data Sector Invalidated
b26 Cache has issued CORRUPT_CRC. LUN= 309 ca_sync.c 0 309 2

VNX2
71688003 Uncorrectable Sector RAID Group: %2 Position: %3 LBA: %4 Blocks: %5 Error info: %6 Extra info: %7
71688008 Uncorrectable Sector RAID Group: 10 Position: 1 LBA: d180 Blocks: 8 Error info: 0 Extra info: e [r5_rb FLU 8224 r5_rb]
71688008 Uncorrectable Sector RAID Group: 10 Position: 1 LBA: d170 Blocks: 8 Error info: 0 Extra info: e [r5_rb FLU 8224 r5_rb]
71688001 Data Sector Invalidated RAID Group: 10 Position: 1 LBA: d121 Blocks: 7 Error info: 0 Extra info: e [r5_rb FLU 8224 r5_rb]

Please see article 382528, "VNX2: Array reports events like 0x71688001, 0x71688002, 0x71688003, 0x71688007 or 0x71688008 (User Correctable)", for additional event codes.
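When many of these events are logged, it can help to split the VNX2-style lines into fields so they can be grouped by RAID group, position, and LBA. The following is a minimal sketch that assumes only the layout of the sample lines shown above. The names `EVENT_RE`, `SectorEvent`, and `parse_event` are hypothetical and are not part of any Dell EMC tool. Reading LBA and Blocks as hexadecimal is an assumption based on sample values such as d180; RAID group and position are kept as raw text because their number base is not stated.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived only from the sample VNX2 lines in this article (assumption).
EVENT_RE = re.compile(
    r"(?P<code>[0-9A-Fa-f]{8})\s+"
    r"(?P<kind>Uncorrectable Sector|Data Sector Invalidated)\s+"
    r"RAID Group:\s*(?P<raid_group>\d+)\s+"
    r"Position:\s*(?P<position>\d+)\s+"
    r"LBA:\s*(?P<lba>[0-9A-Fa-f]+)\s+"
    r"Blocks:\s*(?P<blocks>[0-9A-Fa-f]+)"
)


class SectorEvent(NamedTuple):
    code: str        # event code exactly as logged, e.g. "71688008"
    kind: str        # "Uncorrectable Sector" or "Data Sector Invalidated"
    raid_group: str  # kept as text; number base not stated in the source
    position: str    # kept as text; number base not stated in the source
    lba: int         # parsed as hexadecimal (assumption)
    blocks: int      # parsed as hexadecimal (assumption)


def parse_event(line: str) -> Optional[SectorEvent]:
    """Return the parsed fields, or None if the line does not match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    return SectorEvent(
        code=match["code"],
        kind=match["kind"],
        raid_group=match["raid_group"],
        position=match["position"],
        lba=int(match["lba"], 16),
        blocks=int(match["blocks"], 16),
    )
```

Lines that do not follow this layout, including the template line with `%2` style placeholders, return `None` rather than raising, so the function can be run over a whole log extract.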
Uncorrectable errors occur when two different disks in the same RAID group have media errors in the same sector. For example, if a disk with media errors is copying to a hot spare and another disk in the same RAID group also has media errors in the same sector, the result is an uncorrectable error (uncorrectable sector).

The event codes described above are logged when the system is unable to read data sectors from a disk and subsequent attempts to reconstruct the data from the other disks in the RAID group have failed. The "Uncorrectable" messages indicate which disk(s) the system was unable to read the sectors from, and the "Invalidated" messages indicate which disk sectors were marked as being void of valid information in a specific location. This marking is done to ensure that no invalid data is returned to a host system.

Attempts to read from an invalidated location result in a hard error being returned to the host. Attempts to write to an invalidated location complete successfully and generally "fill" (overwrite) the void location, effectively fixing the uncorrectable sector. This is why past uncorrectable errors sometimes disappear after a host has overwritten the affected sectors with new, good data.
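The behavior described above can be illustrated with a toy model of a single parity stripe. This is a simplified sketch using XOR parity (RAID 5 style) and is not the array's actual implementation. The names `Stripe`, `UncorrectableError`, and `xor_blocks` are hypothetical, and two shortcuts are assumed: a write clears the media error on the written sector, and parity is recomputed directly from the model's stored contents.

```python
from functools import reduce


class UncorrectableError(Exception):
    """Raised when a sector can be neither read nor rebuilt from parity."""


def xor_blocks(blocks):
    """XOR equal-length byte strings together (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """Toy stripe: N data sectors plus one parity sector, one per disk."""

    def __init__(self, data):
        self.sectors = list(data) + [xor_blocks(data)]  # parity is last
        self.media_error = set()  # positions the disk cannot read
        self.invalidated = set()  # positions marked void of valid data

    def read(self, pos):
        if pos in self.invalidated:
            # Invalidated locations always return a hard error to the host.
            raise UncorrectableError(f"position {pos} is invalidated")
        if pos not in self.media_error:
            return self.sectors[pos]
        others = [i for i in range(len(self.sectors)) if i != pos]
        if any(i in self.media_error or i in self.invalidated for i in others):
            # A second bad sector in the same stripe: nothing to rebuild from.
            self.invalidated.add(pos)
            raise UncorrectableError(f"position {pos} is uncorrectable")
        # Single failure: rebuild from the surviving sectors and parity.
        return xor_blocks([self.sectors[i] for i in others])

    def write(self, pos, new_data):
        """Overwrite a data sector; new good data fills the void location."""
        if pos == len(self.sectors) - 1:
            raise ValueError("hosts write data sectors, not parity")
        self.sectors[pos] = new_data
        self.media_error.discard(pos)  # simplification: write clears the error
        self.invalidated.discard(pos)
        # Toy shortcut: recompute parity from the model's stored contents.
        self.sectors[-1] = xor_blocks(self.sectors[:-1])
```

In this model, a stripe with one media error still returns the original data on read. Adding a second media error on another position makes the next read of the first position raise `UncorrectableError` and mark it invalidated, and a later `write` to that position makes it readable again, mirroring the way host overwrites clear old uncorrectable sectors.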
For VNX: Once all the hardware issues are resolved, Dell EMC Technical Support will need to execute a manual Read Only Background Verify (ROBV) of the affected internal LUN(s) in the affected pool. ROBV reads and checks the data for uncorrectables on the entire internal LUN, including unused space, to determine how many uncorrectable sectors may still exist. Once ROBV has completed, if uncorrectables are still occurring, your Dell EMC Technical Support Engineer will need to execute additional steps, including collecting and analyzing Storage Allocation Table (SAT) information to identify the specific user LUN(s) affected (the internal LUNs where the uncorrectables were found will be mapped to the user LUNs). For a complete explanation and the prerequisites for executing a ROBV, please see article 466638, "VNX: Explanation of Read Only Background Verify (ROBV) (User Correctable)".

When an uncorrectable sector is found in a user LUN, the user data will need to be verified by the host application to determine whether the user data is corrupt or the error resides in unused space. Any process that reads the data, such as a backup, is suitable for identifying or flagging possible corruption. If there is corruption, the data can be restored from a good backup, with either a full restore or a partial restore of only the affected file(s). If there is no good backup, another means provided by the host application should be used to restore or recreate the data.

If the uncorrectable error is not found in user data, the background processes may still discover the error in the future if host I/O does not overwrite the sector. This can lead to an incorrect assessment that this is a new error, and cause delays in analysis and remediation for an old error that was not completely resolved. In this case, it is highly recommended to move the good data to another LUN and delete the original affected LUN.

For Unity, other methods may exist to help resolve this issue.
Please check for more Unity-specific articles.