This can occur when an SDS device has read errors that have been corrected by the SDS. It can occur whether the background scanner is enabled or disabled.

The fixed errors on a device can be shown in the following places:

The GUI shows an error.

The scli "--query_sds --sds_id <SDS_ID>" output shows a counter for each device with corrected read errors:

    15: Name: /dev/sdr Path: /dev/sdr Original-path: /dev/sdr ID: 2d63f7c80003000e
        Storage Pool: SAS_pool1, Capacity: 1116 GB
        Error-fixes: 6 scanned 0 MB, Compare errors: 0
        State: Normal

The counters_dump.txt in MDM getInfoDump shows the FIXED_READ_ERROR_COUNT on different objects:

    ID: df7700a600120012 DEVICE_TYPE READ_ERR FIXED_READ_ERROR_COUNT 1
    ID: 1d1e4e5500000012 SDS_TYPE READ_ERR FIXED_READ_ERROR_COUNT 1
    ID: 1c34e1f700000007 STORAGE_POOL_TYPE READ_ERR FIXED_READ_ERROR_COUNT 1
    ID: b9b286df00000001 PROTECTION_DOMAIN_TYPE READ_ERR FIXED_READ_ERROR_COUNT 1
    ID: 49b6b8057d1fc84b SYSTEM_TYPE READ_ERR FIXED_READ_ERROR_COUNT 1

Note: There is no event in the MDM events log to indicate that this "fixed read errors" condition was seen.

Other possible symptoms:

The device may be in an Error state.

There may be errors on the block device in the system messages or syslog:

    blk_update_request: critical medium error, dev sdr, sector 94390272
    sd 0:2:15:0: [sdr] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 0:2:15:0: [sdr] tag#1 Sense Key : Medium Error [current]
    sd 0:2:15:0: [sdr] tag#1 Add. Sense: Unrecovered read error

There may be long inflight IO messages in the SDS trc:

    contDevMngr_HandleLongInflightIoViolation:02998: IO on devId: 2d63f7c80003000e (/dev/sdr) took too long, Low threshold exceeded - waited for reaper 12250 millis
    contDevMngr_HandleLongInflightIoViolation:02998: IO on devId: 2d63f7c80003000e (/dev/sdr) took too long, Low threshold exceeded - waited for reaper 13250 millis
    contDevMngr_HandleLongInflightIoViolation:02998: IO on devId: 2d63f7c80003000e (/dev/sdr) took too long, Low threshold exceeded - waited for reaper 14250 millis

There may be errors in the device's I/O counters in the SDS sdbg_out.txt:

    13: Dev path:/dev/sdr Size(lbs):0 Time grn:520577464
    Io Counters:
    GENERAL    Writes: 4852 Lbs: 2160443 MBs: 1054 Errors: 0      Reads: 49283 Lbs: 111376 MBs: 54 Errors: 12744
    BM         Writes: 0 Lbs: 0 MBs: 0 Errors: 0                  Reads: 0 Lbs: 0 MBs: 0 Errors: 0
    COMB_MAP   Writes: 5 Lbs: 1390 MBs: 0 Errors: 2               Reads: 0 Lbs: 0 MBs: 0 Errors: 0
    TOOTH_MAP  Writes: 426 Lbs: 688528 MBs: 336 Errors: 424       Reads: 0 Lbs: 0 MBs: 0 Errors: 0
    IO         Writes: 4319 Lbs: 603064 MBs: 294 Errors: 16       Reads: 2076 Lbs: 16608 MBs: 8 Errors: 22

The device's latency may be high according to counters_dump.txt:

    ID: 2d63f7c60003000c DEVICE_TYPE DEV_LATENCY AVG_WRITE_LATENCY_IN_MICROSEC 0
    ID: 2d63f7c70003000d DEVICE_TYPE DEV_LATENCY AVG_WRITE_LATENCY_IN_MICROSEC 0
    ID: 2d63f7c80003000e DEVICE_TYPE DEV_LATENCY AVG_WRITE_LATENCY_IN_MICROSEC 11424
    ID: 2d63f7c90003000f DEVICE_TYPE DEV_LATENCY AVG_WRITE_LATENCY_IN_MICROSEC 0
    ID: 2d63f7ca00030010 DEVICE_TYPE DEV_LATENCY AVG_WRITE_LATENCY_IN_MICROSEC 0

Impact

The "Fixed Read Errors" counter does not have any direct impact on the system. However, it may indicate an underlying condition that could cause SDS disconnections, rebuild activity, and similar problems.
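Because no MDM event is raised for this condition, it can help to search counters_dump.txt directly. The following is a minimal sketch, not a supported tool: it assumes each counter appears in the "ID: <id> <OBJECT>_TYPE READ_ERR FIXED_READ_ERROR_COUNT <n>" shape quoted in this article, and the function name fixed_read_errors is made up for illustration.

```python
import re

# Assumed record shape, copied from the counters_dump.txt excerpt in this article:
#   ID: <hex id> <OBJECT>_TYPE READ_ERR FIXED_READ_ERROR_COUNT <n>
COUNTER_RE = re.compile(
    r"ID:\s*(?P<id>[0-9a-fA-F]+)\s+(?P<type>\w+_TYPE)\s+READ_ERR\s+"
    r"FIXED_READ_ERROR_COUNT\s+(?P<count>\d+)"
)

def fixed_read_errors(text, object_type="DEVICE_TYPE"):
    """Return {object_id: count} for objects of object_type with a non-zero counter."""
    result = {}
    for match in COUNTER_RE.finditer(text):
        count = int(match.group("count"))
        if match.group("type") == object_type and count > 0:
            result[match.group("id")] = count
    return result

# Sample lines taken from the excerpt in this article.
sample = """\
ID: df7700a600120012 DEVICE_TYPE READ_ERR FIXED_READ_ERROR_COUNT 1
ID: 1d1e4e5500000012 SDS_TYPE READ_ERR FIXED_READ_ERROR_COUNT 1
ID: 1c34e1f700000007 STORAGE_POOL_TYPE READ_ERR FIXED_READ_ERROR_COUNT 1
"""

if __name__ == "__main__":
    print(fixed_read_errors(sample))
```

The device IDs it returns can then be matched against the ID field in the --query_sds output to find the device path.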
This is seen when an SDS device has read errors that have been corrected, or fixed, by using the mirrored copy. The correction can happen in the following cases:

- The background scanner fails to read one copy of the data and uses the other copy to overwrite it.
- An SDS fails to serve an SDC's read request because it cannot read the disk, so it uses the secondary copy to serve the I/O and overwrite the local data.

The warning indicates that the disk may be slowing down, failing, or developing bad blocks. The mechanisms described above rewrite the blocks, which can fix "soft" bad blocks.
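The read-repair behavior described above can be pictured with a toy model. This is only an illustrative sketch and not the SDS implementation: ToyDevice, MediumError, and read_with_repair are hypothetical names, and the model assumes that rewriting a block always clears a "soft" bad block.

```python
class MediumError(Exception):
    """Stands in for an unrecovered read error reported by the disk."""

class ToyDevice:
    """Hypothetical in-memory device; not a real SDS object."""

    def __init__(self, blocks, bad=()):
        self.blocks = dict(blocks)
        self.bad = set(bad)           # block numbers that currently fail on read
        self.fixed_read_errors = 0    # mirrors the idea of the fixed read error counter

    def read(self, lba):
        if lba in self.bad:
            raise MediumError(lba)
        return self.blocks[lba]

    def write(self, lba, data):
        self.blocks[lba] = data
        # Toy assumption: a rewrite always clears a "soft" bad block.
        self.bad.discard(lba)

def read_with_repair(primary, mirror, lba):
    """Serve a read from primary; on failure use the mirror and rewrite primary."""
    try:
        return primary.read(lba)
    except MediumError:
        data = mirror.read(lba)
        primary.write(lba, data)
        primary.fixed_read_errors += 1
        return data

if __name__ == "__main__":
    primary = ToyDevice({0: b"a", 1: b"b"}, bad=[1])
    mirror = ToyDevice({0: b"a", 1: b"b"})
    print(read_with_repair(primary, mirror, 1), primary.fixed_read_errors)
```

In the model the caller still receives its data, which matches the point that the counter itself has no direct impact. The counter only records that a repair was needed.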
Examine the disk. If necessary, contact the hardware vendor to replace it. The counter usually indicates an underlying condition in which the disk is failing. The SDS action explained above is an attempt to fix soft bad blocks, but it may not succeed in all scenarios.

Clear the error counter by running the following command from the Primary MDM, logged in as admin:

    scli --reset_scanner_error_counters --protection_domain_id <pd id> --storage_pool_id <sp id> --reset_corrected_read_error_counter

Run scli --query_all to list the protection domain and storage pool IDs.
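When the reset has to be run for several storage pools, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags quoted in this article and does not run anything. The helper name build_reset_command is hypothetical, and the check that IDs are 16 hexadecimal digits is an assumption based on the IDs shown in this article.

```python
import re
import shlex

# Assumption: object IDs are 16 hex digits, as in the examples in this article.
ID_RE = re.compile(r"[0-9a-fA-F]{16}")

def build_reset_command(pd_id, sp_id):
    """Return the scli argument list that resets the corrected read error counter."""
    for name, value in (("protection domain", pd_id), ("storage pool", sp_id)):
        if not ID_RE.fullmatch(value):
            raise ValueError(f"unexpected {name} ID: {value!r}")
    return [
        "scli", "--reset_scanner_error_counters",
        "--protection_domain_id", pd_id,
        "--storage_pool_id", sp_id,
        "--reset_corrected_read_error_counter",
    ]

if __name__ == "__main__":
    # IDs taken from the counters_dump.txt excerpt in this article.
    print(shlex.join(build_reset_command("b9b286df00000001", "1c34e1f700000007")))
```

The resulting list could be passed to subprocess.run on the Primary MDM after logging in to scli as admin.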