BugZero | Dell BugID 215459 - ECS: xDoctor RAP162: Low SSD System Disk or SSDr D...

Dell - Defect ID: 215459

ECS: xDoctor RAP162: Low SSD System Disk or SSDr Disk Life Remaining

Last updated on October 11th, 2024

Bug Details

Support Case Count: 226
Article View Count: 2538
Impact Category:

Description

Symptoms

SSD or SSDr (read cache) disks can show a high load on the system, which indicates an issue with the hardware. This issue is detected in release 4.8-92.0 of xDoctor.

xDoctor reports RAP162:

    ------------------------------------------
    ERROR - System disk has low remaining life
    ------------------------------------------
    Node      = Nodes
    Extra     = {"Nodes": {"169.254.1.13": {"BTWM5AM000UB": {"used_life": "255"}}, "169.254.1.14": {"BTWM59N0079B": {"used_life": "255"}}, "169.254.1.15": {"BTWM59N002PB": {"used_life": "255"}}, "169.254.1.16": {"BTWM59N0025B": {"used_life": "255"}}}}
    RAP       = RAP162
    Solution  = KB 215459
    Timestamp = 2023-06-30_132850
    PSNT      = Rome @ 4.8-92.0

    ----------------------------------------
    ERROR - SSDR disk has low remaining life
    ----------------------------------------
    Node      = Nodes
    Extra     = {"Nodes": {"169.254.1.13": {"BTWM5AM000UA": {"used_life": "255"}}, "169.254.1.14": {"BTWM59N0077B": {"used_life": "255"}}, "169.254.1.15": {"BTWM59N002AB": {"used_life": "255"}}, "169.254.1.16": {"BTWM59N0025C": {"used_life": "255"}}}}
    RAP       = RAP162
    Solution  = KB 215459
    Timestamp = 2023-06-30_132850
    PSNT      = Rome @ 4.8-92.0

High disk utilization on the system is detected by checking SAR data, which is collected every 10 minutes, to determine whether the system has a persistent disk performance issue with high await values in the SAR statistics.
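The Extra field in the RAP162 report above is JSON keyed by node IP and then by disk serial number. As a minimal sketch (not part of xDoctor; the function name flatten_rap162_extra is hypothetical), it can be flattened into one row per disk so the affected nodes and serial numbers are easy to list in a service request. The sample string is copied from the "System disk" report above and trimmed to two nodes.

```python
import json

# Sample copied from the RAP162 "System disk" report (first two nodes only).
EXTRA = (
    '{"Nodes": {"169.254.1.13": {"BTWM5AM000UB": {"used_life": "255"}}, '
    '"169.254.1.14": {"BTWM59N0079B": {"used_life": "255"}}}}'
)


def flatten_rap162_extra(extra_json):
    """Return a sorted list of (node_ip, disk_serial, used_life) tuples.

    used_life is kept as the string xDoctor printed; it is not interpreted.
    """
    nodes = json.loads(extra_json)["Nodes"]
    rows = []
    for node_ip, disks in nodes.items():
        for serial, attrs in disks.items():
            rows.append((node_ip, serial, attrs["used_life"]))
    return sorted(rows)


if __name__ == "__main__":
    for node_ip, serial, used_life in flatten_rap162_extra(EXTRA):
        print(f"{node_ip}  {serial}  used_life={used_life}")
```

The sketch only restructures what xDoctor reported; the actual endurance percentage still has to be read from the smartctl output described under Cause.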
Another check can validate SAR data for Operating System SATA SSD and/or SATA SSDr read cache disk performance.

Operating System SATA SSD:

Command: (Operating System SATA SSD, individual node)

    # ssd=$(cs_hal list --all disks | grep 'intl/sys'|awk '{print $2}'|sed 's/.*[/:]//');sar -d -p --dev=$ssd

Command: (Operating System SATA SSD, cluster)

    # svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys'|awk '{print \$2}'|sed 's/.*[/:]//');sar -d -p --dev=\$ssd"

SSDr Read cache disk:

Command: (SATA SSDr Read cache disk, individual node)

    # ssdr=$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print $2}');sar -d -p --dev=$ssdr

Command: (SATA SSDr Read cache disk, cluster)

    # svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sar -d -p --dev=\$ssdr"

Example: (Confirm await times are over 100 for the last three SAR checks)

    [...Output Truncated...]
                DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
    12:10:01   sdad      3.23     69.58    130.87     62.14     29.78   9503.41    224.33     72.36
    12:10:01    DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
    12:20:01   sdad      2.24     35.28     18.28     23.95     67.97  29994.40    371.69     83.11
    12:30:01   sdad      2.72     76.23     91.17     61.48     17.16   6813.32    102.38     27.88
    [...Output Truncated...]
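The "await over 100 for the last three SAR checks" test above can be sketched as a small parser over saved sar output. This is an illustration only, not a Dell tool: the function names await_values and persistent_high_await are hypothetical, and the code assumes the column layout shown in the example (a header row containing DEV and await, followed by rows for the device). The sample text is the example output from this article.

```python
SAMPLE = """\
DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
12:10:01 sdad 3.23 69.58 130.87 62.14 29.78 9503.41 224.33 72.36
12:10:01 DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
12:20:01 sdad 2.24 35.28 18.28 23.95 67.97 29994.40 371.69 83.11
12:30:01 sdad 2.72 76.23 91.17 61.48 17.16 6813.32 102.38 27.88
"""


def await_values(sar_text, device):
    """Extract the await column for one device, in the order sar printed it."""
    values = []
    await_offset = None  # distance from the DEV column to the await column
    for line in sar_text.splitlines():
        tokens = line.split()
        if "DEV" in tokens and "await" in tokens:
            # Header row: remember where await sits relative to DEV.
            await_offset = tokens.index("await") - tokens.index("DEV")
        elif (await_offset is not None and device in tokens
              and tokens[0] != "Average:"):
            # Sample row: skip the Average summary, read the await field.
            values.append(float(tokens[tokens.index(device) + await_offset]))
    return values


def persistent_high_await(values, threshold=100.0, samples=3):
    """True when the last `samples` await readings all exceed `threshold`."""
    recent = values[-samples:]
    return len(recent) == samples and all(v > threshold for v in recent)


if __name__ == "__main__":
    readings = await_values(SAMPLE, "sdad")
    print(readings, persistent_high_await(readings))
```

Locating await relative to the DEV header, rather than by a fixed column number, keeps the sketch working whether or not the timestamp column is present on the header row.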

Cause

SSD and SSDr disks have a usable endurance life that determines how long the disk can function before it fails. These key checks must be done to determine whether a failure is concerning and a replacement is warranted.

When the life of the Operating System SATA SSD and/or SATA SSDr read cache disk reaches 85% used (15% remaining), a Proactive replacement is recommended.
When the life of the Operating System SATA SSD and/or SATA SSDr read cache disk reaches 95% used (5% remaining), a Reactive replacement is recommended.

Operating System SATA SSD:

Check for SSD disk failures using the following command, which runs the check across the VDC; disks can also be checked individually on a node. Different SSD models produce different outputs.

Command:

    # svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"

Example 1: (Percentage Used Endurance Indicator and Percent Life Remaining)

    admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.2)    Started 2023-06-30 13:47:17
    Output from node: r1n1    retval: 0
    0x07 0x008 1 90 --- Percentage Used Endurance Indicator
    245 Percent_Life_Remaining 0x0032 064 064 000 Old_age Always - 10
    ...[Output Truncated]...

Example 2: (Percentage Used Endurance Indicator and Unknown_Attribute)

    admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.2)    Started 2023-06-30 13:47:17
    Output from node: r1n1    retval: 0
    0x07 0x008 1 85 --- Percentage Used Endurance Indicator
    245 Unknown_Attribute 0x0032 064 064 000 Old_age Always - 15
    ...[Output Truncated]...

Example 3: (Percentage Life Remaining)

    admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.1)    Started 2023-06-30 13:53:41
    Output from node: r1n1    retval: 0
    245 Percent_Life_Remaining 0x0032 082 082 000 Old_age Always - 5 Remaining.
    ...[Output Truncated]...

Example 4: (Percentage Used Endurance Indicator)

    admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.2)    Started 2023-06-30 14:02:03
    Output from node: r1n1    retval: 1
    0x07 0x008 1 95 N-- Percentage Used Endurance Indicator Remaining.
    ...[Output Truncated]...

Example 5: (Unknown_Attribute)

    admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.1)    Started 2023-06-30 13:53:41
    Output from node: r1n1    retval: 0
    245 Unknown_Attribute 0x0032 082 082 000 Old_age Always - 10
    ...[Output Truncated]...

Individual node check, to investigate disks on a per-node basis:

Command:

    # ssd=$(cs_hal list --all disks | grep 'intl/sys' |awk '{print $2}');sudo /usr/sbin/smartctl -l devstat $ssd | grep Endurance;sudo /usr/sbin/smartctl -a $ssd | grep -e 245

Example: Refer to the five examples above to read the correct endurance percentage on the node.

SATA SSDr Read cache disk:

Check for SSDr read cache disk failures using the following command, which runs the check across the VDC; disks can also be checked individually on a node. Different SSDr models produce different outputs.

Command:

    # svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"

Example 1: (Percentage Used Endurance Indicator and Percent Life Remaining)

    admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.2)    Started 2023-06-30 13:47:17
    Output from node: r1n1    retval: 0
    0x07 0x008 1 95 --- Percentage Used Endurance Indicator
    245 Percent_Life_Remaining 0x0032 064 064 000 Old_age Always - 5
    ...[Output Truncated]...

Example 2: (Percentage Used Endurance Indicator and Unknown_Attribute)

    admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.2)    Started 2023-06-30 13:47:17
    Output from node: r1n1    retval: 0
    0x07 0x008 1 94 --- Percentage Used Endurance Indicator
    245 Unknown_Attribute 0x0032 064 064 000 Old_age Always - 6
    ...[Output Truncated]...

Example 3: (Percentage Life Remaining)

    admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.1)    Started 2023-06-30 13:53:41
    Output from node: r1n1    retval: 0
    245 Percent_Life_Remaining 0x0032 082 082 000 Old_age Always - 15 Remaining.
    ...[Output Truncated]...

Example 4: (Percentage Used Endurance Indicator)

    admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.2)    Started 2023-06-30 14:02:03
    Output from node: r1n1    retval: 1
    0x07 0x008 1 80 N-- Percentage Used Endurance Indicator Remaining.
    ...[Output Truncated]...

Example 5: (Unknown_Attribute)

    admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
    svc_exec v1.0.6 (svc_tools v2.12.1)    Started 2023-06-30 13:53:41
    Output from node: r1n1    retval: 0
    245 Unknown_Attribute 0x0032 082 082 000 Old_age Always - 10
    ...[Output Truncated]...

Individual node check, to investigate disks on a per-node basis:

Command:

    # ssdr=$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print $2}');sudo /usr/sbin/smartctl -l devstat $ssdr | grep Endurance;sudo /usr/sbin/smartctl -a $ssdr | grep -w 245

Example: Refer to the five examples above to read the correct endurance percentage on the node.
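The replacement thresholds above (85% used or 15% remaining for Proactive, 95% used or 5% remaining for Reactive) can be applied to the two kinds of smartctl lines shown in the examples. The following is a minimal sketch under stated assumptions, not a Dell-supplied tool: the function names parse_endurance and recommendation are hypothetical, the used percentage is read from the fourth field of the "Percentage Used Endurance Indicator" line, and the raw value (tenth field) of attribute 245 is read as percent remaining, as the examples in this article suggest.

```python
def parse_endurance(output):
    """Return (used_pct, remaining_pct); either is None if not reported."""
    used = remaining = None
    for line in output.splitlines():
        tokens = line.split()
        if "Percentage Used Endurance Indicator" in line:
            # devstat row: page, offset, size, value, flags, description
            if len(tokens) >= 4 and tokens[3].isdigit():
                used = int(tokens[3])
        elif tokens and tokens[0] == "245":
            # attribute row: raw value is the tenth field
            if len(tokens) >= 10 and tokens[9].isdigit():
                remaining = int(tokens[9])
    return used, remaining


def recommendation(used, remaining):
    """Map the percentages to the article's replacement guidance."""
    if used is None and remaining is None:
        return "unknown"
    if (used is not None and used >= 95) or (
            remaining is not None and remaining <= 5):
        return "reactive"
    if (used is not None and used >= 85) or (
            remaining is not None and remaining <= 15):
        return "proactive"
    return "none"


if __name__ == "__main__":
    # Lines copied from Example 1 for the Operating System SATA SSD.
    sample = (
        "0x07 0x008 1 90 --- Percentage Used Endurance Indicator\n"
        "245 Percent_Life_Remaining 0x0032 064 064 000 Old_age Always - 10\n"
    )
    used, remaining = parse_endurance(sample)
    print(used, remaining, recommendation(used, remaining))
```

Because different disk models report only one of the two indicators, the sketch treats a missing value as unknown rather than as zero, and recommends the more urgent action when either indicator crosses a threshold.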

Resolution

Collect the outputs from the commands above, then open a service request referencing KB 215459 for SATA SSD and/or SATA SSDr device health review and replacement.

Support Cases

Change history

2025-03-21 Added: 6
