...
SSD or SSDr (read cache) disks can show a high load on the system, indicating an issue with the hardware. This issue is detected in release 4.8-92.0 of xDoctor.

xDoctor reports RAP162:

------------------------------------------
ERROR - System disk has low remaining life
------------------------------------------
Node = Nodes
Extra = {"Nodes": {"169.254.1.13": {"BTWM5AM000UB": {"used_life": "255"}}, "169.254.1.14": {"BTWM59N0079B": {"used_life": "255"}}, "169.254.1.15": {"BTWM59N002PB": {"used_life": "255"}}, "169.254.1.16": {"BTWM59N0025B": {"used_life": "255"}}}}
RAP = RAP162
Solution = KB 215459
Timestamp = 2023-06-30_132850
PSNT = Rome @ 4.8-92.0

----------------------------------------
ERROR - SSDR disk has low remaining life
----------------------------------------
Node = Nodes
Extra = {"Nodes": {"169.254.1.13": {"BTWM5AM000UA": {"used_life": "255"}}, "169.254.1.14": {"BTWM59N0077B": {"used_life": "255"}}, "169.254.1.15": {"BTWM59N002AB": {"used_life": "255"}}, "169.254.1.16": {"BTWM59N0025C": {"used_life": "255"}}}}
RAP = RAP162
Solution = KB 215459
Timestamp = 2023-06-30_132850
PSNT = Rome @ 4.8-92.0

High disk utilization is detected by checking SAR data, which is collected every 10 minutes, to determine whether the system has a persistent disk performance issue with high await in the SAR statistics.
Another check can validate SAR data for Operating System SATA SSD and/or SATA SSDr Read cache disk performance:

Operating System SATA SSD:

Command: (Operating System SATA SSD Individual node)
# ssd=$(cs_hal list --all disks | grep 'intl/sys'|awk '{print $2}'|sed 's/.*[/:]//');sar -d -p --dev=$ssd

Command: (Operating System SATA SSD Cluster)
# svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys'|awk '{print \$2}'|sed 's/.*[/:]//');sar -d -p --dev=\$ssd"

SSDr Read cache disk:

Command: (SATA SSDr Read cache disk Individual node)
# ssdr=$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print $2}');sar -d -p --dev=$ssdr

Command: (SATA SSDr Read cache disk Cluster)
# svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sar -d -p --dev=\$ssdr"

Example: (Confirm await times are over 100 for the last three SAR checks)
[...Output Truncated...]
          DEV    tps  rkB/s   wkB/s  areq-sz  aqu-sz     await   svctm  %util
12:10:01  sdad  3.23  69.58  130.87    62.14   29.78   9503.41  224.33  72.36
12:10:01  DEV    tps  rkB/s   wkB/s  areq-sz  aqu-sz     await   svctm  %util
12:20:01  sdad  2.24  35.28   18.28    23.95   67.97  29994.40  371.69  83.11
12:30:01  sdad  2.72  76.23   91.17    61.48   17.16   6813.32  102.38  27.88
[...Output Truncated...]
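The manual confirmation described in the example can be sketched as a small filter. This is a minimal sketch and not part of the KB procedure: the function name check_await, its two arguments, and the status strings are illustrative assumptions. It assumes the sar -d -p column layout shown in the example, and locates the await column from the header row rather than by a fixed position.

```shell
#!/usr/bin/env bash
# check_await (hypothetical helper): read `sar -d -p --dev=<disk>` output on
# stdin and report whether the last N samples ALL show await above a threshold.
#   $1 = await threshold (default 100, per the example above)
#   $2 = number of most recent samples to inspect (default 3)
# Prints HIGH_AWAIT (exit 1), OK (exit 0), or INSUFFICIENT_DATA (exit 2).
check_await() {
  local threshold="${1:-100}" samples="${2:-3}"
  awk -v thr="$threshold" -v n="$samples" '
    /^Average/ || NF == 0 { next }        # skip summary and blank lines
    {
      is_header = 0
      for (i = 1; i <= NF; i++) {
        # Remember how far the await column sits from the end of the row,
        # so headers with or without a timestamp are handled the same way.
        if ($i == "await") { from_end = NF - i; seen = 1; is_header = 1 }
      }
      if (is_header || !seen || NF <= from_end) next
      val = $(NF - from_end)
      if (val ~ /^[0-9.]+$/) { count++; await[count] = val + 0 }
    }
    END {
      if (count < n) { print "INSUFFICIENT_DATA"; exit 2 }
      for (i = count - n + 1; i <= count; i++)
        if (await[i] <= thr) { print "OK"; exit 0 }
      print "HIGH_AWAIT"; exit 1
    }'
}
```

On an individual node this could be fed from the command above, for example: sar -d -p --dev=$ssd | check_await 100 3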
SSD and SSDr disks have a finite endurance life that determines how long the disk can function before it fails. The following key checks must be done to determine whether a failure is a concern and a replacement is warranted.

When the life of the Operating System SATA SSD and/or SATA SSDr Read cache disk reaches 85% used or 15% remaining, a Proactive replacement is recommended.
When the life of the Operating System SATA SSD and/or SATA SSDr Read cache disk reaches 95% used or 5% remaining, a Reactive replacement is recommended.

Operating System SATA SSD:
Check for SSD disk failures using the following command, which checks all system disks for the VDC; nodes can also be checked individually. Different SSD models produce different output.

Command:
# svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"

Example 1: (Percentage Used Endurance Indicator and Percent Life Remaining)
admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.2) Started 2023-06-30 13:47:17
Output from node: r1n1 retval: 0
0x07 0x008 1 90 --- Percentage Used Endurance Indicator
245 Percent_Life_Remaining 0x0032 064 064 000 Old_age Always - 10
...[Output Truncated]...

Example 2: (Percentage Used Endurance Indicator and Unknown_Attribute)
admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.2) Started 2023-06-30 13:47:17
Output from node: r1n1 retval: 0
0x07 0x008 1 85 --- Percentage Used Endurance Indicator
245 Unknown_Attribute 0x0032 064 064 000 Old_age Always - 15
...[Output Truncated]...
Example 3: (Percentage Life Remaining)
admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.1) Started 2023-06-30 13:53:41
Output from node: r1n1 retval: 0
245 Percent_Life_Remaining 0x0032 082 082 000 Old_age Always - 5 Remaining.
...[Output Truncated]...

Example 4: (Percentage Used Endurance Indicator)
admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.2) Started 2023-06-30 14:02:03
Output from node: r1n1 retval: 1
0x07 0x008 1 95 N-- Percentage Used Endurance Indicator Remaining.
...[Output Truncated]...

Example 5: (Unknown_Attribute)
admin@node1:~> svc_exec "ssd=\$(cs_hal list --all disks | grep 'intl/sys' |awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssd | grep Endurance;sudo /usr/sbin/smartctl -a \$ssd | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.1) Started 2023-06-30 13:53:41
Output from node: r1n1 retval: 0
245 Unknown_Attribute 0x0032 082 082 000 Old_age Always - 10
...[Output Truncated]...

Individual node check to investigate disks on a per-node basis.

Command:
# ssd=$(cs_hal list --all disks | grep 'intl/sys' |awk '{print $2}');sudo /usr/sbin/smartctl -l devstat $ssd | grep Endurance;sudo /usr/sbin/smartctl -a $ssd | grep -w 245

Example: Refer to the five examples above for the correct endurance percentage on the node.

SATA SSDr Read cache disk:
Check for SSDr Read cache disk failures using the following command, which checks all system disks for the VDC; nodes can also be checked individually. Different SSDr models produce different output.
Command:
# svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"

Example 1: (Percentage Used Endurance Indicator and Percent Life Remaining)
admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.2) Started 2023-06-30 13:47:17
Output from node: r1n1 retval: 0
0x07 0x008 1 95 --- Percentage Used Endurance Indicator
245 Percent_Life_Remaining 0x0032 064 064 000 Old_age Always - 5
...[Output Truncated]...

Example 2: (Percentage Used Endurance Indicator and Unknown_Attribute)
admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.2) Started 2023-06-30 13:47:17
Output from node: r1n1 retval: 0
0x07 0x008 1 94 --- Percentage Used Endurance Indicator
245 Unknown_Attribute 0x0032 064 064 000 Old_age Always - 6
...[Output Truncated]...

Example 3: (Percentage Life Remaining)
admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.1) Started 2023-06-30 13:53:41
Output from node: r1n1 retval: 0
245 Percent_Life_Remaining 0x0032 082 082 000 Old_age Always - 15 Remaining.
...[Output Truncated]...
Example 4: (Percentage Used Endurance Indicator)
admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.2) Started 2023-06-30 14:02:03
Output from node: r1n1 retval: 1
0x07 0x008 1 80 N-- Percentage Used Endurance Indicator Remaining.
...[Output Truncated]...

Example 5: (Unknown_Attribute)
admin@node1:~> svc_exec "ssdr=\$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print \$2}');sudo /usr/sbin/smartctl -l devstat \$ssdr | grep Endurance;sudo /usr/sbin/smartctl -a \$ssdr | grep -w 245"
svc_exec v1.0.6 (svc_tools v2.12.1) Started 2023-06-30 13:53:41
Output from node: r1n1 retval: 0
245 Unknown_Attribute 0x0032 082 082 000 Old_age Always - 10
...[Output Truncated]...

Individual node check to investigate disks on a per-node basis.

Command:
# ssdr=$(sudo -i fcli agent disk.disks --pretty-print | grep "READ_CACHE" | awk '{print $2}');sudo /usr/sbin/smartctl -l devstat $ssdr | grep Endurance;sudo /usr/sbin/smartctl -a $ssdr | grep -w 245

Example: Refer to the five examples above for the correct endurance percentage on the node.
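The 85% and 95% thresholds can be applied to the filtered smartctl output with a short helper. This is a minimal sketch under stated assumptions, not part of the KB procedure: the function names parse_endurance and classify_used and the status strings are hypothetical. It assumes the line layouts shown in the examples (percentage used in the 4th field of the devstat line, and the raw value of attribute 245 in the 10th field), and it treats the raw value of attribute 245 as percent remaining, which is inferred from the paired values in the examples. It expects output from a single disk on a single node.

```shell
#!/usr/bin/env bash
# parse_endurance (hypothetical helper): read the two grep results from the
# smartctl commands above on stdin and print the percentage of life used.
# Prefers the "Percentage Used Endurance Indicator" line; otherwise derives
# the value from attribute 245, ASSUMING its raw value is percent remaining.
parse_endurance() {
  awk '
    /Percentage Used Endurance Indicator/ { used = $4; have_used = 1 }
    $1 == "245"                           { remaining = $10; have_rem = 1 }
    END {
      if (have_used)     print used + 0
      else if (have_rem) print 100 - remaining
      else               exit 1           # neither indicator was found
    }'
}

# classify_used (hypothetical helper): map a percentage-used value to the
# replacement guidance stated in this article.
classify_used() {
  local used="$1"
  if [ "$used" -ge 95 ]; then
    echo "REACTIVE"      # 95% used or 5% remaining
  elif [ "$used" -ge 85 ]; then
    echo "PROACTIVE"     # 85% used or 15% remaining
  else
    echo "OK"
  fi
}
```

For instance, the output lines from SSD Example 1 would yield 90 from parse_endurance, and classify_used 90 would print PROACTIVE. The result is only a reading aid; the replacement decision still goes through the service request described in this article.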
Collect the outputs from the checks above, then open a service request referencing KB 215459 for SATA SSD and/or SATA SSDr device health review and replacement.