BugZero | VMware BugID 89448 - vSAN LSOM Elevator stopped causing high SSD/Log Co...

VMware - Defect ID: 89448

vSAN LSOM Elevator stopped causing high SSD/Log Congestion

Last updated on 7/21/2023

Symptoms

The following symptoms can be seen:

- The host is running ESXi 7.0 Update 1 or later.
- 'SSD Congestion' alarms in Skyline Health point to one or several DiskGroups in the cluster.
- 'ssdCongestion' / 'logCongestion' values increase when running the GSS congestion check one-liner. Example:

    # for ssd in $(localcli vsan storage list |grep "Group UUID"|awk '{print $5}'|sort -u);do echo $ssd;vsish -e get /vmkModules/lsom/disks/$ssd/info|grep Congestion;done
    Tue Jul 20 19:56:18 UTC 2021
    5218efcf-206f-800d-00a3-b945fe425409
    memCongestion:0
    slabCongestion:0
    ssdCongestion:227 <------- This is already too high; a value <100 that keeps incrementing is already enough to suspect this issue.
    iopsCongestion:0
    logCongestion:0 <------- In some cases logCongestion has increased and no ssdCongestion is present.
    compCongestion:0
    mdCongestion:0
    memCongestionLocalMax:0
    slabCongestionLocalMax:0
    ssdCongestionLocalMax:227
    iopsCongestionLocalMax:0
    logCongestionLocalMax:0
    compCongestionLocalMax:0
    mdCongestionLocalMax:0

- On the host that owns the DiskGroup in question, under 'Host → Monitor → vSAN → Performance → Disks → Diskgroup → ', the "Write Buffer Free Percentage" is <70% and no throughput is shown for the "Cache Disk De-stage Rate" metric.
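The one-liner output can be long when a host has several DiskGroups. As a rough sketch, the filter below keeps only the two counters this issue affects and prints them when they are above zero. The function name check_congestion is hypothetical, and the sketch assumes the counters arrive as name:value pairs, as in the example output above.

```shell
# check_congestion (hypothetical helper): read "name:value" pairs on stdin
# and print ssdCongestion / logCongestion only when the value is above 0.
check_congestion() {
    # Put every whitespace-separated token on its own line, then match
    # the two counter names exactly (the *LocalMax fields are ignored).
    tr -s ' \t' '\n' | awk -F: '
        ($1 == "ssdCongestion" || $1 == "logCongestion") && ($2 + 0) > 0 {
            print $1 "=" ($2 + 0)
        }'
}
```

On a host it could be fed from the command already used in the one-liner, for example: vsish -e get /vmkModules/lsom/disks/$ssd/info | check_congestion. A single non-zero reading is only a hint; as noted above, the value growing between runs is the stronger signal.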

Purpose

To provide guidance on addressing a known issue

Cause

Due to an underflow of the outstanding IO counter, the vSAN elevator believes that the capacity device still has outstanding IO to be de-staged and waits for it to complete before de-staging the next data. However, no IOs are actually pending on the capacity disk, so the elevator ends up de-staging no data at all.
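The effect of a counter underflow can be illustrated with a toy model. This is not LSOM code: the 32-bit unsigned counter width, the limit of 1, and the function names dec_u32 and can_destage are all assumptions made for illustration. The point is only that one decrement too many turns "zero outstanding" into a very large number, so a gate that waits for outstanding IO to drain never opens.

```shell
# Toy model only: the 32-bit unsigned counter and the limit of 1 are
# assumptions for illustration, not LSOM's actual data types or logic.
MASK=$(( 0xFFFFFFFF ))

# Decrement a counter with 32-bit unsigned wraparound.
dec_u32() {
    echo $(( ($1 - 1) & MASK ))
}

# The gate allows de-staging only while outstanding IO is below the limit.
can_destage() {
    [ "$1" -lt "$2" ]
}

outstanding=0                          # nothing is actually in flight
outstanding=$(dec_u32 "$outstanding")  # one completion too many is counted
echo "outstanding=$outstanding"        # prints outstanding=4294967295

if can_destage "$outstanding" 1; then
    echo "de-stage proceeds"
else
    echo "de-stage waits"              # this branch is taken
fi
```

In this model the counter can only recover if it is reset, which matches the observed behavior of de-staging stopping entirely rather than slowing down.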

Impact / Risks

- Overall vSAN performance could be impacted if PLOG consumption buildup has already caused vSAN congestion.
- VMs may start presenting different problems such as:
    - Increased latency
    - Switching to a "Read-Only" mode
    - Guest OS getting stuck

Resolution

This issue is fixed in vSAN 7.0 U3g (EP5). Update to this build or newer to address the issue.
