BugZero | VMware BugID 88481 - vCLS VMs are not re-created in a vSAN Cluster foll...

VMware - Defect ID: 88481

vCLS VMs are not re-created in a vSAN Cluster following a complete shutdown of a vSAN cluster

Last updated on 11/14/2022

Symptoms

vCLS VMs are not re-created in a vSAN cluster following a complete shutdown of the vSAN cluster. This is more likely after an improper shutdown of the vSAN cluster, but it can also occur after a proper shutdown and restart procedure.

The vSphere Client displays the following error message:

vSphere DRS functionality was impacted due to unhealthy state of vSphere Cluster Services caused by the unavailability of vSphere Cluster Service VMs. vSphere Cluster Service VMs are required to maintain the health of vSphere DRS.

Further information for the cluster can be found in the EAM MOB at https://<vc_fqdn>/eam/mob.

Purpose

Reset the status of the cluster and enable the automatic creation of vCLS VMs.

Cause

When a vSAN cluster is shut down (properly or improperly), an API call is made to EAM to disable the vCLS Agency on the cluster. In an ideal workflow, when the cluster is back online, the cluster is marked as enabled again, so that vCLS VMs can be powered on, or new ones can be created, depending on the vCLS slots determined on the cluster.

When this workflow goes awry, the cluster remains marked as disabled for the vCLS Agency, and none of the automated workflows mark the cluster as enabled again. As a result, no vCLS VMs are created for the cluster, and DRS remains in an unhealthy state.

The cluster is marked as disabled by an entry created for the cluster in the VCDB, in the table vpx_ext_data.

Impact / Risks

WARNING: This process involves making changes to the vCenter Database. Please take offline snapshots of all vCenter Servers in the SSO domain before running through the workaround steps.

Resolution

WARNING: Please take offline snapshots of all vCenter Servers in the SSO domain before running through these steps. Incorrect changes to the VCDB can cause a catastrophic failure of the vCenter Server, which may not be recoverable.

1. Log in to the vSphere UI and click on the cluster in question. From the URL, record the cluster ID. It has the form domain-cxx, for example domain-c132.
2. Ensure that the Retreat Mode advanced setting for this cluster is set to True, as described in https://kb.vmware.com/s/article/80472
3. Connect to the vCenter Server Appliance managing the cluster via SSH.
4. Connect to the VCDB via the vPostgres shell:
   # /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB
5. Identify the clusters that are marked as disabled, and record the surr_key of the entry for the cluster ID recorded earlier:
   # select * from vpx_ext_data where data_key like '%DisabledClusters%';
6. Delete the entry associated with that cluster ID, using the surr_key:
   # delete from vpx_ext_data where surr_key = <surr_key recorded above>;
7. Leave the vPostgres shell:
   \q
8. Restart all services on the vCenter Server and make sure they all come back online:
   # service-control --stop --all && service-control --start --all
9. Once all the services are back online, log in to the vSphere UI and confirm that the vCLS VMs are created for the cluster and that the vSphere Cluster Services status is set to healthy.
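The first step of the resolution asks for the cluster ID to be read from the vSphere UI URL, and a later step runs a read-only query against vpx_ext_data. The sketch below shows one possible way to script both. The function names extract_cluster_id and list_disabled_clusters are hypothetical, the sample URL is made up for illustration, and the URL layout (a urn:vmomi:ClusterComputeResource segment containing domain-c<number>) is an assumption about how the vSphere Client formats its addresses. The psql path and the SELECT statement are taken from the steps above; the -c option only makes psql run a single statement non-interactively.

```shell
#!/bin/sh

# Hypothetical helper: print the first domain-c<number> ID found in a
# vSphere Client URL. Returns non-zero if the URL contains no such ID.
extract_cluster_id() {
    id=$(printf '%s\n' "$1" | grep -oE 'domain-c[0-9]+' | head -n 1)
    [ -n "$id" ] && printf '%s\n' "$id"
}

# Hypothetical helper: run the read-only SELECT from the resolution steps
# without entering the interactive vPostgres shell. Intended to be run on
# the vCenter Server Appliance. It does not modify the database.
list_disabled_clusters() {
    /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB \
        -c "select * from vpx_ext_data where data_key like '%DisabledClusters%';"
}

# Example with a made-up URL (assumed format, not taken from the article).
url='https://vcenter.example.com/ui/app/cluster;nav=h/urn:vmomi:ClusterComputeResource:domain-c132:00000000-0000-0000-0000-000000000000/summary'
extract_cluster_id "$url"
```

The delete statement is deliberately left out of the sketch: it changes the VCDB and is better run by hand after the snapshots are in place and the surr_key has been checked against the query output.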

Related Information

vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1 and newer versions (80472)
https://kb.vmware.com/s/article/80472

Manually Shut Down and Restart the vSAN Cluster
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-monitoring.doc/GUID-31B4F958-30A9-4BEC-819E-32A18A685688.html
