During a vSAN cluster node expansion task, a production VM became inaccessible for several hours, resulting in Data Unavailable (DU).

During the node expansion test, three nodes were removed from maintenance mode after the VxRail expansion task, and nonproduction VMs were migrated to these nodes. Simultaneously, DRS (which was set to Fully Automated) began to move workload. There was also a network configuration issue on the migrated VMs. To resolve the network configuration on the VMs, the nonproduction VMs were migrated back to the existing nodes. During that time, the customer observed VMs becoming inaccessible, which caused Data Unavailable (DU).

2021-08-23T16:46:03.444+08:00 INFO vsan-mgmt[08861] [VsanHealthSummaryLogUtil::PrintHealthResult opID=noOpId] Cluster VxRail-Virtual-SAN-Cluster Overall Health : red Group data health : red Test objecthealth health : red Overview: Health/Objects ObjectCount (Healthy, 413), (Datamove, 13), (Reduced-Availability-With-No-Rebuild-Delay-Timer, 84),
2021-08-23T16:50:23.911+08:00 INFO vsan-mgmt[08861] [VsanHealthSummaryLogUtil::PrintHealthResult opID=noOpId] Cluster VxRail-Virtual-SAN-Cluster Overall Health : red Group data health : red Test objecthealth health : red Overview: Health/Objects ObjectCount (Healthy, 364), (Datamove, 4), (Reduced-Availability-With-Active-Rebuild, 1), (Reduced-Availability-With-No-Rebuild-Delay-Timer, 131), (Inaccessible, 11),
2021-08-23T16:53:41.081+08:00 INFO vsan-mgmt[08861] [VsanHealthSummaryLogUtil::PrintHealthResult opID=noOpId] Cluster VxRail-Virtual-SAN-Cluster Overall Health : red Group data health : red Test objecthealth health : red Overview: Health/Objects ObjectCount (Healthy, 318), (Datamove, 2), (Reduced-Availability-With-Active-Rebuild, 3), (Reduced-Availability-With-No-Rebuild-Delay-Timer, 158), (Inaccessible, 29),

The test nodes were put into maintenance mode using the "No data migration" mode:

2021-08-23T07:14:28.848Z: [UserLevelCorrelator] 12121590652us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.
2021-08-23T08:52:28.717Z: [UserLevelCorrelator] 18001459300us: [esx.audit.maintenancemode.entering] The host has begun entering maintenance mode.
2021-08-23T08:52:30.346Z: [UserLevelCorrelator] 18003088181us: [esx.audit.maintenancemode.entered] The host has entered maintenance mode.
2021-08-23T11:48:19Z bootstop: Host is rebooting
2021-08-23T11:52:22.478Z: [UserLevelCorrelator] 28795220410us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.

The clomd log shows the decommission starting with mode 0:

2021-08-23T08:52:18.681Z info clomd[2167677] [Originator@6876] CLOMWhatIfEntityDecom: Starting decom on entity 611fefdd-3160-e572-eb47-78ac444cf5b0, mode 0, ensureDurability 0 wipeDisk 0, entity type is CdbObjectNode, use static dedupRatio 1.000000, what-if reason 0, dedupScope 0, encryption 0

The inaccessible production VM does not come back until all of these newly added nodes are taken out of maintenance mode.

2021-08-23T19:52:45.831+08:00 INFO vsan-mgmt[08861] [VsanHealthSummaryLogUtil::PrintHealthResult opID=noOpId] Cluster VxRail-Virtual-SAN-Cluster Overall Health : red Group data health : red Test objecthealth health : red Overview: Health/Objects ObjectCount (Healthy, 474), (Datamove, 10), (Reduced-Availability-With-Active-Rebuild, 14), (Reduced-Availability-With-No-Rebuild, 3), (Inaccessible, 11),
This occurred because the nodes were placed into maintenance mode with "No data migration". During the test, the production VMs were migrated to the new nodes under RAID 5. When those nodes were placed into maintenance mode to adjust the network configuration of the VMs, the existing cluster nodes lost control over the production VM data blocks.
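The effect of taking several nodes offline at once can be illustrated with a minimal sketch. It assumes a RAID 5 layout of four components (3 data + 1 parity) on four different hosts that tolerates the loss of one host; the host names and the function are hypothetical and are not taken from the affected cluster.

```python
def raid5_object_accessible(component_hosts, unavailable_hosts):
    """Return True if a RAID 5 object is still readable.

    Assumption: 3+1 layout with one component per host, so at most
    one component may be unavailable at any time.
    """
    lost = sum(1 for host in component_hosts if host in unavailable_hosts)
    return lost <= 1


# Hypothetical placement: three components landed on the newly added nodes.
placement = ["new-node-1", "new-node-2", "new-node-3", "old-node-1"]

one_down = raid5_object_accessible(placement, {"new-node-1"})
three_down = raid5_object_accessible(
    placement, {"new-node-1", "new-node-2", "new-node-3"}
)
print(one_down, three_down)
```

With one node in maintenance mode the object stays readable; with all three new nodes in maintenance mode and no data evacuated, it does not.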
When using RAID 5 protection mode, use "Ensure accessibility" when placing a node into maintenance mode. Before doing so, check that there is no vSAN resynchronization activity in progress and that DRS is not running. If using vSphere 7.x, check Skyline Health and the Data and/or vSAN Object Health checks. Do not proceed with any activity if any errors exist.
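As a rough aid for the pre-check, the object counts can be read from a health summary log line of the kind shown in this article. The following is a minimal sketch, assuming the "ObjectCount (State, N), ..." layout seen in the excerpts; the function names and the rule that every state other than Healthy blocks the task are assumptions, not a vendor tool.

```python
import re

# Assumption: only "Healthy" objects are acceptable before maintenance.
HEALTHY_STATES = {"Healthy"}


def parse_object_counts(log_line):
    """Extract {state: count} from the text following 'ObjectCount'."""
    _, _, tail = log_line.partition("ObjectCount")
    pairs = re.findall(r"\(([A-Za-z-]+),\s*(\d+)\)", tail)
    return {state: int(count) for state, count in pairs}


def safe_to_proceed(log_line):
    """Return True only if counts were found and no unhealthy objects exist."""
    counts = parse_object_counts(log_line)
    if not counts:
        return False  # nothing parsed: treat as unsafe
    return all(n == 0 for state, n in counts.items()
               if state not in HEALTHY_STATES)


sample = ("Overview: Health/Objects ObjectCount (Healthy, 318), (Datamove, 2), "
          "(Reduced-Availability-With-Active-Rebuild, 3), "
          "(Reduced-Availability-With-No-Rebuild-Delay-Timer, 158), "
          "(Inaccessible, 29),")
counts = parse_object_counts(sample)
print(counts, safe_to_proceed(sample))
```

A check like this only reflects the last logged summary, so it complements rather than replaces the Skyline Health view.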