...
Issue:
The VxRail upgrade fails at the vSAN on-disk format upgrade step. In the vCenter web client, the vSAN on-disk format upgrade is stuck at 10%.

Analysis:
In RVC, the disk status shows that the upgrade has already completed at the disk level. Run:

vsan.disks_stats ~/computers/cluster-name

+----------------------+--------------------------------+----------+------+------------+---------+----------+-------------+----------+----------+-------------+---------+----------+---------+
|                      |                                |          | Num  | Capacity   |         |          | Physical    | Physical | Physical | Logical     | Logical | Logical  | Status  |
| DisplayName          | Host                           | DiskTier | Comp | Total      | Used    | Reserved | Capacity    | Used     | Reserved | Capacity    | Used    | Reserved | Health  |
+----------------------+--------------------------------+----------+------+------------+---------+----------+-------------+----------+----------+-------------+---------+----------+---------+
| naa.50000397cc8ac2d5 | kc-rail-holodeck-01.it.unr.edu | Cache    | 0    | 372.61 GB  | 0.00 %  | 0.00 %   | N/A         | N/A      | N/A      | N/A         | N/A     | N/A      | OK (v5) |
| naa.5002538a17332f80 | kc-rail-holodeck-01.it.unr.edu | Capacity | 122  | 3387.72 GB | 33.28 % | 22.12 %  | 10163.17 GB | 33.32 %  | 17.74 %  | 16383.99 GB | 10.97 % | 4.57 %   | OK (v5) |
| naa.5002538a17334b20 | kc-rail-holodeck-01.it.unr.edu | Capacity | 122  | 3387.72 GB | 33.28 % | 14.45 %  | 10163.17 GB | 33.32 %  | 17.74 %  | 16383.99 GB | 7.54 %  | 2.99 %   | OK (v5) |
| naa.5002538a173337f0 | kc-rail-holodeck-01.it.unr.edu | Capacity | 122  | 3387.72 GB | 33.28 % | 16.65 %  | 10163.17 GB | 33.32 %  | 17.74 %  | 16383.99 GB | 11.12 % | 3.44 %   | OK (v5) |
+----------------------+--------------------------------+----------+------+------------+---------+----------+-------------+----------+----------+-------------+---------+----------+---------+

The objects, however, have not been upgraded. Run:

vsan.obj_status_report ~/computers/cluster-name

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0

Total v1 objects: 0
Total v2 objects: 0
Total v2.5 objects: 0
Total v3 objects: 724
Total v5 objects: 0
Total v6 objects: 0
Total v7 objects: 0

If the VxRail upgrade is retried at this point, it succeeds: because the disks have already been upgraded, the vSAN on-disk format upgrade is skipped on the next attempt. The objects, however, remain on the old version.

The following log message can be found in /storage/log/vmware/vsan-health/vmware-vsan-health-service.log on the VCSA:

  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanPyVmomiProfiler.py", line 152, in InvokeMethod
    return self._stub.InvokeMethod(mo, info, args)
  File "/usr/lib/vmware/site-packages/pyVmomi/SoapAdapter.py", line 1668, in InvokeMethod
    raise obj
pyVmomi.VmomiSupport.vim.fault.VsanFault: (vim.fault.VsanFault) {
   dynamicType = <unset>,
   dynamicProperty = (vmodl.DynamicProperty) [],
   msg = '',
   faultCause = <unset>,
   faultMessage = (vmodl.LocalizableMessage) [
      (vmodl.LocalizableMessage) {
         dynamicType = <unset>,
         dynamicProperty = (vmodl.DynamicProperty) [],
         key = 'com.vmware.vsan.diskconversion.msg.bumpupversionerror',
         arg = (vmodl.KeyAnyValue) [],
         message = 'Failed to bump up format version for diskmapping naa.50000397cc8aa6dd, Failed to get VsanInfo operation lock for diskOpLockan operation is currently in progress(locked pid: 0), error: /tmp/.vsanDiskOpLock.lock.LOCK: timout waiting for lock after 30 seconds. Lock is currently held by process 438633 (python: python /usr/lib/vmware/vsan/perfsvc/vsanperfsvc.pyc)'
      }
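The mismatch between disk-level and object-level versions can also be checked programmatically from saved vsan.obj_status_report output. The following is a minimal Python sketch, not part of RVC or any VMware tooling: the function names (parse_object_versions, objects_below) and the idea of saving the report text to a string are assumptions made for illustration. It only relies on the "Total vN objects: M" lines shown above.

```python
import re

# Sample text copied from the "Total ..." lines of the report above.
SAMPLE_REPORT = """
Total orphans: 0
Total v1 objects: 0
Total v2 objects: 0
Total v2.5 objects: 0
Total v3 objects: 724
Total v5 objects: 0
Total v6 objects: 0
Total v7 objects: 0
"""


def parse_object_versions(report):
    """Return {version: count} from 'Total vN objects: M' lines (hypothetical helper)."""
    pattern = re.compile(r"Total (v[\d.]+) objects:\s*(\d+)")
    return {version: int(count) for version, count in pattern.findall(report)}


def objects_below(counts, target):
    """Return the non-empty versions that are older than the target format version."""
    return {
        version: count
        for version, count in counts.items()
        if count > 0 and float(version[1:]) < target
    }


counts = parse_object_versions(SAMPLE_REPORT)
stale = objects_below(counts, target=5)
if stale:
    print("Objects still on an older format version:", stale)
```

With the sample above, every object is reported under v3 while the disks report v5, which is the condition described in this article.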
In the 6.7 vSAN VC service, the vSAN disk format version is upgraded per disk group, in parallel. VC is aware of V6, but the hosts are only aware of V5 because they run ESXi 6.5. While a vSAN disk group is being upgraded, the disk operation lock must be held, and if this lock cannot be acquired within 30 seconds, the upgrade request fails. As a result, if a vSAN host has more than one disk group to upgrade, the requests contend for this lock, and if one disk group upgrade takes longer than 30 seconds, the requests for the other disk groups on that host fail.

Risks or Impact:
The full feature set of V5 is not available while the objects remain on V3. There is no risk to data.
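The lock contention described above can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not VMware's implementation: the names (upgrade_disk_group, upgrade_host_in_parallel, dg-1, dg-2) are hypothetical, a threading.Lock stands in for the per-host disk operation lock, and the durations are scaled down so that 0.3 seconds represents the 30-second lock timeout and 0.5 seconds represents an upgrade that outlasts it.

```python
import threading
import time

LOCK_TIMEOUT = 0.3  # scaled-down stand-in for the 30-second lock timeout
UPGRADE_TIME = 0.5  # assumed upgrade duration, deliberately longer than the timeout


def upgrade_disk_group(name, host_lock, results):
    """Try to upgrade one disk group; fail if the host lock is not acquired in time."""
    if not host_lock.acquire(timeout=LOCK_TIMEOUT):
        results[name] = "failed: lock timeout"
        return
    try:
        time.sleep(UPGRADE_TIME)  # simulated on-disk format upgrade
        results[name] = "upgraded"
    finally:
        host_lock.release()


def upgrade_host_in_parallel(disk_groups):
    """Start one upgrade request per disk group at the same time on a single host."""
    host_lock = threading.Lock()  # one disk operation lock per host
    results = {}
    threads = [
        threading.Thread(target=upgrade_disk_group, args=(dg, host_lock, results))
        for dg in disk_groups
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(upgrade_host_in_parallel(["dg-1", "dg-2"]))
```

In this simulation a host with a single disk group always succeeds, while a host with two disk groups ends with one upgraded and one timed out. Which of the two wins the lock depends on thread scheduling.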
This is a known issue in VC 6.5 that will be fixed in 6.7U2. A solution exists for this issue, but it requires intervention from EMC technical support personnel. Because this is a bug in vCenter 6.5, EMC support may need to engage VMware. Contact the EMC Customer Support Center or your service representative for technical support and quote this solution ID.