...
a vSAN disk group is taken offline, with vmkernel log messages similar to the examples below (note that specific dates, times, and IDs will differ in your environment):

Example 1:
2020-05-21T15:22:38.514Z cpu1:1000341425)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID 52fcff55-6866-a3d0-d0d5-ba4e3c1d9362, type 4, pbn 4398046515647
2020-05-21T15:22:38.514Z cpu0:1000214054)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback PLOGDDPWriteCbFn@com.vmware.plog#0.0.0.1, diskgroup 5287714a-e5a0-d986-1f12-e0c960878e53 txnScopeIdx 0
2020-05-21T15:22:38.514Z cpu0:1000214054)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: updateHashmap, Status: Success
2020-05-21T15:22:38.514Z cpu0:1000214054)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device 52fcff55-6866-a3d0-d0d5-ba4e3c1d9362:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device 52fcff55-6866-a3d0-d0d5-ba4e3c1d9362
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device 5287714a-e5a0-d986-1f12-e0c960878e53
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on 52fcff55-6866-a3d0-d0d5-ba4e3c1d9362
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on 52934040-4111-9e8c-4d12-ad0f5635b3d6
2020-05-21T15:22:38.514Z cpu1:1000213133)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T15:22:38.514Z cpu1:1000213133)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on 5287714a-e5a0-d986-1f12-e0c960878e53

Example 2:
2020-05-21T16:36:22.055Z cpu0:1000341426)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID 528006c4-3f71-81c4-ae10-0ae7d661bba0, type 3, pbn 3298534904346
2020-05-21T16:36:22.055Z cpu1:1000214313)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback PLOGDDPWriteCbFn@com.vmware.plog#0.0.0.1, diskgroup 52379c29-607b-e423-f700-dc4386d74c6a txnScopeIdx 0
2020-05-21T16:36:22.055Z cpu1:1000214313)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: addNewHash, Status: Success
2020-05-21T16:36:22.055Z cpu1:1000214313)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device 528006c4-3f71-81c4-ae10-0ae7d661bba0:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T16:36:22.058Z cpu0:1000214307)PLOG: PLOGElevHandleFailure:2325: Waiting till we process failure ... dev 528006c4-3f71-81c4-ae10-0ae7d661bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device 528006c4-3f71-81c4-ae10-0ae7d661bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device 52379c29-607b-e423-f700-dc4386d74c6a
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on 528006c4-3f71-81c4-ae10-0ae7d661bba0
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on 52c30f7b-abfb-3bf2-2bb1-6ed690e7d4f3
2020-05-21T16:36:22.061Z cpu0:1000213234)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:36:22.061Z cpu0:1000213234)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on 52379c29-607b-e423-f700-dc4386d74c6a
2020-05-21T16:36:25.915Z cpu0:1000214307)PLOG: PLOGRelogBase:226: RELOG: relogTask exit requested
2020-05-21T16:36:25.915Z cpu0:1000214307)PLOG: PLOGRelogExit:605: RELOG task exiting UUID 52379c29-607b-e423-f700-dc4386d74c6a Success

Example 3:
2020-05-21T16:56:00.941Z cpu1:1000341426)WARNING: PLOG: DDPCacheIOCb:686: Trying to format a valid metadata block, UUID 521d473e-2bd4-d796-b250-0587bd83fae9, type 5, pbn 5497558160057
2020-05-21T16:56:00.941Z cpu0:1000213922)WARNING: PLOG: DDPCompleteDDPWrite:6455: Throttled: DDP write failed Invalid metadata callback PLOGDDPWriteCbFn@com.vmware.plog#0.0.0.1, diskgroup 5247de40-f42b-a0e3-a310-b4e7a2f5cbee txnScopeIdx 0
2020-05-21T16:56:00.941Z cpu0:1000213922)PLOG: DDPCompleteDDPWrite:6469: Throttled: (DDPWrite): Curr: completeTask, Prev: readXmap, Status: Success
2020-05-21T16:56:00.941Z cpu0:1000213922)WARNING: PLOG: PLOGDDPWriteCbFn:655: DDP write failed on device 521d473e-2bd4-d796-b250-0587bd83fae9:Invalid metadata (ssdPerm: no)elevIo 0, doDdpCommit yes
2020-05-21T16:56:00.941Z cpu0:1000213916)PLOG: PLOGElevHandleFailure:2325: Waiting till we process failure ... dev 521d473e-2bd4-d796-b250-0587bd83fae9
2020-05-21T16:56:00.941Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateError:4232: DDP: Propagating error state from original device 521d473e-2bd4-d796-b250-0587bd83fae9
2020-05-21T16:56:00.941Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateError:4284: DDP: Propagating error state to MDs in device 5247de40-f42b-a0e3-a310-b4e7a2f5cbee
2020-05-21T16:56:00.941Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T4:L0 cState: 0 nState: 6 isLSE: 0
2020-05-21T16:56:00.943Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4172: Permanent error event on 521d473e-2bd4-d796-b250-0587bd83fae9
2020-05-21T16:56:00.943Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T3:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:56:00.943Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on 52cd91e1-e659-8d2c-f431-4e1e923217d0
2020-05-21T16:56:00.943Z cpu0:1000213152)PLOG: PLOG_FindAndUpdateDevTelemetryStat:1058: Setting devResState : dev: mpx.vmhba0:C0:T6:L0 cState: 7 nState: 7 isLSE: 0
2020-05-21T16:56:00.944Z cpu0:1000213152)WARNING: PLOG: PLOGPropagateErrorInt:4188: Error/unhealthy propagate event on 5247de40-f42b-a0e3-a310-b4e7a2f5cbee
2020-05-21T16:56:04.066Z cpu0:1000213916)PLOG: PLOGRelogBase:226: RELOG: relogTask exit requested
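To check whether a host has logged this signature, the PLOG/DDP messages above can be searched for in the vmkernel log. The following Python sketch is illustrative only and is not a VMware-provided tool: it scans a saved copy of the vmkernel log (for example, one extracted from a support bundle) for the message substrings shown in the examples. The default file path and the substring list are assumptions; adjust them for your environment.

#!/usr/bin/env python3
# Illustrative sketch: scan a saved vmkernel log for the PLOG/DDP error
# signatures shown in the examples above. The default path "vmkernel.log"
# is an assumption; pass the path to your own log copy as an argument.
import sys

SIGNATURES = (
    "DDPCacheIOCb",          # "Trying to format a valid metadata block"
    "DDP write failed",      # DDPCompleteDDPWrite / PLOGDDPWriteCbFn messages
    "PLOGPropagateError",    # error state being propagated across the disk group
)

def scan(path):
    hits = []
    with open(path, "r", errors="replace") as log:
        for line in log:
            if any(sig in line for sig in SIGNATURES):
                hits.append(line.rstrip())
    return hits

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "vmkernel.log"
    matches = scan(path)
    for line in matches:
        print(line)
    print("Matched {0} line(s); if these signatures are present, engage VMware Support.".format(len(matches)))

If matching lines are found, the UUID reported as "diskgroup" in the DDPCompleteDDPWrite messages identifies the affected disk group.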
This KB advises that this issue may occur and directs you to contact VMware Support for assistance in resolving it.
The behavior of taking the disk group offline from vSAN use was introduced to avoid potential data corruption in scenarios where certain metadata blocks are bad or in an inconsistent state.

History:
Prior to the vSAN 6.7 release, vSAN would re-initialize the affected block as a bitmap block, discarding any previous allocation in that block and thus potentially allowing random corruption of user data at a later stage. With the vSAN 6.7 release, a PSOD (purple screen panic) was introduced to avoid this corruption potential; see KB 80703 for details on the PSOD. Removing the disk group from use was introduced in vSAN 6.7 P05 and 7.0 Update 1 as alternate behavior that avoids the PSOD.
If a Failures to Tolerate (FTT) of 0 storage policy is in use, if data is already in a reduced-redundancy state, or if multiple such events occur before data can be resynced or rebuilt, this could lead to a data-unavailable or data-loss scenario.
Please work with VMware and your hardware vendor to determine the underlying cause of the inconsistent metadata.
Please contact VMware Support to work around this issue and restore the disk group to use.