Symptoms
During a Non-Disruptive Migration (NDM) of an ESXi pass-through Raw Device Mapping (pRDM) Microsoft cluster in a Cluster-Across-Boxes (CAB) configuration, two issues can occur. First, after an NDM cancel, the host multipathing layer assumes that re-discovered V3 paths are still registered from before the cancel. In the Microsoft cluster, this appears as NTFS event 57 and an unexpected cluster resource failover. Second, further NDM changes may be introduced before all participating hosts have had a chance to recognize the first change (the cutover). In that case, a write can be rejected by both V2 and V3, and VMware marks the devices with Permanent Device Loss (PDL).
Cause
These issues can occur when paths are repeatedly added and removed without a corresponding host rescan and cleanup after each step, which leaves stale or dead paths on the host.
Resolution
Clean up stale or dead host paths by performing a host rescan after each cancel is issued, and confirm that all participating hosts have recognized the cutover change before running a cancel-revert.
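
As an illustration only, the check for stale paths could be sketched as below. This is a minimal sketch, not a documented procedure: it assumes the text being parsed was captured on each ESXi host from `esxcli storage core path list`, and that the rescan meant above corresponds to `esxcli storage core adapter rescan --all`. The function name `find_dead_paths` and the sample data (path UIDs, runtime names, device IDs) are hypothetical and shortened for readability.

```python
from typing import Dict, List

# Hypothetical, shortened sample in the general shape of
# `esxcli storage core path list` output: an unindented path UID
# followed by indented "Key: value" lines. Real output has more fields.
SAMPLE = """\
fc.2000:1000-fc.5000:5001-naa.6000aaaa
   Runtime Name: vmhba2:C0:T0:L1
   Device: naa.6000aaaa
   State: active

fc.2000:1000-fc.5000:5002-naa.6000aaaa
   Runtime Name: vmhba2:C0:T1:L1
   Device: naa.6000aaaa
   State: dead
"""


def find_dead_paths(path_list_output: str) -> List[Dict[str, str]]:
    """Return the path entries whose State field is 'dead'."""
    entries: List[Dict[str, str]] = []
    current = None
    for line in path_list_output.splitlines():
        if not line.strip():
            continue  # blank line between entries
        if not line[0].isspace():
            # An unindented line starts a new entry (the path UID).
            current = {"UID": line.strip()}
            entries.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so values such as
            # "vmhba2:C0:T1:L1" are kept whole.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return [e for e in entries if e.get("State") == "dead"]


if __name__ == "__main__":
    dead = find_dead_paths(SAMPLE)
    for entry in dead:
        print(entry["Runtime Name"], entry["Device"])
    if dead:
        # Assumption: a rescan such as
        # `esxcli storage core adapter rescan --all` is run on the host
        # before the next NDM step is attempted.
        print("Dead paths found: rescan this host before continuing.")
```

Running the same check on every participating host after a cancel, and again before a cancel-revert, gives a simple way to confirm that no host is still holding dead paths from an earlier step.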