...
The issue is triggered by a storage access event that causes IOs to be aborted on protected VMs running RecoverPoint for VMs 5.2.2.2 or 5.2.2.3. Production VMs may hang, freeze, or become unresponsive after the splitter encounters IO errors from the VSCSI layer. This may result in data unavailability until the ESXi host is rebooted.

1. An IO error occurs while submitting the IO to the lower layer. Due to the fix for "PSOD after IO Submission failure", even valid attempts to clean up the failed IO are marked as an incorrect callback and are not handled further. ESXi host VMkernel logs will show outputs similar to:

2020/06/11 22:41:21.340 - #2 - 2103327/2103284 - KS:
krnl:[22:41:20.917] 0/0 #0 - IoEsx_ToStorage_s_forwardToLower: VSCSIFilter_IssueCommandToBackend Failed (io: 0x433682f8b850), with status Busy
krnl:[22:41:20.917] 0/0 #2 - (skipped 0 prints) - IoEsx_ToStorage_v_handleSendToStorageFailed_i: Called with status (Busy)
krnl:[22:41:20.917] 0/0 #0 - IoEsx_ToStorage_s_sendToStorageDone: Incorrect callback for a failed IO Submit for io 0x433682f8b850, skipping CommandIoBase_v_storageEndIo

2. The underlying storage issue is not caused by the splitter. However, because these IOs are not handled properly, a continuous loop of VSCSI resets occurs and the VM remains hung even after the storage issue is resolved. ESXi host VMkernel logs will also show outputs similar to:

2020-06-11T22:53:40.679Z cpu2:2097832)VSCSI: 2903: handle 20963(vscsi0:10):Reset [Retries: 18/0] from (vmm0:ProdVM01)
2020-06-11T22:54:11.680Z cpu3:2097832)VSCSI: 2903: handle 20963(vscsi0:10):Reset [Retries: 19/0] from (vmm0:ProdVM01)
2020-06-11T22:54:42.682Z cpu6:2097832)VSCSI: 2903: handle 20963(vscsi0:10):Reset [Retries: 20/0] from (vmm0:ProdVM01)
2020-06-11T22:55:13.683Z cpu5:2097832)VSCSI: 2903: handle 20963(vscsi0:10):Reset [Retries: 21/0] from (vmm0:ProdVM01)
...
2020-06-11T22:58:49.692Z cpu0:2097832)VSCSI: 2903: handle 20963(vscsi0:10):Reset [Retries: 28/0] from (vmm0:ProdVM01)
2020-06-11T22:59:20.693Z cpu3:2097832)VSCSI: 2903: handle 20963(vscsi0:10):Reset [Retries: 29/0] from (vmm0:ProdVM01)
2020-06-11T22:59:51.694Z cpu20:2097832)VSCSI: 2903: handle 20963(vscsi0:10):Reset [Retries: 30/0] from (vmm0:ProdVM01)
2020-06-11T23:00:22.696Z cpu5:2097832)VSCSI: 2903: handle 20963(vscsi0:10):Reset [Retries: 31/0] from (vmm0:ProdVM01)
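
The two log signatures quoted above can be searched for with a short script. The following is a minimal sketch and not a Dell or VMware tool. The regular expressions are derived only from the sample lines in this article, so real logs may need adjusted patterns. The function name `scan`, the `min_retries` threshold, and its default of 5 are hypothetical choices made for illustration.

```python
import re

# Splitter message quoted in this article for a failed IO submit.
SPLITTER_RE = re.compile(
    r"IoEsx_ToStorage_s_sendToStorageDone: Incorrect callback "
    r"for a failed IO Submit for io (0x[0-9a-fA-F]+)"
)

# VSCSI reset message quoted in this article; captures device, retry count, VM name.
RESET_RE = re.compile(
    r"VSCSI: \d+: handle \d+\((vscsi\d+:\d+)\):Reset "
    r"\[Retries: (\d+)/\d+\] from \(vmm\d+:([^)]+)\)"
)


def scan(lines, min_retries=5):
    """Return (failed_ios, looping).

    failed_ios: IO addresses reported with the 'Incorrect callback' message.
    looping:    {(vm_name, device): highest retry count seen}, limited to
                entries whose count reached min_retries (an assumed threshold).
    """
    failed_ios = []
    max_retries = {}
    for line in lines:
        for m in SPLITTER_RE.finditer(line):
            failed_ios.append(m.group(1))
        for m in RESET_RE.finditer(line):
            key = (m.group(3), m.group(1))
            max_retries[key] = max(max_retries.get(key, 0), int(m.group(2)))
    looping = {k: v for k, v in max_retries.items() if v >= min_retries}
    return failed_ios, looping


# Sample input copied from the log excerpts in this article.
SAMPLE_LINES = [
    "krnl:[22:41:20.917] 0/0 #0 - IoEsx_ToStorage_s_sendToStorageDone: "
    "Incorrect callback for a failed IO Submit for io 0x433682f8b850, "
    "skipping CommandIoBase_v_storageEndIo",
    "2020-06-11T22:53:40.679Z cpu2:2097832)VSCSI: 2903: handle "
    "20963(vscsi0:10):Reset [Retries: 18/0] from (vmm0:ProdVM01)",
    "2020-06-11T23:00:22.696Z cpu5:2097832)VSCSI: 2903: handle "
    "20963(vscsi0:10):Reset [Retries: 31/0] from (vmm0:ProdVM01)",
]

failed_ios, looping = scan(SAMPLE_LINES)
```

A match on both signatures for the same host is consistent with the behavior described in this article, but it is not proof of it. Confirm the diagnosis with Dell Technologies technical support before acting on the result.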
A splitter code fix to correct a PSOD (reference: "PSOD after IO Submission failure") may, in some scenarios, cause the splitter's handling of failed IOs to become stuck in a loop, rendering the affected VM unresponsive.
Workaround:
Rebooting the ESXi host is currently the only way to release the VM.
1. vMotion all other VMs from the ESXi host.
2. Reboot the ESXi host.

Interim preventive workaround:
A solution exists for this issue, but intervention from Dell Technologies technical support personnel is required. Dell Technologies technical support can provide a Hotfix version of the splitter with some limitations. Contact the Dell Technologies Customer Support Center or your service representative for technical support and reference this Dell Technologies knowledgebase solution ID.

Resolution:
Dell Technologies Engineering is currently investigating this issue. A permanent fix is still in progress. Contact the Dell Technologies Customer Support Center or your service representative for assistance and reference this solution ID.