...
Shared-nothing vMotion (migration of both compute and storage) fails for very large VMs if the migration takes more than 24 hours to complete.

Entries in the VM's vmware.log look similar to:

2021-04-22T10:40:52.175Z| vmx| I125: MigrateVMXdrToSpec: type: 1 srcIp=<10.10.10.10> dstIp=<10.10.10.11> mid=1f17d2f78ec07468 uuid=d9e1a354-aead-11e9-bf2c-0a94ef93176b priority=yes checksumMemory=no maxDowntime=0 encrypted=0 resumeDuringPageIn=no latencyAware=yes diskOpFile= srcLogIp=<<unknown>> dstLogIp=<<unknown>> ftPrimaryIp=<<unknown>> ftSecondaryIp=<<unknown>>
2021-04-22T10:40:52.176Z| vmx| I125: MigrateSetInfo: state=8 srcIp=<10.10.10.10> dstIp=<10.10.10.11> mid=2240491300333843560 uuid=d9e1a354-aead-11e9-bf2c-0a94ef93176b priority=high
..
2021-04-23T16:35:10.868Z| vcpu-0| I125: MigratePlatformRestoreVnicBackingChangeOnFailure: RestoreVnicBacking-vnicBackingChange: vNicIndex 0 switchUuid 50 18 2a b7 15 19 16 7b-be 5e 0d 3f 17 e1 b8 f8 portKey
2021-04-23T16:35:10.868Z| vmx| I125: [msg.migrate.waitdata.platform] Failed waiting for data. Error bad0003. Not found.
2021-04-23T16:35:10.868Z| vmx| I125: [vob.vmotion.dvs.state.restore.failed] vMotion migration [a27340b:2240491300333843560] failed to get DVS state in the restore phase from the source host <10.10.10.10>
..
2021-04-23T16:35:10.885Z| vcpu-0| I125: FILE: FileCreateDirectoryEx: Failed to create /tmp. Error = 17
2021-04-23T16:35:10.885Z| vcpu-0| I125: FILE: FileCreateDirectoryEx: Failed to create /tmp/vmware-root. Error = 17
..
2021-04-23T16:35:10.895Z| vcpu-0| I125: [msg.checkpoint.precopyfailure] Migration to host <10.10.10.11> failed with error Connection reset by peer (0xbad004b).
..
2021-04-23T16:35:10.895Z| vcpu-0| I125: [vob.migrate.net.xfer.recvfailed.status] The migration transfer failed during the receive operation to socket 4311686C4AE0: received 0/36 bytes: Connection reset by peer.
2021-04-23T16:35:10.895Z| vcpu-0| I125: [vob.vmotion.stream.keepalive.read.fail] vMotion migration [a27340b:2240491300333843560] failed to read stream keepalive: Connection reset by peer

vmkernel.log on the source ESXi host:

2021-04-23T16:35:10.756Z cpu14:5749041)VMotion: 5417: 2240491300333843560 S: Estimated network bandwidth 129.844 MB/s during disk copy.
2021-04-23T16:35:10.867Z cpu73:6022824)WARNING: VMotionUtil: 862: 2240491300333843560 S: failed to read stream keepalive: Connection reset by peer
2021-04-23T16:35:10.868Z cpu73:6022824)WARNING: Migrate: 282: 2240491300333843560 S: Failed: Connection reset by peer (0xbad004b) @0x418022f0f9f3
2021-04-23T16:35:10.895Z cpu70:5749046)WARNING: Migrate: 6145: 2240491300333843560 S: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.

vmkernel.log on the destination ESXi host:

2021-04-23T16:35:10.836Z cpu0:2242333)VMotionRecv: 693: 2240491300333843560 D: Estimated network bandwidth 129.855 MB/s during disk copy.
2021-04-23T16:35:10.837Z cpu20:2242332)WARNING: VMotionSend: 3618: 2240491300333843560 D: failed to get DVS state in the restore phase from the source host <10.10.10.10>
2021-04-23T16:35:10.837Z cpu20:2242332)WARNING: VMotionSend: 5923: 2240491300333843560 D: Failed handling message reply GET_DVS_STATE: Not found
2021-04-23T16:35:10.837Z cpu20:2242332)WARNING: Migrate: 282: 2240491300333843560 D: Failed: Not found (0xbad0003) @0x41801c6c4bb2
2021-04-23T16:35:10.868Z cpu64:2242306)WARNING: Migrate: 6145: 2240491300333843560 D: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.
2021-04-23T16:35:10.868Z cpu64:2242306)WARNING: VMotion: 565: 2240491300333843560 D: Storage stream IO error: 458752
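
To check quickly whether a given vmware.log shows this failure, the log can be searched for the DVS restore message ID quoted in the excerpt. The following is only a minimal sketch: the function name has_dvs_restore_failure is hypothetical, and it assumes a standard grep is available where the log is inspected.

```shell
# Hypothetical helper (not a VMware tool): succeeds if the given vmware.log
# contains the DVS restore failure message ID shown in the excerpt.
has_dvs_restore_failure() {
    # -q: no output, exit status only; -F: treat the pattern as a fixed string
    grep -qF 'vob.vmotion.dvs.state.restore.failed' "$1"
}

# Example use (the path is a placeholder for the VM's own log file):
#   has_dvs_restore_failure /path/to/vmware.log && echo "DVS restore failure found"
```

A match only shows that the DVS state could not be restored. Whether the 24-hour limit was the cause still has to be confirmed from the migration start and failure times.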
These migrations fail because the port is missing on the destination ESXi host. For every migration, vCenter reserves a DVS port for 24 hours. After 24 hours the reservation expires and DvsMonitor deletes the port, so the migrated VM has no port to connect to on the destination host and the migration fails.
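
To confirm that a failed migration ran past the 24-hour reservation, the timestamp of the first migration entry can be compared with the timestamp of the failure. The sketch below is illustrative only: the function names are hypothetical, and it assumes a date command that accepts -d with a "YYYY-MM-DD hh:mm:ss" string (GNU coreutils or BusyBox, not BSD/macOS date).

```shell
# Convert a log timestamp such as 2021-04-22T10:40:52.175Z to epoch seconds.
# Assumption: `date -d` is supported (GNU coreutils or BusyBox).
log_ts_to_epoch() {
    ts=${1%%.*}                           # drop fractional seconds and the trailing Z
    ts=${ts%Z}                            # drop Z when there was no fractional part
    ts=$(printf '%s' "$ts" | tr 'T' ' ')  # 2021-04-22T10:40:52 -> 2021-04-22 10:40:52
    date -u -d "$ts" +%s
}

# Print the whole hours elapsed between two log timestamps.
elapsed_hours() {
    start=$(log_ts_to_epoch "$1")
    end=$(log_ts_to_epoch "$2")
    echo $(( (end - start) / 3600 ))
}

# Example with the first and last vmware.log timestamps from this article:
#   elapsed_hours 2021-04-22T10:40:52.175Z 2021-04-23T16:35:10.868Z   # prints 29
```

A result above 24 is consistent with the port reservation having expired before the migration finished.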
This is the default and expected behaviour. See the workaround below.
To work around this issue, extend the port reservation timeout so that the migration has time to complete:
1. SSH to the vCenter.
2. Make a backup copy of the file /etc/vmware-vpx/vpxd.cfg:
cp /etc/vmware-vpx/vpxd.cfg /etc/vmware-vpx/vpxd.cfg.bak
3. Insert the following section into vpxd.cfg:
<vpxd>
<dvs>
<PortReserveTimeoutInMin>7200</PortReserveTimeoutInMin>
</dvs>
<cert>
This extends the port reservation timeout to 5 days (24*60*5=7200 minutes), which should be enough to cover any time-demanding (slow or large VM, 50TB+) shared-nothing vMotion.
4. Restart the vpxd service:
service-control --restart vmware-vpxd
5. Retry the storage vMotion of your large VM(s).
6. Once the VMs have been migrated, revert the vpxd.cfg file to its original form:
cp /etc/vmware-vpx/vpxd.cfg.bak /etc/vmware-vpx/vpxd.cfg
7. Restart the vpxd service:
service-control --restart vmware-vpxd
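
The timeout value in step 3 is given in minutes, so a different window can be worked out the same way as the 5-day figure. The sketch below is illustrative only: both function names are hypothetical, and it assumes the element is written on a single line, as in the snippet in step 3.

```shell
# Convert a number of days into minutes for PortReserveTimeoutInMin.
days_to_minutes() {
    echo $(( $1 * 24 * 60 ))
}

# Print the PortReserveTimeoutInMin value found in a config file, if any.
# Assumes the opening tag, value and closing tag are on one line.
port_reserve_timeout() {
    sed -n 's:.*<PortReserveTimeoutInMin>\([0-9][0-9]*\)</PortReserveTimeoutInMin>.*:\1:p' "$1"
}

# Example:
#   days_to_minutes 5                                  # prints 7200
#   port_reserve_timeout /etc/vmware-vpx/vpxd.cfg      # prints the configured value
```

Running port_reserve_timeout after step 3 is one way to confirm the edit before restarting vpxd. After step 6 it should print nothing, provided the original file did not contain the element.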