Symptoms
Virtual machines (VMs) suddenly lose connectivity to some or all network destinations. Pings to those addresses fail.
Connectivity is restored by disconnecting and reconnecting the vNIC, or by migrating the VM to another ESXi host. During these operations, the vmxnet3 vNIC logs a "Hang detected" message in the ESXi kernel logs, similar to the following: "Vmxnet3: 21100: vmname.eth0,00:50:56:11:11:11, portID(67101010): Hang detected,numHangQ: 1, enableGen: 1011"
The host uses the bnxtnet "async" driver, version 224.0.x.x or later, for the uplinks of the affected VMs.
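When reviewing a saved copy of the kernel log, a small script can help pick out the hang messages and the vNICs they refer to. The following Python sketch is illustrative only: the regular expression is inferred from the single sample message above, the field names are our own labels, and the function name find_vmxnet3_hangs is hypothetical. Real log lines may differ in spacing or fields, and VM names that contain commas would not match.

```python
import re
from typing import Dict, Iterable, List

# Pattern inferred from the one sample message in this article (an assumption,
# not a documented log format). Group names are labels chosen for this sketch.
HANG_PATTERN = re.compile(
    r"Vmxnet3:\s*\d+:\s*"
    r"(?P<vnic>[^,]+),\s*"
    r"(?P<mac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}),\s*"
    r"portID\((?P<port_id>\d+)\):\s*"
    r"Hang detected,\s*numHangQ:\s*(?P<num_hang_q>\d+)"
)


def find_vmxnet3_hangs(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per log line that matches the hang message pattern."""
    hits = []
    for line in lines:
        match = HANG_PATTERN.search(line)  # search() tolerates timestamp prefixes
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    sample = [
        "Vmxnet3: 21100: vmname.eth0,00:50:56:11:11:11, portID(67101010): "
        "Hang detected,numHangQ: 1, enableGen: 1011",
        "an unrelated log line",
    ]
    for hit in find_vmxnet3_hangs(sample):
        print(hit["vnic"], hit["mac"], hit["port_id"], hit["num_hang_q"])
```

A match only indicates that the vNIC reported a hang. It does not by itself confirm that the bnxtnet driver is the cause, so the driver version on the host should still be checked.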
Cause
The Broadcom bnxtnet async driver, version 224.0.x.x or later, has an issue that can cause a TX packet completion to be missed under certain circumstances. A missed completion can stall the VM's vNIC TX queues and block some or all packets from leaving the vNIC.
Impact / Risks
Affected VMs experience random, intermittent loss of network connectivity.
Resolution
Broadcom has released new versions of the bnxtnet and bnxtroce drivers that contain the fix, starting with version 226.0.145.4-1. Consult the VCG (HCL) or your OEM for the driver and firmware versions that match your specific NIC model.
Related Information
Before the fix:
To work around the issue, downgrade the bnxtnet driver to a version earlier than 224.0.x.x, for example 223.0.206.0 for ESXi 8 or 223.0.152.2 for ESXi 7.
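The version thresholds in this article can be checked with a simple comparison once the installed bnxtnet driver version is known. The Python sketch below is a minimal illustration, not a Broadcom tool: the function names are hypothetical, and it assumes that every version from 224.0.x.x up to, but not including, 226.0.145.4 is affected, which is our reading of "224.0.x.x or later" combined with the first fixed version. Anything after the first hyphen in the version string (build or OEM suffix) is ignored.

```python
from typing import Tuple

# Thresholds taken from this article. Treating everything in between as
# affected is an assumption made for this sketch.
FIRST_AFFECTED = (224, 0)
FIRST_FIXED = (226, 0, 145, 4)


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn '226.0.145.4-1' into (226, 0, 145, 4); the '-...' suffix is dropped."""
    base = version.strip().split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


def classify_bnxtnet_version(version: str) -> str:
    """Return 'not affected', 'affected', or 'fixed' for a bnxtnet version string."""
    parsed = parse_driver_version(version)
    if parsed < FIRST_AFFECTED:
        return "not affected"
    if parsed < FIRST_FIXED:
        return "affected"
    return "fixed"


if __name__ == "__main__":
    for candidate in ("223.0.206.0", "224.0.0.0", "226.0.145.4-1"):
        print(candidate, "->", classify_bnxtnet_version(candidate))
```

The result is only a first-pass check. The VCG (HCL) or the OEM remains the authority on which driver and firmware combination is supported for a given NIC model.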