...
Symptoms

- After vMotion or powering on a VM, the VM loses network connectivity. Other VMs on the host may not be affected and may continue to function normally.
- The number of netqueue Tx queues reported for the physical NIC that the VM is using does not match the number of queues that have been activated:

    vsish -e get /net/pNics/vmnicX/txqueues/info
    tx queues info {
       # active queues:2
       default queue id:0
    }

    vsish -e ls /net/pNics/vmnicX/txqueues/queues/
    0/

  Note: Substitute vmnicX with the appropriate vmnic name.

  In the example above, 2 Tx queues are reported, but only one queue (queue 0) appears in the list of activated queues.

- Linux VMs may report the following entries in the messages log within the guest OS:

    Month DD HH:MM:SS hostname kernel: WARNING: at net/sched/sch_generic.c:265 dev_watchdog+0x26b/0x280() (Not tainted)
    Month DD HH:MM:SS hostname kernel: Hardware name: VMware Virtual Platform
    Month DD HH:MM:SS hostname kernel: NETDEV WATCHDOG: eth0 (vmxnet3): transmit queue 2 timed out
    Month DD HH:MM:SS hostname kernel: Modules linked in: iptable_filter ip_tables ipv6 microcode vmware_balloon sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmxnet3 vmw_pvscsi pata_acpi ata_generic ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: vmci]
    Month DD HH:MM:SS hostname kernel: Pid: 0, comm: swapper Not tainted 2.6.32-642.4.2.el6.x86_64 #1
    Month DD HH:MM:SS hostname kernel: Call Trace:
    Month DD HH:MM:SS hostname kernel: <IRQ>  [<ffffffff8107c6f1>] ? warn_slowpath_common+0x91/0xe0
    Month DD HH:MM:SS hostname kernel: [<ffffffff8107c7f6>] ? warn_slowpath_fmt+0x46/0x60
    Month DD HH:MM:SS hostname kernel: [<ffffffff8149bd0b>] ? dev_watchdog+0x26b/0x280
    Month DD HH:MM:SS hostname kernel: [<ffffffff8108ec75>] ? internal_add_timer+0xb5/0x110
    Month DD HH:MM:SS hostname kernel: [<ffffffff8149baa0>] ? dev_watchdog+0x0/0x280
    Month DD HH:MM:SS hostname kernel: [<ffffffff8108f907>] ? run_timer_softirq+0x197/0x340
    Month DD HH:MM:SS hostname kernel: [<ffffffff8108f166>] ? update_process_times+0x76/0x90
    Month DD HH:MM:SS hostname kernel: [<ffffffff8103e577>] ? native_apic_msr_write+0x37/0x40
    Month DD HH:MM:SS hostname kernel: [<ffffffff81085275>] ? __do_softirq+0xe5/0x230
    Month DD HH:MM:SS hostname kernel: [<ffffffff8100c38c>] ? call_softirq+0x1c/0x30
    Month DD HH:MM:SS hostname kernel: [<ffffffff8100fca5>] ? do_softirq+0x65/0xa0
    Month DD HH:MM:SS hostname kernel: [<ffffffff81085105>] ? irq_exit+0x85/0x90
    Month DD HH:MM:SS hostname kernel: [<ffffffff81552cca>] ? smp_apic_timer_interrupt+0x4a/0x60
    Month DD HH:MM:SS hostname kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
    Month DD HH:MM:SS hostname kernel: <EOI>  [<ffffffff8104601b>] ? native_safe_halt+0xb/0x10
    Month DD HH:MM:SS hostname kernel: [<ffffffff8101696d>] ? default_idle+0x4d/0xb0
    Month DD HH:MM:SS hostname kernel: [<ffffffff81009fe6>] ? cpu_idle+0xb6/0x110
    Month DD HH:MM:SS hostname kernel: [<ffffffff8152f17a>] ? rest_init+0x7a/0x80
    Month DD HH:MM:SS hostname kernel: [<ffffffff81c3b127>] ? start_kernel+0x429/0x436
    Month DD HH:MM:SS hostname kernel: [<ffffffff81c3a33a>] ? x86_64_start_reservations+0x125/0x129
    Month DD HH:MM:SS hostname kernel: [<ffffffff81c3a453>] ? x86_64_start_kernel+0x115/0x124
    Month DD HH:MM:SS hostname kernel: ---[ end trace 7266a13370d01d2d ]---

- Linux VMs may also report the following entries in the messages log within the guest OS:

    Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: tx hang
    Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: resetting
    Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: tx hang
    Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: intr type 3, mode 0, 9 vectors allocated
    Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: NIC Link is Up 10000 Mbps
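The reported-versus-activated comparison above can be scripted. The snippet below is a sketch only: it parses hardcoded sample text that mirrors the example output above, since vsish exists only in the ESXi shell. On a live host you would instead capture `info` from `vsish -e get /net/pNics/vmnicX/txqueues/info` and `active` from `vsish -e ls /net/pNics/vmnicX/txqueues/queues/`.

```shell
# Sample vsish output, hardcoded for illustration (matches the example above).
info='tx queues info {
   # active queues:2
   default queue id:0
}'
active='0/'

# Number of Tx queues the NIC reports ("# active queues:N" line).
reported=$(printf '%s\n' "$info" | sed -n 's/.*# active queues:\([0-9]*\).*/\1/p')

# Number of queues actually activated (one "N/" entry per queue).
activated=$(printf '%s\n' "$active" | grep -c '/')

if [ "$reported" -ne "$activated" ]; then
    echo "Mismatch: $reported Tx queues reported, $activated activated"
else
    echo "OK: $reported Tx queues reported and activated"
fi
```

With the sample data above this prints the mismatch message, matching the symptom described in this article.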
Cause

This issue occurs due to a known issue with the ESXi netqueue load balancer feature, which can cause some of the adapter's queues to fail to initialize properly. A VM then loses network connectivity when the network load balancer assigns it to one of these queues.

Note: Loss of network connectivity can have many possible root causes. You are experiencing the issue described in this article only if the number of Tx queues reported differs from the number of Tx queues activated, as described in the Symptoms section.
Resolution

This issue is resolved in ESXi 6.5 P04 (ESXi650-201912002). ESXi 6.7 is not affected by this issue.
Workaround

Note: Ensure that there are redundant NICs on the vSwitch before attempting this change.

To work around this issue, perform the following steps:

1. Disable and re-enable the NIC at the ESXi level:

    localcli network nic down -n vmnicX
    localcli network nic up -n vmnicX

2. Check the number of queues reported:

    vsish -e get /net/pNics/vmnicX/txqueues/info

3. Verify that the number of queues activated matches the number reported in step 2:

    vsish -e ls /net/pNics/vmnicX/txqueues/queues/
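The workaround steps can be wrapped in a small script. This is a sketch only: localcli and vsish are available only in the ESXi shell, so the script defaults to a dry-run mode that prints each command instead of executing it. The NIC and DRY_RUN variables are illustrative additions, not part of the original procedure.

```shell
NIC="${NIC:-vmnicX}"        # substitute the affected vmnic name
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only (safe off-host)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

# Step 1: bounce the NIC (ensure redundant uplinks on the vSwitch first)
run localcli network nic down -n "$NIC"
run localcli network nic up -n "$NIC"

# Steps 2-3: re-check reported vs. activated Tx queue counts
run vsish -e get "/net/pNics/$NIC/txqueues/info"
run vsish -e ls "/net/pNics/$NIC/txqueues/queues/"
```

Run it with DRY_RUN=0 in the ESXi shell once you have confirmed the printed commands target the correct vmnic.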
Note: The number of queues listed under /net/pNics/vmnicX/txqueues/queues/ will always be one lower than the number reported in /net/pNics/vmnicX/txqueues/info if the vmnic is in a Down state, for example when no cable is plugged in.