Symptoms
The hostd service becomes unresponsive.
Hosts disconnect from vCenter.
ESXCLI commands fail with a "connection failed" error.
Restarting the management agents does not resolve the issue.
>> vmkernel.log shows the qedentv driver entering an infinite wait in the RX queue (RXQ) destroy context on a vmnic (vmnic0 in this excerpt):
2023-03-22T05:21:32.253Z cpu56:2097436)[qedentv_multictx_remove_rx_rule:1756(vmnic0)]Removing mac:00:50:56:66:bd:62, vlan_id:0x0, from fp:3, op:MAC_DEL, hw_fn:0
2023-03-22T05:21:32.253Z cpu56:2097436)[qedentv_multictx_set_rx_rule:1350(vmnic0)]Applying 00:50:56:66:bd:62 filter, vlan_id:0xffff, fp_id:0, hw_fn:0.
2023-03-22T05:21:32.253Z cpu56:2097436)[qedentv_multictx_q_free:6798(vmnic0)]fp:3, is_last:0, qtype:RX, hw_fn:0
2023-03-22T05:21:32.253Z cpu56:2097436)[qedentv_ecore_stop_queue:4530(vmnic0)]Disabling RSS for fp:3, vport id:3, rss_eng_id = 1, hw_fn:0
2023-03-22T05:21:32.253Z cpu56:2097436)[qedentv_ecore_stop_queue:4542(vmnic0)]Disabling TPA for fp:3, vport id:3, hw_fn:0
2023-03-22T05:21:33.272Z cpu56:2097436)[qedentv_free_mem_fp:2858(vmnic0)]Waiting for stats (fp:25) to complete.
2023-03-22T05:21:34.272Z cpu56:2097436)[qedentv_free_mem_fp:2858(vmnic0)]Waiting for stats (fp:25) to complete.
2023-03-22T05:21:35.272Z cpu56:2097436)[qedentv_free_mem_fp:2858(vmnic0)]Waiting for stats (fp:25) to complete.
qedentv keeps printing the same message, one line per second, for hours:
2023-03-23T13:53:21.134Z cpu40:2097436)[qedentv_free_mem_fp:2858(vmnic0)]Waiting for stats (fp:25) to complete.
2023-03-23T13:53:22.134Z cpu40:2097436)[qedentv_free_mem_fp:2858(vmnic0)]Waiting for stats (fp:25) to complete.
2023-03-23T13:53:23.134Z cpu40:2097436)[qedentv_free_mem_fp:2858(vmnic0)]Waiting for stats (fp:25) to complete.
2023-03-23T13:53:24.134Z cpu40:2097436)[qedentv_free_mem_fp:2858(vmnic0)]Waiting for stats (fp:25) to complete.
2023-03-23T13:53:25.134Z cpu40:2097436)[qedentv_free_mem_fp:2858(vmnic0)]Waiting for stats (fp:25) to complete.
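To check a host (or a vmkernel.log taken from a support bundle) for this symptom, the repeated wait message can be counted per vmnic. The following is a minimal POSIX sh sketch, not a supported tool: the function name qedentv_stats_waits is hypothetical, and it assumes the message format shown in the log excerpt above, where the vmnic name appears in parentheses after the function name and line number.

```shell
# Hypothetical helper: count qedentv "Waiting for stats" messages per vmnic.
# Assumes the vmkernel.log message format shown in the excerpt above.
qedentv_stats_waits() {
  log="${1:-/var/log/vmkernel.log}"   # default ESXi log location
  grep 'qedentv_free_mem_fp' "$log" \
    | grep 'Waiting for stats' \
    | sed -n 's/.*(\(vmnic[0-9]*\)).*/\1/p' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'           # print "vmnicN count"
}
```

Running `qedentv_stats_waits /var/log/vmkernel.log` prints one "vmnicN count" line per affected NIC. A count that keeps growing by roughly one per second between runs is consistent with the infinite wait described here.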
Cause
The root cause is still unknown.
Resolution
This issue is resolved in qedentv driver 3.70.35.0. That release adds a timeout to the stats and netpoll completion waits in the fp memory free routine (qedentv_free_mem_fp), so the driver no longer loops indefinitely while waiting for stats to complete.
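To decide whether a host already runs the fixed driver, the installed qedentv version can be compared with 3.70.35.0. The sketch below is a minimal POSIX sh example under stated assumptions: the function names version_ge and qedentv_has_fix are hypothetical, versions later than 3.70.35.0 are assumed to also contain the fix, and the example at the end assumes the VIB is named qedentv and that its version is in the second (Version) column of `esxcli software vib list`. Because ESXCLI commands can fail on a host that is already affected, run the check before the issue occurs or after a reboot.

```shell
# Return success (0) if dotted version $1 >= dotted version $2.
# Fields are compared numerically, so 3.70.7.0 sorts below 3.70.35.0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0                            # versions are equal
  }'
}

# Hypothetical helper: strip a build suffix such as "-1OEM..." and
# compare against 3.70.35.0, the release that contains the timeout fix.
qedentv_has_fix() {
  version_ge "${1%%-*}" "3.70.35.0"
}

# Example (assumes VIB name "qedentv" and Version in column 2):
# ver=$(esxcli software vib list | awk '$1 == "qedentv" { print $2 }')
# if qedentv_has_fix "$ver"; then echo "fix present"; else echo "fix not present"; fi
```

The numeric field-by-field comparison matters here: a plain string comparison would rank 3.70.7.0 above 3.70.35.0 and wrongly report the older driver as fixed.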
Workaround
Note: Do not apply this workaround if qedentv driver 3.70.35.0 is installed.
qedentv driver 3.70.7.0 provides a module parameter, en_periodic_stats, that disables periodic stats collection when set to 0. Setting this parameter and rebooting the host avoids the issue. To set the module parameter:
SSH to the affected ESXi host.
Run the following command:
esxcfg-module -s 'en_periodic_stats=0' qedentv
Reboot the host.
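The `-s` option of esxcfg-module sets the module's whole option string, so if qedentv already has other options configured, the command above is generally understood to replace them rather than add to them. Check the current string first with `esxcfg-module -g qedentv`. The helper below is a hypothetical, minimal POSIX sh sketch that takes an existing option string and returns it with en_periodic_stats=0 set, dropping any earlier en_periodic_stats value; it assumes options are whitespace-separated name=value pairs.

```shell
# Hypothetical helper: return the qedentv option string with
# en_periodic_stats=0 set, keeping any other options already present.
build_qedentv_opts() {
  out=""
  for opt in $1; do                   # split on whitespace
    case "$opt" in
      en_periodic_stats=*) ;;         # drop any previous value
      *) out="$out $opt" ;;
    esac
  done
  out="$out en_periodic_stats=0"      # append the workaround setting
  printf '%s\n' "${out# }"            # trim the leading space
}

# Example (EXISTING_OPTIONS is a placeholder for the string that
# "esxcfg-module -g qedentv" reports), followed by a host reboot:
# esxcfg-module -s "$(build_qedentv_opts 'EXISTING_OPTIONS')" qedentv
```

With no existing options, the helper returns just `en_periodic_stats=0`, which matches the command given in the workaround steps.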