...
You will observe PSOD backtraces similar to the following:

2020-05-07T08:30:26.104Z cpu59:5261210)WARNING: Heartbeat: 760: PCPU 44 didn't have a heartbeat for 21 seconds; *may* be locked up.
2020-05-07T08:30:26.104Z cpu44:2097436)ALERT: NMI: 696: NMI IPI: RIPOFF(base):RBP:CS [0xc7490(0x418001000000):0x4302ee371a80:0xfc8] (Src 0x1, CPU44)
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b788:[0x4180010c748f]SafeMemAccess_CmpXchg4ExceptionPossible@vmkernel#nover+0xe stack: 0x4302ee371d40
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b790:[0x418001157a50]FastSlabCreateObj@vmkernel#nover+0x88 stack: 0x100000001
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b810:[0x418001158013]FastSlabReplenishCPU@vmkernel#nover+0x6e stack: 0x41804b005fd0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b850:[0x418001156425]FastSlabAllocSlow@vmkernel#nover+0x7e stack: 0x0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b870:[0x4180011564de]FastSlab_AllocWithTimeout@vmkernel#nover+0x83 stack: 0x451b88e1b9b8
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b8c0:[0x41800103c959]vmk_PageSlabAlloc@vmkernel#nover+0x22 stack: 0x451b00000800
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b8d0:[0x4180011d30aa]PktPageAlloc_AllocPages@vmkernel#nover+0x37 stack: 0x451b88e1b950
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b950:[0x41800125f56b]vmk_PktAllocPage@vmkernel#nover+0x10 stack: 0x4310177ed010
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b960:[0x418001b3b295]qfle3_page_alloc_and_map@(qfle3)#<None>+0x22 stack: 0xeb4bbd3
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b9b0:[0x418001b51235]qfle3_alloc_rx_sge_mbuf@(qfle3)#<None>+0x2e stack: 0x3f
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b9f0:[0x418001b517d4]qfle3_alloc_fp_buffers@(qfle3)#<None>+0x2f5 stack: 0x2d35302d30323032
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1ba60:[0x418001b3d000]qfle3_rq_create@(qfle3)#<None>+0x3a9 stack: 0x0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bae0:[0x418001af4a77]qfle3_cmd_create_q@(qfle3)#<None>+0x15c stack: 0x0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bb30:[0x418001b2b65e]qfle3_sm_q_cmd@(qfle3)#<None>+0x147 stack: 0x10
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bbb0:[0x418001b3c9c2]qfle3_rq_alloc@(qfle3)#<None>+0x2d7 stack: 0x4307036b2780
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bc40:[0x4180012dd8bd]UplinkNetq_AllocHwQueueWithAttr@vmkernel#nover+0x92 stack: 0x17
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bc90:[0x418001217435]NetqueueBalActivatePendingRxQueues@vmkernel#nover+0x156 stack: 0x79e28088
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bd50:[0x418001218075]NetqueueBalRxQueueCommitChanges@vmkernel#nover+0x36 stack: 0x0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bd90:[0x41800121b677]UplinkNetqueueBal_BalanceCB@vmkernel#nover+0x19fc stack: 0x430779e7f1d0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bf00:[0x4180012d8309]UplinkAsyncProcessCallsHelperCB@vmkernel#nover+0x116 stack: 0x43090803f7b0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bf30:[0x4180010eaf7a]HelperQueueFunc@vmkernel#nover+0x157 stack: 0x43090803f0b8
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bfe0:[0x41800130f9f2]CpuSched_StartWorld@vmkernel#nover+0x77 stack: 0x0
2020-05-07T08:30:30.214Z cpu11:2626161)VMotion: 5367: 4078885979155334064 S: Another pre-copy iteration needed with 377085 pages left to send (prev2 8388608, prev 8388608, pages dirtied by pass through device 0, network bandwidth ~1028.528 MB/s, 5663% t$

In addition to a PSOD similar to the above, you may see these messages from the qfle3 driver in the vmkernel.log:

2020-05-07T05:19:56.921Z cpu58:2097436)WARNING: qfle3: ecore_state_wait:315: timeout waiting for state 10
2020-05-07T05:19:56.921Z cpu58:2097436)WARNING: qfle3: qfle3_remove_queue_filter:2370: [vmnic5] RX 3 queue state not changed for fid: 0
2020-05-07T05:19:56.922Z cpu58:2097436)WARNING: qfle3: ecore_queue_chk_transition:5969: Blocking transition since pending was 400
2020-05-07T05:19:56.922Z cpu58:2097436)WARNING: qfle3: ecore_queue_state_change:4855: check transition returned an error. rc -2
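To check whether a host is already logging these warnings before it purple-screens, one option is to search the vmkernel log from an SSH session. This is a minimal sketch that assumes the default log location (/var/log/vmkernel.log); the second grep simply filters for the ecore function names shown in the messages above:

# grep "WARNING: qfle3" /var/log/vmkernel.log | grep -E "ecore_state_wait|queue state not changed|Blocking transition|check transition"

Matches on ecore_state_wait timeouts or blocked queue transitions suggest the host may be exposed to this issue.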
QLogic has released a new driver for ESXi 6.7 and 7.0 to address this issue:

ESXi 6.7: Version 1.1.9.0
https://customerconnect.vmware.com/downloads/details?downloadGroup=DT-ESXI67-QLOGIC-ETHERNET-ISCSI-FCOE-201230&productId=742

ESXi 7.0: Version 1.4.8.0
https://customerconnect.vmware.com/downloads/details?downloadGroup=DT-ESXI70-QLOGIC-MRVL-E3-ETHERNET-ISCSI-FCOE-301250&productId=974
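As a sketch of one way to check the installed qfle3 driver version and apply the updated one with esxcli (the datastore path and bundle filename below are placeholders for wherever you stage the offline bundle downloaded from the links above):

# esxcli software vib list | grep qfle3
# esxcli software vib update -d /vmfs/volumes/<datastore>/<qfle3-offline-bundle>.zip

A reboot is typically required before the updated driver is loaded.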
A workaround is to set qfle3 module parameters that relieve FastSlab pressure by reducing the driver's queue and ring-buffer allocations, and then reboot the ESXi host:

# esxcli system module parameters set -p "txqueue_nr=4 rxqueue_nr=4 rss_engine_nr=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 txring_bd_nr=1024 rxring_bd_nr=1024 enable_lro=0" -m qfle3
# reboot
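After the host comes back up, you can verify that the parameters were applied by listing the module's configured values and checking that they match those set above:

# esxcli system module parameters list -m qfle3 | grep -E "txqueue_nr|rxqueue_nr|rss_engine_nr|txring_bd_nr|rxring_bd_nr|enable_lro"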