Symptoms
- NSX version is 4.0.x, or NSX-T Data Center version is earlier than 3.2.3.
- The ESXi host may fail with a Purple Screen of Death (PSOD) and the message "PANIC bora/vmkernel/main/dlmalloc.c:4937 - Usage error in dlmalloc".
- There are more than 64 Edge Transport Nodes running on the ESXi host.
- The backtrace is similar to:
  PANIC bora/vmkernel/main/dlmalloc.c:4937 - Usage error in dlmalloc
  PanicvPanicInt@vmkernel#nover+0x327 stack: 0x453a4f89b9b8, 0x0, 0x42002d0fefe3, 0x431242001300, 0x453a4f89b8e0
  Panic_NoSave@vmkernel#nover+0x4d stack: 0x453a4f89ba10, 0x453a4f89b9d0, 0x208, 0x42002d6a9053, 0x1349
  DLM_free@vmkernel#nover+0x22d stack: 0x431242001310, 0x42002d143afa, 0x3105ded66af, 0x453a4f89baf0, 0x3105ded66af
  Heap_Free@vmkernel#nover+0xba stack: 0x3105ded66af, 0x3b, 0x0, 0x3105ded66af, 0x431242001220
  VDL2CharDevIoctl@com.vmware.nsx.l2#1.1.7.0.21487563+0x124 stack: 0x10, 0x0, 0x10, 0x0, 0x0
  VMKAPICharDevDevfsWrapIoctl@vmkernel#nover+0x87 stack: 0x5c, 0x42002ed819f8, 0x42002d116991, 0x0, 0x2
  CharDriverIoctl@vmkernel#nover+0x7d stack: 0x430e6320e6a0, 0x430e63219920, 0x430c934f0780, 0x430c934f0780, 0x453a4f89be23
  DevFSIoctl@vmkernel#nover+0xad3 stack: 0x43120661bd50, 0x430a95414c50, 0x2, 0x0, 0x43110000003b
  FSSVec_Ioctl@vmkernel#nover+0x20 stack: 0x9, 0x42002d4b9105, 0x100, 0x400, 0x3
  FSSObjectIoctlCommon@vmkernel#nover+0x60 stack: 0x100, 0x400, 0x3, 0x100, 0xa
  FSS_IoctlByFH@vmkernel#nover+0x9f stack: 0x0, 0x3105ded66af, 0x3b, 0x3105ded66af, 0x0
  UserFile_PassthroughIoctl@vmkernel#nover+0x3f stack: 0x420053c00000, 0x42002d4d7f67, 0x433478409280, 0x433478409280, 0x453a4f89f140
  UserVmfs_Ioctl@vmkernel#nover+0x27 stack: 0x453a4f89f140, 0x0, 0x453a4f89bf40, 0xe, 0x453a4f89f000
  LinuxFileDesc_Ioctl@vmkernel#nover+0x51 stack: 0x453a4f89bf40, 0x10, 0x1, 0x42002d4b4864, 0xffffffffffffffef
  User_LinuxSyscallHandler@vmkernel#nover+0x1a4 stack: 0x0, 0x0, 0x0, 0x42002d14e068, 0x10b
  gate_entry@vmkernel#nover+0x68 stack: 0x0, 0x10, 0x416a86e037, 0x3105ded66af, 0x3
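When triaging a PSOD, it can help to check a captured backtrace against the signature above: the dlmalloc usage-error panic message, with Heap_Free called directly from VDL2CharDevIoctl in the com.vmware.nsx.l2 module. The following is a minimal sketch, assuming the backtrace text has one frame per line in the format shown above; the helper names (parse_frames, matches_known_psod) are hypothetical and not part of any VMware tool.

```python
import re

# Hypothetical frame pattern for lines such as
# "Heap_Free@vmkernel#nover+0xba stack: 0x3105ded66af, ..."
# Groups: function name, module name, module version.
FRAME_RE = re.compile(
    r"^\s*([A-Za-z_]\w*)@([\w.]+)#([\w.]+)\+0x[0-9a-fA-F]+",
    re.MULTILINE,
)

PANIC_MSG = "Usage error in dlmalloc"


def parse_frames(backtrace: str) -> list[tuple[str, str]]:
    """Return (function, module) pairs in the order they appear (innermost first)."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(backtrace)]


def matches_known_psod(backtrace: str) -> bool:
    """Return True if the text matches the dlmalloc PSOD signature described here."""
    if PANIC_MSG not in backtrace:
        return False
    frames = parse_frames(backtrace)
    # Each frame is called by the frame that follows it, so look for
    # Heap_Free immediately followed by the NSX L2 ioctl handler.
    for (callee, _), (caller, module) in zip(frames, frames[1:]):
        if (
            callee == "Heap_Free"
            and caller == "VDL2CharDevIoctl"
            and module == "com.vmware.nsx.l2"
        ):
            return True
    return False
```

The check deliberately keys on function and module names rather than offsets or stack addresses, since those vary between builds and crashes.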
Cause
The PSOD is triggered when the net-vdl2 command is run on the ESXi host. This usually happens during collection of a diagnostic log bundle.
Impact / Risks
Downtime of the ESXi host and failover of its VMs (assuming HA is enabled and allowed to restart the VMs on another host in the cluster).
Resolution
This is a known issue, fixed in NSX 4.1 and higher, and in NSX-T Data Center 3.2.3 and higher.
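The affected range from the Symptoms and Resolution sections (NSX 4.0.x, and NSX-T Data Center earlier than 3.2.3) can be expressed as a simple version check. This is a minimal sketch, assuming the version is available as a purely numeric dotted string such as "3.2.2.1" (build-number suffixes are not handled); the function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as "3.2.2.1" into (3, 2, 2, 1)."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version: str) -> bool:
    """Return True if the NSX / NSX-T version falls in the affected range."""
    v = parse_version(version)
    if v[:2] == (4, 0):
        return True  # NSX 4.0.x is affected; fixed in 4.1 and later
    # NSX-T Data Center is fixed in 3.2.3; tuple comparison handles
    # differing lengths, e.g. (3, 2, 3, 1) is not less than (3, 2, 3).
    return v < (3, 2, 3)
```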
Workaround
Ensure that no more than 64 Edge Transport Nodes run on the ESXi host at any time. Running more than 64 Edge Transport Nodes per ESXi host can also cause CPU, memory, and storage contention, leading to performance issues on the ESXi host and on the Edge VMs running on it.
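To apply the workaround across a cluster, per-host Edge Transport Node counts can be compared against the 64-node limit. The sketch below assumes the counts have already been collected into a mapping of host name to count (how they are gathered, for example from the vCenter inventory, is out of scope); the function name and host names are hypothetical.

```python
EDGE_NODE_LIMIT = 64  # maximum Edge Transport Nodes per ESXi host for this workaround


def hosts_over_limit(edge_counts: dict[str, int], limit: int = EDGE_NODE_LIMIT) -> list[str]:
    """Return the sorted names of hosts running more Edge Transport Nodes than the limit."""
    return sorted(host for host, count in edge_counts.items() if count > limit)


# Example with hypothetical host names: only esx-02 exceeds the limit.
print(hosts_over_limit({"esx-01": 64, "esx-02": 65, "esx-03": 10}))
```

A host running exactly 64 Edge Transport Nodes is within the limit, so the comparison is strictly greater-than.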