Symptoms
The NSX-T ESXi host upgrade fails when upgrading from NSX-T Data Center 2.4.0 or 2.4.1 and the NSX-T DFW is configured with a rule of Service Type TCP, UDP, or ALG for any of the following ports:
TCP 21 - ALG FTP
TCP 1521 - ALG ORACLE_TNS
TCP 111 - ALG SUN_RPC_TCP
TCP 135 - ALG MS_RPC_TCP
UDP 69 - ALG TFTP
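Both conditions above must hold for a host to be exposed, so they can be checked together before an upgrade. The following Python sketch is an illustration only. The port-to-ALG table and the 2.4.0/2.4.1 source releases come from this article. The rule representation (a dict with hypothetical "name" and "services" keys) is an assumption and is not the NSX-T API format, so rules exported from NSX-T would first have to be mapped into this shape.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (protocol, destination port) -> ALG type, as listed in this article.
AFFECTED_ALG_PORTS: Dict[Tuple[str, int], str] = {
    ("TCP", 21): "FTP",
    ("TCP", 1521): "ORACLE_TNS",
    ("TCP", 111): "SUN_RPC_TCP",
    ("TCP", 135): "MS_RPC_TCP",
    ("UDP", 69): "TFTP",
}

# Source releases named in this article as affected.
AFFECTED_SOURCE_RELEASES = {(2, 4, 0), (2, 4, 1)}


def release_of(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from '2.4.1' or a longer build string like '2.4.1.0.0.12345'."""
    parts = (version.strip().split(".") + ["0", "0", "0"])[:3]
    return (int(parts[0]), int(parts[1]), int(parts[2]))


def affected_alg(protocol: str, port: int) -> Optional[str]:
    """Return the ALG name if (protocol, port) is one of the affected services, else None."""
    return AFFECTED_ALG_PORTS.get((protocol.upper(), port))


def find_affected_rules(rules: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (rule name, ALG name) for every service in the hypothetical rule dicts that matches."""
    hits = []
    for rule in rules:
        for svc in rule.get("services", []):
            alg = affected_alg(svc["protocol"], svc["port"])
            if alg is not None:
                hits.append((rule["name"], alg))
    return hits


def upgrade_at_risk(source_version: str, rules: Iterable[dict]) -> bool:
    """True if the source release is 2.4.0/2.4.1 AND at least one rule uses an affected port."""
    return release_of(source_version) in AFFECTED_SOURCE_RELEASES and bool(
        find_affected_rules(rules)
    )
```

Only the source release is checked, because the article attributes the failure to the nsxt-vsip module that 2.4.0/2.4.1 already loaded on the host, not to the target release.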
The Upgrade Coordinator displays a long error string that contains the following exception:
"KernelModulesException: Failed to unload module nsxt-vsip: Cannot remove module nsxt-vsip: Consumed resource count of module is not zero"
The ESXi /var/log/vmkernel.log file contains messages similar to the following:
2020-01-25T12:22:27.020Z cpu28:98022942)Destroying solution lock.
2020-01-25T12:22:27.020Z cpu28:98022942)Unregistering char device
2020-01-25T12:22:27.024Z cpu28:98022942)WARNING: Heap: 2734: Non-empty heap (vsip-state) being destroyed (avail is 38203008, should be 38203216).
2020-01-25T12:22:27.106Z cpu28:98022942)ALERT: Mod: 5212: Failed to unload module nsxt-vsip, since its consumed resource count is 1. Waiting...
2020-01-25T12:22:32.124Z cpu28:98022942)ALERT: Mod: 5241: Failed to unload module nsxt-vsip, since its consumed resource count is 1. Giving up.
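To check whether a host hit this failure, the ALERT lines shown above can be matched in the log text. The following is a minimal Python sketch that assumes the message wording is exactly as in this excerpt; other ESXi or NSX-T builds may word it differently. It treats "Giving up" as the point at which the unload has definitively failed, and "Waiting" as the earlier retry.

```python
import re
from typing import Iterable, List, Tuple

# Failure signature copied from the vmkernel.log excerpt in this article.
UNLOAD_FAILURE = re.compile(
    r"Failed to unload module nsxt-vsip, since its consumed resource count is (\d+)\. "
    r"(Waiting|Giving up)"
)


def find_unload_failures(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (consumed resource count, 'Waiting' or 'Giving up') for each matching line."""
    hits = []
    for line in lines:
        match = UNLOAD_FAILURE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


def upgrade_blocked(lines: Iterable[str]) -> bool:
    """True if the kernel gave up unloading nsxt-vsip."""
    return any(state == "Giving up" for _, state in find_unload_failures(lines))
```

For example, on a copy of the host's /var/log/vmkernel.log, open the file with errors="replace" (to tolerate any stray bytes) and pass the file object to upgrade_blocked.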
Cause
The upgrade fails because the nsxt-vsip module from 2.4.0/2.4.1 cannot be unloaded and uninstalled from the host. This occurs because of a memory heap issue caused by the ALG component of the DFW.
Resolution
This issue is resolved in:
VMware NSX-T Data Center 2.4.2, available at VMware Downloads.
VMware NSX-T Data Center 2.5, available at VMware Downloads.
VMware NSX-T Data Center 3.0, available at VMware Downloads.
Workaround
To work around this issue after a host upgrade has failed:
Reboot the ESXi host. This clears the error condition on the host.
In the NSX-T UI, go to System -> Fabric -> Nodes -> Host Transport Node and ensure the host is not in maintenance mode.
In the Upgrade Coordinator, reset the host upgrade error and restart the host upgrade.