...
While attempting to install NSX-T on a host, the installation fails at 18% with the following error:

    failed to install software on a host <hostname>:java.rmi.RemoteException: [Live installation error] Error in running ['/etc/init.d/nsx-opsagent', 'stop', 'upgrade']: Return code : 1 Output OK to upgrade nsx-opsagent stop nsx-opsagent stop watchdog Terminating watchdog process with process PID 2105211 sh: you need to specify whom to kill nsx-ops-agent service is stopping cp: can't stat

The Resolve option does not help. Running 'del nsx' in the nsxcli of the ESXi host fails with the following errors:

    delete_nsx_instance_from_host.sh: INFO: NSX reset script called with argument fabric_node on nsx-esx
    delete_nsx_instance_from_host.sh: INFO: Run transport_node reset on ESX node
    % Failed to remove all host switches or logical switches
    delete_nsx_instance_from_host.sh: ERROR: Failed to reset nsxa app of nsx-opsagent. Please check ospagent logs for more details. , stderr: <date-time> ERROR: Failed to reset nsxa app of nsx-opsagent. Please check ospagent logs for more details.

The /var/run/log/esxupdate.log file on the ESXi host shows errors indicating that the vdl2 module failed to unload:

    cpu48:4580298)Mod: 5098: Unloading module <vmk-module-uuid> ...
    cpu48:4580298)vdl2: VDL2Cleanup:756: [nsx@6876 comp="nsx-esx" subcomp="<vmk-module-uuid>"]Starting cleanup
    cpu48:4580298)ALERT: Mod: 5251: Failed to unload module <vmk-module-uuid>, since its consumed resource count is 1. Waiting...
    cpu48:4580298)ALERT: Mod: 5280: Failed to unload module <vmk-module-uuid>, since its consumed resource count is

The following host properties are set to true on the DVS, as shown in the output of net-dvs -l:

    com.vmware.nsx.kcp.enable
    com.vmware.nsx.spf.enabled
    com.vmware.nsx.vdl2.enabled
    com.vmware.net.portset.fc.enabled
    com.vmware.net.portset.fc.mcast.enabled
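To check which of the five properties above are still set to true, the net-dvs -l output can be filtered. The following is a minimal sketch that assumes each property line has the "name = true , propType = CONFIG" layout shown in the resolution steps; the helper name list_enabled_nsx_props is hypothetical and is not an ESXi tool.

```sh
#!/bin/sh
# Sketch: print each NSX-related host property that net-dvs -l reports as "true".
# Assumes lines shaped like:  com.vmware.nsx.kcp.enable = true , propType = CONFIG
# list_enabled_nsx_props is a hypothetical helper name.

list_enabled_nsx_props() {
    # Field 1 is the property name, field 2 the "=" sign, field 3 the value.
    awk '$2 == "=" && ($3 == "true" || $3 == "true,") { print $1 }' |
        # Keep only the five properties named in this article.
        grep -E '^com\.vmware\.(nsx\.(kcp\.enable|spf\.enabled|vdl2\.enabled)|net\.portset\.fc\.(enabled|mcast\.enabled))$' |
        # The same property can appear more than once (once per DVS); print each name once.
        LC_ALL=C sort -u
}

# On the ESXi host (uncomment to run there):
# net-dvs -l | list_enabled_nsx_props
```

An empty result suggests that none of these properties is still enabled on any DVS on the host.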
This occurs because the uninstall process cannot unload the NSX kernel module while certain advanced configuration properties are still enabled on the host switch.
1. Confirm that the com.vmware.nsx.kcp property is still enabled:

    # net-dvs -l | grep com.vmware.nsx.kcp
    com.vmware.nsx.kcp.enable = true , propType = CONFIG
    com.vmware.nsx.kcp.enable = true , propType = CONFIG

2. Unset the property on each DVS in use, using the following syntax:

    # net-dvs -u "<property>" -p hostPropList <switchName>

3. To find the DVS names, run:

    # esxcfg-vswitch -l

4. For example, for a DVS named RegionA01-VDS7:

    # net-dvs -u com.vmware.nsx.kcp.enable -p hostPropList RegionA01-VDS7

5. Run the command from step 1 again to check that the property is disabled.
6. Place the ESXi host in vSphere maintenance mode, then run the following in the nsxcli shell on the ESXi host:

    nsxcli> del nsx

7. Confirm that the NSX-T VIBs have been removed:

    # esxcli software vib list | grep -i nsx

8. If the ESXi host still shows errors in the NSX-T UI, perform the removal again from the NSX-T UI using the force delete option.
9. If the DVS still has com.vmware.nsx.kcp enabled, reboot the host and repeat step 2.
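When several DVSes are in use, steps 2 to 4 amount to running the same net-dvs -u command once per DVS name. The following is a minimal sketch of that loop under stated assumptions: the DVS names are passed in by hand (for example, copied from the esxcfg-vswitch -l output), and the helper name unset_dvs_prop and the DRY_RUN variable are hypothetical. By default it only prints the commands so they can be reviewed before anything is changed on the host.

```sh
#!/bin/sh
# Sketch: unset one host property on each named DVS (steps 2-4).
# unset_dvs_prop and DRY_RUN are hypothetical names; net-dvs is the ESXi tool
# used in the steps above. DRY_RUN defaults to 1, which only prints commands.

unset_dvs_prop() {
    prop=$1
    shift
    for dvs in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Print the command instead of running it; quote the DVS name
            # so names containing spaces stay readable and copy-pasteable.
            printf 'net-dvs -u %s -p hostPropList "%s"\n' "$prop" "$dvs"
        else
            net-dvs -u "$prop" -p hostPropList "$dvs"
        fi
    done
}

# Review first, then apply on the ESXi host (uncomment to run there):
# unset_dvs_prop com.vmware.nsx.kcp.enable RegionA01-VDS7
# DRY_RUN=0 unset_dvs_prop com.vmware.nsx.kcp.enable RegionA01-VDS7
```

Printing the commands first is a deliberate choice: net-dvs changes live switch configuration, so it is safer to check the property and DVS names before setting DRY_RUN=0.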