...
- The cluster status shows as up and stable when you run: get cluster status
- The Transport nodes show as connected in the Fabric screen.
- In the Overview screen for System -> Fabric -> Nodes -> Edge or Host Transport nodes, the Controller Connectivity shows as UNKNOWN.
- Tunnels to these Transport nodes also show as DOWN.
- DFW rule publishing may fail due to this issue.
- Other CLI commands such as get nodes and get services may fail.
- You have NSX Intelligence installed.

In the NSX-T Manager proton-tomcat-wrapper.log we see:

Exception in thread "ForkJoinPool.commonPool-worker-4" java.lang.OutOfMemoryError: unable to create new native thread
The JVM has run out of memory. Requesting thread dump.
Dumping JVM state.
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:717)
 at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486)
 at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1517)
 at java.util.concurrent.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1609)
 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:167)
Exception in thread "ForkJoinPool.commonPool-worker-11" java.lang.OutOfMemoryError: unable to create new native thread
The JVM has run out of memory. Requesting thread dump.

In the NSX-T Manager nsxapi log we see a large number of events like the following, for example 2 in 3 seconds:

INFO intelligence-alarm-start-stop EventSource 8004 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting EventSource

If we take a thread dump, we can see a very large number of threads for the EventReportProcessor.java process in the proton-tomcat-wrapper.log, like the following:

INFO | jvm 1 | 2021/03/17 12:55:21 | "pool-9971-thread-1" #83259 prio=5 os_prio=0 tid=0x0000725d04fb2800 nid=0x514 waiting on condition [0x0000725b6177d000]
INFO | jvm 1 | 2021/03/17 12:55:21 | java.lang.Thread.State: WAITING (parking)
INFO | jvm 1 | 2021/03/17 12:55:21 | at sun.misc.Unsafe.park(Native Method)
INFO | jvm 1 | 2021/03/17 12:55:21 | - parking to wait for <0x0000725d3bf01140> (a java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.get(FutureTask.java:191)
INFO | jvm 1 | 2021/03/17 12:55:21 | at com.vmware.nsx.monitoring.clientlibrary.core.EventReportProcessor$1.run(EventReportProcessor.java:94)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.lang.Thread.run(Thread.java:748)
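To confirm this symptom from a captured thread dump, it can help to count how many threads share the same name pattern. The following is a minimal, hypothetical helper, not part of NSX-T (the class name ThreadDumpCounter and its approach are invented for this example), that parses a saved thread dump or the wrapper log and groups threads by name so the leaked "pool-N-thread-1" workers stand out:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper (not shipped with NSX-T): counts thread-name
// patterns in a saved thread dump so leaked "pool-NNNN-thread-1" workers stand out.
public class ThreadDumpCounter {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            // Thread dump entries carry the quoted thread name plus a native thread id,
            // e.g. "pool-9971-thread-1" #83259 ... tid=0x0000725d04fb2800 ...
            int start = line.indexOf('"');
            int end = line.indexOf('"', start + 1);
            if (start >= 0 && end > start && line.contains("tid=0x")) {
                String name = line.substring(start + 1, end);
                // Collapse pool/thread numbers so all leaked workers land in one bucket
                counts.merge(name.replaceAll("\\d+", "N"), 1, Integer::sum);
            }
        }
        counts.forEach((name, n) -> System.out.printf("%6d  %s%n", n, name));
    }
}

Run against a dump taken from the affected manager (for example, the thread dump section of proton-tomcat-wrapper.log saved to a file), a healthy node typically shows only a handful of entries per pattern, while a node hit by this issue can show the pool-N-thread-N count in the thousands.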
There is a memory leak which occurs when certain events are called but not closed correctly. This leak causes the proton service on an NSX-T Manager to run out of memory and crash. The affected manager is the one to which the Transport node with UNKNOWN controller connectivity is connected; as a result the host cannot receive any further updates, which can lead to VM connectivity issues.
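For illustration only (the actual EventReportProcessor source is not public), this kind of leak typically looks like the sketch below: each event report creates its own executor, blocks on Future.get(), which matches the FutureTask.get() frames in the thread dump above, and never shuts the executor down, so every "Starting EventSource" message leaves one parked worker thread behind. The class and method names here (LeakyEventReporter, reportEvent) are invented for the example.

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;

// Illustrative sketch of the leak pattern only; this is not the NSX-T source code.
// A fresh executor is created per event and never shut down, so every call leaves
// one idle "pool-N-thread-1" worker parked, until the JVM can no longer create
// native threads ("java.lang.OutOfMemoryError: unable to create new native thread").
public class LeakyEventReporter {

    public void reportEvent(Runnable event) throws ExecutionException, InterruptedException {
        // BUG: a new single-thread pool per report; shutdown() is never called
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        Future<?> done = pool.submit(event);
        done.get(); // caller blocks here, like the FutureTask.get() frames in the dump
        // Missing: pool.shutdown(); without it the worker thread lives forever
    }

    public static void main(String[] args) throws Exception {
        LeakyEventReporter reporter = new LeakyEventReporter();
        // Each iteration leaks one thread; lower the loop count before running this
        // anywhere you care about, because it will eventually exhaust native threads.
        for (int i = 0; i < 10_000; i++) {
            reporter.reportEvent(() -> { /* simulate sending one event report */ });
        }
        System.out.println("Live threads: " + Thread.activeCount());
    }
}

In a sketch like this the correction would be to reuse one long-lived executor, or to call pool.shutdown() in a finally block after each report; this is stated only to show the pattern, not as a description of the actual fix in NSX-T.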
This issue is resolved in NSX-T 3.1.2, which is available at VMware Downloads.
Restart the proton service on the impacted NSX Manager, or uninstall NSX Intelligence.