...
- The cluster status shows as up and stable when you run: get cluster status
- The Transport nodes show as connected in the Fabric screen.
- In the Overview screen for System -> Fabric -> Nodes -> Edge or Host Transport nodes, the Controller Connectivity shows as UNKNOWN.
- Tunnels to these Transport nodes also show as DOWN.
- DFW rule publishing may fail due to this issue.
- Other CLI commands such as get nodes and get services may fail.
- You have NSX Intelligence installed.

In the NSX-T Manager proton-tomcat-wrapper.log we see:

Exception in thread "ForkJoinPool.commonPool-worker-4" java.lang.OutOfMemoryError: unable to create new native thread
The JVM has run out of memory. Requesting thread dump.
Dumping JVM state.
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:717)
 at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486)
 at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1517)
 at java.util.concurrent.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1609)
 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:167)
Exception in thread "ForkJoinPool.commonPool-worker-11" java.lang.OutOfMemoryError: unable to create new native thread
The JVM has run out of memory. Requesting thread dump.

In the NSX-T Manager nsxapi log we see a large number of events like the following, for example 2 in 3 seconds:

INFO intelligence-alarm-start-stop EventSource 8004 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting EventSource

If we take a thread dump, we can see a very large number of threads for the EventReportProcessor.java process in the proton-tomcat-wrapper.log, like the following:

INFO | jvm 1 | 2021/03/17 12:55:21 | "pool-9971-thread-1" #83259 prio=5 os_prio=0 tid=0x0000725d04fb2800 nid=0x514 waiting on condition [0x0000725b6177d000]
INFO | jvm 1 | 2021/03/17 12:55:21 | java.lang.Thread.State: WAITING (parking)
INFO | jvm 1 | 2021/03/17 12:55:21 | at sun.misc.Unsafe.park(Native Method)
INFO | jvm 1 | 2021/03/17 12:55:21 | - parking to wait for <0x0000725d3bf01140> (a java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.get(FutureTask.java:191)
INFO | jvm 1 | 2021/03/17 12:55:21 | at com.vmware.nsx.monitoring.clientlibrary.core.EventReportProcessor$1.run(EventReportProcessor.java:94)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.lang.Thread.run(Thread.java:748)
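To confirm this symptom from a captured thread dump, it can help to count how many threads share the same name pattern. The following is a minimal, hypothetical helper, not part of NSX-T (the class name ThreadDumpCounter and its approach are invented for this example), that parses a saved thread dump or the wrapper log and groups threads by name so the leaked "pool-N-thread-1" workers stand out:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper (not shipped with NSX-T): counts thread-name
// patterns in a saved thread dump so leaked "pool-NNNN-thread-1" workers stand out.
public class ThreadDumpCounter {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            // Thread dump entries carry the quoted thread name plus a native thread id,
            // e.g. "pool-9971-thread-1" #83259 ... tid=0x0000725d04fb2800 ...
            int start = line.indexOf('"');
            int end = line.indexOf('"', start + 1);
            if (start >= 0 && end > start && line.contains("tid=0x")) {
                String name = line.substring(start + 1, end);
                // Collapse pool/thread numbers so all leaked workers land in one bucket
                counts.merge(name.replaceAll("\\d+", "N"), 1, Integer::sum);
            }
        }
        counts.forEach((name, n) -> System.out.printf("%6d  %s%n", n, name));
    }
}

Run against a dump taken from the affected manager (for example, the thread dump section of proton-tomcat-wrapper.log saved to a file), a healthy node typically shows only a handful of entries per pattern, while a node hit by this issue can show the pool-N-thread-N count in the thousands.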
There is a memory leak which occurs when certain events are called but not closed correctly. This leak causes the proton service on an NSX-T Manager to run out of memory and crash. The affected manager is the one to which the Transport node with UNKNOWN controller connectivity is connected; as a result the host cannot receive any further updates, which can lead to VM connectivity issues.
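For illustration only (the actual EventReportProcessor source is not public), this kind of leak typically looks like the sketch below: each event report creates its own executor, blocks on Future.get(), which matches the FutureTask.get() frames in the thread dump above, and never shuts the executor down, so every "Starting EventSource" message leaves one parked worker thread behind. The class and method names here (LeakyEventReporter, reportEvent) are invented for the example.

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;

// Illustrative sketch of the leak pattern only; this is not the NSX-T source code.
// A fresh executor is created per event and never shut down, so every call leaves
// one idle "pool-N-thread-1" worker parked, until the JVM can no longer create
// native threads ("java.lang.OutOfMemoryError: unable to create new native thread").
public class LeakyEventReporter {

    public void reportEvent(Runnable event) throws ExecutionException, InterruptedException {
        // BUG: a new single-thread pool per report; shutdown() is never called
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        Future<?> done = pool.submit(event);
        done.get(); // caller blocks here, like the FutureTask.get() frames in the dump
        // Missing: pool.shutdown(); without it the worker thread lives forever
    }

    public static void main(String[] args) throws Exception {
        LeakyEventReporter reporter = new LeakyEventReporter();
        // Each iteration leaks one thread; lower the loop count before running this
        // anywhere you care about, because it will eventually exhaust native threads.
        for (int i = 0; i < 10_000; i++) {
            reporter.reportEvent(() -> { /* simulate sending one event report */ });
        }
        System.out.println("Live threads: " + Thread.activeCount());
    }
}

In a sketch like this the correction would be to reuse one long-lived executor, or to call pool.shutdown() in a finally block after each report; this is stated only to show the pattern, not as a description of the actual fix in NSX-T.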
This issue is resolved in NSX-T 3.1.2, which is available at VMware Downloads.
Restart the proton service on the impacted NSX Manager, or uninstall NSX Intelligence.