...
The following symptoms may be observed:

- A large IPSet has been configured using an IP range instead of an IP address in CIDR format.
- The UI and API are inaccessible with a "Server is overloaded" error:

    { "module_name" : "common-service", "error_message" : "Server is overloaded", "error_code" : "98" }

- The HTTPS and Manager services intermittently go DOWN, for example:

    mgr1> get cluster status
    Group Type: MANAGER
    Group Status: DEGRADED

    Members:
    UUID                                  FQDN  IP           STATUS
    dec14d56-8f2a-1bd9-619b-515e4acab973  mgr1  192.168.1.4  DOWN
    87714d56-4e0a-626f-af02-6cfc54edca1b  mgr2  192.168.1.5  DOWN
    4bbe4d56-2a1d-bf95-bf36-5429dc222ec2  mgr3  192.168.1.6  UP

    Group Type: HTTPS
    Group Status: DEGRADED

    Members:
    UUID                                  FQDN  IP           STATUS
    dec14d56-8f2a-1bd9-619b-515e4acab973  mgr1  192.168.1.4  UP
    87714d56-4e0a-626f-af02-6cfc54edca1b  mgr2  192.168.1.5  DOWN
    4bbe4d56-2a1d-bf95-bf36-5429dc222ec2  mgr3  192.168.1.6  UP

- The NSX Manager VIP moves between Managers, and intermittently the VIP address does not respond to ping.
- /var/log/proton/proton-tomcat-wrapper.log contains the following out-of-memory messages:

    INFO | jvm 4 | 2020/03/02 14:39:03 | java.lang.OutOfMemoryError: Java heap space
    STATUS | wrapper | 2020/03/02 14:39:03 | The JVM has run out of memory. Requesting thread dump.
    STATUS | wrapper | 2020/02/05 14:39:03 | The JVM has run out of memory. Restarting JVM.
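To check a copy of the wrapper log for the out-of-memory messages listed above, a small Python sketch such as the one below may help. It assumes every relevant line follows the "LEVEL | source | YYYY/MM/DD HH:MM:SS | message" layout seen in the excerpt; that layout is inferred from the three sample lines only, and find_oom_events is a hypothetical helper, not an NSX tool.

```python
import os
import re
import sys
from typing import Iterable, List, Tuple

# Log path taken from the symptom description.
LOG_PATH = "/var/log/proton/proton-tomcat-wrapper.log"

# Assumed layout (from the excerpt): "LEVEL | source | YYYY/MM/DD HH:MM:SS | message"
_LINE = re.compile(
    r"^\s*(\w+)\s*\|\s*([^|]+?)\s*\|\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*(.*)$"
)

# Substrings that mark Java heap exhaustion in the excerpt.
_OOM_MARKERS = ("java.lang.OutOfMemoryError", "The JVM has run out of memory")


def find_oom_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) for every line reporting heap exhaustion."""
    events: List[Tuple[str, str]] = []
    for line in lines:
        match = _LINE.match(line)
        if not match:
            continue  # not in the assumed wrapper layout
        _level, _source, timestamp, message = match.groups()
        if any(marker in message for marker in _OOM_MARKERS):
            events.append((timestamp, message.strip()))
    return events


if __name__ == "__main__":
    # Optionally pass the path of a copied log file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else LOG_PATH
    if not os.path.exists(path):
        print(f"{path} not found", file=sys.stderr)
    else:
        with open(path, encoding="utf-8", errors="replace") as log:
            for timestamp, message in find_oom_events(log):
                print(timestamp, message)
```

Repeated hits shortly after each Manager restart would be consistent with the symptom described here, though they do not by themselves prove an IPSet is the cause.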
If a very large IPSet is created using an IP range, for example 0.0.0.0-255.255.255.255, the range-to-CIDR conversion exhausts the Java heap memory.
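To illustrate why the range form is so much heavier than the CIDR form, the Python sketch below uses the standard ipaddress module to count the addresses in a range and to collapse the range into the fewest CIDR blocks. It is not NSX's conversion code, which this article does not describe; it only shows that 0.0.0.0-255.255.255.255 spans 4,294,967,296 addresses yet is exactly one CIDR block, 0.0.0.0/0. The function names are hypothetical.

```python
import ipaddress


def range_to_cidrs(first: str, last: str) -> list:
    """Collapse the inclusive IPv4 range first..last into the fewest CIDR blocks."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


def address_count(first: str, last: str) -> int:
    """Number of individual addresses contained in the inclusive range first..last."""
    return int(ipaddress.IPv4Address(last)) - int(ipaddress.IPv4Address(first)) + 1


if __name__ == "__main__":
    print(address_count("0.0.0.0", "255.255.255.255"))   # 4294967296
    print(range_to_cidrs("0.0.0.0", "255.255.255.255"))  # ['0.0.0.0/0']
    # A range that is not block-aligned needs several CIDRs:
    # ['192.168.1.10/31', '192.168.1.12/30', '192.168.1.16/30', '192.168.1.20/32']
    print(range_to_cidrs("192.168.1.10", "192.168.1.20"))
```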
This issue is resolved in VMware NSX-T Data Center 3.0, available at VMware Downloads.
Change the IPSet to use CIDR format, for example change 0.0.0.0 - 255.255.255.255 to 0.0.0.0/0.

Because the UI and API are inaccessible, the following procedure can be used to open a short window in which to make the change:

1) Identify which Manager holds the VIP:

    mgr1> get cluster status verbose
    Group Type: HTTPS
    Group Status: STABLE

    Members:
    UUID                                  FQDN   IP              STATUS
    65fb0342-4977-bced-4552-b310011f6a79  mgr1   192.168.120.10  UP
    96f10342-1fa9-9b57-809e-69825d0e683f  mngr2  192.168.120.11  UP
    292a0342-d2b5-7f7b-3273-eb22c1158410  mgr3   192.168.120.12  UP

    Leaders:
    SERVICE  LEADER                                LEASE VERSION
    ap       292a0342-d2b5-7f7b-3273-eb22c1158410  6619

   The leader UUID in this example is 292a0342-d2b5-7f7b-3273-eb22c1158410. Use the CLI command "get nodes" to identify which Manager has this UUID.

2) Restart the proton service on that Manager, either from the NSX CLI:

    nsx-mngr> restart service manager

   or as root:

    root@mp: /etc/init.d/proton restart

3) This allows a short window in which to edit or delete the problematic IPSet via the UI or API.
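Because the window after the restart is short, it may help to prepare the corrected IPSet body in advance. The Python sketch below rewrites range entries as CIDR blocks. It assumes the IPSet is handled through the NSX-T Manager API as a JSON object with "ip_addresses" and "_revision" fields, as returned by GET /api/v1/ip-sets/<ip-set-id>; treat that endpoint and those field names as assumptions to confirm against the API guide for your release. The helper names and the sample object are hypothetical.

```python
import ipaddress
import json
from typing import Any, Dict, List


def to_cidr_entries(entries: List[str]) -> List[str]:
    """Rewrite 'first-last' range entries as CIDR blocks; keep other entries as-is."""
    result: List[str] = []
    for entry in entries:
        if "-" in entry:
            first, last = (part.strip() for part in entry.split("-", 1))
            networks = ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
            result.extend(str(net) for net in networks)
        else:
            result.append(entry)  # single address or existing CIDR
    return result


def build_update_body(ip_set: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an IPSet object with its ranges converted to CIDR.

    All other fields, including _revision, are copied unchanged so the body
    can be sent back as an update of the object that was read.
    """
    body = dict(ip_set)  # shallow copy; the original object is not modified
    body["ip_addresses"] = to_cidr_entries(ip_set.get("ip_addresses", []))
    return body


if __name__ == "__main__":
    # Hypothetical IPSet, shaped like an assumed GET /api/v1/ip-sets/<ip-set-id> response.
    sample = {
        "id": "example-ipset",
        "display_name": "all-ipv4",
        "ip_addresses": ["0.0.0.0 - 255.255.255.255"],
        "_revision": 3,
    }
    print(json.dumps(build_update_body(sample), indent=2))
```

The resulting body would then be sent back with a PUT to the same URL. Keeping _revision unchanged matters because the API typically rejects an update whose revision no longer matches the stored object. Deleting the IPSet, the other option in step 3, avoids the conversion entirely.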