...
Network partition among vSAN host(s).
vSAN cluster partition Network alarm in Skyline health.
This happens because the unicast agent list is invalid or incomplete, and one or more hosts cannot communicate with other vSAN hosts in the cluster.

In the following output from a cluster of 4, one host is missing from the cluster:

    [root@esxi-04:~] esxcli vsan cluster get
    Cluster Information
       Enabled: true
       Current Local Time: 2021-03-30T13:40:44Z
       Local Node UUID: 602583eb-233c-b69a-8291-0050562a1e8c
       Local Node Type: NORMAL
       Local Node State: MASTER
       Local Node Health State: HEALTHY
       Sub-Cluster Master UUID: 602583eb-233c-b69a-8291-0050562a1e8c
       Sub-Cluster Backup UUID: 602572bd-2ef4-8f69-d8ce-0050562a1e93
       Sub-Cluster UUID: 52cd69c8-e409-363f-bd75-654caad744b3
       Sub-Cluster Membership Entry Revision: 4
       Sub-Cluster Member Count: 3
       Sub-Cluster Member UUIDs: 602572bd-2ef4-8f69-d8ce-0050562a1e93, 602583eb-233c-b69a-8291-0050562a1e8c, 60198995-b367-2922-8fbf-0050562a1cbd
       Sub-Cluster Member HostNames: esxi-02.lab.local, esxi-04.lab.local, esxi-03.lab.local
       Sub-Cluster Membership UUID: f4266360-e165-0b0b-7bac-0050562a2682
       Unicast Mode Enabled: true
       Maintenance Mode State: OFF
       Config Generation: 81f5b3c2-fe55-4a00-9eb5-c7ada74a43d8 20 2021-03-30T13:26:12.0

The unicastagent list of each host must contain every node that is part of the vSAN cluster except the host you are logged in to.
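The comparison between the reported member count and the number of hosts you expect can be scripted. The following is a minimal sketch, not part of the original procedure: the function names `vsan_member_count` and `vsan_partition_check` are hypothetical, and the parsing assumes the `Sub-Cluster Member Count: N` line appears exactly as in the sample output.

```shell
# Print the "Sub-Cluster Member Count" value found in
# `esxcli vsan cluster get` output supplied on stdin.
vsan_member_count() {
    awk -F': *' '/Sub-Cluster Member Count/ {
        sub(/[ \t]+$/, "", $2)   # drop trailing whitespace, if any
        print $2
        exit
    }'
}

# Compare the reported member count with the number of hosts you expect.
# $1 is the expected host count (a value you supply, not a vSAN setting).
# Returns 0 when they match and 1 otherwise.
vsan_partition_check() {
    expected_hosts=$1
    actual=$(vsan_member_count)
    if [ "$actual" = "$expected_hosts" ]; then
        echo "OK: $actual of $expected_hosts hosts in sub-cluster"
    else
        echo "PARTITION: ${actual:-0} of $expected_hosts hosts in sub-cluster"
        return 1
    fi
}
```

On a host, this might be used as `esxcli vsan cluster get | vsan_partition_check 4`.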
In the same scenario, since it is a cluster of 4 nodes, the unicastagent list must have three entries. The output below has only two, which confirms that one host is missing:

    [root@esxi-04:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address     Port   Iface Name  Cert Thumbprint                                              SubClusterUuid
    ------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
    602572bd-2ef4-8f69-d8ce-0050562a1e93  0          true              192.168.10.13  12321              2E:FD:44:6F:8B:26:6F:AE:59:FF:08:93:B6:D6:E8:7B:AC:2A:ED:96
    60198995-b367-2922-8fbf-0050562a1cbd  0          true              192.168.10.14  12321              55:26:A9:AD:3E:DA:36:EA:6F:5F:BD:85:48:BF:A3:BC:BC:F2:E1:4E

A possible cause for this is IgnoreClusterMemberListUpdates being set to a value of 1 on one or more hosts in the cluster.
A value of 1 tells the host to ignore any updates coming from vCenter regarding the unicast agent list.
A value of 0, which is the default setting, tells the host to accept the changes coming from vCenter.

To check the current setting, run the following command:

    esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates
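When checking several hosts, it can help to turn the raw setting into a one-line statement of what the host will do. The sketch below is illustrative only: `advcfg_value` and `describe_ignore_flag` are hypothetical helper names, and the parsing assumes that `esxcfg-advcfg -g` prints a single line ending in the value (for example `Value of IgnoreClusterMemberListUpdates is 0`); verify the output format on your build before relying on it.

```shell
# Print the last whitespace-separated field of the last line on stdin.
# Assumption: `esxcfg-advcfg -g <option>` ends its output with the value.
advcfg_value() {
    awk '{ v = $NF } END { print v }'
}

# Translate the value of IgnoreClusterMemberListUpdates into plain words.
# $1 is the value extracted above; returns 1 for anything other than 0 or 1.
describe_ignore_flag() {
    case "$1" in
        0) echo "0: host accepts unicast agent list updates from vCenter (default)" ;;
        1) echo "1: host ignores unicast agent list updates from vCenter" ;;
        *) echo "unexpected value: $1"; return 1 ;;
    esac
}
```

A possible invocation on a host is `describe_ignore_flag "$(esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates | advcfg_value)"`.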
***Important*** Run this command on all hosts before making any changes to the unicast agent list:

    esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates

Once the unicast agent list has been fixed on all hosts, run the command below on all hosts to set IgnoreClusterMemberListUpdates back to its default setting of 0:

    esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
1) Open an SSH session to every node in the vSAN cluster and run esxcli vsan cluster unicastagent list to identify which hosts have an incomplete unicast agent list.

    [root@esxi-01:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address     Port   Iface Name  Cert Thumbprint                                              SubClusterUuid
    ------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
    602572bd-2ef4-8f69-d8ce-0050562a1e93  0          true              192.168.10.13  12321              2E:FD:44:6F:8B:26:6F:AE:59:FF:08:93:B6:D6:E8:7B:AC:2A:ED:96
    60198995-b367-2922-8fbf-0050562a1cbd  0          true              192.168.10.14  12321              55:26:A9:AD:3E:DA:36:EA:6F:5F:BD:85:48:BF:A3:BC:BC:F2:E1:4E
    602583eb-233c-b69a-8291-0050562a1e8c  0          true              192.168.10.12  12321              63:44:4B:7B:D6:6B:26:49:C5:C4:B7:7D:0D:86:67:BE:FA:90:EF:2B

    [root@esxi-02:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address     Port   Iface Name  Cert Thumbprint                                              SubClusterUuid
    ------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
    60257046-5d95-a750-7135-0050562a1e7e  0          true              192.168.10.11  12321              52:6D:E9:73:FD:3A:0E:F1:AE:E2:E8:82:CA:27:F3:8E:28:EB:C8:E5
    60198995-b367-2922-8fbf-0050562a1cbd  0          true              192.168.10.14  12321              55:26:A9:AD:3E:DA:36:EA:6F:5F:BD:85:48:BF:A3:BC:BC:F2:E1:4E
    602583eb-233c-b69a-8291-0050562a1e8c  0          true              192.168.10.12  12321              63:44:4B:7B:D6:6B:26:49:C5:C4:B7:7D:0D:86:67:BE:FA:90:EF:2B

    [root@esxi-03:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address     Port   Iface Name  Cert Thumbprint                                              SubClusterUuid
    ------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
    602572bd-2ef4-8f69-d8ce-0050562a1e93  0          true              192.168.10.13  12321              2E:FD:44:6F:8B:26:6F:AE:59:FF:08:93:B6:D6:E8:7B:AC:2A:ED:96
    60257046-5d95-a750-7135-0050562a1e7e  0          true              192.168.10.11  12321              52:6D:E9:73:FD:3A:0E:F1:AE:E2:E8:82:CA:27:F3:8E:28:EB:C8:E5
    602583eb-233c-b69a-8291-0050562a1e8c  0          true              192.168.10.12  12321              63:44:4B:7B:D6:6B:26:49:C5:C4:B7:7D:0D:86:67:BE:FA:90:EF:2B

    [root@esxi-04:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address     Port   Iface Name  Cert Thumbprint                                              SubClusterUuid
    ------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
    602572bd-2ef4-8f69-d8ce-0050562a1e93  0          true              192.168.10.13  12321              2E:FD:44:6F:8B:26:6F:AE:59:FF:08:93:B6:D6:E8:7B:AC:2A:ED:96
    60198995-b367-2922-8fbf-0050562a1cbd  0          true              192.168.10.14  12321              55:26:A9:AD:3E:DA:36:EA:6F:5F:BD:85:48:BF:A3:BC:BC:F2:E1:4E

2) Once you have identified which hosts have an incomplete or invalid unicastagent list, find the UUID and vSAN IP address of the missing or invalid hosts. In this case esxi-04 is missing 1 host (esxi-01).

Go to the missing host and get the UUID:

    [root@esxi-01:~] cmmds-tool whoami
    60257046-5d95-a750-7135-0050562a1e7e

Find the vSAN vmk IP address. Here vmk3 is used for vSAN:

    [root@esxi-01:~] esxcfg-vmknic -l
    Interface  Port Group/DVPort/Opaque Network  IP Family  IP Address                Netmask        Broadcast       MAC Address        MTU   TSO MSS  Enabled  Type               NetStack
    -------------------output truncated------------------------
    vmk2       vmotion                           IPv6       fe80::250:56ff:fe6e:1418  64                             00:50:56:6e:14:18  1500  65535    true     STATIC, PREFERRED  defaultTcpipStack
    vmk3       vsan                              IPv4       192.168.10.11             255.255.255.0  192.168.10.255  00:50:56:6e:6b:df  1500  65535    true     STATIC             defaultTcpipStack
    vmk3       vsan                              IPv6       fe80::250:56ff:fe6e:6bdf  64                             00:50:56:6e:6b:df  1500  65535    true     STATIC, PREFERRED  defaultTcpipStack

3) Add the entry to the unicast agent list:

    Syntax: esxcli vsan cluster unicastagent add -t node -u <Host_UUID> -U true -a <Host_VSAN_IP> -p 12321

    [root@esxi-04:~] esxcli vsan cluster unicastagent add -t node -u 60257046-5d95-a750-7135-0050562a1e7e -U true -a 192.168.10.11 -p 12321

4) Verify that the cluster is complete:

    [root@esxi-04:~] esxcli vsan cluster get
    Cluster Information
       Enabled: true
       Current Local Time: 2021-03-30T14:21:55Z
       Local Node UUID: 602583eb-233c-b69a-8291-0050562a1e8c
       Local Node Type: NORMAL
       Local Node State: AGENT
       Local Node Health State: HEALTHY
       Sub-Cluster Master UUID: 60257046-5d95-a750-7135-0050562a1e7e
       Sub-Cluster Backup UUID: 60198995-b367-2922-8fbf-0050562a1cbd
       Sub-Cluster UUID: 52cd69c8-e409-363f-bd75-654caad744b3
       Sub-Cluster Membership Entry Revision: 5
       Sub-Cluster Member Count: 4
       Sub-Cluster Member UUIDs: 60257046-5d95-a750-7135-0050562a1e7e, 60198995-b367-2922-8fbf-0050562a1cbd, 602572bd-2ef4-8f69-d8ce-0050562a1e93, 602583eb-233c-b69a-8291-0050562a1e8c
       Sub-Cluster Member HostNames: esxi-01.motogp.lab, esxi-03.motogp.lab, esxi-02.motogp.lab, esxi-04.motogp.lab
       Sub-Cluster Membership UUID: f5a46060-4df7-160b-4fdc-0050562a268d
       Unicast Mode Enabled: true
       Maintenance Mode State: OFF
       Config Generation: 81f5b3c2-fe55-4a00-9eb5-c7ada74a43d8 20 2021-03-30T14:21:15.294
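Comparing each host's list against the expected peers by eye becomes error-prone on larger clusters. As a rough sketch of how step 1 could be automated, the hypothetical function `missing_unicast_peers` below reads `esxcli vsan cluster unicastagent list` output on stdin and prints every expected peer UUID (passed as arguments) that is absent. It assumes the node UUID is the first column and consists of five lowercase hexadecimal groups separated by hyphens, as in the sample output.

```shell
# Print each expected peer UUID that does NOT appear in the first column
# of `esxcli vsan cluster unicastagent list` output read from stdin.
# Arguments: the UUIDs of all other hosts in the cluster (not the local host).
missing_unicast_peers() {
    # Keep only first-column values shaped like a UUID; this skips the
    # header line and the dashed separator line.
    listed=$(awk '$1 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ { print $1 }')
    for uuid in "$@"; do
        # -x: whole-line match, -F: fixed string, -q: quiet
        if ! printf '%s\n' "$listed" | grep -qxF -- "$uuid"; then
            echo "$uuid"
        fi
    done
}
```

For the scenario in this article, running `esxcli vsan cluster unicastagent list | missing_unicast_peers <uuid-esxi-01> <uuid-esxi-02> <uuid-esxi-03>` on esxi-04 would print only the UUID of esxi-01.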
Alternative methods for fixing the unicast agent list

    # example: host 9 lost entries
    # ignore updates on every esxi:
    esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates
    # get entries from another host/s
    esxcli vsan cluster unicastagent list
    # example output:
    ------------------------------------ --------- ---------------- ---------- ----- ----------
    58c7ebe0-e608-9fd4-0ccc-1402ec8b17b0 0         true             10.20.3.6  12321
    552555f9-cc64-7d88-2b3d-38eaa71723b0 0         true             10.20.3.7  12321
    552558c2-ba81-5960-7a38-8cdcd4ac0e48 0         true             10.20.3.8  12321
    55255365-dadf-992f-1f7d-8cdcd4acd514 0         true             10.20.3.9  12321
    # on host 9 set the missing ones:
    esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.6 -u 58c7ebe0-e608-9fd4-0ccc-1402ec8b17b0
    esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.7 -u 552555f9-cc64-7d88-2b3d-38eaa71723b0
    esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.8 -u 552558c2-ba81-5960-7a38-8cdcd4ac0e48

Run the script below on the host that is missing from the other hosts' unicast agent lists. It prints the command to be run on each host that lacks the entry.

    # UUID of this host
    NODE_UUID=$(esxcli vsan cluster get | grep -E "Local Node UUID" | awk '{print $4}')
    # vmknic tagged for vSAN traffic
    VSAN_VMK=$(esxcli vsan network list | grep VmkNic | awk '{print $3}')
    # IPv4 address of that vmknic
    NODE_IP=$(esxcli network ip interface ipv4 get -i $VSAN_VMK | grep vmk | awk '{print $2}')
    echo "esxcli vsan cluster unicastagent add -t node -u $NODE_UUID -U true -a $NODE_IP -p 12321 -i $VSAN_VMK"
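If the missing host cannot be reached to run the script, the same command can be assembled from values collected by hand (for example from `cmmds-tool whoami` and `esxcfg-vmknic -l` output saved earlier). The following is a small sketch under that assumption; `build_unicast_add_cmd` is a hypothetical helper name, and it only prints the command using the flags shown in this article rather than executing anything.

```shell
# Print the `unicastagent add` command for one missing peer.
# $1 = node UUID of the missing host
# $2 = vSAN vmknic IPv4 address of the missing host
# $3 = (optional) vmknic name to pass with -i
build_unicast_add_cmd() {
    uuid=${1:-}
    ip=${2:-}
    vmk=${3:-}
    if [ -z "$uuid" ] || [ -z "$ip" ]; then
        echo "usage: build_unicast_add_cmd <uuid> <vsan_ip> [vmk]" >&2
        return 2
    fi
    cmd="esxcli vsan cluster unicastagent add -t node -u $uuid -U true -a $ip -p 12321"
    if [ -n "$vmk" ]; then
        cmd="$cmd -i $vmk"
    fi
    echo "$cmd"
}
```

For example, `build_unicast_add_cmd 60257046-5d95-a750-7135-0050562a1e7e 192.168.10.11` prints the command used in step 3. Review the printed command before running it on the host with the incomplete list.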