...
Customers may see an error on the vSAN cluster summary page after upgrading from vSAN 6.6 to 6.7. All VMs and hosts appear to work normally. An error may also appear in the vSAN health check tab.

The logs show the following patterns.

On the ESXi hosts, /var/log/vsanmgmt.log shows errors like these:

2019-04-27T13:50:53Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] entry = {'healthReason': 0, 'healthFlags': 0, 'timestamp': 127419771773}
2019-04-27T13:50:53Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-2] [VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] Failed to get disk encryption info
Traceback (most recent call last):
  File "/build/mts/release/bora-12775454/bora/build/vsan/release/vsanhealth/usr/lib/vmware/vsan/perfsvc/VsanHealthSystemImpl.py", line 1813, in _QueryPhysicalDiskHealthSummary
ValueError: Failed to open device /vmfs/devices/disks/naa.5002538a488c0a60
2019-04-30T21:33:07.993Z error hostd[5297249] [Originator@6876 sub=vmomi.soapStub[58]] Resetting stub adapter for server <cs p:0000001210900cb0, TCP:localhost.localdomain:9095> : service state request failed: N7Vmacore15SystemExceptionE(Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem, timeout, or service overload.)
2019-04-27T08:01:27Z VSANMGMTSVC: ERROR vsanperfsvc[906d9cca-68c2-11e9] [VsanEsxHclUtil::__init__] Failed to run tool storcli: Exception 'RunCommandError' occured running command '['/opt/lsi/storcli/storcli', '++group=host/vim/tmp', '/call', 'show', 'J']'

On the ESXi hosts, /var/log/hostd.log:

2019-05-02T08:29:05.352Z warning hostd[5297256] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x000000120dea5a70, h:120, <TCP '127.0.0.1 : 42447'>, <TCP '127.0.0.1 : 9095'>>, e: 111(Connection refused)

On the ESXi hosts, /var/log/syslog.log (this message is repeated many times consecutively):

2019-05-02T09:56:35Z Unknown: out of memory [7124098]

The hostd logs may point to a network issue, but the network is not necessarily the cause. Double-check that the NIC drivers are listed on the HCL: https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io

You might also see the following in /var/log/hostd.log on the ESXi host:

2019-05-02T08:29:05.588Z info hostd[5297215] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 2764 : vSAN virtual NIC has been added.
2019-05-02T08:28:56.188Z cpu40:7368912)WARNING: UserSocketInet: 2266: python: waiters list not empty!
2019-05-02T08:28:56.188Z cpu40:7368912)WARNING: UserSocketInet: 2266: python: waiters list not empty!
2019-05-02T08:29:00.349Z cpu70:7368912)WARNING: CMMDS: CMMDSArenaMemUnmapFromUser:194: Failed to unmap MPNs from world 7368917: Not found
2019-05-02T08:29:05.486Z cpu0:2099591)CMMDS: CMMDSVSIUpdateNetworkCbk:2836: RECONFIGURE of interface vmk2 with cmmds (Success).
2019-05-02T08:29:05.487Z cpu21:2099591)CMMDS: CMMDSUtil_PrintArenaEntry:41: [1035794]:Inserting (actDir:0):u:6217c35b-b6b7-53db-922e-6805ca7f6d1a o:5b9f2b83-6de1-4786-630f-6805ca7f6d1a r:23 t:NET_INTERFACE
2019-05-02T08:29:05.487Z cpu21:2099591)CMMDS: CMMDSUtil_PrintArenaEntry:41: [1035795]:Removing (actDir:0):u:6217c35b-b6b7-53db-922e-6805ca7f6d1a o:5b9f2b83-6de1-4786-630f-6805ca7f6d1a r:22 t:NET_INTERFACE
2019-05-02T08:29:09.198Z cpu42:2100048)WARNING: LSOM: LSOMVsiGetVirstoInstanceStats:800: Throttled: Attempt to get Virsto stats on unsupported disk52942248-4166-09b8-34ac-e5d4c1a8291b
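To check quickly whether a host shows the signatures quoted above, the log files can be searched for the distinctive strings. The following is a minimal sketch, not an official diagnostic: the function names `count_matches` and `check_storcli_symptoms` are made up for illustration, and it assumes the logs sit in /var/log (or in a directory passed as the first argument, for example an extracted log bundle).

```shell
#!/bin/sh
# Sketch: count occurrences of the log signatures quoted in this article.
# Function names are illustrative; only grep and POSIX sh features are used.

count_matches() {
    # $1 = fixed string to look for, $2 = log file
    if [ -r "$2" ]; then
        # grep -c prints the count; "|| true" hides its non-zero exit on 0 matches
        grep -c -F -- "$1" "$2" || true
    else
        echo 0
    fi
}

check_storcli_symptoms() {
    # Directory holding vsanmgmt.log, syslog.log and hostd.log (default: /var/log)
    log_dir="${1:-/var/log}"
    echo "storcli failures: $(count_matches 'Failed to run tool storcli' "$log_dir/vsanmgmt.log")"
    echo "out of memory: $(count_matches 'out of memory' "$log_dir/syslog.log")"
    echo "connection refused: $(count_matches 'e: 111(Connection refused)' "$log_dir/hostd.log")"
}
```

Running `check_storcli_symptoms` on a host, or `check_storcli_symptoms /path/to/extracted/logs` against a bundle, prints one count per signature. Non-zero counts for the storcli failure together with repeated "out of memory" messages match the pattern described here; the counts alone do not prove the cause.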
A specific version of storcli (vmware-storcli-007.0209.0000.0000) causes the "out of memory" condition, which in turn disrupts other services.
Remove the storcli VIB from the hosts:
# esxcli software vib remove -n vmware-storcli-007.0209.0000.0000
The command to remove the storcli VIB may fail. If it does, put the host in maintenance mode with the "Ensure accessibility" option and reboot the host. Run the command again once the host has fully booted.
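The removal step can be wrapped in a small guard that first checks whether a storcli VIB is installed and reports what to do if removal fails. This is a hedged sketch rather than a supported procedure: the function names and the `ESXCLI` override variable are illustrative, the VIB name passed in should be confirmed against the output of `esxcli software vib list` on the host, and the maintenance-mode command in the comment is assumed to be the CLI equivalent of the "Ensure accessibility" option.

```shell
#!/bin/sh
# Sketch: remove the storcli VIB only if it is present.
# ESXCLI can be overridden for a dry run; on an ESXi host it is simply esxcli.
ESXCLI="${ESXCLI:-esxcli}"

storcli_vib_installed() {
    # True if any installed VIB has "storcli" in its listing line
    $ESXCLI software vib list | grep -qi storcli
}

remove_storcli_vib() {
    # $1 = VIB name as used in the article's removal command
    vib_name="$1"
    if ! storcli_vib_installed; then
        echo "storcli VIB not installed; nothing to do"
        return 0
    fi
    if $ESXCLI software vib remove -n "$vib_name"; then
        echo "removed $vib_name"
    else
        # Assumed CLI form of maintenance mode with "Ensure accessibility":
        #   esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility
        echo "removal failed; enter maintenance mode (ensure accessibility), reboot, and retry"
        return 1
    fi
}
```

A typical call would be `remove_storcli_vib vmware-storcli-007.0209.0000.0000`, run on each affected host in turn.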
Restarting the vsanmgmt service can clear the error, but the error may return a few hours later:
# /etc/init.d/vsanmgmtd restart