...
ESXi hosts disconnect intermittently from vCenter and reconnect after few secondsWe see from vpxd.log that hosts are being marked as having no response 2020-11-13T17:13:50.024Z warning vpxd[7F4E72BDE700] [Originator@6876 sub=MoHost opID=khgf51ol-j42-h5:70003542-b5-03-177f343e] [HostMo] host connection state changed to [NO_RESPONSE] for host-118631 We see from vpxd.log that Synchronizing tasks are not getting completed causing host to disconnect. 2018-08-28T17:20:09.901Z verbose vpxd[7F79218B1700] [Originator@6876 sub=vpxLro opID=HB-host-486@8725-487c45] [VpxLRO] -- BEGIN lro-4750 -- -- VpxdInvtHostSyncHostLRO.Synchronize --2018-08-28T17:20:09.901Z verbose vpxd[7F79218B1700] [Originator@6876 sub=InvtHostCnx opID=HB-host-486@8725-487c45] Synchronizing host; [vim.HostSystem:host-486,10.20.x.x]2018-08-28T17:20:09.909Z trivia vpxd[7F7922A54700] [Originator@6876 sub=HttpConnectionPool-000025] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10.20..x.x:443>2018-08-28T17:20:09.909Z trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-486@8725-487c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10.20.x.x:443>2018-08-28T17:20:09.909Z verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (10.20.x.x, vpxapi.VpxaService.retrieveChanges)2018-08-28T17:20:09.917Z trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-486@8725-487c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10..x.x:443>2018-08-28T17:20:09.917Z verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (10.20.x.x, vmodl.query.PropertyCollector.createFilter)2018-08-28T17:20:09.917Z trivia vpxd[7F79218B1700] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45] Ctr: TrackInBackgroundTime = 0 ms2018-08-28T17:20:09.994Z trivia vpxd[7F7922C58700] [Originator@6876 sub=SoapAdapter.HTTPService] HTTP Response: Setting Content-Length: -1 hostd.log around the same time for same host. 2018-08-28T17:19:36.629Z verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.reason, session[529b2571-d666-67a3-93a7-61e1a9f67e57]52f82589-8aa1-5587-f04a-d6e1142ff8c1. Applied change to temp map.2018-08-28T17:19:36.629Z verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] [CommitChangesAndNotify] Updating cached values2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: info.startTime, session[529b2571-d666-67a3-93a7-61e1a9f67e57]52f82589-8aa1-5587-f04a-d6e1142ff8c1. Applied change to temp map.2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] [CommitChangesAndNotify] Updating cached values2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.state, session[529b2571-d666-67a3-93a7-61e1a9f67e57]52f82589-8aa1-5587-f04a-d6e1142ff8c1. Applied change to temp map.2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.cancelable, session[529b2571-d666-67a3-93a7-61e1a9f67e57]52f82589-8aa1-5587-f04a-d6e1142ff8c1. Applied change to temp map.2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] [CommitChangesAndNotify] Updating cached values2018-08-28T17:19:36.629Z info vpxa[4F967B70] [Originator@6876 sub=vpxLro opID=HB-host-486@8725-487c45-14] [VpxLRO] -- BEGIN session[529b2571-d666-67a3-93a7-61e1a9f67e57]52f82589-8aa1-5587-f04a-d6e1142ff8c1 -- vpxa -- vpxapi.VpxaService.retrieveChanges -- 529b2571-d666-67a3-93a7-61e1a9f67e572018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45-14] [1+] VpxaMoService::GetChanges2018-08-28T17:19:36.631Z verbose vpxa[4F967B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=HB-host-486@8725-487c45-14] [VpxaHalCnxHostagent::DoCheckForUpdates] CheckForUpdates() done; version sent: 315435, version received: (no update) VPXD crashing, for all the host whenever there is a disconnect.In the vpxd.log we see similar to below 2018-08-28T16:35:49.371Z error vpxd[7F7968478700] [Originator@6876 sub=Vmomi opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] Got vmacore exception: N7Vmacore15SystemExceptionE(Connection timed out)--> [context]zKq7AVECAAAAAIStcgAMdnB4ZAAAeF4rbGlidm1hY29yZS5zbwAAEBcbAMppGADIHysAcRUjAOv5JQAe4SMAR6wjAC6vIwA3tysBVHQAbGlicHRocmVhZC5zby4wAAIdjA5saWJjLnNvLjYA[/context]2018-08-28T16:35:49.374Z verbose vpxd[7F7968478700] [Originator@6876 sub=TaskInfo opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] [VpxdTaskInfoPublisher] host:[host-486] WaitForUpdates failed: vmodl.fault.HostCommunication--> [backtrace begin] product: VMware VirtualCenter, version: 6.5.0, build: build-7515524, tag: vpxd, cpu: x86_64, os: linux, buildType: release--> backtrace[00] libvmacore.so[0x002B5E78]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)--> backtrace[01] libvmacore.so[0x001B1710]: Vmacore::System::SystemFactoryImpl::CreateQuickBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)--> backtrace[02] libvmacore.so[0x001869CA]: Vmacore::Throwable::Throwable(std::string const&)--> backtrace[03] vpxd[0x0054D40E]--> backtrace[04] vpxd[0x00622866]--> backtrace[05] vpxd[0x00628FBC]--> backtrace[06] vpxd[0x00622F65]--> backtrace[07] vpxd[0x00626877]--> backtrace[08] libvmomi.so[0x0012623C]: Vmomi::StubImpl::_Invoke_Task(Vmomi::ManagedMethod*, std::vector<Vmacore::Ref<Vmomi::Any>, std::allocator<Vmacore::Ref<Vmomi::Any> > >&, Vmacore::Ref<Vmomi::Any>&)--> backtrace[09] libvmomi.so[0x00199CBC]: Vmomi::Core::PropertyCollectorStub::WaitForUpdates(Vmacore::Optional<std::string> const&, Vmacore::Ref<Vmomi::Core::PropertyCollector::UpdateSet>&)--> backtrace[10] vpxd[0x0093BADC]--> backtrace[11] vpxd[0x0093CD0C]--> backtrace[12] vpxd[0x00542747]--> backtrace[13] vpxd[0x005427E0]--> backtrace[14] libvmacore.so[0x0023AC47]--> backtrace[15] libvmacore.so[0x0023AF2E]--> backtrace[16] libvmacore.so[0x002BB737]--> backtrace[17] libpthread.so.0[0x00007454]--> backtrace[18] libc.so.6[0x000E8C1D]--> [backtrace end] rhttp packet capture analysis shows packet drops. The captured packets at the VC side will show similar to below 323 2019-01-15 14:45:05.001649 192.168.xxx.xx 192.168.xxx.xx TCP 337 54552 → 443 [PSH, ACK] Seq=1 Ack=1 Win=850 Len=281 341 2019-01-15 14:45:05.005090 192.168.xxx.xx 192.168.xxx.xx TCP 6152 54552 → 443 [PSH, ACK] Seq=282 Ack=1 Win=850 Len=6096 352 2019-01-15 14:45:05.005374 192.168.xxx.xx 192.168.xxx.xx TCP 62 443 → 54552 [ACK] Seq=1 Ack=6378 Win=116 Len=0 353 2019-01-15 14:45:05.005386 192.168.xxx.xx 192.168.xxx.xx TCP 62 [TCP Window Update] 443 → 54552 [ACK] Seq=1 Ack=6378 Win=128 Len=0 411 2019-01-15 14:45:05.051974 192.168.xxx.xx 192.168.xxx.xx TCP 19189 443 → 54552 [PSH, ACK] Seq=1 Ack=6378 Win=128 Len=19133 412 2019-01-15 14:45:05.051992 192.168.xxx.xx 192.168.xxx.xx TCP 56 54552 → 443 [ACK] Seq=6378 Ack=19134 Win=827 Len=0 24683 2019-01-15 14:56:14.371966 192.168.xxx.xx 192.168.xxx.xx TCP 336 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=280 24684 2019-01-15 14:56:14.372086 192.168.xxx.xx 192.168.xxx.xx TCP 798 54552 → 443 [PSH, ACK] Seq=6658 Ack=19134 Win=850 Len=742 24700 2019-01-15 14:56:14.574974 192.168.xxx.xx 192.168.xxx.xx TCP 1078 [TCP Retransmission] 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=1022 And on the ESXi we will see packet capture analysis as below 566 2019-01-15 15:05:07.444059 192.168.xxx.xx 192.168.xxx.xx TCP 54 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0567 2019-01-15 15:05:07.693861 192.168.xxx.xx 192.168.xxx.xx TCP 54 [TCP Retransmission] 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0568 2019-01-15 15:05:07.983860 192.168.xxx.xx 192.168.xxx.xx TCP 54614 2019-01-15 15:06:52.533881 192.168.xxx.xx 192.168.xxx.xx TCP 54 443 → 54552 [RST, ACK] Seq=2 Ack=1 Win=128 Len=0 Note:The preceding log excerpts are only example.Date,time and environmental variables may vary depending on your environment
To stop the ESXi host disconnect.
It issue caused by packet drop between the vCenter and hosts.The re-transmissions on both sides are clear indication that the connection timed out because of lost network connectivity - both sides were sending packets that the other side didn't receive.This fix enables sending TCP keep-alive packets when the connection is idle.This can also happen when the vCenter is under very high load and constrained on resources
No impact.
This issue is resolved in: vCenter Server 6.5 U3f build 15259038, available at VMware DownloadsESXi 6.5 Patch Release ESXi650-201912002, available at VMware DownloadsvCenter 6.7 U3m build 17713310 , available at VMware DownloadsESXi 6.7 P05 ESXi670-202103001,available at VMware Downloads
To work around this issue, please add the lines on the below files: In the config.xml on ESXi host located at /etc/vmware/rhttpproxy/ NOTE: Take a backup of this file before modifying. <config> <vmacore>//some lines of other options ...... <tcpKeepAlive> <serverSocket> <isEnabled>true</isEnabled> <idleTimeSec>180</idleTimeSec> <probeTimeSec>180</probeTimeSec> <probeCount>8</probeCount> </serverSocket> <clientSocket> <isEnabled>true</isEnabled> <idleTimeSec>180</idleTimeSec> <probeTimeSec>180</probeTimeSec> <probeCount>8</probeCount> </clientSocket> </tcpKeepAlive>//some lines of other options...... </vmacore></config>
See below for more informationCapture Network Packets by Using the PacketCapture Utility and KB: Collecting network packets using the lightweight PacketCaptureCapturing and Tracing Network Packets by Using the pktcap-uw Utility and KB : Using the pktcap-uw tool in ESXi 5.5 and laterSee also KB : ESXi host disconnects intermittently from vCenter Server