Symptoms
Connection servers fail to build or delete VMs, and a large number of instant clones enter maintenance mode but do not rebuild.
Debug-level Connection server logs show long-running HA message alerts similar to the log entries below. For details on log locations, see Collecting VMware Horizon View log bundles (1017939).
2023-03-09T14:01:51.988-08:00 WARN (14A0-2160) <HARequestMsgThread> [HAResourceManager] HA message took a long time to process: 5000: REQUEST
2023-03-09T14:01:51.988-08:00 DEBUG (14A0-2160) <HARequestMsgThread> [HAResourceManager] Long running HA message: {POOLMESSAGETYPE=HACONTROL, _MS_MODE=ENHANCED, REQUESTID=33075A30-0C58-4CCB-8D99-414AF2F7E9A7, MAPPAYLOAD
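When a log bundle is large, it can help to pull out just the long-running HA warnings. The following is a minimal sketch, assuming the WARN lines follow the layout of the sample entry above. The function name `find_slow_ha_messages` is illustrative and not part of any Horizon tooling, and the numeric value is returned as logged because the sample does not state its unit.

```python
import re

# Matches WARN lines shaped like the sample entry above (assumed layout).
HA_SLOW = re.compile(
    r"^(?P<ts>\S+)\s+WARN\s.*\[HAResourceManager\] "
    r"HA message took a long time to process: (?P<value>\d+)"
)


def find_slow_ha_messages(lines):
    """Return (timestamp, value) pairs for each long-running HA warning.

    The timestamp is kept as the raw string from the log, and the value is
    the number that follows "took a long time to process:".
    """
    hits = []
    for line in lines:
        match = HA_SLOW.match(line)
        if match:
            hits.append((match.group("ts"), int(match.group("value"))))
    return hits
```

Passing the lines of a debug-level Connection server log to `find_slow_ha_messages` gives a quick view of how often, and at what times, the warning appears.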
Trace-level Connection server logs show agent VMs sending async (terminal response) messages multiple times a second rather than on the expected update schedule, similar to the log entries below. For details on increasing agent logs to trace level, see Changing the log file behavior in the VMware Horizon components (1025887).
2023-03-28T09:49:07.702-07:00 DEBUG (1BD0-0E38) <DesktopControlJMS> [DesktopTracker] Processing: Type:(TextMessage\ASYNCNOTIFICATION); .......cn=<pool dn>,ou=server groups,dc=vdi,dc=vmware,dc=int</SERVERPOOLDN><SERVERDNSNAME><agent DNS name></SERVERDNSNAME><DYNAMICIPADDRESS><agent IP></DYNAMICIPADDRESS><MACADDRESSIPV4><agent MAC></MACADDRESSIPV4>
2023-03-28T09:49:07.702-07:00 TRACE (1BD0-0E38) <DesktopControlJMS> [DesktopTracker] Processing: Type:(TextMessage\ASYNCNOTIFICATION); ......MAC></MACADDRESSIPV4>...
2023-03-28T09:49:07.703-07:00 DEBUG (1BD0-0E38) <DesktopControlJMS> [DesktopTracker] Processing: Type:(TextMessage\ASYNCNOTIFICATION); ...........
2023-03-28T09:49:07.703-07:00 TRACE (1BD0-0E38) <DesktopControlJMS> [DesktopTracker] Processing: Type:(TextMessage\ASYNCNOTIFICATION); Headers:dn>>...
Agent debug logs also show near-constant async updates sent to the Connection servers.
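To see which agents are sending these messages multiple times a second, the ASYNCNOTIFICATION entries can be counted per agent and per second. The sketch below rests on several assumptions: log lines follow the layout of the samples above, the agent name appears inside a `<SERVERDNSNAME>` element, and each message is logged once at DEBUG and once at TRACE, so only DEBUG lines are counted to avoid double counting. The function names and the default threshold of 2 are illustrative choices, not values taken from Horizon.

```python
import re
from collections import Counter

# DEBUG-level DesktopTracker lines for async notifications (assumed layout).
# The first group captures the timestamp truncated to whole seconds.
ASYNC_LINE = re.compile(
    r"^(?P<second>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S*\s+DEBUG\s.*"
    r"\[DesktopTracker\] Processing: Type:\(TextMessage\\ASYNCNOTIFICATION\)"
)
AGENT_NAME = re.compile(r"<SERVERDNSNAME>(.*?)</SERVERDNSNAME>")


def count_async_per_second(lines):
    """Count async notifications keyed by (agent DNS name, second)."""
    counts = Counter()
    for line in lines:
        match = ASYNC_LINE.match(line)
        if not match:
            continue
        agent = AGENT_NAME.search(line)
        # Lines without the element are grouped under "unknown".
        name = agent.group(1) if agent else "unknown"
        counts[(name, match.group("second"))] += 1
    return counts


def noisy_agents(counts, threshold=2):
    """Return agents that sent at least `threshold` messages in any one second."""
    return sorted({name for (name, _), n in counts.items() if n >= threshold})
```

Agents returned by `noisy_agents` are candidates for closer inspection. A healthy agent should appear far less often than once per second.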
Cause
Horizon Agents normally wait a set interval between async updates to the Connection servers. This interval is configured using the registry key HKLM\Software\VMware, Inc.\VMware VDM\Node Manager\AsyncSessionSeconds, with a default value of 150 seconds. In an edge case where the Horizon Agent fails to load the JMS service, this wait interval adjustment is skipped and the interval defaults to 0, causing near-constant async updates.
Resolution
Should you encounter this issue, please gather Trace level Connection server and agent logs, and then engage VMware support for assistance.
Workaround
An agent enters this bad state at random.
Steps to Monitor:
Monitor Horizon virtual machines at the vCenter level for CPU usage. You can use the built-in vCenter alarms for this task.
Virtual machine CPU usage alarm (2057830) documents the default alarms.
Remove any Horizon machines that are pegged at 100% CPU to give the Connection servers time to recover.
After all machines in a bad state have been removed, shut down all Connection servers and then power them back on.
Engage VMware:
Please gather Trace level Connection server and agent logging and engage VMware support for assistance with a long-term solution.