...
After upgrading to VMware vCenter Server 5.x or 6.0:
- VMware High Availability (HA) no longer works.
- A red exclamation mark displays on the Cluster object.
- Enabling VMware HA fails.
- You see the error: Operation Timed out
This article discusses troubleshooting the Fault Domain Manager (FDM) component of HA in vCenter Server 5.x and 6.0. For information about troubleshooting the Automated Availability Manager (AAM) component of HA in vCenter Server 4.x, see Troubleshooting VMware High Availability (HA) in VMware vSphere 4.x (1001596).
Because vCenter Server 5.x and 6.0 use Fault Domain Manager (FDM) agents for High Availability (HA), rather than Automated Availability Manager (AAM) agents, the troubleshooting process has changed.
There are other architectural and feature differences that affect the troubleshooting process:
- There is one main log file (/var/log/fdm.log) and syslog integration.
- Datastore Heartbeat.
- Reduced cluster configuration time (approximately 1 minute, as opposed to 1 minute per host).
- FDM does not require that DNS be configured on the hosts, nor does FDM rely on other Layer 3 to 7 network services.
For more information, see the How vSphere HA works section in the vSphere Availability Guide. For more information about HA in vCenter Server 5.x, see Comparing VMware HA 4.x and vSphere HA 5.x (2004401).
Known Issues
- If HA configuration fails or gives the error Operation Timed out, see "Operation Timed out" error while configuring HA in vCenter Server (2011974).
- If SSL certificate checking is disabled in vCenter Server, configuration can fail with the error: Cannot complete the configuration of the vSphere HA agent on the host. For more information, see Configuring HA after upgrading to vCenter Server 5.0 fails with the error: Cannot complete the configuration of the vSphere HA agent on the host. Misconfiguration in the host setup (2006729). This issue is resolved in vCenter Server 5.0 Update 1.
- On an upgrade using custom SSL certificates, the configuration can fail with the error: vSphere HA cannot be configured on this host because its SSL thumbprint has not been verified. For more information, see After upgrading to vSphere 5, you see the HA error: vSphere HA Cannot be configured on this host because its SSL thumbprint has not been verified (2006210). This issue is resolved in vCenter Server 5.0 Update 1.
- If the webpage on an ESXi host has been disabled, configuration can fail with the error: Unknown installer error.
For more information, see Cannot configure HA after disabling the Host Welcome login page on an ESXi host (2009546).
- If you run VMware-fdm-uninstall.sh manually in the default location, it does not properly remove the HA package. Configuration can fail with the error: Unknown installer error. For more information, see Cannot install Fault Domain Manager agent for VMware HA after agent is uninstalled (2006034). This issue is resolved in vCenter Server 5.0 Update 1.
- If lockdown mode is enabled on an ESXi host, HA configuration can fail with the errors: Cannot install the vCenter agent service; vSphere HA agent cannot be correctly installed or configured; Permission to perform this operation was denied. For more information, see Cannot install the vSphere HA (FDM) agent on an ESXi host (2007739). This issue is resolved in vCenter Server 5.0 Update 1.
- Migrating a virtual machine from one HA cluster to another changes the virtual machine's protection state from Protected to Unprotected. For more information, see Migrating virtual machine to another HA cluster changes the virtual machine state from Protected to Unprotected (2012682).
- FDM goes into an uninitialized state when a security scan is run against an ESXi 5 host. This issue is resolved in vCenter Server 5.0 Update 2.
For related information, see vCenter Server 5.0 Update 1 Release Notes and vCenter Server 5.0 Update 2 Release Notes.
Common Misconfiguration Issues
- FDM configuration can fail if ESX hosts are connected to switches with automatic anti-DOS features.
- FDM does support Jumbo Frames, but the MTU setting has to be consistent from end to end on every device.
- Some firewall devices block ICMP pings that have an ID of zero. In such cases, FDM could report that some or all secondary hosts cannot ping each other, and/or that the isolation addresses cannot be reached. This issue has been resolved in:
  - vCenter Server 5.0 Update 2. For more information, see the vCenter Server 5.0 Update 2 Release Notes.
    To download the latest version of vCenter Server 5.0, see the VMware Download Center.
  - vCenter Server 5.1. To download the latest version of vCenter Server 5.1, see the VMware Download Center.
  The workaround is to set an alternate isolation address with das.isolationaddress and to set das.usedefaultisolationaddress to false. For more information on configuration, see Advanced Configuration options for VMware High Availability for pre-5.0 (1006421).
FDM troubleshooting steps
Troubleshooting issues with FDM:
1. Check the release notes for known issues. Ensure that you are using the latest version of vSphere. For information on known issues, see vSphere Release Notes.
2. Ensure that you have properly configured HA. For information, see the How vSphere HA works section of the vSphere Availability Guide.
3. Verify that network connectivity exists from the vCenter Server to the ESXi host. For more information, see Testing network connectivity with the ping command (1003486).
4. Verify that the ESXi host is properly connected to vCenter Server. For more information, see Changing an ESXi or ESX host's connection status in vCenter Server (1003480).
5. Verify that the datastore used for HA heartbeats is accessible by all hosts.
6. Verify that all the configuration files of the FDM agent were pushed successfully from the vCenter Server to your ESXi host:
   - Location: /etc/opt/vmware/fdm
   - File names: clusterconfig (cluster configuration), compatlist (host compatibility list for virtual machines), hostlist (host membership list), and fdm.cfg.
7. Increase the verbosity of the FDM logs to get more information about the cause of the issue. In /etc/opt/vmware/fdm/fdm.cfg, change this entry:
       <log> ... <level>verbose</level> ... </log>
   To:
       <log> ... <level>trivia</level> ... </log>
8. Search the log files for any error message:
   - /var/log/fdm.log or /var/run/log/fdm* (one log file for FDM operations)
   - /var/log/fdm-installer.log (FDM agent installation log)
9. Access FDM's Managed Object Browser (MOB), at https://hostname/mobfdm, for more information. The MOB can be used to dump debug information about FDM to the /var/log/vmware/fdm/fdmDump.log file. It can also provide key information about the status of FDM from the perspective of the local ESX server: a list of protected virtual machines, secondary hosts, events, and so on. For more information, see the Managed Object Browser section in the vSphere Web Services SDK Programming Guide.
10. If the issue persists, file a support request with VMware Support and quote this Knowledge Base article ID (2004429) in the problem description. For more information, see How to file a Support Request in Customer Connect (2006985).
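As an illustration only, the file check, log-level change, and log search from the troubleshooting steps above could be scripted as follows. This is a minimal Python 3 sketch, not a VMware tool. The function names are hypothetical, it assumes a Python 3 interpreter is available wherever the files are examined (for example, against a copy of the files taken from the host), and it assumes the <level> element sits inside <log> as shown in the fdm.cfg excerpt above. Only the paths and file names come from this article.

```python
import re
from pathlib import Path

# Location and file names as listed in this article.
FDM_CONFIG_DIR = Path("/etc/opt/vmware/fdm")
FDM_CONFIG_FILES = ("clusterconfig", "compatlist", "hostlist", "fdm.cfg")

# Matches the first <level> element that follows a <log> opening tag.
_LEVEL_RE = re.compile(r"(<log>.*?<level>)\s*\w+\s*(</level>)", re.DOTALL)


def missing_fdm_files(config_dir=FDM_CONFIG_DIR):
    """Return the expected FDM configuration files that are absent from config_dir."""
    config_dir = Path(config_dir)
    return [name for name in FDM_CONFIG_FILES if not (config_dir / name).is_file()]


def set_fdm_log_level(cfg_text, level="trivia"):
    """Return cfg_text with the <level> inside <log> set to level.

    The text is not written back to disk; the caller decides where to save it.
    """
    new_text, count = _LEVEL_RE.subn(
        lambda match: match.group(1) + level + match.group(2), cfg_text, count=1
    )
    if count == 0:
        raise ValueError("no <level> element found inside <log>")
    return new_text


def find_error_lines(log_path, needle="error"):
    """Return (line_number, text) pairs whose text contains needle, ignoring case."""
    hits = []
    with open(log_path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle.lower() in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Read-only demonstration: report missing files and error lines, if the logs exist.
    print("Missing FDM files:", missing_fdm_files() or "none")
    for log in (Path("/var/log/fdm.log"), Path("/var/log/fdm-installer.log")):
        if log.is_file():
            for number, text in find_error_lines(log):
                print(f"{log}:{number}: {text}")
```

The sketch deliberately leaves fdm.cfg untouched: set_fdm_log_level only returns the changed text, so the original file can be backed up before a modified copy is saved.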
For additional FDM/HA troubleshooting, see the vSphere Troubleshooting Guide.
Related articles:
- Troubleshooting VMware High Availability (HA) in VMware vSphere 4.x
- Changing an ESXi or ESX host's connection status in vCenter Server
- Testing network connectivity with the ping command
- Advanced Configuration options for VMware High Availability for pre-5.0
- HA fails to configure at 90% completion with the error: Internal AAM Error - agent could not start
- Reinstalling the vpxa or aam agent without losing the host record from the VMware vCenter Server database
- Comparing VMware HA 4.x and vSphere HA 5.x
- Cannot install Fault Domain Manager agent for VMware HA after agent is uninstalled
- After upgrading to vSphere 5, you see the HA error: vSphere HA Cannot be configured on this host because its SSL thumbprint has not been verified
- Configuring HA after upgrading to vCenter Server 5.0 fails with the error: Cannot complete the configuration of the vSphere HA agent on the host. Misconfiguration in the host setup
- How to file a Support Request in Customer Connect
- Cannot install the vSphere HA (FDM) agent on an ESXi host
- Cannot configure HA after disabling the Host Welcome login page on an ESXi host
- "Operation Timed out" error while configuring HA in vCenter Server
- Migrating virtual machine to another HA cluster changes the virtual machine state from Protected to Unprotected