...
You are running a VCHA manual failover to the passive node.
The VCHA failover completes successfully; however, the vCenter Web Client page shows the error "503 Service Unavailable".
The vmware-sts-idmd logs show the idmd service failing to start:

[2019-10-30T22:59:41.691Z ERROR] [IdmServer] IDM Server has failed to start
com.vmware.identity.interop.ldap.InvalidCredentialsLdapException: Invalid credentials

The vmdird logs show password errors for the vCenter machine account:

19-10-30T22:57:51.581345+00:00 err vmdird t@140674263410432: VmDirSendLdapResult: Request (Bind), Error (49), Message ((49)(SASL step failed.)), (0) socket (127.0.0.1)
19-10-30T22:57:51.602017+00:00 err vmdird t@140674263410432: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "cn=vc01.test.local,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
19-10-30T22:58:01.591518+00:00 err vmdird t@140674263410432: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)

Resetting the machine account password for the vCenter user using KB https://kb.vmware.com/s/article/2147280 fixes the issue temporarily; however, the issue recurs after the next failover to the passive node.
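To confirm that the failed binds belong to the vCenter machine account, the vmdird log can be filtered for error 49 bind failures. The following is a minimal sketch, not an official tool: the function name failed_bind_dns is hypothetical, and it assumes the log lines have the same layout as the "Bind Request Failed" line quoted in the symptoms.

```shell
#!/bin/sh
# Hypothetical helper: read vmdird log lines on stdin and print the Bind DN
# of every failed bind that reports LDAP error 49 (invalid credentials).
# Each distinct DN is printed once.
failed_bind_dns() {
    sed -n 's/.*Bind Request Failed .* error 49: .*Bind DN: "\([^"]*\)".*/\1/p' | sort -u
}
```

On the appliance, the function could be fed the vmdird log, for example failed_bind_dns < /var/log/vmware/vmdird/vmdird-syslog.log (the log path is an assumption and may differ between releases). If the machine account DN appears in the output, the symptom matches this article.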
The vmdird logs show that the copy interval for the data.mdb file is 0:

19-02-01T02:29:15.527397+00:00 info vmdird t@140331222071040: VmDirInitDbCopyThread: database snapshot reg keys: CopyDbWritesMin 1 CopyDbIntervalInSec 0 CopyDbBlockWriteInSec 30

vmdir maintains a copy of the machine account passwords in both the registry and the mdb file. The mdb file is updated with any changes (such as password changes) at the interval specified by CopyDbIntervalInSec. Because this value is currently set to 0, no mdb file is created, and a password mismatch between the mdb file and the registry occurs. The current sync interval for the machine account password is 45 days; if a VCHA failover is triggered within 45 days, this issue occurs.
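The log check described above can be sketched as a small shell function that extracts the CopyDbIntervalInSec value from the VmDirInitDbCopyThread line. This is an illustrative sketch only: the function name copy_db_interval is hypothetical, and it assumes the log line keeps the format quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read vmdird log lines on stdin and print the
# CopyDbIntervalInSec value from the most recent VmDirInitDbCopyThread line.
# Prints nothing if no such line is present.
copy_db_interval() {
    sed -n 's/.*VmDirInitDbCopyThread:.*CopyDbIntervalInSec \([0-9][0-9]*\).*/\1/p' | tail -n 1
}
```

For example, copy_db_interval < /var/log/vmware/vmdird/vmdird-syslog.log (the log path is an assumption and may differ between releases). A result of 0 indicates the condition described in this article.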
VMware Engineering is aware of this issue and is working on a permanent fix in a future release. Currently there is no permanent solution; however, a workaround is available.
To work around the issue, follow the steps below in order.

Log in to the Active vCenter and perform the following steps:
1. Set the CopyDbIntervalInSec registry value using this command:
/opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' "CopyDbIntervalInSec" "60"
2. Reset the password using the methods described in KB https://kb.vmware.com/s/article/2147280
3. Restart the vmdird service using KB https://kb.vmware.com/s/article/2109887
4. Wait a few minutes (about 5 minutes) for replication of the snapshot and registry values to complete.
5. Trigger a VCHA manual failover.
6. Once the Passive node becomes Active, set the CopyDbIntervalInSec registry value on the Passive node (now Active):
/opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' "CopyDbIntervalInSec" "60"
7. Restart the vmdird service using KB https://kb.vmware.com/s/article/2109887
8. Trigger a VCHA manual failover and confirm that the services are accessible.
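The registry and restart steps that are repeated on each node can be wrapped in small shell functions, sketched below. This is a hedged illustration, not a supported script. The names run, set_copy_interval, and restart_vmdird, and the DRY_RUN and LWREGSHELL variables, are hypothetical. Restarting through service-control with the service name vmdird is an assumption; KB 2109887 remains the authoritative procedure. The password reset and the failover are not covered and are still performed as described in the steps.

```shell
#!/bin/sh
# Path to lwregshell as given in this article; overridable for testing.
LWREGSHELL="${LWREGSHELL:-/opt/likewise/bin/lwregshell}"
VMDIR_KEY='[HKEY_THIS_MACHINE\Services\vmdir\Parameters]'

# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Set CopyDbIntervalInSec (default 60 seconds, the value used in this article).
set_copy_interval() {
    interval="${1:-60}"
    run "$LWREGSHELL" set_value "$VMDIR_KEY" "CopyDbIntervalInSec" "$interval"
}

# Restart vmdird. Assumption: service-control is available and the service
# is registered as "vmdird"; see KB 2109887 for the supported procedure.
restart_vmdird() {
    run service-control --stop vmdird &&
    run service-control --start vmdird
}
```

Setting DRY_RUN=1 before calling set_copy_interval 60 or restart_vmdird prints the commands without executing them, which allows a review before anything is changed on the node that is currently Active.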