Symptom
The LACP timers (both sending and timeout) move in accordance with changes to the host clock.
For example,
- If the host OS clock moves backward, then we wait and do not send any packets until the clock again reaches the time it showed before the reset
- This can potentially cause a LACP timeout on the connected switch if the clock change is more than 60 seconds
- Similarly, if the host OS clock moves ahead, then we immediately send out a LACP PDU and move the LACP timeout timer also ahead
- This can potentially cause the N1k to send a LACP timeout to the connected switch if the clock change is more than 60 seconds
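The behaviour above can be illustrated with a small model. This is only a sketch of a send schedule keyed to the wall clock, not the VEM implementation; the function name and the 30 second interval are assumptions chosen for illustration.

```python
def seconds_until_next_pdu(last_sent_wall, now_wall, interval=30.0):
    """Delay before the next PDU when the schedule is keyed to the wall clock.

    last_sent_wall: wall-clock time (seconds) when the last PDU was sent
    now_wall:       current wall-clock reading (seconds)
    interval:       assumed send interval in seconds (illustrative value)
    """
    due = last_sent_wall + interval
    # If the due time is already in the past, the PDU goes out immediately.
    return max(0.0, due - now_wall)


# Last PDU sent at wall-clock t=1000, read 10 real seconds later.
normal = seconds_until_next_pdu(1000.0, 1010.0)            # clock untouched
moved_back = seconds_until_next_pdu(1000.0, 1010.0 - 90)   # clock set back 90 s
moved_ahead = seconds_until_next_pdu(1000.0, 1010.0 + 90)  # clock set ahead 90 s

print(normal, moved_back, moved_ahead)
```

In this model a backward change of 90 seconds stretches the gap before the next PDU by the same 90 seconds, while a forward change makes the PDU due at once. A schedule keyed to a monotonic source (for example Python's `time.monotonic()`) would not shift with the wall clock.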
Conditions
- Happens only when there is a clock change on the ESX host
- In most cases this only causes the port-channel to flap, and it then recovers
- In more extreme cases the port-channel remains down, and a restart of the VEM is required to recover
Workaround
Ensure the ESXi Host is synchronised to an accurate NTP host that is unlikely to experience synchronisation events with an offset greater than a few seconds.
When initially configuring NTP on a 1000v where you have LACP port channels, or intend to configure them:
If no LACP configured:
Configure NTP on ESXi Host and VSM first. Wait for offset to stabilise.
Configure LACP
If LACP is already configured:
Manually set the clock as close as possible to the NTP server's time
i.e. within a few seconds, and definitely within 60 seconds
Note: If the offset is larger than 60 seconds, adjust the clock in steps of less than 60 seconds each
Configure NTP
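As a worked example of the stepping note above, the following sketch splits a large offset into manual adjustments that each stay under 60 seconds. The helper name and the 50 second default step are hypothetical choices; the note does not specify a step size or how long to wait between steps.

```python
def clock_steps(offset_seconds, max_step=50):
    """Split a clock offset into signed steps, each smaller than 60 seconds.

    offset_seconds: total correction needed (positive = move clock ahead)
    max_step:       largest single adjustment; must be below 60 (assumed default 50)
    """
    if not 0 < max_step < 60:
        raise ValueError("max_step must be greater than 0 and less than 60")
    sign = 1 if offset_seconds >= 0 else -1
    remaining = abs(offset_seconds)
    steps = []
    while remaining > 0:
        step = min(max_step, remaining)
        steps.append(sign * step)
        remaining -= step
    return steps


print(clock_steps(130))   # host clock is 130 s behind the NTP server
print(clock_steps(-75))   # host clock is 75 s ahead of the NTP server
```

Each value in the returned list is one manual clock change; the values sum to the original offset.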
If LACP timeouts are experienced due to unexpected clock changes, the VEM may need to be restarted via SSH or console access to the ESXi Host.
The VEM can be restarted with:
vem restart
Further Problem Description