Symptom
A Firepower 9300 HA pair experienced multiple failovers due to MIO heartbeat failure.
Conditions
FTD multi-instance deployments
Workaround
As a temporary mitigation, the heartbeat interval can be increased to 15 seconds.
It is recommended to configure the app-agent heartbeat so that interval * retry-count is >= 6000 milliseconds.
Example:
> app-agent heartbeat interval 1000 retry-count 6
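The arithmetic behind the recommendation can be checked with a short script. This is only a sketch: the function names are hypothetical and not part of any Cisco tool, and the second value pair is an illustrative assumption rather than a setting taken from this report. It multiplies the interval (in milliseconds) by the retry-count and compares the result against the 6000 ms figure stated above.

```python
# Minimum value of interval * retry-count, in milliseconds,
# as stated in the workaround text above.
MIN_WINDOW_MS = 6000


def heartbeat_window_ms(interval_ms: int, retry_count: int) -> int:
    """Return interval * retry-count in milliseconds (hypothetical helper)."""
    return interval_ms * retry_count


def meets_recommendation(interval_ms: int, retry_count: int) -> bool:
    """True when interval * retry-count is at least MIN_WINDOW_MS."""
    return heartbeat_window_ms(interval_ms, retry_count) >= MIN_WINDOW_MS


if __name__ == "__main__":
    # (1000, 6) is the example from this report; (1000, 3) is an assumed
    # counter-example chosen only to show a failing check.
    for interval_ms, retry_count in [(1000, 6), (1000, 3)]:
        window = heartbeat_window_ms(interval_ms, retry_count)
        status = "OK" if meets_recommendation(interval_ms, retry_count) else "too low"
        print(f"interval {interval_ms} retry-count {retry_count}: {window} ms ({status})")
```

With the example values from this report, 1000 * 6 = 6000 ms, which just meets the recommended minimum.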
Further Problem Description
The issue is caused by a communication problem between the MIO and the app-instance, which appears to go offline multiple times. Heartbeats are missed repeatedly, and once the threshold is reached, a failover is triggered.
In a two-instance setup on QW-4145, with 40 cores per instance, one of the instances flapped from Active to Failed due to MIO heartbeat failure and later recovered.