...
After a firewall reboot, failover is disabled on the peer HA unit. In the failover history, messages like the following can be seen:

15:48:34 CET Dec 23 2023 App Sync Disabled CD App Sync error is Failure in Standby/Slave. Check app-sync-history CLI for details

On the unit that is being added to the HA pair and gets stuck, messages like the following can be seen:

> show app-sync-history
================================APP SYNC HISTORY================================
--------------------------------------------------------------------------------
App Sync Time: 13:47:10 UTC Jul 11 2023
Role: Standby Unit
App Sync Status: FAILURE
Failed Phase: StandbyAppConfigSignal
Failure Reason: DeploymentException:Process Manager failed to secure LSP APPLY_APP_CONFIG_APPLICATION_FAILURE SignalAppConfigFailed: Please refer policy_deployment.log file for more details;

In /ngfw/var/log/ngfwManager.log on the FTD, this message is seen:

Dec 23 14:48:31 ccm[7260] CDExec-Th-1: ERROR com.cisco.ngfw.cd.phases.AppConfigSignal- SIGNAL App Config Failure: Please refer policy_deployment.log file for more details;

In /ngfw/var/log/sf/policy_deployment.log, these messages are seen:

Dec 23 15:47:18 FW-DMZ2001 policy_apply.pl[14637]: INFO START securing LSP on install. lsp-rel-20231220-1501 (Snort::SnortUtil 282 <- LSP::Device 214 <- Plugin 235)
Dec 23 15:48:19 FW-DMZ2001 policy_apply.pl[14637]: Error returned 1
Dec 23 15:48:19 FW-DMZ2001 policy_apply.pl[14637]:
Dec 23 15:48:19 FW-DMZ2001 policy_apply.pl[14637]: Not all lsp files are in the icdb. Can't continue signature verification.
Dec 23 15:48:19 FW-DMZ2001 policy_apply.pl[14637]: 1
Dec 23 15:48:19 FW-DMZ2001 policy_apply.pl[14637]: ERROR Process Manager failed to verify LSP ICDB (Snort::SnortUtil 290 <- LSP::Device 214 <- Plugin 235)
Dec 23 15:48:19 FW-DMZ2001 policy_apply.pl[14637]: ERROR ERROR: Process Manager failed to secure LSP (/ngfw/var/cisco/deploy/sandbox/exporter-pkg/code/SF/UMPD/Plugins/Snort/SnortUtil.pm line 291) (Framework 1590<1348 <- Transaction 1772 <- main 214)
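To confirm that a saved copy of policy_deployment.log shows this signature, the time between the "START securing LSP" entry and the following "failed to secure LSP" entry can be measured. In the excerpt above that gap is 61 seconds. The following is a minimal sketch only, not a Cisco tool: the function name lsp_secure_durations is hypothetical, it assumes syslog-style timestamps at the start of each line, and it assumes both entries fall on the same calendar day.

```python
import re
from datetime import datetime

# Marker strings taken from the log excerpt in this document.
START_MARK = "START securing LSP"
FAIL_MARK = "Process Manager failed to secure LSP"

# Syslog-style timestamp at the start of a line, e.g. "Dec 23 15:47:18".
STAMP = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def lsp_secure_durations(lines):
    """Return the seconds elapsed between each 'START securing LSP'
    entry and the next 'failed to secure LSP' entry.

    Assumption: start and failure happen on the same day, because the
    log timestamps carry no year and rollover is not handled here.
    """
    durations = []
    started = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        # A fixed placeholder year keeps strptime unambiguous.
        when = datetime.strptime("2000 " + match.group(1), "%Y %b %d %H:%M:%S")
        if START_MARK in line:
            started = when
        elif FAIL_MARK in line and started is not None:
            durations.append((when - started).total_seconds())
            started = None
    return durations


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python lsp_check.py policy_deployment.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for seconds in lsp_secure_durations(handle):
            print(f"LSP securing failed after {seconds:.0f} s")
```

A gap that is consistently about the same length across failed attempts would be consistent with a timeout rather than an immediate verification error, but this script does not prove the cause by itself.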
The CPU cores allocated to system processes can become busy at times. This causes LSP verification to take considerably longer than it typically does, resulting in timeouts and failures.
Run 'top -d 1' from the expert mode shell to check whether any processes (other than Lina and Snort) are consuming excessive CPU cycles on a continuous basis. If not, re-deploying the HA/policy may allow the operation to complete in time. Please contact Cisco TAC if the issue persists.

When the devices are in the Active-Disabled state and a deployment is triggered, the deployment is performed only on the active unit and is marked as successful. The disabled node does not rejoin automatically; 'configure high-availability resume' must be run manually on the disabled node.
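The CPU check can also be done against captured top output rather than by eye. The sketch below is an illustration under stated assumptions, not a supported procedure: the function name busy_system_processes is hypothetical, the 50% threshold is an arbitrary example value, the Lina and Snort command names are assumed to start with "lina" and "snort", and the input is assumed to be standard batch-mode top output (for example from 'top -b -n 1') with a header row containing PID, %CPU and COMMAND.

```python
def busy_system_processes(top_output, threshold=50.0,
                          ignore_prefixes=("lina", "snort")):
    """Return (command, cpu_percent) pairs for processes at or above
    the threshold, skipping commands that start with an ignored prefix.

    Assumptions: standard top columns, and the example threshold and
    prefixes above. Adjust both to the platform being checked.
    """
    lines = top_output.splitlines()
    # Locate the column header row ("PID USER ... %CPU ... COMMAND").
    header_idx = next(
        (i for i, text in enumerate(lines) if text.split()[:1] == ["PID"]),
        None,
    )
    if header_idx is None:
        return []
    columns = lines[header_idx].split()
    cpu_col = columns.index("%CPU")
    cmd_col = columns.index("COMMAND")

    busy = []
    for row in lines[header_idx + 1:]:
        # Limit the split so a command containing spaces stays in one field.
        fields = row.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue
        try:
            cpu = float(fields[cpu_col])
        except ValueError:
            continue
        command = fields[cmd_col].strip()
        if command.lower().startswith(ignore_prefixes):
            continue
        if cpu >= threshold:
            busy.append((command, cpu))
    return busy


if __name__ == "__main__":
    import sys

    # Usage: top -b -n 1 | python busy_check.py   (hypothetical script name)
    for command, cpu in busy_system_processes(sys.stdin.read()):
        print(f"{command}: {cpu:.1f}% CPU")
```

A single snapshot does not show whether the load is continuous, so the check would need to be repeated over several samples before drawing a conclusion.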
A more comprehensive fix is available - see CSCwi72294