Consider a firewall HA (High Availability) setup where the failover link is connected directly between the units (back-to-back). In some cases, after a unit reload, the HA failover peer link can start flapping constantly. In the CLI, the following messages appear every few seconds:

Communication with other unit become ok
Primary: Switching to Ok for reason Recovered from communication failure.
Communication with other unit become ok
Primary: Switching to Ok for reason Recovered from communication failure.
Communication with other unit become ok
...

The 'show failover history' output of the unit that did not reboot shows the same pair of events repeating:

06:05:38 UTC Jun 9 2025  Failed        Cold Standby  Recovered from communication failure
06:05:54 UTC Jun 9 2025  Cold Standby  Failed        HA state progression failed as response not heard from peer
06:06:08 UTC Jun 9 2025  Failed        Cold Standby  Recovered from communication failure
06:06:24 UTC Jun 9 2025  Cold Standby  Failed        HA state progression failed as response not heard from peer
...

On the other hand, the unit that rebooted shows the remote peer as Failed:

firewallB# show failover | include host
This host: Secondary - Active
Other host: Primary - Failed

A capture on the failover link of the unit that reloaded shows some bidirectional communication, but also ARP requests sent to the remote peer for which no replies arrive:

firewallB# capture FOVER int Failover-Link
firewallB# show capture FOVER

15 packets captured

   1: 06:11:23.765387       192.0.2.2 > 192.0.2.1 ip-proto-105, length 18
   2: 06:11:23.765631       192.0.2.1 > 192.0.2.2 ip-proto-105, length 18
...
   7: 06:11:24.767370       192.0.2.2 > 192.0.2.1 ip-proto-105, length 306
   8: 06:11:25.268357       192.0.2.2 > 192.0.2.1 ip-proto-105, length 306
   9: 06:11:25.769338       192.0.2.2 > 192.0.2.1 ip-proto-105, length 54
  10: 06:11:25.769369       arp who-has 192.0.2.1 tell 192.0.2.2
  11: 06:11:25.769415       192.0.2.2 > 192.0.2.1 ip-proto-105, length 306
  12: 06:11:26.270341       192.0.2.2 > 192.0.2.1 ip-proto-105, length 306
  13: 06:11:26.771307       192.0.2.2 > 192.0.2.1 ip-proto-105, length 54
  14: 06:11:26.771352       arp who-has 192.0.2.1 tell 192.0.2.2

The remote peer, meanwhile, receives those requests and sends ARP replies:

firewallA# show capture FOVER

1433 packets captured

   1: 06:11:15.206944       192.0.2.2 > 192.0.2.1 ip-proto-105, length 18
   2: 06:11:15.207035       192.0.2.1 > 192.0.2.2 ip-proto-105, length 18
   3: 06:11:15.748053       arp who-has 192.0.2.1 tell 192.0.2.2
   4: 06:11:15.748114       arp reply 192.0.2.1 is-at 00-00-5E-00-53-00   <----
   5: 06:11:15.810108       192.0.2.1 > 192.0.2.2 ip-proto-8, length 118
   6: 06:11:16.208912       192.0.2.2 > 192.0.2.1 ip-proto-105, length 18
   7: 06:11:16.208988       192.0.2.1 > 192.0.2.2 ip-proto-105, length 18
   8: 06:11:16.750006       arp who-has 192.0.2.1 tell 192.0.2.2
   9: 06:11:16.750052       arp reply 192.0.2.1 is-at 00-00-5E-00-53-00   <----

On the backend of the secondary unit (the one that reloaded), the NPU (Network Processing Unit) accelerator error counters keep increasing between consecutive checks:

firewallB(local-mgmt)# show npu-accel statistics | grep err | egrep -v "= 0"
    par_hi_pri_q_err_pkt_cnt = 64261
    par_lo_pri_q_err_pkt_cnt = 266750
firewallB(local-mgmt)# show npu-accel statistics | grep err | egrep -v "= 0"
    par_hi_pri_q_err_pkt_cnt = 64276
    par_lo_pri_q_err_pkt_cnt = 266795
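The timestamps in the failover history repeat on a tight cycle; the cadence can be confirmed by diffing them, as in this small Python sketch (the timestamp strings are copied from the 'show failover history' sample above):

```python
from datetime import datetime

# Timestamps taken from the 'show failover history' output above.
events = [
    "06:05:38 UTC Jun 9 2025",
    "06:05:54 UTC Jun 9 2025",
    "06:06:08 UTC Jun 9 2025",
    "06:06:24 UTC Jun 9 2025",
]

times = [datetime.strptime(t, "%H:%M:%S UTC %b %d %Y") for t in events]
# Seconds between consecutive state changes.
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(gaps)  # -> [16.0, 14.0, 16.0]: the peer flaps roughly every 15 seconds
```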
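The ARP asymmetry between the two captures can be summarized mechanically. A minimal Python sketch, assuming the capture text has been copied off each unit (only the ARP lines from the outputs above are kept here):

```python
# ARP lines copied from the capture on the unit that reloaded (firewallB):
# requests go out, but no reply is ever captured.
local_cap = """\
10: 06:11:25.769369 arp who-has 192.0.2.1 tell 192.0.2.2
14: 06:11:26.771352 arp who-has 192.0.2.1 tell 192.0.2.2
"""
# ARP lines copied from the remote peer (firewallA): it receives the
# requests and answers them.
peer_cap = """\
3: 06:11:15.748053 arp who-has 192.0.2.1 tell 192.0.2.2
4: 06:11:15.748114 arp reply 192.0.2.1 is-at 00-00-5E-00-53-00
"""

def arp_stats(cap: str) -> tuple[int, int]:
    """Count ARP requests and replies seen in a capture dump."""
    lines = cap.splitlines()
    requests = sum("arp who-has" in line for line in lines)
    replies = sum("arp reply" in line for line in lines)
    return requests, replies

print(arp_stats(local_cap))  # requests but zero replies on the reloaded unit
print(arp_stats(peer_cap))   # the peer sees the request and answers it
```

A reply that is sent by the peer but never appears in the local capture points at drops on the receive path of the reloaded unit, consistent with the NPU errors below.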
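Whether the NPU error counters are still incrementing can be checked by diffing two snapshots of the command output. A minimal Python sketch using the two samples above (the `parse_counters` helper is illustrative, not a real tool):

```python
def parse_counters(text: str) -> dict[str, int]:
    """Parse 'name = value' lines from a counter dump into a dict."""
    counters = {}
    for line in text.splitlines():
        if "=" in line:
            name, value = line.split("=")
            counters[name.strip()] = int(value)
    return counters

# Two consecutive snapshots from the output above.
before = parse_counters("""\
par_hi_pri_q_err_pkt_cnt = 64261
par_lo_pri_q_err_pkt_cnt = 266750
""")
after = parse_counters("""\
par_hi_pri_q_err_pkt_cnt = 64276
par_lo_pri_q_err_pkt_cnt = 266795
""")

deltas = {name: after[name] - before[name] for name in before}
print(deltas)  # non-zero deltas mean the NPU is still discarding packets
```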
A reload of a firewall that belongs to an HA pair.
Reloading the affected unit a second time can resolve the problem temporarily, but it can reappear the next time the unit reloads.
None