Symptom
The customer is observing SDWAN BFD sessions flapping in a scaled hub-and-spoke environment.
Conditions
When running SDWAN software 20.3.x with OMP timers of hello 1 and multiplier 3, the customer observed complete stability.
After the customer upgraded their hub devices to 20.6.x SDWAN software, they started to observe flaps at the IPsec rekey interval.
Customer environment has 4 Hub routers
Each hub has roughly 3000-4000 tunnels
After upgrading from 20.3.x -> 20.6.x, whenever there is an IPsec rekey we see a large number of BFD tunnels go down
OMP timers are currently configured for 3 seconds (default 7)
The issue is believed to be related to the speed of FTM processing vs. the speed of processing the OMP update that carries the new key.
The new key is likely programmed on the remote edge much faster than on the hub, because there are only a handful of tunnels on the remote end (vs 3000-4000 on the hubs).
Workaround
Increase the multiplier
-- default (6 multiplier) : one hello packet per second; a total of 7 seconds for BFD to declare the session down.
Increase the multiplier to tolerate the longer rekey time at scale.
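The sizing logic behind this workaround can be sketched as follows. This is a rough illustration only: it assumes the detection window is approximately the hello interval times the multiplier (the note above quotes 7 seconds for the default, so the exact accounting on the device may differ by about one interval). The function names and the hub rekey lag figure are hypothetical and are not taken from the defect or from measurements.

```python
def detection_time_ms(hello_interval_ms: int, multiplier: int) -> int:
    """Approximate time without BFD replies before the session is declared down.

    Assumption: window ~= hello interval * multiplier.
    """
    return hello_interval_ms * multiplier


def min_multiplier(hello_interval_ms: int, rekey_lag_ms: int) -> int:
    """Smallest multiplier whose approximate detection window exceeds the rekey lag."""
    return rekey_lag_ms // hello_interval_ms + 1


# Hypothetical figures, for illustration only.
hello_ms = 1000           # one hello packet per second
hub_rekey_lag_ms = 9500   # assumed time for the hub to program the new key on all tunnels

print(detection_time_ms(hello_ms, 3))              # window with the customer's multiplier of 3
print(min_multiplier(hello_ms, hub_rekey_lag_ms))  # multiplier needed to ride out the assumed lag
```

In practice the hub lag would have to be measured during a rekey at the real tunnel scale before choosing a value.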
Further Problem Description
Another defect, CSCwh07173, was opened and marked as a duplicate of CSCwd44614.
However, we are still seeing the behavior in 17.9.x SDWAN as well.