Symptom
During the upgrade process, if the Bidirectional Forwarding Detection (BFD) session goes down with the reason "Control Detection Time Expired," the applications that depend on BFD, such as BGP, also go down. Immediately afterward, the leaf switch brings BGP back up even though the BFD session is not up. On an L3Out, this can blackhole traffic: while the leaf switch reloads to complete the upgrade, the BGP session on the leaf switch peer may stay up until it times out, so the peer keeps sending traffic toward the reloading leaf switch.
Conditions
This issue occurs when BFD is enabled for BGP (or another routing protocol) and the leaf switch software is upgraded.
Workaround
- Increase the BFD interval timers or the detection multiplier.
or
- From the peer side, cost out the BGP (or other affected routing protocol) link to the leaf switch being upgraded.
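As a rough guide to how much headroom the first workaround buys: in BFD asynchronous mode (RFC 5880, section 6.8.4), the local detection time is the remote system's Detect Mult multiplied by the negotiated interval, which is the larger of the local Required Min RX Interval and the remote Desired Min TX Interval. The sketch below computes that value. The function name and the timer values are illustrative assumptions, not recommended settings for any particular platform.

```python
def detection_time_ms(remote_detect_mult: int,
                      local_required_min_rx_ms: int,
                      remote_desired_min_tx_ms: int) -> int:
    """Asynchronous-mode BFD detection time as computed by the local system.

    Hypothetical helper for illustration (RFC 5880, section 6.8.4): the
    negotiated interval is the larger of the local Required Min RX Interval
    and the remote Desired Min TX Interval, multiplied by the remote
    Detect Mult.
    """
    negotiated_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * negotiated_interval_ms


# Illustrative values only: a tight profile versus two relaxed variants.
tight = detection_time_ms(3, 50, 50)               # 150 ms
longer_interval = detection_time_ms(3, 300, 300)   # 900 ms
higher_multiplier = detection_time_ms(10, 50, 50)  # 500 ms

print(tight, longer_interval, higher_multiplier)
```

Raising either the interval or the multiplier lengthens the window during which BFD packets can be lost before the session is declared down, at the cost of slower failure detection.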
Further Problem Description
- High system load during the leaf switch upgrade may cause BFD packets to be dropped, so the BFD session may fail to come up.
- In this scenario, BGP is allowed to come up without BFD, as per RFC 5882. Because the leaf switch is about to reload to complete the upgrade, traffic sent by the leaf switch peer may be blackholed if the path through this leaf switch is the preferred path.
- This issue has been seen only during leaf switch upgrades with tight BFD timers. It has not been seen when the leaf switch is simply reloaded.
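To illustrate why tight timers are the trigger, the sketch below assumes a hypothetical trace of BFD packet arrival times in which a load-induced stall drops packets for about 400 ms. It reports when the detection timer would first expire ("Control Detection Time Expired"), treating the timer as re-armed on every received packet. The arrival times, stall length, and function name are all assumptions for illustration, not measured upgrade behavior.

```python
from typing import List, Optional, Sequence


def first_expiry_ms(arrivals_ms: Sequence[int], detection_time_ms: int) -> Optional[int]:
    """Return the time at which the detection timer first expires, or None.

    Hypothetical model: the timer is re-armed on every received control
    packet, and the session goes down if the next packet arrives more than
    detection_time_ms after the previous one. Only gaps between observed
    packets are checked; the time after the last packet is ignored.
    """
    for prev, curr in zip(arrivals_ms, arrivals_ms[1:]):
        if curr - prev > detection_time_ms:
            return prev + detection_time_ms
    return None


# Assumed trace: packets every 50 ms, then a stall from 500 ms to 900 ms.
arrivals: List[int] = list(range(0, 501, 50)) + list(range(900, 1201, 50))

print(first_expiry_ms(arrivals, 150))  # tight timers: expires at 650 ms
print(first_expiry_ms(arrivals, 500))  # relaxed timers: no expiry (None)
```

With a 150 ms detection time the 400 ms gap brings the session down, while a 500 ms detection time rides through the same stall.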