Symptom
Setup:
Platform: NCS5500 (all DNX-family platforms).
L3VPN setup with prefixes leaked from the VRF table to the global table.
The leaked prefix must have a primary path and a backup path going through the same bundle (BGP PIC configured).
Trigger:
Shut the primary and backup paths together, then bring them back up. In this particular case both paths were on the same bundle: shutting one bundle member brought the whole bundle down, because the bundle was configured with a minimum number of links that must be up, and unshutting that member brought the bundle back up.
Conditions
As mentioned above.
Workaround
Clear the affected BGP session so that everything is learnt again for that session.
Further Problem Description
When the bundle interface came back up, the two paths (primary and backup) came up one at a time rather than together, and we received the following sequence of events:
- A pathlist is created with a single path, even though the prefix has two paths (one primary, one backup).
- The backup path comes up.
- We get a pathlist modify. (In cases where there is no race condition, we have instead seen a new pathlist created with two paths rather than a modify of the existing pathlist.)
- When this pathlist modify was processed, a logic error meant that we did not walk the respective prefixes to point them at the modified pathlist.
- Hence the traffic was dropped.
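
The sequence above can be illustrated with a small toy model. This is only a sketch and not the NCS5500 FIB code: the names `Pathlist`, `ToyFib`, `hw_handle`, `walk_dependents` and `replay` are hypothetical, and the assumption that a modify re-programs the pathlist under a new hardware handle is made purely for illustration. It shows why a modify that is not followed by a walk of the dependent prefixes leaves those prefixes pointing at stale state.

```python
from dataclasses import dataclass, field


@dataclass
class Pathlist:
    """Hypothetical pathlist: a set of paths plus the prefixes that use it."""
    paths: list
    hw_handle: int                      # assumed id of the programmed object
    dependents: set = field(default_factory=set)


class ToyFib:
    """Toy forwarding table; every name here is illustrative only."""

    def __init__(self):
        self._next_handle = 1
        self.live_handles = set()       # handles that are currently valid
        self.prefix_handle = {}         # prefix -> handle it points at

    def _alloc(self):
        handle = self._next_handle
        self._next_handle += 1
        self.live_handles.add(handle)
        return handle

    def create_pathlist(self, paths):
        return Pathlist(paths=list(paths), hw_handle=self._alloc())

    def attach(self, prefix, pathlist):
        pathlist.dependents.add(prefix)
        self.prefix_handle[prefix] = pathlist.hw_handle

    def modify_pathlist(self, pathlist, paths, walk_dependents):
        # Assumption: a modify re-programs the pathlist under a new handle
        # and invalidates the old one.
        old = pathlist.hw_handle
        pathlist.paths = list(paths)
        pathlist.hw_handle = self._alloc()
        self.live_handles.discard(old)
        if walk_dependents:
            # The step that was missing: repoint every dependent prefix.
            for prefix in pathlist.dependents:
                self.prefix_handle[prefix] = pathlist.hw_handle

    def forwards(self, prefix):
        return self.prefix_handle.get(prefix) in self.live_handles


def replay(walk_dependents):
    """Replay the event order: single-path create, then a modify."""
    fib = ToyFib()
    prefix = "10.0.0.0/24"              # example prefix, not from the report
    pathlist = fib.create_pathlist(["primary"])
    fib.attach(prefix, pathlist)
    # Backup path comes up and arrives as a modify of the same pathlist.
    fib.modify_pathlist(pathlist, ["primary", "backup"], walk_dependents)
    return fib.forwards(prefix)
```

In this model `replay(walk_dependents=False)` reproduces the drop, because the prefix still points at the invalidated handle, while `replay(walk_dependents=True)` forwards after the modify.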