...
MLDP LFA with label OLEs over a bundle interface: a next-hop programming mismatch was observed between MRIB PI and PD.

RP/0/RP0/CPU0:NOIDNDFDCCR001#sh mrib mpls forwarding labels 29271 detail
Wed Jun 2 13:38:33.532 IST

LSP information (mLDP) :
  LSM-ID: 0x002E9, Role: Mid
  Incoming Label : 29271
  Transported Protocol :
  Explicit Null : None
  IP lookup : disabled
  Platform information : Slotmask: Primary: 0x290000, Backup: 0x292400
    MGID Primary: 9159, Backup: 9160
    Label Node Role: LMRIB_ROUTER_ROLE_MID, Tunnel Type: MVPN Tunnel
  Outsegment Info #2 [M/Swap, Recursive]:
    OutLabel: 31496, NH: 49.44.2.249, ID: 0x2E9, Sel IF: HundredGigE0/17/0/3   >>> Different interface in PI
    UL IF: HundredGigE0/17/0/3, Node-ID: 0x9320
    Backup Tunnel: Un:0x0 Backup State: Ready, NH: 0.0.0.0, MP Label: 0
    Backup Sel IF: HundredGigE0/19/0/4, UL IF: HundredGigE0/19/0/4, Node-ID: 0x9520
    PlatInfo:
      Primary: [Main IF: Bundle-Ether713 (IFH: 0xca0), UL IF: TenGigE0/0/1/12 (IFH: 0x40003c0)]   >>>> Issue state: different interface in PD
      Backup: [Main IF: Bundle-Ether10.1775 (IFH: 0x13a0), UL IF: HundredGigE0/8/0/6 (IFH: 0x14000140)]
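The mismatch above comes down to a per-path comparison between two interfaces. One is the underlying interface that PI selected. The other is the one PD actually programmed.

Below is a minimal C sketch of that check. It assumes the interface names have already been extracted from the CLI output. `ole_path_view`, `ole_path_consistent`, and `count_ole_mismatches` are hypothetical helpers, not part of MRIB or any IOS XR API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-path view assembled by hand from the CLI output:
 * the underlying interface reported by MRIB PI ("UL IF") and the one
 * reported by PD under "PlatInfo". This is not an MRIB data structure. */
typedef struct {
    const char *path;      /* "Primary" or "Backup" */
    const char *pi_ul_if;  /* UL IF from the PI part of the output */
    const char *pd_ul_if;  /* UL IF from the PlatInfo (PD) part */
} ole_path_view;

/* True when PI and PD name the same underlying interface for this path. */
static bool ole_path_consistent(const ole_path_view *p)
{
    assert(p != NULL);
    if (p->pi_ul_if == NULL || p->pd_ul_if == NULL)
        return false;          /* missing data is treated as inconsistent */
    return strcmp(p->pi_ul_if, p->pd_ul_if) == 0;
}

/* Counts the paths whose PI and PD underlying interfaces differ. */
static size_t count_ole_mismatches(const ole_path_view *paths, size_t n)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++)
        if (!ole_path_consistent(&paths[i]))
            mismatches++;
    return mismatches;
}
```

This is a plain string comparison. Fed the label 29271 values, it flags both paths:

- Primary: PI reports HundredGigE0/17/0/3, PD reports TenGigE0/0/1/12.
- Backup: PI reports HundredGigE0/19/0/4, PD reports HundredGigE0/8/0/6.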
1. The MLDP LFA feature is enabled.
2. The MLDP label has multiple outgoing legs toward downstream MLDP peers.
3. These outgoing legs are over bundle interfaces.
4. The issue is triggered by repeated flaps of core bundle links.
Workaround: restart the MRIB process.
The MLDP core bundle OLE programming failure is closely related to the MRIB crash. The traces suggest the following sequence:

1. During the bundle interface down event, the MRIB PD bundle backwalk ran first.
2. PD failed to select a new bundle member because none was available.
3. PI then called label OLE delete, at which point PD must remove the bundle OLE instance from its label reference list.
4. PD either failed to find the instance in that list or failed to remove it. As a result, PD skipped freeing the memory of that element in the label OLE list.
5. This stale memory eventually crashes MRIB during a later bundle backwalk event.

Logs (multiple bundle member interfaces flapped, so PD failed to select member ports):
May 31 09:14:29.933 mrib/plat_lmrib_enc_rpf_err 0/RP0/CPU0 t3296 Bdl Walk, Seq #274, LMRIB LbL OLE WALK: failed select member of Bundle-Ether724(IFH:0xf60) on Prim path, label:29271, outinfo_handle:0x562efd595ef0, error: 0xb:Resource temporarily unavailable
May 31 10:55:04.978 mrib/plat_lmrib_enc_rpf_err 0/RP0/CPU0 t23154 Unknown ctx, Seq #24, LMRIB LbL OLE WALK: failed select member of Bundle-Ether712(IFH:0xc60) on Prim path, label:29271, outinfo_handle:0x55f39200ff00, error: 0xb:Resource temporarily unavailable
May 31 10:55:05.098 mrib/plat_lmrib_enc_rpf_err 0/RP0/CPU0 t23154 Bdl Walk, Seq #22, LMRIB LbL OLE WALK: failed select member of Bundle-Ether724(IFH:0xf60) on Prim path, label:29271, outinfo_handle:0x55f392010900, error: 0xb:Resource temporarily unavailable

node: node0_RP0_CPU0
------------------------------------------------------------------
Core location: 0/RP0/CPU0:/misc/disk1
Core for pid = 20227 (mrib)
Core for process: mrib_20227.by.11.20210505-080903.xr-vm_node0_RP0_CPU0.507b0.core.gz
Core dump time: 2021-05-05 08:09:03.741959171 +0530
Process:
Core was generated by `mrib'.
Build information:
### XR Information
User = ahoang
Host = iox-ucs-021
Workspace = /auto/srcarchive13/prod/7.1.2/asr9k-x64/ws
Lineup = r71x.lu%EFR-00000409450
XR version = 7.1.2
### Thirdparty Information
SDK x86_64 /auto/exr-yocto/SDK/WRL7/EXR/satori-r71x/REL0004/kvm-host-x86_64-sdk.tgz
Refpoint = thirdparty/opensource/release@tp-main/289
Hostname : calcium-99.cisco.com
Workspace : /nobackup/hetsoi/satori-r71x.release.20200630/target-exr-gdb
Source Base : ssh://wwwin-git-sjc-2/git/thinstack/satori.git
Devline : satori-r71x
Devline Ver : 11cad1a16680ffe684cc4d2e1b56d0b2f4fcaa0c
Devline Type : GIT Repository
### Calvados Information for architecture
Refpoint = calvados/release@r71x/3
Built By : ahoang
Built On : Sat Aug 29 12:49:35 PDT 2020
Build Host : iox-ucs-021
Workspace : /auto/srcarchive13/prod/7.1.2/asr9k-x64/ws
Source Base : ios_ena
Wkspc Type : non-monolith
Devline : r71x.lu%EFR-00000409450
r71x EFR-00000409450 Project xr-dev EFR-00000389352 Lineup

Signal information:
Program terminated with signal 11, Segmentation fault.
Faulting thread: 20227

Registers for Thread 20227
rax: 0x7ffeea241520 rbx: 0x400000002000128 rcx: 0x7ffeea241408
rdx: 0x7ffeea241520 rsi: 0x3 rdi: 0x400000002000128
rbp: 0x7ffeea2413e0 rsp: 0x7ffeea2412f0 r8: 0x370001
r9: 0x409a0600 r10: 0x76a528b8 r11: 0xffffffff
r12: 0x3 r13: 0x7ffeea241520 r14: 0x400000002000128
r15: 0x400000002000128 rip: 0x7f0bd8da91b5 eflags: 0x10202
cs: 0x33 ss: 0x2b ds: 0x0 es: 0x0 fs: 0x0 gs: 0x0

Backtrace for Thread 20227
#0 0x00007f0bd8da91b5 in ?? () from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/lib/libmrib4_platform.so
#1 0x00007f0bd8daaa2d in vkg_mrib_ref_remove+0x32 from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/lib/libmrib4_platform.so
#2 0x00007f0bd8dacedb in vkg_mrib_idb_ole_del+0x38e from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/lib/libmrib4_platform.so
#3 0x00007f0bd8d7a2c1 in vkg_lmrib_lbl_bndl_ole_del+0x4f from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/lib/libmrib4_platform.so
#4 0x00007f0bd8d7abe8 in lmrib_platform_oi_gone+0x915 from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/lib/libmrib4_platform.so
#5 0x000055ca244acf02 in ?? () from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/bin/mrib
#6 0x00007f0bda781376 in rn_delete+0x3db from /opt/cisco/XR/packages/asr9k-iosxr-os-64-2.0.0.0-r712/all/lib/libios.so
#7 0x000055ca244ad60f in ?? () from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/bin/mrib
#8 0x000055ca244b50ee in ?? () from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/bin/mrib
#9 0x000055ca244c0a06 in ?? () from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/bin/mrib
#10 0x000055ca244c1121 in ?? () from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/bin/mrib
#11 0x000055ca244c2f6d in ?? () from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/bin/mrib
#12 0x00007f0bdb03c4ef in ?? () from /opt/cisco/XR/packages/asr9k-iosxr-os-64-2.0.0.0-r712/all/lib/libinfra.so
#13 0x00007f0bdb04a244 in xr_event_dispatch+0x69 from /opt/cisco/XR/packages/asr9k-iosxr-os-64-2.0.0.0-r712/all/lib/libinfra.so
#14 0x000055ca2446eaa0 in main+0x15c0 from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/bin/mrib
#15 0x00007f0bd39a6170 in __libc_start_main+0xf0 from /lib64/libc-2.20.so
#16 0x000055ca2446f671 in ?? () from /opt/cisco/XR/packages/asr9k-mcast-x64-2.0.0.0-r712/rp/bin/mrib
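The root-cause analysis above implies two rules for PD when PI deletes a label OLE:

- The matching entry in the label's OLE reference list must be unlinked and freed in the same step.
- A failed lookup must be reported back rather than silently skipped.

Below is a minimal C sketch of those rules. It assumes a singly linked reference list keyed by bundle IFH. `lbl_ole_ref`, `lbl_ole_list`, and the helper functions are hypothetical names. They are not the real libmrib4_platform code (for example `vkg_mrib_ref_remove`), whose internals are not shown in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical PD-side record linking one bundle OLE to a label.
 * Illustrative only; not the libmrib4_platform structures. */
typedef struct lbl_ole_ref {
    uint32_t bundle_ifh;          /* bundle interface handle, e.g. 0xca0 */
    struct lbl_ole_ref *next;
} lbl_ole_ref;

typedef struct {
    uint32_t label;               /* incoming label, e.g. 29271 */
    lbl_ole_ref *head;
} lbl_ole_list;

/* Pushes a new reference onto the label's list; false on allocation failure. */
static bool lbl_ole_ref_add(lbl_ole_list *list, uint32_t bundle_ifh)
{
    lbl_ole_ref *ref = malloc(sizeof *ref);
    if (ref == NULL)
        return false;
    ref->bundle_ifh = bundle_ifh;
    ref->next = list->head;
    list->head = ref;
    return true;
}

/* Unlinks and frees the reference for bundle_ifh in one step, so the list
 * never keeps a pointer to freed memory. Returns false when no entry is
 * found, so the caller can log it instead of silently leaving state behind. */
static bool lbl_ole_ref_remove(lbl_ole_list *list, uint32_t bundle_ifh)
{
    assert(list != NULL);
    for (lbl_ole_ref **pp = &list->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->bundle_ifh == bundle_ifh) {
            lbl_ole_ref *victim = *pp;
            *pp = victim->next;   /* unlink first ... */
            free(victim);         /* ... then free */
            return true;
        }
    }
    return false;
}

/* Counts the references currently attached to the label. */
static size_t lbl_ole_ref_count(const lbl_ole_list *list)
{
    size_t n = 0;
    for (const lbl_ole_ref *r = list->head; r != NULL; r = r->next)
        n++;
    return n;
}

/* Frees every remaining reference and leaves the list empty. */
static void lbl_ole_list_free(lbl_ole_list *list)
{
    lbl_ole_ref *r = list->head;
    while (r != NULL) {
        lbl_ole_ref *next = r->next;
        free(r);
        r = next;
    }
    list->head = NULL;
}
```

The remove routine walks the list through a pointer to the `next` field. This handles the head node and interior nodes with the same code, avoiding a separate head-of-list case where unlink bugs commonly hide.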