...
Messages are seen continuously on the router with the following signature:

Aug 3 13:55:12.519 bst: %SYS-4-CHUNKSIBLINGSEXCEED: Number of siblings in a chunk has gone above the threshold. Threshold:10000 Sibling-Count:12715 Chunk:0x7F66A1483540 Name:RSVP Sync mem
-Process= "LSP Tunnel Head Control", ipl= 2, pid= 573
-Traceback= 1#eef6dfd9f79fff4694160ffacf6a21cb :7F67BEB5A000+89F2BEA :7F67BEB5A000+AAC250E :7F67BEB5A000+AABEE83 :7F67BEB5A000+AABE01F :7F67BEB5A000+AABD560 :7F67BEB5A000+865644C :7F67BEB5A000+85EFF80 :7F67BEB5A000+85EF8A4 :7F67BEB5A000+78B6311 :7F67BEB5A000+77A6C01 :7F67BEB5A000+77703C0 :7F67BEB5A000+776F5B3 :7F67BEB5A000+776D44E :7F67BEB5A000+776B4E9 :7F67BEB5A000+77ABE08 :7F67BEB5A000+77AB3B0

Traceback summary
-----------------
% 0x7f67c754cbea : __be_chunk_malloc
% 0x7f67c961c50e : __be_rsvp_sync_ha_alloc_mem
% 0x7f67c9618e83 : __be_rsvp_sync_ha_convert_session_to_tlv
% 0x7f67c961801f : __be_rsvp_sync_ha_incr_enqueue_session
% 0x7f67c9617560 : __be_rsvp_sync_ha_enqueue_db
% 0x7f67c71b044c : __be_rsvp_proxy_path
% 0x7f67c7149f80 : __be_rsvp_edit_ip_path
% 0x7f67c71498a4 : __be_rsvp_add_ip_path_api
% 0x7f67c6410311 : __be_te_ext_sig_ipv4_path_addmodify
% 0x7f67c6300c01 : __be_tspts_initiate_rsvp_head_setup..0
The standby RP is in "standby cold" state and RSVP tries to run in NSR mode: it tries to sync its memory with its peer on the standby RP but fails to do so.

PE1-SHY-AP#show running-config | b redundancy
redundancy
 mode rpr

PE1-SHY-AP#sh ip rsvp nsr counters
Active
Sync state - Bulk Syncing
Bulk sync state - None
Queue size:
  Bulk: 1
  Incremental: 4745391 <<<<<<< Does this mean the queue never goes to 0 because checkpoint cannot sync ?
Bulk Sync was initiated: 0
Last bulk sync was successful (entries sent: 0)
  initiated: 0 Resumed: 0
IPC Flow Events
  Flow Control Was on: 0
  Flow Control Was off: 0
Peer counters
  Was Ready: 0
  Was Not Ready: 0
RF Events
  Standby Cold: 0
  Prog Standby Bulk: 0
Send timer
  Started: 0
  Expired: 0
  Timer off
Sync Process Messages
  Total Sent: 0
  Bulk Sync: 0
  Incr Sync: 0
  Incr Sync when queue empty: 1
Checkpoint Messages Sent (Items)
  Succeeded: 0 (0)
  Acks accepted:0 (0)
  Acks ignored: (0)
  Nacks: 0 (0)
  Failed: 1 (1)
Buffer alloc: 0
Buffer freed: 0
Incr Enqueues: 4745391
Buffer highwater: 0
Send message deferrals: 0
ISSU:
  Checkpoint Messages Transformed:
    On Send: Succeeded: 0 Failed: 0 Transformations: 0
    On Recv: Succeeded: 0 Failed: 0 Transformations: 0
  Negotiation:
    Started: 0 Finished: 0 Failed to Start: 0
    Messages:
      Sent: Send succeeded: 0 Send failed: 0 Buffer allocated: 0 Buffer freed: 0 Buffer alloc failed: 0
      Received: Succeeded: 0 Failed: 0 Buffer freed: 0
  Init: Succeeded: 0 Failed: 0
  Session Registration: Succeeded: 0 Failed: 0
  Session Unregistration: Succeeded: 0 Failed: 0
Errors:
  CF buf alloc: 1
Sync memory usage: (blk size: allocs/failures/frees)
  64 : 3715621/0/514890
  256 : 1029774/0/514886
  1024 : 1029772/0/0

I don't think it is normal that the frees count is so much lower than the allocs count, or that frees = 0 for the last pool.
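To put a number on the gap between allocs and frees, here is a minimal Python sketch that parses the three "Sync memory usage" lines quoted above and computes allocs minus frees per pool. It is only an illustration for reading the counters; the function name `outstanding_by_pool` is hypothetical and not part of any Cisco tooling.

```python
import re

# Counter lines copied from the "Sync memory usage" section quoted above.
SYNC_MEM_LINES = """
64 : 3715621/0/514890
256 : 1029774/0/514886
1024 : 1029772/0/0
"""

# Line format: <blk size> : <allocs>/<failures>/<frees>
LINE_RE = re.compile(r"^\s*(\d+)\s*:\s*(\d+)/(\d+)/(\d+)\s*$")


def outstanding_by_pool(text):
    """Return {block_size: allocs - frees} for each counter line found."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            size, allocs, _failures, frees = map(int, match.groups())
            result[size] = allocs - frees
    return result


if __name__ == "__main__":
    for size, held in outstanding_by_pool(SYNC_MEM_LINES).items():
        print(f"{size:>5} B pool: {held} elements allocated and not freed")
```

With the figures above this gives 3200731 outstanding elements in the 64 B pool, 514888 in the 256 B pool and 1029772 in the 1024 B pool.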
PE1-SHY-AP#sh ip rsvp nsr counters | in 64|256|1024|Time
Time source is NTP, 08:30:05.846 bst Tue Sep 8 2015
Timer off
64 : 3715881/0/514925
256 : 1029844/0/514921
1024 : 1029842/0/0

PE1-SHY-AP#sh ip rsvp nsr counters | in 64|256|1024|Time source
Time source is NTP, 08:30:09.896 bst Tue Sep 8 2015
64 : 3715884/0/514926
256 : 1029846/0/514922
1024 : 1029844/0/0

PE1-SHY-AP#sh ip rsvp nsr counters | in 64|256|1024|Time source
Time source is NTP, 08:30:15.643 bst Tue Sep 8 2015
64 : 3715889/0/514926
256 : 1029846/0/514922
1024 : 1029844/0/0

PE1-SHY-AP#sh ip rsvp nsr counters | in 64|256|1024|Time source
Time source is NTP, 08:30:33.706 bst Tue Sep 8 2015
64 : 3715903/0/514928
256 : 1029850/0/514924
1024 : 1029848/0/0

PE1-SHY-AP#sh ip rsvp nsr counters | in 64|256|1024|Time source
Time source is NTP, 08:30:39.487 bst Tue Sep 8 2015
64 : 3715903/0/514928
256 : 1029850/0/514924
1024 : 1029848/0/0

PE1-SHY-AP#sh ip rsvp nsr counters | in 64|256|1024|Time source
Time source is NTP, 08:30:48.894 bst Tue Sep 8 2015
64 : 3715911/0/514929
256 : 1029852/0/514925
1024 : 1029850/0/0

PE1-SHY-AP#show ip rsvp nsr counters | in 64|256|1024|Time so
Time source is NTP, 08:39:33.315 bst Tue Sep 8 2015
64 : 3716289/0/514982
256 : 1029958/0/514978
1024 : 1029956/0/0

As a result, RSVP Sync mem is creating an excessive number of chunk siblings.

PE1-SHY-AP#show chunk summ | in RSVP S
     Element  Sibling size   Total  Total    Total  Inuse    Ovrhd     Chunk
Flag size(b)  --range(b)--   Siblg  alloc    Free   HWM      (b)       name
D    280      32776- 65486   4557   515054   69     514985   5159800   RSVP Sync mem
D    88       10008- 16590   31383  3201438  100    3201338  32264240  RSVP Sync mem
D    1048     65544- 65742   16884  1029985  19     1029966  27218628  RSVP Sync mem

The second row is the 88 B pool (the pool used for the 64 B blocks in the previous command). We see 31383 siblings, in line with the error message:

Sep 8 08:43:10.811 bst: %SYS-4-CHUNKSIBLINGSEXCEED: Number of siblings in a chunk has gone above the threshold. Threshold:10000 Sibling-Count:31384 Chunk:0x7F66A1483540 Name:RSVP Sync mem
-Process= "LSP Tunnel Head Control", ipl= 2, pid= 573
-Traceback= 1#eef6dfd9f79fff4694160ffacf6a21cb :7F67BEB5A000+89F2BEA :7F67BEB5A000+AAC250E :7F67BEB5A000+AABEE83 :7F67BEB5A000+AABD05D :7F67BEB5A000+8641AB3 :7F67BEB5A000+8658E11 :7F67BEB5A000+85F0A23 :7F67BEB5A000+85F03FF :7F67BEB5A000+78B6898 :7F67BEB5A000+77A7D10 :7F67BEB5A000+776BB10 :7F67BEB5A000+7777AA7 :7F67BEB5A000+776D7C1 :7F67BEB5A000+776CAC1 :7F67BEB5A000+776B4E9 :7F67BEB5A000+77ABE08

The same is true for the 1048 B pool.
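The repeated samples above can also be turned into a rough growth rate. The sketch below is an illustration only: it takes the first (08:30:05.846) and last (08:39:33.315) samples of the 64-byte pool and computes the net number of elements gained per second. The tuple layout and the function name `net_growth` are assumptions made for this example.

```python
from datetime import datetime

# (time of day, allocs, frees) for the 64-byte pool, copied from the first
# and last samples quoted above. Both samples are from the same day.
FIRST = ("08:30:05.846", 3715881, 514925)
LAST = ("08:39:33.315", 3716289, 514982)


def net_growth(first, last):
    """Return (net elements gained, elapsed seconds, elements per second)."""
    t0 = datetime.strptime(first[0], "%H:%M:%S.%f")
    t1 = datetime.strptime(last[0], "%H:%M:%S.%f")
    elapsed = (t1 - t0).total_seconds()
    # Net growth = new allocations minus new frees over the interval.
    net = (last[1] - first[1]) - (last[2] - first[2])
    return net, elapsed, net / elapsed


if __name__ == "__main__":
    net, elapsed, rate = net_growth(FIRST, LAST)
    print(f"net growth: {net} elements in {elapsed:.3f} s ({rate:.2f}/s)")
```

For these two samples the 64-byte pool gains 351 elements net in about 567.5 seconds, roughly 0.6 elements per second, so the accumulation is slow but steady.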
There is actually a TE tunnel configured:

interface Tunnel9010
 description russtest
 ip unnumbered Loopback100
 tunnel mode mpls traffic-eng
 tunnel destination 212.31.220.185
 tunnel mpls traffic-eng path-option 10 explicit name russ-lsp-PE1-SHY verbatim
 tunnel mpls traffic-eng record-route
!

Because of the 'verbatim' configuration, the tunnel will be repeatedly signalled even if the path is not available...

The traceback comes from trying to sync to the standby RP. Does a standby RP exist? The following is configured:

redundancy
 mode sso
mpls traffic-eng nsr

------------------ show redundancy states ------------------
my state = 13 -ACTIVE
peer state = 1 -DISABLED
Mode = Simplex
Unit = Secondary
Unit ID = 49
Redundancy Mode (Operational) = Non-redundant
Redundancy Mode (Configured) = sso
Redundancy State = Non Redundant
Maintenance Mode = Disabled
Manual Swact = disabled (system is simplex (no peer unit))
Communications = Down Reason: Simplex mode
client count = 115
client_notification_TMR = 30000 milliseconds
RF debug mask = 0x0

Can you please try removing the MPLS NSR configuration:

no mpls tr nsr

Customer replied :
-----------------------
I have removed the mpls traffic-eng nsr command from the PE as advised and I can confirm that it has stopped the continuous errors from being reported.
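The condition identified here (MPLS TE NSR configured while the router is running simplex) can be checked offline against saved command output. The following Python sketch is a hypothetical helper, not a Cisco tool: it assumes that the literal config line `mpls traffic-eng nsr` and a `Mode = Simplex` line in the `show redundancy states` output, both as quoted above, are enough to recognise the situation.

```python
import re


def nsr_without_standby(running_config, redundancy_states):
    """Return True if TE NSR is configured while the system is simplex.

    Both arguments are plain-text captures of CLI output. The matching
    rules are assumptions based only on the output quoted in this thread.
    """
    nsr_configured = any(
        line.strip() == "mpls traffic-eng nsr"
        for line in running_config.splitlines()
    )
    # Matches the standalone "Mode = Simplex" line, not the
    # "Redundancy Mode (...)" lines.
    simplex = re.search(
        r"^\s*Mode\s*=\s*Simplex\s*$", redundancy_states, re.MULTILINE
    ) is not None
    return nsr_configured and simplex


if __name__ == "__main__":
    config = "redundancy\n mode sso\nmpls traffic-eng nsr\n"
    states = "my state = 13 -ACTIVE\npeer state = 1 -DISABLED\nMode = Simplex\n"
    print(nsr_without_standby(config, states))
```

A check like this could be run over collected configurations to find other routers exposed to the same accumulation before the log messages appear.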
NSR has been enabled as part of the client's configuration to achieve NSR functionality across all protocols (including MPLS TE NSR). The client has come back arguing that there can be conditions in a live network where the standby RP is faulty or has switched to RPR mode due to some other underlying condition. Do these messages and the condition above suggest there could be a memory leak due to continuous (failed) sync attempts to the standby RP? Some of the customer's analysis mentioned earlier suggests there might be.

Cisco BU reply :
The customer has hit a corner case. The issue will not happen if a standby RP exists initially and then goes down. We have just re-tested to confirm this behaviour. The issue occurs only when 'mpls tr nsr' is configured without a standby RP present. Yes, it's a bug, as it causes memory usage to accumulate (but it's not a memory leak; the memory is freed/reclaimed once the NSR config is removed).

Follow-up question from the client side :
You suggested that memory usage seems to be increasing. This morning the customer cleared the issue by removing and reconfiguring the 'mpls-te nsr' configuration. Can this be treated as a recommended workaround to release all the accumulated memory?

Cisco BU reply :
Removing and reconfiguring 'mpls tr nsr' will release the accumulated memory. But just removing 'mpls tr nsr' (and not adding it back) will prevent the memory from accumulating in the first place. Just remember to add it back when adding a standby RP.