Symptom
When a ROUTE DOWN event is not handled properly, high CPU in the CENT-BR-0 IOMd process causes very high memory usage at CENT-BR-0, which eventually leads to a crash.
Per the crashinfo, CENT_BR_MSG_BUF and CENT_IPC_MSG_CH occupied most of the memory.
%DOMAIN-5-TC_PATH_CHG: Traffic class Path Changed. Details: Instance=0: VRF=default: Source Site ID=: Destination Site ID=: Reason=Unreachable: TCA-ID=: Policy Violated=None: TC=[Site id=, TC ID=, Site prefix=1, DSCP=default(0), App ID=0]: Original Exit=[CHAN-ID=11, BR-IP=, DSCP=default[0], Interface=Tunnelx Path=INET[label=0:2 | 0:0 [0x20000]]]: New Exit=[CHAN-ID=1, BR-IP=, DSCP=default[0], Interface=Tunnelx, Path=MPLS[label=0:1 | 0:0 [0x10000]]]
Aug 27 12:14:58.690 EDT: %SYS-2-MALLOCFAIL: Memory allocation of bytes failed from alignment 8
Pool: Processor Free: Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process=, ipl= 0, pid=
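When triaging a saved log, it can help to count how often the two messages quoted above appear. The following is a minimal sketch, not a Cisco tool: it assumes the log is available as plain text, and the function name `count_symptom_messages` and the counter keys are hypothetical. Only the mnemonics `%DOMAIN-5-TC_PATH_CHG` (with `Reason=Unreachable`) and `%SYS-2-MALLOCFAIL` come from the messages above.

```python
import re
from collections import Counter

# Mnemonics are taken from the messages quoted in this note.
# The keys and the function name are hypothetical, for illustration only.
PATTERNS = {
    "tc_unreachable": re.compile(r"%DOMAIN-5-TC_PATH_CHG:.*Reason=Unreachable"),
    "mallocfail": re.compile(r"%SYS-2-MALLOCFAIL:"),
}


def count_symptom_messages(log_text):
    """Count log lines that match each symptom pattern."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A burst of `tc_unreachable` matches followed by `mallocfail` matches would be consistent with the symptom described here, but the counts alone do not confirm this defect.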
Conditions
You may hit this issue if all of the following conditions are met:
1. The branch has no summary route covering all site IDs, only a default route to the hub;
2. No detailed route filtering is applied toward the branch (the branch has the detailed routes of all sites);
3. Spoke-to-spoke traffic is present;
4. Scale: it does not need to be high; fewer than 100 sites might trigger this issue;
5. A WAN down/up event occurs.
Workaround
Any one of the following operations can avoid this issue:
1. Configure a summary route (covering all site IDs) to the hub on the branch;
2. Disable branch-to-branch traffic optimization with the following CLI:
router(config-domain-vrf)#master branch
router(config-domain-vrf-mc)#no branch-to-branch
3. Filter detailed routes to the branch;
4. Downgrade to 16.6.3 or 16.3.6.
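Before applying workaround 1, it may be useful to confirm that the planned summary route really covers every site. The sketch below is an offline check using Python's standard `ipaddress` module. It assumes site IDs can be written as IP addresses or prefixes; the function name `uncovered_prefixes` and the addresses in the usage note are hypothetical and not taken from any real deployment.

```python
import ipaddress


def uncovered_prefixes(summary, site_prefixes):
    """Return the site prefixes that the summary route does NOT cover.

    summary       -- candidate summary route, e.g. "10.0.0.0/8" (hypothetical)
    site_prefixes -- iterable of site IDs or prefixes written as strings
    """
    summary_net = ipaddress.ip_network(summary)
    missing = []
    for prefix in site_prefixes:
        net = ipaddress.ip_network(prefix)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
        if net.version != summary_net.version or not net.subnet_of(summary_net):
            missing.append(prefix)
    return missing
```

For example, `uncovered_prefixes("10.0.0.0/8", ["10.1.0.0/16", "10.2.3.4/32", "192.168.1.0/24"])` returns `["192.168.1.0/24"]`, which indicates that the summary would need to be widened or supplemented to cover that site.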
Further Problem Description