Symptom
High CPU utilization on a Cisco Nexus 9000 running VXLAN, caused by the IPFIB process. In NX-OS, show processes cpu may report ipfib consistently at 10-20% CPU utilization.
# show processes cpu | inc ig ipfib
804 1240 492 0 0.00% ipfib
2439 51140 3250 0 12.50% ipfib >>>>
The Linux bash top command shows ipfib consistently at 99-100% CPU utilization:
bash-4.3# top
top - 11:37:20 up 5 min, 2 users, load average: 2.32, 1.95, 0.94
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2439 root -2 0 756204 170084 37868 R 99.7 0.7 1:39.53 ipfib
Terminal output is delayed when printing command output or collecting show techs. Show techs such as show tech vxlan-evpn and show tech forwarding l3 unicast may not complete.
Unable to shut down the NVE interface or make changes to the NVE configuration.
Conditions
Many discontiguous VLANs with ingress replication, enough that the VLAN list string sent to IPFIB exceeds 1999 characters.
VXLAN with Ingress replication enabled for VLANs/VNIs.
NX-OS release 9.3.x
Workaround
Make the VLAN list/range as contiguous as possible on the remote peers, then reload the switch experiencing the high CPU issue.
Further Problem Description
IPFIB receives the list of VLANs with ingress replication enabled for a given VXLAN/NVE peer. This list is passed as a string. With discontiguous VLANs, the number of characters in the string (including commas (,) and hyphens (-)) may exceed 1999, which triggers the problem.
The issue is resolved in 9.3(4) and later releases.
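
To estimate whether a given peer's VLAN set is at risk, the string-length condition can be approximated offline. The following Python sketch is illustrative only: the function names are hypothetical, and the formatting rule (consecutive VLANs collapsed to start-end, entries joined by commas with no spaces) is an assumption about how the list is encoded, not a confirmed detail of the IPFIB implementation. Only the 1999-character figure comes from the description above.

```python
LIMIT = 1999  # character count quoted in the problem description


def vlan_range_string(vlans):
    """Compress VLAN IDs into a range string such as '10-12,20'.

    Assumed format: runs of consecutive IDs become 'start-end',
    entries are comma-separated with no spaces.
    """
    ids = sorted(set(vlans))
    parts = []
    i = 0
    while i < len(ids):
        start = end = ids[i]
        # Extend the current run while the next ID is consecutive.
        while i + 1 < len(ids) and ids[i + 1] == end + 1:
            i += 1
            end = ids[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def exceeds_limit(vlans, limit=LIMIT):
    """Return True if the range string is longer than `limit` characters."""
    return len(vlan_range_string(vlans)) > limit


if __name__ == "__main__":
    discontiguous = range(1001, 4000, 2)  # 1500 VLANs, none adjacent
    contiguous = range(1001, 2501)        # 1500 VLANs, one single run
    for name, vlans in (("discontiguous", discontiguous),
                        ("contiguous", contiguous)):
        s = vlan_range_string(vlans)
        print(name, len(s), "chars, exceeds limit:", exceeds_limit(vlans))
```

Under these assumptions, 1500 alternating VLANs produce a string several thousand characters long, while the same number of VLANs in one contiguous range collapses to a single short entry, which is why the workaround of making the ranges contiguous shortens the string.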