Symptom
Any Broadcast, Unknown Unicast, or Multicast (BUM) packet is looped between the two switches in the vPC Fabric Peering pair until the interface bandwidth is exhausted. This results in loss of connectivity to all devices on the downstream VTEP, as well as to other VTEPs connected to the same spine.
N9K-2# show int eth 1/50 | in rate
30 seconds input rate 85075875944 bits/sec, 93284949 packets/sec
30 seconds output rate 85076567184 bits/sec, 93285704 packets/sec
input rate 85.07 Gbps, 93.28 Mpps; output rate 85.08 Gbps, 93.29 Mpps
300 seconds input rate 44626529008 bits/sec, 48932561 packets/sec
300 seconds output rate 44626617952 bits/sec, 48932654 packets/sec
input rate 44.63 Gbps, 48.93 Mpps; output rate 44.63 Gbps, 48.93 Mpps
N9K-2#
!
Conditions
This issue occurs only on FX3 devices starting with release 10.2.x and on GX/GX2/HX devices in all releases (earlier than the releases marked as fixed).
EX, FX, and FX2 devices are not affected.
PIM BiDir must be configured as the multicast underlay, and vPC Fabric Peering must be enabled. The VNIs must use multicast for BUM replication.
ip pim rp-address 10.96.7.241 group-list 239.0.1.0/26 bidir
!
vpc domain 165
peer-switch
role priority 200
peer-keepalive destination 10.122.163.243
virtual peer-link destination 10.96.0.165 source 10.96.0.166 dscp 56
peer-gateway
auto-recovery
fast-convergence
ipv6 nd synchronize
ip arp synchronize
!
interface nve1
member vni 200044
mcast-group 239.0.1.44
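!
One possible way to check whether a switch matches these conditions is sketched below. These are standard NX-OS show commands, but they are not part of the original report, and the notes on what to look for in each output are assumptions.

```
! Hardware model and running release
show module
show version
! RP configuration; a BiDir RP is expected to be flagged as bidir
show ip pim rp
! vPC state, including the virtual peer-link used by Fabric Peering
show vpc
! Per-VNI replication mode; multicast VNIs list a multicast group
show nve vni
```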
Workaround
Remove PIM BiDir and use PIM ASM or ingress replication instead. For example, to convert the RP to PIM ASM:
no ip pim rp-address 10.96.7.241 group-list 239.0.1.0/26 bidir
ip pim rp-address 10.96.7.241 group-list 239.0.1.0/26
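!
For the ingress replication alternative, a minimal sketch is shown below. It reuses the VNI and group from the Conditions example and assumes a BGP EVPN control plane, which the original report does not state. A deployment with static peers would use a different ingress-replication mode.

```
interface nve1
  member vni 200044
    ! Remove the multicast group before changing the replication mode
    no mcast-group 239.0.1.44
    ! Assumption: BGP EVPN advertises the remote VTEPs
    ingress-replication protocol bgp
```

The same change would have to be repeated for every VNI that uses a group in the BiDir range, and applied on both vPC peers.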