...
Some traffic in particular VLANs fails when it is destined to a vPC peer's SVI IP, is hashed to the local device, and must cross the peer-link to reach the peer. On the local device, the peer's SVI MAC for each affected VLAN points out an orphan port where it may have been previously learned. "show ip arp detail" shows the old orphan port as the physical interface for the ARP entry instead of the peer-link, where it should point:

N3K-1# show ip arp detail
Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean

IP ARP Table for context default
Total number of entries: 3
Address         Age       MAC Address     Interface    Physical Interface
10.51.16.252    00:01:05  6073.5c62.6781  Vlan151      Ethernet1/13
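The check described above can be scripted when many VLANs need to be reviewed. The following is a minimal sketch, not a supported tool: it assumes the five-column "show ip arp detail" layout shown in this note, and the function names, the peer-link name "port-channel1", and the caller-supplied list of peer SVI MACs are all hypothetical. Lines that do not match the five-column layout are skipped.

```python
import re
from typing import NamedTuple


class ArpEntry(NamedTuple):
    address: str
    age: str
    mac: str
    interface: str
    physical_interface: str


# Assumed layout: IP, age, dotted-hex MAC, SVI, physical interface.
ENTRY_RE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)\s+"
    r"([0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+(\S+)\s+(\S+)\s*$"
)


def parse_arp_detail(output):
    """Return the ARP entries found in 'show ip arp detail' text."""
    entries = []
    for line in output.splitlines():
        match = ENTRY_RE.match(line)
        if match:  # header, flag legend, and blank lines do not match
            entries.append(ArpEntry(*match.groups()))
    return entries


def find_stale_peer_entries(output, peer_svi_macs, peer_link):
    """Return entries for the peer's SVI MACs that do not point at the peer-link."""
    macs = {mac.lower() for mac in peer_svi_macs}
    return [
        entry
        for entry in parse_arp_detail(output)
        if entry.mac.lower() in macs
        and entry.physical_interface.lower() != peer_link.lower()
    ]


# Output copied from this note; only one of the three entries is shown there.
SAMPLE = """\
N3K-1# show ip arp detail
IP ARP Table for context default
Total number of entries: 3
Address         Age       MAC Address     Interface    Physical Interface
10.51.16.252    00:01:05  6073.5c62.6781  Vlan151      Ethernet1/13
"""

if __name__ == "__main__":
    # "port-channel1" is a placeholder; use the actual vPC peer-link name.
    for entry in find_stale_peer_entries(SAMPLE, {"6073.5c62.6781"}, "port-channel1"):
        print(f"{entry.address} ({entry.interface}) points at {entry.physical_interface}")
```

The peer SVI MACs are passed in explicitly because the ARP output alone does not identify which entries belong to the vPC peer.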
The issue was seen during ISSU when one of the vPC devices was coming back up; it could also be seen on a reload. The peer's SVI MAC is learned on a non-peer-link interface (an orphan port) before the peer-link itself comes up. This can happen during device bring-up, when broadcast ARPs sourced from the peer's SVI MAC are flooded downstream and make their way back up to this orphan port, which populates the ARP entry. After the peer-link comes back up, the adjacency entry stays pointed out the old orphan port indefinitely and is not correctly re-learned on the peer-link.
"clear mac address-table dynamic" for the peer SVI MAC is known to fix the issue by clearing the FWM entry for the MAC so it can no longer incorrectly re-learn on the orphan port and will cause it to be correctly learned on the peer-link.
What we see happening is that, for a handful of VLANs, the peer's SVI MAC is learned on an orphan port before the vPC peer-link comes up. This occurred during ISSU and could also occur on a regular reload. The peer-link then comes up. When the ARP entry fully expires and no longer appears in the adjacency tables or the ARP table, the affected N3K sends a broadcast ARP request that crosses the peer-link to the peer N3K, which sends the ARP response back across the peer-link. This "populates" the ARP entry with the correct MAC. However, it appears that fwm/l2fm then sends a MAC move notification to the AM process, which causes it to switch the adjacency to the old orphan port where the MAC was initially learned instead of leaving it pointed at the peer-link. The result is black-holed traffic, which can cause a large outage when services such as DHCP relay are in use.

# show platform fwm info mac XXXX.XXXX.XXXX [VLAN ID]

The above command shows the MAC history for the peer's SVI MAC and where the MAC was hardware-learned ("hw learn"). Taking the interface index from that output and searching for it with "show interface snmp-ifindex | grep 0xVALUE" shows the interface on which the MAC was learned while the peer-link was down, which is also the interface the stale ARP entry points to.