...
A Cat9k switch deployed in a fabric (such as EVPN-VXLAN) may corrupt packets over a certain size when those frames are processed in software. This corruption can produce a number of different user-reported symptoms, depending on which traffic is corrupted. Downstream devices will see either incorrect checksums or part of the data zeroed out (all 0s). To date, the primary protocols that have been impacted are DHCP and mDNS, but others may be impacted as well.
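As a rough aid for confirming the "incorrect checksums" symptom on a downstream capture, the following is a minimal Python sketch of a UDP checksum check over IPv4. It assumes you have already extracted the source and destination IP addresses and the raw UDP segment (header plus payload) from a captured packet. The function names `internet_checksum` and `udp_checksum_ok` are hypothetical helpers for illustration, not part of any Cisco tooling.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement 16-bit checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """Check the UDP checksum of an IPv4 datagram (hypothetical helper).

    udp_segment is the UDP header plus payload. A checksum field of 0
    means the sender did not compute one, so it cannot be verified and
    is reported as OK here.
    """
    if len(udp_segment) < 8:
        return False
    if udp_segment[6:8] == b"\x00\x00":
        return True
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, len(udp_segment))  # zero, protocol 17, length
    )
    # Summing over a valid segment, checksum field included, yields 0.
    return internet_checksum(pseudo_header + udp_segment) == 0
```

A segment whose tail has been overwritten with zeros would normally fail this check, unless the relaying device recomputed the checksum after the corruption occurred.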
There are multiple conditions required to see this issue:

1. This was first seen on VXLAN-enabled Cat9k switches that are running either 16.12.4 or 17.3.1 or earlier, but other software may be impacted as well. This defect is not known to impact non-fabric-enabled devices.

2. Impacted traffic must come in a local interface and be destined out a local interface without touching the fabric overlay. For example:
   - Traffic ingressing a dot1q trunk on vlan 100 and routed locally to a trunk on vlan 200 will be impacted.
   - Traffic ingressing a dot1q trunk on vlan 100 and routed over VXLAN (either staying in vlan 100 or routed to a new vlan) will NOT see this problem.
   - Traffic ingressing from VXLAN and destined out a local trunk or other interface (either L2 extended from VXLAN, or routed from VXLAN) will NOT see this problem.

3. Traffic must be processed in software. For example: a Cat9k configured as a DHCP relay (ip helper) processes the relay action at the CPU, so a broadcast DHCP discover is punted to the CPU, but the relayed packet has a large portion of its data overwritten with 0s. It would appear as if some options were missing, or as if option 255 (end) is missing and was overwritten by '00' padding.

If you are unsure whether your traffic is forwarded in software, you can use the embedded monitor capture feature to collect a sample of traffic going to or from the CPU of the switch. Example config:
   - monitor capture cpu control-plane out match
   - monitor capture cpu start (wait or send impacted traffic)
   - monitor capture cpu stop
   - show monitor capture cpu buffer brief
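The DHCP relay symptom described above (options missing, or option 255 replaced by '00' padding) can be checked on a captured relayed packet with a short script. The following is a minimal Python sketch, assuming you have exported the UDP payload (the BOOTP/DHCP message) as raw bytes. `scan_dhcp_options` is a hypothetical helper name, and the sketch ignores option overload (option 52).

```python
DHCP_MAGIC_COOKIE = bytes([0x63, 0x82, 0x53, 0x63])
OPTIONS_OFFSET = 240  # 236-byte BOOTP fixed header + 4-byte magic cookie


def scan_dhcp_options(bootp: bytes):
    """Walk the DHCP options field of a BOOTP/DHCP message.

    Returns (codes, has_end): the option codes seen in order, and whether
    the end option (255) was reached. Hypothetical helper for illustration.
    """
    if len(bootp) < OPTIONS_OFFSET or bootp[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP message")
    codes = []
    i = OPTIONS_OFFSET
    while i < len(bootp):
        code = bootp[i]
        if code == 255:  # end option: a well-formed message stops here
            return codes, True
        if code == 0:  # pad option is a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(bootp):  # option code with no length byte
            break
        length = bootp[i + 1]
        codes.append(code)
        i += 2 + length
    # Ran off the end (or into zero padding) without seeing option 255.
    return codes, False
```

If `has_end` is `False`, or the list of codes is shorter than what the client actually sent, the relayed packet is consistent with the zero-overwrite behavior described in this defect. Comparing the result against a capture taken on the client-facing side helps rule out a client that simply sent fewer options.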
This is specific to locally forwarded traffic, so, where possible, routing can be changed to force the impacted traffic across the VXLAN fabric. Otherwise, no workarounds are available.
The threshold for corruption appears to be around 384 bytes. Packets this size or larger will see corruption 100% of the time, but smaller frames are not expected to be impacted.