...
This issue is rarely hit in production. The trigger and root cause are unknown. At a high level, the issue looks like this:

==> Example for module#1 (the same picture is observed with two different interfaces on module#2)

Nexus7K# show interface ethernet 1/19
Ethernet1/19 is down (SDP timeout/SFP Mismatch)
Nexus7K# show interface ethernet 1/20
Ethernet1/20 is down (SDP timeout/SFP Mismatch)

Some fabric interfaces may go down with the reason "SDP timeout/SFP Mismatch". This happens because no SDP frame reaches the satmgr process. All SDP frames are dropped in XBR-INTF:

|------------------------------------------------------------------------|
| Device:Flanker Xbar Driver    Role:XBR-INTF    Mod: 1                  |
| Last cleared @ Fri Dec 14 13:43:53 2018                                |
| Device Statistics Category :: ERROR                                    |
|------------------------------------------------------------------------|
Instance:7
Cntr   Name                                        Value             Ports
------ ----                                        -----             -----
11153  igr-in FT0: ib to ft0 error packet count    0000000000000481  57-64 -
11159  igr-in FT1: ib to ft1 error packet count    0000000000000499  57-64 -

The Flanker Queue Driver counters should correlate with these drops and show the reason the PDUs are dropped:

|------------------------------------------------------------------------|
| Device:Flanker Queue Driver    Role:QUE    Mod: 1                      |
| Last cleared @ Fri Dec 14 13:43:53 2018                                |
| Device Statistics Category :: ERROR                                    |
|------------------------------------------------------------------------|
Instance:7
Cntr   Name                                        Value             Ports
------ ----                                        -----             -----
16512  [intr] ib: rr0 pkt buffer read crc error    0000000000000314  57-64 -
16514  [intr] ib: rr0 corrupt crc (src:pl giant    0000000000000314  57-64 -
       drops, trunc drops)
16544  [intr] ib: rr1 pkt buffer read crc error    0000000000000324  57-64 -
16546  [intr] ib: rr1 corrupt crc (src:pl giant    0000000000000324  57-64 -
       drops, trunc drops)
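The counter check described above can be scripted when many modules have to be inspected. The following is only a minimal sketch: it assumes the statistics output has already been saved as text, that each counter row has the layout seen in the output above (counter ID, name, zero-padded value, port range, trailing dash), and that the value is decimal. The function name `nonzero_error_counters` is hypothetical and not part of any Cisco tooling.

```python
import re

# Assumed row layout, taken from the sample output only:
#   <counter id> <name> <zero-padded value> <port range> -
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d{10,})\s+(\d+-\d+)\s+-\s*$")


def nonzero_error_counters(text):
    """Return (counter_id, name, value, ports) for rows with a non-zero value."""
    hits = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            # Banner, header, separator, or a wrapped counter-name line.
            continue
        counter_id, name, value, ports = match.groups()
        count = int(value)  # assumption: the value is zero-padded decimal
        if count:
            hits.append((int(counter_id), name.strip(), count, ports))
    return hits


SAMPLE = """\
Instance:7
Cntr   Name  Value  Ports
------ ----  -----  -----
11153 igr-in FT0: ib to ft0 error packet count 0000000000000481 57-64 -
11159 igr-in FT1: ib to ft1 error packet count 0000000000000499 57-64 -
16514 [intr] ib: rr0 corrupt crc (src:pl giant 0000000000000314 57-64 -
      drops, trunc drops)
"""

if __name__ == "__main__":
    for counter_id, name, count, ports in nonzero_error_counters(SAMPLE):
        print(f"{counter_id}: {name} = {count} (ports {ports})")
```

Wrapped counter names (the "drops, trunc drops)" continuation lines) are skipped rather than rejoined, so the reported name may be truncated. The counter ID is enough to look the row up in the original output.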
The exact conditions are unknown so far. In our case the trigger originated from a single FEX, and only a few fabric links of that FEX were affected. All other FEXes kept working normally, even though the fabric links of several FEXes (both working and non-working) terminated on the same ASIC instance.
Reload the module(s) on which the affected fabric links terminate.
None