...
Catalyst 9400 linecards may report a "faulty" status when online diagnostics fail, leaving the interfaces on those linecards stuck in a down/down err-disabled state. In the affected scenario, Catalyst 9400 linecards were incorrectly flagged as "faulty" in the output of "show module". The example below illustrates part of the issue; note the "faulty" status for the non-supervisor linecards under the Status column.

9400#sho mod
Mod  MAC addresses                     Hw   Fw            Sw         Status
---+---------------------------------+----+-------------+----------+--------
 1   6CB2.AE42.97FC to 6CB2.AE42.982B  1.0  16.6.2r[FC1]  16.08.01a  faulty  <-- One or more modules may be impacted
 2   707D.B9C8.E7FC to 707D.B9C8.E82B  1.0  16.6.2r[FC1]  16.08.01a  faulty
 3   707D.B9C8.D92C to 707D.B9C8.D95B  1.0  16.6.2r[FC1]  16.08.01a  faulty
 4   707D.B9C8.FA80 to 707D.B9C8.FAAF  1.0  16.6.2r[FC1]  16.08.01a  faulty
 5   00BE.758D.76AC to 00BE.758D.76B5  1.0  16.6.2r[FC1]  16.08.01a  ok
 6   00BE.758D.76B6 to 00BE.758D.76BF  1.0  16.6.2r[FC1]  16.08.01a  ok
 7   707D.B9C8.E70C to 707D.B9C8.E73B  1.0  16.6.2r[FC1]  16.08.01a  faulty

The first step is to identify the cause of the "faulty" status.
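When many modules are installed, it can help to pull the faulty slots out of the "show module" text programmatically. The sketch below is an illustrative helper (not a Cisco tool); the function name and the column layout it assumes are my own, based on the sample output above.

```python
import re

def find_faulty_modules(show_module_output: str) -> list[int]:
    """Return module numbers whose Status column reads 'faulty'.

    Assumes rows shaped like the sample 'show module' output:
    '<mod>  <mac> to <mac>  <hw>  <fw>  <sw>  <status>'."""
    faulty = []
    for line in show_module_output.splitlines():
        # Data rows start with the module number and end with ok/faulty.
        m = re.match(r"\s*(\d+)\s+\S+ to \S+.*\b(ok|faulty)\s*$", line)
        if m and m.group(2) == "faulty":
            faulty.append(int(m.group(1)))
    return faulty

sample = """\
 1   6CB2.AE42.97FC to 6CB2.AE42.982B  1.0  16.6.2r[FC1]  16.08.01a  faulty
 5   00BE.758D.76AC to 00BE.758D.76B5  1.0  16.6.2r[FC1]  16.08.01a  ok
 7   707D.B9C8.E70C to 707D.B9C8.E73B  1.0  16.6.2r[FC1]  16.08.01a  faulty
"""
print(find_faulty_modules(sample))  # [1, 7]
```

Any line that carries an inline annotation (such as "<--") still matches, since the regex only requires ok/faulty at the end of the captured status field when the annotation is absent; strip annotations first if your saved output includes them.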
Run "show post". In this scenario, "POST: PHY Loopback: Failed" for multiple interfaces is the common failure signature for this defect:

9400#show post
Stored system POST messages:
Switch C9410R
--------------
POST: MBIST Tests : Begin
POST: MBIST Tests : End, Status Passed
POST: Module: 5 PHY Loopback: loopback Test: Begin
POST: Module: 5 PHY Loopback: loopback Test: End, Status Passed
POST: Module: 6 PHY Loopback: loopback Test: Begin
POST: Module: 6 PHY Loopback: loopback Test: End, Status Passed
POST: Module: 1 PHY Loopback: loopback Test: Begin
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/1
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/4
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/5

Finally, to positively match this issue, Generic Online Diagnostics (GOLD) packets must be dropped and actively incrementing during diagnostics. To check whether GOLD packets are being dropped, run "show platform hardware fed active qos queue stats internal cpu policer" and look at queue 31, "Gold Pkt". The second column is the policer index; several classes of traffic can share a common policer index, so drops in any one of those classes can cause the GOLD diagnostics to fail. In this scenario, a large number of RPF failures caused the CPU policer to take action, dropping both the multicast traffic that recorded the RPF failures and the GOLD packets. Note that the trigger does not have to be RPF failures specifically: any traffic that maps to PlcIdx 10 can cause the issue.
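Because the "PHY Loopback: Failed" lines follow a fixed text pattern, the failed interfaces can be extracted from saved "show post" output with a one-line match. This is an illustrative sketch; the function name is an assumption, not part of any Cisco tooling.

```python
import re

def failed_loopback_interfaces(show_post_output: str) -> list[str]:
    """Collect interface names from 'POST: PHY Loopback: Failed' lines."""
    return re.findall(r"PHY Loopback: Failed For Interface : (\S+)",
                      show_post_output)

sample = """\
POST: Module: 1 PHY Loopback: loopback Test: Begin
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/1
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/4
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/5
"""
print(failed_loopback_interfaces(sample))
# ['GigabitEthernet1/0/1', 'GigabitEthernet1/0/4', 'GigabitEthernet1/0/5']
```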
9400#sh plat hard fed active qos queue stats internal cpu policer

                     CPU Queue Statistics
============================================================================================
                                              (default)  (set)   Queue        Queue
QId PlcIdx  Queue Name                Enabled  Rate       Rate    Drop(Bytes)  Drop(Frames)
--------------------------------------------------------------------------------------------
0   11      DOT1X Auth                Yes      1000       1000    0            0
1   1       L2 Control                Yes      2000       2000    0            0
2   14      Forus traffic             Yes      4000       4000    0            0
3   0       ICMP GEN                  Yes      600        600     0            0
4   2       Routing Control           Yes      5400       5400    0            0
5   14      Forus Address resolution  Yes      4000       4000    0            0
6   0       ICMP Redirect             Yes      600        600     0            0
7   16      Inter FED Traffic         Yes      2000       2000    0            0
8   4       L2 LVX Cont Pack          Yes      1000       1000    0            0
9   16      EWLC Control              Yes      2000       2000    0            0
10  16      EWLC Data                 Yes      2000       2000    0            0
11  13      L2 LVX Data Pack          Yes      1000       1000    0            0
12  0       BROADCAST                 Yes      600        600     0            0
13  10      Learning cache ovfl       Yes      100        200     0            0
14  13      Sw forwarding             Yes      1000       1000    0            0
15  8       Topology Control          Yes      13000      13000   0            0
16  12      Proto Snooping            Yes      2000       2000    0            0
17  6       DHCP Snooping             Yes      500        400     0            0
18  9       Transit Traffic           Yes      500        400     0            0
19  10      RPF Failed                Yes      100        200     8464833733   6226004  <--- Last two columns represent drops. This is queue 19, policer index 10.
20  15      MCAST END STATION         Yes      2000       2000    0            0
21  13      LOGGING                   Yes      1000       1000    0            0
22  7       Punt Webauth              Yes      1000       1000    0            0
23  10      Crypto Control            Yes      100        200     0            0
24  10      Exception                 Yes      100        200     0            0
25  3       General Punt              Yes      200        200     0            0
26  10      NFL SAMPLED DATA          Yes      100        200     0            0
27  2       Low Latency               Yes      5400       5400    0            0
28  10      EGR Exception             Yes      100        200     0            0
29  5       Stackwise Virtual Control Yes      8000       8000    0            0
30  9       MCAST Data                Yes      500        400     0            0
31  10      Gold Pkt                  Yes      100        200     36040        530      <-- Last two columns represent drops for diagnostic traffic. Queue 31, policer index 10.
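The key relationship in this table is the shared policer index: any queue with the same PlcIdx as "Gold Pkt" (here, 10) competes for the same policer budget, so drops there can starve the diagnostics. A minimal sketch of that logic, using hypothetical function and field names and a few rows transcribed from the table above:

```python
def queues_sharing_gold_policer(rows):
    """Given (qid, plcidx, name, drop_frames) tuples parsed from the CPU
    policer output, return names of other queues on the Gold Pkt policer
    index that are actively dropping frames."""
    gold_idx = next(idx for qid, idx, name, drops in rows
                    if name == "Gold Pkt")
    return [name for qid, idx, name, drops in rows
            if idx == gold_idx and drops > 0 and name != "Gold Pkt"]

rows = [
    (13, 10, "Learning cache ovfl", 0),
    (19, 10, "RPF Failed", 6226004),
    (23, 10, "Crypto Control", 0),
    (30, 9,  "MCAST Data", 0),
    (31, 10, "Gold Pkt", 530),
]
print(queues_sharing_gold_policer(rows))  # ['RPF Failed']
```

In this example the culprit surfaces immediately: "RPF Failed" shares policer index 10 with "Gold Pkt" and is the queue dropping millions of frames.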
This issue can occur on any Catalyst 9400 linecard when excessive CPU-bound traffic in certain classes exceeds a shared policer and causes the diagnostics to fail.
Address any traffic that is hitting the Learning cache ovfl, Crypto Control, Exception, EGR Exception, NFL SAMPLED DATA, Gold Pkt, or RPF Failed queues, and then OIR (online insertion and removal) the affected linecards to restart diagnostics. Valid methods include a physical reseat of the linecard, an OIR via CLI, or a reload of the switch. The command to OIR via CLI is the following:

9400#hw-module subslot oir power-cycle

If you are unsure what traffic is hitting the CPU of the switch and causing the policer to increment, collect a sample of all traffic punted to the CPU and identify any flows that do not belong or that match the description of one of these policer classes:

Cat9400#mon cap capture control-plane in match any limit packets 2500  <-- Catches the first 2500 packets (IP and non-IP) to hit the CPU of the switch.
Cat9400#mon cap capture start
Enabling Control plane capture may seriously impact system performance. Do you want to continue? [yes/no]: y
Started capture point : capture
Cat9400#mon cap capture stop
Cat9400#show mon cap capture buffer brief
Starting the packet display ........ Press Ctrl + Shift + 6 to exit
Cat9400#show mon cap capture buffer detail
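Once the capture buffer is saved as text, a quick tally of source addresses often points at the offending flow. The sketch below is a hypothetical helper; the "src -> dst" column layout it matches is an assumption about the brief display format, so adjust the pattern to your actual output.

```python
import re
from collections import Counter

def top_talkers(capture_brief: str, n: int = 5):
    """Count IPv4 source addresses in saved 'buffer brief' capture text.

    Assumes each packet line contains a 'src -> dst' pair; returns the
    n most frequent sources as (address, count) tuples."""
    srcs = re.findall(r"(\d{1,3}(?:\.\d{1,3}){3})\s*->", capture_brief)
    return Counter(srcs).most_common(n)

# Hypothetical sample lines in a 'src -> dst' layout:
sample = """\
  1   0.000000   10.0.0.5 -> 239.1.1.1    UDP
  2   0.000010   10.0.0.5 -> 239.1.1.1    UDP
  3   0.000020   192.0.2.9 -> 10.0.0.1    ICMP
"""
print(top_talkers(sample))  # [('10.0.0.5', 2), ('192.0.2.9', 1)]
```

A source that dominates the count and maps to one of the shared-policer classes (for example, a multicast sender failing RPF) is the traffic to remediate before performing the OIR.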