...
Catalyst 9400 linecards may report a "faulty" status when online diagnostics fail, leaving the interfaces on those linecards stuck in a down/down err-disabled state. In the affected scenario, Catalyst 9400 linecards were incorrectly flagged as "faulty" in the output of "show module". The example below illustrates part of the issue; note the "faulty" status for the non-supervisor linecards under the Status column.

9400#sho mod
Mod  MAC addresses                     Hw   Fw            Sw         Status
---+---------------------------------+----+-------------+----------+--------
 1   6CB2.AE42.97FC to 6CB2.AE42.982B  1.0  16.6.2r[FC1]  16.08.01a  faulty  <-- One or more modules may be impacted
 2   707D.B9C8.E7FC to 707D.B9C8.E82B  1.0  16.6.2r[FC1]  16.08.01a  faulty
 3   707D.B9C8.D92C to 707D.B9C8.D95B  1.0  16.6.2r[FC1]  16.08.01a  faulty
 4   707D.B9C8.FA80 to 707D.B9C8.FAAF  1.0  16.6.2r[FC1]  16.08.01a  faulty
 5   00BE.758D.76AC to 00BE.758D.76B5  1.0  16.6.2r[FC1]  16.08.01a  ok
 6   00BE.758D.76B6 to 00BE.758D.76BF  1.0  16.6.2r[FC1]  16.08.01a  ok
 7   707D.B9C8.E70C to 707D.B9C8.E73B  1.0  16.6.2r[FC1]  16.08.01a  faulty

The first step is to identify the cause of the "faulty" status.
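When many modules are installed, it can help to pull the faulty slots out of the "show module" text programmatically. The sketch below is an illustrative helper (not a Cisco tool); the function name and the column layout it assumes are my own, based on the sample output above.

```python
import re

def find_faulty_modules(show_module_output: str) -> list[int]:
    """Return module numbers whose Status column reads 'faulty'.

    Assumes rows shaped like the sample 'show module' output:
    '<mod>  <mac> to <mac>  <hw>  <fw>  <sw>  <status>'."""
    faulty = []
    for line in show_module_output.splitlines():
        # Data rows start with the module number and end with ok/faulty.
        m = re.match(r"\s*(\d+)\s+\S+ to \S+.*\b(ok|faulty)\s*$", line)
        if m and m.group(2) == "faulty":
            faulty.append(int(m.group(1)))
    return faulty

sample = """\
 1   6CB2.AE42.97FC to 6CB2.AE42.982B  1.0  16.6.2r[FC1]  16.08.01a  faulty
 5   00BE.758D.76AC to 00BE.758D.76B5  1.0  16.6.2r[FC1]  16.08.01a  ok
 7   707D.B9C8.E70C to 707D.B9C8.E73B  1.0  16.6.2r[FC1]  16.08.01a  faulty
"""
print(find_faulty_modules(sample))  # [1, 7]
```

Any line that carries an inline annotation (such as "<--") still matches, since the regex only requires ok/faulty at the end of the captured status field when the annotation is absent; strip annotations first if your saved output includes them.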
Run "show post". In this scenario, "POST: PHY Loopback: Failed" for multiple interfaces is the common failure signature for this defect:

9400#show post
Stored system POST messages:
Switch C9410R
--------------
POST: MBIST Tests : Begin
POST: MBIST Tests : End, Status Passed
POST: Module: 5 PHY Loopback: loopback Test: Begin
POST: Module: 5 PHY Loopback: loopback Test: End, Status Passed
POST: Module: 6 PHY Loopback: loopback Test: Begin
POST: Module: 6 PHY Loopback: loopback Test: End, Status Passed
POST: Module: 1 PHY Loopback: loopback Test: Begin
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/1
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/4
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/5

Finally, to positively match this issue, Generic Online Diagnostics (GOLD) packets must be dropped and actively incrementing during diagnostics. To check whether GOLD packets are being dropped, run "show platform hardware fed active qos queue stats internal cpu policer" and look at queue 31, "Gold Pkt". The second column is the policer index; several classes of traffic can share a common policer index, so drops in any one of those classes can cause the GOLD diagnostics to fail. In this scenario, a large number of RPF failures caused the CPU policer to take action, dropping both the multicast traffic that recorded the RPF failures and the GOLD packets. Note that the trigger does not have to be RPF failures specifically: any traffic that maps to PlcIdx 10 can cause the issue.
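Because the "PHY Loopback: Failed" lines follow a fixed text pattern, the failed interfaces can be extracted from saved "show post" output with a one-line match. This is an illustrative sketch; the function name is an assumption, not part of any Cisco tooling.

```python
import re

def failed_loopback_interfaces(show_post_output: str) -> list[str]:
    """Collect interface names from 'POST: PHY Loopback: Failed' lines."""
    return re.findall(r"PHY Loopback: Failed For Interface : (\S+)",
                      show_post_output)

sample = """\
POST: Module: 1 PHY Loopback: loopback Test: Begin
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/1
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/4
POST: PHY Loopback: Failed For Interface : GigabitEthernet1/0/5
"""
print(failed_loopback_interfaces(sample))
# ['GigabitEthernet1/0/1', 'GigabitEthernet1/0/4', 'GigabitEthernet1/0/5']
```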
9400#sh plat hard fed active qos queue stats internal cpu policer

                     CPU Queue Statistics
============================================================================================
                                              (default)  (set)   Queue        Queue
QId PlcIdx  Queue Name                Enabled  Rate       Rate    Drop(Bytes)  Drop(Frames)
--------------------------------------------------------------------------------------------
0   11      DOT1X Auth                Yes      1000       1000    0            0
1   1       L2 Control                Yes      2000       2000    0            0
2   14      Forus traffic             Yes      4000       4000    0            0
3   0       ICMP GEN                  Yes      600        600     0            0
4   2       Routing Control           Yes      5400       5400    0            0
5   14      Forus Address resolution  Yes      4000       4000    0            0
6   0       ICMP Redirect             Yes      600        600     0            0
7   16      Inter FED Traffic         Yes      2000       2000    0            0
8   4       L2 LVX Cont Pack          Yes      1000       1000    0            0
9   16      EWLC Control              Yes      2000       2000    0            0
10  16      EWLC Data                 Yes      2000       2000    0            0
11  13      L2 LVX Data Pack          Yes      1000       1000    0            0
12  0       BROADCAST                 Yes      600        600     0            0
13  10      Learning cache ovfl       Yes      100        200     0            0
14  13      Sw forwarding             Yes      1000       1000    0            0
15  8       Topology Control          Yes      13000      13000   0            0
16  12      Proto Snooping            Yes      2000       2000    0            0
17  6       DHCP Snooping             Yes      500        400     0            0
18  9       Transit Traffic           Yes      500        400     0            0
19  10      RPF Failed                Yes      100        200     8464833733   6226004  <--- Last two columns represent drops. This is queue 19, policer index 10.
20  15      MCAST END STATION         Yes      2000       2000    0            0
21  13      LOGGING                   Yes      1000       1000    0            0
22  7       Punt Webauth              Yes      1000       1000    0            0
23  10      Crypto Control            Yes      100        200     0            0
24  10      Exception                 Yes      100        200     0            0
25  3       General Punt              Yes      200        200     0            0
26  10      NFL SAMPLED DATA          Yes      100        200     0            0
27  2       Low Latency               Yes      5400       5400    0            0
28  10      EGR Exception             Yes      100        200     0            0
29  5       Stackwise Virtual Control Yes      8000       8000    0            0
30  9       MCAST Data                Yes      500        400     0            0
31  10      Gold Pkt                  Yes      100        200     36040        530      <-- Last two columns represent drops for diagnostic traffic. Queue 31, policer index 10.
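The key relationship in this table is the shared policer index: any queue with the same PlcIdx as "Gold Pkt" (here, 10) competes for the same policer budget, so drops there can starve the diagnostics. A minimal sketch of that logic, using hypothetical function and field names and a few rows transcribed from the table above:

```python
def queues_sharing_gold_policer(rows):
    """Given (qid, plcidx, name, drop_frames) tuples parsed from the CPU
    policer output, return names of other queues on the Gold Pkt policer
    index that are actively dropping frames."""
    gold_idx = next(idx for qid, idx, name, drops in rows
                    if name == "Gold Pkt")
    return [name for qid, idx, name, drops in rows
            if idx == gold_idx and drops > 0 and name != "Gold Pkt"]

rows = [
    (13, 10, "Learning cache ovfl", 0),
    (19, 10, "RPF Failed", 6226004),
    (23, 10, "Crypto Control", 0),
    (30, 9,  "MCAST Data", 0),
    (31, 10, "Gold Pkt", 530),
]
print(queues_sharing_gold_policer(rows))  # ['RPF Failed']
```

In this example the culprit surfaces immediately: "RPF Failed" shares policer index 10 with "Gold Pkt" and is the queue dropping millions of frames.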
This issue can occur on any Catalyst 9400 linecard when excessive CPU-bound traffic in certain classes exceeds a shared policer and causes the diagnostics to fail.
Address any traffic that is hitting the Learning cache ovfl, Crypto Control, Exception, EGR Exception, NFL SAMPLED DATA, Gold Pkt, or RPF Failed queues, and then OIR (online insertion and removal) the affected linecards to restart diagnostics. Valid methods include a physical reseat of the linecard, an OIR via CLI, or a reload of the switch. The command to OIR via CLI is the following:

9400#hw-module subslot oir power-cycle

If you are unsure what traffic is hitting the CPU of the switch and causing the policer to increment, collect a sample of all traffic punted to the CPU and identify any flows that do not belong or that match the description of one of these policer classes:

Cat9400#mon cap capture control-plane in match any limit packets 2500  <-- Catches the first 2500 packets (IP and non-IP) to hit the CPU of the switch.
Cat9400#mon cap capture start
Enabling Control plane capture may seriously impact system performance. Do you want to continue? [yes/no]: y
Started capture point : capture
Cat9400#mon cap capture stop
Cat9400#show mon cap capture buffer brief
Starting the packet display ........ Press Ctrl + Shift + 6 to exit
Cat9400#show mon cap capture buffer detail
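Once the capture buffer is saved as text, a quick tally of source addresses often points at the offending flow. The sketch below is a hypothetical helper; the "src -> dst" column layout it matches is an assumption about the brief display format, so adjust the pattern to your actual output.

```python
import re
from collections import Counter

def top_talkers(capture_brief: str, n: int = 5):
    """Count IPv4 source addresses in saved 'buffer brief' capture text.

    Assumes each packet line contains a 'src -> dst' pair; returns the
    n most frequent sources as (address, count) tuples."""
    srcs = re.findall(r"(\d{1,3}(?:\.\d{1,3}){3})\s*->", capture_brief)
    return Counter(srcs).most_common(n)

# Hypothetical sample lines in a 'src -> dst' layout:
sample = """\
  1   0.000000   10.0.0.5 -> 239.1.1.1    UDP
  2   0.000010   10.0.0.5 -> 239.1.1.1    UDP
  3   0.000020   192.0.2.9 -> 10.0.0.1    ICMP
"""
print(top_talkers(sample))  # [('10.0.0.5', 2), ('192.0.2.9', 1)]
```

A source that dominates the count and maps to one of the shared-policer classes (for example, a multicast sender failing RPF) is the traffic to remediate before performing the OIR.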