Physical link issues that cause rapid losses of signal strength, typically originating from faulty infrastructure (such as a failing optic, a bad cable, or a faulty patch panel), can result in an unrecoverable switch failure. On platforms running Fabric OS (FOS) versions 9.2.0x, 9.2.1x, 9.2.2x, or 10.0.0x, the firmware is designed to intentionally fault the application-specific integrated circuit (ASIC) when this occurs. Failing infrastructure of this kind causes switch or blade faults after upgrading to FOS versions 9.2.0x, 9.2.1x, or 9.2.2x and has, in some cases, unnecessarily triggered hardware replacement procedures.

NOTE: Replacing the switch or port blade alone will not resolve the issue. Without remediating the failing infrastructure, the new hardware will eventually fail and be faulted as well.

Besides causing physical link errors, this failing infrastructure can also produce rapid changes in signal strength; this is a newly observed failure behavior. Persistent, repeated failures aligned with attenuation in signal strength can eventually create a critical failure of the switch port, rendering it unable to correctly receive frames.

Symptoms

Before the critical switch or blade fault, excessive physical layer errors will be observed.

Example (indicative of physical errors on the link for port 60):

Switch:admin> porterrshow
          frames      enc  crc  crc    too  too  bad  enc  disc link loss loss frjt fbsy  c3timeout  pcs  uncor
        tx     rx     in   err  g_eof  shrt long eof  out  c3   fail sync sig               tx   rx  err  err
  60:  516.9k  2.4m   0    0    0      0    0    0    0    0    0    0    0    0    0      0    0   5.4k  891
A critical ASIC fault error code "C4-1056" may also be presented:

2025/11/11-00:00:01 (GMT), [C4-1056], 1234567, CHASSIS, CRITICAL, G630, Chip in Slot 0, Chip 0 getting faulted with reason 53.
2025/11/11-00:00:01 (GMT), [EM-1134], 1234568, FFDC | CHASSIS, ERROR, G630, Switch set to faulty, rc=20015.

NOTE: The "C4-1056" critical ASIC fault error code is generically presented with any critical failure detected by Fabric OS. The only clear signature for this failure is a switch or blade fault accompanied by excessive physical layer errors.
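When reviewing porterrshow output from many switches, the counter check can be scripted. The sketch below is a hypothetical helper (not a Broadcom tool): it parses one porterrshow port line using the column order shown in the sample above and flags ports whose physical-layer counters exceed an illustrative threshold. The column names and the threshold value are assumptions for illustration only.

```python
# Hypothetical helper (not a Broadcom tool): parse one port line of saved
# `porterrshow` output and flag suspicious physical-layer error counts.
# Column order follows the sample output above; the threshold is illustrative.

def parse_count(token: str) -> int:
    """Convert porterrshow shorthand such as '516.9k' or '2.4m' to an integer."""
    scale = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    token = token.lower()
    if token[-1] in scale:
        return round(float(token[:-1]) * scale[token[-1]])
    return int(token)

# porterrshow columns after the port number, per the sample above.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
    "uncor_err",
]

# Physical-layer counters that typically precede this class of failure.
PHYSICAL_ERRORS = ("enc_in", "enc_out", "crc_err", "link_fail",
                   "loss_sync", "loss_sig", "pcs_err", "uncor_err")

def flag_port(line: str, threshold: int = 1_000) -> bool:
    """Return True if any physical-layer counter on the line exceeds threshold."""
    values = [parse_count(f) for f in line.split()[1:]]
    counters = dict(zip(COLUMNS, values))
    return any(counters[name] > threshold for name in PHYSICAL_ERRORS)
```

Run against the sample line for port 60, `flag_port` reports the port because the pcs err counter (5.4k) is well above the illustrative threshold, while the large frames tx/rx counters are correctly ignored.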
This issue is observed on the following HPE Storage switches running any version of Fabric OS (FOS) v9.2.0x, v9.2.1x, or v9.2.2x:

- HPE Storage Fibre Channel Switch B-series SN6600B (switchType 162)
- HPE Storage Fibre Channel Switch B-series SN6650B (switchType 173)

This issue also occurs with the following port blades installed in an SN8600B or SN8700B switch running any patch version of FOS 9.2.0x, 9.2.1x, 9.2.2x, or 10.0.0x (the SN8600B 32Gb SAN extension blade is affected only when installed in an X7-8/X7-4):

- SN8600B 32Gb 48 Port Blade (Blade ID 178)
- SN8600B 32Gb 64 Port Blade (Blade ID 204)
- SN8600B 32Gb SAN Extension Blade (Blade ID 186)
All FOS 9.2.0x, 9.2.1x, 9.2.2x, and 10.0.0x firmware versions are designed to detect and react to potential critical failures within the receiving port. The Fabric OS firmware will fault the ASIC as a protective measure until the source of the infrastructure issues can be remediated. A faulted ASIC causes the switch itself to be faulted or, in director-class devices, the specific port blade to be faulted. Restoring service requires a manual reboot or power-cycle of the faulted switch or blade.

For a switch to be at risk of critical port failure, the underlying faulty infrastructure must transmit signals to the switch in a specific manner that results in a sudden, unexpected drop in signal strength. While the ASIC within the switch or port blade is designed to handle corrupted frames and excessive Invalid Transmission Words (ITWs), an unexpected signal drop occurring at a specific point can temporarily halt frame reception and transmission, leading to an ASIC fault.

Recovery

Switches affected by this issue will report high counts of ITWs, encoding errors, loss of sync, and other physical link errors before a critical failure occurs.

IMPORTANT: Not all instances of failing infrastructure lead to a critical failure; the critical risk arises only when the faulty infrastructure transmits the signal in the specific manner described and the link errors are allowed to continue without remediation over a period of time.

NOTE: Newer ASICs, including those used in all Gen 7 and Gen 8 products, are not susceptible to this specific failure condition. Furthermore, the switches and port blades at risk must also be running a version of Fabric OS configured to fault the ASIC when these specific failing conditions are detected and met.

While this is detected as a critical switch failure, it is not a hardware issue.
The switch or blade should not be replaced; the real cause of the issue is the failing infrastructure (most commonly failing optics, bad cables, or faulty patch panels). Replacing the equipment without first addressing the underlying infrastructure issue is likely to cause the new switch to fail as well.

To recover from the current fault:

NOTE: The faulty infrastructure issue (most commonly a failing optic, bad cable, or faulty patch panel) should be resolved before the switch or director blade is recovered.

1. Remediate the infrastructure issue.
2. For faulted switches: manually reboot or power-cycle the switch after the infrastructure issue is resolved.
3. For director blades: power-cycle the faulted blade by performing a slotpoweroff followed by a slotpoweron.

Recommendations

Monitoring for infrastructure issues such as failing links, failing optics, and misbehaving devices, and remediating them quickly and proactively, can prevent more severe downstream faults.

To enhance protection against faults (such as a switch or blade fault) that may require a reboot or power-cycle to recover, Broadcom strongly recommends enabling Monitoring and Alerting Policy Suite (MAPS) alerts and Port Fencing (a Fabric Vision license is required for this feature).

The default conservative policy includes pre-defined threshold values designed to trigger alerts only in response to excessive errors.
If these values are not suitable for a specific environment, a custom policy can be created and the thresholds adjusted.

Below are sample commands to add the "FENCE" action globally and enable dflt_conservative_policy:

=====
Switch:admin> mapsconfig --actions "RASLOG,SNMP,FPIN,FENCE"
Switch:admin> mapspolicy --enable dflt_conservative_policy
=====

NOTE: If the "FENCE" action is not desired, it is essential to ensure alerting is fully set up and that physical errors are addressed promptly to minimize overall risk.

Refer to the Monitoring and Alerting Policy Suite User Guide for detailed instructions on configuring both fencing and alerting actions.
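Conceptually, a MAPS rule watches a counter over a time window and fires an action (such as FENCE) when the counter's growth crosses a threshold. The sketch below is an illustrative model of that behavior only; the class, threshold, and window values are assumptions and do not reflect the actual dflt_conservative_policy values or MAPS internals.

```python
# Illustrative model (not MAPS itself) of a threshold rule: fire an action
# when a cumulative counter grows by more than `threshold` across `window`
# consecutive samples (e.g., one sample per polling interval).

from collections import deque

class ThresholdRule:
    def __init__(self, threshold: int, window: int, action: str = "FENCE"):
        self.threshold = threshold        # allowed counter growth per window
        self.window = window              # number of samples in the window
        self.action = action              # action name to return when triggered
        self.samples = deque(maxlen=window)

    def observe(self, counter_value: int):
        """Record a cumulative counter sample; return the action name if the
        growth across the window exceeds the threshold, else None."""
        self.samples.append(counter_value)
        if len(self.samples) == self.window:
            delta = self.samples[-1] - self.samples[0]
            if delta > self.threshold:
                return self.action
        return None
```

For example, a rule with threshold=100 over a 3-sample window stays quiet while a port's error counter creeps up slowly, but returns "FENCE" as soon as the counter jumps by more than 100 within the window, mirroring how a fencing policy reacts only to excessive errors.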
Operating Systems Affected: Not Applicable