...
Impact:
- Switch down.
- Switch disabled.
- Power supplies faulty.
- Core blades faulty.
- Port blades faulty.

Problem:
Director switch goes into a disabled state, with port blades, core blades, and several power supplies in a faulty state.

Switch outputs:

slotshow -m:

Slot  Blade Type   ID   Model Name  Status
--------------------------------------------------
  1   SW BLADE     96   FC16-48     FAULTY (28)
  2   SW BLADE     96   FC16-48     FAULTY (28)
  3   UNKNOWN                       VACANT
  4   UNKNOWN                       VACANT
  5   CORE BLADE   98   CR16-8      FAULTY (50)
  6   CP BLADE     50   CP8         ENABLED
  7   CP BLADE     50   CP8         ENABLED
  8   CORE BLADE   98   CR16-8      FAULTY (50)
  9   UNKNOWN                       VACANT
 10   UNKNOWN                       VACANT
 11   SW BLADE     96   FC16-48     FAULTY (28)
 12   SW BLADE     96   FC16-48     FAULTY (28)

Errdump:
Streaming C3-1011 and C3-1014 messages within the same second. Around 80 or more messages were seen from the same blade.

[C3-1011], 25508, SLOT 6 | CHASSIS, WARNING, switchName, Detected a complete loss of credit on internal back-end VC: Slot 2, Port -1(0) vc_no=0 crd(s)lost=4.
[C3-1014], 25509, SLOT 6 | CHASSIS, WARNING, switchName, Link Reset on Port S2,P-1(0) vc_no=0 crd(s)lost=4 auto trigger.
[truncated]
[C3-1011], 25508, SLOT 6 | CHASSIS, WARNING, switchName, Detected a complete loss of credit on internal back-end VC: Slot 2, Port -1(0) vc_no=0 crd(s)lost=4.
[C3-1014], 25509, SLOT 6 | CHASSIS, WARNING, switchName, Link Reset on Port S2,P-1(0) vc_no=0 crd(s)lost=4 auto trigger.

Followed by a software verify error:

[RAS-1004], 25572, SLOT 6 | FFDC | CHASSIS, WARNING, switchName, Software 'verify' error detected.

Followed by power supply failures:

[RAS-1001], 25573, SLOT 6 | CHASSIS, INFO, switchName, First failure data capture (FFDC) event occurred.
[EM-1034], 25574, SLOT 6 | CHASSIS, ERROR, switchName, PS 1 set to faulty, rc=2000e.
[EM-1034], 25575, SLOT 6 | CHASSIS, ERROR, switchName, PS 2 set to faulty, rc=2000e.
[EM-1034], 25576, SLOT 6 | CHASSIS, ERROR, switchName, PS 4 set to faulty, rc=2000e.

Next, the switch went into a disabled state due to core blade failures:

[PLAT-1072], 25577, SLOT 6 | FFDC | CHASSIS, CRITICAL, switchName, The chassis is disabled because no Core Blades are available. Insert/replace one or both Core Blades and run chassisenable.
[RAS-1006], 25578, SLOT 6 | CHASSIS, INFO, switchName, Support data file (switchName-S6cp-201608281317-core_files.tar) automatically transferred to remote address ' x.x.x.x '.
[RAS-1001], 25579, SLOT 6 | CHASSIS, INFO, switchName, First failure data capture (FFDC) event occurred.

Followed by port blade and core blade faulty messages:

[EM-1134], 25580, SLOT 6 | FFDC | CHASSIS, ERROR, switchName, Slot 1 set to faulty, rc=2001c.
[EM-1134], 25581, SLOT 6 | FFDC | CHASSIS, ERROR, switchName, Slot 2 set to faulty, rc=2001c.
[EM-1134], 25582, SLOT 6 | FFDC | CHASSIS, ERROR, switchName, Slot 11 set to faulty, rc=2001c.
[EM-1134], 25583, SLOT 6 | FFDC | CHASSIS, ERROR, switchName, Slot 12 set to faulty, rc=2001c.
[EM-1134], 25584, SLOT 6 | FFDC | CHASSIS, ERROR, switchName, Slot 5 set to faulty, rc=20032.
[EM-1134], 25585, SLOT 6 | FFDC | CHASSIS, ERROR, switchName, Slot 8 set to faulty, rc=20032.

Followed by all the informational messages for the hardware reported faulty:

[FW-1010], 25588, SLOT 6 | FID 1, WARNING, switchName, Env Power Supply 1, is below low boundary(High=0, Low=1). Current value is 0 (1 OK/0 FAULTY).
[FW-1010], 25589, SLOT 6 | FID 1, WARNING, switchName, Env Power Supply 2, is below low boundary(High=0, Low=1). Current value is 0 (1 OK/0 FAULTY).
[FW-1010], 25590, SLOT 6 | FID 1, WARNING, switchName, Env Power Supply 4, is below low boundary(High=0, Low=1). Current value is 0 (1 OK/0 FAULTY).

[FW-1424], 25591, SLOT 6 | FID 1, WARNING, switchName, Switch status changed from HEALTHY to DOWN.
[FW-1439], 25592, SLOT 6 | FID 1, WARNING, switchName, Switch status change contributing factor Switch offline.
[FW-1427], 25593, SLOT 6 | FID 1, WARNING, switchName, Switch status change contributing factor Power supply: 3 bad.

[FW-1010], 25594, SLOT 6 | FID 128, WARNING, switchName, Env Power Supply 1, is below low boundary(High=0, Low=1). Current value is 0 (1 OK/0 FAULTY).
[FW-1010], 25595, SLOT 6 | FID 128, WARNING, switchName, Env Power Supply 2, is below low boundary(High=0, Low=1). Current value is 0 (1 OK/0 FAULTY).
[FW-1010], 25596, SLOT 6 | FID 128, WARNING, switchName, Env Power Supply 4, is below low boundary(High=0, Low=1). Current value is 0 (1 OK/0 FAULTY).

[FW-1424], 25597, SLOT 6 | FID 128, WARNING, switchName, Switch status changed from HEALTHY to DOWN.
[FW-1439], 25598, SLOT 6 | FID 128, WARNING, switchName, Switch status change contributing factor Switch offline.
[FW-1427], 25599, SLOT 6 | FID 128, WARNING, switchName, Switch status change contributing factor Power supply: 3 bad.

[RAS-1001], 25601, SLOT 6 | CHASSIS, INFO, switchName, First failure data capture (FFDC) event occurred.
[EM-1010], 25602, SLOT 6 | FFDC | CHASSIS, CRITICAL, switchName, Received unexpected power down for Slot 1 But Slot 1 still has power.
[EM-1069], 25603, SLOT 6 | CHASSIS, INFO, switchName, Slot 1 is being powered off.
[EM-1010], 25604, SLOT 6 | FFDC | CHASSIS, CRITICAL, switchName, Received unexpected power down for Slot 2 But Slot 2 still has power.
[truncated] same messages for the other power supplies.
[EM-1009], 25612, SLOT 6 | FFDC | CHASSIS, CRITICAL, switchName, Slot 5 powered down unexpectedly.
[EM-1069], 25613, SLOT 6 | CHASSIS, INFO, switchName, Slot 5 is being powered off.
[EM-1009], 25614, SLOT 6 | FFDC | CHASSIS, CRITICAL, switchName, Slot 8 powered down unexpectedly.
[EM-1069], 25615, SLOT 6 | CHASSIS, INFO, switchName, Slot 8 is being powered off.
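
An errdump excerpt of this size is easier to triage once it is tallied by message ID and severity. The following is a minimal sketch in Python, not a vendor tool: the `parse_errdump` helper and the `ENTRY` pattern are hypothetical names, and the pattern assumes entries shaped exactly like the excerpt in this article (message ID, sequence number, flags, severity, switch name, text). Real errdump output may carry extra fields such as timestamps, so the pattern would need adjusting.

```python
import re
from collections import Counter

# Assumed entry shape, taken from the excerpt in this article:
# [MSG-ID], sequence, flags, SEVERITY, switchName, message text
ENTRY = re.compile(
    r"\[(?P<msg_id>[A-Z0-9]+-\d+)\],\s*(?P<seq>\d+),\s*(?P<flags>[^,]+),\s*"
    r"(?P<severity>[A-Z]+),\s*(?P<switch>[^,]+),\s*(?P<text>.*)"
)


def parse_errdump(lines):
    """Return one dict per line that matches the entry shape; skip the rest."""
    records = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            record = match.groupdict()
            record["seq"] = int(record["seq"])
            records.append(record)
    return records


sample = [
    "[C3-1011], 25508, SLOT 6 | CHASSIS, WARNING, switchName, Detected a complete loss of credit on internal back-end VC: Slot 2, Port -1(0) vc_no=0 crd(s)lost=4.",
    "[C3-1014], 25509, SLOT 6 | CHASSIS, WARNING, switchName, Link Reset on Port S2,P-1(0) vc_no=0 crd(s)lost=4 auto trigger.",
    "[RAS-1004], 25572, SLOT 6 | FFDC | CHASSIS, WARNING, switchName, Software 'verify' error detected.",
    "[EM-1034], 25574, SLOT 6 | CHASSIS, ERROR, switchName, PS 1 set to faulty, rc=2000e.",
    "[PLAT-1072], 25577, SLOT 6 | FFDC | CHASSIS, CRITICAL, switchName, The chassis is disabled because no Core Blades are available. Insert/replace one or both Core Blades and run chassisenable.",
]

records = parse_errdump(sample)
by_id = Counter(r["msg_id"] for r in records)
by_severity = Counter(r["severity"] for r in records)
print(by_id)
print(by_severity)
```

A large count for a single message ID, such as the streaming C3-1011 and C3-1014 entries described above, stands out immediately in the `by_id` tally.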

Faulty hardware caused the I2C bus on the backplane to be overutilized. This set off a chain reaction in which the director, to safeguard itself, powered off other hardware components.
- The first component to fail was the port blade in slot 2 (it can be any blade).
- Streaming messages about faulty ASICs (in the RASlog) and component failures on the blade caused the I2C bus to be overutilized.
- The I2C bus in turn was unable to pass other messages to the operating system, including the check messages for the other hardware components.
- As a result, 3 of the 4 power supplies went into a faulty state. With those power supplies set to faulty and only 1 power supply left, the switch started to power off the other blades.
- When the core blades were powered off, the switch automatically went into a disabled state, because a director switch cannot function properly without 2 core blades.
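
Because the first blade to fail can be any blade, it helps to read the slot number out of the credit-loss messages instead of assuming slot 2. The sketch below is one possible way to do that in Python. The function names and the `threshold` parameter are hypothetical, and the pattern assumes the C3-1011 text shown in this article ("back-end VC: Slot N,"); it is not an official diagnostic.

```python
import re
from collections import Counter

# Assumes the C3-1011 wording shown in this article's errdump excerpt.
CREDIT_LOSS = re.compile(r"\[C3-1011\].*back-end VC: Slot (\d+),")


def credit_loss_by_slot(lines):
    """Count C3-1011 credit-loss messages per slot named in the message text."""
    counts = Counter()
    for line in lines:
        match = CREDIT_LOSS.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def suspect_slot(lines, threshold=1):
    """Return the slot with the most credit-loss messages, or None.

    threshold is a hypothetical cut-off; the article mentions around 80
    or more messages from the same blade in the real incident.
    """
    counts = credit_loss_by_slot(lines)
    if not counts:
        return None
    slot, count = counts.most_common(1)[0]
    return slot if count >= threshold else None


sample = [
    "[C3-1011], 25508, SLOT 6 | CHASSIS, WARNING, switchName, Detected a complete loss of credit on internal back-end VC: Slot 2, Port -1(0) vc_no=0 crd(s)lost=4.",
    "[C3-1014], 25509, SLOT 6 | CHASSIS, WARNING, switchName, Link Reset on Port S2,P-1(0) vc_no=0 crd(s)lost=4 auto trigger.",
    "[C3-1011], 25508, SLOT 6 | CHASSIS, WARNING, switchName, Detected a complete loss of credit on internal back-end VC: Slot 2, Port -1(0) vc_no=0 crd(s)lost=4.",
]

print(suspect_slot(sample))
```

With the full errdump as input, a higher `threshold` can be used to ignore occasional credit-loss messages that are unrelated to a streaming failure.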

Suggested action plan in case the blade in slot 2 is faulty (see the slotshow output under the Issue statement above). The goal is to bring the switch up in a stable state and then replace the faulty hardware.
Plan of action:
- Unseat (do not remove) the blade in slot 2, and do not seat it back.
- Unseat (do not remove) the blades in slots 1, 5, 8, 11 and 12.
- Reseat power supplies 1, 2 and 4, or switch the rocker switch on the back of each of the 3 power supplies off and on.
- Power must be restored to the power supplies before continuing with the next steps.
- Once power is restored (check with the psshow command that the power supplies are OK), seat the core blades in slots 5 and 8 back in (check with the errdump and slotshow commands).
- Once the 2 core blades are up, the switch should be OK to enable. This can be done with the switchenable command.
- Once the switch is enabled (check with the switchshow and errdump commands), start seating the port blades back in one by one (NOT the port blade in slot 2). Start with the port blade in slot 1.
- WAIT 5 to 10 minutes until the blade is up and all connected ports have had time to log in and come online.
- Next, seat the port blade in slot 11. Again WAIT 5 to 10 minutes, the same as with the port blade in slot 1.
- Next, seat the port blade in slot 12.
- After each insertion, check with the errdump command that POST succeeds, and with the switchshow command that the ports are coming online.
Once all blades (except the blade in slot 2) are online, the faulty blade can be replaced.
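
The blade ordering in the plan above (core blades first, then the switch enable, then port blades one at a time, with the suspect blade left out until the end) can be written down as a small checklist generator. This is only an illustrative Python sketch: `reseat_order` and its arguments are hypothetical, it covers the steps after power has been restored to the power supplies, and it prints a checklist without running any switch command.

```python
def reseat_order(faulty_slots, suspect):
    """Build an ordered checklist from the faulty slots.

    faulty_slots maps slot number -> "CORE" or "SW" (port blade),
    for example as read from the slotshow -m output.
    suspect is the slot of the blade believed to have failed first.
    """
    core = sorted(s for s, kind in faulty_slots.items() if kind == "CORE")
    ports = sorted(
        s for s, kind in faulty_slots.items() if kind == "SW" and s != suspect
    )

    steps = [f"Seat core blade in slot {s}" for s in core]
    steps.append("Enable the switch (switchenable)")
    for s in ports:
        steps.append(f"Seat port blade in slot {s}, then wait 5 to 10 minutes")
    steps.append(f"Replace the faulty blade in slot {suspect}")
    return steps


# Faulty slots as shown in the slotshow -m output of this article.
faulty = {1: "SW", 2: "SW", 5: "CORE", 8: "CORE", 11: "SW", 12: "SW"}
steps = reseat_order(faulty, suspect=2)
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step}")
```

Each step still needs the manual checks described in the plan (psshow, slotshow, switchshow and errdump) before moving on to the next one.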