...
Document Version   Release Date   Details
2                  12/01/2020     Updated to include HPE Superdome Flex 280
1                  09/28/2020     Original document release

It is normal for datacenters to perform maintenance on their AC power infrastructure. This maintenance is performed on one power grid at a time so that system operations can continue. In extremely rare instances, the power supplies in a Superdome Flex complex may not recover properly after power to the AC grid is re-applied. When this happens, the affected power supplies remain in a "latched fault" status, and the power provided to the chassis is no longer fully redundant. If power to the other AC grid is then removed, the affected chassis will unexpectedly power off due to insufficient power, resulting in an unplanned system outage.
Any HPE Superdome Flex or HPE Superdome Flex 280 Servers in the scenario described above.
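The failure sequence described above can be illustrated with a small model. This is only a sketch under stated assumptions: the four-PSU layout (PSUs 0 and 1 on grid 1, PSUs 2 and 3 on grid 2), the `required=2` threshold, and all names are illustrative and are not taken from HPE documentation.

```python
from dataclasses import dataclass


@dataclass
class Psu:
    grid: int              # AC grid feeding this PSU (1 or 2)
    latched: bool = False  # True once the PSU is stuck in a latched fault


def chassis_stays_up(psus, live_grids, required=2):
    """A chassis stays up while enough PSUs have AC input and are not latched.

    `required` is an assumed, illustrative threshold.
    """
    working = sum(1 for p in psus if p.grid in live_grids and not p.latched)
    return working >= required


def survives_loss_of(psus, live_grids, grid):
    """Would the chassis stay up if `grid` were removed right now?"""
    return chassis_stays_up(psus, set(live_grids) - {grid})


# Assumed layout: PSUs 0-1 on grid 1, PSUs 2-3 on grid 2.
chassis = [Psu(1), Psu(1), Psu(2), Psu(2)]

# Step 1: maintenance on grid 1; the chassis runs on grid 2 alone.
up_during_grid1_work = chassis_stays_up(chassis, {2})

# Step 2: grid 1 is restored, but its PSUs come back in a latched fault.
for psu in chassis[:2]:
    psu.latched = True
up_after_restore = chassis_stays_up(chassis, {1, 2})

# Step 3: check redundancy BEFORE touching grid 2.
safe_to_remove_grid2 = survives_loss_of(chassis, {1, 2}, grid=2)

print(up_during_grid1_work, up_after_restore, safe_to_remove_grid2)
```

In this model the chassis keeps running after grid 1 is restored, so nothing looks wrong from the operating system's point of view. Only the explicit redundancy check shows that removing grid 2 would take the chassis down.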
When datacenters schedule work on their AC power infrastructure, it is important to confirm that ALL the Superdome Flex chassis have regained full power redundancy before AC power is removed from the other grid. Follow the instructions below to confirm that power is fully redundant on every chassis within a Superdome Flex complex before continuing maintenance.

Note that loss of AC power to a server's power grid is not always planned; it can also be caused by a failing PDU. Always confirm that all the chassis within the Superdome Flex complex have regained full power redundancy once power has been restored.

To verify this, execute the "show health" command from the RMC/eRMC CLI and confirm that no power supplies are shown as indicted or in a failed state.

If power supplies on a specific AC grid report a fault after power maintenance is performed (i.e. after the input power to that AC grid is restored) and are indicted:

1. Physically reseat the affected power supplies. This is required in order to remove all STBY power from the affected power supplies.
2. Manually acquit the impacted power supplies.
3. Confirm that power to all the chassis in the complex is now fully redundant.

Example log entry:

In this example, scheduled maintenance on the power infrastructure was performed at 2020-07-11 06:48. Input power was removed from AC grid 1, which energizes PSUs 0 and 1 on both chassis and the RMC's PSU1.
RMC/eRMC> show logs iel
340196 2020-07-11 06:48:22Z PET r001i16c - *CRIT (5) FF00010110FFFF44 RMC_POWER_REDUNDANCY_LOST [rack1/rmc_u16/psu1]
340198 2020-07-11 06:48:23Z PET r001i06b 0 *CRIT (5) FF01000106FFFF44 POWER_SUPPLY_INPUT_LOST [rack1/chassis_u6/psu0]
340199 2020-07-11 06:48:23Z PET r001i06b 0 *CRIT (5) FF01010106FFFF44 POWER_SUPPLY_INPUT_LOST [rack1/chassis_u6/psu1]
340200 2020-07-11 06:48:23Z PET r001i06b 0 *WARN (3) FFFF00010600FF40 REDUNDANCY_LOST [rack1/chassis_u6/power_zone0]
340201 2020-07-11 06:48:23Z MFW r001i16c/CAE - Info (2) FFFFFFFF00000452 CAE_EVENT_GENERATED
340202 2020-07-11 06:48:24Z MFW r001i16c/CAE - Info (2) FFFFFFFF00000452 CAE_EVENT_GENERATED
340203 2020-07-11 06:48:24Z MFW r001i16c 0 Info (2) FF01000106FFFF44 SENDING_SERVICE_EVENT [rack1/chassis_u6/psu0]
340204 2020-07-11 06:48:25Z MFW r001i16c 0 Info (2) FF01010106FFFF44 SENDING_SERVICE_EVENT [rack1/chassis_u6/psu1]
340205 2020-07-11 06:48:25Z PET r001i11b 0 *CRIT (5) FF0100010BFFFF44 POWER_SUPPLY_INPUT_LOST [rack1/chassis_u11/psu0]
340206 2020-07-11 06:48:25Z PET r001i11b 0 *CRIT (5) FF0101010BFFFF44 POWER_SUPPLY_INPUT_LOST [rack1/chassis_u11/psu1]
340207 2020-07-11 06:48:25Z PET r001i11b 0 *WARN (3) FFFF00010B00FF40 REDUNDANCY_LOST [rack1/chassis_u11/power_zone0]
340208 2020-07-11 06:48:25Z MFW r001i16c/CAE - Info (2) FFFFFFFF00000452 CAE_EVENT_GENERATED
340209 2020-07-11 06:48:26Z MFW r001i16c/CAE - Info (2) FFFFFFFF00000452 CAE_EVENT_GENERATED
340210 2020-07-11 06:48:26Z MFW r001i16c 0 Info (2) FF0100010BFFFF44 SENDING_SERVICE_EVENT [rack1/chassis_u11/psu0]
340211 2020-07-11 06:48:26Z MFW r001i16c 0 Info (2) FF0101010BFFFF44 SENDING_SERVICE_EVENT [rack1/chassis_u11/psu1]

After the maintenance was completed, power was restored to the AC grid. At this point, there is a rare chance that the power supplies log a failure that prevents them from providing the required DC output.
340212 2020-07-11 11:38:02Z PET r001i06b 0 *CRIT (5) FF01000106FFFF44 POWER_SUPPLY_FAILURE_DETECTED [rack1/chassis_u6/psu0] (PSU0_STATUS,0xA0)
340213 2020-07-11 11:38:02Z PET r001i06b 0 Info (2) FF01000106FFFF44 POWER_SUPPLY_INPUT_REGAINED [rack1/chassis_u6/psu0] (PSU0_STATUS,0xA0)
340214 2020-07-11 11:38:02Z PET r001i06b 0 *CRIT (5) FF01010106FFFF44 POWER_SUPPLY_FAILURE_DETECTED [rack1/chassis_u6/psu1] (PSU1_STATUS,0xA1)
340215 2020-07-11 11:38:02Z PET r001i06b 0 Info (2) FF01010106FFFF44 POWER_SUPPLY_INPUT_REGAINED [rack1/chassis_u6/psu1] (PSU1_STATUS,0xA1)
340216 2020-07-11 11:38:02Z MFW r001i16c/CAE - Info (2) FFFFFFFF00000131 CAE_EVENT_GENERATED
340217 2020-07-11 11:38:03Z PET r001i11b 0 *CRIT (5) FF0100010BFFFF44 POWER_SUPPLY_FAILURE_DETECTED [rack1/chassis_u11/psu0] (PSU0_STATUS,0x96)
340218 2020-07-11 11:38:03Z PET r001i11b 0 Info (2) FF0100010BFFFF44 POWER_SUPPLY_INPUT_REGAINED [rack1/chassis_u11/psu0] (PSU0_STATUS,0x96)
340219 2020-07-11 11:38:03Z PET r001i11b 0 *CRIT (5) FF0101010BFFFF44 POWER_SUPPLY_FAILURE_DETECTED [rack1/chassis_u11/psu1] (PSU1_STATUS,0x97)
340220 2020-07-11 11:38:03Z PET r001i11b 0 Info (2) FF0101010BFFFF44 POWER_SUPPLY_INPUT_REGAINED [rack1/chassis_u11/psu1] (PSU1_STATUS,0x97)
340221 2020-07-11 11:38:03Z MFW r001i16c 0 *WARN (3) FF01000106FFFF44 RESOURCE_INDICTED [rack1/chassis_u6/psu0]
340222 2020-07-11 11:38:03Z MFW r001i16c/CAE - Info (2) FFFFFFFF00000131 CAE_EVENT_GENERATED
340223 2020-07-11 11:38:04Z MFW r001i16c 0 Info (2) FF01000106FFFF44 SENDING_SERVICE_EVENT [rack1/chassis_u6/psu0]
340224 2020-07-11 11:38:04Z MFW r001i16c 0 *WARN (3) FF01010106FFFF44 RESOURCE_INDICTED [rack1/chassis_u6/psu1]
340225 2020-07-11 11:38:04Z MFW r001i16c/CAE - Info (2) FFFFFFFF00000131 CAE_EVENT_GENERATED
340226 2020-07-11 11:38:05Z MFW r001i16c 0 *WARN (3) FF0100010BFFFF44 RESOURCE_INDICTED [rack1/chassis_u11/psu0]
340227 2020-07-11 11:38:05Z MFW r001i16c/CAE - Info (2) FFFFFFFF00000131 CAE_EVENT_GENERATED
340228 2020-07-11 11:38:05Z MFW r001i16c 0 Info (2) FF01010106FFFF44 SENDING_SERVICE_EVENT [rack1/chassis_u6/psu1]
340229 2020-07-11 11:38:05Z MFW r001i16c 0 *WARN (3) FF0101010BFFFF44 RESOURCE_INDICTED [rack1/chassis_u11/psu1]
340230 2020-07-11 11:38:05Z MFW r001i16c 0 Info (2) FF0100010BFFFF44 SENDING_SERVICE_EVENT [rack1/chassis_u11/psu0]
340231 2020-07-11 11:38:06Z MFW r001i16c 0 Info (2) FF0101010BFFFF44 SENDING_SERVICE_EVENT [rack1/chassis_u11/psu1]

It is very important to resolve this condition before continuing maintenance on the other AC grid. To recover from this issue, physically reseat the impacted power supplies: Standby power must be removed from each PSU in order to clear the "latched" fault condition.

This advisory will be updated if additional information becomes available.
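When reviewing a long IEL capture after power maintenance, it can help to pull out only the PSUs that logged a fault. The following is a minimal sketch, assuming the "show logs iel" output has been saved as text with one entry per line in the layout shown in the example above. The function name `faulted_psus` and the choice of which event names count as faults are illustrative assumptions; the script does not talk to the RMC and is not a substitute for checking "show health".

```python
import re

# Matches "<EVENT_NAME> [<resource path>]" inside an IEL entry, tolerating
# missing or extra spaces around the brackets.
EVENT_RE = re.compile(r"([A-Z][A-Z_]+)\s*\[\s*([^\]\s]+)\s*\]")

# Event names taken from the example log; treating exactly these two as
# "fault" events is an assumption made for this sketch.
FAULT_EVENTS = {"POWER_SUPPLY_FAILURE_DETECTED", "RESOURCE_INDICTED"}


def faulted_psus(log_lines):
    """Map each PSU resource path to the set of fault events logged for it."""
    faults = {}
    for line in log_lines:
        match = EVENT_RE.search(line)
        if not match:
            continue  # e.g. CAE_EVENT_GENERATED entries carry no resource path
        event, resource = match.groups()
        if event in FAULT_EVENTS and "/psu" in resource:
            faults.setdefault(resource, set()).add(event)
    return faults


# Three entries copied from the example log above.
SAMPLE = [
    "340212 2020-07-11 11:38:02Z PET r001i06b 0 *CRIT (5) FF01000106FFFF44 "
    "POWER_SUPPLY_FAILURE_DETECTED [rack1/chassis_u6/psu0] (PSU0_STATUS,0xA0)",
    "340216 2020-07-11 11:38:02Z MFW r001i16c/CAE - Info (2) "
    "FFFFFFFF00000131 CAE_EVENT_GENERATED",
    "340221 2020-07-11 11:38:03Z MFW r001i16c 0 *WARN (3) FF01000106FFFF44 "
    "RESOURCE_INDICTED [rack1/chassis_u6/psu0]",
]

for psu, events in sorted(faulted_psus(SAMPLE).items()):
    print(psu, sorted(events))
```

Any PSU reported by such a scan would still need to be reseated and acquitted as described above, and full redundancy should then be confirmed from the RMC/eRMC CLI.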