
Document Version | Release Date | Details
4 | April 12, 2023 | Updated the Resolution section with the new CPLD version 0x14, and clarified the process to upgrade the components.
3 | November 15, 2022 | Updated the steps in the Resolution to update the CPLD and FPGA firmware.
2 | October 25, 2022 | Updated the Resolution steps.
1 | September 26, 2022 | Original Document Release.

IMPORTANT: The firmware updates mentioned in the Resolution section are required to prevent the issue detailed below. If this notification is disregarded and the recommended resolution is not performed, the servers may become unable to power on.

DESCRIPTION

After a cold power cycle or an auxiliary (aux) power cycle, an HPE ProLiant XL675d server configured with modular NVIDIA HGX GPUs (40GB and/or 80GB, as listed in the Scope section below) may remain in a state in which it is unable to power on.

When this occurs, the four system LEDs on the front of the server continually repeat a pattern of 13 blinks followed by a short pause.

In addition to the failure to power on and the LED blink pattern described above, the following message appears in the Integrated Lights-Out (iLO) Integrated Management Log (IML):

Server Critical Fault (Service Information: Power-On Fault, GPU Board/Modules, GPU Board VRD (02h))
SCOPE

Any HPE ProLiant XL675d platform with either of the following NVIDIA modular A100 GPUs:

  * NVIDIA HGX A100 40GB 8GPU AIR FIO BS BRD (R3V64A)
    Spares Kit: SPS-PCA,A100 SXM4 x8 40GB Air-cooled w/HS - P22765-001
  * NVIDIA HGX A100 80GB 8GPU AIR FIO BS BRD (R7B94A)
    Spares Kit: SPS-PCA,A100 HGX x8 80GB Air-Cooled w/HS - P38948-001

Note: This issue does not affect ProLiant XL675d systems with traditional PCIe form-factor GPUs.
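For sites with many servers, the scope check above can be scripted. The following is only an illustrative sketch, not an HPE tool or API: it assumes you have already gathered each server's model string (for example "ProLiant XL675d Gen10 Plus") and a list of installed GPU board or spares-kit part numbers from your own inventory. The function name `is_in_scope` is hypothetical; the part numbers are the ones listed in the Scope section.

```python
from collections.abc import Iterable

# Part numbers listed in the Scope section of this advisory.
AFFECTED_PARTS = {
    "R3V64A": "NVIDIA HGX A100 40GB 8GPU AIR FIO BS BRD",
    "R7B94A": "NVIDIA HGX A100 80GB 8GPU AIR FIO BS BRD",
    "P22765-001": "Spares Kit SPS-PCA,A100 SXM4 x8 40GB Air-cooled w/HS",
    "P38948-001": "Spares Kit SPS-PCA,A100 HGX x8 80GB Air-Cooled w/HS",
}


def is_in_scope(model: str, part_numbers: Iterable[str]) -> bool:
    """Return True if the server is an XL675d with an affected HGX A100 board.

    Hypothetical inputs: `model` is a product-name string and `part_numbers`
    are HPE part numbers taken from your own asset inventory. PCIe
    form-factor GPUs are out of scope, so their part numbers never match.
    """
    # Only ProLiant XL675d platforms are affected (case-insensitive match).
    if "XL675D" not in model.upper():
        return False
    # Normalize whitespace and case before looking up each part number.
    return any(p.strip().upper() in AFFECTED_PARTS for p in part_numbers)
```

For example, `is_in_scope("ProLiant XL675d Gen10 Plus", ["R7B94A"])` returns True, while the same part number on a different server model returns False.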
RESOLUTION

The issue has been fixed in NVIDIA GPU baseboard FPGA firmware version v3.14. This new FPGA firmware has a mandatory dependency on server CPLD version 0x14.

On ProLiant XL675d systems with NVIDIA modular HGX A100 40GB/80GB GPUs, update the recommended firmware as soon as possible.

It is critically important to follow the sequence detailed below. This is necessary because of the interdependency between the images and because auxiliary power is automatically cycled during the sequence.

  1. Download these firmware images:
       * Latest iLO firmware.
       * FPGA firmware version v3.14.
       * CPLD firmware version 0x14.
  2. Leave the server booted and at the OS prompt. Do not shut down the server, because doing so risks triggering the issue before it can be mitigated with the new firmware.
  3. With the server booted throughout, update these firmware images from the "Update FW" tab of the iLO web interface, in this exact sequence:
       a. Update the iLO firmware. The iLO will reset afterwards.
       b. Update the baseboard FPGA by following the instructions in the image, but do not shut down or reboot yet.
       c. Update the server CPLD by following the instructions in the image, but do not shut down or reboot yet.
  4. With that set of firmware now flashed, perform a graceful shutdown of the server so that the new firmware can take effect.
  5. When the server reaches standby power, it automatically cycles auxiliary power and the iLO connection disconnects. After approximately 30 seconds, auxiliary power automatically returns, the iLO connection is restored, and the server boots normally.
  6. Verify that server CPLD 0x14 and NVIDIA FPGA v3.14 are in place by selecting the "Firmware & OS Software" panel, then the "Firmware" tab in iLO.

If the issue described in the Description section has already occurred, contact your local country HPE Customer Support or log a case via the HPE Support Center, and reference document: a00127196en_us.

RECEIVE PROACTIVE UPDATES: Receive support alerts (such as Customer Advisories), as well as updates on drivers, software, firmware, and customer replaceable components, proactively in your e-mail through HPE Support Alerts. Sign up for Support Alerts at the HPE Email Preference Center.

NAVIGATION TIP: For hints on navigating HPE.com to locate the latest drivers, patches, and other support software downloads, refer to the Navigation Tips document.

SEARCH TIP: For hints on locating similar documents on HPE.com, refer to the Search Tips document.
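When auditing several servers, the version check and the update-order rule from the Resolution section can be expressed as a small script. This is a hedged sketch that performs no updates: it assumes the CPLD and FPGA versions have already been read from the iLO "Firmware" tab as strings shaped like "0x14" and "v3.14", and the helper names (`firmware_ok`, `plan_ok`) and step labels ("iLO", "FPGA", "CPLD", "shutdown", "reboot") are hypothetical.

```python
from collections.abc import Sequence

# Minimum versions named in the Resolution section.
REQUIRED_CPLD = 0x14
REQUIRED_FPGA = (3, 14)

# Required update order from the Resolution section (hypothetical step labels).
REQUIRED_ORDER = ("iLO", "FPGA", "CPLD")


def parse_cpld(version: str) -> int:
    """Parse a CPLD version such as '0x14' as hexadecimal.

    Assumption: the CPLD version is displayed in this hex form.
    """
    return int(version.strip(), 16)


def parse_fpga(version: str) -> tuple[int, int]:
    """Parse an FPGA version such as 'v3.14' into (major, minor).

    Assumption: the version is displayed as 'v<major>.<minor>'.
    """
    major, minor = version.strip().lstrip("vV").split(".")[:2]
    return int(major), int(minor)


def firmware_ok(cpld: str, fpga: str) -> bool:
    """True if both components are at or above the fixed versions."""
    return parse_cpld(cpld) >= REQUIRED_CPLD and parse_fpga(fpga) >= REQUIRED_FPGA


def plan_ok(plan: Sequence[str]) -> bool:
    """Check a planned list of steps against the advisory's sequence:
    iLO, then FPGA, then CPLD, with no 'shutdown' or 'reboot' before the
    CPLD update and a graceful 'shutdown' after it."""
    steps = list(plan)
    try:
        positions = [steps.index(name) for name in REQUIRED_ORDER]
    except ValueError:
        return False  # one of the three required updates is missing
    if positions != sorted(positions):
        return False  # updates are out of order
    cpld_at = positions[-1]
    if any(s in ("shutdown", "reboot") for s in steps[:cpld_at]):
        return False  # server was powered down before the CPLD update
    return "shutdown" in steps[cpld_at + 1:]
```

For example, `firmware_ok("0x14", "v3.14")` and `plan_ok(["iLO", "FPGA", "CPLD", "shutdown"])` both return True. The FPGA version is parsed into an integer tuple on purpose: compared as a decimal number, 3.9 would wrongly rank above 3.14.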
Operating Systems Affected: Not Applicable