HPE ProLiant Gen10 Plus and Gen10 Plus V2 servers and HPE Apollo Gen10 Plus servers may experience uncorrectable PCIe bus errors. Affected servers are configured with AMD EPYC 7xx2- or 7xx3-series processors, where "xx" stands for any characters that form an AMD processor model number.

The failure message displayed in the Integrated Management Log (IML) may resemble the following examples:

    Uncorrectable PCI Express Error Detected. Slot 3 (Segment 0x0, Bus 0x43, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x40000 ACTION: Update the firmware of the failing device. If the issue persists, replace the device.

    Uncorrectable PCI Express Error Detected. Slot 3 (Segment 0x0, Bus 0x43, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x44000 ACTION: Update the firmware of the failing device. If the issue persists, replace the device.

    Uncorrectable PCI Express Error Detected. Slot 7 (Segment 0x0, Bus 0xCB, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x4000 ACTION: Update the firmware of the failing device. If the issue persists, replace the device.

These IML entries indicate a "completion timeout" error signaled by an endpoint PCIe option, usually a device capable of high-bandwidth data transfers such as an InfiniBand option card or a GPU. The Uncorrectable Error Status value is a bit mask from the PCIe Uncorrectable Error Status register: 0x4000 is the Completion Timeout bit, 0x40000 is the Malformed TLP bit, and 0x44000 has both bits set.

Mellanox network and InfiniBand adapters running older firmware may signal only an uncorrectable error status of 0x40000, which indicates a malformed TLP error caused by a bug. The bug is fixed in a firmware update that can be downloaded here. Adapters with the updated firmware signal an uncorrectable error status of 0x44000.
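As a minimal sketch, the status value from an IML entry can be decoded bit by bit to see which error types it reports. The function name `decode_uncor_status` is hypothetical (it is not an HPE or Linux tool), and the bit names are assumed from the Uncorrectable Error Status register in the PCIe Base Specification; only bits 4, 5, and 12 through 22 are covered.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every bit set in a PCIe
# Uncorrectable Error Status value, such as the one reported in the IML.
# Bit positions follow the PCIe Base Specification (bits 4, 5, 12-22 only).
decode_uncor_status() {
  local status bit
  status=$(( $1 ))  # shell arithmetic accepts hex input such as 0x44000
  for bit in 4 5 12 13 14 15 16 17 18 19 20 21 22; do
    # Test whether this bit is set in the status mask.
    if [ $(( $status & (1 << $bit) )) -ne 0 ]; then
      case $bit in
        4)  echo "Data Link Protocol Error" ;;
        5)  echo "Surprise Down Error" ;;
        12) echo "Poisoned TLP" ;;
        13) echo "Flow Control Protocol Error" ;;
        14) echo "Completion Timeout" ;;
        15) echo "Completer Abort" ;;
        16) echo "Unexpected Completion" ;;
        17) echo "Receiver Overflow" ;;
        18) echo "Malformed TLP" ;;
        19) echo "ECRC Error" ;;
        20) echo "Unsupported Request" ;;
        21) echo "ACS Violation" ;;
        22) echo "Uncorrectable Internal Error" ;;
      esac
    fi
  done
}
```

For example, `decode_uncor_status 0x44000` prints "Completion Timeout" followed by "Malformed TLP", while `decode_uncor_status 0x40000` prints only "Malformed TLP".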
Affected server platforms are listed in the Products section.
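The processor naming pattern described above can be checked from the operating system as a rough first filter. The function `epyc_series` is hypothetical; it only applies the "7xx2" / "7xx3" pattern from this advisory to a model name string (for example, the `model name` line in `/proc/cpuinfo` on Linux) and does not consult the Products list, so the server model must still be checked separately.

```shell
#!/bin/sh
# Hypothetical helper: classify an AMD EPYC model name string using the
# "7xx2" / "7xx3" pattern from this advisory ("x" = any single character).
# Prints 7xx2, 7xx3, or other.
epyc_series() {
  case $1 in
    *"EPYC 7"??2*) echo "7xx2" ;;  # e.g. EPYC 7502, 7H12, 7F52
    *"EPYC 7"??3*) echo "7xx3" ;;  # e.g. EPYC 7543, 75F3, 7773X
    *)             echo "other" ;;
  esac
}
```

On Linux it could be called as `epyc_series "$(grep -m 1 'model name' /proc/cpuinfo)"`. A result of 7xx3 means the System ROM 3.00 update and the L1 IO Drop Chain Enable setting apply before the other configuration changes.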
If the server is configured with an AMD EPYC 7xx3 processor, first update the System ROM to version 3.00 or later. After updating the ROM, reboot the server and press F9 during Power-On Self-Test (POST) to enter the System Utilities menu. Navigate to System Configuration > BIOS/Platform Configuration (RBSU). From the RBSU menu, press CTRL+A to enter the Services menu, and select L1 IO Drop Chain Enable > Enabled. Press F10 to save, then press ESC to return to the RBSU menu. Then follow the instructions below to further optimize the server settings.

Servers configured with AMD EPYC 7xx2 or 7xx3 processors may also have sub-optimal configuration settings that contribute to the failure. HPE has consulted with AMD to provide recommended values for configuration options in System Utilities. Modify the settings below as indicated. Not every setting is available on every server; if a setting is unavailable, skip it.

First, if necessary, reboot the server and press F9 during POST to enter the System Utilities menu. At the System Utilities menu, navigate to System Configuration > BIOS/Platform Configuration (RBSU). Each navigation path below starts from this "BIOS/Platform Configuration (RBSU)" menu. After changing each setting, press F10 to save it.

1. Set the Workload Profile to "Custom". Select Workload Profile > Custom. This selection is required for the settings that follow to be available.
2. Disable Infinity Fabric Power Management. Navigate to Power and Performance Options > Advanced Power Options > Infinity Fabric Power Management > Disable.
3. Set the Infinity Fabric Performance State. Navigate to Power and Performance Options > Infinity Fabric Performance State > P0.
4. Configure the AMD NBIO LCLK DPM Level. Navigate to Power and Performance Options > I/O Options > NBIO LCLK DPM Level. Seven NBIO LCLK options are listed; set each one to Static High.
5. Disable C-State Efficiency Mode. Navigate to Power and Performance Options > C-State Efficiency Mode > Disable.
6. Disable Data Fabric C-States. Navigate to Power and Performance Options > Data Fabric C-State Enable > Disable.
7. Disable Access Control Service. Navigate to Virtualization Options > Access Control Service > Disable.
8. Disable Active State Power Management. Navigate to PCIe Device Configuration > PCIe Power Management (ASPM) > Disabled.
9. Set the minimum C-state. Navigate to Power and Performance Options > Minimum Processor Idle Power Core C-State. If the "cpupower" package is installed in the operating system, select C6. Otherwise, select No C-States.

In addition, configure the operating system to run the following commands at boot.

Configure cpupower with the following command, which disables CPU idle state 2:

    cpupower idle-set -d 2

Disable Access Control Services (ACS) on all PCIe devices. An example command that can be run on Linux platforms follows. Running it may print output indicating that it cannot be applied to some PCIe devices.
This is expected behavior.

    for i in $(lspci | cut -f 1 -d " "); do setpci -v -s $i ecap_acs+6.w=0; done

Note: These commands do not persist across reboots. Add them to a startup script so that they run again after every reboot.

Revision History

Document Version | Release Date | Details
3 | June 26, 2025 | Updated the Resolution section.
2 | October 22, 2024 | Updated the Resolution section to correct the cpupower command and added a note.
1 | May 21, 2024 | Original document release.
Operating Systems Affected: Not Applicable