NVIDIA GPU temperature readings are not reported correctly by Redfish and IPMI tools on an HPE ProLiant Compute XD685 system. When GPU temperature readings are viewed through Redfish commands or IPMI, the correct values are not displayed. The following is an example of the improper value GPU temperature readings from the DCGM tool (Item: thermal test): The following is an example of the GPU temperature readings from Redfish or BMC:
Any HPE ProLiant Compute XD685 system running Ubuntu 24.04, Ubuntu 22.04, or Red Hat 9.4.
A future version of software will allow Redfish to correctly share temperature information from the GPUs. This advisory will be updated when additional information becomes available.
As a workaround, use DCGM, the open-source generic software and diagnostic tool provided by NVIDIA, to check the temperature readings. NVIDIA DCGM is a set of tools for managing and monitoring NVIDIA GPUs in large-scale, Linux-based cluster environments. It is a low-overhead tool that can perform a variety of functions, including active health monitoring, diagnostics, system validation, policies, power and clock management, group configuration, and accounting. For more information, see the DCGM User Guide.
Note: One or more links above will take you outside of the HPE website. HPE is not responsible for content outside of its domain.
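As an illustration of the DCGM workaround, the following is a minimal sketch, not an HPE-provided procedure, of reading GPU temperatures by calling `dcgmi dmon` from Python. It assumes DCGM is installed, the host engine is running, and that field ID 150 (`DCGM_FI_DEV_GPU_TEMP`) is the temperature field of interest. The function names are hypothetical, and the parser assumes each data row has the form `GPU <index> <value>`, which may differ between DCGM versions.

```python
import re
import subprocess

# Assumption: DCGM field ID 150 is DCGM_FI_DEV_GPU_TEMP (GPU temperature in Celsius).
GPU_TEMP_FIELD_ID = 150

# Assumption: data rows of `dcgmi dmon` look like "GPU 0     34".
_ROW = re.compile(r"^\s*GPU\s+(\d+)\s+(\S+)")


def parse_dmon_output(text):
    """Return {gpu_index: temperature} parsed from `dcgmi dmon` text output."""
    temps = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if not match:
            continue  # header or blank line
        try:
            temps[int(match.group(1))] = float(match.group(2))
        except ValueError:
            continue  # non-numeric reading such as "N/A"
    return temps


def read_gpu_temperatures(timeout=30):
    """Take one sample of the temperature field from every GPU via dcgmi."""
    result = subprocess.run(
        ["dcgmi", "dmon", "-e", str(GPU_TEMP_FIELD_ID), "-c", "1"],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return parse_dmon_output(result.stdout)
```

On an affected system, calling `read_gpu_temperatures()` would return a mapping of GPU index to temperature that can be used in place of the values reported through Redfish or IPMI until the fix is available.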
Operating Systems Affected: Ubuntu 22.04 LTS, Ubuntu 24.04 LTS, Rocky Linux 9