...
Any of the HPE servers listed in the Hardware Platforms Affected section below that has the CC6 power state enabled and is configured with AMD EPYC 7XX2 Gen2 series processors will stop responding, and may show a Blue Screen error on Windows or a kernel panic on Linux, if it has been up for approximately 1044 days since the last system reset. The number of days until failure may vary depending on the spread spectrum and reference clock frequency. Note: HPE ROM Workload settings indirectly enable the CC6 state. AMD identifies this behavior as Erratum 1474.
Any HPE server configured with AMD EPYC 7XX2 Gen2 processors that has been up for approximately 1044 days since the last system reset.
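As a rough illustration of the approximately 1044-day window described above, the following Python sketch estimates the projected failure date from a system's boot time. The function names are hypothetical, the 1044-day figure is only the approximate value given in this advisory, and the Linux helper assumes that the OS uptime reported in /proc/uptime matches the time since the last system reset.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Approximate figure from the advisory; the real value may vary with
# spread spectrum and reference clock frequency.
FAILURE_THRESHOLD_DAYS = 1044


def projected_failure(boot_time, threshold_days=FAILURE_THRESHOLD_DAYS):
    """Return the approximate date at which the threshold is reached."""
    return boot_time + timedelta(days=threshold_days)


def days_remaining(boot_time, now, threshold_days=FAILURE_THRESHOLD_DAYS):
    """Whole days left before the projected failure (negative if already past)."""
    return (projected_failure(boot_time, threshold_days) - now).days


def linux_boot_time(now, uptime_path="/proc/uptime"):
    """Derive the boot time on Linux from the first field of /proc/uptime."""
    uptime_seconds = float(Path(uptime_path).read_text().split()[0])
    return now - timedelta(seconds=uptime_seconds)


if __name__ == "__main__":
    now = datetime.now()
    if Path("/proc/uptime").exists():
        boot = linux_boot_time(now)
        print("Last reset (approx.):", boot)
        print("Projected failure (approx.):", projected_failure(boot))
        print("Days remaining (approx.):", days_remaining(boot, now))
```

Because the threshold is approximate, it is prudent to schedule the reboot with a comfortable margin rather than close to the computed date.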
No action is needed for the following workload profiles, which do not enable the CC6 state by default, unless customizations have been applied: Virtualization – Max Performance, Low Latency, Transactional Application Processing, and High-Performance Compute (HPC).

The system is at risk of experiencing this issue if the CC6 power state has been explicitly enabled, or if it is running one of the following workload profiles: General Power Efficient Compute, Virtualization – Power Efficient, General Peak Frequency Compute, General Throughput Compute, Mission Critical, Decision Support, Graphic Processing, I/O Throughput, and Custom.

The following options are available to prevent this issue from occurring:

Recommended - Schedule a reboot of the system before the projected time of failure.
Determine how long the system has been up since the last reset. Depending on the OS, there are multiple ways to determine this uptime.
Windows: Go to the "Performance" tab in Task Manager. The uptime is displayed in the "CPU" section. Alternatively, run the following at the command line: systeminfo | find "System Boot Time"
Linux: Use the "uptime" command.
VMware: Use the ESXi command shell and the "uptime" command.
A reboot (reset) should be scheduled prior to the projected time of failure. Once a reboot (reset) is performed, the counter is reset. If CC6 is not disabled from the RBSU menu, the above must be repeated before the next projected failure time.

Disable the CC6 power state from the BIOS/Platform Configuration (RBSU).
If a system has a workload profile of General Power Efficient Compute or Virtualization – Power Efficient, then a custom workload profile must be selected to disable the CC6 state.
If a system has a workload profile of General Peak Frequency Compute, General Throughput Compute, Mission Critical, Decision Support, Graphic Processing, I/O Throughput, or Custom, ensure that the "Minimum Processor Idle Power Core C-State" option has "No C-states" selected to disable the CC6 state.
This option can be found in the RBSU menu under "Power and Performance Options".

Disable the CC6 state during runtime.
Note: This method requires third-party utilities available from the web. HPE is not responsible for content outside its domain.
Writing 0x80808 to CSTATE_CONFIG (MSR 0xC001_0296) on all cores before the projected failure time disables the CC6 state during runtime. After any reset event, this process must be repeated to disable the CC6 state again.
Windows: Write 0x80808 to CSTATE_CONFIG (MSR 0xC001_0296) on all cores by using a Read & Write Utility.
Linux: Use the cpupower utility with the command "cpupower idle-set -d 2".
VMware: Change the ESXi power profile to "High Performance" with the command "esxcli hardware power policy set -i 1".
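Because the runtime setting is lost after any reset, it can be useful to confirm on Linux that the idle state is still disabled. The following read-only Python sketch is one possible check, not an HPE-provided tool. It assumes the standard Linux cpuidle sysfs layout (/sys/devices/system/cpu/cpuN/cpuidle/stateM) and assumes that idle state index 2 is the one that maps to CC6, as implied by the "-d 2" argument above; the function names are hypothetical.

```python
from pathlib import Path


def idle_state_status(state_index=2, cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (state_name, disabled)} for one cpuidle state index.

    Assumes the Linux cpuidle sysfs layout; returns {} if it is not present.
    """
    root = Path(cpu_root)
    if not root.is_dir():
        return {}
    status = {}
    pattern = f"cpu[0-9]*/cpuidle/state{state_index}"
    for state_dir in sorted(root.glob(pattern)):
        name = (state_dir / "name").read_text().strip()
        # The "disable" file holds 0 when the state is enabled.
        disabled = (state_dir / "disable").read_text().strip() != "0"
        cpu_name = state_dir.parent.parent.name
        status[cpu_name] = (name, disabled)
    return status


def all_disabled(status):
    """True only if at least one CPU was found and every one has the state disabled."""
    return bool(status) and all(disabled for _, disabled in status.values())


if __name__ == "__main__":
    result = idle_state_status()
    if not result:
        print("No cpuidle information found for this state index.")
    else:
        for cpu, (name, disabled) in result.items():
            print(f"{cpu}: {name} disabled={disabled}")
        print("Disabled on all CPUs:", all_disabled(result))
```

The state name printed for each CPU should be checked against the platform's idle states before relying on the result, since the index-to-state mapping depends on the idle driver in use.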