...
Under the following conditions, data corruption may occur during heavy I/O on storage attached to a PERC H330 controller in a 14th generation AMD-based PowerEdge server:

- The H330 is installed in a server running a Linux-based OS with the CPU Virtualization Technology (VT) function enabled in the system BIOS.
- The server runs a VMware operating system (ESXi) with the H330 storage controller configured as a VMDirectPath I/O pass-through (PCI passthrough) device to a Linux virtual machine (VM). The risk of data corruption is limited to the VM that has the H330 connected as a passthrough device.

What is affected?

All 14G AMD servers (single or dual processor):
- R6415
- R7415
- R7425

Linux-based operating systems, including but not limited to:
- Red Hat Enterprise Linux 7.5
- Red Hat Enterprise Linux 7.6
- Ubuntu 16.04
- Ubuntu 18.04 LTS
- CentOS 7.5
- CentOS 7.6
- SLES 12 SP3/SP4
- SLES 15

All current versions of the ESXi hypervisor:
- ESXi 6.5.x
- ESXi 6.7.x

Storage controller: PERC H330 in RAID or Non-RAID mode

Summary: A specific configuration is required to encounter this issue:
- 14G AMD server + Linux OS + H330 controller
- 14G AMD server + ESXi + H330 configured as VMDirectPath I/O pass-through to a Linux VM

What is not affected?
- 14G Intel platforms
- Any storage controller other than the H330 (HBA330, H730, H740, H840, and so on)
- Windows operating systems
Root cause:

The Linux AMD_IOMMU driver uses the memory range that the BIOS reserved for the H330 both as an I/O data buffer and as an I/O virtual address (IOVA) for accessing a different physical memory area, which results in file system corruption.

In addition, the IVRS table in the BIOS provides the starting address and length of the exclusion range for the H330. When the AMD IOMMU driver sets up the exclusion range, it adds the IVRS-provided length to the starting address and uses the result as the ending address when programming the exclusion range limit register in the IOMMU. To get the correct ending address, the driver should add the length to the starting address and then subtract one. Without that subtraction, the exclusion range covers one extra page past the end of the BIOS-specified exclusion range. If the kernel uses the address of this extra page as an IOVA, data corruption results.

VMware/ESXi: Configuring a VM to use the H330 controller in VMDirectPath I/O mode may result in storage and memory corruption for that VM.
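The off-by-one in the exclusion range calculation can be illustrated with a short Python sketch. This is not the kernel driver code. It assumes 4 KiB pages, a limit value that is masked to a page boundary, and a limit that names the last excluded page inclusively. The function names and the IVRS start address and length are hypothetical values chosen for illustration.

```python
PAGE_SIZE = 4096                # assumption: 4 KiB IOMMU pages
PAGE_MASK = ~(PAGE_SIZE - 1)    # clears the offset bits within a page


def limit_start_plus_length(start, length):
    """Limit as described for the affected driver: start + length."""
    return (start + length) & PAGE_MASK


def limit_start_plus_length_minus_one(start, length):
    """Limit with the correction applied: start + length - 1."""
    return (start + length - 1) & PAGE_MASK


def excluded_pages(start, limit):
    """Base addresses of every page from start through the page holding limit."""
    first = start & PAGE_MASK
    return list(range(first, (limit & PAGE_MASK) + PAGE_SIZE, PAGE_SIZE))


# Hypothetical IVRS values: a page-aligned exclusion range of four pages.
ivrs_start = 0x80000000
ivrs_length = 4 * PAGE_SIZE

too_wide = excluded_pages(
    ivrs_start, limit_start_plus_length(ivrs_start, ivrs_length))
correct = excluded_pages(
    ivrs_start, limit_start_plus_length_minus_one(ivrs_start, ivrs_length))

# Pages that are excluded only by the uncorrected calculation.
extra_pages = sorted(set(too_wide) - set(correct))
```

Under these assumptions, the uncorrected calculation excludes five pages instead of four, and `extra_pages` holds the single page at `0x80004000`, immediately past the end of the BIOS-specified range.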
Dell engineering is aware of the issue, and a BIOS workaround is available in BIOS version 1.8.7 or later:

- R7425 - https://www.dell.com/support/home/product-support/product/poweredge-r7425/drivers
- R6415 - https://www.dell.com/support/home/product-support/product/poweredge-r6415/drivers
- R7415 - https://www.dell.com/support/home/product-support/product/poweredge-r7415/drivers

Dell Technologies recommends that you update the BIOS to 1.8.7 or later.

A kernel fix is also in progress by Linux vendors and VMware. Once an updated kernel package is available from the Linux vendors and from VMware, it may provide an alternative solution to this problem. Dell attempts to note information about the fixes from Linux vendors and VMware here as it becomes available.

- VMware KB: https://knowledge.broadcom.com/external/article?legacyId=68068
- Red Hat KB: https://access.redhat.com/solutions/3978031 (requires login)
- SUSE KB: https://www.suse.com/support/kb/doc/?id=000019431
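To check whether a Linux host already runs BIOS 1.8.7 or later, a comparison along the following lines may help. This is a minimal sketch, not a Dell tool. The helper names are hypothetical, and it assumes the BIOS version is a purely numeric dotted string read from the Linux sysfs DMI entry `/sys/class/dmi/id/bios_version`.

```python
from pathlib import Path

MINIMUM_BIOS = (1, 8, 7)   # BIOS version named in this article


def parse_version(text):
    """Turn a dotted version string such as '1.8.7' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_bios_workaround(version_text):
    """Return True when the version is 1.8.7 or later."""
    return parse_version(version_text) >= MINIMUM_BIOS


def read_bios_version(path="/sys/class/dmi/id/bios_version"):
    """Read the BIOS version string that Linux exposes through sysfs."""
    return Path(path).read_text().strip()
```

On an affected server, `has_bios_workaround(read_bios_version())` returns `False` when the installed BIOS is older than 1.8.7. Comparing tuples of integers rather than raw strings keeps a version such as 1.10.0 ordered after 1.8.7.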