
OPERATIONAL DEFECT DATABASE
[clone of RHELPLAN-152599]

Description of problem:
Customer reported a performance regression from RHEL 7 to RHEL 8 on Intel Skylake.

Version-Release number of selected component (if applicable):

How reproducible:
The customer used the following example to demonstrate the problem.

    perf bench mem memcpy -f default --nr_loops 500 --size 3MB

That test achieved 8.5 GB/sec on RHEL-7.5, and only 5.3 GB/sec on RHEL-8.4. This is easily reproducible.

Steps to Reproduce:
Run the above test on RHEL-7.5 and again on RHEL-8.4.

The customer had a 2-socket Skylake server. I have been able to reproduce this on a 2-socket Cascade Lake server.

Additional info:
Thanks to great triaging help from Carlos O'Donell, the problem is understood. It turns out glibc is selecting a sub-optimal memcpy routine for that processor.

On RHEL-7.5, it used the "__memcpy_ssse3_back()" routine, which was the optimal choice then. On RHEL-8.4, the glibc memcpy routine used is "__memmove_avx_unaligned_erms()". On RHEL-8.4, if the "Prefer_ERMS" attribute is given to glibc, then the faster "__memmove_erms()" is used.

For example, slow and fast cases:

    perf bench mem memcpy -f default --nr_loops 500 --size 3MB |grep GB
    5.468937 GB/sec

    GLIBC_TUNABLES=glibc.cpu.hwcaps=Prefer_ERMS \
    > perf bench mem memcpy -f default --nr_loops 500 --size 3MB |grep GB
    12.508272 GB/sec

I've also attached a simple memcpy reproducer to demonstrate the problem, as shown below:

    gcc -O memcpy.c -o memcpy

    ./memcpy --help
    USAGE: ./memcpy size-in-MB loop-iterations

    ./memcpy 3 500
    Rate for 500 3MB memcpy iterations: 7.30 GB/sec

    GLIBC_TUNABLES=glibc.cpu.hwcaps=Prefer_ERMS ./memcpy 3 500
    Rate for 500 3MB memcpy iterations: 27.29 GB/sec

The customer's system did boot with mitigations=off, and with transparent_hugepages (THP) disabled. Neither is needed to reproduce this problem, but disabling THP does enable the simple memcpy reproducer to achieve much higher performance.
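
The attached memcpy.c is not reproduced in this report. For readers without access to the attachment, the core of a reproducer of this kind might look roughly like the sketch below. It is an illustration only, not the attached file: the function name memcpy_rate_gb_per_sec is hypothetical, timing uses the standard clock() call (CPU time, which should track elapsed time closely for a single-threaded copy), and the inline-asm barrier assumes GCC or Clang.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not taken from the attached memcpy.c.
 * Copies a buffer of size_mb MiB `loops` times with memcpy() and returns
 * the rate in GB/sec (decimal gigabytes). Returns a negative value on bad
 * input, allocation failure, or an unmeasurably short run. */
double memcpy_rate_gb_per_sec(size_t size_mb, unsigned loops)
{
    if (size_mb == 0 || loops == 0)
        return -1.0;

    size_t bytes = size_mb * 1024 * 1024;
    char *src = malloc(bytes);
    char *dst = malloc(bytes);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers before timing so page faults are not measured. */
    memset(src, 1, bytes);
    memset(dst, 0, bytes);

    clock_t start = clock();
    for (unsigned i = 0; i < loops; i++) {
        memcpy(dst, src, bytes);
        /* Compiler barrier (GCC/Clang syntax) so the copies are not
         * optimized away as dead stores. */
        __asm__ __volatile__("" : : "r"(dst) : "memory");
    }
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    double rate = secs > 0.0 ? ((double)bytes * loops / 1e9) / secs : -1.0;

    free(src);
    free(dst);
    return rate;
}
```

Calling memcpy_rate_gb_per_sec(3, 500) from a small main() corresponds to the "./memcpy 3 500" invocation above. Because glibc selects its memcpy implementation at program startup, running the same binary with and without GLIBC_TUNABLES=glibc.cpu.hwcaps=Prefer_ERMS should exercise the two code paths being compared. Absolute numbers will differ from those quoted in this report.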
Done-Errata