...
Savitha Pareek, HPC and AI Innovation Lab, November 2019

AMD recently announced its 2nd generation EPYC processors (codenamed "Rome"), which support up to 64 cores per socket, and Dell EMC has just released High Performance Computing (HPC) servers designed from the ground up to take full advantage of these new processors. We have been evaluating applications on these servers in our HPC and AI Innovation Lab, including the molecular dynamics application GROningen MAchine for Chemical Simulations (GROMACS), and we report our GROMACS findings in this blog.
GROMACS is a free and open-source parallel molecular dynamics package designed for simulations of biochemical molecules such as proteins, lipids, and nucleic acids. It is used by a wide variety of researchers, particularly for biomolecular and chemistry simulations, and it supports all the usual algorithms expected from a modern molecular dynamics implementation. The latest versions are available under the GNU Lesser General Public License (LGPL). The code is written mainly in C and uses both MPI and OpenMP parallelism.

This blog describes the performance of GROMACS on two-socket PowerEdge servers using the latest 2nd generation AMD EPYC Rome processors listed in Table 1(a). For this study we ran all benchmarks on a single server equipped with two processors, with only one job running on the server at a time. We compared the performance of the Rome (7xx2 series) based PowerEdge servers against a previous generation Dell EMC PowerEdge server equipped with the 1st generation AMD EPYC Naples (7xx1 series) processor listed in Table 1(b).

Table 1(a) - Rome CPU models evaluated for the single node study

CPU    Cores/Socket   Config       Base frequency   TDP
7742   64c            4c per CCX   2.25 GHz         225W
7702   64c            4c per CCX   2.0 GHz          200W
7502   32c            4c per CCX   2.5 GHz          180W
7452   32c            4c per CCX   2.35 GHz         155W
7402   24c            3c per CCX   2.8 GHz          180W

Table 1(b) - Naples CPU model evaluated for comparison

CPU    Cores/Socket   Config       Base frequency   TDP
7601   32c            4c per CCX   2.2 GHz          180W

The server configurations are given in Table 2(a), and the benchmark datasets are listed in Table 2(b).

Table 2(a) - Testbed

Component          Rome platform                        Naples platform
Processor          As shown in Table 1(a)               As shown in Table 1(b)
Memory             256 GB, 16 x 16 GB 3200 MT/s DDR4    256 GB, 16 x 16 GB 2400 MT/s DDR4
Operating System   Red Hat Enterprise Linux 7.6         Red Hat Enterprise Linux 7.5
Kernel             3.10.0-957.27.2.el7.x86_64           3.10.0-862.el7.x86_64
Application        GROMACS 2019.2

Table 2(b) - Benchmark datasets used for the GROMACS performance evaluation on Rome

Dataset                  Details
Water Molecule           1536K and 3072K
HECBioSim                1400K and 3000K
PRACE - Lignocellulose   3M

For this single node study we compiled GROMACS version 2019.3 with the latest Open MPI and FFTW, testing several compilers, their associated high-level optimization options, and the electrostatic field load-balancing settings (PME, etc.); an illustrative build and run recipe is sketched below, after Figure 1.

We carried out two studies for this blog. The first study compared the performance of the Rome based systems with Hyperthreading enabled versus disabled; the second study investigated the performance advantage of Rome over Naples.

For the Hyperthreading study, Hyperthreading was enabled in the BIOS and the benchmark parameters were adjusted so that each benchmark ran with twice as many threads as its non-Hyperthreaded counterpart. For example, for the 24-core 7402, the non-Hyperthreaded single node runs used 48 threads (dual-processor server) and the Hyperthreaded runs used 96 threads. The results are presented in Figure 1.

Figure 1. GROMACS performance evaluation with Hyperthreading disabled vs. Hyperthreading enabled on Rome

For these benchmarks, the electrostatic method was Particle Mesh Ewald (PME) for the Water-1536K, Water-3072K, and HECBioSim (1.4M and 3M) datasets, and reaction field (RF) for the Lignocellulose-3M case.
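As a reference point for the build described above, the outline below shows one minimal way to configure GROMACS 2019.x with MPI, OpenMP, and FFTW support. It is a sketch under stated assumptions rather than the exact lab recipe: it assumes the Open MPI compiler wrappers are already in the PATH, lets GROMACS download and build its own FFTW, and selects an AVX2 SIMD level supported by EPYC Rome; the various compilers and optimization flags we swept are not reproduced here.

```bash
# Illustrative GROMACS build sketch (not the exact lab recipe).
# Assumes: GROMACS 2019.3 sources, CMake, a C/C++ toolchain, and the
# Open MPI compiler wrappers (mpicc/mpicxx) already in the PATH.
tar xf gromacs-2019.3.tar.gz
cd gromacs-2019.3 && mkdir build && cd build
cmake .. \
    -DGMX_MPI=ON \
    -DGMX_OPENMP=ON \
    -DGMX_SIMD=AVX2_256 \
    -DGMX_BUILD_OWN_FFTW=ON \
    -DCMAKE_INSTALL_PREFIX="$HOME/gromacs-2019.3"
make -j "$(nproc)"
make install
```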
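The electrostatics treatment noted under Figure 1 (PME versus reaction field) is a property of the GROMACS run input: it is set in the .mdp file that gmx grompp combines with a structure and topology into the binary .tpr file consumed by mdrun. The fragment below only illustrates where that switch lives; the file names and cutoff value are placeholders, and the published benchmark cases are typically distributed as ready-made .tpr inputs.

```bash
# Illustrative only: the electrostatics scheme is chosen in the .mdp file,
# which grompp combines with a structure and topology into a .tpr input.
cat > electrostatics.mdp <<'EOF'
coulombtype   = PME              ; Water and HECBioSim benchmarks
; coulombtype = Reaction-Field   ; Lignocellulose-3M benchmark
rcoulomb      = 1.0              ; placeholder cutoff in nm
EOF
gmx grompp -f electrostatics.mdp -c system.gro -p topol.top -o benchmark.tpr
```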
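Finally, the Hyperthreading comparison in Figure 1 comes down to how many MPI ranks and OpenMP threads mdrun is launched with. The commands below sketch this for the dual-socket 7402 example (48 hardware threads with SMT off, 96 with SMT on); the rank/thread split, input name, and step count are illustrative choices, not the exact launch lines behind the published numbers.

```bash
# Illustrative single-node launches on a dual-socket 7402 (2 x 24 cores).
# Hyperthreading (SMT) disabled in the BIOS: 48 hardware threads in total.
mpirun -np 8 gmx_mpi mdrun -s benchmark.tpr -ntomp 6 -pin on \
    -nsteps 10000 -resethway -noconfout

# Hyperthreading enabled: the same job with twice as many threads, 96 in total.
mpirun -np 8 gmx_mpi mdrun -s benchmark.tpr -ntomp 12 -pin on \
    -nsteps 10000 -resethway -noconfout
```

GROMACS prints the achieved simulation rate (ns/day) at the end of each run, and relative comparisons such as those in Figures 1 and 2 are typically ratios of that rate.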
As Figure 1 shows, the performance gains from enabling Hyperthreading (higher is better) varied across processors and datasets, but they were consistently above the non-Hyperthreaded baseline (1.0): GROMACS shows a clear performance boost with Hyperthreading enabled across the Rome SKUs.

In the second study we compared the Rome based servers to the Naples based server, with Hyperthreading enabled for all tests based on the results of the first study. We measured the performance of the Rome SKUs relative to the Naples 7601 baseline (1.0). These results are shown in Figure 2.

Figure 2. Performance evaluation across different AMD EPYC processor generations

Comparing the 32-core based servers (7551, 7601, 7452, 7502), we observed a generational performance improvement of about 50%. The 24-core Rome based 7402, despite having fewer cores than the Naples parts, still outperformed the Naples based systems by about 20-40%, depending on the benchmark. The 64-core based systems (7702, 7742) delivered close to a 250% increase in overall performance over the 32-core Naples server. Overall, the Rome results, particularly with Hyperthreading enabled, demonstrate a substantial performance improvement for GROMACS over Naples.

Conclusion

Dell EMC PowerEdge servers equipped with the AMD EPYC Rome processors offer significant single node performance gains over their previous generation Naples counterparts for applications such as GROMACS. We found a strong positive correlation between overall system performance and processor core count, and a weaker correlation with processor frequency. The 64-core Rome processors delivered a sizable performance advantage over the 24-core and 32-core processors. We are in the process of exploring how these single node gains (with and without Hyperthreading) translate into multi-node performance for molecular dynamics applications on our new Minerva cluster at the HPC and AI Innovation Lab. Watch this blog site for updates.