GROMACS, a scientific software package widely used for simulating biomolecular systems, plays a crucial role in understanding biological processes important for disease prevention and treatment. GROMACS can use multiple GPUs in parallel to run each simulation as quickly as possible. Over the past several years, NVIDIA and the core GROMACS developers have collaborated on a series of multi-GPU and multi-node optimizations. In this post, we showcase the latest of these improvements, made possible through the enablement of GPU Particle-mesh Ewald (PME) decomposition with GPU direct communication: a feature available in the new GROMACS 2023 release version. We observe up to 21x performance improvements enabled through this work.

Implementation of improved multi-node performance

In a previous post, we presented optimizations to multi-GPU scalability within a single node, including the development of GPU direct communications. We described how GROMACS typically assigns one GPU to PME long-range force calculations (performed through transformations to Fourier space), with the remaining GPUs used for short-range particle-particle (PP) force calculations (performed directly in real space). We also described developments to perform communications directly between these GPUs: halo exchanges between the multiple PP GPUs plus PP-PME communications. We described this work in more detail in the paper, Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS. In the GROMACS 2022 release version, the GPU-direct communication feature was extended to support compatibility with CUDA-aware MPI, enabling GPU-direct communications across multiple nodes (as well as within each node).

However, there still existed a scalability limitation associated with the restriction to a single GPU for PME. While it was possible to add more GPUs to tackle the PP force calculations, it wasn't possible to scale most simulations beyond a few nodes: when scaling up, the single PME GPU becomes the limiting factor at some point. To enable further scaling, the PME calculation must itself be decomposed across multiple GPUs. This single PME GPU limitation has been lifted in the new GROMACS 2023 release version through the introduction of PME GPU decomposition. This leverages the new NVIDIA cuFFTMp library, which can perform the required fast Fourier transforms (FFTs) in a distributed way across multiple GPUs within and across compute nodes. cuFFTMp, in turn, uses NVSHMEM, a parallel programming interface enabling fast one-sided communications. For more information about how to activate this feature, see the How to run section later in this post.
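A distributed FFT of the kind cuFFTMp performs proceeds in stages: each GPU transforms the axes it holds locally, then a global transpose (an all-to-all exchange) redistributes the data so the remaining axis becomes local. The following single-process NumPy sketch illustrates this slab-decomposition idea for a 2D transform; the function name and rank count are illustrative and not part of cuFFTMp's API.

```python
import numpy as np

def distributed_fft2(x, nranks):
    """Slab-decomposed 2D FFT: each simulated 'rank' owns a block of rows."""
    # Stage 1: every rank transforms its locally contiguous axis.
    slabs = np.split(x, nranks, axis=0)
    stage1 = np.concatenate([np.fft.fft(s, axis=1) for s in slabs], axis=0)
    # Global transpose: an all-to-all exchange in a real multi-GPU run.
    transposed = stage1.T
    # Stage 2: the formerly distributed axis is now local; transform it.
    slabs = np.split(transposed, nranks, axis=0)
    stage2 = np.concatenate([np.fft.fft(s, axis=1) for s in slabs], axis=0)
    # Transpose back to the caller's original layout.
    return stage2.T
```

The result matches `np.fft.fft2(x)`; in a real distributed implementation, the transpose steps are the communication-heavy part, which is why fast one-sided communication via NVSHMEM matters.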
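As a rough illustration of what activating these features looks like, the sketch below uses the GPU-direct communication and PME decomposition environment variables documented for GROMACS 2023; the rank counts and thread counts are placeholders for a concrete system, and the post's How to run section is the authoritative reference.

```shell
# Illustrative only: assumes a GROMACS 2023 MPI build with cuFFTMp support.
export GMX_ENABLE_DIRECT_GPU_COMM=1   # GPU-direct halo and PP-PME communication
export GMX_GPU_PME_DECOMPOSITION=1    # decompose PME across multiple GPUs
mpirun -np 8 gmx_mpi mdrun -ntomp 8 \
    -nb gpu -pme gpu -bonded gpu -update gpu \
    -npme 2   # multiple PME ranks, rather than the former single-PME-GPU limit
```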