Extreme-Scale Computational Science Discovery in Fluid Dynamics and Related Disciplines
Compressible flow simulation on Frontier and El Capitan architectures
12:42 pm – 12:54 pm
We present advances in compressible flow algorithms, together with their optimization and performance on the Oak Ridge Frontier and LLNL El Capitan early access systems. OpenACC is used as the directive-based offloading tool, which we demonstrate to be sufficiently flexible for multi-phase and multi-physics flow configurations in MFC, a large Fortran codebase [Bryngelson et al., Comp. Phys. Comm. (2021)]. Performance on AMD GPU architectures reveals meaningful differences between the hardware and compiler profiles of NVIDIA GPUs and AMD (A/G)PU hardware. NVIDIA hardware generally offers larger caches and higher memory bandwidth, enabling close to compute-roofline performance with the NVHPC compilers. AMD hardware is partially limited by its memory bandwidth and, in our case, by its restriction to the Cray Compiler Environment (CCE). Even with new CCE releases, compiler bugs in AMD GPU offloading persist, but they can be resolved via minor workarounds. Overall, results are closer to the compute roofline on current NVIDIA hardware, though new devices from both vendors may narrow this gap. Performance studies are conducted where possible and approved for release.
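For readers unfamiliar with the roofline terminology used above, the following is a minimal sketch of the standard roofline bound, in which attainable throughput is the lesser of the compute peak and the product of memory bandwidth and arithmetic intensity. The function names and all device numbers are hypothetical placeholders chosen for illustration; they are not measurements from Frontier, El Capitan, or MFC.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Roofline bound: min(compute peak, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical device: placeholder values, not real hardware specifications.
    peak = 10000.0      # GFLOP/s
    bandwidth = 1000.0  # GB/s

    ridge = ridge_point(peak, bandwidth)
    for intensity in (0.5, 10.0, 50.0):  # FLOP/byte, illustrative kernels
        bound = attainable_gflops(peak, bandwidth, intensity)
        regime = "memory-bound" if intensity < ridge else "compute-bound"
        print(f"intensity={intensity:5.1f} FLOP/byte -> "
              f"bound={bound:8.1f} GFLOP/s ({regime})")
```

Under this model, a kernel whose arithmetic intensity lies below the ridge point is limited by memory bandwidth rather than by the compute peak, which is the sense in which lower bandwidth keeps a code further from the compute roofline.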