GPU-Accelerated Microscale Modeling: FastEddy™

BACKGROUND

Figure 1: These two animations are taken from different time periods within the same simulation, in which the prescribed surface skin temperature evolves from a higher temperature (convective cell regime, top) to a lower temperature with weaker thermal forcing (convective roll regime, bottom).

Figure 2: FastEddy™ limited-area domain simulation using the cell perturbation method for resolved turbulence instigation (top) versus a periodic domain reference simulation (bottom). This feature allows FastEddy™ to be applied to real-world locations for specific times and dates. The longer-term goal is to synchronize FastEddy™ simulations as nested domains driven by WRF mesoscale forecast simulations (e.g., High Resolution Rapid Refresh, HRRR, forecasts).
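The idea behind the cell perturbation method can be illustrated with a minimal sketch: one random potential-temperature perturbation is drawn per horizontal cell in a strip next to an inflow boundary, and every gridpoint in that cell receives the same value. The cell size of 8 gridpoints, the three-cell strip width, the 0.5 K amplitude, the choice of the west boundary, and all function and variable names are assumptions made for illustration; none of them is taken from FastEddy™.

```python
import random

def perturb_inflow_strip(theta, cell=8, n_cells=3, amplitude=0.5, seed=0):
    """Return a copy of the 2-D field theta[j][i] with cell-wise random
    perturbations applied in a strip along the i = 0 (west) boundary.

    Every gridpoint in a cell x cell block receives the same perturbation,
    drawn uniformly from [-amplitude, +amplitude]. All defaults are
    illustrative assumptions, not FastEddy settings.
    """
    rng = random.Random(seed)
    ny, nx = len(theta), len(theta[0])
    strip = min(nx, cell * n_cells)          # strip width in gridpoints
    out = [row[:] for row in theta]          # leave the input untouched
    for j0 in range(0, ny, cell):            # loop over cells in y
        for i0 in range(0, strip, cell):     # loop over cells in x
            dtheta = rng.uniform(-amplitude, amplitude)
            for j in range(j0, min(j0 + cell, ny)):
                for i in range(i0, min(i0 + cell, strip)):
                    out[j][i] += dtheta
    return out

# Hypothetical uniform 300 K field on a 64 x 32 horizontal grid.
theta0 = [[300.0] * 64 for _ in range(32)]
theta1 = perturb_inflow_strip(theta0)
```

Only the first 24 gridpoints from the west boundary are modified here; the rest of the field is left for the flow itself to develop turbulence downstream of the perturbed strip.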

 

Figure 3: FastEddy™ has been extended to allow multi-GPU execution, alleviating the memory constraints that limit domain size in single-GPU simulations. Here, the foreground image shows a single domain of size 22.5 km x 54 km consisting of ~10 million gridpoints run on a single GPU. The background image shows the results of using 16 GPUs under horizontal domain decomposition via MPI to model a domain 16 times larger (90 km x 216 km, 160 million gridpoints) at the same sub-100 m resolution.
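The domain sizes in the Figure 3 caption can be checked with a short sketch of horizontal domain decomposition. It assumes a 4 x 4 process grid, which is consistent with the stated extents (4 x 22.5 km = 90 km, 4 x 54 km = 216 km) but is not confirmed by the source. The function name `decompose` and the grid dimensions passed to it are hypothetical.

```python
def decompose(nx, ny, px, py):
    """Split an nx x ny horizontal grid over a px x py process grid.

    Returns a dict mapping rank -> (i_start, i_end, j_start, j_end),
    with end indices exclusive. Remainder points go to the low ranks.
    """
    def split(n, p):
        base, extra = divmod(n, p)
        bounds, start = [], 0
        for r in range(p):
            size = base + (1 if r < extra else 0)
            bounds.append((start, start + size))
            start += size
        return bounds

    xs, ys = split(nx, px), split(ny, py)
    return {ry * px + rx: (*xs[rx], *ys[ry])
            for ry in range(py) for rx in range(px)}

# Extents from the caption, tiled on an assumed 4 x 4 process grid.
tile_km = (22.5, 54.0)
px = py = 4
full_km = (tile_km[0] * px, tile_km[1] * py)
area_ratio = (full_km[0] * full_km[1]) / (tile_km[0] * tile_km[1])

# Hypothetical horizontal grid sizes, chosen only to exercise the function.
tiles = decompose(nx=1000, ny=2400, px=px, py=py)
```

With these inputs `full_km` matches the 90 km x 216 km domain in the caption and `area_ratio` is 16, so each of the 16 ranks holds a tile the size of the single-GPU domain.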

The overarching goal of this effort is to design, develop, implement, validate, and promulgate a disruptive capability in the numerical modeling of complex microscale flows utilizing advanced computing architectures. To date, the application of the Large-Eddy Simulation (LES) technique has been restricted to fundamental research due to the substantial computational expense of the method. Nonetheless, the efficacy of this method in capturing the influence of turbulence across a plethora of application scenarios only continues to grow. Our mission is to develop an LES modeling system targeting general purpose graphics processing unit (GPGPU) architectures in order to achieve at least order-of-magnitude performance gains. Such performance gains are the crucial requirement for realization of the LES method as a viable tool for microscale operational, educational, and more comprehensive research applications.

FastEddy™ is a new hybrid CPU/GPU-accelerated LES model developed within RAL-NSAP beginning in FY2017. Applications of this model target turbulence-resolving microscale atmospheric boundary layer flow simulation with atmospheric transport and dispersion of hazardous species and greenhouse gases. FastEddy™ is a resident-GPU model, meaning that all prognostic calculations are carried out in an accelerated manner on the GPU with CPU utilization strictly limited to model configuration and input/output of modeling results. This resident-GPU approach shows tremendous early potential for achieving faster-than-real-time microscale simulations across domains of order 100-1000 km² at a resolution of O(10 m).
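To give a sense of scale for these targets, the sketch below counts the horizontal grid columns implied by a domain area and grid spacing, and defines a real-time factor as simulated time divided by wall-clock time. Both helper names and the timing values are hypothetical and serve only as an illustration. The vertical level count is not stated in the text, so only horizontal columns are counted.

```python
def horizontal_columns(area_km2, dx_m):
    """Number of grid columns covering area_km2 at horizontal spacing dx_m."""
    return area_km2 * 1.0e6 / (dx_m * dx_m)

def real_time_factor(simulated_s, wall_clock_s):
    """Values above 1 mean the model runs faster than real time."""
    return simulated_s / wall_clock_s

cols_small = horizontal_columns(100, 10)    # 100 km^2 at 10 m spacing
cols_large = horizontal_columns(1000, 10)   # 1000 km^2 at 10 m spacing

# Hypothetical timing: one simulated hour computed in 20 wall-clock minutes.
rtf = real_time_factor(3600, 1200)
```

At 10 m spacing the stated range of 100-1000 km² corresponds to between one million and ten million horizontal columns, each of which also carries a stack of vertical levels.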

FY2017 Accomplishments

  • MPI-based multi-CPU framework designed, implemented and tested as the fundamental control layer.
  • CPU and GPU implementation of roughly two-thirds of the entire hydrodynamics core.
  • Verification of dynamical core formulation and implementation for thermal bubble, diffusion, and Kelvin-Helmholtz scenarios.
  • Design and implementation of netCDF-4 prognostic variable fields.
  • Demonstrated triple-digit speedup of FastEddy™ deployed on a modern GPU over execution on a current CPU.

FY2018 Accomplishments

  • Momentum stress, turbulence closure, and surface layer parameterization (Monin-Obukhov) were implemented.
  • Demonstrated ~3x faster than real-time execution in fully compressible mode for domain extents of 80 km² at a resolution of 30 m for canonical stability regimes on a single NVIDIA GP100 GPU.
  • Heat flux and dynamic skin temperature capabilities added for flexible prescription of surface forcing conditions.
  • Resolved turbulence instigation at lateral domain boundaries through the cell perturbation method. This capability permits non-periodic realistic simulations in addition to saving time and computational resources by maximizing the portion of a simulated domain containing properly resolved turbulence.
  • Double-buffered asynchronous data transfers between CPU and GPU, allowing concurrent simulation progress and writing of results to disk.
  • Extensions for multi-GPU execution via MPI+CUDA allowing very-large LES domain simulations.
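
The double-buffering pattern named in the list above can be sketched on the CPU alone. In the toy example below, two host buffers alternate so that a background thread can write one snapshot while the model continues to step and later fills the other buffer. This is an analogy only: a CUDA implementation would typically rely on pinned host memory and asynchronous copies on a separate stream, and nothing here reflects the actual FastEddy™ code. The function `run` and its parameters are hypothetical.

```python
import threading

def run(n_steps, output_every, write):
    """Advance a toy model while a background thread writes snapshots.

    Two buffers alternate: the model copies its state into one buffer
    while the writer thread may still be draining the other.
    """
    state = [0.0] * 4                        # toy prognostic state
    buffers = [[0.0] * 4, [0.0] * 4]
    pending = [None, None]                   # writer thread per buffer
    slot = 0
    for step in range(1, n_steps + 1):
        state = [v + 1.0 for v in state]     # stand-in for a model time step
        if step % output_every == 0:
            if pending[slot] is not None:
                pending[slot].join()         # buffer must be free before reuse
            buffers[slot][:] = state         # stand-in for device-to-host copy
            t = threading.Thread(target=write, args=(step, buffers[slot]))
            t.start()                        # writing overlaps later steps
            pending[slot] = t
            slot = 1 - slot                  # switch to the other buffer
    for t in pending:
        if t is not None:
            t.join()                         # flush outstanding writes

written = {}
run(n_steps=6, output_every=2,
    write=lambda s, buf: written.update({s: list(buf)}))
```

The model only waits when it needs a buffer whose previous write has not yet finished, which is what lets simulation progress and disk output proceed concurrently.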

FY2019 Plans

  • Urban environment effects through two candidate approaches. Buildings will be represented on the resolved grid through either the immersed boundary method or a geometry-resolved, subgrid-scale, porous-media-like drag formulation.
  • MPAS- or WRF-to-FastEddy™ coupling for combined mesoscale and microscale modeling in one system, utilizing the cell perturbation method for resolved turbulence instigation at the nested boundaries of LES domains.