GPU-Accelerated Microscale Modeling: FastEddy

Background

The overarching goal of this effort is to design, develop, validate, and disseminate a disruptive capability in the numerical modeling of complex microscale flows on advanced computing architectures. To date, application of the Large-Eddy Simulation (LES) technique has been restricted to fundamental research because of the method's substantial computational expense, even as demand grows for its ability to capture the influence of turbulence across a widening range of application scenarios. Our mission is to develop an LES modeling system targeting general-purpose graphics processing unit (GPGPU) architectures in order to achieve at least order-of-magnitude performance gains. Such gains are the crucial requirement for establishing LES as a viable tool for microscale operational and educational uses and for more comprehensive research applications.

FastEddy is a new hybrid CPU/GPU-accelerated LES model developed within RAL-NSAP beginning in FY2017. Applications of the model target turbulence-resolving simulation of microscale atmospheric boundary layer flows, including atmospheric transport and dispersion of hazardous species and greenhouse gases. FastEddy is a resident-GPU model, meaning that all prognostic calculations are carried out on the GPU, with CPU utilization strictly limited to model configuration and input/output of model results. This resident-GPU approach shows strong early potential for achieving faster-than-real-time microscale simulations across domains of order 10 km² at a resolution of ~10 meters.
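
As an illustration of the resident-GPU pattern, the sketch below allocates a prognostic field on the device once, advances it entirely with GPU kernels, and copies data back to the host only at output intervals. This is a minimal CUDA sketch of the general approach; the field names, the trivial update kernel, and the grid sizes are illustrative assumptions, not FastEddy's actual code.

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical prognostic field update: theta += dt * tendency. */
    __global__ void stepField(float *theta, const float *tend, int n, float dt)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            theta[i] += dt * tend[i];   /* advance the field in place on the GPU */
    }

    int main(void)
    {
        const int n = 512 * 512 * 128;          /* illustrative grid size */
        const int nSteps = 1000, outEvery = 100;
        const float dt = 0.1f;

        float *h_theta = (float *)calloc(n, sizeof(float)); /* host copy: I/O only */
        float *d_theta, *d_tend;
        cudaMalloc(&d_theta, n * sizeof(float));   /* fields reside on the GPU... */
        cudaMalloc(&d_tend,  n * sizeof(float));   /* ...for the entire run */
        cudaMemcpy(d_theta, h_theta, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemset(d_tend, 0, n * sizeof(float));

        for (int step = 1; step <= nSteps; ++step) {
            stepField<<<(n + 255) / 256, 256>>>(d_theta, d_tend, n, dt);
            if (step % outEvery == 0) {   /* cross the PCIe bus only for output */
                cudaMemcpy(h_theta, d_theta, n * sizeof(float),
                           cudaMemcpyDeviceToHost);
                printf("step %d: theta[0] = %f\n", step, h_theta[0]);
            }
        }
        cudaFree(d_theta); cudaFree(d_tend); free(h_theta);
        return 0;
    }

Because the fields never leave device memory during time stepping, the cost of host-device transfers is paid only at output time, which is what makes the resident-GPU design pay off.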

FY2017 Accomplishments

  • MPI-based multi-CPU framework designed, implemented, and tested as the fundamental control layer (a minimal sketch of this pattern follows this list).
  • CPU and GPU implementation of roughly two-thirds of the entire hydrodynamics core.
  • Verification of dynamical core formulation and implementation for thermal bubble, diffusion, and Kelvin-Helmholtz scenarios.
  • Design and implementation of netCDF-4 input/output of prognostic variable fields.
  • Demonstrated triple-digit speedups of FastEddy on a modern GPU relative to a current CPU (see Table 1).
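
The MPI control layer in the first bullet above pairs each CPU process with a GPU and reserves the CPU side for configuration and coordination. Below is a minimal sketch of that pattern, assuming one MPI rank per GPU with round-robin device binding; the rank-to-device mapping and the configuration broadcast shown are illustrative choices, not necessarily FastEddy's.

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Bind each MPI rank to a GPU, round-robin across devices on the node. */
        int nDevices = 0;
        cudaGetDeviceCount(&nDevices);
        if (nDevices > 0)
            cudaSetDevice(rank % nDevices);

        /* The CPU side acts as the control layer: rank 0 reads the
           configuration and broadcasts it; each rank then runs its subdomain. */
        int nSteps = 0;
        if (rank == 0)
            nSteps = 1000;   /* e.g., parsed from a configuration file */
        MPI_Bcast(&nSteps, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("rank %d of %d: GPU %d, running %d steps\n",
               rank, size, nDevices > 0 ? rank % nDevices : -1, nSteps);

        /* ...per-rank GPU time stepping and inter-rank halo exchanges here... */

        MPI_Finalize();
        return 0;
    }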

 

Figure 1: GPU versus CPU performance trends (source: Nvidia). GPU architectures have shown exponential growth in both floating-point operations per second (FLOPS) and peak memory bandwidth since 2000, versus the near-linear growth of CPUs.

FY2018 Plans

  • The remaining third of the dynamical core formulation, specifically momentum stress, turbulence closure, and surface-layer parameterization (Monin-Obukhov), is expected to be implemented and verified by March 2018 (a neutral-limit sketch of the surface-layer scheme follows this list).

  • Incorporation of dynamic mesoscale-supplied (from WRF) boundary conditions and turbulence instigation through the cell perturbation method (see the second sketch below). This will allow nested, accelerated LES simulations driven by WRF forecasts.
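
The Monin-Obukhov surface-layer parameterization mentioned in the first bullet relates the surface momentum stress to the resolved wind at the lowest model level. The sketch below implements only the neutral-limit log law, u* = κU(z1)/ln(z1/z0); the full scheme adds a stability correction ψ_m(z1/L) inside the log term. All sample values (roughness length, level height, wind speed) are illustrative.

    #include <math.h>
    #include <stdio.h>

    #define VON_KARMAN 0.4

    /* Neutral-limit Monin-Obukhov friction velocity:
       u* = kappa * U(z1) / ln(z1/z0).
       The full parameterization subtracts a stability function psi_m(z1/L)
       from the log term; in the neutral limit psi_m = 0. */
    double frictionVelocity(double windSpeed, double z1, double z0)
    {
        return VON_KARMAN * windSpeed / log(z1 / z0);
    }

    int main(void)
    {
        double U  = 5.0;   /* wind speed at the first model level (m/s) */
        double z1 = 10.0;  /* first model level height (m) */
        double z0 = 0.1;   /* aerodynamic roughness length (m) */
        double ustar = frictionVelocity(U, z1, z0);
        /* The surface momentum stress tau = rho * u*^2 closes the momentum
           budget at the lower boundary. */
        printf("u* = %.3f m/s, tau/rho = %.4f m2/s2\n", ustar, ustar * ustar);
        return 0;
    }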
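
The cell perturbation method in the second bullet seeds resolved turbulence by adding small random potential-temperature perturbations, constant within coarse perturbation cells, in a buffer along the inflow boundary. The CUDA sketch below applies such perturbations on a single horizontal level; the cell size, buffer width, amplitude, and host-side random-number generation are simplifying assumptions for illustration.

    #include <cuda_runtime.h>
    #include <stdlib.h>

    /* Add a random potential-temperature perturbation, constant within each
       perturbation cell, over a buffer of nBufferCells cells at the inflow
       (x = 0) boundary of one horizontal level. */
    __global__ void cellPerturb(float *theta, const float *cellAmp,
                                int nx, int ny, int cellSize, int nBufferCells)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   /* x grid index */
        int j = blockIdx.y * blockDim.y + threadIdx.y;   /* y grid index */
        if (i >= nBufferCells * cellSize || j >= ny) return;

        int ci = i / cellSize;   /* perturbation-cell indices */
        int cj = j / cellSize;
        theta[j * nx + i] += cellAmp[cj * nBufferCells + ci];
    }

    int main(void)
    {
        const int nx = 512, ny = 512;   /* horizontal grid, illustrative */
        const int cellSize = 8;         /* grid points per perturbation cell */
        const int nBufferCells = 3;     /* cells spanning the inflow buffer */
        const float ampMax = 0.5f;      /* max |theta'| in K, illustrative */

        int nCellsY = (ny + cellSize - 1) / cellSize;
        int nCells  = nCellsY * nBufferCells;
        float *h_amp = (float *)malloc(nCells * sizeof(float));
        for (int c = 0; c < nCells; ++c)  /* uniform random in [-ampMax, ampMax] */
            h_amp[c] = ampMax * (2.0f * rand() / (float)RAND_MAX - 1.0f);

        float *d_theta, *d_amp;
        cudaMalloc(&d_theta, nx * ny * sizeof(float));
        cudaMemset(d_theta, 0, nx * ny * sizeof(float));
        cudaMalloc(&d_amp, nCells * sizeof(float));
        cudaMemcpy(d_amp, h_amp, nCells * sizeof(float), cudaMemcpyHostToDevice);

        dim3 block(16, 16);
        dim3 grid((nBufferCells * cellSize + 15) / 16, (ny + 15) / 16);
        cellPerturb<<<grid, block>>>(d_theta, d_amp, nx, ny,
                                     cellSize, nBufferCells);
        cudaDeviceSynchronize();

        cudaFree(d_theta); cudaFree(d_amp); free(h_amp);
        return 0;
    }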

 

FastEddy Kernel Performance on CPU versus GPU


Table 1: Wall-clock timing (in seconds) and speedup for physics kernels of FastEddy on an Intel E5-2670 (Sandy Bridge) CPU, versus a 2012-model Nvidia Tesla K20x GPU, versus a 2017-model Nvidia Pascal GP100 GPU. FastEddy hydrodynamics core kernels for the advection, diffusion, pressure gradient force, buoyancy, and Coriolis terms of the governing equations execute 20-50 times faster on the five-year-old K20x GPU than on the CPU, and achieve triple-digit speedups of 200-500 times on the most recently released GPU.

Test          | E5-2670 CPU-time (s) | K20x GPU-time (s) | K20x/CPU speedup | GP100 GPU-time (s) | GP100/CPU speedup | GP100/K20x speedup
ADV           | 2.037                | 0.064             | 31.83            | 0.0102             | 199.71            | 6.27
ADV+DIFF      | 4.7246               | 0.087             | 54.31            | 0.0097             | 487.07            | 8.97
ADV+PGF+BUOY  | 2.1                  | 0.074             | 28.38            | 0.0099             | 212.12            | 7.47
ALL           | 5.28                 | 0.099             | 53.33            | 0.0181             | 291.71            | 5.47
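
Per-kernel wall-clock times like those in Table 1 are commonly measured with CUDA events, which time work on the GPU itself rather than on the host. Below is a minimal sketch of such a timing harness, using a stand-in kernel rather than an actual FastEddy kernel.

    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void dummyKernel(float *a, int n)  /* stand-in for a model kernel */
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) a[i] = 0.5f * a[i] + 1.0f;
    }

    int main(void)
    {
        const int n = 1 << 24;
        float *d_a;
        cudaMalloc(&d_a, n * sizeof(float));
        cudaMemset(d_a, 0, n * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);                    /* timestamp before launch */
        dummyKernel<<<(n + 255) / 256, 256>>>(d_a, n);
        cudaEventRecord(stop);                     /* timestamp after the kernel */
        cudaEventSynchronize(stop);                /* wait for completion */

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);    /* elapsed GPU time in ms */
        printf("kernel wall-clock time: %.3f ms\n", ms);

        cudaEventDestroy(start); cudaEventDestroy(stop);
        cudaFree(d_a);
        return 0;
    }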