Evaluating many-core and accelerator-based architectures

Microphysics optimization
Execution time for the Morrison Gettelman (MG) microphysics kernel, which is part of the Community Atmosphere Model (CAM), built with the Intel compiler and run on three generations of Intel Xeon processors (SNB2 = 2-socket Sandy Bridge node; IVB2 = 2-socket Ivy Bridge node; HSW4 = 4-socket Haswell node). Optimization consistently reduced execution time by more than a factor of two. The MG microphysics calculations consume a total of 3% of all time used on Yellowstone, so reducing their cost will increase the amount of science that can be performed on our existing and future supercomputers.

In FY2015, CISL’s Application Scalability and Performance (ASAP) group in the Technology Development Division (TDD) was involved in several collaborations focused on preparing NCAR applications for future generations of microprocessor architectures. These collaborations include:

  • An Intel Parallel Computing Center (IPCC) focused on Weather and Climate Simulation (IPCC-WACS) funded by Intel in collaboration with the University of Colorado at Boulder (CU Boulder).

  • A National Energy Research Scientific Computing Center (NERSC) Exascale Science Application Program (NESAP) in collaboration with NERSC, Cray Inc., the Indian Institute of Science in Bangalore, India, and the University of Wyoming.

These collaborations have enabled advances in application performance and opportunities to help train the next generation of scientists and engineers who will apply these new technologies to challenges of societal importance.

This effort has focused on weather and climate applications, including the Community Earth System Model (CESM), the Weather Research and Forecasting model (WRF), and the Model for Prediction Across Scales (MPAS), three of the most widely used applications in the field. All three are large Fortran-based simulation codes – for instance, CESM is estimated to have about 1.5 million lines of code.

We have developed tools and techniques that streamline the refactoring effort so that code optimization can keep up with science-driven model development. In particular, we developed KGEN, a Fortran kernel generator that automates the creation of small computational kernels. We have used KGEN to extract more than 30 kernels from multiple CESM component models and from MPAS. These kernels have greatly simplified the testing of optimization ideas, have served as a vehicle for collaboration with compiler and microprocessor vendors, and have been used as benchmarks in the NWSC-2 procurement.
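The value of a small standalone kernel for testing optimization ideas can be illustrated with a minimal sketch. This is not KGEN output and does not use KGEN's interface. The function names (`check_kernel`, `reference_kernel`, `optimized_kernel`), the toy polynomial, and the tolerance are all hypothetical, and Python stands in for the Fortran that the real kernels are written in. The idea shown is only the general workflow: run a reference version and a candidate version on the same inputs, confirm that the results agree within a tolerance, and compare timings.

```python
import time


def reference_kernel(xs, a=1.5, b=2.0, c=0.5):
    # Hypothetical "original" kernel: straightforward polynomial evaluation.
    return [a * x * x + b * x + c for x in xs]


def optimized_kernel(xs, a=1.5, b=2.0, c=0.5):
    # Hypothetical "optimized" kernel: Horner form, one fewer multiply.
    # Rounding may differ slightly from the reference, hence the tolerance.
    return [(a * x + b) * x + c for x in xs]


def best_time(func, inputs, repeats):
    # Best-of-N wall-clock time, to reduce noise from other activity.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(inputs)
        best = min(best, time.perf_counter() - start)
    return best


def check_kernel(reference, candidate, inputs, rel_tol=1e-12, repeats=5):
    expected = reference(inputs)
    actual = candidate(inputs)
    # Largest relative difference between candidate and reference outputs.
    max_rel_err = max(
        abs(a - e) / max(abs(e), 1e-300) for a, e in zip(actual, expected)
    )
    return {
        "passed": len(actual) == len(expected) and max_rel_err <= rel_tol,
        "max_rel_err": max_rel_err,
        "reference_seconds": best_time(reference, inputs, repeats),
        "candidate_seconds": best_time(candidate, inputs, repeats),
    }


inputs = [0.1 * i for i in range(1, 1001)]
result = check_kernel(reference_kernel, optimized_kernel, inputs)
print(result)
```

Because the kernel runs in isolation, a check like this takes seconds rather than the time needed to build and run a full model, which is what makes it practical to try many optimization variants.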

CISL has made significant progress optimizing particular sections of CESM and MPAS. The cost of the Morrison Gettelman (MG) microphysics calculations, which represent 10% of the Community Atmosphere Model’s (CAM’s) total runtime, was cut in half. The reductions in MG microphysics cost on three generations of Intel Xeon microprocessors are shown in the figure above. Similar reductions in execution time were observed for CAM’s random number generator and shortwave radiation module. The High Order Method Modeling Environment (HOMME), a dynamical core used within CAM, received additional optimizations that reduced its total cost by 20-60%, depending on the scientific configuration.
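As a rough worked example of what the MG result means for CAM as a whole, the sketch below applies Amdahl's law to the two figures quoted in the text (a 10% share of runtime, cut in half). It assumes that the rest of CAM is unchanged and that "cut in half" means exactly a 2x speedup of the MG portion; the function name is hypothetical.

```python
def remaining_runtime_fraction(component_share, component_speedup):
    """Fraction of the original runtime left after speeding up one component.

    Amdahl's law: the untouched part keeps its cost, and the optimized part
    shrinks by the speedup factor.
    """
    return (1.0 - component_share) + component_share / component_speedup


mg_share = 0.10    # MG share of CAM runtime, as quoted in the text
mg_speedup = 2.0   # "cut in half", assumed to be exactly 2x

remaining = remaining_runtime_fraction(mg_share, mg_speedup)
overall_speedup = 1.0 / remaining

print(f"CAM runtime after MG optimization: {remaining:.1%} of original")
print(f"Overall CAM speedup: {overall_speedup:.3f}x")
```

Under these assumptions, halving the MG cost alone trims about 5% from CAM's total runtime. Larger whole-model gains require comparable reductions across several components, such as the random number generator, shortwave radiation, and HOMME.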

The IPCC-WACS project is funded by a donation from Intel Corporation. Additional optimization efforts within ASAP are supported by NSF Core funds.