Explore many-core and accelerator-based architectures

HOMME optimizations
This chart shows the impact of CISL’s optimizations of the High Order Method Modeling Environment (HOMME) on two systems: Cheyenne, which uses 2-socket Broadwell nodes, and Cori, which uses one-socket Knights Landing nodes. HOMME is the dynamical core for the Community Atmosphere Model (CAM). Its computational cost in a production-like configuration is normalized to the cost of the unoptimized code on each system. Interestingly, the optimizations consistently reduce the cost of HOMME on Cori by approximately 50% regardless of core count. On Cheyenne, the impact of the optimizations ranges from approximately 20% to as much as 50% at larger core counts. Reducing the cost of the HOMME dynamical core will increase the amount of science that can be performed on our existing and future supercomputers.
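The normalization used in the chart can be illustrated with a minimal sketch. The function names and the timing values below are hypothetical, chosen only to show the arithmetic; they are not measurements from Cheyenne or Cori.

```python
def normalized_cost(optimized_cost, baseline_cost):
    """Cost of the optimized code relative to the unoptimized code.

    A value of 1.0 means no change; 0.5 means the cost was halved.
    """
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return optimized_cost / baseline_cost


def percent_reduction(optimized_cost, baseline_cost):
    """Percentage by which the optimizations reduced the cost."""
    return 100.0 * (1.0 - normalized_cost(optimized_cost, baseline_cost))


# Hypothetical costs (for example, core-hours per simulated year).
baseline = 200.0   # unoptimized code (assumed value)
optimized = 100.0  # optimized code (assumed value)

print(normalized_cost(optimized, baseline))    # 0.5
print(percent_reduction(optimized, baseline))  # 50.0
```

Normalizing each system to its own unoptimized baseline makes the relative benefit comparable across machines, but it does not by itself say which system is faster in absolute terms.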

CISL collaborates with NCAR’s science laboratories to provide new tools for exploiting many-core architectures such as general-purpose graphics processing units (GPGPUs). These tools allow us to increase model performance on advanced many-core architectures, such as those in NCAR’s next supercomputer. CISL also plans to give our users access to advanced systems by acquiring a many-core cluster.

Using codes developed under this initiative, our scientific users gain experience with production systems composed of these emerging technologies. These collaborations have enabled advances in application performance and opportunities to help train the next generation of scientists and engineers who will apply these new technologies to challenges of societal importance.

In FY2017, CISL’s Application Scalability and Performance (ASAP) group and Special Technical Projects (STP) group participated in several collaborations that focus on preparing NCAR applications for future generations of microprocessor architectures. These collaborations include an Intel Parallel Computing Center (IPCC) focused on Weather and Climate Simulation (IPCC-WACS), funded by Intel in collaboration with the University of Colorado at Boulder (CU Boulder); a National Energy Research Scientific Computing Center (NERSC) Exascale Science Application Program (NESAP), in collaboration with NERSC and Cray Inc.; the Weather and Climate Alliance (WACA), funded by NVIDIA in collaboration with NVIDIA PGI and the University of Wyoming; and a collaboration with the Indian Institute of Science in Bangalore, India.

These efforts focused on weather and climate applications, including the Community Earth System Model (CESM), the Weather Research and Forecasting model (WRF), and the Model for Prediction Across Scales (MPAS): three of the most widely used applications in the field. All three are large Fortran-based simulation codes – for instance, CESM is estimated to have about 1.5 million lines of code.

CISL has made significant progress in optimizing several sections of CESM to reduce their computational cost and in integrating these changes back into the respective code bases. The total execution time of multiple physics modules, including those within CAM, was shortened, which reduced the total cost of CAM by more than 15%. Moreover, the HOMME dynamical core used within CAM received additional optimizations that reduced the total cost of HOMME on both Xeon and Xeon Phi platforms by 23% to 75%, depending on the scientific configuration.

Furthermore, as part of the WACA effort, the MPAS dynamical core has been ported to NVIDIA GPUs via OpenACC. The port is a single-source version that continues to support, and be optimized for, Intel Xeon Phis and traditional multi-core Intel CPUs. This work involved refactoring about 35,000 lines of code to efficiently use the SIMD/vector data parallelism of the underlying model. For the standard 60-km-resolution model, the ported MPAS has achieved a speedup of 2.7 on a Pascal P100 GPU card and 1.9 on an Intel Xeon Phi KNL socket when compared to the performance on a 36-core Broadwell node.
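To put the reported speedups in terms of run time, the short sketch below converts each speedup into a fraction of the Broadwell-node run time. It assumes that speedup means the baseline run time divided by the ported run time, which the text does not state explicitly; the helper name is hypothetical.

```python
def runtime_fraction(speedup):
    """Run time relative to the baseline, assuming speedup = baseline / ported."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Speedups reported in the text, relative to a 36-core Broadwell node.
reported = {
    "Pascal P100 GPU card": 2.7,
    "Xeon Phi KNL socket": 1.9,
}

for device, speedup in reported.items():
    fraction = runtime_fraction(speedup)
    print(f"{device}: {fraction:.2f} of the Broadwell-node run time")

# Ratio of the two speedups: how much faster the P100 is than the KNL socket.
gpu_vs_knl = reported["Pascal P100 GPU card"] / reported["Xeon Phi KNL socket"]
print(f"P100 relative to KNL: {gpu_vs_knl:.2f}")
```

Under that assumption, the P100 run takes roughly 0.37 of the Broadwell-node run time and the KNL run roughly 0.53.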

Work will continue with the science and model development teams at NCAR to both optimize existing application codes and provide guidance for future code development.

The IPCC-WACS project is funded by a corporate gift from Intel Corporation. The WACA project is funded by NVIDIA grant CSL16483. Additional optimization efforts are supported by NSF Core funds.