Optimizing model performance on NCAR supercomputers

In recent years, the performance that can be extracted from supercomputers through software optimization has become at least as important as the gains coming from hardware improvements. Significant factors driving this trend include the stagnation or even decline of single-thread execution speed, the aggressive introduction of vector/SIMD instruction sets, the rising core count per processor socket, which requires careful parallel programming to exploit fully, and the introduction of heterogeneous architectures that combine conventional processors with accelerator coprocessors.

In FY2015, CISL continued to expand its efforts to optimize NCAR codes, focusing first on NCAR's flagship community models. This strategic optimization thrust has two prongs. One effort, called SPOC, aims to optimize current model codebases for Yellowstone-like systems (i.e., conventional multi-core processors). The second, called IPCC-WACS and housed in TDD's ASAP group, focuses on the future challenges of the accelerator space. The SPOC effort is described below, and the IPCC-WACS effort is described in the section titled Evaluating many-core and accelerator-based architectures.

Strategic Parallel and Optimization Computing (SPOC) initiative

In FY2015, CISL's Strategic Parallel and Optimization Computing (SPOC) initiative continued its NCAR-wide efforts to increase the performance and efficiency of NCAR's community codes (CESM, WRF, and MPAS) on Yellowstone. Beyond benefits on current Yellowstone hardware, SPOC targets code optimizations that are expected to carry over to future processor architectures. Alongside support from the Consulting Services Group (CSG), CISL identified additional resources for this work and embedded them directly with the model development teams. Key activities this year included:

  • CESM performance was targeted on a number of fronts. A CSG staff member worked to make the CESM internal performance timing infrastructure consistent across models, essential groundwork for identifying future optimization targets and for better understanding the benefits of performance improvements. CISL and CGD staff also supervised a summer intern from Louisiana State University, who focused on parallel I/O optimizations in the POP ocean model.

  • SPOC supported an external optimization expert who worked with the MPAS development team on the MPAS dynamical core. By changing only about 200 lines of code and experimenting with Intel compiler options, this work yielded a 15% to 20% performance improvement in the dynamics, which translates to a 10% to 15% improvement for MPAS overall.

  • Work with the WRF team involved two SIParCS interns who identified several optimization opportunities that could yield a 15% performance improvement for WRF and 25% for WRF-Chem. The students also began exploring options for running WRF on Intel Xeon Phi coprocessors and demonstrated a 50% performance improvement in a carefully crafted, idealized WRF case run in symmetric host-Phi mode. CSG has since hired both students as student assistants to continue pursuing WRF optimizations.

  • CSG staff investigated potential system software and architectural changes. A comparison of different MPI libraries suggested possible performance improvements, and CSG continues to evaluate their usability and performance benefits for the full CESM suite. An InfiniBand topology study was conducted on Yellowstone and a number of external HPC systems to quantify performance tradeoffs for NCAR's flagship models and to inform the NWSC-2 procurement process.

  • Training has also been identified as a key contribution of the SPOC initiative toward building relevant skills in the NCAR developer community. To that end, SPOC-supported staff conducted workshops on finding hotspots and bottlenecks in code and on the fundamentals of performance tuning and optimization. CISL also hosted vendor-led training events: Intel presented its analysis tools and compilers, and Allinea gave an overview of its debugging and profiling tools.

The SPOC initiative is supported by NSF Core funds.