Computational research and development

GPU-based acceleration
Figure: Speedup from using GPUs to complete a standard spatial statistical analysis. The figure reports a timing study for a spatial analysis (kriging) that involves fitting a Gaussian process covariance to observations and then predicting the surface at locations where the spatial field is not observed. The analysis is done in the R data analysis environment with the fields and RMAGMA packages. Times are reported relative to the standard fields function without GPU support and were measured on a laptop. As a reference, the standard function takes about 2.5 minutes at 10,000 observations, so a speedup of more than 15 times would shift this standard data analysis to an interactive activity (about 10 seconds). This difference in waiting time supports a more discovery-oriented approach, radically changing the dynamic of how data can be modeled and interpreted.
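The computation being accelerated is dominated by dense linear algebra on the covariance matrix of the observations, which is exactly the part a GPU library can take over. As a minimal sketch of that computation, written in Python/NumPy rather than the R fields implementation, with an exponential covariance and parameter values chosen purely for illustration:

    import numpy as np

    def kriging_predict(obs_xy, obs_z, pred_xy, theta=0.5, nugget=1e-4):
        """Simple kriging (Gaussian process) prediction.

        The O(n^3) Cholesky factorization of the n x n covariance matrix
        dominates the runtime and is the step a GPU linear algebra library
        (e.g., MAGMA) can accelerate.
        """
        def cov(a, b):
            # Exponential covariance on pairwise distances (illustrative choice).
            d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
            return np.exp(-d / theta)

        K = cov(obs_xy, obs_xy) + nugget * np.eye(len(obs_xy))
        k_star = cov(pred_xy, obs_xy)                 # cross-covariance
        L = np.linalg.cholesky(K)                     # dominant O(n^3) cost
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, obs_z))
        return k_star @ alpha                         # predicted surface

    # Example: predict a surface from 1,000 scattered observations.
    rng = np.random.default_rng(0)
    obs = rng.uniform(size=(1000, 2))
    z = np.sin(3 * obs[:, 0]) + rng.normal(scale=0.1, size=1000)
    grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                                np.linspace(0, 1, 50)), -1).reshape(-1, 2)
    surface = kriging_predict(obs, z, grid)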

Meeting the grand challenges of simulating the Earth System requires more than migrating standard algorithms to larger computational platforms. New hardware, new parallel computational approaches, effective use of coprocessors, and more efficient algorithms are all needed to reach the resolution and complexity necessary to support scientific breakthroughs in modeling. The same attention is required for the analysis and manipulation of the large data sets now common in the geosciences.

The next three sections describe CISL’s efforts to accelerate NCAR software applications on existing as well as future hardware. In the past, application performance improvements came “automatically” – largely from advances in hardware performance. The last decade has seen the gradual end of this regime. Now the emphasis is on acceleration through increased parallelism. CISL research and development in this area has employed the following three strategies.

First, CISL has launched efforts to achieve acceleration through parallelism in NCAR’s computational models. This means developing tools and techniques for achieving efficiency at higher thread counts and longer vector lengths than previously required. The targets are emerging many-core architectures such as Intel’s Xeon Phi and NVIDIA’s Tesla GPU architectures. One result of this work is a kernel generator that extracts parts of the NCAR atmosphere model so that key components can be optimized in isolation. Specifically, the runtime of the microphysics computations, a significant percentage of the total model runtime, has been reduced by half across several Intel Xeon processors.
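The kind of restructuring this involves can be illustrated with a toy column kernel. The sketch below is hypothetical: the saturation formula and its constants are illustrative stand-ins, not the actual NCAR microphysics. The scalar loop represents the original code; the array form represents the restructured version that maps naturally onto long vector units and many threads.

    import numpy as np

    def kernel_scalar(T, q, out):
        # Original style: one grid point per loop iteration.
        for i in range(T.shape[0]):
            qsat = 3.8e-3 * np.exp(17.27 * (T[i] - 273.15) / (T[i] - 35.86))
            out[i] = max(q[i] - qsat, 0.0)   # condensed water, kept non-negative

    def kernel_vector(T, q, out):
        # Restructured style: whole-array operations that a compiler (or here,
        # NumPy) can map onto wide vector units and many threads.
        qsat = 3.8e-3 * np.exp(17.27 * (T - 273.15) / (T - 35.86))
        np.maximum(q - qsat, 0.0, out=out)

    n = 1_000_000
    T = np.random.uniform(250.0, 310.0, n)   # temperatures [K]
    q = np.random.uniform(0.0, 0.02, n)      # specific humidity [kg/kg]
    out = np.empty(n)
    kernel_vector(T, q, out)                 # same result as the loop, much faster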

Second, CISL’s acceleration efforts have focused on an end-to-end workflow approach. In the past, optimization efforts focused on the models themselves. However, as computational science has become more data-centric, attention must also be given to the analysis and post-processing scripts that run on model output. To test whether compression of model output degrades its scientific content, a test set of compressed output fields was created in FY2015 for feedback from the modeling community. These results will be evaluated and the compression schemes adjusted as needed.
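The question being tested can be made concrete with a small sketch. The fragment below is hypothetical: it assumes a simple lossy scheme (zeroing low-order mantissa bits, in the spirit of bit grooming) and basic error metrics; the actual schemes and fidelity criteria used in the evaluation are not specified in this report.

    import numpy as np

    def truncate_mantissa(field, keep_bits):
        """Lossy-compress a float64 field by zeroing its low-order mantissa
        bits (illustrative scheme only). Fewer kept bits compress better
        but discard more information."""
        mask = ~np.uint64((1 << (52 - keep_bits)) - 1)
        return (field.view(np.uint64) & mask).view(np.float64)

    def degradation_metrics(orig, recon):
        # Basic measures of how much scientific content was lost.
        err = orig - recon
        return {
            "max_abs_error": np.abs(err).max(),
            "rmse": np.sqrt(np.mean(err ** 2)),
            "correlation": np.corrcoef(orig.ravel(), recon.ravel())[0, 1],
        }

    field = np.random.standard_normal((192, 288))   # stand-in model field
    recon = truncate_mantissa(field, keep_bits=12)
    print(degradation_metrics(field, recon))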

Finally, CISL’s numerical experts and computer scientists are working with scientists in other NCAR laboratories to pioneer new numerical schemes and parallel algorithms that achieve algorithmic acceleration: attaining the same numerical accuracy in less time or with fewer cyber-resources. One example of this work is the development of a transport scheme for the cubed-sphere geometry that has high accuracy while still maintaining positive concentrations. Moreover, the method depends only on neighboring elements, so it does not degrade the parallelism of the other parts of the numerical procedure.
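A one-dimensional analogue makes these two properties concrete. The sketch below is not the cubed-sphere scheme itself; it uses first-order upwind transport for brevity. Like the scheme described above, however, it never produces negative concentrations, and each cell update touches only immediate neighbors, so parallel domain decomposition needs only a one-cell halo exchange.

    import numpy as np

    def advect_positive(q, u, dx, dt):
        """One step of 1-D finite-volume transport on a periodic domain.

        First-order upwind with |u|*dt/dx <= 1 writes each new cell value as
        a convex combination of old values, so non-negative concentrations
        stay non-negative, and the stencil uses only adjacent cells.
        """
        # Upwind flux at each cell's right interface (neighbor-only stencil).
        flux = np.where(u > 0, u * q, u * np.roll(q, -1))
        return q - dt / dx * (flux - np.roll(flux, 1))

    nx = 200
    dx, dt, u = 1.0 / nx, 0.002, 1.0        # Courant number u*dt/dx = 0.4
    q = np.exp(-200 * (np.linspace(0, 1, nx) - 0.3) ** 2)  # non-negative tracer
    for _ in range(100):
        q = advect_positive(q, u, dx, dt)
    assert q.min() >= 0.0                   # positivity preserved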

This work is supported by NSF Core funding, with supplemental funding supplied by other sources as noted in the following reports.