Foster research and technical collaborations

CISL has a robust set of ongoing partnerships and collaborations focused on the effective use of current and future high-performance architectures for NCAR applications. These collaborations take the form of membership in a regional HPC consortium; ongoing R&D projects with vendor partners; annual workshops, symposia, hackathons, and training events focused on emerging technologies and techniques; and regularly scheduled teleconferences on code optimization with vendor partners.

Rocky Mountain Advanced Computing Consortium

CISL’s participation in the Rocky Mountain Advanced Computing Consortium (RMACC) not only supports the development of regional high-performance CI but also broadens and informs CISL’s knowledge of various computing options. For instance, seeking to replace the aging Janus supercomputer, the University of Colorado Boulder and Colorado State University led, in partnership with RMACC, a successful NSF Major Research Instrumentation grant proposal to bring a 450-teraflops Dell PowerEdge C Series system to the Front Range. Named Summit, this heterogeneous computing system includes Intel Xeon, Xeon Phi, and NVIDIA GPU components, along with the newly introduced 100-gigabit-per-second Intel Omni-Path Architecture (OPA) interconnect. CISL contributed to this successful proposal as an RMACC partner and is therefore entitled to access the system for benchmarking and evaluation purposes. Access to Summit’s novel heterogeneous architecture gives CISL an important avenue for gaining vital technical information about the performance and production readiness of both OPA and Xeon Phi.

In FY2016, the planned refurbishment of the Mesa Laboratory Computing Facility required the deinstallation of the Colorado School of Mines (CSM) IBM supercomputer named “BlueM.” In accordance with the contractual terms of the colocation agreement, CSM was given ample advance notice, and CISL and CSM then worked together to develop a mutually agreeable plan by which BlueM was relocated to a new computing facility at the National Renewable Energy Laboratory (NREL) in Golden, Colorado. The deinstallation also ended the joint computational science research project built around this novel hybrid computing system, which combined IBM’s iDataPlex and Blue Gene/Q platforms. The relationship afforded CISL the opportunity to gain further experience with both highly parallel Blue Gene-style computers and heterogeneous systems.

UCAR Indirect funds supported the operating costs of the colocated BlueM computer system. The minimal costs of CISL’s participation in RMACC education and outreach activities are entirely supported by NSF Core funds.

Vendor partnerships

CISL maintains a wide spectrum of vendor research and development partnership activities. These include active collaborations with Intel (IPCC-WACS), NVIDIA (WACA), SGI (JCoE), and Cirrascale (GX8). CISL relies on the HPC Futures Lab facility and on support from CISL’s Supercomputing Services Group (SSG) to house and operate the equipment required for these R&D efforts.

IPCC-WACS: In FY2016, CISL and the University of Colorado Boulder (CU) continued their Intel-funded collaboration as the Intel Parallel Computing Center for Weather and Climate Simulation (IPCC-WACS). This collaborative center promotes the discovery of new methods for optimizing the performance of weather and climate models on Intel Xeon and Xeon Phi hardware and accelerates the adoption of these optimizations back into key weather and climate community models. IPCC-WACS also has a student education and training component led by CU.

The Intel gift has enabled CISL to develop the Kernel Generator (KGEN), a labor- and resource-saving tool that automatically extracts a portion of a large modeling code base and wraps it in a standalone kernel, or unit test, which can then be optimized and verified against the original code. The KGEN tool is also being used and evaluated by engineers and scientists at other research institutions, including ETH and the Geophysical Fluid Dynamics Laboratory (GFDL), thus providing significant broader impacts to the atmospheric science community. The value of these impacts was recognized by Intel, which provided additional funding in early 2016 after reviewing the progress and accomplishments of the IPCC-WACS team.
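To make the extract-and-verify idea concrete, the following minimal C sketch mimics what KGEN automates for Fortran models: a kernel lifted out of a larger code, a candidate optimization, and a driver that checks the two agree. The saturation-vapor-pressure kernel and its inputs are hypothetical stand-ins, not KGEN output (compile with, e.g., cc -O2 kgen_sketch.c -lm).

    /* Sketch of the extract-and-verify workflow that KGEN automates.
     * KGEN targets Fortran and captures real model state; this C
     * stand-in uses a hypothetical kernel and synthetic inputs. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N   100000
    #define TOL 1.0e-12

    /* “Extracted” reference kernel, as lifted from the full model. */
    static void kernel_ref(const double *t, double *es, int n)
    {
        for (int i = 0; i < n; i++)
            es[i] = 611.2 * exp(17.67 * (t[i] - 273.15) / (t[i] - 29.65));
    }

    /* Candidate optimized kernel; any restructuring must reproduce
     * the reference result within tolerance. */
    static void kernel_opt(const double *t, double *es, int n)
    {
        for (int i = 0; i < n; i++) {
            double dt = t[i] - 273.15;
            es[i] = 611.2 * exp(17.67 * dt / (dt + 243.5));
        }
    }

    int main(void)
    {
        double *t = malloc(N * sizeof *t);
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);

        /* KGEN would capture these inputs from an instrumented model
         * run; here we synthesize plausible temperatures in kelvin. */
        for (int i = 0; i < N; i++)
            t[i] = 230.0 + 70.0 * (double)i / N;

        kernel_ref(t, a, N);
        kernel_opt(t, b, N);

        /* Verification step: compare optimized output to reference. */
        double maxrel = 0.0;
        for (int i = 0; i < N; i++) {
            double rel = fabs(b[i] - a[i]) / fabs(a[i]);
            if (rel > maxrel) maxrel = rel;
        }
        printf("max relative difference: %e -> %s\n",
               maxrel, maxrel < TOL ? "PASS" : "FAIL");

        free(t); free(a); free(b);
        return maxrel < TOL ? 0 : 1;
    }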

IPCC-WACS supported optimization work focused on HOMME, the spectral-element dynamical core in CESM, as well as on key, computationally expensive portions of the CAM physics packages such as cloud microphysics, convection, and radiation. Many of these optimizations were incorporated into the CESM source code base.

The NCAR/CISL portion of the IPCC-WACS project is funded by Intel Corporation through the mechanism of a corporate gift.

WACA: In FY2016, CISL also initiated a new partnership with NVIDIA Corporation and the GPU Research Center (GRC) at the University of Wyoming. This partnership is called the Weather And Climate Alliance (WACA). CISL’s initial objective in WACA is to work with the partners to port the Model for Prediction Across Scales (MPAS) dry dynamical core to multiple GPUs using the OpenACC directive-based programming model. Because OpenACC annotates standard source code with compiler directives, the new version of the MPAS code will be able to run on both conventional microprocessors and GPUs. WACA will also enable students at the University of Wyoming to gain valuable hands-on experience with HPC application software and optimization techniques. As an inaugural WACA activity, NVIDIA and CISL staff hosted an OpenACC hackathon in June 2016 that provided hands-on experience with application porting using Portland Group (PGI) OpenACC directives; it was attended by a dozen students and NCAR staff. A brief sketch of the directive-based approach follows.
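As a hedged illustration of the approach (the loop below is a generic, hypothetical column-update kernel, not actual MPAS code), a single OpenACC directive is enough to express the parallelism while leaving the source valid C:

    /* Hypothetical column-update loop illustrating directive-based
     * porting in the style used for the MPAS dry dynamical core.
     * Without -acc, the pragma is ignored and this is plain C. */
    #include <stdio.h>
    #include <stdlib.h>

    void advance_state(int ncells, int nlevels, double dt,
                       const double *tend, double *state)
    {
        #pragma acc parallel loop collapse(2) \
            copyin(tend[0:ncells*nlevels]) copy(state[0:ncells*nlevels])
        for (int i = 0; i < ncells; i++)
            for (int k = 0; k < nlevels; k++)
                state[i * nlevels + k] += dt * tend[i * nlevels + k];
    }

    int main(void)
    {
        enum { NC = 4096, NL = 32 };
        double *state = calloc(NC * NL, sizeof *state);
        double *tend  = malloc(NC * NL * sizeof *tend);
        for (int i = 0; i < NC * NL; i++)
            tend[i] = 1.0;

        advance_state(NC, NL, 0.5, tend, state);
        printf("state[0] = %f\n", state[0]);   /* expect 0.5 */

        free(state); free(tend);
        return 0;
    }

With the PGI compilers used at the hackathon, the same source can target a GPU (pgcc -acc -ta=tesla) or conventional multicore processors (pgcc -acc -ta=multicore); compilers that ignore the directive produce ordinary serial code.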

WACA is partially funded by NVIDIA Corporation and partially by NSF Core funds.

Cirrascale GX8: A fundamental question in computational science at extreme scale is “How will we program an exascale system?” With each passing year, the design of the exascale computational building block is gradually becoming clearer. Because of the size and complexity of atmospheric and Earth System science applications, these designs must be evaluated far in advance. Accordingly, in FY2016, CISL completed a careful review of purchasable proxies for an “exascale computational building block” that could inform application design, and acquired the Cirrascale GX8, a dense-GPU solution that puts O(10) teraflops in a 4U chassis with a scalable PCIe-switch interconnect. Starting in summer 2016, this GX8 system has formed the basis of three research projects focusing on different 2D atmospheric PDE solvers: the MPAS finite-volume approach, the discontinuous Galerkin method, and the radial basis function finite difference (RBF-FD) method. Using the RBF-FD method, the GX8 system delivered 0.8 teraflops sustained across eight NVIDIA Tesla K40 GPUs, a throughput equivalent to the performance of about 40 Yellowstone nodes.
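As a back-of-the-envelope check on that equivalence (arithmetic from published specifications, not a reported measurement): 0.8 teraflops sustained spread across 40 nodes corresponds to roughly 20 gigaflops sustained per node, and each Yellowstone node has a theoretical peak of about 333 gigaflops (two 8-core, 2.6 GHz Intel Xeon E5-2670 processors with AVX), so the implied sustained fraction is on the order of 6% of peak, which is plausible for a memory-bandwidth-bound stencil-type solver.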

Funding for the Cirrascale GX8 comes via NSF Core funds, jointly supplied by CISL’s Technology Development Division and Operations and Services Division.

NCAR-SGI Joint Center of Excellence: Silicon Graphics’ successful proposal response to the NWSC-2 RFP included the suggestion to create a “Joint Center of Excellence” (JCoE) between NCAR and SGI. After the two organizations signed an MOU for the JCoE, regular monthly meetings began, both to develop joint R&D activities and to provide a forum for identifying and discussing strategically important issues of mutual concern.

One significant project immediately emerged around workflow acceleration using the SGI UV 300 platform. The UV 300 is a large shared-memory architecture designed to handle large-scale, data-intensive analytics workflows. CISL’s Application Scalability and Performance Group worked with SGI engineers to evaluate climate workflow benchmarks on UV 300 systems containing NAND-based SSD storage. Using the SSD storage as a high-speed cache sped up these climate data post-processing workflows by factors of two to five at both high and low resolutions. A subsequent loaner system was embedded in the “Laramie” test system at NWSC and confirmed the results of the tests SGI performed in Chippewa Falls. This type of collaboration, fostered by the JCoE partnership, helps NCAR design future analysis systems that can process large data sets more efficiently; in turn, the data help SGI evaluate future system design concepts for the wider HPC marketplace. Working through the JCoE has thus produced mutually beneficial outcomes for NCAR and SGI.

The JCoE is based on cosponsored staff time, which is provided on NCAR’s side from NSF Core funds.

HPC Futures Lab

CISL’s HPC Futures Lab (HPCFL) and CISL’s Supercomputing Services Group play a critical role in supporting the technology evaluation activities required by computational science research and development. For example, the Cirrascale GX8 evaluation described above was enabled entirely by support received through the HPCFL and SSG teams.

Funding for HPCFL comes via NSF Core funds, jointly supplied by CISL’s Technology Development Division and Operations and Services Division.