NWSC data-intensive computing environment

The NWSC HPC environment includes petascale high-performance computing (HPC) resources, the Globally Accessible Data Environment’s centralized file system and data storage resource, and data analysis and visualization resources. The systems that were initially deployed within the NWSC in late 2012 have now been in production use for nearly three years, having enabled new science and discovery in the atmospheric and related sciences.

NWSC productivity
Monthly profile of the number of jobs and core-hours delivered by the NWSC HPC systems during FY2015. (This plot was produced by Open XDMoD, authored by the University at Buffalo under NSF grants ACI 1025159 and ACI 1445806 for the development of a technology audit service for XSEDE.)

FY2015 was again a highly productive year for the NWSC data-intensive computing environment. Over the past year these systems have supported 555 million core-hours of computing for over 7.3 million jobs.
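
As a rough derived illustration (these averages are computed from the totals above, not separately reported), the workload breaks down as follows:

    # Back-of-the-envelope averages derived from the FY2015 totals quoted above.
    total_core_hours = 555e6   # core-hours delivered in FY2015
    total_jobs = 7.3e6         # jobs completed in FY2015

    print(f"Average core-hours per job:   {total_core_hours / total_jobs:,.0f}")  # ~76
    print(f"Average core-hours per month: {total_core_hours / 12:,.0f}")          # ~46 million
    print(f"Average jobs per month:       {total_jobs / 12:,.0f}")                # ~608,000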

GLADE

The centerpiece of NWSC’s data-centric supercomputing environment is NCAR’s Globally Accessible Data Environment (GLADE), which provides a shared, high-speed (90 GB/second), high-capacity (16.4 PB) central file system connecting all the computing and support systems required for scientific computation and associated workflows. This centralized design, independent of the HPC resources, improves scientific productivity and reduces costs by eliminating the expense (time and energy) of moving and maintaining multiple copies of data.
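
To illustrate the kind of cost this design avoids, consider a hypothetical 100 TB dataset (the size is illustrative, not from this report): even at GLADE's full 90 GB/s aggregate bandwidth, copying it between separate file systems would consume roughly twenty minutes of sustained I/O, whereas on GLADE every connected system simply reads the same files in place.

    # Illustrative cost of the data copy that a shared file system avoids.
    # The 100 TB dataset size is hypothetical; 90 GB/s is GLADE's aggregate
    # bandwidth, which a single copy would rarely achieve in practice.
    dataset_tb = 100
    bandwidth_gb_per_s = 90

    copy_seconds = dataset_tb * 1000 / bandwidth_gb_per_s
    print(f"Best-case time to copy {dataset_tb} TB once: {copy_seconds / 60:.0f} minutes")
    # On GLADE, the HPC and DAV systems all mount the same file system,
    # so no such copy (and no duplicate storage) is needed.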

Yellowstone

The Yellowstone HPC resource, with peak processing power of 1.5 PFLOPS (1.5 quadrillion floating point operations per second) and 72,576 Intel Sandy Bridge (E5-2670) processing cores, was ranked as the 13th most powerful supercomputer in the world when it was installed. It dropped to 29th place in June 2014, and it was ranked 49th on the June 2015 TOP500 list. Yellowstone’s design and configuration target the data-intensive computing needs of the Earth System sciences, disciplines that push the limits of computational and data systems.
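
The quoted peak rate is consistent with the core count: each E5-2670 core can execute 8 double-precision floating-point operations per cycle at its 2.6 GHz base clock (these per-core figures are standard processor specifications rather than numbers from this report). A quick consistency check:

    # Consistency check of Yellowstone's quoted 1.5 PFLOPS peak rate.
    cores = 72_576
    clock_ghz = 2.6          # E5-2670 base clock frequency
    dp_flops_per_cycle = 8   # AVX: 4 adds + 4 multiplies per core per cycle

    peak_pflops = cores * clock_ghz * dp_flops_per_cycle / 1e6  # GFLOPS -> PFLOPS
    print(f"Computed peak: {peak_pflops:.2f} PFLOPS")  # ~1.51 PFLOPS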

Geyser, Caldera, and Pronghorn

Rounding out the resources of the NWSC’s HPC environment are the Data Analysis and Visualization (DAV) systems Geyser, Caldera, and Pronghorn, the first two of which are specifically configured for DAV tasks and equipped with NVIDIA graphics processing units (GPUs). The 16-node Geyser cluster, with 1 terabyte of memory and a single NVIDIA K5000 GPU per node, was designed for data synthesis, analysis, and visualization tasks, while the 16-node Caldera cluster, with two NVIDIA K20X GPGPUs per node, was designed for computationally intensive, GPGPU-accelerated parallel applications and data analysis tasks. Pronghorn was initially deployed as an Intel Xeon Phi accelerator evaluation system; after its Phi adapters were decommissioned, it was repurposed to augment the Caldera system, though without computational accelerators.

Data sharing services

CISL continued to operate the NCAR Data Sharing Service, launched the previous year, throughout FY2015. Based on the Globus Plus software (a tool that emerged from a partnership with the University of Chicago and Argonne National Laboratory), the NCAR Data Sharing Service provides researchers with a way to share large data sets with collaborators around the world using a simple web-based interface. The service provides 1.5 PB of storage, data movement servers, and high-speed network connectivity to external research networks.
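
The production service is operated through its web interface; purely as a sketch of the underlying Globus transfer model, the example below uses the present-day Globus Python SDK (globus_sdk), which is not part of the FY2015 service, and the endpoint IDs, paths, label, and access token are hypothetical placeholders.

    # Sketch of a Globus-style transfer between two endpoints. This is not the
    # NCAR Data Sharing Service's web interface; endpoint IDs, paths, and the
    # access token are hypothetical placeholders.
    import globus_sdk

    # Assume a valid Globus Transfer access token was obtained separately.
    authorizer = globus_sdk.AccessTokenAuthorizer("TRANSFER-ACCESS-TOKEN")
    tc = globus_sdk.TransferClient(authorizer=authorizer)

    task = globus_sdk.TransferData(
        source_endpoint="SOURCE-ENDPOINT-UUID",            # e.g., a shared data endpoint
        destination_endpoint="DESTINATION-ENDPOINT-UUID",  # a collaborator's endpoint
        label="Share model output",
    )
    task.add_item("/shared/project/output.nc", "/collaborator/output.nc")

    result = tc.submit_transfer(task)
    print("Submitted transfer task:", result["task_id"])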

CISL’s commitment to a data-intensive computing strategy extends beyond the Yellowstone environment and includes a full suite of science gateway and data portal services. CISL continues to lead the community in developing data services that can address the future challenges of data growth, preservation, curation, and management. CISL also leads in supporting NSF’s requirement for data management plans. Our disk- and tape-based HPSS archival storage system provides an efficient, safe, and reliable environment for long-term offline hosting of datasets, yet offers user-friendly interfaces for quickly retrieving stored data. CISL has streamlined and improved its data services through the data-centric design of the NWSC environment, and particularly via the GLADE file systems.

Erebus/AMPS

During FY2015, CISL continued to operate Erebus, an 84-node supercomputing cluster based on the same architecture as Yellowstone. Erebus delivered 8.9 million core-hours during FY2015 and is used exclusively by the Antarctic Mesoscale Prediction System (AMPS) to produce twice-daily numerical weather predictions over the Antarctic continent. The primary users of these simulations are forecasters who support U.S. Antarctic Program flight and polar observatory operations; the forecasts also support research and education activities involving Antarctic meteorology.
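
Those totals imply a substantial commitment per forecast. As a rough estimate, assuming for illustration that all 84 nodes (16 cores each on this Yellowstone-class architecture) are devoted to every run:

    # Rough per-forecast-cycle estimate from the FY2015 totals quoted above.
    # The whole-machine assumption is a simplification for illustration only;
    # 16 cores per node follows from the Yellowstone-class node design.
    core_hours_fy2015 = 8.9e6
    forecast_cycles = 365 * 2        # twice-daily AMPS forecasts
    cores = 84 * 16                  # 84 nodes x 16 cores per node = 1,344 cores

    core_hours_per_cycle = core_hours_fy2015 / forecast_cycles
    print(f"Core-hours per forecast cycle: {core_hours_per_cycle:,.0f}")          # ~12,200
    print(f"Whole-machine hours per cycle: {core_hours_per_cycle / cores:.1f}")   # ~9.1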

NWSC-2 procurement

This past year UCAR, on behalf of CISL, issued the NWSC-2 request for proposals (RFP) for the acquisition of a new HPC resource and an augmentation of the GLADE environment. With input from internal and external advisory teams, CISL established a set of requirements for the new systems and benchmarks for their evaluation. The RFP was issued in April 2015, and proposal evaluation concluded by the end of FY2015. While the formal award announcement will occur early in FY2016, the HPC system to be acquired, which will ultimately replace Yellowstone, will have a peak computational rate exceeding 5 PFLOPS. The storage system, which will enhance GLADE, will have an aggregate I/O bandwidth of 200 GB/second and an initial capacity of 20 PB, expandable beyond 40 PB (potentially bringing GLADE’s total capacity to over 56 PB). The NWSC-2 HPC and storage resources are planned to enter production use by the end of calendar 2016. CISL plans to operate Yellowstone through calendar 2017, providing a one-year overlap with the new NWSC-2 HPC system.
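
The parenthetical capacity figure follows directly from adding the fully expanded NWSC-2 storage to GLADE's existing 16.4 PB:

    # How the "over 56 PB" combined GLADE capacity figure is obtained.
    existing_glade_pb = 16.4     # current GLADE capacity
    nwsc2_initial_pb = 20.0      # NWSC-2 storage at deployment
    nwsc2_expanded_pb = 40.0     # NWSC-2 storage after expansion

    print(f"At deployment:  {existing_glade_pb + nwsc2_initial_pb:.1f} PB")    # 36.4 PB
    print(f"Fully expanded: {existing_glade_pb + nwsc2_expanded_pb:.1f} PB")   # 56.4 PB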

Funding

The NWSC environment, including HPC, GLADE, and DAV resources, was made possible through NSF Core funds, with supplemental support from the University of Wyoming. AMPS computing was supported by NSF Special funding.