Provide centralized high-speed data storage

Enclosure for GLADE storage system
NWSC-2 DDN Storage System providing 38 PB of usable storage at 280 GB/s.

The GLobally Accessible Data Environment (GLADE) provides centralized high-performance file systems spanning supercomputing, data post-processing, data analysis, visualization, and HPC-based data transfer services. GLADE provides computation, analysis, and visualization work spaces common to all CISL HPC resources. Project space is allocated through NCAR’s allocation panels, while scratch and user spaces are available to all users of NCAR HPC resources. GLADE also hosts data from NCAR’s Research Data Archive (RDA), NCAR’s Community Data Portal, and the Earth System Grid, which curates data collections for CMIP5/AR5 and (soon) CMIP6. (CMIP is the Coupled Model Intercomparison Project; its Phase 5, CMIP5, was completed in 2014. AR5 is the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, or IPCC.)

GLADE’s architecture shifts user workflows from a design that centers on serving the supercomputer to a more scientifically efficient design that facilitates the flow of data. Through a globally accessible storage infrastructure, users now arrange their workflows to use stored data directly without first needing to move or copy it. Additional services like high-performance data transfer protocols enhance CISL’s ability to bring computational data from other sites to NCAR for post-processing, analysis, and visualization.

This work supports CISL’s computing imperative for hardware cyberinfrastructure by providing storage and networking systems customized to support efficient workflows for the atmospheric and related sciences. Specifically, GLADE facilitates typical user workflows plus special efforts like supporting data flows for CMIP5 and the upcoming CMIP6. GLADE also advances CISL’s computing imperative for facilities by demonstrating high-performance data services that are critical for the supercomputing resources that now operate at NWSC and will continue to be critical as next-generation resources are added to the environment.

In FY2017, CISL installed the NWSC-2 resources, which include a new storage system with an initial usable capacity of 20 PB. The system was fully integrated with the NWSC-2 computational system and, together with the network enhancements made in FY2016, now provides storage resources to all NWSC systems. In spring FY2017, an additional 20 PB of capacity was added to the storage system, bringing it to a total of 38 PB of usable capacity and bandwidth exceeding 280 GB/s.

As the volume of data produced by scientific applications grows, so will user expectations for real-time data access. CISL must therefore pursue innovative and emerging technologies and concepts such as in-memory (non-volatile memory or NVM) and on-flash data solutions. The decreasing cost and continuing innovation of these technologies will very soon make them increasingly attractive options. CISL is planning to deploy about 400 TB of flash storage later this year to conduct a proof-of-concept study on selected use cases and to move the concept into production in FY2018.

NCAR’s storage, file systems, and data infrastructure are managed by CISL under the UCAR/NSF Cooperative Agreement and are supported by NSF Core funds.