Provide data analysis and visualization resources

Caldera DAV computer
Caldera is a 30-node cluster composed of IBM dx360 M4 nodes that are identical to Yellowstone’s compute nodes, except that 16 of the nodes are augmented with two computational accelerators, or general-purpose graphics processing units (GPGPUs). Each Caldera node contains two 8-core Intel Xeon E5-2670 (Sandy Bridge) processors and 64 GB of memory (4 GB per core, twice as much as on Yellowstone). The 16 accelerated nodes each contain two NVIDIA Tesla K20X accelerators, each capable of 1.31 teraflops of double-precision calculations or 3.95 teraflops of single-precision calculations. Caldera’s peak double-precision floating-point rate is therefore nearly 52 teraflops. Yellowstone would need to use 155 of its nodes to produce the same peak computation rate.
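The peak-rate figures in the Caldera description can be reproduced with a short calculation. The sketch below is illustrative only: the node counts and the K20X double-precision rate come from the text above, while the 2.6 GHz clock rate and the 8 double-precision floating-point operations per core per cycle are assumptions about the 8-core Sandy Bridge processors that the report does not state. The function and constant names are hypothetical.

```python
# Illustrative peak-rate arithmetic for Caldera.
# Values marked "assumption" are NOT stated in the report.

TOTAL_NODES = 30            # from the report
GPU_NODES = 16              # from the report
GPUS_PER_NODE = 2           # from the report
K20X_DP_TFLOPS = 1.31       # from the report (double precision, per accelerator)

CORES_PER_NODE = 2 * 8      # from the report: two 8-core processors per node
CLOCK_GHZ = 2.6             # assumption: nominal processor clock rate
DP_FLOPS_PER_CYCLE = 8      # assumption: per-core AVX rate on Sandy Bridge


def cpu_node_peak_tflops():
    """Peak double-precision rate of one CPU-only node, in teraflops."""
    gigaflops = CORES_PER_NODE * CLOCK_GHZ * DP_FLOPS_PER_CYCLE
    return gigaflops / 1000.0


def caldera_peak_tflops():
    """Peak double-precision rate of all CPUs plus all accelerators."""
    gpu_part = GPU_NODES * GPUS_PER_NODE * K20X_DP_TFLOPS
    cpu_part = TOTAL_NODES * cpu_node_peak_tflops()
    return gpu_part + cpu_part


def yellowstone_equivalent_nodes():
    """Number of CPU-only nodes needed to match Caldera's peak rate."""
    return caldera_peak_tflops() / cpu_node_peak_tflops()


if __name__ == "__main__":
    print(f"CPU-only node peak: {cpu_node_peak_tflops():.4f} TF")
    print(f"Caldera peak:       {caldera_peak_tflops():.2f} TF")
    print(f"Equivalent nodes:   {yellowstone_equivalent_nodes():.1f}")
```

Under these assumptions, the accelerators contribute about 42 teraflops and the processors about 10, for a total just under 52 teraflops. The equivalent node count falls between 155 and 156, consistent with the figure quoted above.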
Geyser DAV computer
Geyser is a 16-node cluster composed of IBM x3850 X5 nodes that are each equipped with a terabyte of memory, four 10-core Intel Xeon E7-4870 (Westmere) processors, and one NVIDIA Quadro K5000 graphics adapter. The K5000 is designed for high-speed graphics rendering, with a single-precision floating-point rate of 2.1 teraflops.

NCAR’s Data Analysis and Visualization (DAV) environment enables scientific workflows tailored to the various data-processing stages of UCAR’s research community while improving the scale and efficiency of those workflows, especially as the volume of data being produced continues to grow. NCAR’s specialized DAV systems are architected for parallel data post-processing, data analysis and reduction, and data visualization.

CISL provides a portfolio of advanced computing and data services specifically tailored for the atmospheric, geoscience, and related sciences communities. While computing is a foundational element of scientific research, data and the infrastructure for storing it have become an equally important, if not more important, resource. As data volumes grow, the cost of moving data around has become prohibitive. Thus, CISL inaugurated a data-centric architecture at the NWSC by implementing NCAR’s GLADE environment.

CISL’s DAV environment has direct, high-bandwidth connectivity to GLADE’s high-performance file systems, giving users the hardware and software resources they need for enhanced data analysis and visualization.

DAV hardware resources and metrics

NCAR’s DAV environment consists of a pair of multi-node systems deployed in late 2012. The two systems are designed to complement each other in meeting the diverse post-processing and visualization needs of climate and weather applications. The system named Caldera is targeted for use by parallel graphics/visualization applications and by computationally bound applications that can be accelerated via multiple high-performance general-purpose graphics processing units (GPGPUs), as well as for general-purpose small parallel applications. The system named Geyser is primarily for traditional interactive data set manipulation, data reduction, analysis, and visualization applications, and for large, data-intensive applications requiring graphics processing units (GPUs) and/or large shared memory. These DAV systems share a dedicated, high-bandwidth I/O network path to NCAR’s GLobally Accessible Data Environment (GLADE) that eliminates the need for users to move their data from Cheyenne or Yellowstone for analysis. Caldera and Geyser are also used extensively for production and on-demand regridding, data subsetting, and curation of NCAR’s Research Data Archive (RDA) holdings.

Caldera and Geyser system specifications, along with their reliability and utilization, appear in the tables in the Production supercomputing section of this report.

Unlike Yellowstone, which had an average user utilization of over 90% during FY2017 in support of its long-running batch jobs, the DAV platforms are designed for interactive applications and rapid job turnaround. Therefore, their average utilization is typically low, with bursts of high utilization during prime work hours. During FY2017, Caldera’s average user utilization was 17.0%, while Geyser’s was 35.2%. Geyser’s utilization has slowly and steadily increased over the last few years, while Caldera’s has leveled out. This difference highlights the value of nodes with large shared memory for data processing. CISL continues to monitor DAV workload and usage, and these observations help guide requirements for future DAV systems.
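To illustrate why an interactive system can be heavily used during prime hours and still report a low average, the sketch below averages a hypothetical daily usage profile. It assumes that average utilization is the time-averaged fraction of capacity in use, a definition the report does not spell out. The hourly values are invented for illustration and are not measurements from Caldera or Geyser.

```python
def average_utilization(hourly_busy_fraction):
    """Return the mean of per-hour busy fractions (each between 0 and 1)."""
    if not hourly_busy_fraction:
        raise ValueError("need at least one hourly sample")
    return sum(hourly_busy_fraction) / len(hourly_busy_fraction)


# Hypothetical 24-hour profile (assumed values, not report data):
# 8 prime work hours at 80% busy, 16 off-peak hours at 10% busy.
off_peak = [0.10] * 16
prime = [0.80] * 8
day = off_peak[:8] + prime + off_peak[8:]

daily_average = average_utilization(day)
busiest_hour = max(day)

if __name__ == "__main__":
    print(f"busiest hour:  {busiest_hour:.0%}")
    print(f"daily average: {daily_average:.1%}")
```

In this made-up profile the system is 80% busy at its peak, yet the daily average is only about one third, which is the pattern described above for platforms tuned for rapid turnaround rather than sustained batch throughput.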

NWSC-2a procurement

Because these DAV systems are over four years old, CISL has released the NWSC-2a request for proposal (RFP) to acquire a next-generation, production-quality DAV platform that will replace Geyser and Caldera. The NWSC-2a system will also include significant resources to facilitate research and to incorporate machine-learning and deep-learning (ML/DL) applications into our workflows. ML/DL is being adopted not only for the voluminous information being collected from instruments and sensors and for pattern detection in the output of ensemble model runs, but also to aid in the post-processing and analysis of the large volume of data produced by our supercomputers, all of which are critical to facilitating new scientific insights.

The NWSC-2a system will be configured primarily to satisfy the anticipated DAV and ML/DL workload requirements of the NCAR and university user communities in the 2018-2022 time frame. It is anticipated that the NWSC-2a system will include large shared-memory nodes, like Geyser’s, for in-memory data analysis, manipulation, and reduction; GPUs for visualization as well as computational acceleration and ML/DL applications; and non-volatile memory, SSD, and/or burst-buffer hardware and software for accelerating the I/O operations critical to the analysis of large data sets. As of the end of FY2017, CISL expects the new NWSC-2a system to be placed into production in spring 2018.

Funding

NCAR’s DAV environment and services are supported by NSF Core funding including CSL funds.