Data analysis and visualization environment

Hurricane Sandy simulation
This unprecedentedly high-resolution WRF simulation of Hurricane Sandy was created on CISL’s Geyser cluster with CISL’s VAPOR software. As shown in the figure, this 500-meter simulation revealed features not previously seen in other coastal cyclones, such as a surface-based warm jet that propagates as fast as the wind speed. The simulation produced hundreds of terabytes of data and demanded substantial DAV resources for analysis.

The Data Analysis and Visualization (DAV) environment enables scientific workflows by providing UCAR’s research community with state-of-the-art systems tailored for the specialized needs of parallel data post-processing, analysis, and visualization. This environment also supports the research community by developing algorithms for relevant visualization and analysis methods and by producing animations and imagery in collaboration with and on behalf of scientific staff.

NCAR’s DAV environment consists of two clusters, each designed to complement the other in meeting the diverse needs of climate and weather DAV applications. Caldera is targeted at general-purpose applications that can be accelerated by high-performance graphics processing units (GPUs) and at parallel graphics and visualization applications. Geyser is targeted primarily at traditional interactive dataset manipulation, reduction, analysis, and visualization applications, and at large, data-intensive applications requiring GPUs and/or large shared memory. Both systems share a dedicated, high-bandwidth I/O network path to Yellowstone’s filesystems on the GLobally Accessible Data Environment (GLADE). Caldera and Geyser are also used extensively for production and on-demand regridding, data subsetting, and curation of NCAR’s Research Data Archive (RDA) holdings.

Caldera is a 16-node cluster composed of IBM dx360 M4 nodes, identical to the Yellowstone compute nodes except that they are augmented with two general-purpose graphics processing units (GPGPUs). Each Caldera node contains two 8-core Intel Sandy Bridge processors, 64 GB of memory, and two NVIDIA Tesla K20X accelerators. Each K20X accelerator is capable of 1.31 TFLOPS in double-precision calculations or 3.95 TFLOPS in single-precision calculations, giving Caldera a peak double-precision floating-point rate of over 47 TFLOPS. Matching that peak rate would require over 140 Yellowstone nodes.
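The peak-rate figures above can be checked with a short calculation. The sketch below is illustrative only. The node count, GPUs per node, cores per node, and K20X double-precision rate come from the text. The 2.6 GHz CPU clock and the 8 double-precision operations per core per cycle are assumptions that the text does not state, and the function names are hypothetical. Under these assumptions the GPUs alone contribute about 41.9 TFLOPS, and the quoted total of over 47 TFLOPS is reached only when the host CPUs are counted as well.

```python
# Figures stated in the text.
CALDERA_NODES = 16
GPUS_PER_NODE = 2
K20X_DP_TFLOPS = 1.31      # double-precision peak per K20X
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8

# Assumptions NOT stated in the text (used only for illustration).
ASSUMED_CLOCK_GHZ = 2.6            # assumed Sandy Bridge clock rate
ASSUMED_DP_FLOPS_PER_CYCLE = 8     # assumed per-core rate with AVX


def cpu_node_peak_tflops(clock_ghz=ASSUMED_CLOCK_GHZ,
                         flops_per_cycle=ASSUMED_DP_FLOPS_PER_CYCLE):
    """Peak double-precision TFLOPS of the CPUs in one node."""
    cores = SOCKETS_PER_NODE * CORES_PER_SOCKET
    gflops = cores * clock_ghz * flops_per_cycle
    return gflops / 1000.0


def caldera_gpu_peak_tflops():
    """Peak double-precision TFLOPS of all Caldera GPUs combined."""
    return CALDERA_NODES * GPUS_PER_NODE * K20X_DP_TFLOPS


def caldera_total_peak_tflops():
    """GPU peak plus CPU peak across all Caldera nodes."""
    return caldera_gpu_peak_tflops() + CALDERA_NODES * cpu_node_peak_tflops()


def equivalent_yellowstone_nodes():
    """CPU-only nodes needed to match Caldera's total peak rate."""
    return caldera_total_peak_tflops() / cpu_node_peak_tflops()


if __name__ == "__main__":
    print(f"GPU peak:   {caldera_gpu_peak_tflops():.2f} TFLOPS")
    print(f"Total peak: {caldera_total_peak_tflops():.2f} TFLOPS")
    print(f"Equivalent Yellowstone nodes: {equivalent_yellowstone_nodes():.0f}")
```

With the assumed CPU figures, the total comes to roughly 47.2 TFLOPS and the equivalent node count to roughly 142, consistent with the "over 47 TFLOPS" and "over 140 Yellowstone nodes" quoted above. A different clock rate or per-cycle rate would shift both numbers.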

Geyser is a 16-node cluster composed of IBM x3850 X5 nodes, each equipped with four 10-core Intel Westmere processors, one terabyte of memory, and one NVIDIA Quadro K5000 graphics adapter. The K5000 is designed for high-speed graphics rendering and has a single-precision floating-point rate of 2.1 TFLOPS.

Additional details of the Geyser and Caldera systems are contained in a table in the Production supercomputing status section of this annual report.

In addition to supporting CISL’s computing imperative for hardware cyberinfrastructure (CI), the DAV environment advances CISL’s computing imperative for software CI by supporting, developing, and enhancing software specific to the simulation, analysis, and forecasting needs of the atmospheric and related sciences. The DAV resources also help to advance CISL’s science frontier in understanding large and heterogeneous data sets by developing new methods and tools, such as VAPOR, to extract and visualize information from such data sets.

CISL currently plans to continue operating Geyser and Caldera through calendar year 2017.

NCAR’s DAV environment and services are supported by NSF Core funds including CSL funding.