CISL’s Big Data services and software tools

CISL provides our research community with Big Data tools and services for locating, accessing, and analyzing a variety of observational and model research data collections. These data are served through data gateways over high-speed wide-area networks and are also accessible from disk and tape storage on the Yellowstone computing complex. Together, these tools and services support our communities’ efforts to extract scientific knowledge from the petabytes of data available on NCAR’s cyberinfrastructure. They include:

  • Research Data Archive (RDA) – The climate and weather research communities’ data needs continue to grow, so CISL adds new content and access features to the RDA. More than 11,000 unique users acquire 1.1 petabytes of data yearly through the RDA web portal. In addition, hundreds of internal users access substantial amounts of data directly from NCAR’s Globally Accessible Data Environment (GLADE).

  • Data Gateways – Data gateways expand scientific collaboration by connecting research communities and new climate data consumers with data products and tools. The Science Gateway Framework (SGF) is a unified portal for scientific data users, and it helps researchers use new supercomputing environments. The SGF underlies CISL’s NCAR Earth System Grid, the ACADIS Arctic Data Repository, and the Community Data Portal.

  • Data Assimilation Research Testbed (DART) software – DART helps community researchers improve their predictions and understanding of the Earth System through the collaborative development and application of data assimilation methods across a wide range of geophysical problems.

  • Data Analysis Tools – CISL’s portfolio of data analysis tools provides an ever-growing community of scientists with unique capabilities tailored to the disciplines we serve. The scalability and performance of these tools are increasingly important in the era of Big Data. The Visualization and Analysis Platform for Ocean, Atmosphere, and Solar Research (VAPOR) lets researchers efficiently explore enormous or complex 3D data sets. The NCAR Command Language (NCL) is an open-source scripting language for geoscientific data analysis and visualization. NCL reads and writes several geoscientific data formats and creates publication-quality graphics. PyNIO and PyNGL are Python modules built on top of NCL’s component libraries, providing Python users with the same file I/O and visualization capabilities as NCL. PyAOS is a computational library for the atmospheric and oceanic sciences, with contributing partners from universities, national laboratories, and commercial enterprises.

  • NCAR Data Sharing Service via Globus Plus – NCAR’s Data Sharing Service uses Globus Plus to augment CISL’s existing Globus-based data transfer services with user-managed big-data sharing capabilities.

By exploiting parallelism, scientists use end-to-end workflows built on these tools and services to produce results more quickly and deliver them to a broader audience of researchers.

The funding for each of these efforts is specified in the sections below.