Research Data Archive

RDA metrics
These charts show the data access and growth metrics for the RDA during FY2006-FY2015. a) The number of unique RDA users by access pathway: the NCAR HPSS, publicly available web servers, one-time special requests (orders) prepared for individual users, and TIGGE. b) The amount of data delivered to users, by access pathway. c) The amount of data in the HPSS archive, showing annual growth and excluding backups. d) The amount of data on public web servers, showing annual growth. Charts a) and b) indicate the RDA’s significance to the community. Charts c) and d) show the annual progress toward building more valued content into the RDA.

The Research Data Archive (RDA) is a key part of CISL’s computing imperative for data curation and provision. It provides a rich information resource through a large and growing collection of datasets that support scientific studies in climate, weather, Earth System modeling, and, increasingly, other related sciences. The RDA is developed to serve research needs at NCAR and in the associated UCAR community, but because it is an open resource, the global community also uses it. RDA activities can be viewed from two perspectives, user data access and archive content development, which are equally important in supporting research and education.

In FY2015, the RDA delivered about 1.5 petabytes of data to over 12,000 unique users through its primary access pathways: the NCAR HPSS, public servers on the web, one-time special requests prepared for individuals, and the THORPEX (THe Observing system Research and Predictability EXperiment) Interactive Grand Global Ensemble (TIGGE) archive (see charts a and b). The TIGGE project stopped active data collection during FY2015 and now has minimal impact on these metrics. The number of unique users increased steadily from 2012 through 2015. One-time requests (subsetting, format conversion, and HPSS file restaging to disk) and full file downloads also increased.

CISL is making it easier for users to access terabyte-sized archives on their own. Orders were automatically prepared for over 4,300 individuals, who received over 500 terabytes of data. Web users form the largest group, with 7,500 people downloading over 1,000 terabytes (1.0 petabytes) of data. The HPSS (71 users requesting 21 terabytes) and TIGGE (16 users requesting 3 terabytes) services have far fewer users. The newest and most-used RDA collections are directly available to the HPC environment from NCAR’s Globally Accessible Data Environment (GLADE). We cannot currently measure usage of this pathway, but we believe it is substantial: access from the tape-based HPSS has dropped, and anecdotally, our local users are pleased. These metrics indicate that the RDA is an important and growing data resource for a broad community.
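As a rough consistency check, the per-pathway figures quoted above can be summed and compared with the FY2015 totals (over 12,000 users, about 1.5 petabytes). The following is a minimal sketch only: the dictionary keys and the `pathway_totals` helper are hypothetical names chosen for illustration, the "over" figures are treated as lower bounds, decimal units (1 PB = 1,000 TB) are assumed, and adding user counts assumes no person is counted under more than one pathway, which the report does not state.

```python
# FY2015 per-pathway figures as quoted in the text.
# Values described as "over N" are entered as N, so sums are lower bounds.
PATHWAYS = {
    "orders": {"users": 4300, "terabytes": 500},
    "web": {"users": 7500, "terabytes": 1000},
    "hpss": {"users": 71, "terabytes": 21},
    "tigge": {"users": 16, "terabytes": 3},
}


def pathway_totals(pathways):
    """Sum user counts and delivered terabytes across all access pathways."""
    users = sum(p["users"] for p in pathways.values())
    terabytes = sum(p["terabytes"] for p in pathways.values())
    return users, terabytes


total_users, total_tb = pathway_totals(PATHWAYS)
total_pb = total_tb / 1000  # assumption: decimal units, 1 PB = 1000 TB
print(f"users >= {total_users}, data >= {total_tb} TB (~{total_pb:.1f} PB)")
```

The rounded inputs sum to 11,887 users and 1,524 terabytes, which is in line with the reported totals once the "over" qualifiers are taken into account.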

The RDA content expanded by over 200 terabytes in FY2015 (see charts c and d). The complete RDA is now over 2.1 petabytes, and over 550 terabytes of it is readily available via GLADE (chart d). NCAR users can access the portion of the RDA not available on GLADE directly from the HPSS, and the Data Support Section provides automated procedures to assist outside users with data access from HPSS.

The RDA is constantly changing. Curation extends and adds to existing datasets, and stewardship improves the documentation, creates systematic organization, applies data quality assurance, and develops user access. Many routine tasks and background infrastructure developments are necessary to maintain the RDA. Major accomplishments for FY2015 include:

  • Added Globus access as an RDA service for both static datasets and one-off orders.

  • Expanded automated systems that use CISL HPC and GLADE to give users better access to terabyte-sized datasets. More than 41,000 individual data requests were processed.

  • Added significant data assets to the RDA:

    • The International Surface Pressure Databank version 3

    • Japanese Reanalysis 55 year, Atmospheric Model Intercomparison Project

    • Japanese Reanalysis 55 year, Conventional Data Only

    • NCEP Final 0.25-Degree Global Tropospheric Analyses and Forecast Grids

    • Final 30km Arctic System Reanalysis

  • Expanded Thematic Realtime Environmental Distributed Data Services (THREDDS) Data Server (TDS) coverage to over 35 popular GRIB-formatted datasets, providing metadata and data access to scientific tools through standard interoperable protocols such as Open-source Project for a Network Data Access Protocol (OPeNDAP).

  • Expanded HPC-driven spatial, temporal, and parameter subsetting with data format conversion options to 55 datasets.

  • Increased formal data citation potential by assigning and maintaining DOIs on 55 RDA datasets.

The RDA is nationally and internationally respected for its staff, data management practices, consulting services, and ability to positively affect outcomes in the data arena. This standing helps in building collaborations that continually strive to provide better scientific data resources and access.

RDA maintenance and development within CISL are almost entirely supported by NSF Core funding. A small NASA grant supplemented development of the International Comprehensive Ocean-Atmosphere Data Set (ICOADS).