Data

Provide Comprehensive Data Services and Long-Term Data Stewardship

Data has been and will continue to be the ultimate product of the field campaigns EOL supports. As such, the data we provide must be of high quality, well managed, and preserved. Furthermore, in view of the President’s 2013 Open Data Executive Order for public access to data and NSF’s increasing emphasis on multi-disciplinary science, EOL must ensure that its current and historical data is well documented and discoverable, and that it feeds seamlessly into scientific analysis workflows. Imperative 3 describes our activities to meet these challenges, which are divided into three key areas: 1) acquisition, quality control, and data management; 2) standardization of data formats and distribution; and 3) data citation and metrics.

Sustain Efficient Acquisition, Quality Control, and Data Management 

NCAR Digital Asset Services Hub (DASH) Data Management Services

The NCAR Digital Asset Services Hub (DASH) system is being developed by NCAR/CISL under the guidance of the NCAR Directorate and the Data Stewardship Engineering Team (DSET). The DASH portal will provide single-point access for search and discovery of all NCAR assets (e.g., data, software, publications/documents). For labs with existing data management systems, such as EOL's EMDAC, DASH will harvest metadata (not data) to support search and discovery. Once a dataset is discovered via DASH, users are redirected to the appropriate lab’s existing system to place a data order. Additional services, such as support for Data Management Plan development, will also be provided or made accessible via the DASH web site.

Data Legacy Work

Bringing legacy datasets up to current data archive standards is an ongoing task that involves metadata cleanup and the assignment of Digital Object Identifiers (DOIs) to legacy EOL datasets. The assignment of DOIs to all EOL datasets going back 10 years has been completed, and work continues to extend DOI assignment even farther back in time for ISF instrumentation and sounding systems. Metadata cleanup involves the application of more standardized metadata vocabularies and keywords. This work is likely to go on for several years.

EOL/DMS staff published a journal article on the assignment of DOIs to datasets:

Aquino, J. A., J. J. Allison, R. A. Rilling, D. Stott, K. Young, and M. D. Daniels, 2017: Motivation and strategies for implementing Digital Object Identifiers (DOIs) at NCAR’s Earth Observing Laboratory -- Past progress and future collaborations. Data Science Journal, 16, 7, doi:10.5334/dsj-2017-007.

Standardize Data Formats and Distribution 

High Performance Storage System (HPSS) Backup

The objectives of the HPSS backup project are to:

  • Form a consolidated set of EOL data on the HPSS (enabling CISL to perform future migration of EOL data),
  • Create a full, disaster recovery, geographically separated backup of EOL HPSS files,
  • Create an online disk copy of EOL's data on CISL's "GLADE" system.

 

The High Performance Storage System (HPSS)

In FY 2017, this work proceeded in two stages: 1) the transfer of all EOL platform and cooperator data sets, followed by 2) the transfer of operational data sets. To date, EOL data from 1971 through 2011 has been copied to the CISL GLADE system, with data through 2010 transferred to both the disaster recovery data area and the new HPSS file set.

The project is currently limited by available GLADE space, and the EOL team is working with CISL to resolve this issue. The CISL team anticipates that additional disk space will become available once the Cheyenne HPC infrastructure is in place. Having a copy of our data online will significantly speed up access and allow for exploration of the data in ways that were not possible using the tape-based HPSS.


Develop Data Workflows and Citation Metrics

EOL Data Archive Searchability

Local and outside users now have new ways to find data within EOL’s Data Archive. This effort was an EOL pilot project made possible by funding that the NCAR Director’s Office allocated to Data Stewardship Engineering Team (DSET) activities. The funding supported changes that make searching for data easier and more accessible. Improvements include a sidebar for searching the data by project, category, platform, instrument, and science keywords from the Global Change Master Directory (GCMD), NASA’s standard for science keyword searches. According to NASA, these keywords are used to “facilitate the classification and discovery of Earth Science data by providing a rich vocabulary for characterizing the data” and are used by hundreds of data providers worldwide. The newly added GCMD keywords are currently available only for EOL’s Arctic data holdings. All new data will be assigned a GCMD keyword. Users can still use the existing search capabilities within the EMDAC data ordering system. To try out the new search capabilities, see EOL’s data page.