Science gateway services

Science gateway highlights
Shown above are two prominent science gateway services operated by CISL. These systems provide access to shared data management cyberinfrastructure for diverse scientific communities from petascale “big head science” to the “long tail” of small individual investigator-based projects. Combined, these services support over 1,250 active users monthly with annual downloads of over 2 petabytes.

CISL builds and operates science gateways that provide sustainable access to shared cyberinfrastructure for diverse scientific communities. Our projects span climate science, regional climate research, arctic science, solar science, digital preservation, and international efforts to develop metadata and knowledge infrastructure. Many of these efforts are tied to major interagency, national, and international initiatives, including the Intergovernmental Panel on Climate Change (IPCC), the International Polar Year (IPY), the World Climate Research Programme (WCRP), and the Library of Congress’ National Digital Information Infrastructure and Preservation Program (NDIIPP). Most of these projects use open source web portal infrastructure based on the Science Gateway Framework (SGF). CISL’s contributions to this suite of science gateway services are supported through NSF Core funding and augmented by special funding as noted below.

Our contributions to science gateways support CISL’s computing imperative for software cyberinfrastructure by maintaining, operating, and supporting software specific to the simulation, analysis, and forecasting needs of the atmospheric and related sciences. They also address CISL’s computing frontier for center virtualization by operating science gateways and other technologies that provide critical cyberinfrastructure (CI) to broad communities. Finally, operational services provided for the NCAR Earth System Grid gateway (ESG-NCAR), ACADIS, WMO, and other collaborations address CISL’s strategic action item to meet the challenges posed by large and heterogeneous environmental data, and to establish metadata standards for diverse collections of data and models.

Detailed updates on our portfolio of science gateway services follow:

Earth System Grid Gateway at NCAR (ESG-NCAR)

CISL operates the ESG-NCAR gateway, which provides data discovery and access services for global and regional climate model data, knowledge, and software. The ESG-NCAR gateway participates in the Earth System Grid Federation (ESGF), a globally distributed petascale data management environment for CMIP5/IPCC-AR5 and U.S. climate science. The ESG-NCAR gateway supports community access to data products from many of NCAR’s community modeling efforts, including IPCC, PCM, AMPS, CESM, NARCCAP, and NMME data products. The gateway serves over 1,200 users and delivers over 30 terabytes of scientific data to the community each month.

In FY2015, the ESG-NCAR gateway capabilities were extended primarily to support simpler end-user data product access and address the needs of increased data publication volume. CISL also provided considerable end-user support through the ESG-NCAR help desk, answering over 300 end-user inquiries. We moved to a more continuous software delivery process, releasing and deploying updated versions every two weeks, bringing end-user value early and often via our Agile Scrum software development process. Other FY2015 accomplishments include refining data provider workflows; publishing performance enhancements, tools, and services for DOI assignment; an updated user interface framework; simplified security workflow; and open data access.

CISL continued to work closely with our community of data managers to process and publish data products from AMPS, CESM, CCSM4, NARCCAP, and NMME projects. Over 725 terabytes were published to ESG-NCAR during FY2015, raising the full volume of ESG-NCAR to 4.3 petabytes and 6.4 million files.
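The figures above imply a few derived quantities that are useful for capacity planning. The following is a back-of-the-envelope sketch, not an official CISL calculation: it assumes decimal units (1 PB = 1,000 TB), treats "over 725 terabytes" as exactly 725, and uses an illustrative helper name (`holdings_summary`) that does not come from the report.

```python
# Figures as reported in the text. Decimal units are an assumption.
MB_PER_TB = 1_000_000

published_fy2015_tb = 725      # "over 725 terabytes", used here as a lower bound
total_volume_tb = 4_300        # 4.3 petabytes, assuming 1 PB = 1,000 TB
total_files = 6_400_000        # 6.4 million files


def holdings_summary(published_tb, total_tb, files):
    """Derive growth and file-size estimates from the reported totals."""
    prior_tb = total_tb - published_tb
    return {
        # Estimated holdings before FY2015 publication
        "prior_volume_tb": prior_tb,
        # Share of current holdings that was published in FY2015
        "fy2015_share_pct": 100 * published_tb / total_tb,
        # Growth relative to the estimated prior holdings
        "growth_pct": 100 * published_tb / prior_tb,
        # Mean size of a file across the full holdings
        "mean_file_size_mb": total_tb * MB_PER_TB / files,
    }


summary = holdings_summary(published_fy2015_tb, total_volume_tb, total_files)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, FY2015 publication accounts for roughly 17 percent of the current holdings, and the mean file size is about 672 megabytes. Because the published volume is a lower bound, the share and growth figures are lower bounds as well.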

The operational ESG-NCAR gateway is supported by NSF Core funds as well as special funding from the National Multi-Model Ensemble (NMME) project and the High Impact Weather Prediction Project (HIWPP).

Advanced Cooperative Arctic Data and Information Service (ACADIS)

ACADIS is a collaboration among CISL, NCAR’s Earth Observing Laboratory, the National Snow and Ice Data Center, and Unidata. ACADIS is a community data service that provides project data management planning, data archival, preservation, and access for all projects funded by NSF’s Arctic Science Program (ARC). CISL’s contributions to ACADIS include the ACADIS gateway, which provides an end-to-end service where NSF-supported data providers can publish their data collections and make them available to the broad community of researchers.

Accomplishments in FY2015 include expanding REST-based data management APIs, adding tools for DOI assignment, and extending ISO 19115 metadata record support. The data provider workflow was significantly enhanced based on end-user feedback and usability studies to provide easier and faster metadata authoring, bulk file upload, and efficient creation of metadata-rich records. An automated archive export and storage process was developed to store a copy of the repository data and metadata in the Amazon S3 service. A cloud-based gateway and supporting services were deployed and tested to assess the costs associated with potential future cloud-based operations. The ACADIS gateway supports a community of over 200 principal investigators and receives an average of 50 provider-self-published datasets monthly.
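An archive export of the kind mentioned above typically pairs each copied file with a fixity record so the remote copy can be checked later. The sketch below illustrates that general idea only; it is not the ACADIS implementation. The function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the manifest fields are hypothetical, and the upload to Amazon S3 is deliberately left out because the report does not describe how it is done.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_root):
    """List every file under export_root with its size and checksum.

    The 'key' field is the path relative to export_root, which could
    serve as an object key in a remote store (an assumption).
    """
    root = Path(export_root)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "key": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return entries


def verify_manifest(export_root, manifest):
    """Return the keys whose files are missing or whose checksum changed."""
    root = Path(export_root)
    failed = []
    for entry in manifest:
        path = root / entry["key"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            failed.append(entry["key"])
    return failed
```

A manifest built before export and verified against a restored copy gives a simple end-to-end integrity check that does not depend on the storage service used.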

The ACADIS project is supported by NSF Core and NSF Special funds.

Community Data Portal (CDP)

The CDP offers a broad range of scientific data collections that includes observations, climate, atmospheric chemistry, space weather, field programs, models, analyses, and more. Many programs and projects at NCAR, UCAR, and UCAR Community Programs (UCP) are represented in the portal. CDP provides a self-publishing model that offers data management tools directly to projects and PIs. Roughly 2,200 registered CDP users are discovering, accessing, and using 8,000 collections representing over 6.5 terabytes of managed data holdings. Data discovery is enhanced worldwide by automatically sharing these metadata with other portals and international centers.

In FY2015, we developed a plan for replacing the CDP services with an open source solution that has a lower maintenance cost. This work was based on input from the nearly 50 active CDP data providers and the NCAR Data Stewardship Engineering Team (DSET). We also continued to provide operational support, security upgrades, and critical bug fixes for the CDP services.

CDP is supported by NSF Core funding.

Chronopolis: Federated Digital Preservation over Space and Time

There is a critical and growing need to organize, preserve, and make accessible the increasing number of digital holdings that represent vital intellectual capital, much of which is precious and irreplaceable. Chronopolis is a strategic collaboration among the San Diego Supercomputer Center (SDSC, lead organization), NCAR/CISL, the University of California Library System, and the University of Maryland. It is aimed at developing national-scale digital preservation infrastructure that has the potential to broadly serve any community with digital assets for science, engineering, humanities, and more. In addition to community collections, Chronopolis CI is being used to provide digital preservation services for the ACADIS project.

In FY2015, CISL replaced our Chronopolis production node with a new 325-terabyte storage system and related server and software services. CISL developed a new web-based dashboard tool for system monitoring, federation-wide reporting, and capacity planning. CISL continued to provide operational support for the NCAR storage node, which currently manages 25 terabytes and over 2.3 million objects.

This gateway data preservation service is supported by the Chronopolis project.