NCAR’s data-sharing service on Globus Plus

Monthly Globus usage
Globus usage per month, showing the number of users from December 2011 to April 2015.

The GLobally Accessible Data Environment (GLADE) provides centralized high-performance file systems spanning supercomputing, data post-processing, data analysis, visualization, and HPC-based data transfer services. Additional services, including high-performance data transfer protocols and a new data-sharing service, enhance CISL’s ability to bring data from other sites to NCAR for post-processing, analysis, and visualization and to share data easily with external collaborators.

The data-sharing service leverages the capabilities of Globus Plus to increase customization options for storage as well as data sharing. Globus, a project of the Computation Institute (a partnership of The University of Chicago and Argonne National Laboratory), is a software service that has been described as a “Dropbox for big data.” It is broadly used in the scientific community. “Plus” refers to a feature that allows researchers to share data with colleagues outside of their home institutions, greatly facilitating collaborative work.

In FY2015 the small proof-of-concept data-sharing service built upon Globus Plus was moved to the production GLADE service, allowing data sharing from both a dedicated data-sharing space and the larger GLADE project spaces. In addition to making data available to external colleagues, Globus Plus allows users of CISL’s HPC environment to control which users or groups of users can access the data. With the sharing service, outside users need only a free Globus account, not a UCAR username/token, to access shared data. An additional joint project between CISL/DSS and the Globus team integrated Globus Plus features into the RDA data service. Part of this integration allows RDA users to access the data-sharing service with their RDA username and credentials instead of a Globus account.

This work supports CISL’s computing imperative for hardware cyberinfrastructure by provisioning storage and networking systems customized to support efficient workflows for the atmospheric and related sciences. GLADE also advanced CISL’s computing imperative for facilities by demonstrating high-performance data services that were critical for the resources that now operate at NWSC and that will remain critical as we move toward next-generation resources in FY2016.

GLADE equipment was purchased with NSF Special funds, and it is supported by NSF Core funds including CSL funding.