Engage and support users of CISL’s HPC environment

CISL’s commitment to delivering robust, accessible, and innovative advanced computing services and resources spans several user communities, and CISL provides all of its users with responsive and knowledgeable support services. CISL’s success in supporting scientific goals and enabling scientific impact depends in equal measure on understanding users’ needs and on integrating CISL’s resources, capabilities, and services in response to those needs. With CISL as its discipline-specific computing center, NCAR is one of only a small number of institutions with the resources and support services necessary to conduct high-end climate research, model development, and support for field campaigns.

In FY2018, more than 1,700 users at nearly 300 universities and other institutions benefited from the use of CISL’s high-performance cyberinfrastructure and services. More than 560 new users joined the CISL computing community, which spans 17 areas of interest in the atmospheric and related sciences. The university community reported 558 publications and 69 dissertations and theses resulting from CISL HPC support in FY2018.

Graph of science areas
Figure 1. FY2018 usage of Cheyenne by different science areas. Climate research, including paleoclimate and regional climate, consumed 55% of Cheyenne’s delivered core-hours, with weather, mesoscale meteorology, and atmospheric chemistry work consuming another 22%. The other science areas included computational science, Earth science, and environmental systems, as well as operational staff use.

User communities and their activities

CISL works to provide equitable and efficient access to several distinct communities of researchers in the atmospheric and related sciences, including the university community, Climate Simulation Laboratory users, NCAR researchers, and University of Wyoming researchers through the Wyoming-NCAR Alliance.

University community

Approximately 28% of the Cheyenne system was available to U.S.-based university researchers with NSF awards in the atmospheric or related sciences. University requests are reviewed twice per year by the CISL HPC Allocation Panel (CHAP). In October 2017 and April 2018 combined, the CHAP reviewed 80 requests for 617 million core-hours on the Cheyenne system and awarded 365 million core-hours. In addition, university researchers, graduate students, and postdoctoral researchers requested and received 252 small and educational allocations. Geographically, CISL's university users represent hundreds of universities and collaborating institutions, primarily in the United States, consistent with CISL's HPC mission. In FY2018, active projects used HPC or analysis resources in support of more than 360 unique NSF awards, and 923 university projects were open during the year on CISL resources.

NCAR researchers

Another 28% of Cheyenne was allocated in FY2018 to NCAR researchers to support the computational needs of the NCAR laboratories, including NCAR Strategic Capability (NSC) projects. Requests for large-scale Cheyenne projects were reviewed in October 2017 and April 2018 by a panel of NCAR computational scientists and approved by the NCAR Executive Committee.

Climate Simulation Lab

About 27% of Cheyenne was made available for Climate Simulation Lab (CSL) activities at NCAR in FY2018 after review by the CHAP. In addition to supporting the CESM community allocation, the CSL ensures that university researchers funded by NSF awards have the opportunity to pursue climate-related science questions requiring large-scale simulations.

Wyoming-NCAR Alliance

The Wyoming-NCAR Alliance (WNA) – which targets geosciences collaborations among the University of Wyoming, NCAR, and institutions in other EPSCoR states – convened the Wyoming Resource Allocation Panel in January and June 2018. In FY2018, the WNA awarded 83.5 million core-hours (of 128 million requested) to eight large projects and also made 13 small and educational allocations; 34 different WNA projects used more than 86 million HPC core-hours.

Graph of allocation targets and usage
Figure 2. Left: The usage levels per facility targeted by the allocations process. Right: The actual percentage of usage delivered by Cheyenne in FY2018. University and WNA usage fell below their targets because Yellowstone remained available through the first quarter of FY2018, delaying uptake of Cheyenne.

Support services for a growing community of users

CISL's strategic commitment to its user communities includes 24x7 frontline user support, extensive online documentation, and consulting services that provide in-depth expertise. CISL's User Services Section (USS) streamlines and coordinates user-oriented procedures and support activities for the CISL Help Desk, HPC Consulting Services Group (CSG), documentation, and accounts and allocations.

CISL provides training for researchers in the atmospheric and related sciences to help them improve their understanding and use of NCAR's petascale computing and data resources. CISL personnel provided or facilitated more than 20 formal learning opportunities that were attended by approximately 620 people in FY2018. CISL also leverages training and professional development opportunities provided by national and regional HPC consortia such as the Extreme Science and Engineering Discovery Environment (XSEDE) and the Rocky Mountain Advanced Computing Consortium (RMACC).

CISL tracks user support activity for this growing community using an ExtraView trouble-ticket system. In FY2018, the system recorded 11,832 tickets to the CISL Help Desk (excluding NETS tickets and tickets generated automatically by the HPC monitoring system), a modest 3.7% increase from the FY2017 total despite the decommissioning of a production HPC resource. Of the tickets submitted, the Help Desk team closed 2,970 in an average of 2.11 days (median, 0.56 days), or an average of 277 per month. In the same period, Consulting Services Group staff resolved 3,015 more-complex requests with an average resolution time of 22.7 days (median, 7.0 days). An additional 292 tickets, related primarily to managing allocations and accounting, were resolved in an average of 6.46 days (median, 0.96 days).

Optimizing model performance for current and future systems

More than 80% of CISL’s HPC system use is related to running NCAR-developed climate and weather applications. Because of this well-defined workload, CISL dedicates staff time and effort through its Strategic Parallel and Optimization Computing (SPOC) initiative to optimizing the codes and the system environments to ensure that the most heavily used models and applications run as efficiently as possible on current and future systems. SPOC efforts target code optimizations that will realize benefits on current supercomputer hardware and translate to performance benefits on future processor architectures.

In FY2018, CISL continued its efforts to increase the performance and efficiency of NCAR’s CESM, WRF, and MPAS community codes on Cheyenne. In addition, a new project was initiated to port MURaM to GPGPU platforms. To maximize the impact of SPOC efforts, most projects involve embedding members of the Consulting Services Group in model development teams across NCAR.

  • CSG led the multi-institution effort to port MURaM to GPGPU architectures using the OpenACC directive-based, performance-portable programming standard. The goal of the project is to improve MURaM's scalability in preparation for the orders-of-magnitude increase in data volume expected from the Daniel K. Inouye Solar Telescope (DKIST), which is scheduled to be deployed in 2020. The team consists of members from CISL, NCAR's High Altitude Observatory, the University of Delaware, and the Max Planck Institute for Solar System Research. To date the team has completed more than 95% of the porting effort and achieved modest performance improvements on single GPUs over the conventional CPU implementation. In the coming year the project will focus on maximizing performance on multi-node, multi-GPU systems.

  • CISL and the Climate and Global Dynamics Laboratory initiated a joint effort to port CESM2 to both the ARM and IBM Power architectures. Both architectures offer promising, cost-effective alternatives to the Intel Xeon processors that have dominated HPC systems for most of the past decade. NCAR secured accounts on remote systems for the porting efforts, and the respective vendors' compiler development teams have been engaged to address issues. This year the project team will work closely with the NWSC-3 benchmarking team to guide that component of the procurement process.

  • As part of the ongoing effort to improve I/O performance in NCAR's flagship models, particularly CESM, CISL released a more performant version of the Parallel I/O library. The new version, which will be fully integrated into CESM2 in FY2019, exploits IBM's latest release of Spectrum Scale (formerly GPFS), which was recently installed on Cheyenne. By giving users greater control and flexibility over the number of system-level I/O operations required to read and write data, the new version is expected to deliver as much as a 10-fold improvement in I/O efficiency.

  • Training has also been identified as a key contribution from the SPOC initiative toward building the relevant skills in the NCAR developer community. To that end, CISL organized and hosted vendor-led training events, including one by ARM that introduced the company’s newest debugging and profiling tools and one by Intel on its compilers and analysis tools.

Providing users with access to and support for HPC resources, including the SPOC initiative, is a crucial part of NCAR’s imperative to provide hardware cyberinfrastructure customized for the atmospheric and related sciences. This ongoing service for users is supported by NSF Core funds including CSL funding. Funding from the University of Wyoming supports the Wyoming Resource Allocation Panel.