Engage and support users of CISL’s HPC environment

CISL’s commitment to delivering robust, accessible, and innovative advanced computing services and resources spans several user communities, and CISL provides all of its users with responsive and knowledgeable support services. CISL’s success in supporting scientific goals and enabling scientific impact depends in equal measure on understanding users’ needs and on integrating CISL’s resources, capabilities, and services in response to those needs. With CISL as its discipline-specific computing center, NCAR is one of only a few institutions with the resources and support services necessary to conduct high-end atmospheric research, model development, and support for field campaigns.

In FY2019, more than 1,800 users at more than 300 universities and other institutions benefited from the use of CISL’s high-performance cyberinfrastructure and services. More than 680 new users joined the CISL computing community, which spans 17 areas of interest in the atmospheric and related sciences. The university community reported 648 publications and 115 dissertations and theses resulting from CISL HPC support in FY2019.

Cheyenne use by science area
Figure 1. FY2019 usage of Cheyenne by different science areas. Climate research, including paleoclimate and regional climate, consumed 59% of Cheyenne’s delivered core-hours, with weather, mesoscale meteorology, and atmospheric chemistry work consuming another 22%. The other science areas included computational science, Earth science, and environmental systems, as well as operational staff use.

User communities and their activities

CISL works to provide equitable and efficient access to several distinct communities of researchers in the atmospheric and related sciences, including the university community, Climate Simulation Laboratory users, NCAR researchers, and University of Wyoming researchers through the Wyoming-NCAR Alliance.

University community

Approximately 28% of the Cheyenne system was available to U.S.-based university researchers with NSF awards in the atmospheric or related sciences. University requests are reviewed twice per year by the CISL HPC Allocation Panel (CHAP). In October 2018 and April 2019 combined, the CHAP reviewed 63 requests for 502 million core-hours on the Cheyenne system and awarded 235 million core-hours. In addition, university researchers, graduate students, and postdoctoral researchers requested and received 275 small and educational allocations. In geographic scope, CISL’s university users represent hundreds of universities and collaborating institutions, primarily in the United States, in keeping with CISL’s HPC mission. In FY2019, active projects used HPC or analysis resources in support of nearly 350 unique NSF awards, and 954 university projects were open on CISL resources during the year.

NCAR researchers

Another 28% of Cheyenne was allocated in FY2019 to NCAR researchers to support the computational needs of the NCAR laboratories, including NCAR Strategic Capability projects. Requests for large-scale Cheyenne projects were reviewed in October 2018 and April 2019 by a panel of NCAR computational scientists and approved by the NCAR Executive Committee.

Climate Simulation Lab

About 27% of Cheyenne was made available for Climate Simulation Lab (CSL) activities at NCAR in FY2019 after review by the CHAP. In addition to supporting the CESM community allocation, the CSL ensures that university researchers funded by NSF awards have the opportunity to pursue climate-related science questions requiring large-scale simulations.

Wyoming-NCAR Alliance

The Wyoming-NCAR Alliance (WNA) – which targets geosciences collaborations among the University of Wyoming, NCAR, and institutions in other EPSCoR states – convened the Wyoming Resource Allocation Panel in January and July 2019. The WNA awarded 129 million core-hours to eight large projects in FY2019 and also made 13 small and educational allocations; 34 different WNA projects used more than 93 million HPC core-hours.

Usage levels per facility
Figure 2. Left: The usage levels per facility targeted by the allocations process. Right: The actual percentage of usage delivered by Cheyenne in FY2019.

Support services for a growing community of users

CISL’s strategic commitment to its user communities includes 24x7 frontline user support, extensive online documentation, and consulting services that provide in-depth expertise. CISL’s User Services Section (USS) streamlines and coordinates user-oriented procedures and support activities for the CISL Help Desk, HPC Consulting Services Group (CSG), documentation, and accounts and allocations.

CISL provides training for researchers in the atmospheric and related sciences to help them improve their understanding and use of NCAR’s petascale computing and data resources. USS personnel provided or facilitated 13 formal learning opportunities that were attended by 205 people in FY2019. In addition, CISL Help Desk personnel provided HPC user credentials to 327 students attending 17 university courses and training events hosted by other NCAR labs. CISL also leverages training and professional development opportunities provided by national and regional HPC consortia such as the Extreme Science and Engineering Discovery Environment (XSEDE) and the Rocky Mountain Advanced Computing Consortium (RMACC).

CISL tracks user support activity for this growing community using an ExtraView trouble ticket system. In FY2019, the system recorded 12,514 tickets to the CISL Help Desk (excluding NETS tickets and tickets generated automatically by the HPC monitoring system), a 5.8% increase over the FY2018 total, due in large part to the migration of many users to the Duo two-factor authentication system. Of the tickets submitted, the Help Desk team closed 4,965 (an average of 413 per month) with an average resolution time of 2.01 days (median, 0.23 days). In the same period, Consulting Services Group staff resolved 2,541 more complex requests with an average resolution time of 33.1 days (median, 7.8 days). An additional 203 tickets, related primarily to managing allocations and accounting, were resolved in an average of 4.1 days (median, 0.90 days).

Optimizing model performance for current and future systems

More than 80% of CISL’s HPC system use is related to running NCAR-developed climate and weather applications. Because of this well-defined workload, CISL dedicates staff time and effort to optimizing the codes and the system environments to ensure that the most heavily used models and applications run as efficiently as possible on current and future systems. These efforts target code optimizations that will realize benefits on current supercomputer hardware and translate to performance benefits on future processor architectures. 

In FY2019, CISL continued its efforts to increase the performance and efficiency of NCAR’s CESM, WRF, MURaM, and MPAS community codes on Cheyenne. Most projects involve embedding members of the Consulting Services Group in model development teams across NCAR. 

  • CSG continued to coordinate the multi-institution effort to port MURaM to GPU architectures using the OpenACC directive-based programming standard for performance portability. The goal of the project is to improve MURaM’s scalability in preparation for the orders-of-magnitude increase in data volume expected from the Daniel K. Inouye Solar Telescope, which is scheduled to be deployed in 2020. The team consists of members from CISL, NCAR’s High Altitude Observatory, the University of Delaware, and the Max Planck Institute for Solar System Research. This year the team focused on implementing the radiative transport routine in OpenACC, accelerating the magnetohydrodynamics function, and optimizing CPU-GPU data transfers. A major milestone was achieved with successful execution across multiple GPU nodes.

  • CISL and the Climate and Global Dynamics Laboratory continued their joint effort to port CESM2 to both the ARM and IBM Power architectures that offer promising, cost-effective alternatives to the Intel Xeon architecture that has dominated HPC systems for most of the past decade. The team leveraged accounts secured on remote systems for the porting efforts. Both vendors’ compiler development teams were engaged to address performance and optimization issues. Findings from this effort informed the benchmarks defined for the NWSC-3 procurement process.

  • CSG achieved significant runtime speedups (30%-50%) for both the WRF-Hydro model and at least one user's custom C++ code by tuning the network traffic model used by Cheyenne’s default MPI library, MPT. 

  • CISL also organized and hosted several vendor-led training events. ARM provided an update on and roadmap for their widely used debugging and profiling tools, DDT and MAP. A full-day Intel tutorial covered many of their tools for advanced application performance and acceleration, including the Math Kernel Library, AI performance tools, advanced data compression tools, the Data Analytics Acceleration Library, and the VTune Amplifier performance profiler. CISL also conducted five Modern Fortran workshop sessions to train the user community, particularly newer NCAR staff, on the foundational programming language of CESM and WRF.

Providing users with access to and support for HPC resources is a crucial part of NCAR’s imperative to provide hardware cyberinfrastructure customized for the atmospheric and related sciences. This ongoing service for users is supported by NSF Core funds including CSL funding. Funding from the University of Wyoming supports the Wyoming Resource Allocation Panel.