Provide the HPFL test and exploration laboratory

Since 2014, CISL has operated its High Performance Futures Laboratory (HPFL), a collection of small, experimental systems located at both the Mesa Lab facility and the NWSC. The HPFL is a flexible, multi-system testbed where NCAR staff and collaborators can evaluate new and innovative hardware, software, and concepts such as energy-aware scheduling, cloud bursting, and containers. Much of this research and testing anticipates the day when these emerging technologies may become part of future production systems. The HPFL focuses on computational technologies as well as innovative and emerging storage, file system, and data archive resources.

The computational and data demands of science have evolved with new science drivers and modeling scales. Traditionally, NCAR and CISL have addressed user demands by deploying more powerful next-generation computing and storage architectures. However, recent technical innovations, the emergence of big data, the use of ML/DL to accelerate scientific research, elevated power and cooling costs, and other unprecedented challenges have led CISL and HSS to partner with technology developers and vendors to facilitate the development of node and system architectures, complemented by storage and software environments, that are better suited to atmospheric and related science workflows, data analysis, and visualization.

The HPFL is designed to provide strategic opportunities for NCAR staff and collaborators to gain valuable experience with emerging hardware and software technologies. CISL also uses it to evaluate and mitigate risks arising from new technologies. The HPFL provides a ready-to-use environment where new technologies can be deployed and tested by system administrators, consulting staff, and computational scientists. Current research addresses areas such as heterogeneous architectures, GPGPUs, coprocessors, resource managers, job schedulers, Message Passing Interface (MPI) software, benchmarks, performance tuning, file systems, and a variety of computation- and I/O-intensive applications.
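For illustration, a minimal MPI ping-pong microbenchmark in C, of the kind commonly run on testbed systems to compare interconnects and MPI implementations, might look like the following sketch. It is illustrative only, not an HPFL-specific code; the message size and iteration count are arbitrary choices.

    /* Minimal MPI ping-pong latency sketch; run with two ranks,
       e.g., mpiexec -n 2 ./pingpong. Illustrative only. */
    #include <mpi.h>
    #include <stdio.h>

    #define ITERS     1000
    #define MSG_BYTES 8

    int main(int argc, char **argv) {
        int rank;
        char buf[MSG_BYTES] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("round-trip latency: %.2f us\n",
                   (t1 - t0) / ITERS * 1e6);

        MPI_Finalize();
        return 0;
    }

Timing many round trips with MPI_Wtime and dividing by the iteration count yields a per-message latency estimate that can be compared across MPI implementations and network fabrics.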

New technologies such as high-performance stacked memory, non-volatile memory, and hybrid solid-state and non-volatile disk devices are rapidly transforming the HPC and storage landscape. They are poised to transform system architectures by introducing additional layers of memory, a hierarchy of I/O devices, and tighter integration of those technologies with computing elements. CISL is using the HPFL and vendor partnerships, such as the SGI-NCAR Joint Center of Excellence and the Intel Parallel Computing Center, to learn how to apply these technologies effectively in atmospheric and geoscience applications and to meet the data requirements of future systems and applications.

CISL continued operating and enhancing the HPFL during FY2017. By year end, the HPFL hosted many additional test platforms and concepts. CISL also added an I/O component to the HPFL that is focused on storage futures, I/O optimization, and better data-hosting services. This component included SSD technology, DataDirect Networks (DDN) storage systems, and a test environment for new HPSS releases, including HPSS on Linux and rolling upgrades to the NCAR HPSS archive.

HPFL software evaluation activities in FY2017 included testing operating systems and their management (Salt, CentOS), resource managers (SLURM, PBS Professional), MPI implementations (Intel MPI, OpenMPI, MPICH-3, and MVAPICH), containers (Docker), the Spectrum Scale (formerly GPFS) and Lustre parallel file systems, DDN’s Infinite Memory Engine (IME), and an HPSS test environment. We also began exploring strategies that use in-memory technology or on-flash residency of data to speed up data analysis and post-processing. These strategies are being developed to rapidly ingest, analyze, and filter data on the fly before it is shared or used for further analysis and collaboration. We are also preparing for the potential use of artificial intelligence techniques involving machine learning (ML) and deep learning (DL).
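As a sketch of the on-the-fly filtering idea, the following C program reads a binary stream of double-precision values and passes through only those exceeding a threshold, reducing the data before it reaches later analysis stages. The record format and threshold are hypothetical placeholders, not part of the actual HPFL work.

    /* Illustrative on-the-fly filter: reads binary doubles from stdin
       and forwards only values above a threshold to stdout, so
       downstream analysis sees a reduced stream. Hypothetical sketch. */
    #include <stdio.h>

    int main(void) {
        const double threshold = 300.0;  /* hypothetical cutoff value */
        double v;
        while (fread(&v, sizeof v, 1, stdin) == 1) {
            if (v > threshold)
                fwrite(&v, sizeof v, 1, stdout);
        }
        return 0;
    }

In practice such a filter would sit between the ingest stage and memory- or flash-resident staging areas; the point of the sketch is only that data reduction happens on the fly, before data lands for shared analysis.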

During FY2017, CISL made progress in building a complete I/O test environment. Configuration management systems are in place, and quick-build procedures for node deployment have been developed. Storage systems to support parallel file system research (Lustre, Spectrum Scale), distributed file system research (Ceph, BeeGFS, OrangeFS), and cloud infrastructures (OpenStack) have been configured. In addition, a small compute cluster is being configured to drive I/O tests. Additional equipment for I/O network research was also purchased and installed: a Mellanox InfiniBand router was added along with a Mellanox 100-Gb Ethernet switch. These network components will allow further testing and evaluation of high-bandwidth, low-latency networks specifically designed for I/O workloads. The InfiniBand router is particularly important for evolving data network designs. New LTO tape drive technology, which represents the next generation of technology for archival systems, has been acquired for investigation in the HPFL.
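As an example of the kind of test such a compute cluster might drive, the following C sketch measures sequential write throughput to a file system mount. The path, block size, and file size are hypothetical; a real evaluation would also vary block sizes, use multiple client nodes, and measure reads.

    /* Minimal sequential-write throughput sketch of the kind used to
       drive I/O tests against a file system mount. Path and sizes are
       illustrative placeholders. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define BLOCK_BYTES (4 * 1024 * 1024)   /* 4 MiB per write */
    #define NUM_BLOCKS  256                 /* 1 GiB total     */

    int main(void) {
        const char *path = "/mnt/testfs/iotest.dat";  /* hypothetical mount */
        char *buf = malloc(BLOCK_BYTES);
        if (!buf) { perror("malloc"); return 1; }
        memset(buf, 0xAB, BLOCK_BYTES);

        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < NUM_BLOCKS; i++) {
            if (write(fd, buf, BLOCK_BYTES) != BLOCK_BYTES) {
                perror("write"); return 1;
            }
        }
        fsync(fd);                          /* include the flush in timing */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double mib  = (double)BLOCK_BYTES * NUM_BLOCKS / (1024.0 * 1024.0);
        printf("wrote %.0f MiB in %.2f s (%.1f MiB/s)\n", mib, secs, mib / secs);

        close(fd);
        free(buf);
        return 0;
    }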

CISL’s HPFL is made possible by NSF Core funds and by partnerships with, and equipment donations from, leading HPC and storage vendors.