Support for community workshops, tutorials, and summer schools

CISL hosts community workshops, tutorials, and summer schools on strategic topics designed to advance science, develop collaborations, and inform strategies for the future. This past year, CISL hosted a broad range of these events across numerous topics. Examples described here are the Multi-core V Workshop hosted by the Technology Development Division (TDD) and five data science conferences hosted by the Institute for Mathematics Applied to Geosciences (IMAGe).

Multi-core V Workshop

The purpose of the Fifth Multi-core Workshop was to provide a forum for open discussion aimed at better understanding how new high-performance computing technologies can be applied to the next generation of weather, climate, and Earth System models. The new generation of high-performance computers features diverse, heterogeneous architectures that present significant challenges to the community developing these models. The workshop was held September 16 and 17 at the NCAR Mesa Laboratory and included 39 U.S. and international attendees.

Multi-core workshop participants
Multi-core V brought experts in many-core processor programming for geophysical fluid flow problems to NCAR. Participants came from multiple laboratories, agencies, companies, and nations. Seven students and postdocs also participated.

The workshop’s primary goals were to:

  1. Provide a forum for presenting experiences and lessons learned from the development of weather and climate models on these platforms.

  2. Create a community of developers who can work together to develop the software standards needed for these platforms.

  3. Exchange information about programming techniques, code parallelization and optimization, and I/O strategies on these platforms.

  4. Provide input to standards committees on what the community would like to see in programming models for the applications.

  5. Exchange ideas on the issues surrounding the scalability of these codes on future platforms.

At Multi-core V, 19 talks were arranged in five sessions covering thematic areas such as overviews and strategies, tools and techniques, and climate, weather, and ocean models. Multi-core V was supported by NSF Core funds.

IMAGe summer conferences focus on Big Data for research

In FY2015, CISL’s Institute for Mathematics Applied to Geosciences (IMAGe) offered a variety of conferences designed to help Earth science researchers cope with the ever-increasing challenges of “Big Data.” These conferences support the research communities’ need to extract scientific knowledge from the petabytes of data being produced by today’s instruments and computers.

In May, the IMAGe-STATMOS Summer School in Data Assimilation was part of a series designed to help train the next generation of researchers working in data-rich disciplines. It brought together graduate students, early-career scientists, and senior scientists in environmental statistics and related fields to explore contemporary topics in applied environmental data modeling. During their four days at the workshop, participants received an introduction to data assimilation methods and their applications, as well as hands-on training in the use of IMAGe’s Data Assimilation Research Testbed (DART).

In June, IMAGe presented a week-long Data Analytics Bootcamp for High School Students, an opportunity for 10 Boulder high school sophomores and juniors to learn what it is like to be a data scientist. Demand for data scientists continues to increase as the Big Data era produces data in varieties and volumes far exceeding anything scientists and engineers have ever had to manage before. The bootcamp’s curriculum gave students an engaging, hands-on experience as they performed exercises using authentic data to analyze and solve real-life problems.

Bootcamp instructors
Each pair of students at the Data Analytics Bootcamp received guidance from one expert during the workshop exercises. The support staff shown in this photo includes, from left to right, Colette Smirniotis, Dorit Hammerling, Lee Richardson, and Nathan Lenssen. The 10-minute exercise being conducted here followed five minutes of instruction in a new concept. This format was designed to sustain student interest during the intensive training and ensure that each participant had immediate, supported practice applying their new skills.

IMAGe hosted three more conferences in July, August, and September to continue developing researchers’ skills in environmental data analytics, ensemble data assimilation, and climate data informatics.

Data analytics is the discipline of interpreting data to discover useful information and patterns with the goal of answering specific questions, gaining scientific insight, or making more effective decisions. Data analytics uses methods and algorithms drawn from statistics and computer science to help researchers explore the ever-increasing volume of data that supports science, engineering, medicine, commerce, and many other aspects of society. For NCAR researchers, effective data analytics extracts more scientific information from both observations and numerical simulations, and it often produces graphics that communicate results visually. In July, IMAGe hosted the Second Annual Graduate Workshop on Environmental Data Analytics, part of an ongoing series designed to prepare the next generation of researchers and practitioners to work within and contribute to the data-rich era. Each workshop brings together researchers, from graduate students to senior scientists, in environmental statistics and related fields to explore contemporary topics in applied environmental data modeling. This second annual workshop offered hands-on computing and modeling tutorials, presentations from graduate student participants, and invited talks from early-career and established leaders in environmental data modeling. Tutorials and invited talks addressed ideas and tools that are directly applicable to student participants’ current and future research. Seven of the 29 participants came from EPSCoR states.

Data assimilation refers to methods that combine data from observations with the output of numerical models to provide improved estimates and better predictions of real systems. An ensemble data assimilation system uses a sample (an ensemble) of possible states of the system, and the spread among the ensemble members quantifies the uncertainty in the estimated state. A familiar application of data assimilation in the geosciences is weather forecasting, where a large set of weather observations is combined with the output of a numerical weather model to make forecasts. At NCAR, data assimilation is also used to improve climate models and to check physical models against observations. In August, the IMAGe workshop Frontiers in Ensemble Data Assimilation for Geoscience Applications focused on (1) ensemble data assimilation for atmosphere, ocean, land, and coupled Earth System models, and (2) hybrid ensemble-variational assimilation techniques. Participants explored current techniques and applications of data assimilation in the geosciences. Indicating the international appeal of ensemble data assimilation, 13 of the 27 participants came from non-U.S. universities. Two participants came from EPSCoR states.
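To make the ensemble idea above concrete, here is a minimal Python sketch of one analysis step for a single, directly observed scalar quantity, assuming Gaussian errors. It is a generic textbook-style deterministic ensemble update, not code from DART or any NCAR system. The function name ensemble_update_scalar and the sample temperatures are hypothetical. The ensemble spread serves as the forecast uncertainty, and it determines how strongly each member is pulled toward the observation.

```python
import statistics


def ensemble_update_scalar(prior, obs, obs_var):
    """Update an ensemble of scalar states with one direct observation.

    Hypothetical helper for illustration. Members are shifted toward the
    observation and contracted about the ensemble mean so that the posterior
    ensemble has the Kalman-filter mean and variance (Gaussian errors assumed).
    """
    prior_mean = statistics.fmean(prior)
    # Sample variance of the ensemble = forecast (prior) uncertainty.
    prior_var = statistics.variance(prior)
    # Combine forecast and observation, each weighted by its inverse variance.
    post_var = prior_var * obs_var / (prior_var + obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    # Rescale deviations so the updated ensemble spread matches post_var.
    scale = (post_var / prior_var) ** 0.5
    return [post_mean + scale * (x - prior_mean) for x in prior]


# Hypothetical example: a 5-member temperature forecast and one observation.
prior = [14.0, 15.0, 16.0, 17.0, 18.0]
posterior = ensemble_update_scalar(prior, obs=17.5, obs_var=1.0)
print(statistics.fmean(posterior), statistics.variance(posterior))
```

In this example the forecast ensemble has mean 16.0 and variance 2.5. The updated mean (about 17.07) lies between the forecast and the observation, and the variance shrinks to about 0.71, reflecting reduced uncertainty after assimilation. Operational systems handle many observations, multivariate states, and nonlinear models, so this sketch shows only the core weighting idea.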

Data informatics is a discipline that examines large data sets to find patterns and structure that can help researchers understand the relationships between variables or make predictions. Climate data informatics broadly refers to any research combining climate science with approaches from statistics, machine learning, and data mining. Conferences that bring together researchers from all of these areas stimulate the discussion of new ideas, foster new collaborations, grow the climate informatics community, and accelerate discovery across disciplinary boundaries. In September, the Fifth International Workshop on Climate Informatics promoted communication among these fields, with a strong emphasis on brainstorming during the breakout sessions and panel discussions. Most of the 86 participants came from U.S. research universities, with 13 from international universities, 10 from corporations, and 10 from other research laboratories. This workshop series was co-founded by Claire Monteleoni (George Washington University) and Gavin Schmidt (NASA Goddard Institute for Space Studies) under a multi-year NSF grant, and a variety of other sponsors help fund the series. A full extra day was added on the Saturday after the workshop for NCAR’s first data science “hackathon,” in which participants were given a challenge problem in climate informatics. Small teams formed to implement machine-learning and data-mining algorithms in the Python programming language. The hackathon, more formally called a Rapid Analysis and Model Prototyping (RAMP) event, had 28 participants and encouraged them to test different analytics solutions for a problem, then deliver a prototype as an initial outcome. It trained novice data scientists in hands-on analytics, introduced a complex, real-world scientific problem, and benchmarked different solutions. The focus was on collaboration and efficient exploration.

Data informatics workshop
The participants in the Fifth International Workshop on Climate Informatics interact during the “Knowledge discovery in climate science” presentation by Imme Ebert-Uphoff of Colorado State University. The workshop’s poster session and reception on the first night featured more than 40 posters. Also on the program were two panel discussions, “Deep Learning for Climate Science” and “Encoding climate knowledge into climate learning,” that were designed to generate new ideas across research disciplines.

These IMAGe workshops were supported by NSF Core funding, except as noted above.