Data-centric research

Snow water equivalent plots
This figure illustrates the need for statistical methods to combine geophysical data products. Shown are four different versions of snow water equivalent (SWE) for part of the U.S. Rocky Mountain region and surrounding states (state outlines in gray). For analysis, the SWE values have been transformed with a power transformation, and the mean fields for 18 years are shown in these image plots. Despite some common spatial patterns, the data products differ in resolution, smoothness, and their extremes. Besides the need to find a common field to summarize these four products, this example also shows the need to express the uncertainty suggested by the variation among the fields. A statistical solution is made more difficult by the large number of grid points representing North America and by the skewed, non-Gaussian distribution of SWE.

Grand challenges of modeling the Earth System require the interpretation and transformation of geophysical data in many forms. These activities range from tackling the Big Data problems associated with remotely sensed observations and the output from large numerical experiments to interpreting the wide range of small but vital historical data sets that document past climate and important geophysical processes. Besides observational data, geophysical models depend on data for initial fields and forcing variables, and models typically generate substantial and complex data objects for interpretation. Part of the challenge of compressing and transforming model data is to preserve scientific value while making it easy for modelers to use parallel analysis tools. In addition, the discipline of data assimilation combines models and observations to produce predictions and reanalysis products of past weather, and to diagnose model shortcomings. Thus, to meet the varied needs of our research communities, CISL research takes an interdisciplinary approach in which collaboration with scientific teams within and outside NCAR helps to motivate new software tools and analysis methods.

CISL’s data-centric view, with its focus on high-performance computing, results in research that integrates different aspects of computational and mathematical science. For example, our research on large data assimilation problems combines algorithms for ensemble representation (e.g., the ensemble Kalman filter) with statistical ideas for the robustness and stability of the methods. Making regional climate experiments useful for impacts analysis has resulted in combining ideas from fitting statistical distributions with the specific need for objective bias corrections of model output. The need for spatial statistics for large data sets has spurred approximations to standard Bayesian statistics that are suited to parallel computing. Finally, the research on data compression has involved blending “off the shelf” compression algorithms with the particular requirements and workflows that are encountered in climate model research.
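As a rough illustration of how distribution fitting and bias correction can meet, the sketch below applies empirical quantile mapping: a model value is converted to its probability level in the model's historical distribution and then mapped to the observed value at the same level. This is a generic textbook technique shown under stated assumptions, not necessarily the method used in the research described here; the function names and the sample values are hypothetical.

```python
from bisect import bisect_right


def to_probability(sorted_sample, x):
    """Probability level of x under a piecewise-linear empirical CDF."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return 1.0
    # Here sorted_sample[i] <= x < sorted_sample[i + 1].
    i = bisect_right(sorted_sample, x) - 1
    left, right = sorted_sample[i], sorted_sample[i + 1]
    frac = (x - left) / (right - left)
    return (i + frac) / (n - 1)


def to_quantile(sorted_sample, p):
    """Inverse of to_probability: linearly interpolated quantile for 0 <= p <= 1."""
    n = len(sorted_sample)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def quantile_map(value, model_hist, obs_hist):
    """Bias-correct one model value using historical model and observed samples."""
    p = to_probability(sorted(model_hist), value)
    return to_quantile(sorted(obs_hist), p)


if __name__ == "__main__":
    # Hypothetical historical samples: the model runs systematically low.
    model_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
    obs_hist = [10.0, 20.0, 30.0, 40.0, 50.0]
    print(quantile_map(2.5, model_hist, obs_hist))
```

Values outside the historical model range are clamped to the observed extremes in this sketch; a practical correction would need an explicit choice for how to treat the tails.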

A few highlights that illustrate the breadth of this research are:

  • A demonstration of the added value of using carbon monoxide (CO) retrievals from satellite-based instruments (MOPITT) to infer the distribution of CO in the Rocky Mountain region. This is an example of the success of NCAR’s data assimilation system (DART) in support of the FRAPPE field campaign.

  • Using the NARCCAP regional simulations, a process analysis has linked the more credible regional models to smaller projected decreases in the North American Monsoon for the future period (2041-2070). This is an important finding because it increases the value of using the NARCCAP experiments for assessing impacts of climate change.

  • The use of wavelet encoders to compress double-precision climate model output gives a striking improvement over naive single-precision truncation. These results demonstrate the value of more tailored, algorithmic approaches to compressing (i.e., reducing the size of) model output without compromising the information content.

  • A statistical approach is able to blend multiple snow water equivalent data products into a single coherent estimate along with measures of uncertainty. This methodology is an example of approximate Bayesian methods applied to a practical problem in assessing snow accumulation for climate modeling and analysis. This work is also noteworthy as the product of a collaboration between scientists in IMAGe and a SIParCS student (Colette Smirnoitis). It also leveraged statistical software created in IMAGe for large geophysical data analysis.
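To make the idea of blending data products with an uncertainty measure concrete, the sketch below computes a naive pointwise summary: each product is power-transformed, the transformed values are averaged at each grid point, and their spread across products serves as a crude uncertainty. It is far simpler than the approximate Bayesian methodology mentioned above and is not that method. It assumes the products have already been regridded to a shared grid, and the exponent of 0.5, the function name, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def blend_products(products, power=0.5):
    """Pointwise blend of several nonnegative fields on a shared grid.

    products: list of equal-length lists, one list per data product.
    Returns (estimate, spread): the estimate is back-transformed to the
    original units; the spread is the standard deviation across products
    on the power-transformed scale.
    """
    if len(products) < 2:
        raise ValueError("need at least two products to measure spread")
    n_points = len(products[0])
    if any(len(p) != n_points for p in products):
        raise ValueError("all products must share the same grid")

    estimate, spread = [], []
    for values in zip(*products):  # values at one grid point, one per product
        transformed = [v ** power for v in values]
        estimate.append(mean(transformed) ** (1.0 / power))
        spread.append(stdev(transformed))
    return estimate, spread


if __name__ == "__main__":
    # Two hypothetical products on a two-point grid.
    est, unc = blend_products([[4.0, 0.0], [16.0, 0.0]])
    print(est, unc)
```

An unweighted mean treats every product as equally reliable and ignores spatial correlation, which is part of what a full statistical model would address.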

Funding for these activities is indicated at the end of each of the more detailed subsections.