Weather Prediction Machine Learning Optimization

BACKGROUND

Figure 1. DICast system diagram.

RAL is a leader in the development of intelligent weather prediction systems that blend data from numerical weather prediction models, statistical datasets, real-time observations, and human intelligence to optimize forecasts at user-defined locations. The Dynamic Integrated Forecast System (DICast®) and the GRidded Atmospheric Forecast System (GRAFS) are examples of such technology (Figures 1 and 2).

Figure 2. GRAFS system design.

DICast® is currently used by three of the nation's largest commercial weather service companies. Applications of this technology continue to expand as industry increasingly wants fine-tuned forecasts for specific user-defined locations. This trend is clear in the energy, transportation, agriculture, and location-based service industries. RAL's expertise in meteorology, engineering, and applied mathematics and statistics is being applied to address society's growing need for accurate weather information.

FY 2018 ACCOMPLISHMENTS

This year, significant research applied machine learning techniques to improve DICast® visibility classification forecasts for aviation. Preliminary work tested whether machine learning models could be used to predict FAA flight category rules. Various GFS/HRRR regressors, along with other spatial and temporal data, were used to classify visibility into the following categories:

  • Class 0: 0 <= visibility (miles) < 1
  • Class 1: 1 <= visibility (miles) < 3
  • Class 2: 3 <= visibility (miles) < 5
  • Class 3: visibility >= 5 miles (Clear)
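The class boundaries listed above can be expressed as a small labeling function. The following is a minimal sketch for illustration only; the function and constant names are hypothetical and are not taken from the DICast® code base.

```python
from bisect import bisect_right

# Upper-exclusive class boundaries in miles, copied from the categories above.
VISIBILITY_EDGES_MILES = [1.0, 3.0, 5.0]


def visibility_class(visibility_miles: float) -> int:
    """Map a visibility in miles to class 0, 1, 2, or 3 (clear)."""
    if visibility_miles < 0:
        raise ValueError("visibility must be non-negative")
    # bisect_right counts how many boundaries are <= the value, so a value
    # exactly on a boundary (for example 3.0) falls into the higher class.
    return bisect_right(VISIBILITY_EDGES_MILES, visibility_miles)


for miles in (0.5, 1.0, 2.9, 3.0, 5.0, 10.0):
    print(f"{miles:>5} miles -> class {visibility_class(miles)}")
```

Treating each boundary as belonging to the higher class matches the `<=` and `<` signs in the category definitions.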

Around 90% of the labeled training data falls in the clear visibility category; creating a classifier that can differentiate between the restricted visibility classes is therefore challenging. A Gaussian anomaly detection algorithm was used to separate the clear visibility case from the restricted visibility cases. Additionally, a number of techniques, including undersampling and synthetic (interpolated) oversampling, were tested to balance the flight categories. The following ROC curves (Figure 3) show results from a boosted tree trained on HRRR data from 20160601-20170105 with test data from 20170115-20170215. These results (area under ROC > 0.5, F1, and confusion matrix) show that there is some skill in the classifier, but more research is required.

Figure 3. Area under ROC results for machine learning predictions of class 0, 1 and 2 visibility.
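One way to picture the Gaussian anomaly detection step mentioned above is to fit a Gaussian to the predictors of the clear cases and flag samples with unusually low density as candidates for restricted visibility. The sketch below is an assumption-laden illustration, not the method RAL used: it assumes independent features (diagonal covariance), a 1% density threshold, and two made-up predictors with synthetic values.

```python
import math
import random
from statistics import fmean, pvariance


def fit_gaussian(rows):
    """Fit an independent (diagonal-covariance) Gaussian to each feature."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Floor the variance so a constant feature cannot cause division by zero.
    variances = [max(pvariance(col), 1e-9) for col in columns]
    return means, variances


def log_density(row, means, variances):
    """Log of the Gaussian density at `row` under the fitted parameters."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)
        for x, mu, var in zip(row, means, variances)
    )


def density_threshold(rows, means, variances, quantile=0.01):
    """Log-density below which roughly `quantile` of the training rows fall."""
    scores = sorted(log_density(r, means, variances) for r in rows)
    return scores[int(quantile * (len(scores) - 1))]


def is_restricted(row, means, variances, threshold):
    """Flag a sample as anomalous (a candidate restricted-visibility case)."""
    return log_density(row, means, variances) < threshold


# Synthetic stand-in for "clear" training samples with two made-up predictors.
rng = random.Random(0)
clear_rows = [(rng.gauss(8.0, 2.0), rng.gauss(5.0, 1.5)) for _ in range(1000)]

means, variances = fit_gaussian(clear_rows)
threshold = density_threshold(clear_rows, means, variances)

typical = (8.0, 5.0)   # near the centre of the clear-case distribution
unusual = (0.2, 12.0)  # far from anything seen in the clear cases
print(is_restricted(typical, means, variances, threshold))
print(is_restricted(unusual, means, variances, threshold))
```

Fitting only on the majority class sidesteps the 90% imbalance for the clear-versus-restricted decision; a separate classifier would still be needed to distinguish classes 0, 1, and 2.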

DICast®'s impact on renewable energy forecasting has led to its use in other renewable energy projects. In particular, a project with the Kuwait Institute for Scientific Research (KISR) started using DICast® as the core forecast engine for a combined wind and solar forecasting system. This system will combine output from global numerical weather prediction models and a high-resolution version of WRF to produce custom forecasts for an extreme desert climate environment. The KISR project uses a multi-stage machine learning methodology: StatCast, a machine-learning-based approach for wind and solar power prediction based on surface observations, will provide short-term predictions out to six hours, which will be blended with the DICast® forecasts.

Figure 4. Comparison of super-turbine power conversion to observed power (left) and comparison of the random forest predicted power versus the observed power (right)

In addition to predicting renewable energy power output based on meteorological conditions, machine learning has been used to convert wind speed to wind power. Wind power conversion typically follows one of two methods: the turbine-level approach, where each nacelle wind speed is converted to power, and the super-turbine approach, where the mean wind speed at the wind power plant is converted to power and scaled to the total capacity. RAL performed research showing that the super-turbine approach is affected by a mathematical result known as Jensen's inequality: applying a non-linear function to the mean of a series of values is not the same as taking the mean of the values after applying the function to each one. The research showed that there are systematic biases arising not only from the spatial distribution of turbines at a wind power plant, but also from the frequency of measurements and the temporal variability of the wind speed. However, a random forest machine learning method was able to learn these non-linear relationships when given predictors for the spatial and temporal variability of the wind speed at the wind power plant. The error was reduced by nearly 38% using the random forest, and the R² value increased compared to the typical super-turbine approach with a polynomial power curve, as shown in Figure 4. The figure shows a greater density of instances falling along the 1:1 line for the machine learning power conversion (right) than for the super-turbine power conversion (left).
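The Jensen's inequality effect described above can be reproduced with a toy power curve. The sketch below is illustrative only: the cut-in, rated, and cut-out speeds, the cubic ramp, and the nacelle wind speeds are all hypothetical values chosen for the example, not parameters from the RAL study.

```python
from statistics import fmean

# Hypothetical turbine parameters (m/s); not taken from the source.
CUT_IN, RATED, CUT_OUT = 3.0, 12.0, 25.0


def power_fraction(wind_speed: float) -> float:
    """Idealized power curve returning the fraction of rated power (0 to 1)."""
    if wind_speed < CUT_IN or wind_speed >= CUT_OUT:
        return 0.0
    if wind_speed >= RATED:
        return 1.0
    # Cubic ramp between cut-in and rated speed.
    return (wind_speed**3 - CUT_IN**3) / (RATED**3 - CUT_IN**3)


def turbine_level_power(speeds):
    """Convert each nacelle wind speed to power, then average."""
    return fmean([power_fraction(v) for v in speeds])


def super_turbine_power(speeds):
    """Average the wind speeds first, then convert the mean to power."""
    return power_fraction(fmean(speeds))


nacelle_speeds = [4.0, 6.0, 8.0, 10.0]  # made-up speeds across one plant
print(turbine_level_power(nacelle_speeds))
print(super_turbine_power(nacelle_speeds))
```

With these speeds, all on the convex cubic part of the curve, the super-turbine value is lower than the turbine-level value. Near rated speed the idealized curve flattens, so the sign of the bias can reverse, which is consistent with the bias depending on the variability of the wind speed.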

RAL has advanced the application of machine learning to support wildfire prediction. Atmospheric conditions, fuel type, and fuel moisture content (FMC) are critical factors controlling the rate of spread and heat release from wildland fires. Commonly used wildland fire spread models have displayed significant sensitivity to FMC; therefore, having accurate FMC estimates to use as initial conditions is important. The National Fuel Moisture Database provides sporadically updated information about FMC created by interpolating sparse manual samplings of live FMC and relatively sparse surface observations of dead FMC (by Remote Automated Weather Stations). At present, a gridded FMC data set that can be assimilated in real time in an operational system does not exist. RAL is currently building a real-time FMC database for use in the WRF-FIRE coupled atmosphere-wildland fire prediction model, which is a component of the Colorado Fire Prediction System. The goal is a more accurate accounting of live and dead FMC, resulting in a more realistic, dynamic representation of fuel heterogeneity and improved accuracy of wildland fire spread prediction.

Figure 5. Comparison of interpolated fuel moisture content measurements across Colorado (left) versus the machine learning predictions on the same grid (right), illustrating the finer resolution of the machine learning algorithm.

During this year, the development of Random Forest and Gradient Boosted Regression Trees machine learning techniques was completed for a preliminary test over Colorado. These techniques have improved the ability to estimate FMC, as shown in Figure 5, where the interpolated values have coarser resolution while the predictions have finer resolution. In addition to providing finer-resolution “observations” of FMC, the error of the FMC predictions is approximately half the standard deviation of the observations, which highlights the predictive skill of the machine learning in estimating FMC.
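The comparison of prediction error to the standard deviation of the observations can be computed as a simple ratio. The sketch below assumes the error metric is root-mean-square error, which the text does not specify, and uses made-up FMC values purely for illustration.

```python
import math
from statistics import fmean, pstdev


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def error_to_spread_ratio(observed, predicted):
    """RMSE divided by the standard deviation of the observations.

    A predictor that always returns the observed mean scores exactly 1,
    so values below 1 indicate skill beyond that baseline.
    """
    return rmse(observed, predicted) / pstdev(observed)


# Made-up FMC values (percent), not data from the study.
observed_fmc = [10.0, 20.0, 30.0, 40.0]
predicted_fmc = [12.0, 18.0, 33.0, 37.0]
print(error_to_spread_ratio(observed_fmc, predicted_fmc))
```

Under this assumed metric, the result reported above would correspond to a ratio of roughly 0.5.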

These RAL forecast systems are also continuing to push the envelope of advanced weather forecasting in the transportation sector. The Maintenance Decision Support System (MDSS) was adapted from its original focus on roadways to be used as a Runway Decision Support System for Denver International Airport (DIA). The system generates tuned weather forecasts and treatment recommendations for the runways at DIA. In addition, DICast® and a weather-tuned version of GRAFS form the backend weather engine used in both the FHWA and Colorado Pikalert Hazard Assessment forecast systems.

FY 2019 PLANS

Areas of development for the next fiscal year include:

  • Extend machine learning techniques to other variables produced by DICast®
  • Finalize the machine learning technology for predicting gridded fuel moisture content for CONUS to improve wildland fire prediction
  • Advance the application of machine learning in renewable energy prediction across timescales and climates, including applying regime-dependent machine learning techniques to wind power prediction
  • Make improvements related to road temperature and precipitation forecasts in the MDSS
  • Test the application of the Analog Ensemble on DICast® forecasts to produce probabilistic wind and solar power predictions