We have developed a software tool, the SOM-Assisted Hazard Area Risk Analysis (SAHARA), which reduces large climate datasets to smaller, more manageable subsets that remain statistically similar to the full data. These subsets are then used to produce ensembles of potential hazard outcomes.
The self-organizing map (SOM) is a machine learning (data clustering) algorithm that is well suited to data with strong topological properties. By applying the SOM algorithm to the topological patterns of climatological fields over a regional domain for a 30-year span, we can find a close statistical equivalent using fewer, non-contiguous input days. When SOMs are used to cluster monthly climate data in this way, we find that sampling only 150 days reduces computational time by more than a factor of 6 compared with using the entire climate dataset.
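The clustering and day-selection idea in the paragraph above can be sketched in a few dozen lines. This is a minimal pure-Python illustration, not SAHARA's SOM Engine, and it rests on stated assumptions: each day's regional fields are assumed to be already flattened into a fixed-length feature vector; training uses a Gaussian grid neighbourhood with a linearly decaying learning rate and radius; and one real day is kept per occupied node, weighted by the fraction of days that map to that node. The names `train_som`, `best_matching_unit` and `representative_days` are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights: Dict[Node, List[float]], x: List[float]) -> Node:
    """Grid node whose weight (prototype) vector is closest to sample x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM on per-day feature vectors (one vector per day)."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every node from a randomly chosen day.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood on the 2-D grid: nodes near the BMU move
                # most, which is what preserves the map's topology.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def representative_days(data, weights) -> List[Tuple[int, float]]:
    """Return (day_index, weight) pairs: one real day per occupied node,
    weighted by the fraction of all days that map to that node."""
    members: Dict[Node, List[int]] = {}
    for i, x in enumerate(data):
        members.setdefault(best_matching_unit(weights, x), []).append(i)
    reps = []
    for node, idx in members.items():
        # Keep the actual day closest to the node's prototype, not the prototype.
        day = min(idx, key=lambda i: sq_dist(data[i], weights[node]))
        reps.append((day, len(idx) / len(data)))
    return sorted(reps)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-ins for daily fields drawn from three "weather regimes".
    regimes = [[0.0] * 4, [5.0] * 4, [10.0, 0.0, 10.0, 0.0]]
    days = [[v + rng.gauss(0, 0.5) for v in rng.choice(regimes)] for _ in range(300)]
    som = train_som(days, rows=3, cols=3, epochs=5)
    for day, weight in representative_days(days, som):
        print(f"day {day:3d}  weight {weight:.3f}")
```

Because each occupied node contributes at most one day, the map size caps the sample size. For one calendar month over 30 years (roughly 900 to 930 days), a 150-node map (for example 10 × 15, an assumed size the text does not specify) would yield at most 150 representative days, about one sixth of the record.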
The SAHARA software can scale from a laptop to workstations to many-core, many-node clusters. It uses a modern microservice architecture to distribute the Climate Database (currently CFSR), the SOM Engine, atmospheric model ensembles (such as the SCIPUFF Transport and Dispersion model), and pre- and post-processing across available computing resources, either locally or remotely.
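The fan-out of ensemble runs described above can be illustrated locally. In SAHARA these stages are separate microservices that may run on remote nodes; the sketch below instead assumes a single machine and uses a thread pool as a stand-in for that distribution. The function name `run_weighted_ensemble`, the `run_member` callback (a placeholder for one dispersion-model run driven by one selected day), and combining members with per-day weights are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def run_weighted_ensemble(
    rep_days: Sequence[Tuple[int, float]],
    run_member: Callable[[int], float],
    max_workers: int = 4,
) -> Tuple[List[float], float]:
    """Run one ensemble member per representative day, then combine them.

    rep_days   -- (day_index, weight) pairs, e.g. from a SOM-based selection
    run_member -- hypothetical callback standing in for one model run; it
                  returns a scalar outcome for the given day
    """
    # Fan out: one member per selected day; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(run_member, [day for day, _ in rep_days]))
    # Post-processing: weight each member by its day's share of the record.
    weighted_mean = sum(w * out for (_, w), out in zip(rep_days, outcomes))
    return outcomes, weighted_mean
```

Keeping the model run behind a callback means the same driver could, in principle, dispatch to a local process, a container, or a remote service without changing the combination step.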