Spatial forecast verification methods were rapidly introduced to address inconsistencies between a forecaster’s subjective assessment of a forecast and traditional verification’s assessment of the same forecast, which tended to favor coarser-resolution models over their newer high-resolution counterparts. Subsequently, much effort has been devoted to evaluating the utility of these new methods: determining their reliability in terms of both repeatability (if a researcher applies the method several times to the same cases, will they always reach the same conclusions?) and reproducibility (if different users apply the same verification method, will they reach the same conclusions as each other?), as well as determining whether the methods yield sensible information about forecast performance (i.e., do they measure physically meaningful errors, or do they give erroneous information about performance?). Such work continues with the development of a new set of test cases, along with their evaluation by several distance-based spatial verification measures; the findings have been submitted in Gilleland et al. (2019).

An example of one of the proposed cases is shown in Figure 1. The centroid distance is a mathematical metric (it satisfies the identity, symmetry, and triangle-inequality properties generally desired of a distance measure) that informs about the *centroids* of two fields (or of individual features within a field). The observation, A, is an area where a variable exceeds a certain threshold, and B and C represent two different “forecasts” of this area. The centroid distance favors B, giving it a perfect score, because its centroid is identical to that of A. Therefore, if the centroid is the feature a user cares most about, then centroid distance is a valuable measure. On the other hand, if it is more important to get the overall area correct, even if it is displaced slightly in space, then centroid distance may not be ideal, at least in the sense of not being reliable as defined above.
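As an illustration (a minimal sketch, not the paper's implementation), the centroid distance can be computed for binary fields with NumPy. The fields A, B, and C below are hypothetical stand-ins analogous to Figure 1: B is larger than A but shares its centroid, while C has A's size but is displaced.

```python
import numpy as np

def centroid_distance(f, g):
    """Euclidean distance between the centroids of two binary fields."""
    fi, fj = np.nonzero(f)
    gi, gj = np.nonzero(g)
    return float(np.hypot(fi.mean() - gi.mean(), fj.mean() - gj.mean()))

# A: observed event area; B: larger area, same centroid; C: same size, shifted.
A = np.zeros((20, 20)); A[4:7, 4:7] = 1
B = np.zeros((20, 20)); B[3:8, 3:8] = 1
C = np.zeros((20, 20)); C[4:7, 8:11] = 1

print(centroid_distance(A, B))  # 0.0: B scores perfectly despite its size bias
print(centroid_distance(A, C))  # 4.0: C is penalized for its displacement
```

The sketch makes the trade-off in the text concrete: B receives a perfect score even though its area is wrong, while C is penalized purely for displacement.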

Over fifty new (binary) geometric cases were proposed for testing spatial verification methods, particularly those aimed primarily at location errors; the cases test the reliability of verification measures and help determine what properties each measure has, as well as how it might fail. Several distance-based measures were applied to these cases, and the results appear in Gilleland et al. (2019). One common situation in weather forecasting is that nothing is forecast (e.g., no rain anywhere in the domain). If both fields are empty (zero-valued everywhere), the forecast should arguably be scored as perfect, and if one field has just a few non-zero values, it may still be an excellent forecast. It turns out that many methods are either undefined in this situation or, when they are defined, highly sensitive to the addition of one or more non-zero-valued points, leading to spurious results. Moreover, the spatial position of these non-zero values can greatly affect several of the measures.

Many of the cases involve simple circles, some of which were previously used in Gilleland (2017). These cases each test how methods inform about errors in specific challenging situations and are summarized in Figure 2. Other cases include ovals with one or more of three types of error (size bias, location error, and orientation error), cases involving random placement of event areas within different envelopes, and additional sensitivity cases with noise added to other cases.

Additionally, the MesoVICT project held its final workshop in Vienna, Austria, to conclude what had been learned about verification in situations specific to complex terrain.

- Investigate bootstrap properties under realistic situations for forecast verification measures
- Submit a paper on bootstrapping for forecast verification

Gilleland, E., 2017. A new characterization in the spatial verification framework for false alarms, misses, and overall patterns. *Weather Forecast.*, **32** (1), 187-198, doi: 10.1175/WAF-D-16-0134.1.

Gilleland, E., G. Skok, B. G. Brown, B. Casati, M. Dorninger, M. P. Mittermaier, N. Roberts, and L. J. Wilson, 2019. A novel set of verification test fields with application to distance measures. Submitted to *Monthly Weather Review* on 3 August 2019.