New developments and issues in forecast verification Barbara Brown bgb@ucar.edu Co-authors and contributors: Randy Bullock, John Halley Gotway, Chris Davis, David Ahijevych, Eric Gilleland, Lacey Holland NCAR Boulder, Colorado October 2007
Issues and new developments • Uncertainty in verification statistics • Diagnostic and user relevant verification • Verification of high-resolution forecasts • Spatial forecast verification • Incorporation of observational uncertainty • Verification of probabilistic and ensemble forecasts • Verification of extremes • Properties of verification measures • Propriety, Equitability
Uncertainty in verification measures • Model precipitation example: Equitable Threat Score (ETS) • Confidence intervals take into account various sources of error, including sampling and observational error • Computation of confidence intervals for verification statistics is not always straightforward
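As an illustration of one common way to obtain such intervals, the sketch below (not part of the original talk) computes a percentile-bootstrap confidence interval for the ETS of paired binary forecasts and observations. The function names are assumptions, and the simple resampling of individual pairs ignores spatial and temporal correlation, which is one of the reasons the computation is "not always straightforward."

```python
# Minimal sketch: percentile-bootstrap confidence interval for the Equitable
# Threat Score (ETS). Inputs are numpy arrays of 0/1 forecast and observed events.
import numpy as np

def ets(fcst, obs):
    """Equitable Threat Score from binary forecast/observation arrays."""
    hits = np.sum((fcst == 1) & (obs == 1))
    misses = np.sum((fcst == 0) & (obs == 1))
    false_alarms = np.sum((fcst == 1) & (obs == 0))
    total = fcst.size
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom != 0 else np.nan

def bootstrap_ci(fcst, obs, n_boot=1000, alpha=0.05, rng=None):
    """Resample forecast/observation pairs with replacement.
    Note: resampling individual pairs ignores spatial/temporal correlation."""
    rng = np.random.default_rng(rng)
    n = len(fcst)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        scores.append(ets(fcst[idx], obs[idx]))
    return np.nanpercentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Synthetic example (illustrative data only)
rng = np.random.default_rng(0)
obs = (rng.random(500) < 0.2).astype(int)
fcst = np.where(rng.random(500) < 0.8, obs, 1 - obs)   # imperfect forecast
print(ets(fcst, obs), bootstrap_ci(fcst, obs, rng=0))
```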
User-relevant verification: Good forecast or Bad forecast? [Schematic: forecast area (F) and observed area (O)]
User-relevant verification: Good forecast or Bad forecast? If I’m a water manager for this watershed, it’s a pretty bad forecast…
User-relevant verification: Good forecast or Bad forecast? [Schematic adds a flight route from A to B] If I’m an aviation traffic strategic planner, it might be a pretty good forecast. Different users have different ideas about what makes a good forecast
Diagnostic and user-relevant forecast evaluation approaches • Provide the link between weather forecasting and forecast value • Identify and evaluate attributes of the forecasts that are meaningful for particular users • Users could be managers, forecast developers, forecasters, decision makers • Answer questions about forecast performance in the context of users’ decisions • Example questions: How do model changes impact user-relevant variables? What is the typical location error of a thunderstorm? The size of a temperature error? A timing error? The usable lead time?
Diagnostic and user-relevant forecast evaluation approaches (cont.) • Provide more detailed information about forecast quality • What went wrong? What went right? • How can the forecast be improved? • How do two forecasts differ from each other, and in what ways is one better than the other?
Which rain forecast is better? [Figure: 24-h rain for 21 Mar 2004 near Sydney: mesoscale model (5 km, RMS = 13.0), global model (100 km, RMS = 4.6), and observed] High vs. low resolution: “smooth” forecasts generally “win” according to traditional verification approaches. From E. Ebert
Traditional “measures”-based approaches Consider forecasts and observations of some dichotomous field on a grid. Some problems with this approach: (1) Non-diagnostic – doesn’t tell us what was wrong with the forecast, or what was right (2) Ultra-sensitive to small errors in the simulation of localized phenomena. CSI = 0 for the first four cases; CSI > 0 for the fifth
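A minimal sketch of the sensitivity point, using an assumed synthetic 50×50 grid: a small rain area forecast only a few grid points away from where it was observed receives CSI = 0, the same score as forecasting nothing at all.

```python
# Illustrative only: the grid size, feature size, and shift are arbitrary assumptions.
import numpy as np

def csi(fcst, obs):
    """Critical Success Index (threat score) from binary grids."""
    hits = np.sum((fcst == 1) & (obs == 1))
    misses = np.sum((fcst == 0) & (obs == 1))
    false_alarms = np.sum((fcst == 1) & (obs == 0))
    denom = hits + misses + false_alarms
    return hits / denom if denom else np.nan

obs = np.zeros((50, 50), dtype=int)
obs[20:25, 20:25] = 1                 # observed 5x5 rain area

fcst = np.zeros_like(obs)
fcst[20:25, 26:31] = 1                # same feature, displaced by 6 grid points

print(csi(fcst, obs))                 # 0.0 -- no credit for a "close" forecast
```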
Spatial forecasts Spatial verification techniques aim to: • account for uncertainties in timing and location • account for field spatial structure • provide information on error in physical terms • provide information that is • diagnostic • meaningful to forecast users Weather variables defined over spatial domains have coherent structure and features
Recent research on spatial verification methods • Neighborhood verification methods • give credit to "close" forecasts • Scale decomposition methods • measure scale-dependent error • Object- and feature-based methods • evaluate attributes of identifiable features • Field verification approaches • measure distortion and displacement (phase error) for whole field
Neighborhood verification • Also called “fuzzy” verification • Upscaling • put observations and/or forecast on coarser grid • calculate traditional metrics
Neighborhood verification Treatment of forecast data within a window: • Mean value (upscaling) • Occurrence of event in window • Frequency of event in window → probability • Distribution of values within window • Fractions skill score (Roberts 2005; Roberts and Lean 2007) [Figure: observed and forecast fields with a neighborhood window] Ebert (2007; Meteorological Applications) provides a review and synthesis of these approaches
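A minimal sketch of the fractions skill score for one field, one threshold, and one neighborhood size, assuming SciPy's moving-window mean; the threshold and window arguments are placeholders, and the constant-padding boundary treatment is a simplification rather than the exact formulation of Roberts and Lean (2007).

```python
# Sketch of the Fractions Skill Score (FSS) at a single threshold and scale.
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, window):
    """FSS for one forecast/observed grid, one exceedance threshold, one window size."""
    bf = (fcst >= threshold).astype(float)
    bo = (obs >= threshold).astype(float)
    # Fraction of grid points exceeding the threshold within each neighborhood
    pf = uniform_filter(bf, size=window, mode="constant")
    po = uniform_filter(bo, size=window, mode="constant")
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```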
Scale decomposition • Wavelet component analysis • Briggs and Levine, 1997 • Casati et al., 2004 • Removes noise • Examine how different scales contribute to traditional scores • Does the forecast power spectrum match the observed power spectrum?
Scale decomposition • Casati et al. (2004) intensity-scale approach • Wavelets applied to binary image • Traditional score as a function of intensity threshold and scale [Figure: skill for an intense storm displaced, as a function of spatial scale (5–640 km) and intensity threshold (1/16–32 mm/h), e.g. at threshold = 1 mm/h]
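A simplified sketch of the scale-decomposition idea (not Casati et al.'s exact intensity-scale skill score): threshold both fields, form the binary error field, decompose it with a 2-D Haar wavelet, and report the squared-error energy attributable to each scale. The use of the PyWavelets library and the treatment of the coarsest approximation as one "scale" are assumptions.

```python
# Sketch: how much of the binary error lives at each spatial scale.
import numpy as np
import pywt

def error_by_scale(fcst, obs, threshold):
    """Mean squared error of the binary error field, split by Haar wavelet scale."""
    z = (fcst >= threshold).astype(float) - (obs >= threshold).astype(float)
    coeffs = pywt.wavedec2(z, "haar")   # [approx, (cH, cV, cD) coarse ... fine]
    energies = []
    for level in range(len(coeffs)):
        # Reconstruct the field keeping only this level's coefficients
        keep = [np.zeros_like(coeffs[0])] + [
            tuple(np.zeros_like(c) for c in d) for d in coeffs[1:]
        ]
        keep[level] = coeffs[level]
        rec = pywt.waverec2(keep, "haar")[: z.shape[0], : z.shape[1]]
        energies.append(np.mean(rec ** 2))
    # Index 0 = coarsest approximation, later entries = detail scales coarse-to-fine;
    # by (near) orthogonality of the Haar transform these approximately sum to mean(z**2).
    return energies
```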
Feature-based verification • Composite approach (Nachamkin) • Contiguous rain area approach (CRA; Ebert and McBride, 2000; Gallus and others) • Error components • displacement • volume • pattern
Feature- or object-based verification • Baldwin object-based approach • Cluster analysis (Marzban and Sandgathe) • SAL approach for watersheds • Method for Object-based Diagnostic Evaluation (MODE) • Others…
MODE object definition Two parameters used to identify objects: • Convolution radius • Precipitation threshold [Figure: raw field and the resulting objects] Raw values are “restored” to the objects, to allow evaluation of precipitation amount distributions and other characteristics
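A sketch of this two-parameter object definition, assuming SciPy's image tools; it illustrates the convolve–threshold–restore idea rather than the actual MET/MODE implementation, and the disc-shaped kernel and connected-component labelling are assumptions.

```python
# Sketch of MODE-style object identification: smooth, threshold, label, restore.
import numpy as np
from scipy.ndimage import label, convolve

def define_objects(raw, radius, threshold):
    """Return labelled objects and the raw field restored inside them."""
    # Circular (disc) convolution kernel of the given radius, in grid squares
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    kernel /= kernel.sum()

    smooth = convolve(raw, kernel, mode="constant")    # 1. convolve (smooth) the raw field
    mask = smooth >= threshold                         # 2. threshold the smoothed field
    labels, n_objects = label(mask)                    # 3. connected regions become objects
    restored = np.where(mask, raw, 0.0)                # 4. restore raw values inside objects
    return labels, n_objects, restored
```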
Object merging and matching • Definitions • Merging: Associating objects in the same field • Matching: Associating objects between fields • Fuzzy logic approach • Attributes – used for matching, merging, evaluation • Example single attributes: location, size (area), orientation angle, intensity (0.10, 0.25, 0.50, 0.75, 0.90 quantiles) • Example paired attributes: centroid/boundary distance, size ratio, angle difference, intensity differences
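A sketch of how a fuzzy-logic "total interest" value might combine paired attributes into a single matching score; the interest functions, weights, and scaling constants here are illustrative assumptions, not MODE's configuration.

```python
# Sketch: weighted fuzzy-logic combination of paired object attributes.
def interest(pair, weights=None, dist_scale=100.0):
    """pair: dict with centroid_dist (grid units), area_ratio, angle_diff (degrees)."""
    weights = weights or {"dist": 2.0, "area": 1.0, "angle": 1.0}
    # Map each attribute onto [0, 1], where 1 means perfect agreement
    f_dist = max(0.0, 1.0 - pair["centroid_dist"] / dist_scale)
    f_area = min(pair["area_ratio"], 1.0 / pair["area_ratio"])
    f_angle = max(0.0, 1.0 - abs(pair["angle_diff"]) / 90.0)
    total = (weights["dist"] * f_dist + weights["area"] * f_area
             + weights["angle"] * f_angle)
    return total / sum(weights.values())   # match/merge if this exceeds a chosen cutoff

print(interest({"centroid_dist": 25.0, "area_ratio": 1.3, "angle_diff": 10.0}))
```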
Object-based example: 1 June 2005 WRF ARW (24-h) Stage II Radius = 15 grid squares, Threshold = 0.05”
Object-based example 1 June 2006 • Area ratios: (1) 1.3 (2) 1.2 (3) 1.1 ⇒ All forecast areas were somewhat too large • Location errors: (1) Too far west (2) Too far south (3) Too far north [Figure: WRF ARW-2 objects with Stage II objects overlaid, numbered 1–3]
Object-based example 1 June 2006 • Ratio of median intensities in objects: (1) 1.3 (2) 0.7 (3) 1.9 • Ratio of 0.90 quantiles of intensities in objects: (1) 1.8 (2) 2.9 (3) 1.1 ⇒ All WRF 0.90-quantile intensities were too large; 2 of 3 median intensity values were too large [Figure: WRF ARW-2 objects with Stage II objects overlaid]
Object-based example 1 June 2006 • MODE provides info about areas, displacement, intensity, etc. • In contrast: POD = 0.40, FAR = 0.56, CSI = 0.27 [Figure: WRF ARW-2 objects with Stage II objects overlaid]
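For concreteness, a sketch of how paired attributes of the kind quoted in this example (area ratio, centroid displacement, intensity quantile ratios) could be computed from one matched pair of object masks; the array names and inputs are hypothetical and not taken from the case above.

```python
# Sketch: paired attributes for one matched forecast/observed object.
import numpy as np
from scipy.ndimage import center_of_mass

def paired_attributes(fcst_raw, fcst_mask, obs_raw, obs_mask):
    """fcst_mask/obs_mask: boolean object masks; fcst_raw/obs_raw: raw precip fields."""
    area_ratio = fcst_mask.sum() / obs_mask.sum()
    dy, dx = np.subtract(center_of_mass(fcst_mask.astype(float)),
                         center_of_mass(obs_mask.astype(float)))
    median_ratio = np.median(fcst_raw[fcst_mask]) / np.median(obs_raw[obs_mask])
    p90_ratio = (np.quantile(fcst_raw[fcst_mask], 0.90)
                 / np.quantile(obs_raw[obs_mask], 0.90))
    return {"area_ratio": area_ratio, "displacement": (dy, dx),
            "median_intensity_ratio": median_ratio, "p90_intensity_ratio": p90_ratio}
```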
Applications of MODE • Climatological summaries of object characteristics • Evaluation of individual forecasting systems • Systematic errors • Matching capabilities (overall skill measure) • Model diagnostics • User-relevant information • Performance as a function of scale • Comparison of forecasting systems • As above
Example summary statistics 22-km WRF forecasts from 2001-2002
Example summary statistics • MODE “Rose Plots” • Displacement of matched forecast objects
Verification “Quilts” • Forecast performance attributes as a function of spatial scale • Can be created for almost any attribute or statistic • Provides a summary of performance • Guides selection of parameters [Figure: verification quilt showing a measure of matching capability as a function of threshold (in/100) and convolution radius (grid squares); warm colors indicate stronger matches; based on 9 cases]
MODE availability http://www.dtcenter.org/met/users/ Available as part of the Model Evaluation Tools (MET)
How can we (rationally) decide which method(s) to use? • MODE is just one of many new approaches… • What methods should be recommended to operational centers and others doing verification? • What are the differences between the various approaches? • What different forecast attributes can each approach measure? • What can they tell us about forecast performance? • How can they be used to • improve forecasts? • help decision makers? • Which methods are most useful for specific types of applications?
Spatial verification method intercomparison project • Methods applied to same datasets • WRF forecast and gridded observed precipitation in Central U.S. • NIMROD, MAP D-PHASE/COPS, MeteoSwiss cases • Perturbed cases • Idealized cases • Subjective forecast evaluations
Intercomparison web page • References • Background • Data and cases • Software http://www.ral.ucar.edu/projects/icp/
Subjective evaluation • Model performance rated on a scale from 1 to 5 (5 = best) • N ≈ 22
Subjective evaluation [Figure: observed precipitation (OBS) alongside forecasts from Model A, Model B, and Model C]
Conclusion • Many new spatial verification methods are becoming available – a new world of verification • Intercomparison project will help lead to better understanding of new methods • Many other issues remain: • Ensemble and probability forecasts • Extreme and high impact weather • Observational uncertainty • Understanding fundamentals of new methods and measures (e.g., equitability, propriety)