New developments and issues in forecast verification Barbara Brown bgb@ucar.edu Co-authors and contributors: Randy Bullock, John Halley Gotway, Chris Davis, David Ahijevych, Eric Gilleland, Lacey Holland NCAR Boulder, Colorado October 2007
Issues and new developments • Uncertainty in verification statistics • Diagnostic and user relevant verification • Verification of high-resolution forecasts • Spatial forecast verification • Incorporation of observational uncertainty • Verification of probabilistic and ensemble forecasts • Verification of extremes • Properties of verification measures • Propriety, Equitability
Uncertainty in verification measures Model precipitation example: Equitable Threat Score (ETS) • Confidence intervals take into account various sources of error, including sampling and observational error • Computation of confidence intervals for verification statistics is not always straightforward
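One common way to obtain such intervals is a bootstrap; the slide does not specify the method, so the sketch below is an illustrative assumption. It computes a percentile-bootstrap confidence interval for ETS from paired yes/no forecast and observation samples; the simple i.i.d. resampling (rather than a block bootstrap that respects spatial correlation) and the 95% default are simplifications.

```python
import numpy as np

def ets(hits, misses, false_alarms, correct_negatives):
    """Equitable Threat Score (Gilbert Skill Score) from 2x2 contingency counts."""
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)

def bootstrap_ets_ci(fcst_events, obs_events, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for ETS over paired binary forecast/observation samples.

    fcst_events, obs_events: 1-D boolean NumPy arrays (event forecast / event observed).
    NOTE: plain i.i.d. resampling; precipitation grids are spatially correlated,
    so a block bootstrap would usually be more appropriate.
    """
    rng = np.random.default_rng(seed)
    n = len(obs_events)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample cases with replacement
        f, o = fcst_events[idx], obs_events[idx]
        hits = np.sum(f & o)
        misses = np.sum(~f & o)
        fas = np.sum(f & ~o)
        cns = np.sum(~f & ~o)
        scores.append(ets(hits, misses, fas, cns))
    lo, hi = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```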
User-relevant verification: Good forecast or bad forecast? (Figure: forecast area F and observed area O)
User-relevant verification: Good forecast or bad forecast? (Figure: forecast area F and observed area O) If I'm a water manager for this watershed, it's a pretty bad forecast…
User-relevant verification: Good forecast or bad forecast? (Figure: forecast area F and observed area O relative to a flight route from A to B) If I'm an aviation traffic strategic planner, it might be a pretty good forecast. Different users have different ideas about what makes a good forecast.
Diagnostic and user relevant forecast evaluation approaches • Provide the link between weather forecasting and forecast value • Identify and evaluate attributes of the forecasts that are meaningful for particular users • Users could be managers, forecast developers, forecasters, decision makers • Answer questions about forecast performance in the context of users’ decisions • Example questions: How do model changes impact user-relevant variables? What is the typical location error of a thunderstorm? Size of a temperature error? Timing error? Lead time?
Diagnostic and user relevant forecast evaluation approaches (cont.) • Provide more detailed information about forecast quality • What went wrong? What went right? • How can the forecast be improved? • How do 2 forecasts differ from each other, and in what ways is one better than the other?
Which rain forecast is better? (Figure: 24-h rain near Sydney, 21 Mar 2004: mesoscale model at 5 km resolution, RMS = 13.0; global model at 100 km resolution, RMS = 4.6; and the observed field) High vs. low resolution: “smooth” forecasts generally “win” according to traditional verification approaches. From E. Ebert
Traditional “measures”-based approaches Consider forecasts and observations of some dichotomous field on a grid. Some problems with this approach: (1) Non-diagnostic: it doesn't tell us what was wrong with the forecast, or what was right (2) Ultra-sensitive to small errors in the simulation of localized phenomena (In the accompanying schematic, CSI = 0 for the first four example forecasts and CSI > 0 for the fifth.)
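For reference, a minimal sketch of the traditional contingency-table scores for a gridded dichotomous event; the variable names and threshold convention are illustrative:

```python
import numpy as np

def categorical_scores(forecast, observed, threshold):
    """Traditional 2x2 contingency-table scores for a gridded yes/no event.

    forecast, observed: 2-D arrays of, e.g., precipitation amounts.
    threshold: value defining the yes/no event at each grid point.
    """
    f = forecast >= threshold
    o = observed >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)

    pod = hits / (hits + misses)                   # probability of detection
    far = false_alarms / (hits + false_alarms)     # false alarm ratio
    csi = hits / (hits + misses + false_alarms)    # critical success index (threat score)
    return {"POD": pod, "FAR": far, "CSI": csi}
```

A forecast feature displaced by only a few grid lengths can produce zero hits and hence CSI = 0, which is exactly the sensitivity problem noted above.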
Spatial forecasts Spatial verification techniques aim to: • account for uncertainties in timing and location • account for field spatial structure • provide information on error in physical terms • provide information that is • diagnostic • meaningful to forecast users Weather variables defined over spatial domains have coherent structure and features
Recent research on spatial verification methods • Neighborhood verification methods • give credit to "close" forecasts • Scale decomposition methods • measure scale-dependent error • Object- and feature-based methods • evaluate attributes of identifiable features • Field verification approaches • measure distortion and displacement (phase error) for whole field
Neighborhood verification • Also called “fuzzy” verification • Upscaling • put observations and/or forecast on coarser grid • calculate traditional metrics
Neighborhood verification Treatment of forecast data within a window: • Mean value (upscaling) • Occurrence of event in window • Frequency of event in window → probability • Distribution of values within window • Fractions skill score (Roberts 2005; Roberts and Lean 2007) (Figure: observed and forecast fields with neighborhood windows) Ebert (2007; Met Applications) provides a review and synthesis of these approaches
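As a concrete illustration of the neighborhood idea, here is a minimal sketch of the fractions skill score for a single threshold and neighborhood size. It assumes NumPy/SciPy, a square neighborhood window, and zero-padded boundaries, all of which are simplifying assumptions of this sketch:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fractions_skill_score(forecast, observed, threshold, window):
    """Fractions Skill Score (after Roberts and Lean 2007) for one threshold and window.

    forecast, observed: 2-D precipitation fields.
    window: neighborhood size in grid points.
    """
    # Binary event fields, then the fraction of event points within each window
    pf = uniform_filter((forecast >= threshold).astype(float), size=window, mode="constant")
    po = uniform_filter((observed >= threshold).astype(float), size=window, mode="constant")

    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)   # worst-case (no-overlap) reference
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```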
Scale decomposition • Wavelet component analysis • Briggs and Levine, 1997 • Casati et al., 2004 • Removes noise • Examine how different scales contribute to traditional scores • Does the forecast power spectrum match the observed power spectrum?
Scale decomposition • Casati et al. (2004) intensity-scale approach • Wavelets applied to the binary image • Traditional score as a function of intensity threshold and scale (Figure: “intense storm displaced” case, showing skill as a function of scale, 5–640 km, and threshold, 1/16–32 mm/h)
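As a rough illustration of the idea, and not Casati et al.'s exact formulation, the sketch below decomposes the binary error field by scale with an orthonormal Haar wavelet and reports the mean squared error contributed by each scale; the normalization into a per-scale skill score is omitted. The use of `pywt` and the assumption of square, power-of-two fields are choices of this sketch:

```python
import numpy as np
import pywt

def error_mse_by_scale(forecast, observed, threshold, wavelet="haar"):
    """Scale decomposition of the binary error field (in the spirit of
    Casati et al. 2004). Returns the MSE contributed by each wavelet level,
    finest scale first; the conversion to a per-scale skill score is omitted.
    Assumes square fields with power-of-two dimensions.
    """
    z = (forecast >= threshold).astype(float) - (observed >= threshold).astype(float)
    coeffs = pywt.wavedec2(z, wavelet)              # orthonormal Haar decomposition
    n = z.size

    mse_per_level = []
    for detail in reversed(coeffs[1:]):             # detail coefficients, finest first
        # For an orthonormal wavelet, the energy of each level's detail
        # coefficients is that level's contribution to the total MSE (Parseval).
        energy = sum(np.sum(d ** 2) for d in detail)
        mse_per_level.append(energy / n)
    return mse_per_level
```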
Feature-based verification • Composite approach (Nachamkin) • Contiguous rain area approach (CRA; Ebert and McBride, 2000; Gallus and others) • Error components • displacement • volume • pattern
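To show how the displacement/volume/pattern split can work in practice, here is a hedged sketch of a CRA-style decomposition of total MSE, in the spirit of Ebert and McBride (2000). The brute-force shift search, the wrap-around edges implied by np.roll, and the omission of the CRA masking step are simplifications of this sketch, not the published method:

```python
import numpy as np

def cra_error_decomposition(forecast, observed, max_shift=10):
    """Sketch of a CRA-style error decomposition.

    Shifts the forecast over the observed field to minimize MSE, then splits the
    total error into displacement, volume (bias), and pattern components.
    """
    mse_total = np.mean((forecast - observed) ** 2)

    # Find the translation that best aligns the forecast with the observations
    best_mse, best_shift = mse_total, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(forecast, (dy, dx), axis=(0, 1))
            mse = np.mean((shifted - observed) ** 2)
            if mse < best_mse:
                best_mse, best_shift = mse, (dy, dx)

    shifted = np.roll(forecast, best_shift, axis=(0, 1))
    mse_displacement = mse_total - best_mse                 # error removed by shifting
    mse_volume = (shifted.mean() - observed.mean()) ** 2    # bias of the aligned forecast
    mse_pattern = best_mse - mse_volume                     # remaining structural error
    return {"shift": best_shift,
            "displacement": mse_displacement,
            "volume": mse_volume,
            "pattern": mse_pattern}
```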
Feature- or object-based verification • Baldwin object-based approach • Cluster analysis (Marzban and Sandgathe) • SAL approach for watersheds • Method for Object-based Diagnostic Evaluation (MODE) • Others…
MODE object definition Two parameters are used to identify objects: • Convolution radius • Precipitation threshold Raw values are “restored” to the objects, to allow evaluation of precipitation amount distributions and other characteristics (Figure: raw field and the resulting objects)
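A minimal sketch of the convolve-threshold-restore sequence described above; a square smoothing window stands in for MODE's circular convolution filter, and the function and parameter names are illustrative, not MODE's implementation:

```python
import numpy as np
from scipy.ndimage import label, uniform_filter

def define_objects(raw_field, conv_radius, threshold):
    """Sketch of MODE-style object definition.

    1) Smooth (convolve) the raw field, 2) threshold the smoothed field,
    3) label connected regions as objects, 4) restore raw values inside objects.
    """
    smoothed = uniform_filter(raw_field, size=2 * conv_radius + 1)   # convolution step
    mask = smoothed >= threshold                                     # thresholding step
    objects, n_objects = label(mask)                                 # connected components
    restored = np.where(mask, raw_field, 0.0)                        # restore raw values
    return objects, n_objects, restored
```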
Object merging and matching • Definitions • Merging: Associating objects in the same field • Matching: Associating objects between fields • Fuzzy logic approach • Attributes – used for matching, merging, evaluation Example single attributes: Location Size (area) Orientation angle Intensity (0.10, 0.25, 0.50, 0.75, 0.90 quantiles) Example paired attributes: Centroid/boundary distance Size ratio Angle difference Intensity differences
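To illustrate the fuzzy-logic matching idea, the sketch below maps each paired attribute to an interest value in [0, 1] and compares a weighted average ("total interest") to a match threshold. The specific interest maps, weights, and threshold here are invented for illustration and are not MODE's configured values:

```python
def total_interest(paired_attributes, weights, interest_maps):
    """Weighted-average total interest for one forecast/observed object pair."""
    num = sum(weights[name] * interest_maps[name](value)
              for name, value in paired_attributes.items())
    den = sum(weights[name] for name in paired_attributes)
    return num / den

# Illustrative interest maps: high interest for small centroid distance,
# area ratios near 1, and small orientation-angle differences.
interest_maps = {
    "centroid_distance": lambda d: max(0.0, 1.0 - d / 200.0),   # d in grid units
    "area_ratio":        lambda r: min(r, 1.0 / r),
    "angle_difference":  lambda a: max(0.0, 1.0 - abs(a) / 90.0),
}
weights = {"centroid_distance": 2.0, "area_ratio": 1.0, "angle_difference": 1.0}

# Example pair of attribute values; a pair "matches" if total interest
# exceeds an assumed threshold of 0.7.
pair = {"centroid_distance": 40.0, "area_ratio": 1.3, "angle_difference": 15.0}
matched = total_interest(pair, weights, interest_maps) >= 0.7
```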
Object-based example: 1 June 2005 WRF ARW (24-h) Stage II Radius = 15 grid squares, Threshold = 0.05”
Object-based example, 1 June 2005 • Area ratios: (1) 1.3 (2) 1.2 (3) 1.1 ⇒ all forecast areas were somewhat too large • Location errors: (1) too far west (2) too far south (3) too far north (Figure: WRF ARW-2 objects with Stage II objects overlaid; objects numbered 1–3)
Object-based example, 1 June 2005 • Ratio of median intensities in objects: (1) 1.3 (2) 0.7 (3) 1.9 • Ratio of 0.90 quantiles of intensities in objects: (1) 1.8 (2) 2.9 (3) 1.1 ⇒ all WRF 0.90-quantile intensities were too large; 2 of 3 median intensity values were too large (Figure: WRF ARW-2 objects with Stage II objects overlaid)
Object-based example, 1 June 2005 • MODE provides information about areas, displacement, intensity, etc. • In contrast, the traditional scores give only: POD = 0.40, FAR = 0.56, CSI = 0.27 (Figure: WRF ARW-2 objects with Stage II objects overlaid)
Applications of MODE • Climatological summaries of object characteristics • Evaluation of individual forecasting systems • Systematic errors • Matching capabilities (overall skill measure) • Model diagnostics • User-relevant information • Performance as a function of scale • Comparison of forecasting systems • As above
Example summary statistics 22-km WRF forecasts from 2001-2002
Example summary statistics • MODE “Rose Plots” • Displacement of matched forecast objects
Verification “quilts” • Forecast performance attributes as a function of spatial scale • Can be created for almost any attribute or statistic • Provides a summary of performance • Guides selection of parameters (Figure: verification quilt showing a measure of matching capability as a function of threshold, in/100, and convolution radius, grid squares; warm colors indicate stronger matches; based on 9 cases)
MODE availability http://www.dtcenter.org/met/users/ Available as part of the Model Evaluation Tools (MET)
How can we (rationally) decide which method(s) to use? • MODE is just one of many new approaches… • What methods should be recommended to operational centers, others doing verification? • What are the differences between the various approaches? • What different forecast attributes can each approach measure? • What can they tell us about forecast performance? • How can they be used to • Improve forecasts? • Help decision makers? • Which methods are most useful for specific types of applications?
Spatial verification method intercomparison project • Methods applied to same datasets • WRF forecast and gridded observed precipitation in Central U.S. • NIMROD, MAP D-PHASE/COPS, MeteoSwiss cases • Perturbed cases • Idealized cases • Subjective forecast evaluations
Intercomparison web page • References • Background • Data and cases • Software http://www.ral.ucar.edu/projects/icp/
Subjective evaluation • Model performance rated on a scale from 1 to 5 (5 = best) • N ≈ 22
Subjective evaluation (Figure: observed field alongside Models A, B, and C)
Conclusion • Many new spatial verification methods are becoming available – a new world of verification • Intercomparison project will help lead to better understanding of new methods • Many other issues remain: • Ensemble and probability forecasts • Extreme and high impact weather • Observational uncertainty • Understanding fundamentals of new methods and measures (e.g., equitability, propriety)