Verification Introduction
Holly C. Hartmann, Department of Hydrology and Water Resources, University of Arizona, hollyoregon@juno.com
RFC Verification Workshop, 08/14/2007
Goals
• General concepts of verification
• Think about how to apply them to your operations
• Be able to respond to and influence the NWS verification program
• Be prepared as new tools become available
• Be able to do some of your own verification
• Be able to work with researchers on verification projects
• Contribute to the development of verification tools (e.g., evaluate various options)
• Avoid some typical mistakes
Agenda
1. Introduction to Verification
   - Applications, Rationale, Basic Concepts
   - Data Visualization and Exploration
   - Deterministic Scalar Measures
2. Categorical Measures – KEVIN WERNER
   - Deterministic Forecasts
   - Ensemble Forecasts
3. Diagnostic Verification
   - Reliability
   - Discrimination
   - Conditioning/Structuring Analyses
4. Lab Session/Group Exercise
   - Developing Verification Strategies
   - Connecting to Forecast Operations and Users
Why Do Verification? It depends…
• Administrative: logistics, selected quantitative criteria
• Operations: inputs, model states, outputs, quick!
• Research: sources of error, targeting research
• Users: making decisions, exploiting skill, avoiding mistakes
Concerns about verification?
Need for Verification Measures
Verification statistics identify:
• accuracy of forecasts
• sources of skill in forecasts
• sources of uncertainty in forecasts
• conditions where and when forecasts are skillful or not skillful, and why
Verification statistics can then inform improvements in forecast skill and in decision making relative to alternate forecast sources (e.g., climatology, persistence, new forecast systems).
Adapted from: Regonda, Demargne, and Seo, 2006
Skill versus Value
A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.
A forecast has value if it helps the user make better decisions than they would without the forecast.
• Forecasts with poor skill can still be valuable (e.g., an extreme event forecast in the wrong place)
• Forecasts with high skill can be of little value (e.g., a blue-sky forecast over the desert)
Assessing the quality of a forecast system means determining both the skill and the value of its forecasts.
Credit: Hagedorn (2006) and Julie Demargne
Stakeholder Use of HydroClimate Info & Forecasts
Common across all groups:
• Uninformed or mistaken about forecast interpretation
• Use of forecasts limited by lack of demonstrated forecast skill
• Have difficulty specifying required accuracy
Common across many, but not all, stakeholders:
• Have difficulty distinguishing between “good” and “bad” products
• Have difficulty placing forecasts in historical context
Unique among stakeholders:
• Relevant forecast variables, regions (location & scale), seasons, lead times, performance characteristics
• Technical sophistication: base probabilities, distributions, math
• Role of forecasts in decision making
What is a Perfect Forecast? Forecast evaluation concepts
All happy families are alike; each unhappy family is unhappy in its own way. -- Leo Tolstoy (1876)
All perfect forecasts are alike; each imperfect forecast is imperfect in its own way. -- Holly Hartmann (2002)
Different Forecasts, Information, Evaluation
“Today’s high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain.”
• Deterministic: 76°
• Categorical: Rain / No rain
• Probabilistic: 30% chance of rain
How would you evaluate each of these?
[Figure: standard hydrograph, an example of a deterministic forecast]
ESP Forecasts: User preferences influence verification From: California-Nevada River Forecast Center
So Many Evaluation Criteria!
Deterministic:
• Bias
• Correlation
• RMSE
• Standardized RMSE
• Nash-Sutcliffe
• Linear Error in Probability Space
Categorical:
• Hit Rate
• Surprise Rate
• Threat Score
• Gerrity Score
• Success Ratio
• Post-agreement
• Percent Correct
• Peirce Skill Score
• Gilbert Skill Score
• Heidke Skill Score
• Critical Success Index
• Percent N-class Errors
• Modified Heidke Skill Score
• Hanssen and Kuipers Score
• Gandin and Murphy Skill Scores…
Probabilistic:
• Brier Score
• Ranked Probability Score
• Distributions-oriented Measures
• Reliability
• Discrimination
• Sharpness
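To make two of the probabilistic measures above concrete, here is a minimal sketch of a Brier score and a ranked probability score computed with NumPy. The array values, function names, and the three-category example are illustrative assumptions, not part of the workshop materials or the RFC verification system.

```python
# Minimal sketch of two probabilistic verification measures (illustrative only).
import numpy as np

def brier_score(prob_fcst, obs):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    prob_fcst = np.asarray(prob_fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.mean((prob_fcst - obs) ** 2)

def ranked_probability_score(cat_probs, obs_category, n_categories):
    """RPS: squared differences between cumulative forecast and observed
    distributions over ordered categories, averaged across forecasts."""
    scores = []
    for probs, k in zip(cat_probs, obs_category):
        cum_fcst = np.cumsum(probs)
        cum_obs = np.cumsum(np.eye(n_categories)[k])  # one-hot observation
        scores.append(np.sum((cum_fcst - cum_obs) ** 2))
    return np.mean(scores)

# Hypothetical probability-of-precipitation forecasts vs. rain/no-rain outcomes
print(brier_score([0.3, 0.8, 0.1], [0, 1, 0]))
# Hypothetical below/near/above-normal streamflow forecasts (3 ordered categories)
print(ranked_probability_score([[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]], [1, 0], 3))
```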
RFC Verification System: Metrics Source: Verification Group, courtesy J. Demargne
Possible Performance Criteria
• Accuracy – overall correspondence between forecasts and observations
• Bias – difference between the average forecast and the average observation
• Consistency – forecasts don’t waffle around [Figure: example of good consistency]
• Sharpness/Refinement – ability to make bullish forecast statements [Figure: sharp vs. not-sharp forecast distributions]
What makes a forecast “good”?
Adapted from: Ebert (2003)
Forecasting Tradeoffs
Forecast performance is multi-faceted.
• False alarms: warning without an event (“False Alarm Ratio”)
• Surprises: event without a warning (“Probability of Detection”)
A forecaster’s fundamental challenge is balancing these two. Which is more important? It depends on the specific decision context…
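The two quantities named above come straight from a 2×2 contingency table of warnings against events; the sketch below shows that arithmetic with hypothetical counts and function names chosen here for illustration, not tied to any RFC product.

```python
# Minimal sketch: probability of detection (POD) and false alarm ratio (FAR)
# from a 2x2 warning/event contingency table. Counts are hypothetical.
def pod_far(hits, misses, false_alarms):
    # POD: fraction of observed events that were warned for (1 - surprise rate).
    pod = hits / (hits + misses) if (hits + misses) else float("nan")
    # FAR: fraction of issued warnings that had no event.
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else float("nan")
    return pod, far

# e.g., 42 warned events, 8 surprises, 15 warnings without an event
print(pod_far(hits=42, misses=8, false_alarms=15))  # -> (0.84, ~0.26)
```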
How Good? Compared to What?
Skill Score = (S_Forecast – S_Baseline) / (S_Perfect – S_Baseline)
For error measures whose perfect score is zero, this reduces to 1 – S_Forecast / S_Baseline.
Example skill score: (0.50 – 0.54) / (1.00 – 0.54) ≈ –8.7% ~ worse than guessing ~
What is the appropriate baseline?
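The slide’s arithmetic can be reproduced directly; this is a minimal sketch of the generic skill-score formula, with the example values taken from the slide and the function name chosen here for illustration.

```python
# Generic skill score: improvement over a baseline, scaled by the improvement
# a perfect forecast would give.
def skill_score(s_forecast, s_baseline, s_perfect):
    return (s_forecast - s_baseline) / (s_perfect - s_baseline)

# Example from the slide: forecast score 0.50, baseline 0.54, perfect 1.00
print(skill_score(0.50, 0.54, 1.00))  # -> about -0.087, worse than the baseline
```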
Graphical Forecast Evaluation
Basic Data Display
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
Scatter Plots
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
Histograms
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
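A comparable basic display for a forecast archive (scatter plot of forecast against observed plus histograms of both) could be sketched with matplotlib as below. The file name and column names are assumptions for illustration, not the Colorado River dataset used in these slides.

```python
# Minimal sketch of a basic data display: scatter plot and histograms of
# forecast vs. observed seasonal volumes. File and column names are assumed.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("water_supply_outlooks.csv")  # assumed columns: forecast, observed

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Scatter plot with a 1:1 line for reference
ax1.scatter(df["observed"], df["forecast"], s=15)
lim = [0, max(df["observed"].max(), df["forecast"].max())]
ax1.plot(lim, lim, "k--", lw=1)
ax1.set_xlabel("Observed (1000s ac-ft)")
ax1.set_ylabel("Forecast (1000s ac-ft)")

# Histograms of observed and forecast volumes
ax2.hist(df["observed"], bins=20, alpha=0.5, label="Observed")
ax2.hist(df["forecast"], bins=20, alpha=0.5, label="Forecast")
ax2.set_xlabel("Seasonal volume (1000s ac-ft)")
ax2.legend()

plt.tight_layout()
plt.show()
```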
IVP Scatterplot Example Source: H. Herr
Cumulative Distribution Function (CDF): IVP
Empirical distribution of forecast probabilities for different observation categories
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
Goal: widely separated CDFs
Source: H. Herr, IVP Charting Examples, 2007
Probability Density Function (PDF): IVP
Empirical distribution using 10 bins in the IVP GUI
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
Goal: widely separated PDFs
Source: H. Herr, IVP Charting Examples, 2007
“Box-plots”: Quantiles and Extremes
Based on summarizing the CDF computation and plot
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
Goal: widely separated box-plots
Source: H. Herr, IVP Charting Examples, 2007
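One way to build conditional displays like the three above (forecast-probability distributions split by whether precipitation was observed) is sketched here with empirical CDFs; the arrays are made up and this is not the IVP implementation.

```python
# Minimal sketch: empirical CDFs of forecast probabilities, conditioned on the
# observation category (discrimination-style display). Illustrative data only.
import numpy as np
import matplotlib.pyplot as plt

prob_of_precip = np.array([0.1, 0.7, 0.3, 0.9, 0.05, 0.6, 0.2, 0.8])  # forecasts
observed_precip = np.array([0.0, 0.4, 0.0, 1.2, 0.0, 0.02, 0.0, 0.5])  # inches

wet = observed_precip > 0.001  # Cat 2 = observed precipitation
for label, subset in [("Cat 1: no observed precip", prob_of_precip[~wet]),
                      ("Cat 2: observed precip", prob_of_precip[wet])]:
    x = np.sort(subset)
    y = np.arange(1, len(x) + 1) / len(x)  # empirical cumulative frequencies
    plt.step(x, y, where="post", label=label)

plt.xlabel("Forecast probability of precipitation")
plt.ylabel("Cumulative frequency")
plt.legend()
plt.show()  # widely separated curves indicate good discrimination
```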
Scalar Forecast Evaluation
Standard Scalar Measures
• Bias: mean forecast vs. mean observed
• Correlation coefficient: variance shared between forecast and observed (r²); says nothing about bias or whether the forecast variance equals the observed variance. The Pearson correlation coefficient assumes a normal distribution and can be + or – (rank r: only +, non-normal ok).
• Root Mean Squared Error (RMSE): distance between forecast and observation values; better than correlation, but poor when errors are heteroscedastic; emphasizes performance for high flows. Alternative: Mean Absolute Error (MAE).
[Figure: forecast vs. observed scatter plot]
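These scalar measures take only a few lines to compute; here is a sketch with made-up forecast and observed arrays, using NumPy for the statistics and scipy’s rankdata for the rank correlation.

```python
# Minimal sketch of the standard scalar measures (hypothetical data).
import numpy as np
from scipy.stats import rankdata

forecast = np.array([120.0, 95.0, 140.0, 80.0, 110.0])
observed = np.array([115.0, 100.0, 150.0, 70.0, 105.0])

bias = forecast.mean() - observed.mean()                 # mean forecast - mean observed
pearson_r = np.corrcoef(forecast, observed)[0, 1]        # assumes roughly normal data
rank_r = np.corrcoef(rankdata(forecast), rankdata(observed))[0, 1]  # non-normal ok
rmse = np.sqrt(np.mean((forecast - observed) ** 2))      # emphasizes large errors
mae = np.mean(np.abs(forecast - observed))               # less sensitive to outliers

print(f"bias={bias:.1f} r={pearson_r:.2f} rank_r={rank_r:.2f} rmse={rmse:.1f} mae={mae:.1f}")
```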
Standard Scalar Measures (with Scatterplots)
1943–99 April 1 forecasts of Apr–Sept streamflow, Stehekin R at Stehekin, WA: Bias = 22, Corr = 0.92, RMSE = 74.4
1954–97 January 1 forecasts of Jan–May streamflow, Verde R below Tangle Crk, AZ: Bias = –87.5, Corr = 0.58, RMSE = 228.3
[Axes: Forecast vs. Observed, in 1000s ac-ft]
IVP: Deterministic Scalar Measures
• ME: smallest, because + and – errors cancel
• MAE vs. RMSE: RMSE is influenced by large errors on large events
• MAXERR: largest
• Sample size: small samples have large uncertainty
Source: H. Herr, IVP Charting Examples, 2007
IVP: RMSE – Skill Scores
Skill compared to a persistence forecast
Source: H. Herr, IVP Charting Examples, 2007
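An RMSE skill score against a persistence baseline, like the one IVP plots, could be sketched as follows; the series are hypothetical and this is not the IVP code.

```python
# Minimal sketch: RMSE skill score relative to persistence (last observation
# carried forward). Hypothetical data; not the IVP implementation.
import numpy as np

observed = np.array([10.0, 12.0, 15.0, 11.0, 9.0, 14.0])
forecast = np.array([11.0, 13.0, 14.0, 10.0, 10.0, 13.0])

def rmse(f, o):
    return np.sqrt(np.mean((f - o) ** 2))

# Persistence baseline: the forecast for time t is the observation at t-1,
# so the first time step has no baseline and is dropped from both series.
persistence = observed[:-1]
rmse_fcst = rmse(forecast[1:], observed[1:])
rmse_pers = rmse(persistence, observed[1:])

# For error measures the perfect score is 0, so SS = 1 - RMSE_fcst / RMSE_pers
skill = 1.0 - rmse_fcst / rmse_pers
print(skill)  # > 0 means the forecast beats persistence
```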