
Verification Introduction Holly C. Hartmann


Presentation Transcript


  1. Verification Introduction Holly C. Hartmann, Department of Hydrology and Water Resources, University of Arizona, hollyoregon@juno.com. RFC Verification Workshop, 08/14/2007

  2. Goals
  • General concepts of verification
  • Think about how to apply them to your operations
  • Be able to respond to and influence the NWS verification program
  • Be prepared as new tools become available
  • Be able to do some of your own verification
  • Be able to work with researchers on verification projects
  • Contribute to the development of verification tools (e.g., evaluate various options)
  • Avoid some typical mistakes

  3. Agenda
  1. Introduction to Verification
     • Applications, Rationale, Basic Concepts
     • Data Visualization and Exploration
     • Deterministic Scalar Measures
  2. Categorical Measures – KEVIN WERNER
     • Deterministic Forecasts
     • Ensemble Forecasts
  3. Diagnostic Verification
     • Reliability
     • Discrimination
     • Conditioning/Structuring Analyses
  4. Lab Session/Group Exercise
     • Developing Verification Strategies
     • Connecting to Forecast Operations and Users

  4. Why Do Verification? It depends…
  • Administrative: logistics, selected quantitative criteria
  • Operations: inputs, model states, outputs, quick!
  • Research: sources of error, targeting research
  • Users: making decisions, exploiting skill, avoiding mistakes
  Concerns about verification?

  5. Need for Verification Measures
  Verification statistics identify:
  • accuracy of forecasts
  • sources of skill in forecasts
  • sources of uncertainty in forecasts
  • conditions where and when forecasts are skillful or not, and why
  Verification statistics can then inform improvements in forecast skill and in decision making with alternate forecast sources (e.g., climatology, persistence, new forecast systems).
  Adapted from: Regonda, Demargne, and Seo, 2006

  6. Skill versus Value
  A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.
  A forecast has value if it helps the user make better decisions than they would without the forecast.
  • Forecasts with poor skill can still be valuable (e.g., an extreme event forecast in the wrong place can still prompt useful preparations).
  • Forecasts with high skill can be of little value (e.g., forecasting blue sky over the desert).
  Assessing the quality of a forecast system means determining both the skill and the value of its forecasts.
  Credit: Hagedorn (2006) and Julie Demargne

  7. Stakeholder Use of HydroClimate Info & Forecasts
  Common across all groups:
  • Uninformed or mistaken about forecast interpretation
  • Use of forecasts limited by lack of demonstrated forecast skill
  • Difficulty specifying required accuracy
  Common across many, but not all, stakeholders:
  • Difficulty distinguishing between “good” and “bad” products
  • Difficulty placing forecasts in historical context
  Unique among stakeholders:
  • Relevant forecast variables, regions (location and scale), seasons, lead times, performance characteristics
  • Technical sophistication: base probabilities, distributions, math
  • Role of forecasts in decision making

  8. What is a Perfect Forecast? Forecast evaluation concepts
  “All happy families are alike; each unhappy family is unhappy in its own way.” -- Leo Tolstoy (1876)
  “All perfect forecasts are alike; each imperfect forecast is imperfect in its own way.” -- Holly Hartmann (2002)

  9. Different Forecasts, Information, Evaluation
  Deterministic / Categorical / Probabilistic
  “Today’s high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain.”

  10. Different Forecasts, Information, Evaluation
  “Today’s high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain.”
  • Deterministic: 76°
  • Probabilistic: 30% chance of rain
  • Categorical: rain / no rain
  How would you evaluate each of these?

  11. Different Forecasts, Information, Evaluation
  Deterministic example: a standard hydrograph (figure).

  12.–15. ESP Forecasts: User preferences influence verification (figure slides)
  From: California-Nevada River Forecast Center

  16. So Many Evaluation Criteria!
  • Deterministic: Bias, Correlation, RMSE, Standardized RMSE, Nash-Sutcliffe, Linear Error in Probability Space
  • Categorical: Hit Rate, Surprise Rate, Threat Score, Gerrity Score, Success Ratio, Post-agreement, Percent Correct, Peirce Skill Score, Gilbert Skill Score, Heidke Skill Score, Critical Success Index, Percent N-class Errors, Modified Heidke Skill Score, Hanssen and Kuipers Score, Gandin and Murphy Skill Scores…
  • Probabilistic: Brier Score, Ranked Probability Score, Distributions-oriented Measures (Reliability, Discrimination, Sharpness)
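  To make the probabilistic measures above concrete, here is a minimal sketch (not from the presentation; the data values are hypothetical) of the Brier Score for yes/no probability forecasts:

```python
# Minimal sketch (hypothetical data): Brier Score for probabilistic yes/no forecasts.
# BS is the mean squared difference between forecast probability and outcome (1 = event occurred).
def brier_score(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Three 30%-chance-of-rain forecasts; rain was observed once.
print(brier_score([0.3, 0.3, 0.3], [0, 1, 0]))  # ~0.22 (0 is perfect; lower is better)
```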

  17.–18. RFC Verification System: Metrics (figure slides)
  Source: Verification Group, courtesy J. Demargne

  19. Possible Performance Criteria
  • Accuracy - overall correspondence between forecasts and observations
  • Bias - difference between the average forecast and the average observation
  • Consistency - forecasts don’t waffle around (figure: example of good consistency)

  20. Possible Performance Criteria
  • Accuracy - overall correspondence between forecasts and observations
  • Bias - difference between the average forecast and the average observation
  • Consistency - forecasts don’t waffle around
  • Sharpness/Refinement - ability to make bullish (confident) forecast statements (figure: not sharp vs. sharp)

  21.–22. What makes a forecast “good”? (figure slides)
  Adapted from: Ebert (2003)

  23. Forecasting Tradeoffs
  Forecast performance is multi-faceted. Two error types compete:
  • False alarms - a warning is issued but no event occurs (summarized by the “False Alarm Ratio”)
  • Surprises - an event occurs without a warning (the complement of the “Probability of Detection”)
  A forecaster’s fundamental challenge is balancing these two. Which is more important? It depends on the specific decision context…
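  As a concrete illustration (hypothetical counts, not from the presentation), both quantities come from a 2x2 contingency table of warnings versus events:

```python
# Minimal sketch (hypothetical counts): the two sides of the tradeoff.
def probability_of_detection(hits, misses):
    # Fraction of observed events that were warned for.
    return hits / (hits + misses)

def false_alarm_ratio(hits, false_alarms):
    # Fraction of issued warnings that had no event.
    return false_alarms / (hits + false_alarms)

print(probability_of_detection(hits=20, misses=5))   # 0.80
print(false_alarm_ratio(hits=20, false_alarms=10))   # ~0.33
```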

  24. How Good? Compared to What?
  Skill Score = (S_Forecast - S_Baseline) / (S_Perfect - S_Baseline)
  For error-type measures where S_Perfect = 0, this reduces to 1 - S_Forecast / S_Baseline.
  Example: (0.50 - 0.54) / (1.00 - 0.54) ≈ -8.7%, i.e., worse than guessing.
  What is the appropriate baseline?
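  A minimal sketch of this calculation, reusing the slide’s own numbers (forecast score 0.50, baseline 0.54, perfect score 1.00):

```python
# Minimal sketch: generic skill score relative to a baseline.
def skill_score(s_forecast, s_baseline, s_perfect):
    # Fraction of the possible improvement over the baseline that the forecast achieves.
    return (s_forecast - s_baseline) / (s_perfect - s_baseline)

print(skill_score(s_forecast=0.50, s_baseline=0.54, s_perfect=1.00))  # ~ -0.087: worse than the baseline
```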

  25. Graphical Forecast Evaluation

  26. Basic Data Display
  Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)

  27. Scatter plots
  Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)

  28. Histograms
  Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)

  29. IVP Scatterplot Example Source: H. Herr

  30. Cumulative Distribution Function (CDF): IVP
  Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
  Empirical distributions of forecast probabilities for the different observation categories.
  Goal: widely separated CDFs.
  Source: H. Herr, IVP Charting Examples, 2007
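  A minimal sketch of the idea (hypothetical data, not the IVP implementation): build an empirical CDF of the forecast probabilities separately for the dry and wet observation categories and see how far apart they sit.

```python
# Minimal sketch (hypothetical data): empirical CDFs of forecast probability,
# conditioned on the observed category, as a simple check on discrimination.
import numpy as np

probs = np.array([0.05, 0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.90])  # forecast probability of precip
obs   = np.array([0,    0,    0,    0,    1,    1,    1,    1   ])  # 1 = precip observed

for category, label in [(0, "Cat 1: no precip"), (1, "Cat 2: precip")]:
    sample = np.sort(probs[obs == category])
    cdf = np.arange(1, sample.size + 1) / sample.size   # empirical cumulative probability
    print(label, list(zip(sample.round(2), cdf.round(2))))
# Widely separated CDFs (low probabilities when dry, high when wet) indicate good discrimination.
```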

  31. Probability Density Function (PDF): IVP
  Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
  Empirical distributions, using 10 bins in the IVP GUI.
  Goal: widely separated PDFs.
  Source: H. Herr, IVP Charting Examples, 2007

  32. “Box-plots”: Quantiles and Extremes
  Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
  Summarizes the CDF computation and plot as quantiles and extremes.
  Goal: widely separated box-plots.
  Source: H. Herr, IVP Charting Examples, 2007

  33. Scalar Forecast Evaluation

  34. Standard Scalar Measures
  • Bias - compares the mean forecast to the mean observed value (zero when they match)
  • Correlation Coefficient - variance shared between forecast and observed (r²); says nothing about bias or about whether the forecast variance equals the observed variance. The Pearson correlation coefficient assumes normally distributed data and can be positive or negative; rank correlation does not require normality.
  • Root Mean Squared Error - distance between forecast and observed values; more informative than correlation, but performs poorly when errors are heteroscedastic, and it emphasizes performance for high flows. Alternative: Mean Absolute Error (MAE).
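  A minimal sketch of these measures (hypothetical data, not the IVP implementation), using only NumPy:

```python
# Minimal sketch (hypothetical data): standard scalar measures for paired forecast/observation series.
import numpy as np

def scalar_measures(fcst, obs):
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    ranks = lambda x: np.argsort(np.argsort(x))  # 0-based ranks (no tie handling)
    return {
        "bias": fcst.mean() - obs.mean(),                      # mean forecast minus mean observed
        "pearson_r": np.corrcoef(fcst, obs)[0, 1],             # assumes roughly normal data
        "rank_r": np.corrcoef(ranks(fcst), ranks(obs))[0, 1],  # Spearman-style, no normality assumption
        "rmse": np.sqrt(np.mean((fcst - obs) ** 2)),           # penalizes large errors heavily
        "mae": np.mean(np.abs(fcst - obs)),                    # alternative, less sensitive to outliers
    }

print(scalar_measures([900, 1100, 750], [950, 1000, 800]))
```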

  35. Standard Scalar Measures (with Scatterplots)
  • Stehekin R at Stehekin, WA - April 1 forecasts of Apr-Sept streamflow, 1943-99: Bias = 22, Corr = 0.92, RMSE = 74.4
  • Verde R blw Tangle Crk, AZ - January 1 forecasts of Jan-May streamflow, 1954-97: Bias = -87.5, Corr = 0.58, RMSE = 228.3
  (Scatterplot axes: forecast vs. observed, in 1000’s of ac-ft)

  36. IVP: Deterministic Scalar Measures
  • ME (mean error): smallest of the measures, because positive and negative errors cancel
  • MAE vs. RMSE: RMSE is more strongly influenced by large errors on large events
  • MAXERR: the largest error
  • Sample size: small samples carry large uncertainty
  Source: H. Herr, IVP Charting Examples, 2007

  37. IVP: RMSE – Skill Scores Skill compared to Persistence Forecast Source: H. Herr, IVP Charting Examples, 2007
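  As an illustration of the comparison on this slide (hypothetical data, not the IVP output): compute RMSE for the forecast and for a persistence baseline, then form the skill score. Because a perfect RMSE is zero, the score reduces to 1 - RMSE_forecast / RMSE_persistence.

```python
# Minimal sketch (hypothetical data): RMSE skill score relative to a persistence baseline.
import numpy as np

obs     = np.array([10.0, 12.0, 15.0, 14.0, 11.0])   # observed flows
fcst    = np.array([11.0, 12.5, 14.0, 13.5, 11.5])   # model forecasts
persist = np.array([ 9.0, 10.0, 12.0, 15.0, 14.0])   # previous observation carried forward

rmse = lambda f, o: np.sqrt(np.mean((f - o) ** 2))
skill = 1.0 - rmse(fcst, obs) / rmse(persist, obs)    # > 0 means the forecast beats persistence
print(round(skill, 2))  # ~0.66 for these numbers
```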
