Verification Introduction
Holly C. Hartmann, Department of Hydrology and Water Resources, University of Arizona, hollyoregon@juno.com
RFC Verification Workshop, 08/14/2007
Goals
• General concepts of verification
• Think about how to apply them to your operations
• Be able to respond to and influence the NWS verification program
• Be prepared as new tools become available
• Be able to do some of your own verification
• Be able to work with researchers on verification projects
• Contribute to the development of verification tools (e.g., evaluate various options)
• Avoid some typical mistakes
Agenda
1. Introduction to Verification
   - Applications, Rationale, Basic Concepts
   - Data Visualization and Exploration
   - Deterministic Scalar Measures
2. Categorical Measures – KEVIN WERNER
   - Deterministic Forecasts
   - Ensemble Forecasts
3. Diagnostic Verification
   - Reliability
   - Discrimination
   - Conditioning/Structuring Analyses
4. Lab Session/Group Exercise
   - Developing Verification Strategies
   - Connecting to Forecast Operations and Users
Why Do Verification? It depends…
• Administrative: logistics, selected quantitative criteria
• Operations: inputs, model states, outputs, quick!
• Research: sources of error, targeting research
• Users: making decisions, exploiting skill, avoiding mistakes
Concerns about verification?
Need for Verification Measures
Verification statistics identify:
• accuracy of forecasts
• sources of skill in forecasts
• sources of uncertainty in forecasts
• conditions where and when forecasts are skillful or not skillful, and why
Verification statistics can then inform improvements in forecast skill and in decision making relative to alternate forecast sources (e.g., climatology, persistence, new forecast systems).
Adapted from: Regonda, Demargne, and Seo, 2006
Skill versus Value
A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.
A forecast has value if it helps the user make better decisions than they would without the forecast.
• Forecasts with poor skill can still be valuable (e.g., an extreme event forecast in the wrong place)
• Forecasts with high skill can be of little value (e.g., a blue-sky forecast over the desert)
Assessing the quality of a forecast system means determining both the skill and the value of its forecasts.
Credit: Hagedorn (2006) and Julie Demargne
Stakeholder Use of HydroClimate Info & Forecasts
Common across all groups:
• Uninformed or mistaken about forecast interpretation
• Use of forecasts limited by lack of demonstrated forecast skill
• Have difficulty specifying required accuracy
Common across many, but not all, stakeholders:
• Have difficulty distinguishing between “good” and “bad” products
• Have difficulty placing forecasts in historical context
Unique among stakeholders:
• Relevant forecast variables, regions (location & scale), seasons, lead times, performance characteristics
• Technical sophistication: base probabilities, distributions, math
• Role of forecasts in decision making
What is a Perfect Forecast? Forecast evaluation concepts
All happy families are alike; each unhappy family is unhappy in its own way. -- Leo Tolstoy (1876)
All perfect forecasts are alike; each imperfect forecast is imperfect in its own way. -- Holly Hartmann (2002)
Different Forecasts, Information, Evaluation
“Today’s high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain.”
• Deterministic: 76°
• Categorical: Rain / No rain
• Probabilistic: 30% chance of rain
How would you evaluate each of these?
[Figure: standard hydrograph, an example of a deterministic forecast]
ESP Forecasts: User preferences influence verification From: California-Nevada River Forecast Center
So Many Evaluation Criteria!
Deterministic:
• Bias
• Correlation
• RMSE
• Standardized RMSE
• Nash-Sutcliffe
• Linear Error in Probability Space
Categorical:
• Hit Rate
• Surprise Rate
• Threat Score
• Gerrity Score
• Success Ratio
• Post-agreement
• Percent Correct
• Peirce Skill Score
• Gilbert Skill Score
• Heidke Skill Score
• Critical Success Index
• Percent N-class Errors
• Modified Heidke Skill Score
• Hanssen and Kuipers Score
• Gandin and Murphy Skill Scores…
Probabilistic:
• Brier Score
• Ranked Probability Score
• Distributions-oriented Measures
• Reliability
• Discrimination
• Sharpness
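To make two of the probabilistic measures above concrete, here is a minimal sketch of a Brier score and a ranked probability score computed with NumPy. The array values, function names, and the three-category example are illustrative assumptions, not part of the workshop materials or the RFC verification system.

```python
# Minimal sketch of two probabilistic verification measures (illustrative only).
import numpy as np

def brier_score(prob_fcst, obs):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    prob_fcst = np.asarray(prob_fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.mean((prob_fcst - obs) ** 2)

def ranked_probability_score(cat_probs, obs_category, n_categories):
    """RPS: squared differences between cumulative forecast and observed
    distributions over ordered categories, averaged across forecasts."""
    scores = []
    for probs, k in zip(cat_probs, obs_category):
        cum_fcst = np.cumsum(probs)
        cum_obs = np.cumsum(np.eye(n_categories)[k])  # one-hot observation
        scores.append(np.sum((cum_fcst - cum_obs) ** 2))
    return np.mean(scores)

# Hypothetical probability-of-precipitation forecasts vs. rain/no-rain outcomes
print(brier_score([0.3, 0.8, 0.1], [0, 1, 0]))
# Hypothetical below/near/above-normal streamflow forecasts (3 ordered categories)
print(ranked_probability_score([[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]], [1, 0], 3))
```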
RFC Verification System: Metrics Source: Verification Group, courtesy J. Demargne
Possible Performance Criteria
• Accuracy – overall correspondence between forecasts and observations
• Bias – difference between the average forecast and the average observation
• Consistency – forecasts don’t waffle around [Figure: example of good consistency]
• Sharpness/Refinement – ability to make bullish forecast statements [Figure: sharp vs. not-sharp forecast distributions]
What makes a forecast “good”?
Adapted from: Ebert (2003)
Forecasting Tradeoffs
Forecast performance is multi-faceted.
• False alarms: warning without an event (“False Alarm Ratio”)
• Surprises: event without a warning (“Probability of Detection”)
A forecaster’s fundamental challenge is balancing these two. Which is more important? It depends on the specific decision context…
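The two quantities named above come straight from a 2×2 contingency table of warnings against events; the sketch below shows that arithmetic with hypothetical counts and function names chosen here for illustration, not tied to any RFC product.

```python
# Minimal sketch: probability of detection (POD) and false alarm ratio (FAR)
# from a 2x2 warning/event contingency table. Counts are hypothetical.
def pod_far(hits, misses, false_alarms):
    # POD: fraction of observed events that were warned for (1 - surprise rate).
    pod = hits / (hits + misses) if (hits + misses) else float("nan")
    # FAR: fraction of issued warnings that had no event.
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else float("nan")
    return pod, far

# e.g., 42 warned events, 8 surprises, 15 warnings without an event
print(pod_far(hits=42, misses=8, false_alarms=15))  # -> (0.84, ~0.26)
```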
How Good? Compared to What?
Skill Score = (S_Forecast – S_Baseline) / (S_Perfect – S_Baseline)
For error measures whose perfect score is zero, this reduces to 1 – S_Forecast / S_Baseline.
Example skill score: (0.50 – 0.54) / (1.00 – 0.54) ≈ –8.7% ~ worse than guessing ~
What is the appropriate baseline?
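The slide’s arithmetic can be reproduced directly; this is a minimal sketch of the generic skill-score formula, with the example values taken from the slide and the function name chosen here for illustration.

```python
# Generic skill score: improvement over a baseline, scaled by the improvement
# a perfect forecast would give.
def skill_score(s_forecast, s_baseline, s_perfect):
    return (s_forecast - s_baseline) / (s_perfect - s_baseline)

# Example from the slide: forecast score 0.50, baseline 0.54, perfect 1.00
print(skill_score(0.50, 0.54, 1.00))  # -> about -0.087, worse than the baseline
```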
Graphical Forecast Evaluation
Basic Data Display
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
Scatter Plots
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
Histograms
Historical seasonal water supply outlooks, Colorado River Basin (Morrill, Hartmann, and Bales, 2007)
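A comparable basic display for a forecast archive (scatter plot of forecast against observed plus histograms of both) could be sketched with matplotlib as below. The file name and column names are assumptions for illustration, not the Colorado River dataset used in these slides.

```python
# Minimal sketch of a basic data display: scatter plot and histograms of
# forecast vs. observed seasonal volumes. File and column names are assumed.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("water_supply_outlooks.csv")  # assumed columns: forecast, observed

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Scatter plot with a 1:1 line for reference
ax1.scatter(df["observed"], df["forecast"], s=15)
lim = [0, max(df["observed"].max(), df["forecast"].max())]
ax1.plot(lim, lim, "k--", lw=1)
ax1.set_xlabel("Observed (1000s ac-ft)")
ax1.set_ylabel("Forecast (1000s ac-ft)")

# Histograms of observed and forecast volumes
ax2.hist(df["observed"], bins=20, alpha=0.5, label="Observed")
ax2.hist(df["forecast"], bins=20, alpha=0.5, label="Forecast")
ax2.set_xlabel("Seasonal volume (1000s ac-ft)")
ax2.legend()

plt.tight_layout()
plt.show()
```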
IVP Scatterplot Example Source: H. Herr
Cumulative Distribution Function (CDF): IVP
Empirical distribution of forecast probabilities for different observation categories
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
Goal: widely separated CDFs
Source: H. Herr, IVP Charting Examples, 2007
Probability Density Function (PDF): IVP
Empirical distribution using 10 bins in the IVP GUI
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
Goal: widely separated PDFs
Source: H. Herr, IVP Charting Examples, 2007
“Box-plots”: Quantiles and Extremes
Based on summarizing the CDF computation and plot
Cat 1 = No Observed Precipitation; Cat 2 = Observed Precipitation (>0.001”)
Goal: widely separated box-plots
Source: H. Herr, IVP Charting Examples, 2007
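One way to build conditional displays like the three above (forecast-probability distributions split by whether precipitation was observed) is sketched here with empirical CDFs; the arrays are made up and this is not the IVP implementation.

```python
# Minimal sketch: empirical CDFs of forecast probabilities, conditioned on the
# observation category (discrimination-style display). Illustrative data only.
import numpy as np
import matplotlib.pyplot as plt

prob_of_precip = np.array([0.1, 0.7, 0.3, 0.9, 0.05, 0.6, 0.2, 0.8])  # forecasts
observed_precip = np.array([0.0, 0.4, 0.0, 1.2, 0.0, 0.02, 0.0, 0.5])  # inches

wet = observed_precip > 0.001  # Cat 2 = observed precipitation
for label, subset in [("Cat 1: no observed precip", prob_of_precip[~wet]),
                      ("Cat 2: observed precip", prob_of_precip[wet])]:
    x = np.sort(subset)
    y = np.arange(1, len(x) + 1) / len(x)  # empirical cumulative frequencies
    plt.step(x, y, where="post", label=label)

plt.xlabel("Forecast probability of precipitation")
plt.ylabel("Cumulative frequency")
plt.legend()
plt.show()  # widely separated curves indicate good discrimination
```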
Scalar Forecast Evaluation
Standard Scalar Measures
• Bias: mean forecast vs. mean observed
• Correlation coefficient: variance shared between forecast and observed (r²); says nothing about bias or whether the forecast variance equals the observed variance. The Pearson correlation coefficient assumes a normal distribution and can be + or – (rank r: only +, non-normal ok).
• Root Mean Squared Error (RMSE): distance between forecast and observation values; better than correlation, but poor when errors are heteroscedastic; emphasizes performance for high flows. Alternative: Mean Absolute Error (MAE).
[Figure: forecast vs. observed scatter plot]
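These scalar measures take only a few lines to compute; here is a sketch with made-up forecast and observed arrays, using NumPy for the statistics and scipy’s rankdata for the rank correlation.

```python
# Minimal sketch of the standard scalar measures (hypothetical data).
import numpy as np
from scipy.stats import rankdata

forecast = np.array([120.0, 95.0, 140.0, 80.0, 110.0])
observed = np.array([115.0, 100.0, 150.0, 70.0, 105.0])

bias = forecast.mean() - observed.mean()                 # mean forecast - mean observed
pearson_r = np.corrcoef(forecast, observed)[0, 1]        # assumes roughly normal data
rank_r = np.corrcoef(rankdata(forecast), rankdata(observed))[0, 1]  # non-normal ok
rmse = np.sqrt(np.mean((forecast - observed) ** 2))      # emphasizes large errors
mae = np.mean(np.abs(forecast - observed))               # less sensitive to outliers

print(f"bias={bias:.1f} r={pearson_r:.2f} rank_r={rank_r:.2f} rmse={rmse:.1f} mae={mae:.1f}")
```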
Standard Scalar Measures (with Scatterplots)
1943–99 April 1 forecasts of Apr–Sept streamflow, Stehekin R at Stehekin, WA: Bias = 22, Corr = 0.92, RMSE = 74.4
1954–97 January 1 forecasts of Jan–May streamflow, Verde R below Tangle Crk, AZ: Bias = –87.5, Corr = 0.58, RMSE = 228.3
[Axes: Forecast vs. Observed, in 1000s ac-ft]
IVP: Deterministic Scalar Measures
• ME: smallest, because + and – errors cancel
• MAE vs. RMSE: RMSE is influenced by large errors on large events
• MAXERR: largest
• Sample size: small samples have large uncertainty
Source: H. Herr, IVP Charting Examples, 2007
IVP: RMSE – Skill Scores
Skill compared to a persistence forecast
Source: H. Herr, IVP Charting Examples, 2007
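An RMSE skill score against a persistence baseline, like the one IVP plots, could be sketched as follows; the series are hypothetical and this is not the IVP code.

```python
# Minimal sketch: RMSE skill score relative to persistence (last observation
# carried forward). Hypothetical data; not the IVP implementation.
import numpy as np

observed = np.array([10.0, 12.0, 15.0, 11.0, 9.0, 14.0])
forecast = np.array([11.0, 13.0, 14.0, 10.0, 10.0, 13.0])

def rmse(f, o):
    return np.sqrt(np.mean((f - o) ** 2))

# Persistence baseline: the forecast for time t is the observation at t-1,
# so the first time step has no baseline and is dropped from both series.
persistence = observed[:-1]
rmse_fcst = rmse(forecast[1:], observed[1:])
rmse_pers = rmse(persistence, observed[1:])

# For error measures the perfect score is 0, so SS = 1 - RMSE_fcst / RMSE_pers
skill = 1.0 - rmse_fcst / rmse_pers
print(skill)  # > 0 means the forecast beats persistence
```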