
Presentation Transcript


1. Verification of Probability Forecasts at Points
WMO QPF Verification Workshop, Prague, Czech Republic, 14-16 May 2001
Barbara G. Brown, NCAR, Boulder, Colorado, U.S.A.
bgb@ucar.edu

2. Why probability forecasts? "…the widespread practice of ignoring uncertainty when formulating and communicating forecasts represents an extreme form of inconsistency and generally results in the largest possible reductions in quality and value." --Murphy (1993)

3. Outline
• Background and basics
  • Types of events
  • Types of forecasts
  • Representation of probabilistic forecasts in the verification framework

4. Outline continued
• Verification approaches: focus on 2-category case
  • Measures
  • Graphical representations
  • Using statistical models
  • Signal detection theory
• Ensemble forecast verification
• Extensions to multi-category verification problem
• Comparing probabilistic and categorical forecasts
• Connections to value
• Summary, conclusions, issues

5. Background and basics
• Types of events:
  • Two-category
  • Multi-category
• Two-category events:
  • Either Event A happens or Event B happens
  • Examples: Rain/No-rain, Hail/No-hail, Tornado/No-tornado
• Multi-category events:
  • Event A, B, C, …, or Z happens
  • Example: Precipitation categories (< 1 mm, 1-5 mm, 5-10 mm, etc.)

6. Background and basics cont.
• Types of forecasts
  • Completely confident
    • Forecast probability is either 0 or 1
    • Example: Rain/No rain
  • Probabilistic
    • Objective (deterministic, statistical, ensemble-based)
    • Subjective
    • Probability is stated explicitly

7. Background and basics cont.
• Representation of probabilistic forecasts in the verification framework
  • x = 0 or 1
  • f = 0, …, 1.0 (f may be limited to only certain values between 0 and 1)
  • Joint distribution: p(f,x), where x = 0, 1
  • Example: If there are 12 possible values of f, then p(f,x) consists of 24 elements

8. Background and basics, cont.
• Factorizations: conditional and marginal probabilities
• Calibration-Refinement factorization:
  • p(f,x) = p(x|f) p(f)
  • p(x=0|f) = 1 – p(x=1|f) = 1 – E(x|f), so only one number is needed to specify the distribution p(x|f) for each f
  • p(f) is the frequency of use of each forecast probability
• Likelihood-Base Rate factorization:
  • p(f,x) = p(f|x) p(x)
  • p(x) is the relative frequency of a Yes observation (e.g., the sample climatology of precipitation); p(x) = E(x)
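As an illustrative aside (not part of the original slides), here is a minimal Python sketch of how the elements of both factorizations can be estimated from a sample of forecast-observation pairs; the arrays probs and obs and their values are hypothetical.

```python
import numpy as np

# Hypothetical sample: forecast probabilities f and binary observations x.
probs = np.array([0.1, 0.1, 0.3, 0.3, 0.5, 0.7, 0.9, 0.9])
obs   = np.array([0,   0,   0,   1,   1,   0,   1,   1  ])

# Calibration-Refinement factorization: p(f,x) = p(x|f) p(f)
for f in np.unique(probs):
    mask = probs == f
    p_f = mask.mean()                 # p(f): frequency of use of this probability value
    p_x1_given_f = obs[mask].mean()   # p(x=1|f) = E(x|f); p(x=0|f) = 1 - p(x=1|f)
    print(f"f={f:.1f}  p(f)={p_f:.3f}  p(x=1|f)={p_x1_given_f:.3f}")

# Likelihood-Base Rate factorization: p(f,x) = p(f|x) p(x)
p_x1 = obs.mean()                     # p(x): sample climatology, = E(x)
for x in (0, 1):
    p_f_given_x = np.array([(probs[obs == x] == f).mean() for f in np.unique(probs)])
    print(f"x={x}  p(f|x) over the sorted f values: {np.round(p_f_given_x, 3)}")
print(f"p(x=1) = {p_x1:.3f}")
```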

9. Attributes [from Murphy and Winkler (1992)] (figure; sharpness)

10. Verification approaches: 2x2 case
Completely confident forecasts: use the counts in this table to compute various common statistics (e.g., POD, POFD, H-K, FAR, CSI, Bias, etc.)

              Observed Yes   Observed No
Forecast Yes      YY             YN
Forecast No       NY             NN

11. Verification measures for 2x2 (Yes/No) completely confident forecasts
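The table of measures on the original slide is not reproduced in the transcript. As an illustration, the following Python sketch computes several of the listed measures from the 2x2 counts using their standard definitions; the function name and the count values are hypothetical.

```python
def two_by_two_measures(YY, YN, NY, NN):
    """Common scores for completely confident (Yes/No) forecasts.

    YY = forecast Yes, observed Yes; YN = forecast Yes, observed No;
    NY = forecast No, observed Yes;  NN = forecast No, observed No.
    """
    pod  = YY / (YY + NY)            # probability of detection (hit rate)
    pofd = YN / (YN + NN)            # probability of false detection
    far  = YN / (YY + YN)            # false alarm ratio
    csi  = YY / (YY + YN + NY)       # critical success index (threat score)
    bias = (YY + YN) / (YY + NY)     # frequency bias
    hk   = pod - pofd                # Hanssen-Kuipers (H-K) discriminant
    return dict(POD=pod, POFD=pofd, FAR=far, CSI=csi, Bias=bias, HK=hk)

# Hypothetical counts, for illustration only.
print(two_by_two_measures(YY=28, YN=72, NY=23, NN=2680))
```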

12. Relationships among measures in the 2x2 case
Many of the measures in the 2x2 case are strongly related in surprisingly complex ways. For example:
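The equation image on the original slide is not reproduced in the transcript; one standard identity of this kind, linking CSI to POD and FAR, is:

```latex
% A standard identity relating CSI, POD, and FAR in the 2x2 case
% (substituted here for the equation image on the original slide):
\[
  \frac{1}{\mathrm{CSI}} \;=\; \frac{1}{\mathrm{POD}} \;+\; \frac{1}{1-\mathrm{FAR}} \;-\; 1 .
\]
```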

13. (Figure: curves labeled 0.10, 0.30, 0.50, 0.70, 0.90; the lines indicate different values of POD and POFD, where POD = POFD. From Brown and Young 2000.)

14. CSI as a function of p(x=1) and POD = POFD (figure; curves labeled 0.9, 0.7, 0.5, 0.3, 0.1)

15. CSI as a function of FAR and POD

16. Measures for Probabilistic Forecasts
Summary measures:
• Expectation
  • Conditional: E(f|x=0), E(f|x=1), E(x|f)
  • Marginal: E(f), E(x) = p(x=1)
• Correlation (joint distribution)
• Variability
  • Conditional: Var(f|x=0), Var(f|x=1), Var(x|f)
  • Marginal: Var(f), Var(x) = E(x)[1 - E(x)]

17. From Murphy and Winkler (1992): summary measures for joint and marginal distributions

18. From Murphy and Winkler (1992): summary measures for conditional distributions

19. Performance measures
• Brier score:
  • Analogous to MSE; negative orientation
  • For perfect forecasts: BS = 0
• Brier skill score:
  • Analogous to the MSE skill score
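The formula images on the original slide are not reproduced in the transcript; the standard forms of the two scores are:

```latex
% Standard forms of the Brier score and Brier skill score,
% substituted here for the equation images on the original slide:
\[
  \mathrm{BS} = \frac{1}{N}\sum_{k=1}^{N} (f_k - x_k)^2 ,
  \qquad
  \mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}} ,
\]
% where x_k is the binary observation, f_k the forecast probability, and
% BS_ref the Brier score of a reference forecast such as sample climatology.
```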

20. From Murphy and Winkler (1992) (figure)

21. Brier score displays. From Shirey and Erickson, http://www.nws.noaa.gov/tdl/synop/amspapers/masmrfpap.htm

22. Brier score displays. From http://www.nws.noaa.gov/tdl/synop/mrfpop/mainframes.htm

23. Decomposition of the Brier score
Break the Brier score into more elemental components: reliability, resolution, and uncertainty, where I is the number of distinct probability values used. The Brier skill score can then be re-formulated in terms of these components (see below).
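The decomposition formulas on the original slide are not reproduced in the transcript; the standard decomposition of the Brier score, written with n_i forecasts taking the value f_i, subsample relative frequency x̄_i, and sample climatology x̄, is:

```latex
% Standard decomposition of the Brier score into reliability, resolution,
% and uncertainty (substituted for the equation images on the original slide):
\[
  \mathrm{BS}
  = \underbrace{\frac{1}{N}\sum_{i=1}^{I} n_i\,(f_i - \bar{x}_i)^2}_{\text{reliability}}
  \;-\; \underbrace{\frac{1}{N}\sum_{i=1}^{I} n_i\,(\bar{x}_i - \bar{x})^2}_{\text{resolution}}
  \;+\; \underbrace{\bar{x}\,(1 - \bar{x})}_{\text{uncertainty}} .
\]
% With sample climatology as the reference forecast, the Brier skill score becomes
\[
  \mathrm{BSS} = \frac{\text{resolution} - \text{reliability}}{\text{uncertainty}} .
\]
```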

24. Graphical representations of measures
• Reliability diagram: p(x=1|fi) vs. fi
• Sharpness diagram: p(f)
• Attributes diagram: reliability, resolution, skill/no-skill
• Discrimination diagram: p(f|x=0) and p(f|x=1)
Together, these diagrams provide a relatively complete picture of the quality of a set of probability forecasts.
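As an illustration (not part of the original presentation), a short Python/matplotlib sketch that builds reliability and sharpness diagrams from a synthetic set of forecasts; the data, seed, and variable names are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical forecast probabilities and binary outcomes, for illustration only.
rng = np.random.default_rng(0)
probs = rng.choice(np.arange(0.0, 1.05, 0.1), size=500)   # 11 allowed values of f
obs = (rng.random(500) < probs).astype(int)               # synthetic, well-calibrated outcomes

f_values = np.unique(probs)
rel = [obs[probs == f].mean() for f in f_values]   # p(x=1|f): reliability curve
use = [np.mean(probs == f) for f in f_values]      # p(f): sharpness

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.plot(f_values, rel, "o-")
ax1.plot([0, 1], [0, 1], "k--", label="perfect reliability")
ax1.set(xlabel="forecast probability f", ylabel="observed frequency p(x=1|f)",
        title="Reliability diagram")
ax1.legend()
ax2.bar(f_values, use, width=0.08)
ax2.set(xlabel="forecast probability f", ylabel="relative frequency p(f)",
        title="Sharpness diagram")
plt.tight_layout()
plt.show()
```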

25. Reliability and sharpness (from Wilks 1995). (Figure panels: climatology; minimal RES; underforecasting; good RES, at expense of REL; reliable forecasts of rare event; small sample size.)

26. Reliability and sharpness (from Murphy and Winkler 1992). (Figure labels: Sub, Model, St. Louis 12-24 h PoP, Cool Season, No skill, No RES.)

27. Attributes diagram (from Wilks 1995)

28. Icing forecast examples

29. Use of statistical models to describe verification features
• Exploratory study by Murphy and Wilks (1998)
• Case study
• Use regression model to model reliability
• Use Beta distribution to model p(f) as measure of sharpness
• Use multivariate diagram to display combinations of characteristics
• Promising approach that is worthy of more investigation

30. Fit a Beta distribution to p(f). Two parameters: p, q. Ideal: p < 1, q < 1.

31. Fit a regression to the reliability diagram [p(x|f) vs. f]. Two parameters: b0, b1. Murphy and Wilks (1997)
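A simplified Python sketch (not from the original presentation) of the two fits just described, using scipy; the synthetic data, seed, and variable names are assumptions, and the reliability regression here is fit to the raw pairs rather than to binned, weighted points as in the Murphy and Wilks study.

```python
import numpy as np
from scipy import stats

# Hypothetical forecast probabilities and outcomes, for illustration only.
rng = np.random.default_rng(1)
probs = rng.beta(0.7, 0.9, size=1000)                 # synthetic PoP forecasts
obs = (rng.random(1000) < 0.9 * probs).astype(int)    # synthetic, slightly overforecast outcomes

# Sharpness: fit a Beta distribution to p(f); p < 1 and q < 1 indicate a sharp,
# U-shaped distribution of forecast probabilities.
p_hat, q_hat, _, _ = stats.beta.fit(np.clip(probs, 1e-6, 1 - 1e-6), floc=0, fscale=1)
print(f"Beta fit to p(f): p={p_hat:.2f}, q={q_hat:.2f}")

# Reliability: regression p(x=1|f) ~ b0 + b1*f; b0 = 0 and b1 = 1 would be perfectly reliable.
b1, b0, _, _, _ = stats.linregress(probs, obs)
print(f"Reliability regression: b0={b0:.2f}, b1={b1:.2f}")
```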

32. Summary plot (Murphy and Wilks 1997)

33. Signal Detection Theory (SDT)
• Approach that has commonly been applied in medicine and other fields
• Brought to meteorology by Ian Mason (1982)
• Evaluates the ability of forecasts to discriminate between occurrence and non-occurrence of an event
• Summarizes characteristics of the Likelihood-Base Rate decomposition of the framework
• Tests model performance relative to a specific threshold
• Ignores calibration
• Allows comparison of categorical and probabilistic forecasts

34. Mechanics of SDT
• Based on the likelihood-base rate decomposition p(f,x) = p(f|x) p(x)
• Basic elements:
  • Hit rate (HR)
    • HR = POD = YY / (YY + NY)
    • Estimate of p(f=1|x=1)
  • False alarm rate (FA)
    • FA = POFD = YN / (YN + NN)
    • Estimate of p(f=1|x=0)
  • Relative Operating Characteristic (ROC) curve
    • Plot HR vs. FA
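As an illustration (not part of the original slides), a small Python sketch that builds an ROC curve by converting the probability forecast to Yes/No at a series of decision thresholds and estimating HR and FA at each; the data, seed, and threshold values are hypothetical.

```python
import numpy as np

def roc_points(probs, obs, thresholds):
    """Hit rate and false alarm rate for each probability decision threshold;
    a forecast is treated as 'Yes' when f >= threshold."""
    probs, obs = np.asarray(probs), np.asarray(obs)
    points = []
    for t in thresholds:
        yes = probs >= t
        hr = np.mean(yes[obs == 1])      # HR: estimate of p(f=1|x=1)
        fa = np.mean(yes[obs == 0])      # FA: estimate of p(f=1|x=0)
        points.append((fa, hr))
    return sorted(points)

# Hypothetical forecasts and observations, for illustration only.
rng = np.random.default_rng(2)
probs = rng.random(2000)
obs = (rng.random(2000) < probs**1.5).astype(int)   # synthetic outcomes with some skill

pts = roc_points(probs, obs, thresholds=np.arange(0.05, 1.0, 0.05))
fa, hr = zip(*pts)
xs = np.array([0.0, *fa, 1.0])
ys = np.array([0.0, *hr, 1.0])
area = np.sum(np.diff(xs) * (ys[1:] + ys[:-1]) / 2)   # straight-line-segment (trapezoidal) area
print(f"Approximate ROC area: {area:.3f}")             # may underestimate a fitted-curve area
```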

35. ROC examples: Mason (1982)

36. ROC examples: icing forecasts

37. ROC
• Area under the ROC is a measure of forecast skill
• Values less than 0.5 indicate negative skill
• Estimation of the ROC area is often better if a normal distribution model is used to model HR and FA
• Area can be underestimated if the curve is approximated by straight line segments
• Harvey et al. (1992), Mason (1982), Wilson (2000)

38. Idealized ROC (Mason 1982). (Figure: conditional distributions f(x=0) and f(x=1) for S = 2, S = 1, and S = 0.5, where S = s0 / s1.)

39. Comparison of approaches
Brier score:
• Based on squared error
• Strictly proper scoring rule
• Calibration is an important factor; lack of calibration impacts scores
• Decompositions provide insight into several performance attributes
• Dependent on frequency of occurrence of the event
ROC:
• Considers forecasts' ability to discriminate between Yes and No events
• Calibration is not a factor
• Less dependent on frequency of occurrence of the event
• Provides verification information for individual decision thresholds

40. Relative operating levels (ROL)
• Analogous to the ROC, but from the Calibration-Refinement perspective (i.e., given the forecast)
• Curves based on
  • Correct alarm ratio: YY / (YY + YN)
  • Miss ratio: NY / (NY + NN)
• These statistics are estimates of two conditional probabilities:
  • Correct alarm ratio: p(x=1|f=1)
  • Miss ratio: p(x=1|f=0)
• For a system with no skill, p(x=1|f=1) = p(x=1|f=0) = p(x)
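A corresponding Python sketch (an illustration, not from the slides) for the ROL statistics, computed as in the ROC example but conditioning on the forecast rather than the observation; the data and thresholds are hypothetical.

```python
import numpy as np

def rol_points(probs, obs, thresholds):
    """Correct alarm ratio p(x=1|f=1) and miss ratio p(x=1|f=0) when the
    probability forecast f is converted to Yes/No at each threshold."""
    probs, obs = np.asarray(probs), np.asarray(obs, dtype=float)
    out = []
    for t in thresholds:
        yes = probs >= t
        out.append((t, obs[yes].mean(), obs[~yes].mean()))  # (threshold, CAR, miss ratio)
    return out

# Hypothetical data; for a no-skill system both ratios would equal p(x=1).
rng = np.random.default_rng(4)
probs = rng.random(2000)
obs = (rng.random(2000) < probs).astype(int)
for t, car, mr in rol_points(probs, obs, thresholds=[0.2, 0.4, 0.6, 0.8]):
    print(f"threshold={t:.1f}  correct alarm ratio={car:.2f}  miss ratio={mr:.2f}")
```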

41. ROC diagram (Mason and Graham 1999)

42. ROL diagram (Mason and Graham 1999)

43. Verification of ensemble forecasts
• Output of ensemble forecasting systems can be treated as
  • A probability distribution
  • A probability
  • A categorical forecast
• Probabilistic forecasts from ensemble systems can be verified using standard approaches for probabilistic forecasts
• Common methods
  • Brier score
  • ROC
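As an illustration (not part of the original slides), a minimal Python sketch of the second interpretation: turning an ensemble into an event probability as the fraction of members exceeding a threshold. The member values and threshold are hypothetical.

```python
import numpy as np

# Hypothetical 24-h precipitation amounts (mm) from a 10-member ensemble at one point.
members = np.array([0.0, 0.2, 1.4, 2.5, 3.1, 0.0, 5.6, 0.8, 1.1, 4.0])

# Treat the ensemble as a probability forecast of exceeding a threshold:
# the fraction of members above that threshold.
threshold_mm = 1.0
prob_exceed = np.mean(members > threshold_mm)
print(f"P(precip > {threshold_mm} mm) = {prob_exceed:.2f}")
# This probability can then be verified with the standard tools above
# (Brier score, reliability diagram, ROC).
```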

44. Example: Palmer et al. (2000), reliability. (Figure: ECMWF ensemble and multi-model ensemble; panels labeled <0 and <1.)

45. Example: Palmer et al. (2000), ROC. (Figure: ECMWF ensemble and multi-model ensemble.)

46. Verification of ensemble forecasts (cont.)
A number of methods have been developed specifically for use with ensemble forecasts. For example:
• Rank histograms
  • Rank position of observations relative to ensemble members
  • Ideal: uniform distribution
  • Non-uniform histograms can occur for many reasons (Hamill 2001)
• Ensemble distribution approach (Wilson et al. 1999)
  • Fit a distribution to the ensemble
  • Determine the probability associated with the observation
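An illustrative Python sketch (not from the original presentation) of the rank histogram computation; the ensemble size, synthetic data, and function name are assumptions.

```python
import numpy as np

def rank_histogram(ensemble, observations):
    """Counts of the rank of each observation within its ensemble.

    ensemble: array of shape (n_cases, n_members); observations: shape (n_cases,).
    A flat (uniform) histogram over the n_members + 1 possible ranks is the ideal."""
    ensemble = np.asarray(ensemble)
    observations = np.asarray(observations)
    # 0-based rank of the observation among the pooled values for each case.
    ranks = np.sum(ensemble < observations[:, None], axis=1)
    return np.bincount(ranks, minlength=ensemble.shape[1] + 1)

# Hypothetical example: 1000 cases, 10 members, observations drawn from the same
# distribution as the members, so the histogram should be roughly flat.
rng = np.random.default_rng(3)
ens = rng.normal(size=(1000, 10))
obs = rng.normal(size=1000)
print(rank_histogram(ens, obs))
```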

47. Rank histograms (figure)

48. Distribution approach (Wilson et al. 1999)

49. Extensions to multiple categories
• Examples:
  • QPF with several thresholds/categories
• Approach 1: Evaluate each category on its own
  • Compute Brier score, reliability, ROC, etc. for each category separately
• Problems:
  • Some categories will be very rare and have few Yes observations
  • Throws away important information related to the ordering of predictands and the magnitude of error
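As an illustration of Approach 1 (not part of the original slides), a short Python sketch that computes the Brier score separately for each exceedance category; the thresholds, forecasts, observations, and function name are hypothetical.

```python
import numpy as np

def brier_per_category(prob_forecasts, amounts, thresholds_mm):
    """Approach 1 for multi-category QPF: verify each exceedance category separately.

    prob_forecasts: dict mapping threshold -> array of forecast probabilities that
    precipitation reaches that threshold; amounts: observed precipitation amounts (mm)."""
    amounts = np.asarray(amounts, dtype=float)
    scores = {}
    for t in thresholds_mm:
        x = (amounts >= t).astype(float)            # binary event for this category
        f = np.asarray(prob_forecasts[t], dtype=float)
        scores[t] = np.mean((f - x) ** 2)           # Brier score for this category
    return scores

# Hypothetical forecasts and observations for three thresholds (mm), illustration only.
obs_amounts = [0.0, 0.4, 2.0, 7.5, 0.0, 12.0]
forecasts = {1.0:  [0.2, 0.5, 0.8, 0.9, 0.1, 0.95],
             5.0:  [0.05, 0.1, 0.3, 0.7, 0.0, 0.8],
             10.0: [0.0, 0.0, 0.1, 0.3, 0.0, 0.6]}
print(brier_per_category(forecasts, obs_amounts, thresholds_mm=[1.0, 5.0, 10.0]))
```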

50. Example: Brier skill score for several categories. From http://www.nws.noaa.gov/tdl/synop/mrfpop/mainframes.htm
