
Presentation Transcript


  1. RFC Verification Workshop An introduction to verifying probability forecasts James Brown James.D.Brown@noaa.gov

  2. Goals for today • Introduction to methods • What methods are available? • How do they reveal (or not) particular errors? • Lecture now, and hands-on training later • Introduction to prototype software • Ensemble Verification System (EVS) • Part of a larger experimental project (XEFS) • Lecture now, and hands-on training later

  3. Goals for today • 3. To establish user requirements • EVS in very early (prototype) stage • Pool of methods may expand or contract • Need some input on verification products • AND to address pre-workshop questions…

  4. Pre-workshop questions • How is ensemble verification done? • Same for short/long-term ensembles? • What tools, and are they operational? • Which metrics for which situations? • Simple metrics for end-users? • How best to manage the workload? • What data need to be archived, and how?

  5. Contents for next hour • 1. Background and status • 2. Overview of EVS • 3. Metrics available in EVS • 4. First look at the user interface (GUI)

  6. 1. Background and status

  7. A verification strategy? • A first look at operational needs • Two classes of verification identified • High time sensitivity (‘prognostic’) • e.g. how reliable is my live flood forecast?... • …where should I hedge my bets? • Less time sensitive (‘diagnostic’) • e.g. which forecasts do less well and why?

  8. Prognostic example [Figure: a live forecast (L) of temperature (°C) by forecast lead day, alongside matching historical forecasts (H) selected such that μH = μL ± 1.0 °C, with the corresponding historical observations.]
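
Not part of the original slides: a minimal Python sketch of the analog-selection idea behind this prognostic example, assuming forecasts are summarized by their ensemble means; the function name and example values are hypothetical, not EVS code.

import numpy as np

def matching_historical(live_mean, historical_means, tolerance=1.0):
    """Indices of historical forecasts whose ensemble mean lies within
    +/- tolerance (deg C) of the live forecast mean (muH = muL +/- 1.0)."""
    historical_means = np.asarray(historical_means, dtype=float)
    return np.where(np.abs(historical_means - live_mean) <= tolerance)[0]

# Hypothetical example: live forecast mean of 2.3 deg C
print(matching_historical(2.3, [1.1, 2.0, 3.5, 2.9, -0.4]))  # -> [1 3]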

  9. Diagnostic example [Figure: probability of warning correctly (hit) against probability of warning incorrectly (‘false alarms’), both from 0 to 1.0, e.g. for a flood warning issued when P >= 0.9, with a single-valued forecast and climatology shown for reference.]
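
As an illustration only (not the EVS implementation), a short Python sketch of the two quantities on this diagram's axes, assuming binary flood observations and a warning issued when the forecast probability reaches a threshold such as 0.9.

import numpy as np

def warning_rates(forecast_probs, observed_floods, threshold=0.9):
    """Fraction of observed floods that were warned (hits) and fraction
    of non-floods that were warned ('false alarms')."""
    warn = np.asarray(forecast_probs, dtype=float) >= threshold
    flood = np.asarray(observed_floods, dtype=bool)
    hit_rate = warn[flood].mean() if flood.any() else float("nan")
    false_alarm_rate = warn[~flood].mean() if (~flood).any() else float("nan")
    return float(hit_rate), float(false_alarm_rate)

# Hypothetical example: five forecasts, two observed floods
print(warning_rates([0.95, 0.2, 0.9, 0.6, 0.1], [1, 0, 0, 1, 0]))  # -> (0.5, 0.333...)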

  10. Motivation for EVS • Motivation for EVS (and XEFS) • Demand: forecasters and their customers • Demand for useable verification products • …limitations of existing software • History • Ensemble Verification Program (EVP) • Comprised (too) many parts, lacked flexibility • Prototype EVS begun in May 2007 for XEFS…

  11. Position in XEFS [Figure: diagram showing where EVS sits in the XEFS architecture. Components include the Ensemble Pre-Processor (EPP3), HMOS Ensemble Processor, Ensemble Streamflow Prediction subsystem (ESP2), Ensemble Post-Processor (EnsPost), Ensemble Product Generation subsystem (EPG), Hydrologic Ensemble Hindcaster, and the EVS Ensemble Verification Subsystem, together with the Ensemble Viewer and the EPP, ensemble, and IFP user interfaces. Data shown include atmospheric forcing data (precip., temp., etc.), streamflow, OFS flow data, hydro-meteorological ensembles, raw and post-processed flow ensembles, forecaster MODs, ensemble/probability products, and ensemble verification products.]

  12. 2. Overview of EVS

  13. Scope of EVS • Diagnostic verification • For diagnostic purposes (less time-sensitive) • Prognostic built into forecasting systems • Diagnostic questions include… • Are ensembles reliable? • Prob[flood]=0.9: does it occur 9/10 times? • Are forecaster MODs working well? • What are the major sources of uncertainty?

  14. Design goals of EVS • Verification of continuous time-series • Temperature, precipitation, streamflow, etc. • Multiple forecast points, but not spatial products • All types of forecast times • Any lead time (e.g. 1 day to 2 years or longer) • Any forecast resolution (e.g. hourly, daily) • Pair forecasts and observations (in different time zones) • Ability to aggregate across forecast points
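
To illustrate the time-zone point above, here is a minimal Python sketch (hypothetical data, not EVS code) that pairs a forecast and an observation stamped in different time zones by converting both valid times to UTC before matching.

from datetime import datetime, timezone, timedelta

# Hypothetical forecast stamped in US Eastern Standard Time (UTC-5)
# and observation stamped in UTC; both refer to the same instant.
eastern = timezone(timedelta(hours=-5))
forecasts = {datetime(2007, 12, 1, 12, tzinfo=eastern): 3.2}
observations = {datetime(2007, 12, 1, 17, tzinfo=timezone.utc): 2.9}

pairs = []
for valid_time, fcst in forecasts.items():
    key = valid_time.astimezone(timezone.utc)  # normalize to UTC before matching
    if key in observations:
        pairs.append((key, fcst, observations[key]))

print(pairs)  # one matched (valid time, forecast, observation) pair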

  15. Design goals of EVS • Flexibility to target data of interest • Subset based on forecasts and observations • Two conditions: 1) time; 2) variable value • e.g. forecasts where ensemble mean < 0 °C • e.g. max. observed flow in a 90-day window • Ability to pool/aggregate forecast points • Number of observations can be limiting • Sometimes appropriate to pool points
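
A minimal Python sketch of the two kinds of subsetting condition mentioned on this slide (a time window and a condition on the variable value); array names and shapes are assumptions for illustration, not EVS's own data model.

import numpy as np

def subset_pairs(times, ensembles, observations,
                 start=None, end=None, mean_below=None):
    """Keep pairs whose valid time lies in [start, end] and whose
    ensemble mean is below mean_below; both conditions are optional.
    times: datetime64 array; ensembles: (n_pairs, n_members) array."""
    times = np.asarray(times)
    ensembles = np.asarray(ensembles, dtype=float)
    observations = np.asarray(observations, dtype=float)
    keep = np.ones(len(times), dtype=bool)
    if start is not None:
        keep &= times >= start
    if end is not None:
        keep &= times <= end
    if mean_below is not None:
        keep &= ensembles.mean(axis=1) < mean_below
    return times[keep], ensembles[keep], observations[keep]

With mean_below=0.0 this reproduces the "ensemble mean < 0 °C" example from the slide.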

  16. Design goals of EVS • Carefully selected metrics • Different levels of detail on errors • Some are more complex than others, but… • Use cases and online docs to assist • To be ‘user-friendly’ • Many factors determine this… • GUI, I/O, exec. speed, batch modes

  17. Example of workflow: how biased are my winter flows above flood level at dam A?
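
For illustration only: one way such a question could be answered once the pairs have been subset to winter dates and flows above the (hypothetical) flood level at dam A, taking "bias" here to mean the average error of the ensemble means. This is a sketch, not the EVS workflow itself.

import numpy as np

def mean_error_of_ensemble_means(ensembles, observations):
    """Average of (ensemble mean - observation); positive values
    indicate over-forecasting on average."""
    ensembles = np.asarray(ensembles, dtype=float)
    observations = np.asarray(observations, dtype=float)
    return float((ensembles.mean(axis=1) - observations).mean())

# Hypothetical above-flood-level winter flows (arbitrary units)
print(mean_error_of_ensemble_means([[110.0, 130.0, 120.0],
                                    [150.0, 170.0, 160.0]],
                                   [115.0, 140.0]))  # -> 12.5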

  18. Archiving requirements • Coordinated across XEFS: • The forecasts • Streamflow: ESP binary files (.CS) • Temperature and precip: OHD datacard files • The observations • OHD datacard files • Unlikely to be a database in the near future

  19. 3. Metrics available

  20. Types of metrics • Many ways to test a probability forecast • Tests for single-valued property (e.g. mean) • Tests of broader forecast distribution • Both may involve reference forecasts (“skill”) • Caveats in testing probabilities • Observed probabilities require many events • Big assumption 1: we can ‘pool’ events • Big assumption 2: observations are ‘good’

  21. Problem of cont. forecasts • Discrete/categorical forecasts • Many metrics rely on discrete forecasts • e.g. will it rain? {yes/no} (rain > 0.01) • e.g. will it flood? {yes/no} (stage > flood level) • What about continuous forecasts? • An infinite number of events • Arbitrary event thresholds (i.e. ‘bins’)? • Typically, yes (and the choice will affect results)
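
A minimal Python sketch of the binning step described above: a continuous ensemble forecast is reduced to the probability of a discrete event by counting members beyond a chosen threshold. The threshold value and data are hypothetical.

import numpy as np

def event_probability(ensemble, threshold):
    """Forecast probability of the event 'value > threshold', taken as
    the fraction of ensemble members exceeding the threshold."""
    return float((np.asarray(ensemble, dtype=float) > threshold).mean())

# Hypothetical 5-member stage forecast with a flood level of 10.0 ft
print(event_probability([8.5, 10.2, 11.0, 9.7, 10.5], 10.0))  # -> 0.6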

  22. Metrics in EVS • Detail varies with verification question • e.g. inspection of ‘blown’ forecasts (detailed) • e.g. avg. reliability of flood forecast (< detail) • e.g. rapid screening of forecasts (<< detail) • All included to some degree in EVS…

  23. Most detailed (box plot) [Figure: box plots of ensemble forecast ‘errors’ for one forecast against time (days since start time), showing the greatest positive and negative errors and the 10th, 20th, 50th, 80th, and 90th percentiles relative to the observation.]
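
A minimal Python sketch of the quantities such a box plot summarizes, assuming ensembles of shape (lead times x members) paired with one observation per lead time; this is illustrative only, not the EVS plotting code.

import numpy as np

def error_percentiles(ensembles, observations, levels=(10, 20, 50, 80, 90)):
    """Per lead time: the largest negative and positive member errors
    and selected percentiles of (member - observation)."""
    errors = (np.asarray(ensembles, dtype=float)
              - np.asarray(observations, dtype=float)[:, None])
    return {
        "greatest_negative": errors.min(axis=1),
        "greatest_positive": errors.max(axis=1),
        "percentiles": np.percentile(errors, levels, axis=1),  # (levels, lead times)
    }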

  24. Most detailed (box plot) [Figure: the same box plots of ensemble forecast ‘errors’ for one forecast, here plotted against observed value (increasing size), again showing the greatest positive and negative errors and the 10th to 90th percentiles relative to the observation.]

  25. Less detail (Reliability) “On occasions when flooding is forecast with probability 0.5, it should occur 50% of the time.” [Figure: reliability diagram of observed probability given the forecast against forecast probability (probability of flooding), from 0 to 1.0; departures from the diagonal indicate “forecast bias”.]
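
A minimal Python sketch of how a reliability curve of this kind can be computed: bin the forecast probabilities and, within each bin, compare the average forecast probability with the observed relative frequency of the event. The bin count and names are assumptions, not the EVS method.

import numpy as np

def reliability_curve(forecast_probs, observed_events, n_bins=10):
    """Mean forecast probability and observed event frequency per bin;
    a reliable forecast lies close to the diagonal."""
    p = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(observed_events, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    mean_fcst, obs_freq = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (p >= lo) & (p <= hi) if hi == 1.0 else (p >= lo) & (p < hi)
        if in_bin.any():
            mean_fcst.append(p[in_bin].mean())
            obs_freq.append(o[in_bin].mean())
    return np.array(mean_fcst), np.array(obs_freq)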

  26. Less detail (C. Talagrand) “If river stage <= X is forecast with probability 0.5, it should be observed 50% of the time.” [Figure: cumulative Talagrand diagram of cumulative probability against the position of the observation in the forecast distribution (0–100); departures from the diagonal indicate “forecast bias”.]
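
A minimal Python sketch of the horizontal axis of this diagram: the position (rank) of each observation within its sorted ensemble, which can then be accumulated into the cumulative Talagrand curve. Data are hypothetical.

import numpy as np

def observation_ranks(ensembles, observations):
    """Rank of each observation in its ensemble: 0 means below every
    member, n_members means above every member."""
    ens = np.asarray(ensembles, dtype=float)
    obs = np.asarray(observations, dtype=float)
    return (ens < obs[:, None]).sum(axis=1)

# Hypothetical 4-member forecasts for three dates
print(observation_ranks([[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 2, 3]],
                        [3.5, 1.0, 9.0]))  # -> [3 0 4]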

  27. Least detailed (a score) Brier score = 1/5 × {(0.8 − 1.0)² + (0.1 − 1.0)² + (0.0 − 0.0)² + (0.95 − 1.0)² + (1.0 − 1.0)²} [Figure: river stage against time (days), with the observation, the forecast, and the flood stage marked at the five forecast times (1–5) used in the calculation.]
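
A short Python sketch that reproduces the Brier score calculation on this slide; the five forecast probabilities and flood/no-flood outcomes are read off the slide's equation, while the function name is a hypothetical choice.

import numpy as np

def brier_score(forecast_probs, observed_events):
    """Mean squared difference between forecast probability and outcome."""
    p = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(observed_events, dtype=float)
    return float(((p - o) ** 2).mean())

# The slide's example: five flood forecasts, four observed floods
print(brier_score([0.8, 0.1, 0.0, 0.95, 1.0],
                  [1.0, 1.0, 0.0, 1.0, 1.0]))  # -> 0.1705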

  28. Least detailed (a score) [Figure: cumulative probability (0–1.0) of a single precipitation forecast against precipitation amount (0–30), with the observation shown; the areas A and B between the forecast distribution and the observation give CRPS = A² + B².] Then average across multiple forecasts: small scores are better.
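
A minimal Python sketch of the CRPS in its standard integral form (the integral of the squared difference between the forecast cumulative distribution and the step function at the observation), the idea the slide's areas A and B illustrate; the ensemble, grid, and names here are hypothetical, not the EVS implementation.

import numpy as np

def crps_from_ensemble(ensemble, observation, n_points=2000):
    """Approximate CRPS: integral over x of (F(x) - H(x - obs))^2, with
    F the empirical ensemble CDF and H the unit step; smaller is better."""
    ens = np.sort(np.asarray(ensemble, dtype=float))
    lo = min(ens[0], observation) - 1.0
    hi = max(ens[-1], observation) + 1.0
    x = np.linspace(lo, hi, n_points)
    cdf = np.searchsorted(ens, x, side="right") / ens.size
    step = (x >= observation).astype(float)
    return float(np.sum((cdf - step) ** 2) * (x[1] - x[0]))

# Hypothetical 5-member precipitation ensemble (mm) and observation
print(round(crps_from_ensemble([0.0, 2.0, 5.0, 9.0, 12.0], 4.0), 2))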

  29. 4. First look at the GUI

  30. Rest of today • Two-hour lab sessions with EVS • Start with synthetic data (with simple errors) • Then move on to a couple of real cases • Verification plans and feedback • Real-time (‘prognostic’) verification • Screening verification outputs • Developments in EVS • Feedback: discussion and survey
