RFC Verification Workshop An introduction to verifying probability forecasts James Brown James.D.Brown@noaa.gov
Goals for today • 1. Introduction to methods • What methods are available? • How do they reveal (or not) particular errors? • Lecture now, and hands-on training later • 2. Introduction to prototype software • Ensemble Verification System (EVS) • Part of a larger experimental project (XEFS) • Lecture now, and hands-on training later
Goals for today • 3. To establish user requirements • EVS is in a very early (prototype) stage • Pool of methods may expand or contract • Need some input on verification products • AND to address the pre-workshop questions…
Pre-workshop questions • How is ensemble verification done? • Is it the same for short- and long-term ensembles? • What tools exist, and are they operational? • Which metrics for which situations? • Simple metrics for end users? • How best to manage the workload? • What data need to be archived, and how?
Contents for the next hour • 1. Background and status • 2. Overview of EVS • 3. Metrics available in EVS • 4. First look at the user interface (GUI)
A verification strategy? • A first look at operational needs • Two classes of verification identified • High time sensitivity (‘prognostic’) • e.g. how reliable is my live flood forecast?... • …where should I hedge my bets? • Less time sensitive (‘diagnostic’) • e.g. which forecasts do less well and why?
Prognostic example (figure): a live temperature forecast (L) plotted against forecast lead day, alongside matching historical forecasts (H) whose historical observations satisfy μH = μL ± 1.0 °C; the vertical axis is temperature (°C).
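One plausible reading of this matching step, as a minimal Python sketch (the names select_analogs and hist_obs_means are illustrative, not EVS code): keep the historical forecasts whose matching observed mean lies within ±1.0 °C of the live forecast's ensemble mean.

```python
import numpy as np

def select_analogs(live_forecast, hist_forecasts, hist_obs_means, tol=1.0):
    """Pick historical forecasts whose matching observed mean is within
    `tol` degC of the live forecast's ensemble mean (illustrative only)."""
    mu_live = np.mean(np.asarray(live_forecast, dtype=float))
    keep = np.abs(np.asarray(hist_obs_means, dtype=float) - mu_live) <= tol
    return [f for f, k in zip(hist_forecasts, keep) if k]
```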
Diagnostic example (figure): probability of warning correctly (hits) plotted against probability of warning incorrectly (‘false alarms’), both on axes from 0 to 1.0, e.g. a flood warning issued when P >= 0.9; curves are shown for the ensemble forecast, a single-valued forecast, and climatology.
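A minimal sketch (not EVS code) of the hit/false-alarm trade-off in the figure: issue a warning whenever the forecast probability reaches a threshold such as 0.9, then count how often warnings coincide with observed events and with non-events.

```python
import numpy as np

def hit_false_alarm_rates(event_probs, event_occurred, warn_threshold=0.9):
    """Fraction of observed events that were warned for (hits) and fraction
    of non-events that were warned for (false alarms)."""
    probs = np.asarray(event_probs, dtype=float)
    occurred = np.asarray(event_occurred, dtype=bool)
    warned = probs >= warn_threshold
    hit_rate = warned[occurred].mean() if occurred.any() else float("nan")
    false_alarm_rate = warned[~occurred].mean() if (~occurred).any() else float("nan")
    return hit_rate, false_alarm_rate
```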
Motivation for EVS • Motivation for EVS (and XEFS) • Demand from forecasters and their customers • Demand for usable verification products • …and limitations of existing software • History • Ensemble Verification Program (EVP) • Comprised (too) many parts and lacked flexibility • Prototype EVS begun in May 2007 for XEFS…
Position in XEFS (diagram): EVS is the Ensemble Verification Subsystem of XEFS, shown alongside the other components: the Ensemble Pre-Processor (EPP3), the Ensemble Streamflow Prediction Subsystem (ESP2), the Ensemble Post-Processor (EnsPost), the Ensemble Product Generation Subsystem (EPG), the HMOS Ensemble Processor, the Hydrologic Ensemble Hindcaster, the Ensemble Viewer, IFP, and the EPP/ensemble user interfaces. Inputs include atmospheric forcing data (precip., temp., etc.), streamflow, OFS flow data, and MODs; products include hydro-meteorological ensembles, raw and post-processed flow ensembles, ensemble/probability products, and ensemble verification products.
Scope of EVS • Diagnostic verification • For diagnostic purposes (less time-sensitive) • Prognostic verification is built into the forecasting systems • Diagnostic questions include… • Are ensembles reliable? • Prob[flood] = 0.9: does flooding occur 9/10 times? • Are forecaster MODs working well? • What are the major sources of uncertainty?
Design goals of EVS • Verification of continuous time series • Temperature, precipitation, streamflow, etc. • Multiple forecast points, but not spatial products • All types of forecast times • Any lead time (e.g. 1 day to 2 years or longer) • Any forecast resolution (e.g. hourly, daily) • Pair forecasts with observations (in different time zones; see the sketch below) • Ability to aggregate across forecast points
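A minimal sketch of pairing forecasts with observations recorded in different time zones, assuming both can be keyed by a UTC valid time (the helper names are illustrative, not the EVS implementation):

```python
from datetime import timedelta, timezone

def to_utc(local_time, utc_offset_hours):
    """Convert a naive local timestamp to UTC; utc_offset_hours is the
    zone's offset east of UTC (e.g. -5 for EST)."""
    return (local_time - timedelta(hours=utc_offset_hours)).replace(tzinfo=timezone.utc)

def pair_by_valid_time(forecasts, observations):
    """forecasts and observations are dicts keyed by UTC valid time;
    return (forecast, observation) pairs for the common valid times."""
    common = sorted(set(forecasts) & set(observations))
    return [(forecasts[t], observations[t]) for t in common]
```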
Design goals of EVS • Flexibility to target data of interest • Subset based on forecasts and observations • Two conditions: 1) time; 2) variable value • e.g. forecasts where the ensemble mean < 0 °C • e.g. maximum observed flow in a 90-day window (a subsetting sketch follows) • Ability to pool/aggregate forecast points • Number of observations can be limiting • Sometimes appropriate to pool points
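A minimal sketch of this kind of subsetting, with a time window and a value condition (a hypothetical helper, not EVS code):

```python
import numpy as np

def subset_pairs(pairs, start=None, end=None, condition=None):
    """Filter (valid_time, ensemble, observation) tuples by a time window
    and/or a condition on the pair, e.g. ensemble mean below freezing."""
    kept = []
    for valid_time, ensemble, obs in pairs:
        if start is not None and valid_time < start:
            continue
        if end is not None and valid_time > end:
            continue
        if condition is not None and not condition(np.asarray(ensemble, dtype=float), obs):
            continue
        kept.append((valid_time, ensemble, obs))
    return kept

# e.g. keep only pairs whose ensemble mean is below 0 degC:
# cold = subset_pairs(pairs, condition=lambda ens, obs: ens.mean() < 0.0)
```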
Design goals of EVS • Carefully selected metrics • Different levels of detail on errors • Some are more complex than others, but… • Use cases and online documentation to assist • To be ‘user-friendly’ • Many factors determine this… • GUI, I/O, execution speed, batch modes
Example workflow: how biased are my forecasts of winter flows above flood level at dam A?
Archiving requirements • Coordinated across XEFS: • The forecasts • Streamflow: ESP binary files (.CS) • Temperature and precip: OHD datacard files • The observations • OHD datacard files • Unlikely to be a database in the near future
Types of metrics • Many ways to test a probability forecast • Tests of a single-valued property (e.g. the mean) • Tests of the broader forecast distribution • Both may involve reference forecasts (“skill”) • Caveats in testing probabilities • Observed probabilities require many events • Big assumption 1: we can ‘pool’ events • Big assumption 2: observations are ‘good’
The problem of continuous forecasts • Discrete/categorical forecasts • Many metrics rely on discrete forecasts • e.g. will it rain? {yes/no} (rain > 0.01) • e.g. will it flood? {yes/no} (stage > flood level) • What about continuous forecasts? • An infinite number of events • Arbitrary event thresholds (i.e. ‘bins’)? • Typically, yes, and the choice will affect results (a thresholding sketch follows)
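A minimal sketch of the thresholding step (illustrative only): a continuous ensemble forecast becomes a discrete event probability by counting the members that exceed the chosen threshold, so the choice of threshold directly shapes the results.

```python
import numpy as np

def event_probability(ensemble, threshold):
    """Fraction of ensemble members exceeding the threshold, i.e. the
    forecast probability of the discrete event (e.g. stage > flood level)."""
    return float(np.mean(np.asarray(ensemble, dtype=float) > threshold))

# e.g. Prob[stage > 13.0] from a 5-member ensemble:
# event_probability([12.1, 14.3, 13.8, 15.0, 11.9], threshold=13.0)  # -> 0.6
```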
Metrics in EVS • Detail varies with the verification question • e.g. inspection of ‘blown’ forecasts (detailed) • e.g. average reliability of flood forecasts (less detail) • e.g. rapid screening of forecasts (even less detail) • All included to some degree in EVS…
Most detailed (box plot, figure): box plots of the ‘errors’ for one ensemble forecast against time (days since the forecast start time, 0–20); each box shows the 10th, 20th, 50th, 80th, and 90th percentiles of the ensemble forecast errors, together with the greatest positive and negative errors relative to the observation.
Most detailed (box plot, figure): the same error box plots (10th–90th percentiles plus the greatest positive and negative errors relative to the observation) plotted against the observed value, ordered by increasing size.
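A minimal sketch of the statistics behind one box in these plots, assuming errors are defined as ensemble member minus observation (the percentile levels follow the figure; the function name is illustrative):

```python
import numpy as np

def error_percentiles(ensemble, observation, levels=(10, 20, 50, 80, 90)):
    """Percentiles of the ensemble forecast errors (member minus observation)
    for a single forecast, plus the greatest negative and positive errors."""
    errors = np.asarray(ensemble, dtype=float) - float(observation)
    stats = {p: float(np.percentile(errors, p)) for p in levels}
    stats["min"] = float(errors.min())  # greatest negative error
    stats["max"] = float(errors.max())  # greatest positive error
    return stats
```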
Less detail (reliability diagram, figure): “On occasions when flooding is forecast with probability 0.5, it should occur 50% of the time.” The observed probability given the forecast is plotted against the forecast probability of flooding (0–1.0); the ‘forecast bias’ is the departure from the 1:1 diagonal.
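A minimal sketch of how a reliability curve can be computed (not the EVS implementation): bin the forecast probabilities, then compare each bin's mean forecast probability with the observed relative frequency of the event in that bin.

```python
import numpy as np

def reliability_curve(forecast_probs, event_occurred, n_bins=10):
    """Mean forecast probability and observed event frequency per bin;
    a reliable forecast lies close to the 1:1 diagonal."""
    probs = np.asarray(forecast_probs, dtype=float)
    occurred = np.asarray(event_occurred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    mean_forecast, observed_freq = [], []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        last = (i == n_bins - 1)
        in_bin = (probs >= lo) & ((probs <= hi) if last else (probs < hi))
        if in_bin.any():
            mean_forecast.append(float(probs[in_bin].mean()))
            observed_freq.append(float(occurred[in_bin].mean()))
    return mean_forecast, observed_freq
```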
Less detail (cumulative Talagrand diagram, figure): “If river stage <= X is forecast with probability 0.5, it should be observed 50% of the time.” Cumulative probability is plotted against the position of the observation in the forecast distribution (0–100); the ‘forecast bias’ is the departure from the 1:1 diagonal.
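A minimal sketch of the quantity on the horizontal axis (illustrative, not EVS code): for each forecast, take the position of the observation in the forecast distribution as the fraction of ensemble members at or below the observed value; the cumulative Talagrand diagram is then the empirical CDF of these positions.

```python
import numpy as np

def observation_positions(ensembles, observations):
    """Fraction of ensemble members at or below the observation for each
    forecast; plot the empirical CDF of these positions against the
    1:1 diagonal to assess reliability."""
    return np.array([
        np.mean(np.asarray(ens, dtype=float) <= obs)
        for ens, obs in zip(ensembles, observations)
    ])
```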
Least detailed (a score, figure): river stage is plotted against time (days, 0–30), showing the ensemble forecast, the observation, and flood stage for five forecasts. The Brier score for these five flood forecasts is BS = (1/5) × {(0.8 − 1.0)² + (0.1 − 1.0)² + (0.0 − 0.0)² + (0.95 − 1.0)² + (1.0 − 1.0)²}.
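A minimal sketch of the Brier score calculation, reproducing the arithmetic on the slide (the five forecast flood probabilities are 0.8, 0.1, 0.0, 0.95, and 1.0, and flooding was observed in all but the third case):

```python
import numpy as np

def brier_score(forecast_probs, event_occurred):
    """Mean squared difference between the forecast probability and the
    0/1 outcome; smaller is better."""
    p = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(event_occurred, dtype=float)
    return float(np.mean((p - o) ** 2))

# The five forecasts from the slide:
# brier_score([0.8, 0.1, 0.0, 0.95, 1.0], [1, 1, 0, 1, 1])  # -> 0.1705
```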
Least detailed (a score, figure): for a single forecast, cumulative probability is plotted against precipitation amount for the forecast distribution and the observation (a step function); CRPS = A² + B², where A and B label the areas between the two curves. Then average across multiple forecasts: small scores are better.
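A minimal sketch of the CRPS for one ensemble forecast, using the standard identity CRPS = E|X − y| − ½ E|X − X′| for the empirical forecast distribution (a generic formulation, not necessarily how EVS computes it):

```python
import numpy as np

def crps_ensemble(ensemble, observation):
    """CRPS of a single ensemble forecast against one observation, via
    CRPS = E|X - y| - 0.5 * E|X - X'| over the ensemble members."""
    x = np.asarray(ensemble, dtype=float)
    y = float(observation)
    spread = np.abs(x[:, None] - x[None, :]).mean()
    return float(np.abs(x - y).mean() - 0.5 * spread)

# Average crps_ensemble over many forecasts; small scores are better.
```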
Rest of today • Two-hour lab sessions with EVS • Start with synthetic data (with simple errors) • Then move on to a couple of real cases • Verification plans and feedback • Real-time (‘prognostic’) verification • Screening verification outputs • Developments in EVS • Feedback: discussion and survey