EPS Diagnostic Tools
Renate Hagedorn, European Centre for Medium-Range Weather Forecasts
• A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.
• A forecast has value if it helps the user to make better decisions than without knowledge of the forecast.
• Forecasts with poor skill can be valuable (e.g. a location mismatch).
• Forecasts with high skill can be of little value (e.g. forecasting blue sky in the desert).
Objective of diagnostic/verification tools: assess the quality of the forecast system, i.e. determine the skill and value of the forecast.
Ensemble Prediction System
• 1 control run + 50 perturbed runs (TL399 L62)
• This adds a dimension of ensemble members: f(x,y,z,t,e)
• How do we deal with this added dimension when interpreting, verifying and diagnosing EPS output?
EPSgrams
[Figure: EPSgram box plots (min, 25%, median, 75%, max of the ensemble) for cloud cover, precipitation, 10m wind and 2m temperature]
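An EPSgram reduces the member dimension at one location to a five-number summary per lead time. A minimal sketch, assuming the members for one parameter are held in a NumPy array of shape (members, lead times); the function name `epsgram_stats` and the synthetic 51-member data are hypothetical.

```python
import numpy as np


def epsgram_stats(members: np.ndarray) -> dict:
    """Five-number summary drawn as a box plot at each lead time of an EPSgram.

    members: shape (n_members, n_leadtimes), one parameter at one location.
    """
    q = np.percentile(members, [0, 25, 50, 75, 100], axis=0)
    return {"min": q[0], "25%": q[1], "median": q[2], "75%": q[3], "max": q[4]}


rng = np.random.default_rng(0)
lead_days = np.arange(1, 11)
# Hypothetical 51-member 2m temperature ensemble (1 control + 50 perturbed runs)
# whose spread grows with lead time.
members = 15.0 + rng.normal(0.0, 0.3 * lead_days, size=(51, lead_days.size))

stats = epsgram_stats(members)
for day in (1, 5, 10):
    i = day - 1
    print(f"D+{day}: " + ", ".join(f"{k}={v[i]:.1f}" for k, v in stats.items()))
```

The widening box from D+1 to D+10 is the visual cue of growing forecast uncertainty.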
Ensemble mean
• The ensemble mean (EM) forecast is the average over all ensemble members.
[Figure panels: Day+6 control; Day+6 ensemble mean]
• It gives a smoother field than the deterministic forecasts, but the same result cannot be achieved by simply filtering a deterministic forecast.
[Figure panels: Day+6 control (filtered); Day+6 ensemble mean]
• If the spread is large, the EM may be a very weak pattern that does not represent any of the possible evolutions, so use it together with a measure of ensemble spread.
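The warning about weak EM patterns can be illustrated with a synthetic ensemble in which the members split into two opposite regimes. A minimal sketch, assuming the member dimension is the leading axis of a NumPy array and spread is measured as the standard deviation across members; the data and the function name are hypothetical.

```python
import numpy as np


def ensemble_mean_and_spread(ens: np.ndarray):
    """ens: shape (n_members, ny, nx); the leading axis is the member dimension e."""
    em = ens.mean(axis=0)
    spread = ens.std(axis=0, ddof=1)  # standard deviation across members
    return em, spread


rng = np.random.default_rng(1)
n_members, ny, nx = 51, 20, 30
x = np.linspace(0.0, 2.0 * np.pi, nx)
wave = np.tile(np.sin(x), (ny, 1))  # shape (ny, nx)
# Hypothetical ensemble split into two regimes: members with even index develop
# the wave, the others its mirror image, plus small noise.
signs = np.where(np.arange(n_members) % 2 == 0, 1.0, -1.0)
ens = signs[:, None, None] * wave + rng.normal(0.0, 0.1, size=(n_members, ny, nx))

em, spread = ensemble_mean_and_spread(ens)
print(f"max |member| = {np.abs(ens).max():.2f}")
print(f"max |EM|     = {np.abs(em).max():.2f}")
print(f"mean spread  = {spread.mean():.2f}")
```

Every member carries a wave of amplitude about 1, yet the EM is nearly flat; only the large spread reveals that the flat field is not a likely outcome.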
Deterministic vs. probabilistic use of EPS
• Use the ensemble mean only, or make explicit use of the whole PDF
[Figure: ensemble mean vs. whole PDF]
• Probabilistic forecast verification has similarities to deterministic verification: Reliability <-> Bias, Resolution <-> ACC, Brier Score <-> RMS
Why probabilities?
• Open-air restaurant scenario: opening additional tables costs £20 extra and brings £100 extra income (if T > 25ºC). The weather forecast gives a 30% probability for T > 25ºC. What would you do?
• Test the system for 100 days: 30 x T > 25ºC -> 30 x (100 – 20) = 2400; 70 x T < 25ºC -> 70 x (0 – 20) = -1400; net gain +1000
• Employing the extra waiter (spending £20) is beneficial whenever the probability for T > 25ºC is greater than 20% (= £20 / £100)
• The higher/lower the cost/loss ratio, the higher/lower the probabilities needed to benefit from acting on the forecast
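The restaurant arithmetic can be checked in a few lines. A minimal sketch, assuming the 30% forecast is reliable (so 30 of the 100 test days are warm); the function names are hypothetical.

```python
def net_gain(n_days: int, n_warm: int, cost: float, income: float) -> float:
    """Net gain from opening the extra tables on every day of the test period."""
    warm_days = n_warm * (income - cost)        # T > 25ºC: extra income minus extra cost
    cool_days = (n_days - n_warm) * (0 - cost)  # T < 25ºC: the extra cost is lost
    return warm_days + cool_days


def worth_acting(prob_warm: float, cost: float, income: float) -> bool:
    """Act when the expected extra income exceeds the cost, i.e. prob_warm > cost/income."""
    return prob_warm * income > cost


print(net_gain(n_days=100, n_warm=30, cost=20, income=100))  # → 1000
print(worth_acting(0.30, cost=20, income=100))                # → True
print(worth_acting(0.15, cost=20, income=100))                # → False
```

The break-even probability is simply cost/income, which is why users with different cost/loss ratios need different probability thresholds.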
Reliability
• Take a sample of probabilistic forecasts: e.g. 30 days x 2200 grid points (GP) = 66000 forecasts
• How often was the event (T > 25) observed when it was forecast with probability X?
[Figure: observed frequency (0 to 100%) against forecast probability (0 to 100%)]
[Figure: reliability diagram, over-confident model vs. perfect model]
[Figure: reliability diagram, under-confident model vs. perfect model]
[Figure: reliability diagram, perfect model vs. imperfect model; reliability score: the smaller, the better]
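The points of a reliability diagram come from binning the forecasts by probability and counting how often the event occurred in each bin. A minimal sketch, assuming 10 equal-width bins and synthetic data (one reliable system and one over-confident system); all names are hypothetical.

```python
import numpy as np


def reliability_table(prob, obs, n_bins=10):
    """Observed relative frequency of the event in each forecast-probability bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(prob, edges[1:-1])          # bin index 0 .. n_bins-1
    counts = np.bincount(idx, minlength=n_bins)
    hits = np.bincount(idx, weights=obs, minlength=n_bins)
    with np.errstate(invalid="ignore"):
        freq = hits / counts                      # NaN for empty bins
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, counts, freq


rng = np.random.default_rng(42)
n = 30 * 2200                                     # 30 days x 2200 grid points
prob = rng.uniform(0.0, 1.0, n)                   # hypothetical forecast probabilities
# Reliable by construction: the event occurs with the forecast probability.
obs_reliable = (rng.uniform(size=n) < prob).astype(float)
# Over-confident: the true probability is pulled halfway back towards 0.5.
obs_overconf = (rng.uniform(size=n) < 0.5 + 0.5 * (prob - 0.5)).astype(float)

centres, counts, freq_rel = reliability_table(prob, obs_reliable)
_, _, freq_over = reliability_table(prob, obs_overconf)
for c, fr, fo in zip(centres, freq_rel, freq_over):
    print(f"forecast {c:.2f}: observed {fr:.2f} (reliable), {fo:.2f} (over-confident)")
```

The reliable system stays on the diagonal; the over-confident one gives a curve flatter than the diagonal, because its extreme probabilities verify closer to 50%.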
Components of the Brier Score
N = total number of cases; I = number of probability bins; n_i = number of cases in probability bin i; f_i = forecast probability in probability bin i; o_i = frequency of the event being observed when forecast with f_i
• Reliability: forecast probability vs. observed relative frequencies
$$\mathrm{REL} = \frac{1}{N}\sum_{i=1}^{I} n_i\,(f_i - o_i)^2$$
Reliability diagram
• Reliability score: the smaller, the better
• Resolution score: the bigger, the better
[Figure panels: good resolution vs. poor resolution, with the sample climatology c marked]
Components of the Brier Score (continued), with c = frequency of the event being observed in the whole sample
• Resolution: ability to issue reliable forecasts close to 0% or 100%
$$\mathrm{RES} = \frac{1}{N}\sum_{i=1}^{I} n_i\,(o_i - c)^2$$
• Uncertainty: variance of the observed frequency in the sample
$$\mathrm{UNC} = c\,(1 - c)$$
Brier Score
• The Brier score is a measure of the accuracy of probability forecasts:
$$\mathrm{BS} = \frac{1}{N}\sum_{k=1}^{N} (p_k - o_k)^2$$
• p is the forecast probability (fraction of members predicting the event)
• o is the observed outcome (1 if the event occurs; 0 if it does not)
• BS varies from 0 (perfect deterministic forecasts) to 1 (perfectly wrong!)
• Brier Score = Reliability – Resolution + Uncertainty
• The Brier skill score (BSS) is a measure of skill relative to climatology (p = frequency of the event in the climate sample):
$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{clim}}}$$
• Positive (negative) BSS: better (worse) than the reference
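The decomposition can be verified numerically. A minimal sketch, assuming each distinct forecast probability (k/50 for a 50-member ensemble) is its own bin, which makes BS = REL - RES + UNC exact, and assuming the sample climatology as the BSS reference; the synthetic data and function names are hypothetical.

```python
import numpy as np


def brier_score(p, o):
    """Mean squared difference between forecast probability p and outcome o (0/1)."""
    return float(np.mean((p - o) ** 2))


def brier_decomposition(p, o):
    """Reliability, resolution and uncertainty, one bin per distinct forecast value."""
    n = p.size
    c = o.mean()                          # frequency of the event in the whole sample
    rel = res = 0.0
    for f_i in np.unique(p):
        in_bin = p == f_i
        n_i = in_bin.sum()
        o_i = o[in_bin].mean()            # observed frequency when f_i was forecast
        rel += n_i * (f_i - o_i) ** 2
        res += n_i * (o_i - c) ** 2
    return rel / n, res / n, c * (1.0 - c)


rng = np.random.default_rng(7)
n_cases, n_members = 5000, 50
true_prob = rng.beta(0.5, 2.0, n_cases)                       # hypothetical
obs = (rng.uniform(size=n_cases) < true_prob).astype(float)
p = rng.binomial(n_members, true_prob) / n_members            # fraction of members predicting the event

bs = brier_score(p, obs)
rel, res, unc = brier_decomposition(p, obs)
bss = 1.0 - bs / unc  # reference: sample climatology, whose Brier score equals UNC
print(f"BS={bs:.4f} REL={rel:.4f} RES={res:.4f} UNC={unc:.4f} BSS={bss:.3f}")
```

With the sample climatology as reference, BSS = (RES - REL) / UNC: skill comes from resolution exceeding the reliability penalty.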
Reliability: 2m-Temp. > 0, 1-month lead, start date May, 1980-2001 (DEMETER)

Model      BSS     Rel-Sc   Res-Sc
CERFACS    0.095   0.926    0.169
CNRM       0.039   0.899    0.141
ECMWF      0.039   0.899    0.140
INGV      -0.001   0.877    0.123
LODYC      0.047   0.893    0.153
MPI        0.065   0.918    0.147
UKMO      -0.064   0.838    0.099
DEMETER    0.204   0.990    0.213
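The three scores on this slide appear to be linked by BSS = Rel-Sc + Res-Sc - 1. That relation holds if Rel-Sc = 1 - REL/UNC and Res-Sc = RES/UNC. Those definitions are an assumption, not stated on the slide, and so is the assumption that the scores are listed in the same order as the model names. A quick consistency check:

```python
# (BSS, Rel-Sc, Res-Sc) per system, assuming the scores are listed in the
# same order as the model names.
scores = {
    "CERFACS": (0.095, 0.926, 0.169),
    "CNRM": (0.039, 0.899, 0.141),
    "ECMWF": (0.039, 0.899, 0.140),
    "INGV": (-0.001, 0.877, 0.123),
    "LODYC": (0.047, 0.893, 0.153),
    "MPI": (0.065, 0.918, 0.147),
    "UKMO": (-0.064, 0.838, 0.099),
    "DEMETER": (0.204, 0.990, 0.213),
}

for model, (bss, rel_sc, res_sc) in scores.items():
    # Assumed definitions: Rel-Sc = 1 - REL/UNC and Res-Sc = RES/UNC,
    # so BSS = (RES - REL)/UNC = Rel-Sc + Res-Sc - 1.
    implied = rel_sc + res_sc - 1.0
    print(f"{model:8s} BSS = {bss:+.3f}   Rel-Sc + Res-Sc - 1 = {implied:+.3f}")
```

All rows agree to within 0.001, i.e. rounding. The multi-model DEMETER ensemble's higher BSS comes from both better reliability and better resolution.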
[Figure: Brier skill score, Europe, 850 hPa temperature, D+4]
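Ranked Probability Score
• Measures the quadratic distance between forecast and verification cumulative probabilities for several categories. For K categories, with P_k and O_k the cumulative forecast and observed probabilities up to category k:
$$\mathrm{RPS} = \frac{1}{K-1}\sum_{k=1}^{K-1} (P_k - O_k)^2$$
• It is the average Brier score across the range of the variable
• The Ranked Probability Skill Score (RPSS) is a measure of skill relative to a reference forecast:
$$\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}}{\mathrm{RPS}_{\mathrm{ref}}}$$
• Negative / positive RPSS: worse / better than the reference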
Brier Score -> Ranked Probability Score
• The Brier Score is used for two-category (yes/no) situations (e.g. T > 15ºC)
• The RPS takes into account the ordered nature of the variable (“extreme errors”)
[Figure: two-category vs. multi-category probability forecasts of a variable on a scale from 1 to 25]
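The "extreme errors" point can be made concrete. A minimal sketch, assuming five ordered categories and the 1/(K-1) normalisation (conventions differ between references); the forecasts and function names are hypothetical.

```python
import numpy as np


def rps(forecast_probs, observed_category):
    """Ranked probability score of one forecast over K ordered categories.

    Uses cumulative forecast and observed probabilities and divides by K - 1,
    i.e. the average of the Brier scores for the K - 1 category thresholds.
    """
    p = np.asarray(forecast_probs, dtype=float)
    k = p.size
    obs = np.zeros(k)
    obs[observed_category] = 1.0
    cdf_fc = np.cumsum(p)[:-1]   # the last cumulative value is 1 for both
    cdf_obs = np.cumsum(obs)[:-1]
    return float(np.sum((cdf_fc - cdf_obs) ** 2) / (k - 1))


def brier_for_category(forecast_probs, observed_category, category):
    """Brier score for the single yes/no event 'the outcome falls in `category`'."""
    o = 1.0 if observed_category == category else 0.0
    return (forecast_probs[category] - o) ** 2


# Five hypothetical ordered categories; the observation falls in category 0.
hit = [1.0, 0.0, 0.0, 0.0, 0.0]
near_miss = [0.0, 1.0, 0.0, 0.0, 0.0]
far_miss = [0.0, 0.0, 0.0, 0.0, 1.0]
for name, fc in [("hit", hit), ("near miss", near_miss), ("far miss", far_miss)]:
    print(f"{name:9s} RPS = {rps(fc, 0):.2f}   BS(category 0) = {brier_for_category(fc, 0, 0):.2f}")
```

The yes/no Brier score penalises both misses equally (1.00), whereas the RPS gives 0.25 for the near miss and 1.00 for the far miss.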
[Figure: ranked probability skill score, Northern Hemisphere, 500 hPa geopotential]
Verification of a two-category (yes/no) situation
• Compute the 2 x 2 contingency table (for a set of cases):

                   Observed yes   Observed no
   Forecast yes         a              b
   Forecast no          c              d
   n = a + b + c + d

• Event probability: s = (a+c) / n
• Probability of a forecast of occurrence: r = (a+b) / n
• Frequency bias: B = (a+b) / (a+c)
• Proportion correct: PC = (a+d) / n
Example of Finley tornado forecasts (1884)
• 2 x 2 contingency table: a = 28, b = 72, c = 23, d = 2680 (n = 2803)
• Event probability: s = (a+c) / n = 51/2803 = 0.018
• Probability of a forecast of occurrence: r = (a+b) / n = 100/2803 = 0.036
• Frequency bias: B = (a+b) / (a+c) = 100/51 = 1.961
• Proportion correct: PC = (a+d) / n = 2708/2803 = 0.966, i.e. 96.6% accuracy
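Example of Finley tornado forecasts (1884): never forecasting a tornado
• 2 x 2 contingency table: a = 0, b = 0, c = 51, d = 2752 (n = 2803)
• Event probability: s = (a+c) / n = 51/2803 = 0.018
• Probability of a forecast of occurrence: r = (a+b) / n = 0/2803 = 0.0
• Frequency bias: B = (a+b) / (a+c) = 0/51 = 0.0
• Proportion correct: PC = (a+d) / n = 2752/2803 = 0.982, i.e. 98.2% accuracy!
• This is higher than Finley's 96.6%, although not a single tornado is forecast: for rare events, proportion correct alone is misleading.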
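The two Finley examples can be reproduced from the contingency counts. The cell values a = 28, b = 72, c = 23, d = 2680 follow uniquely from the totals on the slides (a+b = 100, a+c = 51, a+d = 2708, n = 2803). A minimal sketch; the class name `ContingencyTable` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContingencyTable:
    a: int  # hits: event forecast and observed
    b: int  # false alarms: event forecast but not observed
    c: int  # misses: event observed but not forecast
    d: int  # correct rejections: event neither forecast nor observed

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def scores(self) -> dict:
        return {
            "s": (self.a + self.c) / self.n,             # event probability
            "r": (self.a + self.b) / self.n,             # probability of a forecast of occurrence
            "B": (self.a + self.b) / (self.a + self.c),  # frequency bias
            "PC": (self.a + self.d) / self.n,            # proportion correct
        }


finley = ContingencyTable(a=28, b=72, c=23, d=2680)
never = ContingencyTable(a=0, b=0, c=51, d=2752)  # always forecast "no tornado"
for name, table in [("Finley", finley), ("never", never)]:
    print(name, {k: round(v, 3) for k, v in table.scores().items()})
```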
Definition of a proper score
• Consistency is one of the characteristics of a good forecast
• Some scoring rules encourage forecasters to be inconsistent, e.g. some scores give better results when a forecast closer to climatology is issued rather than the actual forecast (e.g. reliability)
• A scoring rule is strictly proper when the best scores are obtained if and only if the forecasts correspond with the forecaster's judgement
• Examples of proper scores are the Brier Score and the Ignorance Score
Ignorance Score (see Roulston & Smith, 2001):
$$\mathrm{IGN} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i} p^{\mathrm{ver}}_{n,i}\,\ln p^{\mathrm{fc}}_{n,i}$$
• n: forecast-verification pairs (N in total), i: quantiles
• Minimum only when p_fc = p_ver -> proper score
• The lower/higher the IGN, the better/worse the forecast system
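A minimal sketch of the ignorance score for categorical forecasts, assuming the verification is a single observed category (so p_ver is 1 there and 0 elsewhere), plus a numerical check of propriety for a yes/no event. The probability floor and all names are hypothetical.

```python
import numpy as np


def ignorance(fc_probs, obs_category, floor=1e-12):
    """Mean ignorance (natural log) of categorical probability forecasts.

    fc_probs: (N, K) forecast probabilities; obs_category: (N,) index of the
    verifying category. With p_ver = 1 in the observed category and 0 elsewhere,
    the double sum keeps only ln of the probability given to what happened.
    `floor` is a hypothetical guard: a zero probability for the observed
    category would otherwise give an infinite score.
    """
    fc_probs = np.asarray(fc_probs, dtype=float)
    obs_category = np.asarray(obs_category)
    p = fc_probs[np.arange(obs_category.size), obs_category]
    return float(-np.mean(np.log(np.maximum(p, floor))))


obs = np.array([0, 2, 1])
uniform = np.full((3, 3), 1.0 / 3.0)  # climatology-like forecast
sharp = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]])
print(f"IGN uniform = {ignorance(uniform, obs):.3f}, sharp = {ignorance(sharp, obs):.3f}")

# Propriety check for a yes/no event: if the forecaster believes the event
# has probability q, the expected IGN of issuing p is smallest when p = q.
q = 0.3
issued = np.linspace(0.01, 0.99, 99)
expected_ign = -(q * np.log(issued) + (1.0 - q) * np.log(1.0 - issued))
best = issued[np.argmin(expected_ign)]
print(f"expected IGN is minimised by issuing p = {best:.2f}")
```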
Verification of a two-category (yes/no) situation
• Compute the 2 x 2 contingency table (for a set of cases)
• Event probability: s = (a+c) / n
• Probability of a forecast of occurrence: r = (a+b) / n
• Frequency bias: B = (a+b) / (a+c)
• Hit rate: H = a / (a+c)
• False alarm rate: F = b / (b+d)
• False alarm ratio: FAR = b / (a+b)
Example of Finley tornado forecasts (1884)
• Event probability: s = (a+c) / n = 0.018
• Probability of a forecast of occurrence: r = (a+b) / n = 0.036
• Frequency bias: B = (a+b) / (a+c) = 1.961
• Hit rate: H = a / (a+c) = 0.549
• False alarm rate: F = b / (b+d) = 0.026
• False alarm ratio: FAR = b / (a+b) = 0.720
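Unlike proportion correct, the hit rate exposes the "never forecast a tornado" strategy. A minimal sketch, assuming the Finley cell counts a = 28, b = 72, c = 23, d = 2680 (consistent with the totals quoted for the 1884 example); the function name is hypothetical.

```python
def rates(a: int, b: int, c: int, d: int) -> dict:
    """Hit rate, false alarm rate and false alarm ratio from a 2 x 2 table."""
    return {
        "H": a / (a + c),                                      # hit rate
        "F": b / (b + d),                                      # false alarm rate
        "FAR": b / (a + b) if a + b > 0 else float("nan"),     # undefined if never forecast
    }


print("Finley:", {k: round(v, 3) for k, v in rates(28, 72, 23, 2680).items()})
print("never: ", {k: round(v, 3) for k, v in rates(0, 0, 51, 2752).items()})
```

Finley's forecasts detect 55% of the tornadoes; the "never" strategy detects none, despite its 98.2% proportion correct.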
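Extension of the 2 x 2 contingency table to probability forecasts: each probability threshold turns the probability forecast into a yes/no forecast with its own hit rate and false alarm rate
[Figure: hit rate (0 to 1) against false alarm rate (0 to 1)]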
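ROC curve
• The ROC curve is a plot of H against F for a range of probability thresholds: a low threshold gives high H and high F, a high threshold gives low H and low F
• The ROC area (area under the ROC curve) is a skill measure: A = 0.5 (no skill), A = 1 (perfect deterministic forecast)
[Figure: ROC curve with low, moderate and high probability thresholds marked; A = 0.83]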
ROCSS vs. BSS
• ROCSS (= 2A - 1) or BSS > 0 indicates a skilful forecast system
[Figure: ROC skill score and Brier skill score, Northern Extra-Tropics, 500 hPa anomalies > 2σ (spring 2002); Richardson, 2005]
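A minimal sketch of building a ROC curve from probability forecasts, assuming 19 evenly spaced thresholds, trapezoidal integration and ROCSS = 2A - 1. The synthetic forecasts and function names are hypothetical.

```python
import numpy as np


def roc_points(prob, obs, thresholds):
    """False alarm rate F and hit rate H when 'yes' is forecast for prob >= threshold."""
    obs = np.asarray(obs, dtype=bool)
    f, h = [], []
    for t in thresholds:
        yes = prob >= t
        a = np.sum(yes & obs)    # hits
        b = np.sum(yes & ~obs)   # false alarms
        c = np.sum(~yes & obs)   # misses
        d = np.sum(~yes & ~obs)  # correct rejections
        h.append(a / (a + c))
        f.append(b / (b + d))
    return np.array(f), np.array(h)


def roc_area(f, h):
    """Trapezoidal area under the ROC curve, closed by the (0,0) and (1,1) corners."""
    order = np.lexsort((h, f))   # sort by F, then by H
    f = np.concatenate(([0.0], f[order], [1.0]))
    h = np.concatenate(([0.0], h[order], [1.0]))
    return float(np.sum(np.diff(f) * (h[1:] + h[:-1]) / 2.0))


rng = np.random.default_rng(3)
n = 20_000
obs = rng.uniform(size=n) < 0.3  # hypothetical event, base rate 0.3
# Hypothetical probability forecasts: one informative, one unrelated to the outcome.
informative = np.clip(0.2 + 0.4 * obs + rng.normal(0.0, 0.2, n), 0.0, 1.0)
no_skill = rng.uniform(size=n)
thresholds = np.linspace(0.05, 0.95, 19)

for name, prob in [("informative", informative), ("no skill", no_skill)]:
    area = roc_area(*roc_points(prob, obs, thresholds))
    print(f"{name:11s} A = {area:.2f}  ROCSS = 2A - 1 = {2 * area - 1:.2f}")
```

The ROC area depends only on how well the forecasts discriminate events from non-events, not on calibration. This is why ROCSS and BSS can rank systems differently.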
Benefits for different users: decision making
• A user (or "decision maker") is sensitive to a specific weather event
• The user has a choice of two actions: do nothing and risk a potential loss L if the weather event occurs, or take preventative action at a cost C to protect against the loss L
• No forecast information: either always take action or never take action
• Deterministic forecast: act when adverse weather is predicted
• Probability forecast: act when the probability of the specific event exceeds a certain threshold; this threshold depends on the user
• Value V of a forecast: the savings made by using the forecast, normalised so that V = 1 for a perfect forecast and V = 0 for a forecast no better than climatology
• This is the simplest possible case, but it shows many important features (see also Richardson, 2000)
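Decision making: the cost-loss model
Potential costs: taking action costs C whether or not the event occurs; taking no action costs L if the event occurs and nothing otherwise. The fractions of occurrences a, b, c, d are the entries of the 2 x 2 contingency table, expressed as fractions of all cases.
• Climate information, expense: $$E_{\mathrm{clim}} = \min(C,\ oL)$$
• Perfect forecast, expense: $$E_{\mathrm{perf}} = oC$$
• Always use forecast, expense: $$E_{\mathrm{fc}} = (a+b)\,C + c\,L = F(1-o)\,C - Ho\,(L-C) + oL$$
• Value: $$V = \frac{E_{\mathrm{clim}} - E_{\mathrm{fc}}}{E_{\mathrm{clim}} - E_{\mathrm{perf}}}$$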
with α = C/L, H = a/(a+c), F = b/(b+d), o = a+c (the climatological frequency of the event)
• For a given weather event and forecast system, o, H and F are fixed
• The value depends on C/L
• The value is maximal when C/L = o, where Vmax = H - F
[Figure: value against C/L for the D+5 deterministic forecast of > 1 mm precipitation, Northern Extra-Tropics (winter 01/02)]
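Dividing the expenses of the cost-loss model by L turns them into functions of α = C/L only. A minimal sketch of the resulting value curve, assuming hypothetical values H = 0.6, F = 0.1 and o = 0.2; it checks numerically that the maximum lies at α = o with Vmax = H - F.

```python
import numpy as np


def relative_value(alpha, hit_rate, false_alarm_rate, o):
    """Relative economic value V of a yes/no forecast for cost/loss ratio alpha = C/L.

    All expenses are per unit loss L; o is the climatological event frequency.
    """
    alpha = np.asarray(alpha, dtype=float)
    e_clim = np.minimum(alpha, o)                          # cheaper of always / never acting
    e_perf = o * alpha                                     # act only when the event occurs
    e_fc = (o                                              # start from paying L for every event
            - hit_rate * o * (1.0 - alpha)                 # hits: pay C instead of L
            + false_alarm_rate * (1.0 - o) * alpha)        # false alarms: pay C for nothing
    return (e_clim - e_fc) / (e_clim - e_perf)


H, F, o = 0.6, 0.1, 0.2  # hypothetical forecast system and event frequency
alphas = np.linspace(0.01, 0.99, 99)
values = relative_value(alphas, H, F, o)
print(f"V at C/L = o: {relative_value(o, H, F, o):.3f} (H - F = {H - F:.3f})")
print(f"C/L with the largest value: {alphas[np.argmax(values)]:.2f}")
```

A negative V simply means the user is better off with climatological information alone.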
Potential economic value
[Figure: Northern Extra-Tropics (winter 01/02), D+5 forecasts of > 1 mm precipitation; deterministic forecast compared with EPS probability thresholds p = 0.2, p = 0.5 and p = 0.8]
Potential economic value
[Figure: Northern Extra-Tropics (winter 01/02), forecasts of > 1 mm precipitation; EPS (each user chooses the most appropriate probability threshold) compared with the control forecast]
• Results based on simple cost/loss models have indicated that EPS probabilistic forecasts have a higher value than single deterministic forecasts.
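"Each user chooses the most appropriate probability threshold" means that the EPS value curve is the upper envelope of the curves for the individual thresholds. A minimal sketch, assuming hypothetical (H, F) pairs for three thresholds and for the control run; these numbers are illustrative only, not ECMWF results.

```python
import numpy as np


def relative_value(alpha, h, f, o):
    """Relative value V for cost/loss ratio alpha (expenses per unit loss)."""
    e_clim = np.minimum(alpha, o)
    e_fc = o - h * o * (1.0 - alpha) + f * (1.0 - o) * alpha
    return (e_clim - e_fc) / (e_clim - o * alpha)


o = 0.2  # hypothetical event frequency
# Hypothetical (H, F) for three EPS probability thresholds and for the control run
eps = {0.2: (0.85, 0.30), 0.5: (0.65, 0.12), 0.8: (0.40, 0.04)}
control = (0.60, 0.12)
alphas = np.linspace(0.02, 0.98, 49)

curves = np.array([relative_value(alphas, h, f, o) for h, f in eps.values()])
eps_envelope = curves.max(axis=0)  # each user picks the threshold best for their C/L
best_threshold = np.array(list(eps))[curves.argmax(axis=0)]
control_value = relative_value(alphas, *control, o)

for i in (2, 9, 30):
    print(f"C/L={alphas[i]:.2f}: best p={best_threshold[i]}, "
          f"EPS V={eps_envelope[i]:.2f}, control V={control_value[i]:.2f}")
```

Users with a low C/L are best served by a low probability threshold; users with a high C/L are best served by a high one.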
Potential economic value
[Figure: Northern Extra-Tropics (winter 01/02), D+5 forecasts of > 20 mm precipitation]
• BSS = 0.06 (a measure of the overall value for all possible users)
• ROCSS = 0.65 (closely linked to Vmax)
Summary
• Different ways of incorporating the added dimension of the EPS (EM vs. PDF)
• The ensemble mean is the best deterministic forecast, but the EM should be used together with a measure of spread
• Verification of probability forecasts: different scores measure different aspects of forecast performance (Reliability / Resolution, Brier Score (BSS), RPS (RPSS), ROC, …). The perceived usefulness of the ensemble may vary with the score used, so it is important to understand the behaviour of different scores and choose them appropriately
• Potential economic value: decision making is user dependent; the cost-loss model is a simple illustration, but it shows many useful features
References and further reading
• ECMWF Newsletter, for updates on EPS performance
• Jolliffe, I. T. and D. B. Stephenson, 2003: Forecast Verification: A Practitioner's Guide in Atmospheric Science. Wiley, 240 pp.
• Katz, R. W. and A. H. Murphy, 1997: Economic Value of Weather and Climate Forecasts. Cambridge University Press, 222 pp.
• Palmer, T. N. and R. Hagedorn (eds.), 2006: Predictability of Weather and Climate. Cambridge University Press (available from July 2006).
• Richardson, D. S., 2000: Skill and relative economic value of the ECMWF Ensemble Prediction System. Q. J. R. Meteorol. Soc., 126, 649-668.
• Roulston, M. S. and L. A. Smith, 2001: Evaluating probabilistic forecasts using information theory. Mon. Wea. Rev., 130, 1653-1660.
• Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.