Estimation & Value of Ambiguity in Ensemble Forecasts

Tony Eckel -- National Weather Service, Office of Science and Technology, Silver Spring, MD
Mark Allen -- Air Force Weather Agency, Omaha, NE
References:
• Eckel, F. A., M. S. Allen, and M. C. Sittel, 2012: Estimation of ambiguity in ensemble forecasts. Weather and Forecasting, in press.
• Allen, M. S., and F. A. Eckel, 2012: Value from ambiguity in ensemble forecasts. Weather and Forecasting, in press.
Part I. Estimation of Ambiguity
Ambiguity

Risk: Probability of an unfavorable outcome from occurrence of a specific event.
• Clear Risk: Probability (or uncertainty) of the event known precisely. EX: Betting at roulette
• Ambiguous Risk: Probability (or uncertainty) of the event known only vaguely. EX: Betting on a horse race

Ambiguity -- 2nd-order uncertainty, or the uncertainty in a statement of uncertainty.
"Ambiguity is uncertainty about probability, created by missing information that is relevant and could be known." -- Camerer and Weber (Journal of Risk and Uncertainty, 1992)
Ambiguity in Ensemble Forecasts

[Figure: two maps from the NCEP SREF 27-h forecast, 12Z, 8 Oct 2011, with Spokane, WA marked on both. Left: 2-m temperature (F) ensemble mean and spread (ensemble standard deviation, F). Right: probability (%) of freezing at the surface.]
Causes of Ambiguity (the "…missing information that is relevant and could be known.")

1) Random PDF Error from Limited Sampling
[Figure: a finite set of ensemble members drawn from the ensemble's forecast PDF, overlaid on the true forecast PDF; x-axis: Wind Speed (m/s), 0–25]

2) Random PDF Error from Ensemble Design Limitations
EX: Ensemble omits perturbations for soil moisture error
   a) Good IC and/or no error sensitivity → good forecast PDF
   b) Bad IC and/or high error sensitivity → underspread forecast PDF
   The two cases can't be distinguished, so the PDF error appears random.

Ambiguity = random error in the 1st-order uncertainty estimate. It is not dependent on:
• How much 1st-order uncertainty exists
• Systematic error in the 1st-order uncertainty estimate, which causes 1 & 2 can also produce
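To make cause (1) concrete, here is a minimal sketch (not from the paper) that draws repeated 20-member ensembles from an assumed Gaussian true forecast PDF; all numbers are illustrative, chosen only to show how the event probability estimate wanders from sample to sample:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
true_mean, true_sd = 10.0, 4.0      # hypothetical true PDF of wind speed (m/s)
threshold = 12.0                    # event: wind speed exceeds 12 m/s
true_prob = norm.sf(threshold, loc=true_mean, scale=true_sd)

n_members = 20
# Each "forecast cycle" samples a fresh finite ensemble from the same true PDF,
# yet the event probability it implies varies randomly from cycle to cycle:
# that variation is ambiguity from limited sampling alone.
probs = [(rng.normal(true_mean, true_sd, n_members) > threshold).mean()
         for _ in range(5000)]
print(f"true prob {true_prob:.3f}; ensemble estimates: "
      f"mean {np.mean(probs):.3f}, sd {np.std(probs):.3f}")
```

The spread of the estimates shrinks roughly like 1/sqrt(n) as the membership grows, which is why limited sampling alone leaves residual ambiguity in any practical ensemble.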
Shift & Stretch Calibration

STEP 1 (1st-Moment Calibration): Shift each member (ei) by a shift factor (the opposite of the mean error in the ensemble mean) to correct for bias in PDF location:
   ei′ = ei − (mean error),  for i = 1…n members
STEP 2 (2nd-Moment Calibration): Stretch (compress) the shifted members about their mean, e̅′, by a stretch factor (the inverse square root of the variance bias) to correct for too-small (too-large) ensemble spread:
   ei″ = e̅′ + (ei′ − e̅′) / √(variance bias),  for i = 1…n members
STEP 3 (Forecast): Use the adjusted members to calculate calibrated predictions (mean, spread, probability).
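A minimal sketch of the three steps as defined on this slide, assuming the training statistics (mean error and variance bias) are already in hand; an illustration, not the authors' code:

```python
import numpy as np

def shift_stretch(members, mean_error, variance_bias):
    """Shift-and-stretch calibration per the slide's definitions.
    mean_error    : average error of the ensemble mean (from dependent data)
    variance_bias : ratio of ensemble variance to the variance it should have
    """
    e = np.asarray(members, dtype=float)
    # STEP 1: shift every member by the opposite of the mean error
    shifted = e - mean_error
    # STEP 2: stretch (compress) about the shifted mean by 1/sqrt(variance bias)
    m = shifted.mean()
    calibrated = m + (shifted - m) / np.sqrt(variance_bias)
    # STEP 3: calibrated mean, spread, and probabilities come from these members
    return calibrated

# EX: an underspread ensemble (variance_bias < 1) gets stretched outward.
cal = shift_stretch([271.2, 272.0, 272.4, 273.1], mean_error=0.8, variance_bias=0.6)
```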
Forecast Data:
- JMA 51-member ensemble, 12Z cycle
- 5-day, 2-m temperature, 1° × 1° over CONUS
- Independent period: 1 – 31 Jan 2009
- Dependent (training) period: 15 Dec 2007 – 15 Feb 2008
Ground Truth:
- ECMWF global model analysis (0-h forecast)

[Figure: reliability diagrams (observed relative frequency vs. forecast probability, with # of forecasts insets) for raw and conditionally calibrated forecasts. Raw: BSS = 0.764 (0.759…0.768), rel = 7.89E-4, res = 0.187, unc = 0.245. Conditionally calibrated: BSS = 0.768 (0.764…0.772), rel = 4.20E-5, res = 0.188, unc = 0.245.]
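As a quick sanity check, the calibrated panel's components combine under the standard Brier decomposition, BS = rel − res + unc, and with climatology as the reference (BS_clim = unc), BSS = 1 − BS/unc:

```python
# Consistency check of the slide's calibrated Brier numbers.
rel, res, unc = 4.20e-5, 0.188, 0.245
bs = rel - res + unc                 # ~0.0570
print(f"BSS = {1 - bs/unc:.3f}")     # ~0.767, matching the slide's 0.768 to rounding
```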
Estimating Ambiguity

Randomly Calibrated Resampling (RCR) -- based on the bootstrap technique:
1) From the original n members, generate a calibrated forecast probability (pe) for an event
2) Produce an alternative set of n members by sampling the originals with replacement
3) Apply a random calibration (varied according to the ensemble's error characteristics) to the resampled set, to account for the ensemble's insufficient simulation of uncertainty
4) Generate an alternative forecast probability for the event
Repeating steps 2–4 (here, 999 times) builds a distribution of alternative probabilities.

[Figure: frequency histograms of the 999 alternative forecast probabilities from Resampling Only, RCR, and CES, each with pe = 31.3%. Resampling Only: p5 = 23.7%, p95 = 37.3%; RCR and CES yield wider intervals (p5 ≈ 16.5–18.9%, p95 ≈ 44.2–46.5%).]
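A condensed sketch of the RCR loop, simplifying the random calibration to a Gaussian shift and stretch; shift_sd and stretch_sd are hypothetical stand-ins for the error characteristics derived on the next slide:

```python
import numpy as np

def rcr_ambiguity(members, threshold, shift_sd, stretch_sd, n_boot=999, seed=0):
    """Sketch of Randomly Calibrated Resampling; not the authors' code."""
    rng = np.random.default_rng(seed)
    x = np.asarray(members, dtype=float)
    probs = np.empty(n_boot)
    for k in range(n_boot):
        boot = rng.choice(x, size=x.size, replace=True)      # step 2: resample
        boot = boot + rng.normal(0.0, shift_sd)              # step 3: random shift...
        m = boot.mean()
        boot = m + rng.normal(1.0, stretch_sd) * (boot - m)  # ...and random stretch
        probs[k] = (boot > threshold).mean()                 # step 4: alternative pe
    return probs   # e.g., np.percentile(probs, [5, 95]) brackets the ambiguity
```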
The average error determines the primary shift factor and stretch factor.
Solid: Raw JMA ensemble's error distributions, from which the error variance associated with random sampling of a 51-member ensemble (see below) is removed.
Dashed: Distributions used to produce a random shift factor and stretch factor to randomly calibrate each of the 999 resampled forecast sets.

[Figure: error distributions for 10-, 20-, 40-, and 80-member ensembles; left panel: standardized error in ensemble mean; right panel: fractional error in ensemble spread.]
Part II. Application of Ambiguity
Application of Ambiguity

Cost-Loss Decision Scenario:
Cost (C) – Expense of taking protective action
Loss (L) – Expense of an unprotected event occurrence
Probability (pe) – The risk, or chance, of a bad-weather event

Take protective action whenever Risk > Risk Tolerance, i.e., pe > C/L, since the expense of protecting is then less than the expected expense of getting caught unprotected: C < L·pe.

But given ambiguity in the risk, the appropriate decision can be unclear:
• pe well below C/L: risk acceptable
• pe well above C/L: too risky
• pe near C/L: decision unclear

Opposing Risk: Fraction of risk that goes against the normative decision.

[Figure: ambiguity PDF (probability density vs. forecast probability, i.e., risk, from 0.0 to 1.0) with the risk tolerance C/L = 0.35 marked; the opposing-risk mass lies on the far side of C/L.]
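A sketch of the opposing-risk calculation, assuming (as in the CES backup slides) that the ambiguity PDF is a beta distribution; the beta(4, 8) parameters in the example are invented for illustration:

```python
from scipy.stats import beta

def opposing_risk(a, b, cost_loss):
    """Fraction of the beta(a, b) ambiguity PDF lying on the side of C/L
    opposite the normative decision."""
    mean_risk = a / (a + b)
    if mean_risk > cost_loss:             # normative decision: protect
        return beta.cdf(cost_loss, a, b)  # mass saying the risk is acceptable
    return beta.sf(cost_loss, a, b)       # mass saying the risk exceeds tolerance

# Illustrative only: mean risk 4/12 = 0.33 vs. tolerance C/L = 0.35, so the
# normative call is "do not protect" and the opposing risk is the mass above 0.35.
print(f"opposing risk: {opposing_risk(4, 8, 0.35):.2f}")
```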
The Ulterior Motives Experiment

GOAL: Maintain primary value while improving 2nd-order criteria not considered in the primary risk analysis.
Event: Freezing surface temperature
'User': A specific risk tolerance level (i.e., C/L value) at a specific location
2nd-Order Criterion: Keep users' trust by reducing repeat false alarms

Forecast Data: GFS ensemble forecast, 5-day, 2-m temperature, 1° × 1° over CONUS
- Independent: 12 UTC daily, 1 – 31 Jan 2009
- Dependent: 12 UTC daily, 15 Dec 2007 – 15 Feb 2008
- Ground Truth: ECMWF global model analysis (0-h forecast)

Decision categories:
• Black: Risk clearly exceeds tolerance → Prepare
• White: Risk clearly acceptable → Do Not Prepare
• Gray: Decision unclear → Preparation optional. Given the potential for a repeat false alarm, the user may go against the normative decision.
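A sketch of the black/white/gray classification, assuming "clearly" means the whole 5th–95th percentile range of the ambiguity PDF sits on one side of the user's risk tolerance (one plausible reading; the slide does not pin the bounds down):

```python
def decision_zone(p5, p95, cost_loss):
    """Classify a forecast into the slide's black/white/gray categories
    from the ambiguity PDF's 5th/95th percentiles (assumed bounds)."""
    if p5 > cost_loss:
        return "black: risk clearly exceeds tolerance -> prepare"
    if p95 < cost_loss:
        return "white: risk clearly acceptable -> do not prepare"
    return "gray: decision unclear -> preparation optional"

print(decision_zone(p5=0.17, p95=0.46, cost_loss=0.35))   # gray
```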
8 User Behaviors

[Table: behavior name and description for each of the eight simulated user behaviors: Control, Fickle, Cynical, Backward, Ambiguity-Sensitive, Optimistic, Ambiguity-Tolerant, and Optimal.]
The 'Optimal' Behavior

[Figure (left): Testing for C/L = 0.01 -- POD vs. test value for the threshold of opposing risk. The lowest threshold that maintains the Control's POD gives the maximum chances to prevent a repeat false alarm.]
[Figure (right): Resulting optimal threshold of opposing risk (%) vs. user C/L, shown alongside the Ambiguity-Tolerant, Ambiguity-Sensitive, and Backward settings.]
Measuring Primary Value

Value Score (or expense skill score):
   VS = (Eclim − Efcst) / (Eclim − Eperf)
Efcst = Expense from following the forecast
Eclim = Expense from following a climatological forecast
Eperf = Expense from following a perfect forecast

Computed from the decision contingency table:
• a = # of hits
• b = # of false alarms
• c = # of misses
• d = # of correct rejections
• α = C/L ratio
• s = (a+c) / (a+b+c+d), the base rate
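A sketch of the computation using standard cost-loss bookkeeping, with expenses per forecast expressed in units of the loss L (so protecting costs α = C/L and a miss costs 1); the contingency counts in the example are made up:

```python
def value_score(a, b, c, d, cost_loss):
    """VS = (Eclim - Efcst) / (Eclim - Eperf), per-forecast expenses in
    units of L. Standard cost-loss bookkeeping, assumed to match the slide."""
    n = a + b + c + d
    s = (a + c) / n                          # base rate of the event
    e_fcst = ((a + b) * cost_loss + c) / n   # protect on warnings, pay L on misses
    e_perf = s * cost_loss                   # protect only when the event occurs
    e_clim = min(cost_loss, s)               # best fixed strategy: always or never protect
    return (e_clim - e_fcst) / (e_clim - e_perf)

# Made-up counts: 40 hits, 25 false alarms, 10 misses, 925 correct rejections.
print(f"VS = {value_score(40, 25, 10, 925, cost_loss=0.2):.2f}")   # ~0.68
```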
Measuring Primary Value

[Figure: value score vs. user C/L. Control: normative decisions following GFS ensemble calibrated probability forecasts. Deterministic: normative decisions following GFS calibrated deterministic forecasts.]
Losers (w.r.t. primary value)

[Figure: two panels of value score vs. user C/L, comparing the Fickle and Cynical behaviors against the Control.]
Marginal Performers (w.r.t. primary value)

[Figure: three panels of value score vs. user C/L, comparing the Backward, Ambiguity-Sensitive, and Optimistic behaviors against the Control.]
Winners (w.r.t. primary value)

[Figure: two panels of value score vs. user C/L, comparing the Ambiguity-Tolerant and Optimal behaviors against the Control.]
2nd Order Value

[Figure: % reduction in repeat false alarms vs. user C/L for the Cynical, Fickle, Ambiguity-Sensitive, Backward, Optimistic, Optimal, and Ambiguity-Tolerant behaviors.]
Conclusions

• Ambiguity in ensemble forecasts can be effectively estimated
• Users can benefit from ambiguity information through improvement of 2nd-order criteria, but that requires lots of creativity
True Forecast PDF

Definitions: "Perfect" = exactly accurate (with infinite precision); "Erred" = inaccurate, or accurate but discrete.

True forecast PDF at a specific lead time, by analysis/model quality:
• Perfect analysis, perfect model: Only one possible true state, so the true PDF is a delta function.
• Erred analysis, perfect model: Each historical analysis match corresponds to a different true initial state, and thus a different true state at the lead time.
• Perfect analysis, erred model: While each matched analysis corresponds to only one true IC, the subsequent forecast can match many different true states due to grid averaging at t = 0 and/or lack of diffeomorphism.
• Erred analysis, erred model: The combined effect creates a wider true PDF. An erred model also contributes to analysis error.

True forecast PDF recipe (for the current forecast cycle and lead time):
1) Look back through an infinite history of forecasts produced by the analysis/forecast system in a stable climate.
2) Pick out all instances with the same analysis (and resulting forecast) as the current forecast cycle. Note that each analysis, while the same, represents a different true initial state.
3) Pool all the different verifying true states at the lead time to construct the true distribution of possible states at that time.

The true forecast PDF is not absolute -- it depends on uncertainty in the ICs and model (better analysis/model = sharper true forecast PDF).
[Figure: (a) 1st-moment bias correction (C) vs. ensemble mean (C); (b) ln(2nd-moment bias correction) vs. ln(ensemble variance).]
Forecast Probability (pe) by Rank Method

Notation: V = verification value; x* = event threshold; n = number of members; xi = value of the i-th member; G( ) = Gumbel CDF; G′( ) = reversed Gumbel CDF.

EX: x* = 9.0 (the MDT TURB threshold); pe ≈ 6/9 = 66.7%, since 6 of the 9 members exceed the threshold.
TURB Fcsts (calibrated): 5.8, 6.1, 7.3, 9.2, 9.8, 10.0, 10.1, 11.2, 13.8

• When x* has a rank > 1 but < n, interpolate linearly between the bracketing members:
   pe = 6/10 + [ (9.2 − 9.0) / (9.2 − 7.3) ] × 1/10 = 61.1%
• When x* has a rank > n, estimate the upper tail with the Gumbel CDF G( ).
• When x* has a rank < 1, estimate the lower tail with the reversed Gumbel CDF G′( ), …or, if x is positive definite, a bounded alternative.
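A sketch of the interior (rank > 1 and < n) case, reproducing the worked example above; the Gumbel-tail cases are omitted:

```python
import numpy as np

def rank_method_prob(members, threshold):
    """Rank-method event probability when the threshold falls inside the
    ensemble envelope: linear interpolation between the bracketing members."""
    x = np.sort(np.asarray(members, dtype=float))
    n = x.size
    n_above = int((x > threshold).sum())          # members exceeding the threshold
    lo, hi = x[n - n_above - 1], x[n - n_above]   # members bracketing the threshold
    frac_above = (hi - threshold) / (hi - lo)     # part of the gap above the threshold
    return (n_above + frac_above) / (n + 1)

turb = [5.8, 6.1, 7.3, 9.2, 9.8, 10.0, 10.1, 11.2, 13.8]
print(f"pe = {rank_method_prob(turb, 9.0):.1%}")   # 61.1%, matching the slide
```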
Estimating Ambiguity by CES (Calibrated Error Sampling)

Random error (i.e., ambiguity) in pe is tied to random error in any moment of the forecast PDF. A pe error for any value of the event threshold can be found if the true forecast PDF is known.

Eckel and Allen, 2011, WAF
Estimating Ambiguity by CES

We never know the true forecast PDF, but we do know the range of possibilities of the true PDF based on the ensemble PDF's error characteristics. Each random draw from the ensemble's PDF errors generates a unique set of pe errors. Aggregate these to form a distribution of pe errors, called an ambiguity PDF.

[Figure: example ambiguity PDFs for ensemble spread = 2.0 C and spread = 6.0 C.]
Estimating Ambiguity by CES

Ambiguity PDFs follow the beta distribution.
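One simple way to obtain beta parameters from a sample of alternative probabilities (e.g., RCR or CES output) is the method of moments; this is an assumed fitting choice for illustration, not necessarily the authors':

```python
import numpy as np

def fit_beta_moments(pe_samples):
    """Method-of-moments fit of a beta ambiguity PDF to alternative
    probabilities. Valid when the sample variance < mean * (1 - mean)."""
    p = np.asarray(pe_samples, dtype=float)
    m, v = p.mean(), p.var()
    k = m * (1.0 - m) / v - 1.0
    return m * k, (1.0 - m) * k   # shape parameters (a, b)

# e.g., a, b = fit_beta_moments(rcr_ambiguity(...)); percentiles and opposing
# risk then come from scipy.stats.beta(a, b).
```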
2nd Order Value

[Figure: two panels vs. user C/L -- left: # of repeat false alarms; right: % reduction in repeat false alarms -- for the Control, Fickle, Optimistic, Cynical, Optimal, Ambiguity-Tolerant, Ambiguity-Sensitive, and Backward behaviors.]
Visualization of Ambiguity and Comparison of CES vs. RCR