JEFS Calibration: Bayesian Model Averaging
Eric P. Grimit, Clifford F. Mass, Jeff Baars (University of Washington, Atmospheric Sciences)
Adrian E. Raftery, J. McLean Sloughter, Tilmann Gneiting (University of Washington, Statistics)
Research supported by: Office of Naval Research Multi-Disciplinary University Research Initiative (MURI)
The General Goal
“The general goal in EF [ensemble forecasting] is to produce a probability density function (PDF) for the future state of the atmosphere that is reliable…and sharp…”
-- Plan for the Joint Ensemble Forecast System (2nd Draft), Maj. F. Anthony Eckel
Calibration and Sharpness
• Calibration ~ reliability (also: statistical consistency)
  • A probability forecast p ought to verify with relative frequency p.
  • The verification ought to be indistinguishable from the forecast ensemble (i.e., the verification rank histogram* is uniform).
  • However, a forecast from climatology is reliable (by definition), so calibration alone is not enough.
• Sharpness ~ resolution (also: discrimination, skill)
  • The variance, or confidence interval, should be as small as possible, subject to calibration.

*Verification rank histogram: a record of where the verification fell (i.e., its rank) among the ordered ensemble members.
  • Flat: well-calibrated (truth is indistinguishable from the ensemble members)
  • U-shaped: under-dispersive (truth falls outside the ensemble range too often)
  • Humped: over-dispersive
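The rank-histogram computation itself is simple. A minimal sketch (illustrative only, not the JEFS code; assumes an (n_cases, n_members) forecast array and random tie-breaking):

```python
import numpy as np

def verification_ranks(ens, obs, rng=None):
    """Rank of each verifying observation among the ordered ensemble members.

    ens : (n_cases, n_members) array of ensemble forecasts
    obs : (n_cases,) array of verifying observations/analyses
    Returns ranks in 1..n_members+1; ties are broken at random.
    """
    rng = np.random.default_rng() if rng is None else rng
    below = (ens < obs[:, None]).sum(axis=1)       # members strictly below the verification
    ties = (ens == obs[:, None]).sum(axis=1)       # members equal to the verification
    return 1 + below + rng.integers(0, ties + 1)   # random placement within ties

# A flat histogram of these ranks indicates calibration; a U shape indicates under-dispersion.
ens = np.random.randn(1000, 8)
obs = np.random.randn(1000) * 1.5                  # truth more variable than the ensemble
counts = np.bincount(verification_ranks(ens, obs), minlength=10)[1:]
print(counts / counts.sum())
```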
Typical Verification Rank Histograms
[Figure: verification rank histograms for *UWME and *UWME+, with excessive outlier percentages*]
(a) Z500, synoptic variable (errors depend on analysis uncertainty): *UWME 5.0%, *UWME+ 4.2%
(b) MSLP, synoptic variable: *UWME 9.0%, *UWME+ 6.7%
(c) WS10, surface/mesoscale variable (errors depend on model uncertainty): *UWME 25.6%, *UWME+ 13.3%
(d) T2, surface/mesoscale variable: *UWME 43.7%, *UWME+ 21.0%
*Excessive outlier percentage [cf. Eckel and Mass 2005, Wea. Forecasting]
Objective and Constraints
Objective: Calibrate JEFS (JGE and JME) output.
• Utilize available analyses/observations as surrogates for truth.
• Employ a method that:
  • accounts for ensemble member construction and relative skill.
    • Bred-mode / ETKF initial conditions (JGE; equally skillful members)
    • Multiple models (JGE and JME; differing skill for sets of members)
    • Multi-scheme diversity within a single model (JME)
  • is adaptive.
    • Can be rapidly relocated to any theatre of interest.
    • Does not require a long history of forecasts and observations.
  • accommodates regional/local variations within the domain.
    • Spatial (grid-point) dependence of forecast error statistics.
  • works for any observed variable at any vertical level.
First Step: Mean Bias Correction
• Calibrate the first moment: the ensemble mean.
  • In a multi-model and/or multi-scheme physics ensemble, individual members have unique, often compensating, systematic errors (biases).
  • Systematic errors do not represent forecast uncertainty.
• Implemented a member-specific bias correction for UWME using a 14-day training period (running mean).
• Advantages and disadvantages:
  • Ensemble spread is reduced (in an under-dispersive system).
  • The ensemble spread-skill relationship is degraded. (Grimit 2004, Ph.D. dissertation)
  • Forecast probability skill scores improve.
  • Excessive outliers are reduced.
  • Verification rank histograms become quasi-symmetric.
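For illustration, a minimal sketch of a member-specific 14-day running-mean bias correction at a single station or grid point (assumed array layout; not the operational UWME implementation):

```python
import numpy as np

def bias_corrected(fcst, obs, window=14):
    """Member-specific running-mean bias correction.

    fcst : (n_days, n_members) forecasts at one station/grid point
    obs  : (n_days,) verifying observations
    The correction applied on day t uses only the previous `window` days,
    so it is an adaptive, forward-looking correction.
    """
    corrected = np.full(fcst.shape, np.nan)
    for t in range(window, fcst.shape[0]):
        bias = (fcst[t - window:t] - obs[t - window:t, None]).mean(axis=0)  # per-member mean error
        corrected[t] = fcst[t] - bias
    return corrected
```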
Second Step: Calibration
• Calibrate the higher moments: the ensemble variance.
• Forecast error climatology
  • Add the error variance from a long history of forecasts and observations to the current (deterministic) forecast.
  • For the ensemble mean, we shall call this forecast the mean error climatology (MEC).
  • MEC is time-invariant (a static forecast of uncertainty; a climatology).
  • MEC is calibrated for large samples, but not very sharp.
• Advantages and disadvantages:
  • Simple. Difficult to beat!
  • Gaussian.
  • Not practical for JGE/JME implementation, since a long history is required.
  • A good baseline for comparison of calibration methods.
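A minimal sketch of a MEC forecast, under the assumption of a Gaussian centered on the (bias-corrected) ensemble mean with a climatological error variance; names and inputs are illustrative:

```python
import numpy as np
from scipy.stats import norm

def mec_forecast(ens_mean_today, past_ens_means, past_obs):
    """Mean error climatology: a static Gaussian uncertainty estimate.

    The predictive distribution is N(ensemble mean, error variance), where the
    error variance comes from a long history of ensemble-mean errors.
    """
    errors = np.asarray(past_ens_means) - np.asarray(past_obs)
    sigma = errors.std(ddof=1)             # time-invariant spread
    return norm(loc=ens_mean_today, scale=sigma)

# Example: probability that 2-m temperature exceeds 25 C
# dist = mec_forecast(23.1, hist_means, hist_obs); p = 1 - dist.cdf(25.0)
```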
Mean Error Climatology (MEC) Performance
Comparison of *UWME 48-h 2-m temperature forecasts:
• Member-specific mean bias correction applied to both [14-day running mean]
• FIT = Gaussian fit to the raw forecast ensemble
• MEC = Gaussian fit to the ensemble mean + the mean error climatology
[00 UTC cycle; October 2002 – March 2004; 361 cases]
CRPS = continuous ranked probability score [the probabilistic analog of the mean absolute error (MAE) used to score deterministic forecasts]
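For reference, the CRPS of a Gaussian forecast (such as FIT or MEC) has a closed form, following Gneiting and Raftery; a short sketch:

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sigma, y):
    """CRPS of a N(mu, sigma^2) forecast verified against observation y.

    Lower is better; for a point forecast the CRPS reduces to the absolute error,
    which is why it is the probabilistic analog of the MAE.
    """
    z = (np.asarray(y) - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# Example: CRPS of a 48-h T2 forecast of N(12.0, 2.0^2) verifying at 14.5 C
# print(crps_gaussian(12.0, 2.0, 14.5))
```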
Bayesian Model Averaging (BMA)
The BMA predictive PDF is a weighted sum of kernels centered on bias-corrected member forecasts:
p(y | f_1, …, f_K) = Σ_k w_k N(y | a_k + b_k f_k, σ²)
where a_k and b_k are member-specific mean-bias correction parameters, the w_k are member-specific BMA weights, and σ² is the BMA variance (not member-specific here, but it can be).

Bayesian Model Averaging (BMA) Summary
• BMA has several advantages over MEC:
  • A time-varying uncertainty forecast.
  • A way to keep multi-modality, if it is warranted.
  • Maximizes information from short (2–4 week) training periods.
  • Allows for different relative skill between members through the BMA weights (multi-model, multi-scheme physics).
[cf. Raftery et al. 2005, Mon. Wea. Rev.]
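A minimal sketch of evaluating the BMA predictive distribution for temperature with Gaussian kernels, as in Raftery et al. (2005); fitting the weights and variance by EM is omitted, and all parameter names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def bma_pdf(y, fcsts, a, b, w, sigma):
    """BMA predictive density: sum_k w_k * N(y | a_k + b_k * f_k, sigma^2).

    fcsts, a, b, w : length-K arrays (member forecasts, bias parameters, weights)
    sigma          : common BMA spread (could also be member-specific)
    """
    means = np.asarray(a) + np.asarray(b) * np.asarray(fcsts)
    return np.sum(np.asarray(w) * norm.pdf(y, loc=means, scale=sigma))

def bma_quantile(p, fcsts, a, b, w, sigma, lo=-60.0, hi=60.0):
    """Invert the mixture CDF by bisection, e.g. for a 90% central interval.

    Assumes the quantile lies within [lo, hi] (reasonable for 2-m temperature in C).
    """
    means = np.asarray(a) + np.asarray(b) * np.asarray(fcsts)
    cdf = lambda y: np.sum(np.asarray(w) * norm.cdf(y, loc=means, scale=sigma))
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return 0.5 * (lo + hi)
```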
BMA Performance Using Analyses
• BMA was initially implemented using training data from the entire UWME 12-km domain (Raftery et al. 2005, MWR).
  • No regional variation of BMA weights or variance parameters.
  • Used observations as truth.
• After several attempts to implement BMA with local or regional training data, using NCEP RUC 20-km analyses as truth, we found that selecting the training data from a neighborhood of grid points with similar land-use type and elevation produced excellent results.
• The example application to 48-h 2-m temperature forecasts uses only 14 training days.
BMA-Neighbor* Calibration and Sharpness
• Calibration: probability integral transform (PIT) histograms, the continuous-forecast analog of verification rank histograms, for FIT, BMA, and MEC.
• Sharpness: BMA vs. MEC.
*Neighbors have the same land-use type and an elevation difference < 200 m, within a search radius of 3 grid points (60 km).
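The PIT value is simply the predictive CDF evaluated at the verifying observation; a sketch for the Gaussian-mixture case above (input layout is an assumption for illustration):

```python
import numpy as np
from scipy.stats import norm

def pit_values(means, weights, sigma, obs):
    """PIT = F(obs) under each case's BMA mixture; a flat histogram => calibrated.

    means   : (n_cases, K) bias-corrected member means a_k + b_k * f_k
    weights : (K,) BMA weights
    obs     : (n_cases,) verifying observations
    """
    return np.sum(weights * norm.cdf(obs[:, None], loc=means, scale=sigma), axis=1)

# np.histogram(pit_values(...), bins=10, range=(0, 1)) plays the role of the rank histogram.
```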
BMA-Neighbor* CRPS Improvement
[Figure: BMA improvement over MEC in CRPS]
*Neighbors have the same land-use type and an elevation difference < 200 m, within a search radius of 3 grid points (60 km).
BMA-Neighbor Using Observations
• Use observations, remote if necessary, to train BMA.
• Follow the Mass-Wedam procedure for bias correction to select the BMA training data:
  1. Choose the N closest observing locations to the center of the grid box that have similar elevation and land-use characteristics.
  2. Find the K occasions during a recent period (up to Kmax days previous) on which the interpolated forecast state was similar to the current interpolated forecast state at each station n = 1, …, N.
     • Similar ensemble-mean forecast states.
     • Similar min/median/max ensemble forecast states.
  3. If N×K matches are not found, relax the similarity constraints and repeat (1) and (2).
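A rough sketch of steps (1)-(2); the station metadata fields, similarity thresholds, and matching rule are illustrative assumptions, not the operational Mass-Wedam settings:

```python
import numpy as np

def select_training_data(stations, fcst_hist, fcst_now, grid_xy, grid_elev, grid_lu,
                         N=5, K=10, dz=200.0, dT=2.0):
    """Pick N similar nearby stations, then K analog days.

    stations  : list of dicts with 'xy', 'elev', 'landuse' (and an obs history)
    fcst_hist : past interpolated ensemble-mean forecasts at the grid box
    fcst_now  : current interpolated ensemble-mean forecast
    """
    # (1) N closest stations with similar elevation and land-use type
    ok = [s for s in stations
          if s['landuse'] == grid_lu and abs(s['elev'] - grid_elev) < dz]
    ok.sort(key=lambda s: np.hypot(*(np.asarray(s['xy']) - np.asarray(grid_xy))))
    picked = ok[:N]

    # (2) K past days whose interpolated forecast state resembles today's
    analog_days = [t for t in range(len(fcst_hist))
                   if abs(fcst_hist[t] - fcst_now) < dT][:K]

    # (3) relaxing dz/dT when fewer than N*K pairs are found is omitted here
    return [(s, t) for s in picked for t in analog_days]
```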
Summary and the Way Forward
• Mean error climatology (MEC)
  • A good benchmark for evaluating competing calibration methods.
  • Generally beats a raw ensemble, even though it is not state-dependent.
  • The ensemble mean contains most of the information we can use.
  • The ensemble variance (state-dependent) is generally a poor prediction of uncertainty, at least on the mesoscale.
• Bayesian model averaging (BMA)
  • A calibration method that is becoming popular (CMC-MSC).
  • A calibration method that meets many of the constraints that FNMOC and AFWA will face with JEFS:
    • It accounts for differing relative skill of ensemble members (multi-model, multi-scheme physics).
    • It is adaptive (short training period).
    • It can be rapidly relocated to any theatre.
    • It can be extended to any observed variable at any vertical level (although research is ongoing on this point).
Extending BMA to Non-Gaussian Variables
• For quantities such as wind speed and precipitation, the distributions are not only non-Gaussian but also not purely continuous: there are point masses at zero.
• For probabilistic quantitative precipitation forecasts (PQPF):
  • Model P(Y = 0) with a logistic regression.
  • Model P(Y > 0) with a finite gamma mixture distribution.
  • Fit the gamma means as a linear regression of the cube root of the observation on the forecast and an indicator function for no precipitation.
  • Fit the gamma variance parameters and BMA weights by the EM algorithm, with some modifications.
[cf. Sloughter et al. 200x, manuscript in preparation]
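A sketch of one member's contribution to this mixed discrete-continuous predictive distribution. The regression forms and coefficient names are illustrative (and, unlike the paper's formulation, the gamma is applied directly to the precipitation amount rather than its cube root); the coefficients are assumed to be pre-fitted, e.g. by the modified EM algorithm mentioned above:

```python
import numpy as np
from scipy.stats import gamma

def member_cdf(y, f, p0_coefs, mean_coefs, var_coefs):
    """CDF of one member's PQPF component: point mass at zero + gamma above zero.

    p0 comes from a logistic regression on the cube root of the forecast f;
    the gamma mean and variance are linear in (transformed) f.
    Assumes the fitted mean and variance are positive.
    """
    f3 = np.cbrt(f)
    p0 = 1.0 / (1.0 + np.exp(-(p0_coefs[0] + p0_coefs[1] * f3)))   # P(Y = 0)
    mu = mean_coefs[0] + mean_coefs[1] * f3                         # gamma mean
    v = var_coefs[0] + var_coefs[1] * f                             # gamma variance
    shape, scale = mu**2 / v, v / mu
    if y < 0:
        return 0.0
    return p0 + (1.0 - p0) * gamma.cdf(y, a=shape, scale=scale)

# The full BMA PQPF CDF is the weighted sum of member_cdf over the K members.
```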
PoP Reliability Diagrams
• Results for January 1, 2003 through December 31, 2004: 24-hour-accumulation PoP forecasts, with 25-day training and no regional parameter variations.
• Ensemble consensus voting shown as crosses; the BMA PQPF model as red dots.
[cf. Sloughter et al. 200x, manuscript in preparation]
PQPF Rank Histograms
[Figures: verification rank histogram and PIT histogram]
[cf. Sloughter et al. 200x, manuscript in preparation]
Forecast Probability Skill Example
Brier Skill Score (BSS) vs. lead time for the forecast probability of the event: 10-m wind speed (WS10) > 18 kt.
[Figure: BSS vs. lead time for UWME, UWME+, *UWME, and *UWME+; * = bias-corrected; higher BSS is better]
BSS = 1: perfect; BSS < 0: worthless.
(0000 UTC cycle; October 2002 – March 2003) Eckel and Mass 2005
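For reference, a minimal sketch of the Brier skill score for a binary event, using the sample climatological base rate as the reference forecast (an assumption; any reference Brier score could be substituted):

```python
import numpy as np

def brier_skill_score(prob_fcsts, outcomes):
    """BSS = 1 - BS / BS_climo for a binary event (e.g., WS10 > 18 kt).

    prob_fcsts : forecast probabilities in [0, 1]
    outcomes   : 1 if the event occurred, else 0
    """
    p = np.asarray(prob_fcsts, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    bs = np.mean((p - o) ** 2)                 # Brier score of the forecasts
    bs_climo = np.mean((o.mean() - o) ** 2)    # constant climatological probability
    return 1.0 - bs / bs_climo
```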
UWME: Multi-Analysis/Forecast Collection

Abbreviation, Model, Source | Type | Computational resolution (~@ 45 N) | Distributed resolution | Objective analysis
GFS, Global Forecast System (GFS), National Centers for Environmental Prediction | Spectral | T382 / L64 (~35 km) | 1.0° / L14 (~80 km) | SSI (3D-Var)
CMCG, Global Environmental Multi-scale (GEM), Canadian Meteorological Centre | Finite Diff. | 0.9° / L28 (~70 km) | 1.25° / L11 (~100 km) | 4D-Var
ETA, North American Mesoscale limited-area model, National Centers for Environmental Prediction | Finite Diff. | 12 km / L45 | 90 km / L37 | SSI (3D-Var)
GASP, Global AnalysiS and Prediction model, Australian Bureau of Meteorology | Spectral | T239 / L29 (~60 km) | 1.0° / L11 (~80 km) | 3D-Var
JMA, Global Spectral Model (GSM), Japan Meteorological Agency | Spectral | T213 / L40 (~65 km) | 1.25° / L13 (~100 km) | 4D-Var
NGPS, Navy Operational Global Atmos. Pred. Sys., Fleet Numerical Meteorological & Oceanographic Cntr. | Spectral | T239 / L30 (~60 km) | 1.0° / L14 (~80 km) | 3D-Var
TCWB, Global Forecast System, Taiwan Central Weather Bureau | Spectral | T79 / L18 (~180 km) | 1.0° / L11 (~80 km) | OI
UKMO, Unified Model, United Kingdom Meteorological Office | Finite Diff. | 5/6° × 5/9° / L30 (~60 km) | same / L12 | 4D-Var
UWME / UWME+: MM5 Physics Configurations (January 2005 - current)

Member | PBL | Soil / LSM | Vertical diffusion | Microphysics | Cumulus (36-km domain) | Cumulus (12-km domain) | Shallow cumulus | Cloud radiation | SST perturbation | Land-use table
UWME (all members) | MRF | 5-Layer | Y | Reisner II | Kain-Fritsch | Kain-Fritsch | N | CCM2 | none | default
GFS+ | MRF | LSM | Y | Simple Ice | Kain-Fritsch | Kain-Fritsch | Y | RRTM | SST_pert01 | LANDUSE.plus1
CMCG+ | MRF | 5-Layer | Y | Reisner II | Grell | Grell | N | cloud | SST_pert02 | LANDUSE.plus2
ETA+ | Eta | 5-Layer | N | Goddard | Betts-Miller | Grell | Y | RRTM | SST_pert03 | LANDUSE.plus3
GASP+ | MRF | LSM | Y | Shultz | Betts-Miller | Kain-Fritsch | N | RRTM | SST_pert04 | LANDUSE.plus4
JMA+ | Eta | LSM | N | Reisner II | Kain-Fritsch | Kain-Fritsch | Y | cloud | SST_pert05 | LANDUSE.plus5
NGPS+ | Blackadar | 5-Layer | Y | Shultz | Grell | Grell | N | RRTM | SST_pert06 | LANDUSE.plus6
TCWB+ | Blackadar | 5-Layer | Y | Goddard | Betts-Miller | Grell | Y | cloud | SST_pert07 | LANDUSE.plus7
UKMO+ | Eta | LSM | N | Reisner I | Kain-Fritsch | Kain-Fritsch | N | cloud | SST_pert08 | LANDUSE.plus8

Assumed differences between model physics options approximate the model error coming from sub-grid scales. Surface boundary parameters were perturbed according to their suspected uncertainty: 1) albedo, 2) roughness length, 3) moisture availability.
Member-Wise Forecast Bias Correction
UWME+ 2-m temperature (raw members).
[Figure: average RMSE (°C) and average bias (shaded) at 12-, 24-, 36-, and 48-h lead times for GFS+, CMCG+, ETA+, GASP+, JMA+, NGPS+, TCWB+, UKMO+, and MEAN+]
(0000 UTC cycle; October 2002 – March 2003) Eckel and Mass 2005
Member-Wise Forecast Bias Correction
UWME+ 2-m temperature, after the 14-day running-mean bias correction.
[Figure: average RMSE (°C) and average bias (shaded) at 12-, 24-, 36-, and 48-h lead times for *GFS+, *CMCG+, *ETA+, *GASP+, *JMA+, *NGPS+, *TCWB+, *UKMO+, and *MEAN+]
(0000 UTC cycle; October 2002 – March 2003) Eckel and Mass 2005
Post-Processing: Probability Densities
[Figure: sample ensemble forecasts]
Q: How should we infer forecast probability density functions from a finite ensemble of forecasts?
A: Some options are:
• Democratic Voting (DV)
  • P = x / M, where x = # of members above (or below) the threshold and M = total # of members
• Uniform Ranks (UR)***
  • Assume flat rank histograms
  • Linear interpolation of the DV probabilities between adjacent member forecasts
  • Extrapolation beyond the ensemble range using a fitted Gumbel (extreme-value) distribution
• Parametric Fitting (FIT)
  • Fit a statistical distribution (e.g., normal) to the member forecasts
***currently the operational scheme
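A minimal sketch of the DV and FIT options for an exceedance probability (UR's interpolation and Gumbel tails are omitted; the example numbers are made up):

```python
import numpy as np
from scipy.stats import norm

def prob_exceed_dv(members, threshold):
    """Democratic voting: fraction of members above the threshold."""
    m = np.asarray(members, dtype=float)
    return np.mean(m > threshold)

def prob_exceed_fit(members, threshold):
    """Parametric fit: exceedance probability of a normal distribution fit to the members."""
    m = np.asarray(members, dtype=float)
    return 1.0 - norm.cdf(threshold, loc=m.mean(), scale=m.std(ddof=1))

# e.g. an 8-member WS10 forecast (kt) against an 18-kt threshold
ws10 = [12.0, 15.5, 17.2, 18.4, 19.0, 20.3, 16.8, 21.1]
print(prob_exceed_dv(ws10, 18.0), prob_exceed_fit(ws10, 18.0))
```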
A Concrete Example
A Concrete Example
Minimize misses vs. minimize false alarms.
How to Model Zeroes
[Figure: logit of the proportion of rain versus the cube root of the bin center]
How to Model Non-Zeroes
[Figure: mean (left) and variance (right) of the fitted gammas on each bin]
Power-Transformed Obs
[Figures: observed precipitation distributions, untransformed and under square-root, cube-root, and fourth-root transforms]
A Possible Fix
• Try a more complicated model: fit a point mass at zero, an exponential for “drizzle,” and a gamma for true rain around each member forecast.
[Figure legend: red = no rain, green = drizzle, blue = rain]