This study explores the benefits of using multiple models for 2m temperature prediction, combining models of varying complexity to enhance forecast accuracy. The ensembling methods discussed include raw model probabilities and recalibrated PDF probabilities, combined either as a pooled model ensemble or as a performance-based model ensemble. Techniques such as Bayesian combination, multiple linear regression, and canonical variate analysis are employed to refine the model ensembles and to better represent forecast uncertainty. The research analyzes the impact of different combination methods on prediction reliability and resolution, emphasizing the advantages of recalibrating individual models. The conclusions highlight the reliability of multi-model ensembles and the importance of choosing an approach suited to the forecast requirements.
Multi-Model Ensembling for Seasonal-to-Interannual Prediction: From Simple to Complex
Lisa Goddard and Simon Mason
International Research Institute for Climate & Society
The Earth Institute of Columbia University
Benefit of Using Multiple Models
RPSS for 2m Temperature (JFM 1950-1995)
Combining models reduces the deficiencies of individual models.
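For reference, RPSS is the ranked probability skill score. Its standard definition (not spelled out on the slides) for $K$ categories, with cumulative forecast probabilities $F_k$ and cumulative observation indicators $O_k$, is

$$\mathrm{RPS} = \sum_{k=1}^{K}\left(F_k - O_k\right)^2, \qquad \mathrm{RPSS} = 1 - \frac{\overline{\mathrm{RPS}}}{\overline{\mathrm{RPS}}_{\mathrm{clim}}},$$

where the overbars denote averages over all forecasts and the subscript clim refers to the climatological forecast. Positive RPSS means the forecast beats climatology.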
Varying Complexity in Building MM
Refining:
(1) RAW MODEL PROBABILITIES (simple)
- Tercile thresholds determined by model history
- Probabilities obtained by counting ensemble members in each category (see the sketch below)
(2) RECALIBRATED PDF PROBABILITIES (less simple)
- Contingency-table recalibration (CT): categorical probabilities determined by the category of the ensemble mean
- Uncertainty in forecast PDFs based on the MSE of the ensemble mean
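As a concrete illustration of the counting approach, here is a minimal Python sketch (function and array names are assumptions for illustration, not the authors' code): tercile thresholds come from the model's own history, and a category's probability is simply the fraction of ensemble members falling in it.

```python
import numpy as np

def tercile_counting_probs(hindcast, members):
    """Raw 'counting' probabilities for a 3-category (tercile) forecast.

    hindcast: 1-D array of the model's historical values, used to set
              the tercile thresholds ('determined by model history').
    members:  1-D array of ensemble-member values for one forecast.
    Returns (p_below, p_normal, p_above).
    """
    lo, hi = np.percentile(hindcast, [100 / 3, 200 / 3])  # model-history terciles
    n = len(members)
    p_below = np.sum(members < lo) / n
    p_above = np.sum(members > hi) / n
    return p_below, 1.0 - p_below - p_above, p_above
```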
Varying Complexity in Building MM
Combining:
(1) POOLED MM ENSEMBLE (simple)
- Each model weighted equally
(2) PERFORMANCE-BASED MM ENSEMBLE (less simple)
- Bayesian: determine optimal weights for AGCMs & climatology by maximizing the likelihood
- Multiple linear regression (MLR): obtain probabilities from the prediction error variance, using the first few moments of the ensemble distributions (see the sketch after this list)
- Canonical variates (CV): maximize discrimination between categories, using the first few moments of the ensemble distributions
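The MLR bullet can be read as follows: regress the observations on the models' ensemble means, take the residual variance as the prediction error variance, and read tercile probabilities off the implied Gaussian. A minimal sketch under those assumptions (names and details are illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.stats import norm

def mlr_tercile_probs(ens_means, obs, ens_means_fcst):
    """Tercile probabilities from an MLR of obs on model ensemble means.

    ens_means:      (n_years, n_models) hindcast ensemble means
    obs:            (n_years,) verifying observations
    ens_means_fcst: (n_models,) ensemble means for the forecast year
    """
    X = np.column_stack([np.ones(len(obs)), ens_means])
    coef, *_ = np.linalg.lstsq(X, obs, rcond=None)
    resid = obs - X @ coef
    sigma = resid.std(ddof=X.shape[1])               # prediction error spread
    mu = np.concatenate([[1.0], ens_means_fcst]) @ coef
    lo, hi = np.percentile(obs, [100 / 3, 200 / 3])  # observed terciles
    p_below = norm.cdf(lo, mu, sigma)
    p_above = 1.0 - norm.cdf(hi, mu, sigma)
    return p_below, 1.0 - p_below - p_above, p_above
```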
Bayesian Model Combination
Combining: based on model performance
Climatological forecast = “prior”; GCM forecast = “evidence”. Combine the “prior” and the “evidence” to produce weighted “posterior” forecast probabilities, by maximizing the likelihood.
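A minimal sketch of the likelihood-maximizing idea, reduced to a single weight that trades off the climatological “prior” against one GCM's “evidence” (the actual scheme weights several AGCMs jointly; names here are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def bayesian_weight(model_probs, obs_cat):
    """Weight w on the GCM forecast (1 - w on climatology) that maximizes
    the likelihood of the observed categories over the training period.

    model_probs: (n_years, 3) categorical forecast probabilities
    obs_cat:     (n_years,) observed category indices (0, 1, 2)
    """
    clim = 1.0 / 3.0  # climatological 'prior' probability of each tercile

    def neg_log_lik(w):
        post = w * model_probs + (1.0 - w) * clim  # weighted 'posterior'
        post = np.clip(post, 1e-10, 1.0)           # guard against log(0)
        return -np.sum(np.log(post[np.arange(len(obs_cat)), obs_cat]))

    return minimize(neg_log_lik, x0=0.5, bounds=[(0.0, 1.0)]).x[0]
```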
Canonical Variate Analysis
Combining: based on observations relative to the model
The canonical variates are defined to maximize the ratio of the between-category to within-category variance.
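In symbols (a textbook statement of the criterion, with $\mathbf{B}$ the between-category and $\mathbf{W}$ the within-category covariance matrix), each canonical variate's weight vector $\mathbf{w}$ maximizes

$$\lambda(\mathbf{w}) = \frac{\mathbf{w}^{\top}\mathbf{B}\,\mathbf{w}}{\mathbf{w}^{\top}\mathbf{W}\,\mathbf{w}},$$

which is solved by the generalized eigenproblem $\mathbf{B}\mathbf{w} = \lambda\,\mathbf{W}\mathbf{w}$.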
Equal Weighting
Probabilistic forecasts were obtained by counting the number of ensemble members beyond the outer tercile thresholds, and then averaging across the three models. The pooled ensemble is thus an equally-weighted combination of predictions uncorrected for model skill (although corrected for model drift). Reliability is good for all three categories. A minimal sketch of this pooling step follows.
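The sketch below assumes each model's counting probabilities have already been computed as above (values are illustrative only):

```python
import numpy as np

# Each row: one model's (below, normal, above) counting probabilities
# for the same forecast.
model_probs = np.array([[0.5, 0.3, 0.2],
                        [0.4, 0.4, 0.2],
                        [0.6, 0.2, 0.2]])

# Pooled multi-model forecast: equal weight per model, i.e. a plain average.
pooled = model_probs.mean(axis=0)
print(pooled)  # -> [0.5  0.3  0.2]
```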
Data
AGCMs: simulations for 1950-2000
* CCM3 (NCAR) – 24 runs
* ECHAM4.5 (MPI) – 24 runs
* ECPC (Scripps) – 10 runs
* GFDL AM2p12b – 10 runs
* NCEP-MRF9 (NCEP/QDNR) – 10 runs
* NSIPP1 (NASA-GSFC) – 9 runs
Observations: 2m air temperature and precipitation from CRU-UEA (v2.0)
Effect of Probability Treatment
JFM 2m air temperature over land
Conclusions II • Reliability of N models pooled together, with uncalibrated PDFs, is better than any individual AGCM. • Gaussian (PDF) recalibration gives some improvement, but Bayesian recalibration gives the greatest benefit. • Reliability is typically gained at the expense of resolution.
ISSUES
• Number of models
• Length of training period
• When is simple complex enough?
Effect of # of Models
3 vs. 6 AGCMs; 45-year training period
Different approaches are more similar with more models. (Robertson et al., 2004, MWR)
RPSS for 2m Temperature
Bayesian MM from Raw Probs. – 6 models, 45-yr training
(Panels: Jan-Feb-Mar, Jul-Aug-Sep)
RPSS for Precipitation
Bayesian & Pooled MM from Raw Probs. – 6 models, 45-yr training
(Panels: Jan-Feb-Mar, Jul-Aug-Sep)
Reliability Diagrams
* Several methods yield similar results over the United States.
* MMs are remarkably reliable over the US, even though the accuracy is not high.
CONCLUSIONS III
• MM simulations over the US are remarkably reliable, even if they're not terribly accurate.
• Simple pooling of the AGCMs, with uncalibrated probabilities, is equivalent to any of our techniques over the U.S.
• Doesn't require a long history, but a large number of models (>5?) is desirable.
GRAND CONCLUSIONS
• Overall, we find that recalibrating individual models gives better results than putting models together with a complex combination algorithm.
• In comparing different recalibration/combination methods, we find that a gain in reliability is generally countered by a loss in resolution.
• More complicated approaches are not necessarily better. This needs to be investigated for different forecast situations (i.e., variables, regions, seasons).
Ranked Probability Skill Scores: Temperature, Jan-Feb-Mar (1950-1995)
Ranked Probability Skill Scores: Precipitation, Jul-Aug-Sep (1950-1999)
• Comparing treatment of probability
- Even with 6 models, there are regions of large negative RPSS, suggesting common model errors
- Recalibration reduces, but does not eliminate, the large errors
- Some improvement of positive skill
(Panel: Recal – Raw)
Ranked Probability Skill Scores: Precipitation, Jul-Aug-Sep (1950-1999)
• Comparing combination methods
- Performance-based combination eliminates the large errors
- More improvement of positive skill
- More cases of negative skill turned to positive skill
Canonical Variate Analysis
A number of statistical techniques involve calculating linear combinations (weighted sums) of variables. The weights are defined to achieve specific objectives:
• PCA – weighted sums maximize variance
• CCA – weighted sums maximize correlation
• CVA – weighted sums maximize discrimination
A compact sketch of the CVA computation follows.
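To make the contrast concrete: PCA diagonalizes a single covariance matrix, whereas CVA solves a generalized eigenproblem involving the between- and within-category covariance matrices. A minimal Python sketch of that standard formulation (toy interface, not the authors' code):

```python
import numpy as np
from scipy.linalg import eigh

def canonical_variates(X, labels):
    """Weight vectors maximizing between- to within-category variance.

    X:      (n_samples, n_features) predictor matrix
    labels: (n_samples,) category index for each sample
    Returns weights, one column per canonical variate (best first).
    """
    grand_mean = X.mean(axis=0)
    n_feat = X.shape[1]
    B = np.zeros((n_feat, n_feat))  # between-category covariance
    W = np.zeros((n_feat, n_feat))  # within-category covariance
    for c in np.unique(labels):
        Xc = X[labels == c]
        d = (Xc.mean(axis=0) - grand_mean)[:, None]
        B += len(Xc) * (d @ d.T)
        W += (len(Xc) - 1) * np.cov(Xc, rowvar=False)
    # Generalized eigenproblem B w = lambda W w; eigh returns ascending order.
    _, vecs = eigh(B, W)
    return vecs[:, ::-1]
```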