Methods of Multi-Model Consolidation, with Emphasis on the Recommended Cross Validation Approach
Huug van den Dool, CTB seminar, May 11, 2009
Acknowledgement: Malaquias Pena, Ake Johansson, Wanqiu Wang, Tony Barnston, Suranjana Saha
Traditional Anomaly Correlation
F' = (F - Cobs)   A' = (A - Cobs)   (Forecast and verifying Analysis anomalies, relative to observed Climatology)
AC = Σ F'A' / (Σ F'F' Σ A'A')^1/2
Summation is in space, or in space and time. Weighting may be involved.
Cobs is known at the time the forecast is made, i.e. determined from previous data. A (and obviously F) are not part of the sample from which Cobs is calculated.
Relationship of AC (skill) to MSE. AC is calculated from 'raw' data.
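The traditional AC above can be sketched in a few lines of Python (an illustration added to the slides' text; the grid values are invented, not from the talk):

```python
import numpy as np

def anomaly_correlation(F, A, C_obs):
    """Traditional AC: forecast and analysis anomalies are both
    taken relative to the observed climatology Cobs."""
    Fp = F - C_obs                      # F' = F - Cobs
    Ap = A - C_obs                      # A' = A - Cobs
    return np.sum(Fp * Ap) / np.sqrt(np.sum(Fp * Fp) * np.sum(Ap * Ap))

# Toy spatial grid of 3 points; numbers are made up for illustration.
C_obs = np.array([27.0, 26.5, 28.0])
F     = np.array([27.5, 26.0, 28.5])
A     = np.array([27.4, 26.2, 28.3])
ac = anomaly_correlation(F, A, C_obs)
```

A perfect forecast (F identical to A) gives AC = 1 by construction, which is a quick sanity check on the formula.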
New trend due to the availability of hindcast data sets: F'' = (F - Cmdl), A' = (A - Cobs), and Cobs tends to be calculated from data that matches the model data.
Short-Cut Anomaly Correlation
F'' = (F - Cmdl)   A' = (A - Cobs)
ACsc = Σ F''A' / (Σ F''F'' Σ A'A')^1/2
F'' = (F - Cmdl) = (F - Cobs) - (Cmdl - Cobs), i.e.
F'' = F' - (Cmdl - Cobs)   (1)
Using F'' amounts to a systematic error correction (SEC), which requires a cross-validation (CV) to be honest.
{{ Eq (1) becomes more involved if the periods for Cmdl and Cobs are not the same. }}
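The short-cut AC and the identity in Eq (1) can be written out as follows (a sketch added for clarity; the scalar climatologies and toy numbers are assumptions, not from the talk):

```python
import numpy as np

def shortcut_ac(F, A, C_mdl, C_obs):
    """Short-cut AC: forecast anomalies relative to the model climatology
    (implicitly a systematic error correction), analysis anomalies
    relative to the observed climatology."""
    Fpp = F - C_mdl                     # F'' = F - Cmdl
    Ap  = A - C_obs                     # A'  = A - Cobs
    return np.sum(Fpp * Ap) / np.sqrt(np.sum(Fpp * Fpp) * np.sum(Ap * Ap))

# Eq (1): F'' = F' - (Cmdl - Cobs), checked numerically on toy data.
F, C_mdl, C_obs = np.array([25.5, 25.9, 23.8]), 24.7, 27.2
A = np.array([26.8, 28.1, 27.1])
assert np.allclose(F - C_mdl, (F - C_obs) - (C_mdl - C_obs))
acsc = shortcut_ac(F, A, C_mdl, C_obs)
```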
Why do we need CV? • To obtain an estimate of skill on future (independent) data. While there is no substitute for real-time forecasts on future data, a CV procedure attempts to help us out (without having to wait too long) • Leaving N years out of a sample of M creates N independent data points. Or does it?? • Details of CV procedures used by authors are exceedingly ad hoc and often wrong • We recommend 3CVRE
Meaning of 3CVRE • Leave 3 years out (3 as a minimum) • R: Leave 3 years out, namely the test year plus two others chosen at Random, see example • E: Use ‘External’ observed climatology, not an observed climatology that changes in response to leaving out a particular set of 3 years.
Example 1981-2001. Three years left out. The first year is the test year; the other two are picked at random.

test year   years also left out
1981        1985 1989
1982        2000 1989
1983        1990 1998
1984        1993 1981
1985        1992 1995
1986        1999 1987
1987        1996 1989
1988        1988 1989
1989        1983 1992
1990        1985 2000
1991        1990 2001
1992        1996 2001
1993        1985 1995
1994        1989 1991
1995        1986 1996
1996        1991 1990
1997        1991 1990
1998        1991 1988
1999        2001 1995
2000        2001 1991
2001        1998 1999
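A schedule of this kind can be generated along these lines (an illustrative sketch, not the original code; the seed and the rule that the two random years differ from the test year are assumptions):

```python
import random

def cvre3_schedule(years, seed=0):
    """For each test year, withhold that year plus two other
    years chosen at random (the 'R' in 3CVRE)."""
    rng = random.Random(seed)
    schedule = []
    for test_year in years:
        # Draw two distinct extra years, excluding the test year itself.
        others = rng.sample([y for y in years if y != test_year], 2)
        schedule.append((test_year, *others))
    return schedule

schedule = cvre3_schedule(list(range(1981, 2002)))   # 1981-2001
```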
Why leave three out, as opposed to just one? Two very different reasons • The Anomaly Correlation does not change between 'raw' and CV-1-out (this can be shown analytically) • CV-1-out leads to serious 'degeneracy' problems when the forecast involves a regression (as it does for an MME with unequal weights) and skill is not that high to begin with (which unfortunately applies)
M. Peña Mendez and H. van den Dool, 2008: Consolidation of Multi-Method Forecasts at CPC. J. Climate, 21, 6521-6538.
Unger, D., H. van den Dool, E. O'Lenic and D. Collins, 2009: Ensemble Regression. Mon. Wea. Rev., accepted; early online release posted January 2009, DOI: 10.1175/2008MWR2605.1
(1) CTB, (2) why do we need 'consolidation'?
Official Forecast(element, lead, location, initial month) = a * F1 + b * F2 + c * F3 + … Honest hindcasts required, 1950-present. The covariances (F1,F2), (F1,F3), (F2,F3), and (F1,A), (F2,A), (F3,A) allow solution for a, b, c (element, lead, location, initial month)
Apply to: • Monthly SST, 1981-2001, 4 starts, leads 1-5 • 9 models • Domain is 20S-20N Pacific Ocean • (gridpoints, not Nino34 index) M. Peña Mendez and H. van den Dool, 2008: Consolidation of Multi-Method Forecasts at CPC. J. Climate, 21, 6521–6538.
Table 1. Some information on the DEMETER-PLUS models. * Institutions developing these models: European Centre for Medium-Range Weather Forecasts, Max Planck Institute, Météo-France, United Kingdom Met Office, Istituto Nazionale di Geofisica e Vulcanologia, Laboratoire d'Océanographie Dynamique et de Climatologie, European Centre for Research and Advanced Training in Scientific Computation.
CON = Σ_{k=1..K} αk SSTk, i.e. a weighted mean over K model estimates. One finds the K alphas typically by minimizing the distance between CON and the observed SST.
Classic or Unconstrained Regression (UR)
The general problem of consolidation consists of finding a vector of weights, α, that minimizes the Sum of Squared Errors, SSE, given by the following expression:
SSE = (Zα - o)^T (Zα - o)   (5)
Minimization leads to the normal equations Z^T Z α = Z^T o, so the weights are formally given by
α = A^{-1} b   (6)
where A = Z^T Z is the covariance matrix, and b = Z^T o. Equation (6) is the solution for ordinary (Unconstrained) linear Regression (UR).
Why ridge regression? One of the preferred methods that: • Tries to minimize damage due to overfitting (too many coefficients from too little data) • Tries to handle co-linearity as much as possible • Has a smaller difference in correlation (MSE) between dependent and independent data
Essentially, ridging is a multiple linear regression with an additional penalty term to constrain the size of the squared weights in the minimization of SSE (5):
J = (Zα - o)^T (Zα - o) + λ α^T α   (7)
Minimization of J leads to
α = (A + λI)^{-1} b   (8)
where I is the identity matrix, and λ, the regularization (or ridging) parameter, indicates the relative weight of the penalty term. Similarities between the ridging and Bayesian approaches for determining the weights have been discussed by Hsiang (1976) and DelSole (2007). In the Bayesian view, (8) represents the posterior mean of α, based on a normal a priori parameter distribution with mean zero and covariance matrix (σ²/λ)I, where σ²I is the covariance matrix of the regression residual, assumed to be normal with mean zero.
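Equations (6) and (8) differ only in the ridging term; a minimal sketch (the synthetic co-linear "models" and the λ value are assumptions for illustration, not from the talk):

```python
import numpy as np

def consolidation_weights(Z, o, lam=0.0):
    """alpha = (Z^T Z + lam*I)^{-1} Z^T o.
    lam = 0 gives unconstrained regression, Eq. (6);
    lam > 0 gives ridge regression, Eq. (8)."""
    A = Z.T @ Z                               # covariance matrix A
    b = Z.T @ o                               # b = Z^T o
    return np.linalg.solve(A + lam * np.eye(A.shape[0]), b)

# 9 strongly co-linear "models" over 25 "years" of synthetic data:
# with that little data, ridging shrinks the unstable UR weights.
rng = np.random.default_rng(0)
o = rng.standard_normal(25)
Z = np.column_stack([o + 0.5 * rng.standard_normal(25) for _ in range(9)])
w_ur = consolidation_weights(Z, o)            # unconstrained weights
w_ri = consolidation_weights(Z, o, lam=10.0)  # shrunk ridge weights
```

For any λ > 0 the squared norm of the ridge weights is smaller than that of the UR weights, which is the sense in which ridging "constrains the size of the squared weights".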
[Figure: COR for the consolidation methods RIW, RI, RIM, UR, MMA, and Climo]
[Figure: SEC, and SEC with CV (3CVRE)]
No CV

Mdl 4   anomaly   Obs    anomaly   year   SEC
25.5      .7      26.8     -.4     1981   2.45
25.9     1.1      28.1      .9     1982   2.45
23.8     -.9      27.1     -.1     1983   2.45
23.5    -1.3      26.7     -.5     1984   2.45
24.1     -.7      26.7     -.5     1985   2.45
26.0     1.3      27.4      .2     1986   2.45
26.6     1.9      28.8     1.6     1987   2.45
23.6    -1.1      25.6    -1.6     1988   2.45
26.2     1.5      26.7     -.5     1989   2.45
25.8     1.1      27.3      .1     1990   2.45
23.5    -1.2      27.9      .7     1991   2.45
24.4     -.3      27.5      .4     1992   2.45
24.4     -.3      27.6      .4     1993   2.45
23.5    -1.2      27.3      .1     1994   2.45
22.9    -1.8      27.0     -.2     1995   2.45
25.6      .9      27.1     -.1     1996   2.45
25.8     1.1      28.9     1.7     1997   2.45
23.4    -1.3      25.9    -1.2     1998   2.45
24.5     -.2      26.3     -.8     1999   2.45
25.0      .3      26.7     -.5     2000   2.45
25.2      .4      27.3      .1     2001   2.45
24.7      .0      27.2      .0     all
3CVRE

Mdl 4   anomaly   Obs    anomaly   year   SEC
25.5      .9      26.8     -.4     1981   2.62
25.9     1.3      28.1      .9     1982   2.62
23.8     -.9      27.1     -.1     1983   2.46
23.5    -1.3      26.7     -.5     1984   2.44
24.1     -.8      26.7     -.5     1985   2.32
26.0     1.4      27.4      .2     1986   2.56
26.6     2.0      28.8     1.6     1987   2.63
23.6     -.8      25.6    -1.6     1988   2.73
26.2     1.5      26.7     -.5     1989   2.48
25.8     1.1      27.3      .1     1990   2.54
23.5    -1.2      27.9      .7     1991   2.42
24.4     -.3      27.5      .4     1992   2.49
24.4     -.5      27.6      .4     1993   2.32
23.5    -1.3      27.3      .1     1994   2.38
22.9    -1.8      27.0     -.2     1995   2.48
25.6      .9      27.1     -.1     1996   2.45
25.8     1.0      28.9     1.7     1997   2.36
23.4    -1.4      25.9    -1.2     1998   2.37
24.5     -.3      26.3     -.8     1999   2.42
25.0      .2      26.7     -.5     2000   2.41
25.2      .5      27.3      .1     2001   2.50
24.7      .0      27.2      .0     all
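The per-year SEC values in such a table can be computed in the 3CVRE spirit as follows (a sketch added for clarity; the toy record with a constant 2.45 bias and the random seed are assumptions, not the talk's data):

```python
import numpy as np

def sec_3cvre(mdl, obs, seed=0):
    """SEC for each test year, estimated from the remaining years with
    the test year plus two randomly chosen others withheld (3CVRE)."""
    rng = np.random.default_rng(seed)
    n = len(mdl)
    sec = np.empty(n)
    for i in range(n):
        # Two extra withheld years, distinct from the test year i.
        others = set(int(j) for j in
                     rng.choice([j for j in range(n) if j != i],
                                size=2, replace=False))
        keep = [j for j in range(n) if j != i and j not in others]
        sec[i] = obs[keep].mean() - mdl[keep].mean()
    return sec

# Toy 21-year record with a constant model cold bias of 2.45 degrees;
# every cross-validated SEC then recovers 2.45 exactly.
mdl = np.linspace(23.0, 26.0, 21)
obs = mdl + 2.45
sec = sec_3cvre(mdl, obs)
```

With real data the withheld years change the sample mean, which is why the SEC column varies from year to year under 3CVRE but is constant without CV.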
Conclusions MME • MMA is an improvement over individual models • It is hard to improve upon an equal-weight ensemble average (MMA). Only WestPac SST shows some improvement under ridge regression • This is caused by (very) deficient data set length. We need 5000 years, not 25. • Pooling gridpoints, pooling various start times and leads, throwing out 'bad' models upfront, and using all ensemble members helps • Equal treatment for very unequal methods is …. • RIW and COR make sense, because this is what CPC does subjectively. • As should have been expected: UR is really bad
[Figure: AC (raw), ACsc, and ACsc plus CV]
Bayesian Multimodel Strategies • Linear regression leads to unstable weights for small sample sizes. • Methods for producing more stable estimates have been proposed by van den Dool and Rukhovets (1994), Kharin and Zwiers (2002), Yun et al. (2003), and Robertson et al. (2004). • These methods are special cases of a Bayesian method, each distinguished by a different set of prior assumptions (DelSole 2007). • Some reasonable prior assumptions:
  R:0 — Weights centered about 0 and bounded in magnitude (ridge regression)
  R:MM — Weights centered about 1/K (K = # models) and bounded in magnitude
  R:MM+R — Weights centered about an optimal value and bounded in magnitude
  R:S2N — Models with small S2N (signal-to-noise) ratio tend to have small weights
  LS — Weights are unconstrained (ordinary least squares)
From Jim Kinter (Feb 2009)
If the multimodel strategy is carefully cross validated, then the simple mean beats all other investigated multimodel strategies. • Since Bayesian methods involve additional empirical parameters, proper assessment requires a two-deep cross validation procedure. This can change the conclusion about the efficacy of various Bayesian priors. • Traditional cross validation procedures are biased and incorrectly indicate that Bayesian schemes beat a simple mean. From Jim Kinter (Feb 2009)
Concluding comments CV • CV is done because ……. • Does CV lower skill??? • CV procedures are quite complicated, full of traps. (The price we pay for impatience) • Is there an all-purpose CV approach? • 1-out procedures may be problematic for several reasons • 3CVRE appears appropriate for (our) MME study.