Model dependence and an idea for post-processing multi-model ensembles

Craig H. Bishop, Naval Research Laboratory, Monterey, CA, USA
Gab Abramowitz, Climate Change Research Centre, UNSW, Australia

Outline
• What model independence is and why it matters
• Error covariances and model dependence
• The replicate Earth conceptual framework for allowing chaotic climate behaviour in model evaluation
• A replicate Earth ensemble transformation that improves the relationship between the frequency of modelled events and the likelihood of their actual occurrence

Error covariances and model dependence
• At any location and time step, let y be the observed value and x_k be the (bias corrected) modelled value from the k-th model.
• Then find the linear combination of models that has minimum error variance.
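The minimum error variance combination can be sketched numerically. This is a minimal sketch, assuming the weights are constrained to sum to one, in which case the standard solution is w = A⁻¹1 / (1ᵀA⁻¹1), where A is the model error covariance matrix. The function name, the array layout and the synthetic data are assumptions made for illustration and do not come from the slides.

```python
import numpy as np

def min_variance_weights(x, y):
    """Weights (summing to 1) that minimise the error variance of w @ x.

    x: (K, T) bias-corrected model series; y: (T,) observations.
    """
    errors = x - y                         # (K, T) model-minus-obs errors
    A = errors @ errors.T / errors.shape[1]  # (K, K) error covariance estimate
    w = np.linalg.solve(A, np.ones(x.shape[0]))  # proportional to A^{-1} 1
    return w / w.sum()                     # enforce sum(w) == 1

# Synthetic illustration (assumed data, not from the slides)
rng = np.random.default_rng(0)
y = rng.normal(size=200)
noise_scale = np.array([0.5, 1.0, 1.5, 2.0])[:, None]
x = y + noise_scale * rng.normal(size=(4, 200))

w = min_variance_weights(x, y)
mse_opt = np.mean((w @ x - y) ** 2)             # optimal combination
mse_mean = np.mean((x.mean(axis=0) - y) ** 2)   # equal-weight mean
```

In-sample, `mse_opt` cannot exceed `mse_mean`, because equal weights are one of the combinations the minimisation considers.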



How does error correlation affect multi-model mean?
• Error of multi-model mean decreases rapidly with decreasing error correlation r
• When inter-model error correlation is zero, error variance of multi-model mean is 1/K of the error variance of a single model.
• When errors are perfectly anti-correlated (r = -1), error variance of multi-model mean is zero.
Figure: Mean square error of the optimal linear combination of ensemble members as a function of the error correlation parameter r for an idealized 5 member ensemble having an error covariance matrix given by Equation (8).
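The dependence on r can be checked with a small calculation. This is a sketch under an assumed covariance form: every model has error variance sigma2 and every pair of models has the same error correlation r. Equation (8) is not reproduced in these slides, so its form may differ. Under this assumption the error variance of the equal-weight mean is sigma2 (1 + (K - 1) r) / K, and the matrix is a valid covariance only for r ≥ -1/(K - 1), which is why the r = -1 case is illustrated with K = 2.

```python
import numpy as np

def mean_error_variance(K, r, sigma2=1.0):
    """Error variance of the equal-weight mean of K models.

    Assumes an equicorrelated error covariance matrix (illustrative form).
    """
    A = sigma2 * ((1.0 - r) * np.eye(K) + r * np.ones((K, K)))
    w = np.full(K, 1.0 / K)          # equal weights
    return float(w @ A @ w)          # variance of the weighted error

v_uncorrelated = mean_error_variance(5, 0.0)   # 1/K of a single model
v_identical = mean_error_variance(5, 1.0)      # no gain from averaging
v_anticorrelated = mean_error_variance(2, -1.0)  # errors cancel exactly
```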

Weighting for model dependence - results
• HadCRUT3 5°×5° observed monthly surface temperature 1970-1999
• 24 CMIP3 models interpolated to 5°×5°
• White grid cells indicate more than 20% missing observational data

Weighting for model dependence - results
• 30 out-of-sample tests: 29 years to define weights, 1 to test – repeated for all 30 possible testing years
• 24 CMIP3 models
• HadCRUT3 5°×5° observed monthly surface temperature 1970-1999
• Performance weights obtained by assuming that inter-model error correlations are zero
Figure legend: Apply weights globally; Apply weights at each grid cell; Multi-model mean

Error correlation, independence and determinism
• If error correlations are zero, the optimal weighting reduces to performance weighting
• We’d expect zero correlation for independent random variables, but do we really want this for independent climate models?
• This would suggest a perfect, independent model would reproduce the observed data + noise
• Observed data would always be at the centre of the distribution of an ensemble of perfect models
• “truth+error” paradigm => completely deterministic, predictable climate
• Can we be sure that climate is predictable? Do we need to be sure that it is?
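The zero-correlation case in the first bullet can be written out explicitly. This is a sketch assuming the sum-to-one minimum error variance solution w = A⁻¹1 / (1ᵀA⁻¹1). The symbols A (error covariance matrix) and σ_k² (error variance of the k-th model) are notation assumed here, not taken from the slides.

```latex
A = \operatorname{diag}\left(\sigma_1^2, \dots, \sigma_K^2\right)
\quad\Longrightarrow\quad
w_k = \frac{\sigma_k^{-2}}{\sum_{j=1}^{K} \sigma_j^{-2}},
\qquad k = 1, \dots, K .
```

Each weight then depends only on that model's own error variance, which is what makes this a pure performance weighting.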

Climatic Probability Density Function (CPDF)
• CPDFs give the probability of observing ranges of values of temperature, wind, rain or even particular environmental phenomena, such as tropical cyclones or floods.
• The instantaneous CPDF gives the probability of outcomes of the system at a point in time
• Any observation of our Earth is a single random draw from such a CPDF
• If the climate system were static, the CPDF would be well approximated by historical data
• Climate-forcing greenhouse gases are rapidly increasing
• It is impossible to empirically determine the CPDF during climate change
• Climate change impact assessment requires the CPDF, not just the mean

The Replicate Earth Paradigm
• Imagine a very large number of Earth replicates that experience immeasurably similar orbital / solar / GHG forcing
• Each Earth has a different atmosphere / ocean state as a result of chaotic processes
• Behaviour across replicate Earths defines the CPDF in presence of climate change; e.g. frequency of weather categories
• Climate models can be viewed as attempts to create replicate Earths conditioned on the observations used for model development and initialization
• A perfect model’s prediction would be a random draw from the CPDF of replicate Earths

The Replicate Earth paradigm
Figure: Global mean surface temperature over the last century, expressed as an anomaly. An ensemble of climate models is represented by the yellow lines, their multi-model mean in red, and the observational record in black (originally Figure TS.23.a in IPCC AR4 Working Group 1 Technical Summary).

“Truth+error” vs “Replicate Earth” paradigms
• “Truth+error” => perfect model is obs + noise; multi-model mean of an ensemble of perfect models should converge to zero error as number of models increases
• Assumption that all processes are predictable
• “Replicate Earth” => perfect model is drawn from the same distribution as observations; multi-model mean of ensemble of perfect models will converge to mean of CPDF
To what extent do climate models approximate replicate Earths?

Two key properties of replicate Earths
• Mean of the distribution of replicate Earths (blue line) is the linear combination of replicate Earths that minimises distance from our Earth’s observations.
• Time average of the variance of replicate Earths is approximately equal to the mean square error of the observations about the CPDF mean (i.e. MSE of replicate Earth mean w.r.t. observations)


CMIP3 climate models do not look like replicate Earths
Biases and shared differences in response to forcing (e.g., greenhouse gases) mean that models are unlike replicate Earths. We can show:
• The mean of the AR4 ensemble is not the minimum error variance estimate, and
• The time average of the variance of the AR4 ensemble is not equal to the mean square error of the mean of the AR4 ensemble [nor is it equal to the error variance of the true minimum error variance estimate].

Ensemble Transformation
An ensemble created by sampling the transformed models with frequency given by their weights is like a replicate Earth ensemble in that (a) its sample mean is the minimum error variance estimate, and (b) its variance equals the error variance of the sample mean.
1. Find linear combination of models that minimizes error variance
  • Coefficients not necessarily positive
2. Rescale weights and models’ time series so that:
  • Weights are positive (interpret a weight as the probability that a model is an Earth replicate)
  • Weighted mean gives the minimum error variance estimate in 1.
  • The time average of the CPDF variance estimate equals the MSE of the weighted mean
Transformations require solving for parameters α and β.
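The three requirements on the rescaled weights and models can be met by a simple construction. The slides do not give the equations for α and β, so the parameterisation below is an assumption: one possible choice that satisfies the stated properties, not necessarily the authors' transformation. The function name, the `margin` parameter and the synthetic data are likewise hypothetical.

```python
import numpy as np

def replicate_earth_transform(x, y, w, margin=1e-3):
    """One assumed way to rescale weights and models (not from the slides).

    x: (K, T) models; y: (T,) observations; w: (K,) weights summing to 1.
    """
    K = x.shape[0]
    mu_e = w @ x                          # minimum error variance estimate
    xbar = x.mean(axis=0)                 # equal-weight mean
    # alpha: just large enough that every transformed weight is positive
    alpha = 1.0 + max(0.0, -K * w.min()) + margin
    w_t = (w + (alpha - 1.0) / K) / alpha  # still sums to 1
    x_p = xbar + alpha * (x - xbar)        # keeps w_t @ x_p == mu_e
    # beta: match time-mean weighted variance to the MSE of mu_e
    mse = np.mean((mu_e - y) ** 2)
    var_t = np.mean(w_t @ (x_p - mu_e) ** 2)
    beta = np.sqrt(mse / var_t)
    x_t = mu_e + beta * (x_p - mu_e)
    return w_t, x_t

# Synthetic illustration with a shared error component (assumed data)
rng = np.random.default_rng(1)
T, K = 300, 5
y = rng.normal(size=T)
shared = rng.normal(size=T)
scales = np.linspace(0.5, 1.5, K)[:, None]
x = y + 0.8 * shared + scales * rng.normal(size=(K, T))

errors = x - y
A = errors @ errors.T / T
w = np.linalg.solve(A, np.ones(K))
w /= w.sum()
w_t, x_t = replicate_earth_transform(x, y, w)
```

With this choice the weighted mean `w_t @ x_t` reproduces `w @ x`, and the time average of the weighted variance about that mean equals its mean square error.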

Rank Frequency Histograms for M=6
• At each grid cell and time, take n samples from a uniform distribution on [0,1] to select an n-model ensemble
• Vary size of ensemble to achieve flat histograms
• Flat line (zero slope) indicates that ensemble frequencies give reliable probabilistic forecasts
• Raw and bias corrected ensembles were found not to give reliable probabilistic forecasts for any ensemble size
• The most accurate forecast (orange line), which is based on differing weights for each grid cell, was found to give approximately reliable probabilistic forecasts for M=6 and M=5. For smaller values of M, the extreme ranks were under-populated by the verifying observation. For larger values of M, the extreme ranks were under-populated. Does this mean the effective local ensemble size is about 5.5?
• The ensemble forecast based on one set of global weights gave its flattest RFH for M=9.
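Selecting ensemble members from uniform draws can be sketched as follows. This assumes each uniform sample is mapped to a model through the cumulative sum of the model weights, so that a model is chosen with probability equal to its weight. The slides do not spell out the mapping, and the function name is hypothetical.

```python
import numpy as np

def sample_members(weights, m, rng):
    """Pick m model indices, each with probability given by `weights`."""
    cdf = np.cumsum(weights)                 # cumulative weights in [0, 1]
    u = rng.uniform(0.0, 1.0, size=m)        # m draws from U[0, 1)
    idx = np.searchsorted(cdf, u, side="right")
    return np.minimum(idx, len(weights) - 1)  # guard against rounding in cdf

rng = np.random.default_rng(2)
# All weight on the middle model, so every draw must select index 1
picked = sample_members(np.array([0.0, 1.0, 0.0]), 6, rng)
```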

Rank Frequency Histograms

Conclusions
• Model dependence crucial to climate prediction accuracy
• Defining independence as zero error correlation implies climate is entirely predictable
• Replicate Earth paradigm:
  • Observations not at the centre of the distribution
  • Perfect, independent model drawn from the same distribution as observations
  • CPDF is formally unobservable in the presence of a changing climate – models provide the only estimate of the replicate Earth ensemble
  • Provides a framework for understanding role of chaos in climate prediction
• Ensemble post-processing to give replicate-Earth-like models:
  • Marked reduction in RMSE of prediction
  • Flatter rank frequency histograms – needed for climate change impact assessments
• Future work: Use framework to make climate predictions:
  • Following suggestion from Kevin Bowman (JPL) perhaps focus on error measure more directly related to climate sensitivity: OLR, difference between Spring and Autumn snow cover, any suggestions?

The effective number of climate models
Case 1: truth + error paradigm:
• Consider 24 independent random time series with mean = 0 and variance = 1:
  • Mean of time series will have variance 1/24
• Now consider 24 time series of model error
  • Zero correlation (‘independent’) => multi-model mean error variance should drop to ~ 1/24 of average ensemble member error variance
  • Ratio of actual error variance to this expected value gives effective number of models
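The Case 1 argument can be tried on synthetic errors. This sketch adopts one reading of the last bullet, which is an assumption: the effective number of models is taken as the average member error variance divided by the error variance of the multi-model mean, so that uncorrelated errors give roughly the actual number of models and identical errors give one. The function name and data are hypothetical.

```python
import numpy as np

def effective_number_of_models(errors):
    """errors: (K, T) array of model-minus-observation errors."""
    member_var = np.mean(errors ** 2, axis=1).mean()   # average member error variance
    mean_var = np.mean(errors.mean(axis=0) ** 2)       # multi-model mean error variance
    return member_var / mean_var

rng = np.random.default_rng(4)
independent = rng.normal(size=(24, 20000))             # uncorrelated errors
duplicated = np.tile(rng.normal(size=20000), (24, 1))  # 24 copies of one error series

k_indep = effective_number_of_models(independent)  # close to 24
k_dup = effective_number_of_models(duplicated)     # exactly one effective model
```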

The effective number of climate models
Case 2: replicate Earth paradigm, based on perturbed weights:
• If models were replicate Earths, minimum error variance linear combination would be obtained using weights = 1/24
• Model dependence => significant error correlations; this is evident in weights:
  • Some weights nearly zero
  • Some weights > 1/24

Rank frequency histogram (out of sample)
• How can we tell if the transformation works?
• For a single grid cell, at a single time step, what is the rank of the observed value in the observed + model set?
• Perturbed models behave more like replicate Earths than raw or bias corrected models (flatter histogram)
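The rank calculation can be sketched as below. It is a minimal illustration with synthetic data in which the observation and the ensemble members are drawn from the same distribution, so the histogram should come out close to flat. The function name and array layout are assumptions, and ties are ignored.

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """Relative frequency of each rank of obs within the obs + ensemble set.

    ensemble: (M, N) member values for N cases; obs: (N,) observations.
    """
    M = ensemble.shape[0]
    ranks = np.sum(ensemble < obs, axis=0)        # members below obs: 0..M
    return np.bincount(ranks, minlength=M + 1) / obs.size

rng = np.random.default_rng(5)
ensemble = rng.normal(size=(5, 60000))   # 5 members, same distribution as obs
obs = rng.normal(size=60000)
freq = rank_histogram(ensemble, obs)     # 6 bins, each near 1/6
```

An ensemble with too little spread would instead pile observations into the two extreme ranks, giving a U-shaped histogram.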
