Advanced Techniques in Data Assimilation


  1. Advanced Techniques in Data Assimilation Richard Ménard Meteorological Service of Canada Department of Atmospheric and Oceanic Sciences, McGill University GCC Summer School, Banff. May 22-28, 2004

  2. Outline • 1. Comparing observations with models • 2. How do we get the error statistics • 3. Covariance modelling • 4. Fitting covariance models with innovation data • 5. Analysis • 6. When and why do we need advanced data assimilation • 7. Analogy between numerical modelling and data assimilation • 8. Kalman filtering • 9. Computational simplifications for the Kalman filter • 10. Estimation of Q • 11. Estimation of model error bias • 12. Some thoughts about distinguishing model error bias from observation error bias • 13. Some remarks and future directions

  3. Some useful references • Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press. (very good textbook, but somewhat outdated) • Daley, R., and E. Barker, 2000: NAVDAS Source Book 2000. NRL Publication NRL/PU/7530-00-418, Naval Research Laboratory, Monterey, CA, 151 pp. (technical description of an operational 3D-Var scheme with MPI implementation) • Ghil, M., et al. (eds.), 1997: Special Issue of the Second International Symposium on Assimilation of Observations in Meteorology and Oceanography. J. Meteorol. Soc. Japan, vol. 75. (contains a number of review papers) • Kasibhatla, P., et al. (eds.): Inverse Methods in Global Biogeochemical Cycles. (comes with a CD-ROM with Matlab exercises) • Course notes of Saroja Polavarapu, University of Toronto • Course notes of Ricardo Todling, “Introduction to Data Assimilation”, DAO Technical Notes

  4. New books • Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press. • Bennett, A., 2002: Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press. (computer codes can be obtained from an ftp site)

  5. 1. Comparing observations with models In trying to validate a model formulation using observations you have two choices: 1 - Compare climate model runs, for which the sensitivity to the initial condition has disappeared; then compare the observation climatology with the model climatology. Problem: even if the first and second moments (in practice only the variance) of the model and the observations agree, that does not mean that we have instantaneous agreement between observations and model. 2 - Attack the problem of specifying the initial condition using data assimilation. Problem: even if an assimilation run sticks close to the truth, how can we disentangle whether the differences between the forecast and the analysis are due to initial conditions or to modelling error?

  6. 2. How do we get the error statistics 2.1 Innovation method obs – model (at obs location) = (truth + obs error) – (truth + forecast error) = obs error – forecast error Assumptions: observation errors are spatially uncorrelated; the forecast error is the spatially correlated part. [Figure: innovation covariance as a function of station separation distance (km)]
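
As a concrete illustration, here is a minimal sketch of how the innovation method can separate the two error components: innovation products are binned by station separation, a correlation model is fitted to the nonzero separations, and its extrapolation to zero separation gives the forecast error variance. The SOAR shape, the 1D station geometry, and all names (`separate_errors`, the bin count, the initial guesses) are illustrative assumptions, not the lecture's implementation; in practice the products are averaged over many days of innovations.

```python
import numpy as np
from scipy.optimize import curve_fit

def soar_cov(r, sigma_f2, L):
    # SOAR-shaped forecast error covariance: sigma_f^2 (1 + r/L) exp(-r/L)
    return sigma_f2 * (1.0 + r / L) * np.exp(-r / L)

def separate_errors(innov, x, nbins=20):
    """Split innovation variance into observation and forecast parts.
    innov: innovations at 1D station positions x (a single vector gives
    a very noisy estimate; average over many days in practice)."""
    d = np.abs(x[:, None] - x[None, :])            # pairwise separations
    prod = np.outer(innov, innov)                  # pairwise products
    off = d > 0                                    # exclude zero separation
    edges = np.linspace(0.0, d.max(), nbins + 1)
    idx = np.clip(np.digitize(d[off], edges) - 1, 0, nbins - 1)
    rbin, cbin = [], []
    for i in range(nbins):                         # bin-averaged covariances
        if np.any(idx == i):
            rbin.append(0.5 * (edges[i] + edges[i + 1]))
            cbin.append(prod[off][idx == i].mean())
    (sigma_f2, L), _ = curve_fit(soar_cov, np.array(rbin), np.array(cbin),
                                 p0=[innov.var() / 2, (x.max() - x.min()) / 4])
    sigma_o2 = innov.var() - sigma_f2              # total minus correlated part
    return sigma_o2, sigma_f2, L
```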

  7. [Figures: observation and forecast error variances estimated by the innovation method, from Mitchell et al. (1990)]

  8. If covariances are homogeneous, variances are independent of location; the estimated variances vary in space, so the covariances are not homogeneous. If correlations are homogeneous, correlation lengths are independent of location; the estimated correlation lengths vary, so the correlations are not homogeneous.

  9. Correlations are not isotropic. [Figures from Gustafsson (1981) and Daley (1991)]

  10. Advantages: - Estimates an observation error that includes representativeness error. - Provides reasonably accurate error statistics at observation locations and for the observed variables. Disadvantages: - The error statistics are incomplete for constructing Pf, because information is lacking at all grid points and for all variables. In practice: to complete Pf (on the model grid) we fit the forecast error information obtained from innovations to a forecast error correlation model. A good fitting procedure is the maximum likelihood method, which gives unbiased estimates of the covariance parameters. The drawback is that most correlation models are homogeneous and isotropic; inhomogeneous correlation models are hard to come by, and the few tricks that have been used to construct them do not provide significantly superior analyses.

  11. 2.2 The lagged-forecast method (NMC method) [Diagram: forecasts launched at -48 h and -24 h, both valid at 0 h] • Compares 24-h and 48-h forecasts valid at the same time • Provides global, multivariate correlations with full vertical and spectral resolution • Not used for variances • Assumes forecast differences approximate forecast error (see the sketch below) Why 24 - 48? • Starting from a 24-h forecast avoids “spin-up” problems • The 24-h period is short enough to claim similarity with the 0-6 h forecast error; the final difference is scaled by an empirical factor • The 24-h period is long enough that the forecasts are dissimilar despite the lack of data to update the starting analysis • 0-6 h forecast differences would merely reflect assumptions made in the OI background error covariances
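
A minimal sketch of the computation, under the assumption that many pairs of 24-h and 48-h forecasts valid at the same times are stacked into arrays; the function name and the empirical scaling parameter `alpha` are illustrative.

```python
import numpy as np

def nmc_covariance(f24, f48, alpha=1.0):
    """f24, f48: forecasts valid at the same times, shape (n_cases, n_grid);
    alpha: empirical scaling toward the 0-6 h forecast error level."""
    d = f48 - f24                        # lagged-forecast differences
    d = d - d.mean(axis=0)               # remove the mean difference
    return alpha * (d.T @ d) / (d.shape[0] - 1)
```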

  12. Properties (Bouttier 1994) • For linear H, and taking the lagged difference between the analysis (0-hour forecast) and the 6-h forecast, the lagged-forecast method measures the difference between the forecast and analysis error covariances • The NMC method breaks down if there are no data between the launches of the two forecasts: with no data the lagged-forecast variance is zero! • For observations at every grid point, and if the observation and background error variances are equal, the lagged-forecast method produces estimates that are half of the true value.

  13. The lagged-forecast correlation remains close to the forecast error correlation for any ratio of observation to forecast error variance, as long as the observations are dense. Advantages: - Provides global error statistics on all variables for Pf. - Estimates of forecast error can be given in spectral space. Disadvantages: - Provides no information on observation error statistics. - Only the information on the correlation is reliable, and only provided the observation network is dense.

  14. In practice: We combine both methods. The innovation method is used to estimate the observation error variance and the forecast error variance for the observed variables. The lagged-forecast method is used to estimate the error correlations for all variables on the global domain, and the error variances for the unobserved variables. However, since the observation density can alter the estimated error correlation, the lagged-forecast data are used to fit a homogeneous isotropic correlation model, which can be represented compactly in spectral space.

  15. 3. Covariance modelling Positive definite matrix (Horn and Johnson 1985, Chap. 7) A real n x n symmetric matrix A is positive definite if cT A c > 0 for any nonzero vector c. A is said to be positive semi-definite if cT A c >= 0. • Properties • The sum of any positive definite matrices of the same size is also positive definite • Each eigenvalue of a positive definite matrix is a positive real number • The trace and determinant are positive real numbers Covariance matrix The covariance matrix P of a random vector X = [X1, X2, …, Xn]T is the matrix P = [Pij] in which Pij = E[(Xi − E[Xi])(Xj − E[Xj])], where E is the mathematical expectation.

  16. Property: a covariance matrix is positive semi-definite. Remarks 1 - It is often necessary in data assimilation to invert the covariance matrices, and thus we need positive definite covariances. 2 - Positive definiteness is a global property of a matrix, and it is not trivial to obtain.

  17. Examples: a) Truncated parabola: C(r) = 1 - (r/L)^2 for |r| <= L, and 0 otherwise. Evaluated on a grid (n = 4), the resulting matrix has a negative eigenvalue: the truncated parabola is not a valid covariance function.

  18. b) Triangle: C(r) = 1 - |r|/L for |r| <= L, and 0 otherwise. Evaluated on a grid (n = 4), all eigenvalues are nonnegative: the triangle (the self-convolution of a boxcar) is a valid covariance function in one dimension.
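
The two examples can be checked numerically. This short sketch (the grid, its spacing, and L are illustrative choices; the functional forms are the standard ones assumed above) evaluates each function on a 1D grid and prints the smallest eigenvalue of the resulting matrix.

```python
import numpy as np

def truncated_parabola(r, L=1.0):
    return np.where(np.abs(r) <= L, 1.0 - (r / L) ** 2, 0.0)

def triangle(r, L=1.0):
    return np.where(np.abs(r) <= L, 1.0 - np.abs(r) / L, 0.0)

x = np.linspace(0.0, 1.5, 50)            # 1D grid
r = x[:, None] - x[None, :]              # pairwise separations
for name, f in [("parabola", truncated_parabola), ("triangle", triangle)]:
    print(name, "min eigenvalue:", np.linalg.eigvalsh(f(r)).min())
# The parabola matrix acquires negative eigenvalues; the triangle stays
# positive semi-definite (up to roundoff), as in the n = 4 slide example.
```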

  19. Covariance functions (Gaspari and Cohn 1999) Definition 1: A function P(r,r’) is a covariance function if it is the covariance of some random field X. Definition 2: A covariance function P(r,r’) is a function that yields positive semi-definite matrices when evaluated on any grid. That is, letting ri and rj be any two grid points, the matrix P whose elements are Pij = P(ri,rj) is a covariance matrix whenever P is a covariance function. The equivalence between Definitions 1 and 2 can be found in Wahba (1990, pp. 1-2). The covariance function is also known as a reproducing kernel. Remark: suppose a covariance function is defined in 3D space. Restricting r to a manifold (e.g. the surface of the unit sphere) also defines a covariance function, and hence a covariance matrix (e.g. a covariance matrix on the surface of a sphere).

  20. Correlation function A correlation function C(r,r’) is a covariance function P(r,r’) normalized by the standard deviations at the points r and r’: C(r,r’) = P(r,r’) / (σ(r) σ(r’)). Homogeneous and isotropic correlation functions If a correlation function is invariant under all translations and all orthogonal transformations, then it becomes a function only of the distance between the two points, C(r,r’) = C(|r − r’|). • Smoothness properties • The continuity at the origin determines the continuity allowed on the rest of the domain. For example, if the first derivative is discontinuous at the origin, then a first-derivative discontinuity is allowed elsewhere (see the example with the triangle). • Spectral representation • On the unit circle, C(θ) = Σm am cos(mθ), where θ is the angle between the two position vectors and all the Fourier coefficients am are nonnegative. • On the unit sphere, C(θ) = Σm bm Pm(cos θ), where the Pm are the Legendre polynomials and all the Legendre coefficients bm are nonnegative.

  21. Examples of correlation functions 1. Spatially uncorrelated model (black): C(r) = 1 at r = 0, and 0 otherwise 2. First-order auto-regressive model (FOAR) (blue): C(r) = exp(−|r|/L) 3. Second-order auto-regressive model (SOAR) (cyan): C(r) = (1 + |r|/L) exp(−|r|/L) 4. Gaussian model (red): C(r) = exp(−r^2/(2L^2)) where L is the correlation length scale
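
Written out as code, a sketch of the four models (the forms follow the standard definitions above; the Gaussian uses the 2L^2 normalization so that L plays the role of the correlation length scale):

```python
import numpy as np

def uncorrelated(r):   return np.where(r == 0.0, 1.0, 0.0)
def foar(r, L):        return np.exp(-np.abs(r) / L)
def soar(r, L):        return (1.0 + np.abs(r) / L) * np.exp(-np.abs(r) / L)
def gaussian(r, L):    return np.exp(-r**2 / (2.0 * L**2))

r = np.linspace(0.0, 5.0, 6)
print(foar(r, 1.0))    # correlations decay with separation r
print(soar(r, 1.0))    # SOAR decays more slowly near the origin than FOAR
```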

  22. Exercise: Construct a correlation matrix on a one-dimensional periodic domain. Consider a 2D Gaussian model on the plane, C(r,r’) = exp(−|r − r’|^2/(2L^2)). Along a circle of radius a, the chordal distance between two points separated by an angle θ is |r − r’| = 2a sin(θ/2). Now define a coordinate x = aθ along the circle; then we get C(x,x’) = exp(−(2a^2/L^2) sin^2((x − x’)/(2a))).
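
A small numerical sketch of this construction (the grid size, a, and L are illustrative); since the matrix is the restriction of a valid planar covariance to the circle, its eigenvalues should be nonnegative up to roundoff:

```python
import numpy as np

def periodic_gaussian(n=64, a=1.0, L=0.3):
    x = 2 * np.pi * a * np.arange(n) / n        # coordinate along the circle
    theta = (x[:, None] - x[None, :]) / a       # angular separations
    chord = 2 * a * np.sin(theta / 2)           # chordal distance in the plane
    return np.exp(-chord**2 / (2 * L**2))

C = periodic_gaussian()
print("min eigenvalue:", np.linalg.eigvalsh(C).min())   # ~0 or positive
```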

  23. Ways to define new covariances by transformation a) Define a new spatial coordinate (coordinate stretching)

  24. b) Linear transformation: if a random vector X has covariance P, then Y = AX has covariance A P AT.

  25. c) Hadamard (element-wise) product: the Hadamard product of two covariance matrices is a covariance matrix. d) Self-convolution: the convolution of a compactly supported function with itself is a covariance function. • Examples in Gaspari and Cohn (1999) • The Hadamard product of an arbitrary covariance function with a compactly supported function is very useful in ensemble Kalman filtering to create compactly supported covariances (see the localization sketch below)
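
A minimal localization sketch, assuming a synthetic ensemble and a triangle taper as the compactly supported correlation (operational schemes typically use the Gaspari-Cohn 1999 fifth-order piecewise-rational function instead); the sizes and the support radius are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_ens = 100, 20
x = np.linspace(0.0, 10.0, n)

ens = rng.standard_normal((n_ens, n))       # stand-in ensemble perturbations
P_ens = np.cov(ens, rowvar=False)           # noisy sample covariance (rank <= n_ens)

r = np.abs(x[:, None] - x[None, :])         # pairwise separations
taper = np.clip(1.0 - r / 3.0, 0.0, None)   # triangle taper, support radius 3

P_loc = P_ens * taper                       # Hadamard product: localized,
                                            # compactly supported, still PSD
```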

  26. e) State dependence: e.g. to create a relative-error formulation for the observation error, the relative error is scaled by the forecast interpolated to the observation location (not by the observed value!). When formulated this way, state-dependent errors can be introduced into estimation theory for data assimilation.

  27. 4. Fitting covariance models with innovation data: application of the maximum likelihood method The method of maximum likelihood applies when we estimate a deterministic variable (no prior error statistics) from noisy measurements. The method works as follows. Suppose that the innovation (O-F) covariance depends on a number of parameters α, such as the error variances, the error correlation length scales, etc. Assuming that the actual innovations are uncorrelated in time and Gaussian distributed, we can write the conditional probability of the innovations for a given value of the covariance parameters, p(νn, …, ν1 | α).

  28. The value of α which maximizes the probability p(νn,…,ν1|α) is the maximum likelihood estimate. • In practice, we take the negative log of p(νn,…,ν1|α), called the log-likelihood function L(α), and search for its minimum. • The curvature of the log-likelihood function is a measure of the estimation accuracy: a pronounced curvature indicates a reliable estimate, and vice versa.
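
A hedged sketch of the fit, assuming daily innovation vectors at fixed stations, a SOAR forecast error correlation, and parameters alpha = (sigma_o^2, sigma_f^2, L); the function names and the optimizer choice are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def innovation_cov(alpha, dist):
    """S(alpha) = sigma_f^2 C(dist; L) + sigma_o^2 I, with a SOAR C."""
    so2, sf2, L = alpha
    C = (1.0 + dist / L) * np.exp(-dist / L)
    return sf2 * C + so2 * np.eye(dist.shape[0])

def neg_log_likelihood(alpha, innovations, dist):
    """-2 log p(innovations | alpha), up to a constant, summed over days."""
    S = innovation_cov(alpha, dist)
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:
        return np.inf                    # reject non-positive-definite trials
    Sinv = np.linalg.inv(S)
    return sum(logdet + v @ Sinv @ v for v in innovations)

# innovations: list of daily O-F vectors; dist: pairwise station distances.
# res = minimize(neg_log_likelihood, x0=[1.0, 1.0, 500.0],
#                args=(innovations, dist), method="Nelder-Mead")
```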

  29. Example: the Doppler imager HRDI. Coincident measurements in two channels: the B band and the γ band.

  30. 5. Analysis Definition: the best estimate (e.g. minimum error variance, maximum a posteriori) of an atmospheric model state using observations at a given time. Estimators: • Best Linear Unbiased Estimator (BLUE) Assumes that the estimate is a linear combination of observations and model state, xa = xf + K(y − Hxf) Assumes that the observation error is uncorrelated with the model state error Doesn't assume any knowledge of the probability density functions (e.g. Gaussian) Minimizing the analysis error variance, we get the optimal gain K = Pf HT (H Pf HT + R)−1
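
A direct transcription of the BLUE analysis step into code (a sketch for small dense matrices; the names are illustrative):

```python
import numpy as np

def blue_analysis(xf, Pf, y, H, R):
    """xa = xf + K (y - H xf) with K = Pf H^T (H Pf H^T + R)^-1."""
    S = H @ Pf @ H.T + R                     # innovation covariance
    K = Pf @ H.T @ np.linalg.inv(S)          # optimal (Kalman) gain
    xa = xf + K @ (y - H @ xf)               # analysis state
    Pa = (np.eye(len(xf)) - K @ H) @ Pf      # analysis error covariance
    return xa, Pa
```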

  31. Bayesian estimators: MV and MAP Analytical expressions for p(x|y) can be found for Gaussian probabilities, lognormal probabilities, alpha-stable distributions (generalized Gaussians), and Poisson distributions. Analytical expressions may not be obtainable for a mixture of p.d.f.s. - MAP estimator: max p(x|y). This estimator is the one used in variational data assimilation schemes, such as 3D-Var and 4D-Var. - MV estimator: min Tr(Pa). This estimator is the one used in Kalman filtering, OI, and ensemble Kalman filtering.

  32. Properties of the analysis: 1 - it reduces the error variance 2 - it interpolates the observation information to the model grid 3 - it filters out the small-scale noise Example 1: one observation. Let Pf be the periodic Gaussian correlation model of the previous example. [Figures: analysis error variance and correlation along the periodic domain]

  33. [Figures: analysis error variance when varying the correlation length scale] Example 2: two observations. The reduction of error at one point can help reduce the error further at neighboring points. This is what happens in 3D-Var vs. 1D-Var or retrievals.
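
The two-observation effect can be reproduced with the BLUE sketch above; here is an illustrative usage on the periodic Gaussian Pf (the grid size, observation points, and error variances are arbitrary choices):

```python
import numpy as np

def analysis_cov(Pf, H, R):
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    return (np.eye(Pf.shape[0]) - K @ H) @ Pf

n, a, L = 64, 1.0, 0.3
x = 2 * np.pi * a * np.arange(n) / n
chord = 2 * a * np.sin((x[:, None] - x[None, :]) / (2 * a))
Pf = np.exp(-chord**2 / (2 * L**2))          # unit forecast error variance

H1 = np.zeros((1, n)); H1[0, 10] = 1.0       # one observation
H2 = np.zeros((2, n)); H2[0, 10] = 1.0; H2[1, 16] = 1.0   # two observations

Pa1 = analysis_cov(Pf, H1, 0.25 * np.eye(1))
Pa2 = analysis_cov(Pf, H2, 0.25 * np.eye(2))
print(Pa1[13, 13], Pa2[13, 13])   # variance at a point between the two obs:
                                  # lower with two observations than with one
```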

  34. Analysis of positive quantities • Even if the forecast error correlation is everywhere positive, and the observed and forecast quantities are positive, the analysis value (away from the observation location) can be negative. Example: Consider two grid points located at x1 and x2, and a single observation yo coinciding with grid point x1. The forecast error covariance between the two grid points can be written as ρ σ1 σ2, where σ1 and σ2 are the error standard deviations at grid points 1 and 2, and ρ is the error correlation, assumed to be positive. The analysis at grid point 2 is then x2a = x2f + [ρ σ1 σ2 / (σ1^2 + σo^2)] (yo − x1f). If x2f is close to zero, then if the innovation yo − x1f is negative, the analysis at grid point 2 can be negative.

  35. Is using log x (the logarithm of the mixing ratio or concentration) in the analysis equation a reasonable solution? The analysis in logs will produce a best estimate of the logarithm of the concentration, but exponentiating it would give a best estimate of the concentration itself only if x were not a random variable (or if its variance were very small). If w is Gaussian distributed, then x = exp(w) is lognormally distributed.

  36. Letting w = log x, the analysis equation is then of the same BLUE form as before, where the B's (the error covariances) are defined in log space, and b is an observational bias correction; b takes a different form depending on whether we consider the measurement unbiased in physical space or unbiased in log space. When we consider that it is the measurement of the physical concentration variable that is unbiased, Cohn (1997) gives the corresponding form of b. Transformation of mean and covariance between log and physical quantities: if w = log x is Gaussian with mean μ and variance σ^2, then E[x] = exp(μ + σ^2/2) and Var[x] = (exp(σ^2) − 1) exp(2μ + σ^2).
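
These standard lognormal relations can be written directly as a small helper (the names are illustrative); note how exponentiating the mean of w underestimates the mean of x, which is the point made on the previous slide:

```python
import numpy as np

def log_to_physical(mu, s2):
    """Mean and variance of x = exp(w) when w ~ N(mu, s2)."""
    mean_x = np.exp(mu + s2 / 2.0)
    var_x = (np.exp(s2) - 1.0) * np.exp(2.0 * mu + s2)
    return mean_x, var_x

mean_x, var_x = log_to_physical(0.0, 0.5)
print(mean_x, np.exp(0.0))    # E[x] = 1.284... > exp(E[w]) = 1.0
```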

  37. When the observations are retrievals Retrievals are usually 1D (vertical) analyses, using the radiance as the observation and a prior xp with a given error covariance Pp. [Figure: averaging kernel A for the limb sounder HRDI] So in a data assimilation scheme we have the retrieval xr = A xt + (I − A) xp + ε. Problem! A is usually rank deficient, so the retrieval error covariance cannot be inverted unless R is full rank. Solution: use the SVD of A and use the leading singular vectors as observations (Joiner and da Silva 1998).
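
A hedged sketch of the SVD idea: subtract the prior contribution, project onto the leading left singular vectors of A, and assimilate the projections with the corresponding reduced forward operator. The threshold, names, and exact formulation here are illustrative assumptions rather than the Joiner and da Silva (1998) recipe:

```python
import numpy as np

def svd_observations(x_r, x_p, A, tol=1e-3):
    """x_r = A x_t + (I - A) x_p + eps  ->  z = S_k V_k^T x_t + noise."""
    U, s, Vt = np.linalg.svd(A)
    k = int(np.sum(s > tol * s[0]))              # significant singular values
    resid = x_r - (np.eye(len(x_p)) - A) @ x_p   # remove prior contribution
    z = U[:, :k].T @ resid                       # transformed observations
    H = np.diag(s[:k]) @ Vt[:k]                  # reduced forward operator
    return z, H
```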

  38. 6. When and why do we need advanced data assimilation • When the observation network is non-stationary, e.g. assimilation of ozone from MLS (on a low-Earth-orbit satellite) using the sequential method

  39. • When the data coverage is sparse - to increase the information content with dynamically evolving error covariances. Impact of HALOE observations (~30 profiles per day, with a half-hour model time step)

  40. • To infer model parameter values. Estimates of photolysis rates during an assimilation experiment with simulated data: randomly disturbed photolysis rates (solid, thick), two-sigma bounds due to the specified uncertainties (dashed), and two-sigma bounds after assimilation of the simulated measurements (solid)

  41. • To infer unobserved variables. Here wind increments are produced from ozone observations in a 4D-Var scheme, even though there is no error correlation between ozone and winds in the background error covariance

  42. 7. Data assimilation - numerical modelling analogy • A data assimilation scheme can be unstable in data-sparse areas and with wrong error covariances • The numerical properties of the tangent linear model and adjoint matter: e.g. with an advection scheme that is limited by the Courant number, the adjoint model is described by the flux scheme with a Lipschitz condition (which creates numerical noise at the poles with a grid-point model) • Parametrization of the error covariances: because of their size (e.g. 10^7 x 10^7) the error covariances cannot be stored, so they need to be modelled

  43.-45. [Figures: assimilation results] • Initial forecast error covariance derived from zonal monthly climatology • Retrieval error used as observation error • No model error (perfect model assumed)


  46. 8. Kalman filtering In a Kalman filter, the error covariance is dynamically evolved between the analyses, so that both the state vector and the error covariance are updated by the dynamics and by the observations. [Diagram: prediction-analysis cycle, with the forecast and analysis states xf, xa and covariances Pf, Pa at times tn, and observations entering at each analysis step]

  47. Definition The Kalman filter produces the best estimate of the atmospheric state given all current and past observations, and yet the algorithm is sequential in time, in the form of a predictor-corrector scheme. From a Bayesian point of view the Kalman filter constructs an estimate based on p(xn | yn, …, y1). The time-sequential property of a Kalman filter is not easy to show. Example: estimating a scalar constant using no prior, and assuming white-noise observation error with constant error variance. The MV estimate can be shown to be the simple time average, x̂n = (1/n) Σ yi, which can be rewritten as a sequential scheme, x̂n = x̂n−1 + (1/n)(yn − x̂n−1).
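
The scalar example is easy to verify numerically; this sketch (with synthetic observations of an assumed constant) shows the sequential update reproducing the batch average exactly:

```python
import numpy as np

y = 3.0 + np.random.default_rng(1).standard_normal(100)  # noisy obs of 3.0
xhat = 0.0
for n, yn in enumerate(y, start=1):
    xhat = xhat + (yn - xhat) / n        # predictor-corrector form
print(xhat, y.mean())                    # identical up to roundoff
```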

  48. Prediction of errors M is a model of the atmosphere which expresses our theoretical understanding, based on the physical laws of dynamics, radiation, etc.: xf(tn) = M[xa(tn−1)]. Use the same model to represent the evolution of the true state xt: xt(tn) = M[xt(tn−1)] + η(tn), where η is called the model error (or modelling error). Linearizing the model about the analysis state, i.e. taking Mn as the tangent linear model, it is usually assumed that the model error is white, or equivalently that the forecast error can be represented as a first-order Markov process, in which case we can show that the forecast error covariance evolves as Pf(tn) = Mn Pa(tn−1) MnT + Qn, where Q is the model error covariance.
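
Putting the prediction and analysis steps together gives the compact linear Kalman filter sketch below, assuming linear dynamics M, observation operator H, model error covariance Q, and observation error covariance R (all names illustrative):

```python
import numpy as np

def kf_predict(xa, Pa, M, Q):
    xf = M @ xa                           # state forecast
    Pf = M @ Pa @ M.T + Q                 # Pf = M Pa M^T + Q
    return xf, Pf

def kf_analysis(xf, Pf, y, H, R):
    S = H @ Pf @ H.T + R                  # innovation covariance
    K = Pf @ H.T @ np.linalg.inv(S)       # Kalman gain
    xa = xf + K @ (y - H @ xf)            # analysis state
    Pa = (np.eye(len(xf)) - K @ H) @ Pf   # analysis error covariance
    return xa, Pa
```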

  49. Results from the assimilation of CH4 observations from UARS, Ménard and Chang (2000). [Figures: evolution of the forecast and analysis error variances for the Kalman filter and for OI and other sub-optimal schemes, compared with the no-assimilation and observation error levels] The accumulation of information over time is limited by the model error, so that in practice we need to integrate a Kalman filter only over some finite time to obtain the optimal solution.

  50. 8.2 Asymptotic stability and observability • We say that we have observability when it is possible for a data assimilation system with a perfect model and perfect observations to determine a unique initial state from a finite time sequence of observations. Example: one-dimensional advection with speed U over a periodic domain of length L. Continuous observations at a single point x = 0 over a time T = L/U determine the initial condition ψ0(x) = ψ(x,0) uniquely. Theorem: If an assimilation system is observable and the model is imperfect (Q ≠ 0), then the sequence of forecast error covariances converges geometrically to a limit that is independent of the initial error covariance.
