What is Data Assimilation? A Tutorial

Andrew S. Jones. Lots of help also from: Steven Fletcher, Laura Fowler, Tarendra Lakhankar, Scott Longmore, Manajit Sengupta, Tom Vonder Haar, Dusanka Zupanski, and Milija Zupanski

Data Assimilation Outline • Why Do Data Assimilation? • Who and What • Important Concepts • Definitions • Brief History • Common System Issues / Challenges

The Purpose of Data Assimilation • Why do data assimilation? (Answer: Common Sense) MYTH: “It’s just an engineering tool” If Truth matters, “It’s our most important science tool”

The Purpose of Data Assimilation • Why do data assimilation? • I want better model initial conditions for better model forecasts • I want better calibration and validation (cal/val) • I want better acquisition guidance • I want better scientific understanding of: model errors (and their probability distributions); data errors (and their probability distributions); combined model/data correlations; DA methodologies (minimization, computational optimizations, representation methods, various method approximations); physical process interactions (i.e., sensitivities and feedbacks) • Leads toward better future models • VIRTUOUS CYCLE

The Data Assimilation Community • Who is involved in data assimilation? • NWP Data Assimilation Experts • NWP Modelers • Application and Observation Specialists • Cloud Physicists / PBL Experts / NWP Parameterization Specialists • Physical Scientists (Physical Algorithm Specialists) • Radiative Transfer Specialists • Applied Mathematicians / Control Theory Experts • Computer Scientists • Science Program Management (NWP and Science Disciplines) • Forecasters • Users and Customers

The Data Assimilation Community • What skills are needed by each involved group? • NWP Data Assimilation Experts (DA system methodology) • NWP Modelers (Model + Physics + DA system) • Application and Observation Specialists (Instrument capabilities) • Physical Scientists (Instrument + Physics + DA system) • Radiative Transfer Specialists (Instrument config. specifications) • Applied Mathematicians (Control theory methodology) • Computer Scientists (DA system + OPS time requirements) • Science Program Management (Everything + $$ + Good People) • Forecasters (Everything + OPS time reqs. + Easy/fast access) • Users and Customers (Could be a wide variety of responses), e.g., NWS / Army / USAF / Navy / NASA / NSF / DOE / ECMWF

The Data Assimilation Community • Are you part of this community? Yes, you just may not know it yet. • Who knows all about data assimilation? No one knows it all; it takes many experts. • How large are these systems? Typically, DA systems are “medium”-sized projects by software industry standards • Medium = multi-year coding effort by several individuals (e.g., RAMDAS is ~230K lines of code, ~3500 pages of code) • Satellite “processing systems” tend to be larger still • Our CIRA Mesoscale 4DVAR system was built over ~7-8 years with heritage from the ETA 4DVAR system

The Building Blocks of Data Assimilation • Components: NWP Model, NWP Adjoint, Observation Model, Observation Model Adjoint, Observations, Minimization • Control Variables are the initial model state variables that are optimized using the new data information as a guide • They can also include boundary condition information, model parameters for “tuning”, etc.

What Are We Minimizing? • Minimize the discrepancy between model and observation data over time • The Cost Function, J, is the link between the observational data and the model variables • Observations are either assumed unbiased, or are “debiased” by some adjustment method

Bayes Theorem • Maximum conditional probability is given by: $P(x \mid y) \propto P(y \mid x)\,P(x)$ • Assuming Gaussian distributions: $P(y \mid x) \propto \exp\{-\tfrac{1}{2}\,[y - H(x)]^{T} R^{-1} [y - H(x)]\}$ and $P(x) \propto \exp\{-\tfrac{1}{2}\,[x - x_b]^{T} B^{-1} [x - x_b]\}$ • e.g., 3DVAR, Lorenc (1986)
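
Maximizing the Gaussian posterior above is the same as minimizing the negative of its exponent. The following is a minimal sketch of that idea, assuming a linear observation operator (a matrix H) and small made-up toy values; the function names are illustrative and do not come from any operational system.

```python
import numpy as np

def cost_3dvar(x, xb, y, H, B_inv, R_inv):
    """J(x) = 1/2 (x-xb)^T B^-1 (x-xb) + 1/2 (y-Hx)^T R^-1 (y-Hx)."""
    dxb = x - xb
    dy = y - H @ x
    return 0.5 * dxb @ B_inv @ dxb + 0.5 * dy @ R_inv @ dy

def analysis_3dvar(xb, y, H, B, R):
    """Closed-form minimizer for linear H: xa = xb + K (y - H xb)."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return xb + K @ (y - H @ xb)

# Toy values (assumed): two state variables, one observation of the first.
xb = np.array([1.0, 2.0])
B = np.array([[1.0, 0.5], [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
y = np.array([2.0])

xa = analysis_3dvar(xb, y, H, B, R)
J_a = cost_3dvar(xa, xb, y, H, np.linalg.inv(B), np.linalg.inv(R))
J_b = cost_3dvar(xb, xb, y, H, np.linalg.inv(B), np.linalg.inv(R))
```

Note that the second state variable is also adjusted even though it is not observed: the off-diagonal element of B spreads the observation's information.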

What Do We Trust for “Truth”? • Minimize discrepancy between model and observation data over time • Model Background or Observations? • Trust = Weightings • Just like your financial credit score!

Who are the Candidates for “Truth”? • Minimize discrepancy between model and observation data over time • Candidate 1: Background Term • “x0” is the model state vector at the initial time t0; this is also the “control variable”, the object of the minimization process • “xb” is the model background state vector • “B” is the background error covariance of the forecast and model errors

Who are the Candidates for “Truth”? • Minimize discrepancy between model and observation data over time • Candidate 2: Observational Term • “y” is the observational vector, e.g., the satellite input data (typically radiances), salinity, sounding profiles • “M0,i(x0)” is the model state at the observation time “i” • “h” is the observational operator, for example the “forward radiative transfer model” • “R” is the observational error covariance matrix that specifies the instrumental noise and data representation errors (currently assumed to be diagonal…)
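
A sketch of how the two candidate terms are usually combined, assuming the standard strong-constraint form and the symbols defined on these slides (the per-time subscripts on y and R are an assumption added here for clarity):

```latex
J(x_0) = \tfrac{1}{2}\,(x_0 - x_b)^{T} B^{-1} (x_0 - x_b)
       + \tfrac{1}{2} \sum_{i} \bigl[y_i - h\bigl(M_{0,i}(x_0)\bigr)\bigr]^{T} R_i^{-1} \bigl[y_i - h\bigl(M_{0,i}(x_0)\bigr)\bigr]
```

The first term is Candidate 1 (weighted by B) and the second is Candidate 2 (weighted by R).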

What Do We Trust for “Truth”? • Minimize discrepancy between model and observation data over time • Candidate 1: Background Term • The default condition for the assimilation when data are not available, or the available data have no significant sensitivity to the model state, or the available data are inaccurate

Model Error Impacts our “Trust” • Minimize discrepancy between model and observation data over time • Candidate 1: Background Term • Model error issues are important • Model error varies as a function of the model time • Model error “grows” with time • Therefore the background term should be trusted more at the initial stages of the model run and trusted less at the end of the model run

How to Adjust for Model Error? • Minimize discrepancy between model and observation data over time • Candidate 1: Background Term • Add a model error term to the cost function so that each model step is appropriately weighted, or • Use other possible adjustments in the methodology, i.e., “make an assumption” about the model error impacts • If model error adjustments or controls are used, the DA system is said to be “weakly constrained”
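
As a hedged sketch of the first option, one common way to write a model error term is shown here. The symbols η_i (model error at step i) and Q_i (its covariance) are assumptions for illustration and are not taken from the slides; J stands for the background-plus-observation cost.

```latex
x_i = M_{i-1,i}(x_{i-1}) + \eta_i, \qquad
J_{\text{weak}} = J + \tfrac{1}{2} \sum_{i} \eta_i^{T} Q_i^{-1} \eta_i
```

Setting every η_i to zero recovers the perfect-model case.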

What About Model Error Errors? • Minimize discrepancy between model and observation data over time • Candidate 1: Background Term • Model error adjustments to the weighting can be “wrong” • In particular, most assume some type of linearity • Non-linear physical processes may break these assumptions and be more complexly interrelated • A data assimilation system with no model error control is said to be “strongly constrained” (perfect model assumption)

What About other DA Errors? Overlooked Issues? • Data debiasing relative to the DA system “reference”. It is not the “Truth”, however it is self-consistent. • DA Methodology Errors? • Assumptions: Linearization, Gaussianity, Model errors • Representation errors (space and time) • Poorly known background error covariances • Imperfect observational operators • Overly aggressive data “quality control” • Historical emphasis on dynamical impact vs. physical • Synoptic vs. Mesoscale?

DA Theory is Still Maturing • The Future: Lognormal DA (Fletcher and Zupanski, 2006, 2007) • Many important variables are lognormally distributed: clouds, precipitation, water vapor, emissivities, many other hydrologic fields • Gaussian data assimilation system variables are “Gaussian” • Gaussian systems typically force lognormal variables to become Gaussian, introducing an avoidable data assimilation system bias • For a lognormal distribution, Mode ≠ Mean (figure annotation: “Add DA Bias Here!”)
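
To make the mode-versus-mean point concrete, here is a small sketch using the textbook lognormal formulas. The parameters mu and sigma (the mean and standard deviation of ln x) are arbitrary assumed values, not taken from any of the fields listed on the slide.

```python
import numpy as np

# Assumed parameters of ln(x); chosen only for illustration.
mu, sigma = 0.0, 1.0

# Standard lognormal summary statistics.
mode = np.exp(mu - sigma**2)
median = np.exp(mu)
mean = np.exp(mu + 0.5 * sigma**2)

# Check the mean against a random sample.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=mu, sigma=sigma, size=200_000)
sample_mean = samples.mean()

# The logarithm of the samples is Gaussian, centred on mu.
log_mean = np.log(samples).mean()

print(f"mode={mode:.3f} median={median:.3f} mean={mean:.3f}")
```

The three statistics differ for any sigma > 0, so a system that treats such a variable as Gaussian (where all three coincide) has to pick one of them and accept an offset from the others.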

What Do We Trust for “Truth”? • Minimize discrepancy between model and observation data over time • Candidate 2: Observational Term • The non-default condition for the assimilation, when: data are available, and data are sensitive to the model state, and data are precise (not necessarily “accurate”), and data are not thrown away by DA “quality control” methods

What “Truth” Do We Have? • Minimize discrepancy between model and observation data over time • (Figure labels: DATA CENTRIC, MODEL CENTRIC, TRUTH)

DA Theory is Still Maturing: A Brief History of DA • Hand interpolation • Local polynomial interpolation schemes (e.g., Cressman) • Use of “first guess”, i.e., a background • Use of an “analysis cycle” to regenerate a new first guess • Empirical schemes, e.g., nudging • Least squares methods • Variational DA (VAR) • Sequential DA (KF) • Monte Carlo Approx. to Seq. DA (EnsKF)

Variational Techniques • Major Flavors: 1DVAR (Z), 3DVAR (X,Y,Z), 4DVAR (X,Y,Z,T) • Lorenc (1986) and others… • Has been the operational scheme from the early 1990s to the present day • Finds the maximum likelihood (if Gaussian, etc.) (actually it is a minimum variance method) • Comes from setting the gradient of the cost function equal to zero • Control variable is xa

Sequential Techniques • Kalman (1960) and many others… • These techniques can evolve the forecast error covariance fields, similar in concept to OI • B is no longer static, B => Pf = forecast error covariance • Pa (ti) is estimated at future times using the model • K = “Kalman Gain” (in blue boxes) • Extended KF: Pa is found by linearizing the model about the nonlinear trajectory of the model between ti-1 and ti
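
The forecast/analysis cycle described above can be sketched for a linear model as follows. This is the textbook Kalman filter, not code from any system named in this tutorial; the model matrix M, the model error covariance Q, and all numbers are assumed toy values.

```python
import numpy as np

def kf_forecast(xa, Pa, M, Q):
    """Propagate the state and its error covariance with a linear model M."""
    xf = M @ xa
    Pf = M @ Pa @ M.T + Q          # B is no longer static: B => Pf
    return xf, Pf

def kf_analysis(xf, Pf, y, H, R):
    """Blend forecast and observation using the Kalman gain K."""
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    xa = xf + K @ (y - H @ xf)
    Pa = (np.eye(len(xf)) - K @ H) @ Pf
    return xa, Pa, K

# Toy values (assumed): position/velocity state, position observed.
M = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = np.zeros((2, 2))
H = np.array([[1.0, 0.0]])
R = np.array([[2.0]])

xf, Pf = kf_forecast(np.array([0.0, 1.0]), np.eye(2), M, Q)
xa, Pa, K = kf_analysis(xf, Pf, np.array([3.0]), H, R)
```

In an Extended KF the matrix M would be replaced by the model linearized about the nonlinear trajectory, as the slide notes.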

Sequential Techniques • Ensembles can be used in KF-based sequential DA systems • Ensembles are used to estimate Pf through Gaussian “sampling” theory • f is a particular forecast instance • l is the reference state forecast • Pf is estimated at future times using the model • K model runs are required (Q: How to populate the seed perturbations?) • Sampling allows for use of approximate solutions • Eliminates the need to linearize the model (as in Extended KF) • No tangent linear or adjoint models are needed
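
A minimal sketch of the sampling estimate of Pf follows. It assumes the ensemble mean is used as the reference state and the usual 1/(K-1) normalization; the "true" covariance and the ensemble size are made-up values used only to generate a test ensemble.

```python
import numpy as np

def ensemble_covariance(ensemble):
    """Sample estimate of Pf from an (n_members, n_state) array of forecasts.

    n_members plays the role of K on the slide; the ensemble mean is used
    as the reference state (an assumption of this sketch).
    """
    n_members = ensemble.shape[0]
    deviations = ensemble - ensemble.mean(axis=0)
    return deviations.T @ deviations / (n_members - 1)

# Toy ensemble drawn from a known covariance (assumed values).
rng = np.random.default_rng(1)
P_true = np.array([[1.0, 0.8], [0.8, 2.0]])
ensemble = rng.multivariate_normal(np.zeros(2), P_true, size=5000)

Pf = ensemble_covariance(ensemble)
```

No tangent linear or adjoint model appears anywhere: each row of the ensemble would come from an ordinary nonlinear model run. The sampling error shrinks only slowly with the number of members, which is why many samples are needed.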

Sequential Techniques: Notes on EnsKF-based sequential DA systems • EnsKFs are an approximation • Underlying theory is the KF • Assumes Gaussian distributions • Many ensemble samples are required • Can significantly improve Pf • Where does H fit in? Is it fully “resolved”? • What about the “Filter” aspects? • Future Directions: research using Hybrid EnsKF-Var techniques

Sequential Techniques • Zupanski (2005): Maximum Likelihood Ensemble Filter (MLEF) • Structure function version of ensemble-based DA (Note: does not use sampling theory, and is more similar to a variational DA scheme using principal component analysis (PCA)) • NE is the number of ensembles • S is the state-space dimension • Each ensemble is carefully selected to represent the degrees of freedom of the system • Square-root filter is built in to the algorithm assumptions

Where is “M” in all of this? • No M used: 3DDA techniques have no explicit model time tendency information; it is all done implicitly with cycling techniques, typically focusing only on the Pf term • M used: 4DDA uses M explicitly via the model sensitivities, L, and model adjoints, LT, as a function of time • Kalman Smoothers (e.g., also 4DEnsKS) would likewise need to estimate L and LT

4DVAR Revisited (for an example see Poster NPOESS P1.16 by Jones et al.) • Automatically propagates the Pf within the cycle, but cannot save the result for the next analysis cycle (memory of “B” info becomes lost in the next cycle) (Thepaut et al., 1993) • LT is the adjoint, which is integrated from ti to t0 • Adjoints are NOT the “model running in reverse”, but merely the model sensitivities being integrated in reverse order, thus all adjoints appear to function backwards • Think of it as accumulating the “impacts” back toward the initial control variables
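
The "accumulating impacts backwards" picture can be illustrated with a toy linear model. This sketch assumes the state is observed directly at every step with unit error variance (h and R are both the identity) and omits the background term; the matrix L and the observations are made-up values. The adjoint gradient is checked against finite differences.

```python
import numpy as np

def forward(x0, L, n_steps):
    """Run the linear model x_{i+1} = L x_i and keep the whole trajectory."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(L @ traj[-1])
    return traj

def cost(x0, L, obs):
    """J = 1/2 sum_i |x_i - y_i|^2 (observation term only)."""
    traj = forward(x0, L, len(obs) - 1)
    return 0.5 * sum(np.sum((x - y) ** 2) for x, y in zip(traj, obs))

def gradient_adjoint(x0, L, obs):
    """Sweep from the last time back to t0, applying L^T at each step."""
    traj = forward(x0, L, len(obs) - 1)
    adj = np.zeros_like(x0)
    for x, y in zip(reversed(traj), reversed(obs)):
        adj = L.T @ adj + (x - y)   # carry impacts back, add local misfit
    return adj

def gradient_fd(x0, L, obs, eps=1e-6):
    """Central finite differences, used only to check the adjoint."""
    g = np.zeros_like(x0)
    for k in range(len(x0)):
        e = np.zeros_like(x0)
        e[k] = eps
        g[k] = (cost(x0 + e, L, obs) - cost(x0 - e, L, obs)) / (2 * eps)
    return g

# Toy values (assumed).
L = np.array([[0.9, 0.1], [-0.1, 0.9]])
x0 = np.array([1.0, 0.5])
obs = [np.array([1.0, 0.4]), np.array([0.9, 0.3]), np.array([0.8, 0.2])]

g_adj = gradient_adjoint(x0, L, obs)
g_fd = gradient_fd(x0, L, obs)
```

One backward sweep gives the full gradient with respect to x0, whereas the finite-difference check needs two forward runs per state variable. That cost difference is the practical reason adjoints are used.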

Minimization Process • The Jacobian of the Cost Function is used in the minimization procedure • The minimum is at ∂J/∂x = 0 • Issues: Is it a global minimum? Are we converging rapidly or slowly? • (Figure: J plotted against x, with TRUTH marked)

Ensembles: Flow-dependent forecast error covariance and spread of information from observations (from M. Zupanski) • Geographically distant observations can bring more information than close-by observations, if in a dynamically significant region • (Figure labels: grid-point x vs. time t0, t1, t2; obs1, obs2; isotropic correlations)

Preconditioning the Space (from M. Zupanski) • “Preconditioners” transform the variable space so that fewer iterations are required while minimizing the cost function • Result: faster convergence • x ->
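
One widely used preconditioner, shown here only as an assumed example and not necessarily the transform pictured on this slide, is the change of variable x - xb = B^{1/2} v. For the quadratic cost with a linear observation operator, the sketch below compares the condition number of the Hessian before and after the change of variable, using made-up toy matrices.

```python
import numpy as np

# Toy values (assumed): background variances spanning four orders of
# magnitude, and a single observation of the second state variable.
B = np.diag([100.0, 1.0, 0.01])
H = np.array([[0.0, 1.0, 0.0]])
R = np.array([[1.0]])
R_inv = np.linalg.inv(R)

# Hessian of J in the original variable x.
hess_x = np.linalg.inv(B) + H.T @ R_inv @ H

# Hessian in the transformed variable v, where x - xb = B_half @ v
# and B = B_half @ B_half.T (Cholesky factor).
B_half = np.linalg.cholesky(B)
hess_v = np.eye(3) + B_half.T @ H.T @ R_inv @ H @ B_half

cond_x = np.linalg.cond(hess_x)
cond_v = np.linalg.cond(hess_v)
print(f"condition number: x-space {cond_x:.1f}, v-space {cond_v:.1f}")
```

A smaller condition number generally means gradient-based minimizers need fewer iterations. In v-space the background term contributes an identity matrix, so every eigenvalue of the Hessian is at least 1.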

Incremental VAR • Courtier et al. (1994) • Most common 4D framework in operational use • Incremental form performs linear minimization within a lower dimensional space (the inner loop minimization) • Outer loop minimization is at the full model resolution (non-linear physics are added back in this stage) • Benefits: smooths the cost function and assures better minimization behaviors; reduces the need for explicit preconditioning • Issues: extra linearizations occur; it is an approximate form of VAR DA
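
The inner/outer loop structure can be sketched on a toy problem. This is a simplification under stated assumptions: the observation operator h(x) = x**2 is made up, there is no time dimension, and the inner problem is solved exactly at full dimension rather than in a lower dimensional space as in operational incremental VAR.

```python
import numpy as np

def h(x):
    """Assumed nonlinear observation operator (illustration only)."""
    return x ** 2

def h_jac(x):
    """Jacobian of h, i.e., its linearization about x."""
    return np.diag(2.0 * x)

def incremental_var(xb, y, B, R, n_outer=10):
    """Outer loop relinearizes h; inner step solves a quadratic problem."""
    x = xb.copy()
    B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)
    for _ in range(n_outer):
        Hk = h_jac(x)          # outer loop: linearize about current state
        d = y - h(x)           # innovation from the full nonlinear operator
        # Inner problem: quadratic in the increment dx, solved exactly here.
        A = B_inv + Hk.T @ R_inv @ Hk
        b = Hk.T @ R_inv @ d - B_inv @ (x - xb)
        x = x + np.linalg.solve(A, b)
    return x

# Toy values (assumed).
xb = np.array([1.0, 2.0])
y = np.array([1.44, 3.61])
B = np.eye(2)
R = 0.1 * np.eye(2)

xa = incremental_var(xb, y, B, R)
```

Each inner problem is only an approximation of the full cost function, which is the "extra linearizations" issue noted on the slide. The outer loop corrects for it by recomputing the innovation with the nonlinear operator.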

Types of DA Solution Spaces • Model Space (x) • Physical Space (y) • Ensemble Sub-space, e.g., Maximum Likelihood Ensemble Filter (MLEF) • Types of Ensemble Kalman Filters: perturbed observations (or stochastic); square root filters (i.e., analysis perturbations are obtained from the square root of the Kalman Filter analysis covariance)

How are Data used in Time? • A “Smoother” uses all data available in the assimilation window (a “simultaneous” solution) • (Figure labels: observation model, cloud resolving model, forecast, time, observations, assimilation time window)
