An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method Xiang Ma and Nicholas Zabaras Materials Process Design and Control Laboratory Sibley School of Mechanical and Aerospace Engineering, 101 Frank H. T. Rhodes Hall Cornell University Ithaca, NY 14853-3801 Email: zabaras@cornell.edu URL: http://mpdc.mae.cornell.edu/
Outline of the presentation • Bayesian approach to inverse problems • Accelerating Bayesian inference via the adaptive sparse grid collocation method • Inference of spatial fields: MRF, process convolution model, regularization (e.g. permeability estimation) • Conclusions
Inverse problems • Forward problem: cause (m) to effect (d). The system inputs (unknowns) drive physical processes governed by PDEs, with mathematical representation d = G(m, ωs) + ωm, where G is the forward operator (e.g. a system of PDEs, computationally intensive), d the observed data and m the model parameters/inputs, assumed random. • Inverse problem: effect (d) to cause (m). Given a set of data d, estimate m.
Applications and significance • Broad applications: permeability estimation of a reservoir, inverse heat transfer, microstructure and pole figure (ODF) reconstruction, groundwater cleanup, and materials process design and control.
Bayesian computational approach Advantages: • provides a probabilistic description of the inverse solution • direct simulation in deterministic space • the prior distribution acts as regularization --- incorporation of prior knowledge --- explores the distribution of the regularization parameter Obstacles: • Posteriors are often high-dimensional • High computational cost of repeated likelihood evaluations Aim: exploit these advantages while resolving this difficulty
Bayesian inference • The model m is now a random variable/field. • Apply Bayes' theorem: posterior density ∝ likelihood function L(m) × prior density, up to a normalizing constant. • The posterior density is the full Bayesian solution to the inverse problem. Compared to most deterministic inverse techniques: • Not just a single value for m, but a probability density which encapsulates all available information about the inverse problem. • Various summarizing statistics, e.g. mean, modes, marginal distributions and credible intervals, can be computed to quantify the uncertainty associated with a specific inverse solution.
Example: Source inversion • The source is described by location parameters m and is active only over an initial time interval; strength s = 5.0, width τ = 0.1. • Data are collected at 25 sensors on a regular 5 x 5 grid at two times, t = 0.05 and t = 0.1. • Goal: given a set of noisy temperature measurements T, estimate the unknown source location m. • The measurement error is additive Gaussian. • Priors: independent uniform priors on the source location components.
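As a rough illustration (not taken verbatim from the slides), this kind of source-inversion benchmark is usually driven by a localized Gaussian source in a transient diffusion equation. A minimal sketch of such a source term, with the cutoff time and normalization treated as assumptions:

```python
import numpy as np

def source_term(x, t, m, strength=5.0, tau=0.1, t_off=0.1):
    """Localized Gaussian source centered at m = (m1, m2), switched off after t_off.

    The cutoff time t_off and the normalization are assumptions of this sketch,
    not values stated on the slide.
    """
    if t > t_off:
        return 0.0
    r2 = (x[0] - m[0])**2 + (x[1] - m[1])**2
    return strength / (2.0 * np.pi * tau**2) * np.exp(-r2 / (2.0 * tau**2))
```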
Posterior density • A hierarchical Bayesian model is used to also infer the noise level σ. • A standard practice for selecting priors on hyper-parameters is to use conjugate priors; thus an inverse Gamma distribution is chosen for σ2, where (α, β) is the parameter pair of the inverse Gamma distribution. • The posterior is non-standard and implicit, and the dimension of the posterior state space may be high. Thus, numerical sampling methods are needed for its exploration. • The most popular numerical method for this purpose is Markov chain Monte Carlo (MCMC).
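With the additive Gaussian noise model and the inverse Gamma hyperprior above, the hierarchical posterior takes the familiar form below (written for n observations; the notation is ours, since the slide's original symbols were lost):

```latex
\pi(m,\sigma^2 \mid d) \;\propto\;
(\sigma^2)^{-n/2}\exp\!\Big(-\frac{\lVert d - G(m)\rVert^2}{2\sigma^2}\Big)\,
\pi(m)\,(\sigma^2)^{-(\alpha+1)}\exp\!\Big(-\frac{\beta}{\sigma^2}\Big)
```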
Markov chain Monte Carlo • In this problem, the posterior density is sampled using a hybrid of the Metropolis-Hastings algorithm and the Gibbs sampler. • The acceptance ratio compares the posterior at the proposed and current states; every likelihood evaluation requires solving the forward problem. • The full conditional of σ2 is also an inverse Gamma distribution, so it can be sampled directly in a Gibbs step.
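A minimal sketch of such a hybrid sampler, assuming a Gaussian likelihood, a random walk proposal for m, and the conjugate inverse Gamma update for σ² (function names, step size and hyper-parameters are illustrative):

```python
import numpy as np

def mh_within_gibbs(d, forward, log_prior_m, m0, n_iter=30000,
                    step=0.05, alpha=1e-3, beta=1e-3, rng=np.random.default_rng(0)):
    """Metropolis-Hastings on m combined with a Gibbs update of sigma^2."""
    n = len(d)
    m, sigma2 = np.asarray(m0, float), 1.0
    misfit = np.sum((d - forward(m))**2)
    chain = []
    for _ in range(n_iter):
        # Random walk proposal for the source parameters m
        m_prop = m + step * rng.standard_normal(m.shape)
        misfit_prop = np.sum((d - forward(m_prop))**2)
        log_ratio = (-0.5 * (misfit_prop - misfit) / sigma2
                     + log_prior_m(m_prop) - log_prior_m(m))
        if np.log(rng.uniform()) < log_ratio:          # accept / reject
            m, misfit = m_prop, misfit_prop
        # Gibbs step: sigma^2 | m, d ~ IG(alpha + n/2, beta + misfit/2)
        sigma2 = 1.0 / rng.gamma(alpha + 0.5 * n, 1.0 / (beta + 0.5 * misfit))
        chain.append((m.copy(), sigma2))
    return chain
```

In the surrogate-accelerated version discussed later, `forward` is simply replaced by the ASGC interpolant.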
Propagate prior uncertainty • In the Bayesian formulation, everything is treated as random. Thus, the forward problem becomes a stochastic partial differential equation (SPDE) whose input uncertainty is the assumed prior density. • Based on the ideas of uncertainty quantification, we can propagate the input uncertainty through the forward model and construct a stochastic surrogate model of the forward problem to reduce the computational time. • The basic idea is to construct an interpolant in the stochastic prior space using the adaptive sparse grid collocation method. This interpolant can then be used as a surrogate model for the stochastic forward problem.
Mathematical preliminaries • Define a complete probability space with a sample space Ω which corresponds to the outcomes of some experiment, the σ-algebra of subsets of Ω (these subsets are called events) and the probability measure. • In this framework, a random variable is a function on Ω: each outcome in the sample space (the collection of all possible outcomes) is mapped to a corresponding value on the real line. • Define the component random variables and their images; the state space of the random vector is the product of these images, with a joint PDF given by the prior density. This is the stochastic prior state space. • For example, if a component is an independent uniform random variable on [0, 1], then its image is [0, 1]. • A general stochastic process is a random field with variations along space and time, i.e. a function with domain Ω × T × S.
Stochastic forward problem1 • Without loss of generality, the data d are the solution u to the forward problem at given sensor locations. • We can define a complete probability space where the sample space is the set of all possible outcomes of the unknowns. The forward problem is now posed as a stochastic model: find a solution such that, for P-almost every outcome, the governing equation holds. • The random vector of unknowns has a state space and a joint PDF which is the prior PDF; each realization of the random vector takes values in this state space, which we define as the stochastic prior state space. • The stochastic posterior state space can be defined analogously, where the event space is the inverse image (under the random vector) of the posterior support. 1. X. Ma, N. Zabaras, An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method, Inverse Problems 25 (2009) 035013 (27pp).
Stochastic forward problem • Since the unknown parameters are discretized into a random vector, by the Doob-Dynkin lemma u is also a function of that random vector. Also, define D as a d-dimensional physical domain. Therefore, we can restate the deterministic forward problem as the following stochastic forward problem: find a function of space and of the random input such that the governing equations hold for almost every realization. • For each realization, the function value gives one realization of the predicted data, which is equivalent to the solution of the deterministic forward problem using the same m as input. In this way, the repeated solution of the deterministic forward problem in MCMC is replaced by the solution of a single stochastic forward problem. • When the prior space is unbounded, we can always truncate it to a bounded one based on prior information about the unknowns. The only requirement is that the prior space is large enough to contain the posterior space completely.
Sparse grid collocation method1 • To represent a stochastic function (e.g. in 2 stochastic dimensions), sample the function at a finite set of collocation points and use basis functions to obtain an approximate representation; the function value at any point is then simply the value of the interpolant. • The spatial domain is approximated using a FE, FD, or FV discretization; the stochastic domain is approximated using multidimensional interpolating functions. 1. X. Ma, N. Zabaras, A hierarchical adaptive sparse grid collocation method for the solution of stochastic differential equations, Journal of Computational Physics, Vol. 228, pp. 3084-3113, 2009.
Choice of collocation points and nodal basis • In the context of incorporating adaptivity, we utilize the Newton-Cotes grid with equidistant support nodes; the number of nodes per level and the node locations follow the standard dyadic construction (see the sketch below). • Furthermore, by using the linear hat function as the univariate nodal basis function, one ensures local support, in contrast to the global support of Lagrange polynomials. This ensures that discontinuities in the stochastic space can be resolved; the piecewise linear basis functions are likewise given in the sketch below.
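A sketch of the standard equidistant Newton-Cotes nodes on [0, 1] and the piecewise-linear hat basis conventionally used in this construction (restated from the hierarchical collocation literature, since the slide's formulas were lost in extraction):

```python
import numpy as np

def num_nodes(level):
    """Number of nodes per level: 1 at level 1, otherwise 2**(level-1) + 1."""
    return 1 if level == 1 else 2**(level - 1) + 1

def nodes(level):
    """Equidistant support nodes on [0, 1] for the given level."""
    m = num_nodes(level)
    return np.array([0.5]) if m == 1 else np.linspace(0.0, 1.0, m)

def hat(y, level, j):
    """Univariate piecewise-linear (hat) basis function with local support."""
    m = num_nodes(level)
    if m == 1:
        return 1.0                      # constant basis at the coarsest level
    yj = nodes(level)[j]
    return max(0.0, 1.0 - (m - 1) * abs(y - yj))
```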
Conventional sparse grid collocation (CSGC) • Denote the one-dimensional interpolation formula as the basic 1D interpolation scheme. • In higher dimensions, the simplest extension is the tensor product formula. • Using the 1D formula, the sparse interpolant, where q is the depth of the sparse grid interpolation and N is the number of stochastic dimensions, is given by the Smolyak algorithm. • The hierarchical surplus is defined as the difference between the function value at a newly added node and the value of the previous-level interpolant at that node.
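For completeness, the usual Smolyak form of the sparse interpolant and the hierarchical surplus read as below (standard notation, restated here; the slide's original symbols were lost in extraction):

```latex
A_{q,N}(f) \;=\; \sum_{q-N+1 \,\le\, |\mathbf{i}| \,\le\, q}
(-1)^{\,q-|\mathbf{i}|}\binom{N-1}{q-|\mathbf{i}|}
\big(U^{i_1}\otimes\cdots\otimes U^{i_N}\big)(f),
\qquad
w_{\mathbf{j}}^{\mathbf{i}} \;=\; f\big(Y_{\mathbf{j}}^{\mathbf{i}}\big)
- A_{|\mathbf{i}|-1,N}(f)\big(Y_{\mathbf{j}}^{\mathbf{i}}\big),
```

where q is the interpolation depth, N the number of stochastic dimensions, and the surplus is evaluated at the support nodes added at level |i|.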
Nodal basis vs. hierarchical basis (comparison figure)
Adaptive sparse grid collocation (ASGC) Let us first revisit the 1D hierarchical interpolation. • For smooth functions, the hierarchical surpluses tend to zero as the interpolation level increases. • On the other hand, for non-smooth functions, steep gradients and finite discontinuities are indicated by the magnitude of the hierarchical surplus: the larger the magnitude, the stronger the underlying discontinuity. • Therefore, the hierarchical surplus is a natural candidate for error control and for implementing adaptivity: if the hierarchical surplus at a point is larger than a pre-defined threshold, we simply add the 2N neighbor points of the current point.
Adaptive sparse grid collocation: Algorithm • Let ε be the threshold parameter for adaptive refinement: only collocation points whose hierarchical surplus exceeds ε are refined (see the sketch below).
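A schematic of the refinement loop driven by the surplus threshold ε (a sketch only; the `point` and `interpolant` objects and their methods are illustrative, and the neighbor bookkeeping in a real ASGC implementation is more involved):

```python
def asgc_refine(f, active, interpolant, eps, max_level):
    """Grow the sparse grid only where the hierarchical surplus exceeds eps."""
    while active:
        new_active = []
        for point in active:
            # Hierarchical surplus: function value minus current interpolant value
            surplus = f(point.coords) - interpolant(point.coords)
            interpolant.add(point, surplus)
            # Refine: add the 2N children (two per dimension) of this point
            if abs(surplus) > eps and point.level < max_level:
                new_active.extend(point.children())
        active = new_active
    return interpolant
```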
Adaptive sparse grid interpolation • Ability to detect and reconstruct steep gradients
ASGC in Bayesian inference • By using the ASGC method, the solution of the stochastic forward problem can be approximated in a reduced form as a sum of hierarchical surpluses multiplied by basis functions. • To obtain the predicted data for any realization of the unknowns, instead of solving the deterministic forward problem we simply evaluate the interpolant, which gives the predicted data within a certain accuracy; the value of the likelihood then follows immediately. • In this case, most of the computational time is spent on constructing the interpolant of the solution. This construction is embarrassingly parallel, so the wall-clock time is small. Once the interpolant is constructed, it can be stored and reused whenever new observed data become available. • We only consider the unit hypercube here; other shapes of the bounded prior space can easily be transformed to the unit hypercube.
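Once the interpolant is available, the (unnormalized) likelihood evaluation inside MCMC reduces to an interpolant lookup. A minimal sketch, with `surrogate` standing for the constructed ASGC interpolant:

```python
import numpy as np

def surrogate_log_likelihood(m, d, surrogate, sigma2):
    """Gaussian log-likelihood using the ASGC surrogate instead of a PDE solve."""
    predicted = surrogate(m)          # interpolated sensor data for the input m
    return -0.5 * np.sum((d - predicted)**2) / sigma2
```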
Solution to the stochastic forward problem • Return to the 2D source inversion problem. The stochastic forward problem is solved in the prior space by ASGC, propagating the uniform prior on the source location. First, we need to check the convergence and accuracy of the surrogate model. • Figure: response surface of a single component of the stochastic forward problem solution at point (0.5, 0.5) and t = 0.05 for different ε.
Solution to the stochastic forward problem • The accuracy of the surrogate model can be interrogated in several ways. First, we compare the PDF of the solution at point (0.5, 0.5) at t = 0.05 and t = 0.1 with the direct computation obtained by solving the deterministic forward problem. • It is seen that as ε decreases, the PDF converges to the solution obtained by direct computation, which verifies the accuracy of the surrogate stochastic forward model.
Solution to the stochastic forward problem • Next, we examine the accuracy of the surrogate likelihood, defined by replacing the forward solution with the ASGC interpolant in the likelihood. • The exact source location is (0.5, 0.5) with 5% noise in the data. • Figures: contours of the exact (solid) and surrogate (dashed) likelihood; convergence of the surrogate posterior, measured by the Kullback-Leibler distance, vs. ε.
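The Kullback-Leibler distance between the exact and surrogate likelihood surfaces can be estimated numerically on a grid over the prior space. A small sketch, assuming both surfaces are tabulated on the same 2D grid:

```python
import numpy as np

def kl_distance(p, q, cell_area):
    """KL(p || q) for two non-negative surfaces tabulated on the same 2D grid."""
    p = p / (p.sum() * cell_area)          # normalize both to proper densities
    q = q / (q.sum() * cell_area)
    q = np.maximum(q, 1e-300)              # guard against log(0)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * cell_area
```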
Solution to the inverse problem • Now we illustrate the use of the surrogate model in the Bayesian inference approach for the model inverse problem. • The initial guess for m and σ is (0, 0) and 1.0, respectively. The parameter pair (α, β) of the inverse Gamma distribution is taken to be (1x10-3, 1x10-3). • The proposal distribution in the Metropolis-Hastings algorithm is a random walk sampler. • Unless otherwise specified, the scale parameter of the random walk and the threshold ε used to construct the stochastic collocation surrogate are kept fixed. • The burn-in period of the Markov chain is 10000, and the last 20000 realizations are used to compute the relevant statistical quantities.
Trace plots and autocorrelation plot of the Markov chain • The acceptance rate is 0.314. • Visual inspection suggests the chain mixes very well. • The current choice of the scale parameter in the proposal distribution leads to a fast decay of the autocorrelation with lag, consistent with the good mixing seen in the trace plots.
Speed up of the surrogate model • The motivation for using this method is to speed up Bayesian inference for nonlinear problems, where the repeated solution of the forward model for each MCMC sample is replaced by evaluation of the interpolant. • Direct FEM computation of the likelihood requires 30000 forward evaluations: time 34 hours. • The number of collocation points (and thus the number of direct forward solves needed) in the sparse grid is 1081: time 84.18 s to construct the surrogate model on 20 nodes, plus 26.9 s for a single MCMC chain.
Marginal density • A distinct feature of MCMC is the easy extraction of the marginal distribution of any component of interest via kernel density estimation (KDE). • Although the prior distribution is only uniform, the posterior density greatly refines the prior distribution. • The range of the distribution is much larger with 5% noise, whereas it is much more concentrated around the exact value with 1% noise. • From the tables, it can be seen that the current hierarchical Bayesian formulation can successfully detect the noise levels.
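Extracting a marginal posterior from the MCMC output is straightforward with any standard KDE. A sketch using SciPy's Gaussian KDE (the function name and call pattern are ours, not from the slides):

```python
import numpy as np
from scipy.stats import gaussian_kde

def marginal_density(samples, component, grid):
    """KDE estimate of the marginal posterior of one component of m.

    samples: array-like of shape (n_samples, n_parameters) from the MCMC chain.
    """
    kde = gaussian_kde(np.asarray(samples)[:, component])
    return kde(grid)
```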
Effect of source locations • Besides the great computational savings, another advantage of this method is that the surrogate model is reusable when new measurements arrive after it has been constructed. • The following table shows the results for four other source locations: (0.1, 0.9), (0.75, 0.25) and two other random draws from the prior, with 1% noise in the data. The same surrogate model is used. • It can be seen that no matter where the source is located, the method can always infer the exact value without performing any additional direct FEM computations, as would be required in the traditional MCMC method.
Error with different ε and noise level • It is also interesting to compare the results for different ε and noise levels. The true source location is chosen at (0.2, 0.8). The error is defined as the maximum error between the posterior mean and the true source location. • Surprisingly, for a given noise level, even a large threshold gives rather accurate results. This is possibly due to error cancellation between the surrogate model error and the measurement error.
Effect of the prior distribution • We next demonstrate that the accuracy of the method is not affected by the prior distribution, provided the prior space is large enough to contain the posterior space. • The prior is chosen as independent Gaussian, which is a special form of MRF, and the hierarchical posterior is formed as before. • Since the unit square clearly contains the posterior space, we truncate the unbounded prior space to the unit square and do not need to repeat the ASGC calculation.
Effect of the size of prior space • We conduct several computations with different sizes of the prior space. The surrogate model is constructed with 5% noise in the data, and the true source location is chosen as (0.2, 0.8). • As long as the prior space contains the posterior space, we always obtain an accurate posterior mean estimate. Enlarging the prior space does not affect the accuracy, but it does affect the computational cost: the number of collocation points increases and the burn-in length of the MCMC chain is elongated. Therefore, it is important to choose a prior space which balances accuracy and computational cost.
Example: Permeability estimation • In this example, we illustrate the method on the nonlinear inverse problem of estimating a spatial permeability field. The pressure and velocity are characterized by a set of dimensionless flow equations with a source/sink term representing the injection and production wells. • To impose non-negativity of the permeability, from now on we treat the logarithm of the permeability as the main unknown of the problem. • To generate simulation data, we consider a smooth permeability field. Pressure is measured at a 5 x 5 sensor network and 5% noise is added.
MRF and process convolution model • In the general Bayesian setting, the unknown permeability field and the forward model must be discretized. If the FE method is used, the field is discretized onto the nodes of the FE mesh; we can then write both the prior and posterior densities in terms of the nodal values as the unknown parameters. • Possible choices for the prior are the Gaussian process (GP) and the Markov random field (MRF). A Gaussian MRF takes the form sketched below, with a precision matrix whose diagonal entries count the neighbors of each site and whose off-diagonal entries couple sites that are adjacent. • However, the dimensionality of the unknown vector is generally very high, since the number of nodes is above 1000 in a typical finite element mesh. Exploring the posterior state space with MCMC is then not trivial and the acceptance rate is generally very low. To this end, we use the process convolution approach to reduce the dimensionality.
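The garbled equation on this slide is presumably the standard intrinsic Gaussian MRF; in the usual notation (ours, with precision hyperparameter δ) it reads:

```latex
\pi(\mathbf{x}\mid\delta)\;\propto\;
\exp\!\Big(-\tfrac{\delta}{2}\,\mathbf{x}^{T}W\mathbf{x}\Big),
\qquad
W_{ij}=
\begin{cases}
n_i & i=j \ (\text{number of neighbors of site } i)\\
-1 & i \text{ and } j \text{ adjacent}\\
0 & \text{otherwise},
\end{cases}
```

which penalizes differences between adjacent sites.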
MRF and process convolution model • The GP at any location can be represented as a convolution of a kernel with a discrete set of underlying variables: the kernel function is chosen to be a mean-zero Gaussian kernel centered at a discrete set of points, which controls the spatial structure of the underlying process, and the kernel strengths are independent random variables. • The prior density of the unknowns is thus the product of the densities of the kernel strengths. • The inverse problem has now been transformed into an inference problem on these coefficients. • If the kernel locations lie on a regular grid, an MRF can also be used as the prior distribution of the underlying process on this grid. • The standard deviation of the kernel is usually set equal to the distance between adjacent kernel locations, and the lattice of kernel locations is usually taken larger than the domain of the observed data in order to eliminate inaccurate results near the boundaries.
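A minimal sketch of a discrete process convolution field built from Gaussian kernels on a regular lattice, with the kernel width set to the lattice spacing as the slide suggests (function and variable names are illustrative):

```python
import numpy as np

def process_convolution(x, kernel_centers, weights, h):
    """Field value at points x as a weighted sum of Gaussian kernels with std h.

    x: (M, 2) evaluation points; kernel_centers: (K, 2); weights: (K,) kernel strengths.
    """
    r2 = ((x[:, None, :] - kernel_centers[None, :, :])**2).sum(axis=-1)
    kernels = np.exp(-0.5 * r2 / h**2)     # mean-zero Gaussian kernel shape
    return kernels @ weights               # e.g. the log-permeability field at x
```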
Permeability estimation: fixed kernel locations • Denoting the kernel strengths as the unknowns, the hierarchical posterior is computed as before. • We choose a 5 x 5 lattice of kernel locations to construct the discrete process convolution model, which corresponds to a total of 25 Gaussian kernels. An MRF is used as the prior and a bounded stochastic prior space is chosen. • A 25-dimensional ASGC grid is used, which has 150,322 collocation points; the corresponding number of collocation points for a conventional sparse grid is 199,876,961. The advantage of using ASGC is obvious.
Permeability estimation: fixed kernel locations, 5% noise in the data (true field, posterior mean and marginal densities)
Permeability estimation: fixed kernel locations, 1% noise in the data (9 x 9 vs. 5 x 5 sensor grids); marginals of each kernel strength for 5% and 1% noise
Random kernel locations • The kernel strengths are independent Gaussian random variables, and each kernel has three degrees of freedom: its strength and its x, y location. • Figures: posterior mean with 4 kernels and with 5 kernels.
Conclusions • An efficient Bayesian approach is introduced based on an adaptive sparse grid collocation method. • Efficient propagation of the prior uncertainty through the forward model. • Inference of spatial fields, with dimensionality reduction using an MRF / process convolution model. • The accuracy of the method is assessed on two nonlinear inverse problems. • The speed-up of this method compared with direct computation is drastic. • However, the method also has limitations: ASGC relies exclusively on a bounded input support, so it is crucial to choose an appropriate bounded prior space and threshold ε. As long as the prior space includes the support of the posterior, the method provides very good results.
References 1. X. Ma, N. Zabaras, An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method, Inverse Problems 25 (2009) 035013 (27pp). 2. X. Ma, N. Zabaras, A hierarchical adaptive sparse grid collocation method for the solution of stochastic differential equations, Journal of Computational Physics, Vol. 228, pp. 3084-3113, 2009.