1.08k likes | 1.49k Views
An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. Nicholas Zabaras and Xiang Ma. Materials Process Design and Control Laboratory Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Cornell University
E N D
An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations Nicholas Zabaras and Xiang Ma Materials Process Design and Control Laboratory Sibley School of Mechanical and Aerospace Engineering101 Frank H. T. Rhodes Hall Cornell University Ithaca, NY 14853-3801 Email: zabaras@cornell.edu URL: http://mpdc.mae.cornell.edu/ Symposium on `Numerical Methods for Stochastic Computation and Uncertainty Quantification' in the 2008 SIAM Annual Meeting, San Diego CA, July 7-11, 2008
Outline of the presentation • Stochastic analysis • Issues with gPC and ME-gPc • Collocation based approach • Basics of stochastic collocation • Issues with existing stochastic sparse grid collocation method (Smolyak algorithm) • Adaptive sparse grid collocation method • Some numerical examples
Engineering component Process Control? Heterogeneous random Microstructural features Motivation All physical systems have an inherent associated randomness • SOURCES OF UNCERTAINTIES • Multiscale material information – inherently statistical in nature. • Uncertainties in process conditions • Input data • Model formulation – approximations, assumptions. Why uncertainty modeling ? Assess product and process reliability. Estimate confidence level in model predictions. Identify relative sources of randomness. Provide robust design solutions.
Problem definition • Define a complete probability space . We are interested to find a stochastic function such that for P-almost everywhere (a.e.) , the following equation hold: where are the coordinates in , L is (linear/nonlinear) differential operator, and B is a boundary operators. • In the most general case, the operator L and B as well as the driving terms f and g, can be assumed random. • In general, we require an infinite number of random variables to completely characterize a stochastic process. This poses a numerical challenge in modeling uncertainty in physical quantities that have spatio-temporal variations, hence necessitating the need for a reduced-order representation.
Representing Randomness • Interpreting random variables • Distribution of the random variable • E.g. inlet velocity, inlet temperature Interpreting random variables as functions Random variable x MAP S Real line Each outcome is mapped to a corresponding real value Sample space of elementary events Collection of all possible outcomes 3. Correlated data Ex. Presence of impurities, porosity Usually represented with a correlation function We specifically concentrate on this. A general stochastic process is a random field with variations along space and time – A function with domain (Ω, Τ, S)
Representing Randomness • Representation of random process • - Karhunen-Loeve, Polynomial Chaos expansions • 2. Infinite dimensions to finite dimensions • - depends on the covariance Karhunen-Loèvè expansion Based on the spectral decomposition of the covariance kernel of the stochastic process Random process Mean Set of random variables to be found • Need to know covariance • Converges uniformly to any second order process Eigenpairs of covariance kernel Set the number of stochastic dimensions, N Dependence of variables Pose the (N+d) dimensional problem
Karhunen-Loeve expansion ON random variables Mean function Stochastic process Deterministic functions • Deterministic functions ~ eigen-values, eigenvectors of the covariance function • Orthonormal random variables ~ type of stochastic process • In practice, we truncate (KL) to first N terms
The finite-dimensional noise assumption • By using the Doob-Dynkin lemma, the solution of the problem can be described by the same set of random variables, i.e. • So the original problem can be restated as: Find the stochastic function such that • In this work, we assume that are independent random variables with probability density function . Let be the image of . Then is the joint probability density of with support
Uncertainty analysis techniques • Monte-Carlo : Simple to implement, computationally expensive • Perturbation, Neumann expansions : Limited to small fluctuations, tedious for higher order statistics • Sensitivity analysis, method of moments : Probabilistic information is indirect, small fluctuations • Spectral stochastic uncertainty representation: Basis in probability and functional analysis, can address second order stochastic processes, can handle large fluctuations, derivations are general • Stochastic collocation: Results in decoupled equations
Spectral stochastic presentation • A stochastic process = spatially, temporally varying random function CHOOSE APPROPRIATE BASIS FOR THE PROBABILITY SPACE GENERALIZED POLYNOMIAL CHAOS EXPANSION HYPERGEOMETRIC ASKEY POLYNOMIALS SUPPORT-SPACE REPRESENTATION PIECEWISE POLYNOMIALS (FE TYPE) KARHUNEN-LOÈVE EXPANSION SPECTRAL DECOMPOSITION SMOLYAK QUADRATURE, CUBATURE, LH COLLOCATION, MC (DELTA FUNCTIONS)
Generalized polynomial chaos (gPC) • Generalized polynomial chaos expansion is used to represent the stochastic output in terms of the input Stochastic input Askey polynomials in input Stochastic output Deterministic functions • Askey polynomials ~ type of input stochastic process • Usually, Hermite, Legendre, Jacobi etc.
Generalized polynomial chaos (gPC) • Advantages: • Fast convergence1. • Very successful in solving various problems in solid and fluid mechanics – well understood 2,3. • Applicable to most random distributions 4,5. • Theoretical basis easily extendable to other phenomena • P Spanos and R Ghanem, Stochastic finite elements: A spectral approach, Springer-Verlag (1991) • D. Xiu and G. Karniadakis, Modeling uncertainty in flow simulations via generalized polynomial chaos, Comput. Methods Appl. Mech. Engrg. 187 (2003) 137--167. • S. Acharjee and N. Zabaras, Uncertainty propagation in finite deformations -- A spectral stochastic Lagrangian approach, Comput. Methods Appl. Mech. Engrg. 195 (2006) 2289--2312. • D. Xiu and G. Karniadakis, The Wiener-Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comp. 24 (2002) 619-644. • X. Wan and G.E. Karniadakis, Beyond Wiener-Askey expansions: Handling arbitrary PDFs, SIAM J Sci Comp 28(3) (2006) 455-464.
Generalized polynomial chaos (gPC) • Disadvantages: • Coupled nature of the formulation makes implementation nontrivial1. • Substantial effort into coding the stochastics: Intrusive • As the order of expansion (p) is increased, the number of unknowns (P) increases factorially with the stochastic dimension (N): combinatorial explosion2,3. • Furthermore, the curse of dimensionality makes this approach particularly difficult for large stochastic dimensions • B. Ganapathysubramaniana and N. Zabaras, Sparse grid collocation schemes for stochastic natural convection problems, J. Comp Physics, 2007, doi:10.1016/j.jcp.2006.12.014 • D Xiu, D Lucor, C.-H. Su, and G Karniadakis, Performance evaluation of generalized polynomial chaos, P.M.A. Sloot et al. (Eds.): ICCS 2003, LNCS 2660 (2003) 346-354 • B. J. Debusschere, H. N. Najm, P. P. Pebray, O. M. Knio, R. G. Ghanem and O. P. Le Maître, Numerical Challenges in the Use of Polynomial Chaos Representations for Stochastic Processes, SIAM Journal of Scientific Computing 26(2) (2004) 698-719.
Failure of gPC • Gibbs phenomenon: Due to its global support, it fails to capture the stochastic discontinuity in the stochastic space. Capture unsteady equilibrium in natural convection X-velocity Y-velocity • Therefore, we need some method which can resolve the discontinuity locally. This motivates the following Multi - element gPC method.
Multi-element generalized polynomial chaos (ME-gPC) • Basic concept: • Decompose the space of random inputs into small elements. • In each element of the support space, we generate a new random variable and apply gPCE. • The global PDF is also simultaneously discretized. Thus, the global PC basis is not orthogonal with the local PDF in a random element. We need to reconstruct the orthogonal basis numerically in each random element. The only exception is the uniform distribution whose PDF is a constant. • We can also decompose the random space adaptively to increase its efficiency. • X. Wan and G.E. Karniadakis, An adaptive multi-element generalized polynomial chaos method for stochastic differential equations, SIAM J Sci Comp 209 (2006) 901-928.
Decomposition of the random space • We define a decomposition of as where • Based on the decomposition , we define the following indicator random variables: otherwise • Then a local random vector is defined in each element subject to a conditional PDF where
An adaptive procedure • Let us assume that the gPC expansion of random field in element is • From the orthogonality of gPC, thelocal variance approximated by PC with order is given by • The approximate global mean and variance can be expressed by • Let denote the error of local variance. We can write the exact global variance as
An adaptive procedure • We define the local decay rate of relative error of the gPC approximation in each element as follows: • Based on and the scaled parameter , a random element will be selected for potential refinement when the following condition is satisfied where and are prescribed constants. • Furthermore, in the selected random element , we use another threshold parameter to choose the most sensitive random dimension. We define the sensitivity of each random dimension as where we drop the subscript for clarity and the subscript denotes the mode consisting only of random dimension with polynomial order
An adaptive procedure • All random dimensions which satisfy will be split into two equal random elements in the next time step while all other random dimensions will remain unchanged. • If corresponds to element in the original random space , the element and will be generated in the next level if the two criteria are both satisfied. • Through this adaptive procedure, we can reduce the total element number while gaining the same accuracy and efficiency.
An Example: K-O problem So it can successfully resolve the discontinuity in stochastic space. However, due to its gPC nature in each element and the tree like adaptive refinement, this method still suffers from the curse of dimensionality.
First summary • The traditional gPC method fails when there are steep gradients/finite discontinuities in stochastic space. • gPC related method suffers from the so called curse of dimensionality when the input stochastic dimension is large. High CPU cost for large scale problems. • Although the h-adaptive ME-gPC can successfully resolve discontinuity in stochastic space, the number of element grows fast at the order . So the method is still a dimension-dependent method. • Therefore, an adaptive framework utilizing local basis functions that scales linearly with increasing dimensionality and has the decoupled nature of the MC method, offers greater promise in efficiently and accurately representing high-dimensional non-smooth stochastic functions.
Collocation based framework The stochastic variable at a point in the domain is an N dimensional function. The gPC method approximates this N dimensional function using a spectral element discretization (which causes the coupled set of equations) Spectral descretization represents the function as a linear combination of sets of orthogonal polynomials. Instead of a spectral discretization, use other kinds of polynomial approximations Use simple interpolation! That is, sample the function at some points and construct a polynomial approximation
Collocation based framework Need to represent this function Sample the function at a finite set of points Use polynomials (Lagrange polynomials) to get a approximate representation Function value at any point is simply Stochastic function in 2 dimensions Spatial domain is approximated using a FE, FD, or FV discretization. Stochastic domain is approximated using multidimensional interpolating functions
From one-dimension to higher dimensions • Denote the one dimensional interpolation formula as with the set of support nodes: • In higher dimension, a simple case is the tensor product formula For instance, if M=10 dimensions and we use k points in each direction This quickly becomes impossible to use. One idea is only to pick themost important points from the tensor product grid.
Conventional sparse grid collocation (CSGC) In higher dimensions, not all the points are equally important. Some points contribute less to the accuracy of the solution (e.g. points where the function is very smooth, regions where the function is linear, etc.). Discard the points that contribute less: SPARSE GRID COLLOCATION 1D sampling Tensor product 2D sampling
Conventional sparse grid collocation • Both piecewise linear as well as polynomial functions, L, can be used to construct the full tensor product formula above. • The Smolyak construction starts from this full tensor product representation and discards some multi-indices to construct the minimal representation of the function • Define the differential interpolant, ∆ as the difference between two interpolating approximations with • The sparse grid interpolation of the function f is given by where , is the multi-index representation of the support nodes. N is the dimension of the function f and q is an integer (q>N). k = q-N is called the depth of the interpolation. As q is increased more and more points are sampled.
Conventional sparse grid collocation • The interpolant can be represented in a recursive manner as with • Can increase the accuracy of the interpolant based on Smolyak’s construction WITHOUT having to discard previous results. • Can build on previous interpolants due to the recursive formulation • To compute the interpolant , one only needs the function values at the sparse grid point given by: One could select the sets , in a nested fashion such that • Similar to the interpolant representation in a recursive form, the sparse grid points can be represented as: with and
Choice of collocation points and nodal basis functions • One of the choice is the Clenshaw-Curtis grid at non-equidistant extrema of the Chebyshev polynomials. ( D.Xiu etc, 2005, F. Nobile etc, 2006) • By doing so, the resulting sets are nested. The corresponding univariate nodal basis functions are Lagrangian characteristic polynomials: • There are two problems with this grid: (1) The basis function is of global support, so it will fail when there is discontinuity in the stochastic space. (2) The nodes are pre-determined, so it is not suitable for implementation of adaptivity.
Choice of collocation points and nodal basis functions • Therefore, we propose to use the Newton-Cotes grid using equidistant support nodes and use the linear hat function as the univariate nodal basis. • It is noted that the resulting grid points are also nested and the grid has the same number of points as that of the Clenshaw-Curtis grid. • Comparing with the Lagrangian polynomial, the piecewise linear hat function has a local support, so it is expected to resolve discontinuities in the stochastic space. • The N-dimensional multi-linear basis functions can be constructed using tensor products:
Comparison of the two grids Clenshaw-Curtis grid Newton-Cotes grid
From nodal basis to multivariate hierarchical basis • We start from the 1D interpolating formula. By definition, we have with and we can obtain since we have For simplifying the notation, we consecutively number the elements in , and denote the j-th point of as hierarchical surplus
Nodal basis vs. hierarchical basis Nodal basis Hierarchical basis
Multi-dimensional hierarchical basis • For the multi-dimensional case, we define a new multi-index set which leads to the following definition of hierarchical basis • Now apply the 1D hierarchical interpolation formula to obtain the sparse grid interpolation formula for the multivariate case in a hierarchical form: Here we define the hierarchical surplus as:
Error estimate Depending on the order of the one-dimensional interpolant, one can come up with error bounds on the approximating quality of the interpolant1,2. If piecewise linear 1D interpolating functions are used to construct the sparse interpolant, the error estimate is given by: Where N is the number of sampling points, If 1D polynomial interpolation functions are used, the error bound is: assuming that f has k continuous derivatives • V. Barthelmann, E. Novak and K. Ritter, High dimensional polynomial interpolation on sparse grids, Adv. Comput. Math. 12 (2000), 273–288 • E. Novak, K. Ritter, R. Schmitt and A. Steinbauer, On an interpolatory method for high dimensional integration, J. Comp. Appl. Math. 112 (1999), 215–228
Hierarchical integration • So now, any function can now be approximated by the following reduced form • It is just a simple weighted sum of the value of the basis for all collocation points in the current sparse grid. Therefore, we can easily extract the useful statistics of the solution from it. • For example, we can sample independently N times from the uniform distribution [0,1] to obtain one random vector Y, then we can place this vector into the above expression to obtain one realization of the solution. In this way, it is easy to plot realizations of the solution as well as the PDF. • The mean of the random solution can be evaluated as follows: where the probability density function is 1 since we have assumed uniform random variables on a unit hypercube .
Hierarchical integration • The 1D version of the integral in the equation above can be evaluated analytically: • Since the random variables are assumed independent of each other, the value of the multi-dimensional integral is simply the product of the 1D integrals. • Denoting we rewrite the mean as • Thus, the mean is just an arithmetic sum of the hierarchical surpluses and the integral weights at each interpolation point.
Hierarchical integration • To obtain the variance of the solution, we need to first obtain an approximate expression for • Then the variance of the solution can be computed as • Therefore, it is easy to extract the mean and variance analytically, thus we only have the interpolation error. This is one of the advantages to use this interpolation based collocation method rather than quadrature based collocation method.
Adaptive sparse grid collocation (ASGC) • Let us first revisit the 1D hierarchical interpolation • For smooth functions, the hierarchical surpluses tend to zero as the interpolation level increases. • On the other hand, for non-smooth function, steep gradients/finite discontinuities are indicated by the magnitude of the hierarchical surplus. • The bigger the magnitude is, the stronger the underlying discontinuity is. • Therefore, the hierarchical surplus is a natural candidate for error control and implementation of adaptivity. If the hierarchical surplus is larger than a pre-defined value (threshold), we simply add the 2N neighbor points of the current point.
Adaptive sparse grid collocation • As we mentioned before, by using the equidistant nodes, it is easy to refine the grid locally around the non-smooth region. • It is easy to see this if we consider the 1D equidistant points of the sparse grid as a tree-like data structure • Then we can consider the interpolation level of a grid point Y as the depth of the tree D(Y). For example, the level of point 0.25 is 3. • Denote the father of a grid point as F(Y), where the father of the root 0.5 is itself, i.e. F(0.5) = 0.5. • Thus the conventional sparse grid in N- dimensional random space can be reconsidered as
Adaptive sparse grid collocation • Thus, we denote the sons of a grid point by • From this definition, it is noted that, in general, for each grid point there are two sons in each dimension, therefore, for a grid point in a N- dimensional stochastic space, there are 2N sons. • The sons are also the neighbor points of the father, which are just the support nodes of the hierarchical basis functions in the next interpolation level. • By adding the neighbor points, we actually add the support nodes from the next interpolation level, i.e. we perform interpolation from level |i| to level |i|+1. • Therefore, in this way, we refine the grid locally while not violating the developments of the Smolyak algorithm
Adaptive sparse grid collocation • To construct the sparse grid, we can either work through the multi-index or the hierarchical basis. However, in order to adaptively refine the grid, we need to work through the hierarchical basis. Here we first show the equivalence of these two ways in a two dimensional domain. • We always start from the multi-index (1, 1), which means the interpolation level in first dimension is 1, in second dimension is also 1. • For level 1, the associated point is only 0.5. • So the tensor product of the coordinate is • Thus, for interpolation level 0 of smolyak construction, we have
Adaptive sparse grid collocation • Recalling, we define the interpolation level k as . • So, now we come to the next interpolation level, For this requirement, the following two multi-indices are satisfied: Let us look at the index (1,2) The associated set of points is 0.5 (j=1). The associated set of points is 0 (j=1) and 1 (j=2). So the tensor product is Then the associated sum is
Adaptive sparse grid collocation • In the same way, for multi-index (2,1), we have the corresponding collocation points So the associated sum is Recall So for Smolyak interpolation level 1, we have
Adaptive sparse grid collocation • So, from this equation, it is noted that we can store the points with the surplus sequentially. When calculating interpolating value, we only need to perform a simple sum of each term. We don’t need to search for the point. Thus, it is very efficient to generate the realizations. • Here, we give the first three level of multi-index and corresponding nodes.
Adaptive sparse grid collocation • In fact, the most important formula is the following From this equation, in order to calculate the surplus, we need the interpolation value at just previous level. That is why it requires that we construct the sparse grid level by level. • Now, let us revisit the algorithm in a different view. we start from the hierarchical basis. We still use the two-dimension grid as an example. Recalling
Adaptive sparse grid collocation • As previous given, the first level in a sparse grid is always a point (0.5,0.5). There is a possibility that the function value at this point is 0 and thus the refinement terminates immediately. In order to avoid the early stop for the refinement process, we always refine the first level and keep a provision on the first few hierarchical surpluses. • It is noted here, the associated multi-index with this point (0.5, 0.5)is (1,1),i.e. and it corresponds to level of interpolation k = 0. So the associated sum is still • Now if the surplus is larger than the threshold, we add sons (neighbors) of this point to the grid.
Adaptive sparse grid collocation • In one-dimension, the sons of the point 0.5 are 0,1. According to the definition, the sons of the point (0.5,0.5) are which are just the points in the level 1 of sparse grid. Indeed, the associated multi-index is Recalling when we start from multi-index So actually, we can see they are equivalent. • Now, let us summarize what we have So the associated sum is
Adaptive sparse grid collocation • After calculating the hierarchical surplus, now we need to refine. Here we assume we refine all of the points, i.e. threshold is 0. • Notice here, all of the son points are actually from next interpolation level, so we still construct the sparse grid level by level. • First, in one dimension, the sons of point 0 is 0.25, 1 is 0.75. • So the sons of each point in this level are: • Notice here, we have some redundant points. Therefore, it is important to keep track of the uniqueness of the newly generated (active) points.
Adaptive sparse grid collocation • After delete the same points, and arrange ,we can have which are just all of the support nodes in k=2. • Therefore, we can see by setting the threshold equal zero, conventional sparse gridis exactlyrecovered. • We actually work on point by point instead of the multi-index. • In this way, we can control the local refinement of the point. • Here, we equally treat each dimension locally, i.e., we still add the two neighbor points in all N dimensions. So there are generally 2N son (neighbor) points. • If the discontinuity is along one dimension, this method can still detect it. So in this way, we say this method can also detect the important dimension.
Adaptive sparse grid collocation: implementation • As shown before, we need to keep track of the uniqueness of the newly active points. • To this end, we use the data structure <set> from the standard template library in C++ to store all the active points and we refer to this as the active index set. • Due to the sorted nature of the <set>, the searching and inserting is always very efficient. Another advantage of using this data structure is that it is easy for a parallel code implementation. Since we store all of the active points from the next level in the <set>, we can evaluate the surplus for each point in parallel, which increases the performance significantly. • In addition, when the discontinuity is very strong, the hierarchical surpluses may decrease very slowly and the algorithm may not stop until a sufficiently high interpolation level. Therefore, a maximum interpolation level is always specified as another stopping criterion.