Global sensitivity analysis of computer models with functional inputs
B. Iooss (CEA Cadarache), M. Ribatet (CEMAGREF Lyon)
Conference SAMO 2007, Budapest, Hungary
Functional data?
• The classical model is written Y = f(X), where Y is a scalar output variable and X is a vector of scalar input variables.
  X is treated as a vector of random variables, so Y is a random variable.
• The model with functional variables is written Y(v) = f(X1(u1), …, Xp(up)), where
  • v and ui are parameters (scalar or multidimensional),
  • Y(v) is an output function,
  • Xi(ui) is an input function (possibly constant).
Examples for u and v: time t, spatial coordinates (x, y, z), temperature T, …
The Xi(ui) are treated as random functions, so Y(v) is a random function.
An example of a functional input problem
Pollutant (90Sr) transport simulation in porous media [Volkova et al., SERRA 07]
[Figure: concentration maps, August 2002 and December 2010]
First study:
• 20 random input variables (permeability, porosity, Kd, …),
• 20 scalar outputs (concentrations at piezometers),
• LH (Latin hypercube) sample (N = 300), i.e. 300 model evaluations (3 days),
• construction of metamodels,
• global sensitivity analysis (Sobol) using the metamodels.
Result: the permeability of the second layer is the most influential variable.
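The Latin hypercube design used in the first study can be sketched as follows. This is a minimal NumPy illustration, not the authors' tool: the function name `latin_hypercube` is hypothetical, and the sample lives on the unit cube (mapping each column to the physical range of permeability, porosity, Kd, … is left out).

```python
import numpy as np

def latin_hypercube(n_samples, n_inputs, rng):
    """Return an (n_samples, n_inputs) Latin hypercube sample on [0, 1)."""
    # For every input, visit each stratum [k/n, (k+1)/n) exactly once,
    # in an independent random order.
    strata = np.column_stack(
        [rng.permutation(n_samples) for _ in range(n_inputs)]
    )
    # Draw one uniform point inside each selected stratum.
    jitter = rng.random((n_samples, n_inputs))
    return (strata + jitter) / n_samples

rng = np.random.default_rng(0)
design = latin_hypercube(300, 20, rng)   # N = 300 runs, 20 scalar inputs
```

Each row of `design` is one run of the simulator; every input has exactly one value in each of the 300 equal-probability strata.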
An example of a functional input problem
Second study:
• We want to take into account the spatial heterogeneity of the permeability.
• We represent it by a random field e(x,y).
• Realisations of this random field are obtained by geostatistical simulation techniques.
• Classical methods of global sensitivity analysis and metamodel construction are no longer applicable.
[Figure: 2 possible realisations of the permeability]
Some recent works (not exhaustive)
Functional input:
• Tarantola et al., SERRA 02: environmental assessment problem. Some inputs represent the errors in spatially distributed maps (random fields), obtained by simulation.
• Ruffo et al., RESS 06: hydrocarbon exploration risk evaluation. The basin and petroleum system models are very complex random fields. One scenario variable (32 basin models) is treated as a categorical variable.
• Zabalza-Mezghani et al., JPSE 04: hydrocarbon production optimization. The random field is treated as an uncontrollable input variable ("stochastic uncertainty parameter"); the other scalar inputs are the controllable variables.
Our problem and some possible solutions
Compute the Sobol indices when some input variables e are functional.
• Complete discretization: infeasible (several thousand parameters).
• Expansion on an appropriate function basis: impracticable in some cases (e.g. if the functional input is a temporal white noise).
• Treat the functional input as a single multidimensional parameter. Multidimensional sensitivity indices (Sobol, MCS 01; Jacques et al., RESS 06) are estimated by algorithms that use independent samples (simple Monte Carlo). FAST, RBD and quasi-MC methods are not applicable.
• Replace the functional input by a scalar parameter x ~ U[0,1] that governs whether or not the functional input e is simulated (Tarantola et al., SERRA 02). The Sobol index of x can then be computed by any method. This quantifies the sensitivity of the output to the presence or absence of e, but not to the variability of e.
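The third option (the functional input treated as one multidimensional parameter, with independent Monte Carlo samples) can be sketched as below. Everything here is an illustrative assumption: `toy_model` is a hypothetical stand-in for the computer code, the functional input is a discretized white noise of length `m`, and the estimator is a plain "pick-freeze" covariance estimate of Var(E[Y|e]) / Var(Y).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50_000, 20   # Monte Carlo size, length of the discretized functional input

def toy_model(x, e):
    """Hypothetical model: scalar input x, functional input e (one row per realisation)."""
    s = e.sum(axis=1) / np.sqrt(e.shape[1])   # scalar summary of e, N(0, 1) for white noise
    return x + s + x * s

# Two independent samples of the scalar input, one sample of the functional input.
x_a = rng.standard_normal(n)
x_b = rng.standard_normal(n)
e_a = rng.standard_normal((n, m))             # white-noise realisations

y_a = toy_model(x_a, e_a)
y_frozen_e = toy_model(x_b, e_a)              # e is kept ("frozen"), x is resampled

# First-order index of the whole functional input: Var(E[Y|e]) / Var(Y),
# estimated from the covariance between the two outputs that share e.
s_e = (np.mean(y_a * y_frozen_e) - y_a.mean() * y_frozen_e.mean()) / y_a.var()
```

For this toy model the three variance contributions (x alone, e alone, their interaction) are equal, so `s_e` should be close to 1/3. Only whole realisations of e are resampled, which is why no discretization-level index is needed.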
Moreover, in our case, we need metamodels
We deal with complex computer codes: nonlinear effects, long run times, a large number of inputs (> 10). The Sobol indices cannot be estimated by running the code directly; a metamodel is used as an intermediate.
Zabalza-Mezghani et al., JPSE 04, propose to treat the functional input as an uncontrollable parameter. With scalar inputs X and functional input e(u), the metamodel splits into a mean component Ee(Y|X) and a variance component Vare(Y|X).
Uncertainty propagation is carried out through this joint model.
[Figure: Y plotted against X, showing the curves E(Y|X) and E(Y|X) + s(Y|X)]
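Sobol indices of the joint model
$\mathrm{Var}[Y(X, e)] = \mathrm{Var}[E_e(Y|X)] + E[\mathrm{Var}_e(Y|X)] = \mathrm{Var}[Y_m(X)] + E[Y_d(X)]$
Variance decomposition of Y: $\mathrm{Var}(Y) = \sum_i V_i(Y) + \sum_{i<j} V_{ij}(Y) + \dots + V_e(Y) + \sum_i V_{ie}(Y) + \dots$
Variance decomposition of Ym: $\mathrm{Var}(Y_m) = \sum_i V_i(Y_m) + \sum_{i<j} V_{ij}(Y_m) + \dots + V_{1 \dots p}(Y_m)$
The Sobol indices of X on Y are then obtained by: $S_i = V_i(Y_m) / \mathrm{Var}(Y)$, $S_{ij} = V_{ij}(Y_m) / \mathrm{Var}(Y)$, …
E[Yd(X)] contains all the terms that include effects of e. Total Sobol index of e: $S_{T_e} = E[Y_d(X)] / \mathrm{Var}(Y)$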
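As a hedged numerical illustration of the decomposition Var[Y] = Var[Ym(X)] + E[Yd(X)], the sketch below uses the Ishigami test function with X3 playing the role of the uncontrollable input e. The coefficients 7 and 0.1 are the usual ones and are an assumption here; the mean and dispersion components are written in closed form instead of being fitted by a joint model, so this shows only how the indices follow from Ym and Yd.

```python
import numpy as np

A, B = 7.0, 0.1                       # usual Ishigami coefficients (assumed)
rng = np.random.default_rng(2)
n = 200_000
x1, x2, x3 = rng.uniform(-np.pi, np.pi, size=(3, n))

y = np.sin(x1) + A * np.sin(x2) ** 2 + B * x3 ** 4 * np.sin(x1)

# Closed-form conditional moments given (X1, X2), with X3 ~ U[-pi, pi]:
# E[X3^4] = pi^4 / 5 and Var[X3^4] = pi^8 / 9 - pi^8 / 25.
m4 = np.pi ** 4 / 5
v4 = np.pi ** 8 * (1 / 9 - 1 / 25)
y_m = np.sin(x1) * (1 + B * m4) + A * np.sin(x2) ** 2   # mean component Ym(X)
y_d = B ** 2 * np.sin(x1) ** 2 * v4                      # dispersion component Yd(X)

var_y = y.var()
# Ym is additive in X1 and X2 here, so each first-order part is one term of Ym.
s_1 = np.var(np.sin(x1) * (1 + B * m4)) / var_y
s_2 = np.var(A * np.sin(x2) ** 2) / var_y
s_te = y_d.mean() / var_y             # total index of the uncontrollable input
```

With these coefficients the three values come out at about 0.31, 0.44 and 0.24 and sum to roughly 1, since Ym has no X1-X2 interaction and every remaining term involves X3.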
Modeling the mean Ym and the dispersion Yd
Dual modeling by 2 polynomials (Taguchi 86; Vining & Myers, JQT 90).
Joint modeling by 2 Generalized Linear Models (McCullagh & Nelder 89), one for the mean and one for the dispersion:
• more general theoretical framework (exponential family distributions),
• the mean and the variance are modeled simultaneously, by iterative fits,
• no replications needed (fewer computations required).
• For the dispersion d, we take the deviance contribution.
• Deviance analysis, Student and Fisher tests, residual analyses, … make it possible to select terms and to choose the functions g and v.
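For reference, the joint GLM is commonly written as two linked models, one for the mean and one for the dispersion. The form below is a sketch of that standard formulation, not a transcription of the slide: apart from the link function g and the variance function v named in the text, all symbols (β, γ, h, τ, v_d, the regressor vectors x_i and u_i) are notational assumptions.

```latex
\begin{aligned}
\text{mean:}\quad
  & E(Y_i) = \mu_i, \qquad g(\mu_i) = x_i^{\top}\beta, \qquad
    \mathrm{Var}(Y_i) = \phi_i \, v(\mu_i) \\
\text{dispersion:}\quad
  & E(d_i) = \phi_i, \qquad h(\phi_i) = u_i^{\top}\gamma, \qquad
    \mathrm{Var}(d_i) = \tau \, v_d(\phi_i)
\end{aligned}
```

Here d_i is the deviance contribution of observation i in the mean model; the two models are fitted alternately until convergence, which is what "iterative fits" refers to.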
Joint modeling with Generalized Additive Models
The drawback of GLMs is their parametric form, which is limiting when modeling complex computer codes.
Replace them by a popular nonparametric model: GAM (Hastie & Tibshirani), in which the linear predictor is replaced by a sum of smooth functions si(Xi).
The si are obtained by fitting a smoother to the data: penalized regression splines (with integrated model selection by Generalized Cross Validation).
Deviance analysis, statistical tests on coefficients, residual analyses, … make it possible to select terms.
Compared to other metamodels (kriging, neural networks):
• a GAM offers a direct interpretation of the model,
• its drawback lies in the additive-effects hypothesis.
Simple example: Ishigami function with Xi ~ U[-π, π]
To test our joint models, X3 is treated as an uncontrollable input.
Models are fitted on 1e3 data points. The predictivity coefficient Q2 is computed on 1e4 test data points.
• Joint GLM: Q2 = 61 %
• Simple GAM: Q2 = 75 %
• Joint GAM: Q2 (mean) = 76 %; explained deviance: 93 % (mean), 37 % (dispersion)
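To put the Q2 values in context, the sketch below computes the predictivity coefficient of the exact conditional mean E[Y | X1, X2] on 1e4 test points. It assumes the usual Ishigami coefficients (7 and 0.1) and the usual definition Q2 = 1 - (sum of squared prediction errors) / (total sum of squares); neither is spelled out in the text, and `ishigami` and `q2` are hypothetical helper names.

```python
import numpy as np

def ishigami(x1, x2, x3, a=7.0, b=0.1):
    """Ishigami test function with the usual coefficients (assumed)."""
    return np.sin(x1) + a * np.sin(x2) ** 2 + b * x3 ** 4 * np.sin(x1)

def q2(y_true, y_pred):
    """Predictivity coefficient: 1 - squared prediction error / total sum of squares."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

rng = np.random.default_rng(3)
x1, x2, x3 = rng.uniform(-np.pi, np.pi, size=(3, 10_000))   # 1e4 test points
y_test = ishigami(x1, x2, x3)

# Exact conditional mean E[Y | X1, X2], using E[X3^4] = pi^4 / 5 for X3 ~ U[-pi, pi].
y_mean_exact = ishigami(x1, x2, 0.0) + 0.1 * (np.pi ** 4 / 5) * np.sin(x1)

q2_best = q2(y_test, y_mean_exact)
```

A predictor that sees only X1 and X2 cannot explain the part of the variance carried by X3, so `q2_best` (about 0.76 under these assumptions) is close to the best value a mean component can reach, which is consistent with the 75 % and 76 % reported for the GAM-based models.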
A hydrogeological application
Pollutant (90Sr) transport simulation in porous media
• 16 scalar input variables: sorption coefficients (kd) and permeabilities (per) of the different hydrogeologic layers, porosity, infiltration rate, …
• 1 functional input: the permeability
• LH sample (N = 300) for the 16 scalar inputs, i.e. 300 model evaluations (8 days)
• 1 output: the concentration at a specified location
• Joint GAM: Devexp(mean) = 98 %, Devexp(dispersion) = 29 %
  Explanatory terms: mean [s(kd1), s(kd2), s(per3), s(per2,kd2)], dispersion [kd1, kd2]
Sensitivity indices: S(kd2) = 52 %, S(per2) = 8 %, S(kd2,per2) = 6 %, S(kd1) = 4 %, ST(e) = 28 %, S(kd1,e) > 0 and S(kd2,e) > 0
Conclusions
• This approach, based on joint models to compute Sobol sensitivity indices, is useful in the following situations:
  • models with "complex" functional inputs,
  • time-consuming models (so a metamodel is needed),
  • heteroscedasticity (the functional input interacts with scalar inputs).
• Another major benefit: uncertainty propagation.
• Current limitations:
  • It cannot distinguish the effects of different functional inputs.
  • Only qualitative sensitivity indices are obtained for the interactions between the functional input and the other inputs.
Useful software
R packages: "JointModeling" and "sensitivity" (G. Pujol)