Data Mining of Environmental Models for Sensitivity Analysis re Knowledge Discovery
Tom Stockton, Paul Black, Andy Schuh, Kate Catlett, John Tauxe
Neptune and Company, Inc.
www.neptuneandco.com
Issue
How to conduct a sensitivity analysis of a complex, high-dimensional probabilistic environmental model?
Decision Modeling
• Decision Model, build and solve
• Decision Actions and Outcomes
• Utility (costs, liabilities, desires)
• Probabilistic model
  • Scenario
  • Model
  • Parameter
• Sensitivity analysis (knowledge re-discovery)
• Value of information analysis (OUT-path)
• Data collection
• Update model (Bayesian or ad hoc)
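Decision Modeling
\[
U(d \mid I) = \sup_{d} \int_{\Theta \times S \times M \times Y}
U(d \mid y, S, M, \theta_M)\,
p(S)\,
p(M \mid S)\,
p(\theta_M \mid S)\,
p(I \mid \theta_M, M, S)\,
p(y \mid \theta_M, M, S)\;
dy\, dS\, dM\, d\theta_M
\]
Factors of the integrand:
• $U(d \mid y, S, M, \theta_M)$: utility function
• $p(S)$: scenario uncertainty
• $p(M \mid S)$: model uncertainty
• $p(\theta_M \mid S)$: parameter uncertainty
• $p(I \mid \theta_M, M, S)$: data likelihood
• $p(y \mid \theta_M, M, S)$: risk predictive distribution
where:
• $U$ = utility, loss, cost
• $d$ = decision
• $I$ = information/data
• $y$ = risk
• $M$ = model structure
• $\theta_M$ = model parameters
• $S$ = scenario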
Sensitivity Analysis
Given a model Y = f(X) [here, Y = GoldSim(X)], sensitivity analysis aims to describe the influence of each input variable Xi on the model response Y.
Sensitivity Measures
• One-At-A-Time (OAT)
• Differential Analysis
• Global
  • Statistical: scatter plots, correlation, regression, rank transformations
  • Data mining: Sobol, FAST, MARS, MART
Desirable Properties of an SA Measure
• Efficiency: accounts for all effects while being computationally affordable
• Simplicity: implementable and interpretable
• Model independence: the method can handle non-linearity and non-monotonicity (across time and space)
K. Chan, S. Tarantola and A. Saltelli (2000), "Variance-Based Methods," in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.
Sensitivity Measures
• For complex probabilistic models, OAT and Differential Analysis are often
  • not efficient, and
  • not model independent
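A minimal sketch of why OAT can fail the model-independence test. The toy response `model` and the helper `oat_effects` are hypothetical and not taken from the slides; the response is a pure interaction term, so perturbing either input alone from the baseline shows no effect even though moving both together changes the output.

```python
def model(x1, x2):
    # Hypothetical toy response: a pure interaction between two inputs.
    return x1 * x2


def oat_effects(f, baseline, delta):
    """Change in f when each input is perturbed alone from the baseline."""
    base = f(*baseline)
    effects = []
    for i in range(len(baseline)):
        point = list(baseline)
        point[i] += delta  # move one input, hold the others fixed
        effects.append(f(*point) - base)
    return effects


effects = oat_effects(model, baseline=(0.0, 0.0), delta=1.0)
joint_change = model(1.0, 1.0) - model(0.0, 0.0)
print("OAT effects:", effects)
print("Change when both inputs move:", joint_change)
```

Both OAT effects are zero at this baseline while the joint change is not, which is the kind of interaction effect a global measure is meant to pick up.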
Global Sensitivity Measures
• Sensitivity Measure
  • Build a statistical model of the model response and the model inputs using the Monte Carlo simulation results
  • Decompose the variance of the output and attribute it to the input variables
Standardized Rank Regression (SRR)
• Rank Y and Xi, and scale the ranks to a mean of 0 and a variance of 1 for convenience
• Based on the ranks of Y and Xi
• Assuming the Xi are independent
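The SRR steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the toy response (monotonic but nonlinear in x1, with x3 inert), the sample size, and the helper names `standardized_ranks`, `solve`, and `srr` are all hypothetical. The ranks are regressed by ordinary least squares through the normal equations.

```python
import math
import random


def standardized_ranks(values):
    """Rank the values (1..n, no tie handling) and scale to mean 0, variance 1."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    mean = sum(ranks) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in ranks) / (n - 1))
    return [(r - mean) / sd for r in ranks]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def srr(inputs, response):
    """Standardized rank regression coefficients and R^2 of the rank fit."""
    zx = [standardized_ranks(col) for col in inputs]
    zy = standardized_ranks(response)
    k, n = len(zx), len(zy)
    xtx = [[sum(zx[i][t] * zx[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(zx[i][t] * zy[t] for t in range(n)) for i in range(k)]
    coef = solve(xtx, xty)  # no intercept needed: all ranks are centered
    fitted = [sum(coef[i] * zx[i][t] for i in range(k)) for t in range(n)]
    ss_res = sum((zy[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum(v ** 2 for v in zy)
    return coef, 1.0 - ss_res / ss_tot


random.seed(0)
n = 2000
x1 = [random.random() for _ in range(n)]
x2 = [random.random() for _ in range(n)]
x3 = [random.random() for _ in range(n)]  # inert input
y = [math.exp(3.0 * a) + b for a, b in zip(x1, x2)]  # assumed toy model
coef, r2 = srr([x1, x2, x3], y)
print("SRR coefficients:", coef)
print("R^2 of the rank regression:", r2)
```

In this toy case x1 should receive the largest coefficient and the inert x3 a coefficient near zero. With independent inputs, the squared coefficients approximately partition the R^2 of the rank fit.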
Fourier Amplitude Sensitivity Test (FAST)
• Explores the multidimensional space of the input factors along a search curve, then uses a Fourier transform of the model response to apportion its variance among the factors.
• Handles main and interaction effects
K. Chan, S. Tarantola and A. Saltelli (2000), "Variance-Based Methods," in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.
Issues
• Differential Analysis
  • not feasible: requires derivatives of complex models
• SRR and OAT
  • not model independent: trouble with non-monotonic, nonlinear models
  • not efficient: trouble with interaction effects in high-dimensional models
• FAST
  • not efficient: requires separate model runs
Possible Solutions
• Data mine the probabilistic model output
  • Multivariate Adaptive Regression Splines (MARS)
  • Multiple Additive Regression Trees (MART)
Data Mining
• MARS
  • Non-parametric recursive partitioning approach that fits separate splines to distinct intervals of the predictor variables.
• MART
  • Explores the multidimensional space of the input factors using gradient boosting of additive regression models.
• Advantages
  • Search for interactions between variables, allowing any degree of interaction to be considered.
  • Track very complex data structures in high-dimensional data.
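To make "gradient boosting of additive regression models" concrete, here is a heavily simplified sketch, not the MART algorithm itself. It assumes squared-error loss and one-split trees (stumps), so unlike MART with deeper trees it cannot represent interactions. The function names, the toy response, and the settings `rounds` and `rate` are hypothetical. Each round fits a stump to the current residuals, and a variable's relative influence is the squared-error reduction accumulated over the splits made on it.

```python
import random


def best_stump(columns, orders, residual):
    """Return (gain, variable, threshold, left_mean, right_mean) for the best split."""
    n = len(residual)
    total = sum(residual)
    best = None
    for j, (col, order) in enumerate(zip(columns, orders)):
        left_sum = 0.0
        for pos in range(n - 1):
            left_sum += residual[order[pos]]
            lo, hi = col[order[pos]], col[order[pos + 1]]
            if lo == hi:
                continue  # cannot split between equal values
            n_left, n_right = pos + 1, n - pos - 1
            right_sum = total - left_sum
            # Reduction in squared error achieved by this split.
            gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, 0.5 * (lo + hi),
                        left_sum / n_left, right_sum / n_right)
    return best


def boost_stumps(columns, y, rounds=100, rate=0.1):
    """Boost stumps on squared-error residuals; return relative influence and fit."""
    n = len(y)
    orders = [sorted(range(n), key=lambda i, c=col: c[i]) for col in columns]
    pred = [sum(y) / n] * n
    influence = [0.0] * len(columns)
    for _ in range(rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        gain, j, threshold, left_mean, right_mean = best_stump(columns, orders, residual)
        influence[j] += gain
        for i in range(n):
            step = left_mean if columns[j][i] <= threshold else right_mean
            pred[i] += rate * step  # shrunken update
    total = sum(influence)
    return [v / total for v in influence], pred


random.seed(0)
n = 500
x1 = [random.random() for _ in range(n)]
x2 = [random.random() for _ in range(n)]
x3 = [random.random() for _ in range(n)]  # inert input
y = [10.0 * (a - 0.5) ** 2 + b for a, b in zip(x1, x2)]  # non-monotonic in x1
influence, pred = boost_stumps([x1, x2, x3], y)
mean_y = sum(y) / n
r2 = 1.0 - (sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
            / sum((yi - mean_y) ** 2 for yi in y))
print("Relative influence:", influence)
print("Training R^2:", r2)
```

The toy response is non-monotonic in x1, the situation in which a rank regression struggles. The boosted fit should still rank x1 first and the inert x3 last.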
Sensitivity Indices via ANOVA Decomposition
Sensitivity indices are calculated using the basis functions that do not include $x_s$.
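The formula on this slide did not survive extraction. As a reference point, the standard variance-based (Sobol') definitions for independent inputs are sketched below. The symbols $f_0$, $f_i$, $V_i$, $S_i$ and $S_{T_i}$ are conventional notation rather than the slide's own, and the last line is only one plausible reading of "basis functions not including $x_s$" for a fitted model $\hat{f}$, stated as an assumption.

```latex
% ANOVA (functional) decomposition of the model response, independent inputs
\[
f(x) = f_0 + \sum_{i} f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j) + \cdots + f_{1 \ldots k}(x_1, \ldots, x_k)
\]
% Corresponding decomposition of the output variance
\[
V = \operatorname{Var}(Y) = \sum_{i} V_i + \sum_{i<j} V_{ij} + \cdots + V_{1 \ldots k},
\qquad V_i = \operatorname{Var}\!\big(E[Y \mid X_i]\big)
\]
% First-order and total sensitivity indices
\[
S_i = \frac{V_i}{V},
\qquad
S_{T_i} = 1 - \frac{\operatorname{Var}\!\big(E[Y \mid X_{\sim i}]\big)}{V}
\]
% Assumed reading for a fitted basis-function model \hat{f}:
% \hat{f}_{\sim s} is the sum of the basis functions that do not involve x_s
\[
\hat{S}_{T_s} \approx 1 - \frac{\operatorname{Var}\!\big(\hat{f}_{\sim s}(X)\big)}{\operatorname{Var}\!\big(\hat{f}(X)\big)}
\]
```

The approximation in the last line is exact only when the retained and dropped terms are uncorrelated, which fitted spline or tree basis functions do not guarantee.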
Analytical Example Sobol’ g-function Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.
Example: Sobol’ g-function Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.
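For reference, the Sobol' g-function and its closed-form first-order indices can be sketched as below. The coefficient vector `a` is an assumption (a commonly used eight-input test case), since the slide text does not give the values used. The Monte Carlo step only checks that the sample mean is close to the theoretical mean of 1.

```python
import random


def g_function(x, a):
    """Sobol' g-function: product over inputs of (|4 x_i - 2| + a_i) / (1 + a_i)."""
    out = 1.0
    for xi, ai in zip(x, a):
        out *= (abs(4.0 * xi - 2.0) + ai) / (1.0 + ai)
    return out


def analytic_first_order(a):
    """Closed-form first-order indices S_i = V_i / V for uniform(0, 1) inputs."""
    partial = [(1.0 / 3.0) / (1.0 + ai) ** 2 for ai in a]  # V_i
    total = 1.0
    for v in partial:
        total *= 1.0 + v
    total -= 1.0  # V = prod(1 + V_i) - 1
    return [v / total for v in partial]


# Assumed coefficients: small a_i means an influential input.
a = [0.0, 1.0, 4.5, 9.0, 99.0, 99.0, 99.0, 99.0]
indices = analytic_first_order(a)

random.seed(1)
sample = [g_function([random.random() for _ in a], a) for _ in range(20000)]
mc_mean = sum(sample) / len(sample)

print("Analytic first-order indices:", [round(s, 3) for s in indices])
print("Monte Carlo mean of g:", mc_mean)
```

With these assumed coefficients the first two inputs carry first-order indices of roughly 0.72 and 0.18. The indices sum to less than 1 because the remainder of the variance is due to interactions.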
[Figure: flowchart of the decision process, with numbered steps 1 to 8 and an iteration loop. Elements shown: Management Options (Institutional Controls, Site Maintenance, Waste Acceptance, Closure, Monitoring/Surveillance); Future Inventory and Existing Inventory; Fate & Transport; Research, Monitoring, Information & Data Collection; risk endpoints (Occupational, MOP & IHI, Cumulative (CA), Ecosystem); reviews and decisions (Maintenance Review, Periodic Review, Waste Acceptance Decision, Closure Decision); Value of Information; Contamination; Risk; Budgets; Management; Cost (Disposal Costs, Closure Costs, Potential Liabilities, Monitoring Costs, ALARA Costs, Disposal Fees, Analysis Costs, Public Benefit); Regulations & Guidance; Choose Management Options & Update Management Plan. Decision node, with YES and NO branches: "Can the risk be managed to regulatory thresholds at an acceptable cost with an acceptable level of uncertainty?" Legend: uncertainty analysis, sensitivity analysis, sequence number, iteration loop.]
Simulation Results
• Model Inputs (X)
  • Inventory
  • Fate and transport
  • Upward advection
  • Biotic transport
• Model response (Y)
  • "EPA-SUM"
Variation Explained

        Time     SRR    MART/MARS
GCD     10,000   0.91   0.99
LANL    50       0.87   0.94
        100      0.86   0.96
        500      0.75   0.91
        1,000    0.71   0.95
        10,000   0.71   0.93
Summary
• MART and MARS appear to provide an approach to data mining probabilistic model results for sensitivity analysis that is
  • Efficient
  • Simple (?)
  • Model independent
Finally…
• The decision context:
  • Is the uncertainty in the model response too high?
  • Is there value in reducing input uncertainty?
• SA and cost are used to estimate the value of collecting additional information.
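One common way to put a number on "value in reducing input uncertainty" is the expected value of perfect information (EVPI): the expected cost of the best action under current uncertainty minus the expected cost when the best action can be chosen after the uncertainty is resolved. The sketch below uses an entirely hypothetical two-action, two-state problem; the probabilities, costs, and names are illustrative and do not come from the presentation.

```python
# Hypothetical state probabilities under current knowledge.
states = {"low_risk": 0.7, "high_risk": 0.3}

# Hypothetical cost of each action in each state (arbitrary units).
cost = {
    "no_action": {"low_risk": 0.0, "high_risk": 100.0},
    "remediate": {"low_risk": 20.0, "high_risk": 20.0},
}


def expected_cost(action):
    """Expected cost of an action under the current state probabilities."""
    return sum(p * cost[action][s] for s, p in states.items())


# Best that can be done now: commit to one action before the state is known.
prior_cost = min(expected_cost(a) for a in cost)

# With perfect information: pick the cheapest action in each state.
perfect_cost = sum(p * min(cost[a][s] for a in cost) for s, p in states.items())

evpi = prior_cost - perfect_cost
print("Expected cost now:", prior_cost)
print("Expected cost with perfect information:", perfect_cost)
print("EVPI:", evpi)
```

In this toy problem, data collection costing more than the EVPI could not pay for itself. Sensitivity analysis helps by identifying which inputs account for that value.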
MARS
• Non-parametric recursive partitioning approach that fits separate splines to distinct intervals of the predictor variables.
• The selected variables and the knot locations are found together by a brute-force, exhaustive search that evaluates a "loss of fit" criterion.
• Searches for interactions between variables, allowing any degree of interaction to be considered.
• Tracks very complex data structures in high-dimensional data.
J.H. Friedman (1991), "Multivariate Adaptive Regression Splines," The Annals of Statistics, 19(1), 1-67.
Software: Trevor Hastie and Robert Tibshirani, MDA Library for R ('GNU S').
Ross Ihaka and Robert Gentleman (1996), "R: A Language for Data Analysis and Graphics," Journal of Computational and Graphical Statistics, 5(3), 299-314. www.r-project.org
MART
• Multiple Additive Regression Trees
• Explores the multidimensional space of the input factors using gradient boosting of additive regression models.
• Handles main and interaction effects.
• Computationally fast
K. Chan, S. Tarantola and A. Saltelli (2000), "Variance-Based Methods," in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.