
Data Mining of Environmental Models for Sensitivity Analysis

Presentation Transcript


  1. Data Mining of Environmental Models for Sensitivity Analysis re Knowledge Discovery. Tom Stockton, Paul Black, Andy Schuh, Kate Catlett, John Tauxe. Neptune and Company, Inc. www.neptuneandco.com

  2. Issue How to conduct a sensitivity analysis of a complex high dimensional probabilistic environmental model?

  3. Decision Modeling • Decision Model, build and solve • Decision Actions and Outcomes • Utility (costs, liabilities, desires) • Probabilistic model • Scenario • Model • Parameter • Sensitivity analysis (knowledge re-discovery) • Value of information analysis (OUT-path) • Data collection • Update model (Bayesian or ad hoc)

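  4. Decision Modeling

     $$U(d \mid I) = \sup_d \int_{\Theta \times S \times M \times Y} U(d \mid y, S, M, \theta_M)\, p(S)\, p(M \mid S)\, p(\theta_M \mid S)\, p(I \mid \theta_M, M, S)\, p(y \mid \theta_M, M, S)\; dy\, dS\, dM\, d\theta_M$$

     Terms of the integrand: U(d | y, S, M, θ_M) utility function • p(S) scenario uncertainty • p(M | S) model uncertainty • p(θ_M | S) parameter uncertainty • p(I | θ_M, M, S) data likelihood • p(y | θ_M, M, S) risk predictive distribution

     where: U = utility, loss, cost • M = model structure • d = decision • θ_M = model parameters • I = information/data • S = scenario • y = risk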

  5. Sensitivity Analysis Given a model: Y = f (X) [Y = GoldSim(X)] Sensitivity analysis is aimed at describing the influence of each input variable Xi on the model response Y

  6. Sensitivity Measures • One-At-A-Time (OAT) • Differential Analysis • Global • Statistical • scatter plots, correlation, regression, rank transformations • Data mining • Sobol, FAST, MARS, MART

  7. Desirable Properties of an SA Measure • Efficiency • account for all effects while being computationally affordable • Simplicity • implementable and interpretable • Model Independent • The method can handle non-linearity, non-monotonicity (across time and space) K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.

  8. Sensitivity Measures • OAT and Differential Analysis, for complex probabilistic models, often are • not efficient, and • not model independent
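A minimal sketch of why OAT can fail to be model independent, assuming a hypothetical two-input toy model (not from the presentation) whose response is a pure interaction. The function names `oat_effects` and `toy_model` are illustrative only.

```python
def oat_effects(f, baseline, delta):
    """Change in f when each input alone is moved by delta from the baseline."""
    y0 = f(baseline)
    effects = []
    for i in range(len(baseline)):
        x = list(baseline)
        x[i] += delta          # perturb one input, hold the others fixed
        effects.append(f(x) - y0)
    return effects


def toy_model(x):
    """Hypothetical model with no main effects at the origin, only an interaction."""
    return x[0] * x[1]


effects = oat_effects(toy_model, baseline=[0.0, 0.0], delta=1.0)
joint_change = toy_model([1.0, 1.0]) - toy_model([0.0, 0.0])
print(effects, joint_change)
```

Each one-at-a-time effect is zero at this baseline even though moving both inputs together changes the response, which is the kind of interaction effect a global method is meant to capture.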

  9. Global Sensitivity Measures • Sensitivity Measure • Build a statistical model of the model response and the model inputs using the Monte Carlo simulation results • Decompose variance of the output and attribute to input variables
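One simple way to attribute output variance to an input from existing Monte Carlo results is to estimate Var(E[Y | Xi]) / Var(Y) by binning. The sketch below assumes independent inputs and a hypothetical linear toy model; the binning estimator and the name `first_order_index` are illustrative choices, not the method used in the presentation.

```python
import random
from statistics import mean, pvariance


def first_order_index(x, y, n_bins=20):
    """Estimate Var(E[Y|X]) / Var(Y) from paired samples using equal-count bins."""
    order = sorted(range(len(x)), key=lambda k: x[k])
    size = len(x) // n_bins
    overall = mean(y)
    between = 0.0
    for b in range(n_bins):
        start = b * size
        stop = (b + 1) * size if b < n_bins - 1 else len(x)
        idx = order[start:stop]
        bin_mean = mean(y[k] for k in idx)
        between += len(idx) * (bin_mean - overall) ** 2
    return between / len(x) / pvariance(y)


# Hypothetical Monte Carlo results: Y = 4*X1 + X2 with uniform inputs.
random.seed(1)
x1 = [random.random() for _ in range(5000)]
x2 = [random.random() for _ in range(5000)]
y = [4 * a + b for a, b in zip(x1, x2)]

s1 = first_order_index(x1, y)
s2 = first_order_index(x2, y)
print(round(s1, 2), round(s2, 2))
```

For this toy model the exact indices are 16/17 and 1/17, so the estimate for X1 should dominate. The binned estimate is slightly biased for small samples or many bins.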

  10. Standardized Rank Regression (SRR) • Rank Y and Xi, and scale the ranks to a mean of 0 and a variance of 1 for convenience • Based on the ranks of Y and Xi • Assuming the Xi are independent
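A minimal sketch of the rank-and-standardize step, under the slide's assumption that the Xi are independent. In that case each coefficient of a regression on standardized ranks is approximately the correlation between the ranks of Xi and the ranks of Y, which is what this code computes. Ties are not handled, and the toy model and function names are hypothetical.

```python
import random
from math import exp
from statistics import mean, pstdev


def standardized_ranks(values):
    """Rank the values (1..n, no tie handling) and scale to mean 0, variance 1."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    for r, k in enumerate(order, start=1):
        ranks[k] = float(r)
    m, s = mean(ranks), pstdev(ranks)
    return [(r - m) / s for r in ranks]


def srr_coefficients(inputs, y):
    """Per-input rank correlation with Y (approximates SRR for independent inputs)."""
    ry = standardized_ranks(y)
    coefs = []
    for x in inputs:
        rx = standardized_ranks(x)
        coefs.append(mean(a * b for a, b in zip(rx, ry)))
    return coefs


# Hypothetical monotonic but non-linear model; the third input is unused.
random.seed(2)
n = 2000
x1 = [random.random() for _ in range(n)]
x2 = [random.random() for _ in range(n)]
x3 = [random.random() for _ in range(n)]
y = [exp(2 * a) + 2 * b for a, b in zip(x1, x2)]

coefs = srr_coefficients([x1, x2, x3], y)
print([round(c, 2) for c in coefs])
```

Because the approach works on ranks, it copes with monotonic non-linearity, but it can miss non-monotonic effects and interactions.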

  11. Fourier Amplitude Sensitivity Test (FAST) • Explores the multidimensional space of the input factors along a search curve, using a Fourier transform. • Handles main and interaction effects K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.

  12. Issues • Differential Analysis • not feasible: derivatives of complex models • SRR and OAT • not model independent: trouble with non-monotonic, non-linear models. • not efficient: trouble with interaction effects in high-dimensional models • FAST • not efficient: requires separate model runs

  13. Possible Solutions • Data mine the probabilistic model output • Multivariate Adaptive Regression Splines (MARS) • Multiple Additive Regression Trees (MART)

  14. Data Mining • MARS • Non-parametric recursive partitioning approach that fits separate splines to distinct intervals of the predictor variables. • MART • Explores the multidimensional input space of the input factors using gradient boosting of additive regression models. • Advantages • Search for interactions between variables, allowing any degree of interaction to be considered. • Tracks very complex data structures in high-dimensional data.
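The gradient boosting idea behind MART can be sketched in a few lines for squared-error loss, where each round fits a small regression tree to the current residuals. This is an illustrative stdlib-only sketch, not the MART software: it uses single-split trees (stumps), so it cannot represent interactions (MART uses larger trees for that), and it uses one common convention for relative influence (total squared-error reduction per variable, scaled to 100). The toy data and all names are hypothetical.

```python
import random


def fit_stump(X, r):
    """Best single-split regression stump for residuals r (squared error)."""
    n = len(r)
    total = sum(r)
    best = None  # (gain, variable, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda k: X[k][j])
        left_sum = 0.0
        for pos in range(n - 1):
            left_sum += r[order[pos]]
            lo, hi = X[order[pos]][j], X[order[pos + 1]][j]
            if lo == hi:
                continue  # cannot split between equal values
            n_l, n_r = pos + 1, n - pos - 1
            right_sum = total - left_sum
            # Reduction in the sum of squared errors from splitting here.
            gain = left_sum ** 2 / n_l + right_sum ** 2 / n_r - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / n_l, right_sum / n_r)
    return best


def boost(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals; return base value, stumps, relative influence."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    influence = [0.0] * len(X[0])
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient direction
        gain, j, thr, left, right = fit_stump(X, resid)
        stumps.append((j, thr, left, right))
        influence[j] += gain
        pred = [p + learning_rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    total = sum(influence)
    return base, stumps, [100 * v / total for v in influence]


# Hypothetical Monte Carlo results: the third input has no effect on the response.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [3 * row[0] + (1.0 if row[1] > 0.5 else 0.0) for row in X]
base, stumps, rel_influence = boost(X, y)
print([round(v, 1) for v in rel_influence])
```

The relative influence values sum to 100, and the unused third input should receive the smallest share.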

  15. Sensitivity Indices via ANOVA decomposition • Sensitivity indices are calculated using basis functions not including x_s
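The slide's formulas did not survive in this transcript. For orientation, the standard ANOVA (Sobol') decomposition for independent inputs is given below; this is the textbook form and is assumed, not confirmed, to be what the slide showed. The symbols $f_0$, $V_i$, $S_i$ and $S_{T_i}$ are the usual notation rather than names taken from the presentation.

```latex
\begin{align}
  f(x) &= f_0 + \sum_i f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j) + \dots + f_{1 2 \dots k}(x_1, \dots, x_k) \\
  V = \operatorname{Var}(Y) &= \sum_i V_i + \sum_{i<j} V_{ij} + \dots + V_{1 2 \dots k} \\
  S_i &= \frac{V_i}{V} = \frac{\operatorname{Var}\!\left(E[Y \mid X_i]\right)}{\operatorname{Var}(Y)} \\
  S_{T_i} &= 1 - \frac{\operatorname{Var}\!\left(E[Y \mid X_{\sim i}]\right)}{\operatorname{Var}(Y)}
\end{align}
```

Here $X_{\sim i}$ denotes all inputs except $X_i$, so the total index $S_{T_i}$ collects the main effect of $X_i$ together with every interaction that involves it.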

  16. Analytical Example Sobol’ g-function Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.

  17. Example: Sobol’ g-function Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.
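The Sobol' g-function is a convenient test case because its first-order indices are known in closed form. The sketch below evaluates the function and its analytic indices; the coefficient vector `a` is a commonly used choice in the sensitivity analysis literature and is an assumption here, since the slides' values did not survive in the transcript.

```python
from math import prod


def sobol_g(x, a):
    """Sobol' g-function: product of (|4*x_i - 2| + a_i) / (1 + a_i), x_i in [0, 1]."""
    return prod((abs(4 * xi - 2) + ai) / (1 + ai) for xi, ai in zip(x, a))


def analytic_first_order(a):
    """Closed-form first-order indices: V_i = 1 / (3 * (1 + a_i)^2), V = prod(1 + V_i) - 1."""
    partial = [1.0 / (3.0 * (1 + ai) ** 2) for ai in a]
    total = prod(1 + v for v in partial) - 1
    return [v / total for v in partial]


# Assumed coefficients: small a_i means an influential input, large a_i a negligible one.
a = [0, 1, 4.5, 9, 99, 99, 99, 99]
indices = analytic_first_order(a)
for i, s in enumerate(indices, start=1):
    print(f"S_{i} = {s:.4f}")
```

Estimated indices from any of the methods above can be compared against these analytic values to check convergence.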

  18. [Flowchart: iterative decision-management cycle with numbered steps 1 to 8. Recoverable box labels: Management Options (institutional controls, site maintenance, waste acceptance, closure, monitoring/surveillance) • Future Inventory, Existing Inventory • Fate & Transport • Research, Monitoring, Information & Data Collection • Contamination Risk (occupational, MOP & IHI, cumulative (CA), ecosystem) • Maintenance Review, Periodic Review, Waste Acceptance Decision, Closure Decision • Value of Information • Cost (budgets, management, disposal costs, closure costs, potential liabilities, monitoring costs, ALARA costs, disposal fees, analysis costs, public benefit) • Regulations & Guidance • Decision question: "Can the risk be managed to regulatory thresholds at an acceptable cost with an acceptable level of uncertainty?" YES: choose management options and update the management plan; NO: iterate. Legend: uncertainty analysis, sensitivity analysis, sequence number, iteration loop.]

  19. Simulation Results • Model Inputs ( X ) • Inventory • Fate and transport • Upward advection • Biotic transport • Model response ( Y ) • “EPA-SUM”

  20. Model Response

  21. Relative Influence Plot

  22. Partial Dependence Plots

  23. Co-partial Dependence Plot

  24. Variation Explained

      Site   Time     SRR    MART/MARS
      GCD    10,000   0.91   0.99
      LANL   50       0.87   0.94
             100      0.86   0.96
             500      0.75   0.91
             1,000    0.71   0.95
             10,000   0.71   0.93

  25. Sensitivity Convergence

  26. Upward Flux OAT

  27. Summary • MART and MARS appear to provide an • Efficient • Simple (?) • Model Independent approach to data mining probabilistic model results for sensitivity analysis

  28. Finally… • The decision context: • Is the uncertainty in the model response too high? • Is there value in reducing input uncertainty? • SA and cost used to estimate the value of collecting additional information.

  29. FAST

  30. MARS • Non-parametric recursive partitioning approach that fits separate splines to distinct intervals of the predictor variables. • Both the selected variables and the knot locations are found by an exhaustive (brute-force) search, optimized simultaneously against a "lack of fit" criterion. • Searches for interactions between variables, allowing any degree of interaction to be considered. • Tracks very complex data structures in high-dimensional data. J.H. Friedman, (1991), “Multivariate Adaptive Regression Splines,” The Annals of Statistics, 19, 1-14. Software: Trevor Hastie and Robert Tibshirani, MDA Library for R (‘GNU S’). Ross Ihaka and Robert Gentleman, (1996), “R: A Language for Data Analysis and Graphics,” Journal of Computational and Graphical Statistics, 5, 3, 299-314. www.r-project.org.

  31. MART • Multiple Additive Regression Trees • Explores the multidimensional input space of the input factors using gradient boosting of additive regression models. • Handles main and interaction effects. • Fast K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.
