
Calibration, Sensitivity Analysis and Uncertainty Analysis for Computationally Expensive Models


Presentation Transcript


  1. Calibration, Sensitivity Analysis and Uncertainty Analysis for Computationally Expensive Models. Prof. Christine Shoemaker; Pradeep Mugunthan, Dr. Rommel Regis, and Dr. Jennifer Benaman. School of Civil and Environmental Engineering and School of Operations Research and Industrial Engineering, Cornell University. South Florida Water District Morning Meeting, Sept. 24, 2003

  2. Models Help Extract Information from Point Data to Processes Continuous in Space and Time • Point data from monitoring or experiments at a limited number of points in space and time • A model that describes temporal and spatial connections • Outputs: forecasts (with statistical representation), comparison of alternative management options, understanding of processes

  3. Models Help Extract Information from Data for Multiple Outputs • Point data from monitoring or experiments at a limited number of points in space and time • A model that describes temporal and spatial connections • Model outputs: forecasts (with statistical representation), comparison of alternative management options, understanding of processes

  4. Steps in Modeling • Calibration—selecting parameter values within acceptable limits to fit the data as well as possible • Validation—applying the model and calibrated parameters to an independent data set • Sensitivity Analysis—assessing the impact of changes in uncertain parameter values on model output • Uncertainty Analysis—assessing the range of model outcomes likely given uncertainty in parameters, model error, and exogenous factors like weather

  5. Computationally Expensive Models • It is difficult to calibrate many parameters with existing methods in a limited number of simulations. • Most existing uncertainty methods require thousands of simulations. • We can only do a limited number of model simulations when models take hours to run. • Our methods are designed to reduce the number of simulations required to do good calibration and sensitivity analysis.

  6. Methods and Applications • We will discuss a general methodology for calibration, sensitivity analysis and uncertainty analysis that can be applied to many types of computationally expensive models. • We will present numerical examples for two “real life” applications: a watershed model and a groundwater remediation problem.

  7. 1. Effective Use of Models and Observations Through Calibration, Sensitivity Analysis and Uncertainty Analysis • A description of the technical approach and “real life” applications, including: • Sensitivity analysis for a large number of parameters, with application to a large watershed • Optimization methods for calibration, with application to groundwater remediation based on field data • Uncertainty analysis based on a groundwater model

  8. Cannonsville Watershed • Cannonsville Reservoir Basin – an agricultural basin • Supplies New York City drinking water • To avoid an $8 billion water filtration plant, model analysis is needed to help manage phosphorus • The 1200 km² watershed is subject to economic constraints if phosphorus loads violate the TMDL

  9. Monitoring Stations • There are over 20,000 data points for this watershed

  10. Questions • Using all this data, can we develop a model that is a useful forecasting tool to assess the impact of weather and phosphorus management actions on future loading of the reservoir? • What phosphorus management strategies, if any, should be undertaken?

  11. I. Methodology for Sensitivity Analysis of a Model with Many Parameters: Application to Cannonsville Basin • Joint work with Jennifer Benaman (Cornell Ph.D. in Civil and Environmental Engineering, 2003) • Funded by an EPA STAR Fellowship

  12. Sensitivity Analysis with Many Parameters • Sensitivity analysis measures the change in model output associated with a change (perturbation) in model input (e.g., in parameter values). • Purposes include: to help select which parameters should be adjusted in a calibration and which can be left at default values. • This makes multivariate sensitivity and uncertainty analysis more feasible for computationally expensive models.

  13. Sensitivity Analysis with Many Parameters- Additional Purposes • To prioritize additional data collection, and • To estimate potential errors in model forecasts that could be due to parameter value errors. • Sensitivity Analysis and calibration are difficult with a large number of parameters.
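
To make the idea concrete, a minimal one-at-a-time sensitivity computation might look like the sketch below. The generic `run_model` function, the ±10% perturbation, and the normalization are our illustrative assumptions; the study itself used several perturbation methods and many output variables.

```python
import numpy as np

def univariate_sensitivity(run_model, base_params, delta=0.10):
    """One-at-a-time normalized sensitivity indices (illustrative sketch).

    run_model   -- maps a parameter vector to one scalar model output (hypothetical)
    base_params -- baseline parameter values (1-D array)
    delta       -- relative perturbation size, here +/- 10%
    """
    base_params = np.asarray(base_params, dtype=float)
    y0 = run_model(base_params)                 # one baseline simulation
    indices = np.empty(base_params.size)
    for k in range(base_params.size):           # two simulations per parameter
        up, down = base_params.copy(), base_params.copy()
        up[k] *= 1.0 + delta
        down[k] *= 1.0 - delta
        # relative change in output per relative change in input
        indices[k] = ((run_model(up) - run_model(down)) / y0) / (2.0 * delta)
    return indices  # rank parameters by |indices| to pick calibration candidates
```

The cost is two model runs per parameter per perturbation method, which is what makes screening hundreds of parameters feasible.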

  14. Questions • Can we develop a sensitivity analysis method that is: • robust (doesn’t depend strongly on our assumptions)? • computationally efficient for a large number of parameters (hundreds)? • able to consider many different model outputs simultaneously?

  15. Application to Cannonsville Watershed • Steps: choose parameters, establish parameter ranges, choose output variables of concern • 160 parameters: 35 basinwide; 10 vary by land use (10 x 5 land uses); 7 vary by soil (7 x 10 soil types); 2 additional for corn and hay; 1 additional for pasture • Ranges obtained from literature, databases, and the SWAT User’s Manual

  16. Monitoring Stations

  17. Output Variables of Concern • Basinwide (average annual, 1994–1998): surface water runoff, snowmelt, groundwater flow, evapotranspiration, sediment yield • In-stream by location (monthly average over entire simulation): flow @ Beerston, flow @ Trout Creek, flow @ Town Brook, flow @ Little Delaware River, sediment load @ Beerston, sediment load @ Town Brook

  18. Final Results • (Table of parameter rankings: some parameters are in the top 20 for ALL cases; others are in the top 20 most of the time.)

  19. Computational Issues • We have a robust method for determining the importance and sensitivity of parameters. • An advantage is that the number of model simulations is independent of the number of output variables, sensitivity indices, or weighting factors considered in the combined sensitivity analysis. (Almost no extra computation is required to handle many output variables, indices, or weightings.) • The number of simulations is simply the number required for a single (non-robust) univariate sensitivity analysis multiplied by the number of perturbation methods (= 2 in this example).

  20. Next Steps • Once the most important parameters have been identified we can extend the analysis to more detailed analyses, including: • Multivariate sensitivity analysis (changes in more than one parameter at a time) • Uncertainty analysis (e.g., GLUE; see the sketch below) • Both of these analyses are highly computationally demanding and hence can only be done with a small number of parameters. • The (univariate) sensitivity analysis done here can identify the small number of parameters on which these analyses should be focused.
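
To show why GLUE is so demanding, here is a minimal sketch of it: uniform Monte Carlo sampling with a Nash-Sutcliffe likelihood and a "behavioral" cutoff. The generic `run_model`, observed series `obs`, and the threshold of 0.5 are our illustrative assumptions, not choices from this study.

```python
import numpy as np

def glue(run_model, obs, lower, upper, n_samples=10000, threshold=0.5, seed=0):
    """Minimal GLUE sketch: sample parameter sets uniformly, keep the
    'behavioral' ones, and weight them by their likelihood."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    obs = np.asarray(obs, float)
    params, likes = [], []
    for _ in range(n_samples):                  # each sample is one costly run
        theta = rng.uniform(lower, upper)
        sim = run_model(theta)
        ns = 1.0 - np.mean((sim - obs) ** 2) / np.var(obs)  # Nash-Sutcliffe
        if ns > threshold:                      # behavioral cutoff
            params.append(theta)
            likes.append(ns)
    weights = np.array(likes) / np.sum(likes)
    return np.array(params), weights            # weighted ensemble for UA
```

At thousands of model runs, this is only feasible after the univariate screening has cut the parameter list down.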

  21. Questions • Can we develop a sensitivity analysis method that is: • robust (doesn’t depend strongly on our assumptions)? • computationally efficient for a large number of parameters (hundreds)? • able to consider many different model outputs simultaneously? • Yes, the results for Cannonsville indicate this is possible with this methodology. • Models with longer simulation times require more total simulation time or fewer parameters.

  22. II: Use of Response Surface Methods in Non-Convex Optimization, Calibration and Uncertainty Analysis • Joint work with • Pradeep Mugunthan (PhD Candidate in Civil and Environmental Engineering) • Rommel Regis (Postdoctoral Fellow with PhD in Operations Research) • Funded by three National Science Foundation (NSF) projects

  23. Computational Effort for Trial and Error (Manual) Calibration • Assume that you have P parameters and you want to consider N levels of each. • Then the total number of combinations of possible parameter sets is N^P. • So with 10 parameters, considering only 2 values each (a very crude evaluation), there are 2^10 = 1024 possible combinations, too many to evaluate exhaustively for a computationally expensive function. • With 8 parameters and a more reasonable 10 values each, there are 10^8 = 100 million possible combinations of parameters! • With so many possibilities it is hard to find good solutions by trial and error with few (e.g., 100) function evaluations.

  24. Automatic Calibration • We would like to find the set of parameter values (decision variables) such that • the calibration error (objective function) is minimized • subject to constraints on the allowable range of the parameter values. This is an Optimization Problem. It can be a global optimization problem.
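
Written out (a generic statement of the calibration problem; the notation here is ours, not from the slides):

$$\min_{x \in \mathbb{R}^{P}} f(x) \quad \text{subject to} \quad \ell_k \le x_k \le u_k, \qquad k = 1, \dots, P,$$

where $x$ is the vector of $P$ parameter values, $f(x)$ is the calibration error for those values, and $[\ell_k, u_k]$ is the allowable range of parameter $k$. Because $f$ can have several local minima, this is in general a global optimization problem.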

  25. NSF Project 1: Function Approximation Algorithms for Environment Analysis with Application to Bioremediation of Chlorinated Ethenes • Title: “Improving Calibration, Sensitivity and Uncertainty Analysis of Data-Based Models of the Environment” • The project is funded by the NSF Environmental Engineering Program. • The following slides will discuss the application of these concepts to uncertainty analysis.

  26. “Real World Problem”: Engineered Dechlorination by Injection of Hydrogen Donor and Extraction • We have developed a user-friendly transport model of engineered anaerobic degradation of chlorinated ethenes that models chemical and biological species and utilizes MT3D and RT3D. This model is the application for the function approximation research.

  27. Optimization • Because our model is computationally expensive, we need a better way than trial and error to find a good set of calibration parameters. • Optimization can be used to efficiently search for a “best” solution. • We have developed optimization methods that are designed for computationally expensive functions.

  28. Optimization • Our goal is to find the minimum of f(x), where x ∈ D. • We want to do very few evaluations of f(x) because it is costly to evaluate. • Here f(x) can be a measure of error between model predictions and observations, and x can be the parameter values.

  29. Global versus Local Minima • Many optimization methods find only one local minimum. We want a method that finds the global minimum. • (Figure: F(x) versus x (parameter value), showing a local minimum and the global minimum.)

  30. Experimental Design with Symmetric Latin Hypercube (SLHD) • To fit the first function approximation we need to have evaluated the function at several points. • We use a symmetric Latin Hypercube (SLHD) to pick these initial points. • The number of points we evaluate in the SLHD is (d+1)(d+2)/2, where d is the number of parameters (decision variables).
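
As an illustration, here is one common SLHD construction, a sketch in the spirit of published implementations (such as pySOT's), not necessarily the exact design generator used in this work:

```python
import numpy as np

def symmetric_latin_hypercube(npts, dim, seed=None):
    """Sketch of a symmetric Latin hypercube design scaled to [0, 1]^dim.

    Every column is a permutation of 1..npts (the Latin property), and
    row i mirrors row npts-1-i about the center (the symmetric property).
    """
    rng = np.random.default_rng(seed)
    points = np.zeros((npts, dim))
    points[:, 0] = np.arange(1, npts + 1)
    if npts % 2 == 1:                        # odd npts: fix the center point
        points[npts // 2, :] = (npts + 1) / 2
    for j in range(1, dim):
        for i in range(npts // 2):
            # pick one member of the mirror pair {i + 1, npts - i} at random
            points[i, j] = npts - i if rng.random() < 0.5 else i + 1
        rng.shuffle(points[: npts // 2, j])  # shuffles the slice in place
    for i in range(npts // 2):               # mirror into the second half
        points[npts - 1 - i, :] = npts + 1 - points[i, :]
    return (points - 0.5) / npts             # midpoints of npts equal bins

d = 10                                       # e.g., 10 parameters
design = symmetric_latin_hypercube((d + 1) * (d + 2) // 2, d, seed=1)
```

With d = 10 parameters, the budget (d+1)(d+2)/2 gives 66 initial points, enough to fit the first function approximation.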

  31. One Dimensional Example of Experimental Design to Obtain Initial Function Approximation • (Figure: objective function f(x), a measure of error, versus x, the parameter value, in a one-dimensional example. Each costly function evaluation can take over 0.5 hour of CPU time.)

  32. Function Approximation with Initial Points from Experimental Design • (Figure: f(x) versus x, with the approximation fit through the initial design points.) • In real applications x is multidimensional, since there are many parameters (e.g., 10).

  33. Update in Function Approximation with New Evaluation • The update is done in each iteration of each function approximation algorithm. • (Figure: f(x) versus x, showing the approximation revised after a new evaluation.) • The function approximation is a guess of the value of f(x) for all x.

  34. Use of Derivatives • We use gradient-based methods only on the function approximations R(x), for which accurate derivatives are inexpensive to compute. • We do not try to compute gradients/derivatives of the underlying costly function f(x).
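
A bare-bones sketch of this surrogate loop is below. It is an illustration only, not the authors' RBF algorithm (published methods such as Gutmann RBF choose the next point by more sophisticated criteria that balance exploring the domain against refining near the current best); the function names and the use of SciPy's RBFInterpolator are our assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def rbf_surrogate_minimize(f, design, bounds, n_iters=50, seed=0):
    """Sketch of surrogate-based optimization of a costly function f.

    design -- initial points to evaluate, e.g., from an SLHD (n x d array)
    bounds -- list of (low, high) pairs, one per parameter
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(design, dtype=float)
    y = np.array([f(x) for x in X])            # costly evaluations
    for _ in range(n_iters):
        surrogate = RBFInterpolator(X, y, kernel="cubic")  # cheap to evaluate
        # multistart derivative-based search on the surrogate only; its
        # gradients are cheap, unlike those of the costly f itself
        starts = [rng.uniform(*np.transpose(bounds)) for _ in range(10)]
        results = [minimize(lambda x: surrogate(x[None, :])[0], s, bounds=bounds)
                   for s in starts]
        x_new = min(results, key=lambda r: r.fun).x
        y_new = f(x_new)                       # one new costly evaluation
        X, y = np.vstack([X, x_new]), np.append(y, y_new)  # update surrogate data
    i = int(np.argmin(y))
    return X[i], y[i]
```

A loop this greedy can stall by resampling near-identical points; real RBF algorithms add explicit rules to keep new points well separated from previous evaluations.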

  35. Our RBF Algorithm • Our paper on the RBF optimization algorithm will appear soon in the Journal of Global Optimization. • The following graphs show a related RBF method, labeled “Our RBF,” as well as an earlier RBF optimization method suggested by Gutmann (2000) in the Journal of Global Optimization, labeled “Gutmann RBF.”

  36. Comparison of RBF Methods on a 14-dimensional Schoen Test Function (Average of 10 trials) • (Figure: objective function value versus number of function evaluations for each method, including Our RBF.)

  37. Comparison of RBF Methods on a 12-dimensional Groundwater Aerobic Bioremediation Problem (a PDE system) (Average of 10 trials) • (Figure: objective function value versus number of function evaluations for each method, including Our RBF.)

  38. The following results are from NSF Project 1: Function Approximation Algorithms for Environment Analysis with Application to Bioremediation of Chlorinated Ethenes • Title: “Improving Calibration, Sensitivity and Uncertainty Analysis of Data-Based Models of the Environment” • The project is funded by the NSF Environmental Engineering Program.

  39. Now a real costly function: DECHLOR, a Transport Model of Anaerobic Bioremediation of Chlorinated Ethenes • This model was originally developed by Willis and Shoemaker based on kinetics equations by Fennell and Gossett. • This model will be our “costly” function in the optimization. • The model is based on data from a field site in California.

  40. Complex model: 18 species at each of thousands of nodes of the finite difference model • (Diagram: reaction network of the dechlorinator acting on the chlorinated ethenes PCE → TCE → DCE → VC → ethene, together with donors and fermentation products (lactate, propionate, butyrate, acetate, H2, methane) linked by pathways such as Lac2Prop, Lac2Ace, But2Ace, Prop2Ace, and Hyd2Meth.)

  41. Example of Objective Function for Optimization of Chlorinated Ethene Model • $SSE = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{t=1}^{T}\left(O_{i,j,t} - S_{i,j,t}\right)^{2}$ • where SSE is the sum of squared errors between observed and simulated chlorinated ethenes, $O_{i,j,t}$ is the observed molar concentration of species j at time t and location i, $S_{i,j,t}$ is the simulated molar concentration of species j at time t and location i, t = 1 to T are the time points at which measured data are available, j = 1 to J indexes PCE, TCE, DCE, VC and ethene in that order, and i = 1 to I indexes the set of monitoring locations.
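
In code, the same objective is only a few lines; the array layout below (locations x species x times, with NaN marking missing observations) is our assumption:

```python
import numpy as np

def sse(observed, simulated):
    """Sum of squared errors over locations i, species j, and times t.

    observed, simulated -- arrays of shape (I, J, T) of molar concentrations
    of PCE, TCE, DCE, VC, and ethene (J = 5); NaN marks missing data.
    """
    resid = np.asarray(observed, float) - np.asarray(simulated, float)
    return np.nansum(resid ** 2)   # nansum skips unmeasured (i, j, t) cells
```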

  42. Algorithms Used for Comparison of Optimization Performance on Calibration • Stochastic greedy algorithm (sketched below): neighborhood defined to make the search global; neighbors generated from a triangular distribution around the current solution; moves only to a better solution • Evolutionary algorithms: derandomized evolution strategy (DES) with lambda = 10, b1 = 1/n, and b2 = 1/n^0.5 (Ostermeier et al. 1992); binary or real genetic algorithm (GA) with population size 10, one-point crossover, mutation probability 0.1, crossover probability 1 • RBF function approximation algorithms: Gutmann RBF, a radial basis function approach with cycle length five and an SLH space-filling design; RBF-Cornell, our radial basis function approach • FMINCON: derivative-based optimizer in Matlab with numerical derivatives • 10 trials of 100 function evaluations were performed for the heuristic and function approximation algorithms for comparison
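
Of these, the stochastic greedy baseline is simple enough to sketch from its description alone; the exact neighborhood and parameter choices here are our assumptions:

```python
import numpy as np

def stochastic_greedy(f, x0, bounds, n_evals=100, seed=0):
    """Sketch of the stochastic greedy baseline: draw a neighbor from a
    triangular distribution centered at the current solution and move
    only if the costly objective improves. Assumes x0 lies inside bounds."""
    rng = np.random.default_rng(seed)
    lower, upper = np.transpose(np.asarray(bounds, dtype=float))
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(n_evals - 1):
        # mode at the current value, support spanning the whole box,
        # so any point in the domain can be proposed (a global search)
        candidate = rng.triangular(lower, x, upper)
        f_candidate = f(candidate)
        if f_candidate < fx:        # greedy: accept improvements only
            x, fx = candidate, f_candidate
    return x, fx
```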

  43. (Figure: objective value (–NS) versus number of function evaluations for each algorithm, averaged over 10 trials; the lower curve is better, and our method’s curve is marked. The best possible value for –NS is –1. 28 experimental design evaluations were done first.)

  44. Boxplot comparing the best objective value (CNS) produced by the algorithms in each trial over 10 trials • (Figure: one boxplot per algorithm, with outliers, averages, and our method marked.)

  45. Conclusions • Optimizing costly functions is typically done only once. • The purpose of our examination of multiple trials is to see how well one is likely to do when solving the problem only once. • Hence we want the method that has both the smallest mean objective function value and the smallest variance. • Our RBF has both the smallest mean and the smallest variance. • The second-best method is Gutmann RBF, so RBF methods seem very good in general.

  47. Alameda Field Data • The next step was to work with a real field site. • We obtained data from a DOD field site studied by a group including Alleman, Morse, Gossett, and Fennell. • One run of the chlorinated ethene model at this site takes about three hours because of the nonlinearities in the kinetics equations.

  48. Site Layout

  49. Range of values of the SSE objective function at the Alameda field site • (Figure: mean, min, and max SSE shown for each algorithm, with the gradient-based method and our method marked.)
