

Function Approximation* Optimization for Computationally Expensive, Non-Convex Functions Including Environmental Applications. Christine Shoemaker, Cornell University, CAS12@cornell.edu. McMaster Optimization Conference, July 2004.



Presentation Transcript


  1. Function Approximation* Optimization for Computationally Expensive, Non-Convex Functions Including Environmental Applications. Christine Shoemaker, Cornell University, CAS12@cornell.edu. McMaster Optimization Conference, July 2004. Joint work with my Ph.D. students Rommel Regis and Pradeep Mugunthan.

  2. Applications of Optimization • Optimization in environmental and engineering problems can be used for designing the “best” (e.g. least expensive) solution to a problem. • Optimization can also be used to solve the “inverse” problem, i.e. given observations, find the best values of calibration parameters to fit the observations. • The numerical examples here are for calibration, but the method can be used for optimal design as well.

  3. Optimization • Our goal is to find the minimum of f(x), where x ∈ D. • We want to do very few evaluations of f(x) because it is “costly” to evaluate. f(x) can be a measure of error between model predictions and observations; x can be parameter values.

  4. Function Approximation Methods • A nonlinear function approximation R(x) is a continuous nonlinear multivariate approximation to f(x). • R(x) has also been called a “response surface” or a “surrogate model”. • We use radial basis functions for our response surface (a fitting sketch follows below), but other methods for non-convex surfaces could also be used.
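As a concrete illustration, here is a minimal sketch of fitting a cubic RBF interpolant with a linear polynomial tail, one standard way to build such a response surface. The function name and the choice of cubic kernel are illustrative assumptions, not necessarily the exact surrogate used in the talk.

```python
import numpy as np

def fit_cubic_rbf(X, f):
    """Fit s(x) = sum_i lam_i * ||x - x_i||^3 + c^T x + c0 to data (X, f).

    X: (m, d) array of evaluated points; f: (m,) array of costly values.
    """
    m, d = X.shape
    Phi = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2) ** 3
    P = np.hstack([X, np.ones((m, 1))])            # linear polynomial tail
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    coef = np.linalg.solve(A, np.concatenate([f, np.zeros(d + 1)]))
    lam, c = coef[:m], coef[m:]

    def s(x):
        r = np.linalg.norm(x - X, axis=1)          # distances to data sites
        return lam @ r**3 + c[:d] @ x + c[d]

    return s
```

The interpolation system is solved once per update, which is cheap compared to a single costly evaluation of f(x).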

  5. Why Use Function Approximation Methods? • A function approximation R(x) can be used as part of an efficient parallel optimization algorithm in order to reduce the number of points at which we evaluate f(x), and thereby significantly reduce computational cost. • Our function approximation algorithm searches for the global minimum.

  6. Goal: Derivative-Free Optimization of Costly, Black Box, Nonconvex Functions • For some complex functions, derivatives are unavailable (because they are infeasible to compute or cannot be computed both accurately and quickly). • Inaccurate derivatives lead to inefficient gradient-based optimization, and some important functions are not differentiable. • Our method does not require f′(x).

  7. Goal: Derivative-Free Optimization of Costly, Black Box, Nonconvex Functions • Costly functions f(x) require a substantial amount of computation (minutes, hours, days) to evaluate the function once. • Examples of costly functions include simulation models or codes that solve systems of nonlinear partial differential equations. • Our method seeks to minimize the number of costly function evaluations.

  8. Goal: Derivative-Free Optimization of Costly, Black Box, Nonconvex Functions • Gradient-based optimization methods stop at local minima instead of searching further for the global minimum. • For black box functions, we don’t know whether the function is nonconvex or convex. • Our method is a good global optimization method for black box and other nonconvex functions. • The method can be easily connected to different simulation models (as can heuristic methods like GAs).

  9. Global versus Local Minima Many optimization methods only find one local minimum. We want a method that finds the global minimum. [Figure: f(x) vs. x (parameter value), showing a local minimum and the global minimum.]

  10. Experimental Design with Symmetric Latin Hypercube (SLHD) • To fit the first function approximation we need to have evaluated the function at several points. • We use a symmetric Latin hypercube design (SLHD) to pick these initial points. • The number of points we evaluate in the SLHD is (d+1)(d+2)/2, where d is the number of parameters (decision variables). (A construction sketch follows below.)
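For concreteness, here is a minimal sketch of one common SLHD construction, which pairs level k with level m+1−k so the design is symmetric about its center. It is an illustrative implementation, not the authors' code.

```python
import numpy as np

def symmetric_latin_hypercube(d, m, seed=None):
    """Sample an m-point symmetric Latin hypercube design in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    half = m // 2
    levels = np.empty((m, d), dtype=int)
    for j in range(d):
        low = rng.permutation(half) + 1                     # pair labels (k, m+1-k)
        flip = rng.integers(0, 2, size=half).astype(bool)
        first = np.where(flip, m + 1 - low, low)            # rows 0 .. half-1
        levels[:half, j] = first
        levels[m - 1 - np.arange(half), j] = m + 1 - first  # mirrored partner rows
    if m % 2 == 1:
        levels[half, :] = (m + 1) // 2                      # center row for odd m
    return (levels - 0.5) / m                               # cell midpoints in [0, 1]

d = 10                        # number of parameters (decision variables)
m = (d + 1) * (d + 2) // 2    # initial design size from the slide: 66 points
X0 = symmetric_latin_hypercube(d, m, seed=0)
```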

  11. One-Dimensional Example of Experimental Design to Obtain Initial Function Approximation [Figure: initial design points on a one-dimensional objective function f(x), a measure of error; each costly function evaluation can take over 0.5 hour of CPU time. Horizontal axis: x (parameter value).]

  12. Function Approximation with Initial Points from Experimental Design [Figure: f(x) vs. x (parameters), with the initial points from the experimental design.] In real applications x is multidimensional, since there are many parameters (e.g. 10).

  13. Update in Function Approximation with New Evaluation The update is done in each iteration of the function approximation for each algorithm expert. [Figure: f(x) vs. x (parameter value), showing the new evaluation point.] The function approximation is a guess of the value of f(x) for all x.


  15. Outline of Function Approximation Optimization • 1. Use a “space filling” experimental design (e.g. SLHD) to select a limited number of evaluation points. • 2. Make an approximation of the function (with radial basis function splines) based on the experimental design points. • 3. Use our algorithm to select the next point for evaluation, based on the FA and the locations of previously evaluated points. (We have developed different algorithms; this is a research topic.) • 4. Construct a new function approximation that incorporates the newly evaluated point and all prior points. • 5. Stop if the maximum number of iterations is reached; otherwise go to Step 3. (A skeleton of this loop is sketched below.)
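Putting the steps together, here is a skeleton of the loop, reusing the SLHD and RBF-fitting sketches above. The random-candidate rule in step 3 is a deliberate simplification; the talk's algorithms also use distance information, as described on slide 22.

```python
import numpy as np

def surrogate_optimize(f, d, max_evals, seed=0):
    rng = np.random.default_rng(seed)
    X = symmetric_latin_hypercube(d, (d + 1) * (d + 2) // 2, seed)  # step 1
    y = np.array([f(x) for x in X])              # costly initial evaluations
    while len(y) < max_evals:                    # step 5: evaluation budget
        s = fit_cubic_rbf(X, y)                  # steps 2 and 4: (re)fit the FA
        cand = rng.random((1000, d))             # trial points in [0, 1]^d
        x_new = min(cand, key=s)                 # step 3 (simplified rule)
        X = np.vstack([X, x_new])
        y = np.append(y, f(x_new))               # one new costly evaluation
    i = int(np.argmin(y))
    return X[i], y[i]
```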

  16. Use of Derivatives • We use gradient-based methods only on the function approximations R(x), for which accurate derivatives are inexpensive to compute; see the sketch below. • We do not try to compute gradients/derivatives for the underlying costly function f(x).
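A minimal sketch of this division of labor, with a toy quadratic standing in for the fitted surrogate R(x); the solver choice is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import minimize

def R(x):
    """Toy stand-in for the fitted surrogate; cheap to evaluate and differentiate."""
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2

# The gradient-based solver runs on R only, never on the costly f.
res = minimize(R, x0=np.array([0.5, 0.5]), method="L-BFGS-B",
               bounds=[(0.0, 1.0), (0.0, 1.0)])
print(res.x)  # surrogate minimizer: a cheap candidate for one costly evaluation
```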

  17. Our Radial Basis Function Derivative-Free Optimization Algorithm • The following graphs illustrate the search procedure and the function approximation. • We use radial basis functions (RBFs) for the function approximation. • The objective function in the following contour plots is a two-dimensional global optimization test function.

  18. Exploratory Radial Basis Function Method Applied on Goldstein-Price [Figure: contour plots of the true function and of the function approximation; x1 and x2 are the two parameters, and contours indicate the error function value. The global minimum, three other local minima, and the evaluation points are marked; the points on the right graph show the experimental design (SLHD).] Best value = 109.6351 (global minimum value = 3) after 6 function evaluations (the experimental design).

  19. Exploratory Radial Basis Function Method Applied on Goldstein-Price [Figure: contour plots of the true function and the response surface.] Best value = 7.8632 (global minimum value = 3) after 20 function evaluations.

  20. Exploratory Radial Basis Function Method Applied on Goldstein-Price [Figure: contour plots of the true function and the response surface.] Best value = 3.4139 (global minimum value = 3) after 40 function evaluations.

  21. Our RBF Algorithm • Our paper on the RBF optimization algorithm will appear soon in the Journal of Global Optimization. • The following graphs show a related RBF method called “Our RBF”, as well as an earlier RBF optimization method suggested by Gutmann (2000) in the Journal of Global Optimization, called “Gutmann RBF”.

  22. How does our algorithm work? One approach is to pick the next point by solving an optimization problem based on both the function approximation value (which we want to be small for a local search) and the distance to the nearest previously evaluated point (which we want to be large for a global search). We cycle the allowable minimum distance so that some of the search is local and some is global. A proof shows this approach will eventually generate a point arbitrarily close to the global optimum. (A sketch of this selection rule follows below.)
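Here is a minimal sketch of one way to implement this value-versus-distance rule over a finite candidate set; the candidate-set formulation and the cycle values are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def select_next_point(candidates, s, X_evaluated, beta):
    """Minimize the surrogate s over candidates at least beta * d_max from
    every evaluated point, where d_max is the largest nearest-neighbor
    distance any candidate achieves."""
    dists = np.min(np.linalg.norm(
        candidates[:, None, :] - X_evaluated[None, :, :], axis=2), axis=1)
    feasible = dists >= beta * dists.max()
    vals = np.array([s(x) if ok else np.inf
                     for x, ok in zip(candidates, feasible)])
    return candidates[np.argmin(vals)]

# Cycling beta from near 1 (stay far from old points: global search)
# down to 0 (purely local refinement) alternates the two search modes.
beta_cycle = [0.9, 0.75, 0.25, 0.05, 0.0]
```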

  23. [Figure: previously evaluated points and a potential new point for evaluation in the (Variable 1, Variable 2) plane, with the minimum distance marked.] We would like to make the minimum distance large to explore the space.

  24. Comparison of RBF Methods on a 14-dimensional Schoen Test Function (Average of 10 trials) [Figure: best objective function value vs. number of function evaluations; the curve labeled “Our RBF” is marked.]

  25. Comparison of RBF Methods on a 12-dimensional Groundwater Aerobic Bioremediation Problem (a PDE system) (Average of 10 trials) [Figure: best objective function value vs. number of function evaluations; the curve labeled “Our RBF” is marked.]

  26. The following results are from NSF Project 1: Function Approximation Algorithms for Environment Analysis with Application to Bioremediation of Chlorinated Ethenes • Title: “Improving Calibration, Sensitivity and Uncertainty Analysis of Data-Based Models of the Environment”. • The project is funded by the NSF Environmental Engineering Program.

  27. Now a real costly function: DECHLOR, a Transport Model of Anaerobic Bioremediation of Chlorinated Ethenes • This model was originally developed by Willis and Shoemaker, based on kinetics equations by Fennell and Gossett. • This model will be our “costly” function in the optimization. • Clean-up of groundwater is estimated to cost up to a trillion dollars (!), so optimization is well worthwhile.

  28. Engineered Dechlorination by Injection of Hydrogen Donor and Extraction The injected donor promotes degradation by providing hydrogen. Forecasting effectiveness requires a transport model that predicts the movement of the donor and species through space over time, and the effect of pumping and of injecting donor.

  29. Dechlorination: PCE to ETH Chlorinated ethenes in groundwater (drinking water). [Figure: stepwise dechlorination chain: tetrachloroethene (PCE) → trichloroethene (TCE) → cis-1,2-dichloroethene (cDCE) → vinyl chloride (VC) → ethene (ETH), with one chlorine replaced by hydrogen at each step.] All compounds except ethene are very toxic.

  30. [Figure: reaction network linking the dechlorinator, the chlorinated ethenes (PCE, TCE, DCE, VC, ethene), and the donors (H2, lactate, propionate, butyrate, acetate, methane) through conversion pathways (Lac2Prop, Lac2Ace, Prop2Ace, But2Ace, Hyd2Meth).] Complex model: 18 species at each of thousands of nodes of the finite difference model.

  31. Optimization of Calibration of Chlorinated Ethenes Model • The model involves solution of a system of partial differential equations by finite difference methods. • There are 18 species variables at each node in the finite difference grid. • The equations are highly nonlinear because of the biological reactions. • Optimization was applied to two cases: a hypothetical case (8 minutes/simulation) and a field data application (3 hours/simulation).

  32. Example of Objective Function for Optimization of Chlorinated Ethene Model

$$\mathrm{SSE} = \sum_{t=1}^{T} \sum_{j=1}^{J} \sum_{i=1}^{I} \left( O_{t,j,i} - S_{t,j,i} \right)^2$$

where SSE is the sum of squared errors between observed and simulated chlorinated ethenes, $O_{t,j,i}$ is the observed molar concentration of species j at time t and location i, $S_{t,j,i}$ is the simulated molar concentration of species j at time t and location i, t = 1 to T indexes the time points at which measured data are available, j = 1 to J indexes PCE, TCE, DCE, VC and ethene in that order, and i = 1 to I indexes the set of monitoring locations.
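A direct transcription of this objective; the array layout is an illustrative assumption.

```python
import numpy as np

def sse(obs, sim):
    """Sum of squared errors between observed and simulated molar
    concentrations; obs and sim have shape (T, J, I): T time points,
    J species (PCE, TCE, DCE, VC, ethene), I monitoring locations."""
    return float(np.sum((obs - sim) ** 2))
```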

  33. Algorithms Used for Comparison of Optimization Performance on Calibration • Stochastic Greedy Algorithm: neighborhood defined to make the search global; neighbors generated from a triangular distribution around the current solution; moves only to a better solution. • Evolutionary Algorithms: derandomized evolution strategy (DES) with lambda = 10, b1 = 1/n and b2 = 1/n^0.5 (Ostermeier et al. 1992); binary or real genetic algorithm (GA) with population size 10, one-point crossover, mutation probability 0.1, and crossover probability 1. • RBF Function Approximation Algorithms: Gutmann RBF, a radial basis function approach with cycle length five and an SLHD space-filling design; and Our RBF (RBF-Cornell), our radial basis function approach. • FMINCON: derivative-based optimizer in Matlab with numerical derivatives. • 10 trials of 100 function evaluations were performed for the heuristic and function approximation algorithms for comparison.

  34. [Figure: average best objective value (−NS) vs. number of function evaluations for each algorithm; lower curves are better, and the curve labeled “ours” is Our RBF. The average is based on 10 trials; the best possible value for −NS is −1; 28 experimental design evaluations were done.]

  35. Boxplot comparing the best objective value (−NS) produced by the algorithms in each trial over 10 trials. [Figure: boxplots for each algorithm, with outliers and averages marked; “ours” is Our RBF.]

  36. Conclusions • Optimizing costly functions is typically done only once. • The purpose of our examination of multiple trials is to see how well one is likely to do if one solves the problem only once. • Hence we want the method that has both the smallest mean objective function value and the smallest variance. • Our RBF has both the smallest mean and the smallest variance. • The second best method is Gutmann RBF, so RBF methods seem very good in general.

  37. Empirical Cumulative Distribution Plot – (−NS) Objective [Figure: empirical CDFs of the best objective value for each algorithm; “ours” is Our RBF.] Curves to the left are best and stochastically dominate (higher probability of a lower objective function value).


  39. Alameda Field Data • The next step was to work with a real field site. • We obtained data from a DOD field site studied by a group (including Alleman, Morse, Gossett, and Fennell). • One run of the chlorinated ethene model at this site takes about three hours because of the nonlinearities in the kinetics equations.

  40. Site Layout

  41. Numerical Set-up for Simulation • Finite difference grid with over 5300 nodes • 18 species and about 6000(?) time steps, equaling almost 8 billion variables • Each simulation takes between 2 and 3 hours • Optimization attempted for calibration of 8 parameters (6 biokinetic and 2 biological) [Figure: numerical set-up for simulation: h = 21 ft (6.4 m); domain length 65 ft (22 m); flow enters one boundary, with no-flow boundaries elsewhere.]

  42. Range of objective values for the SSE objective function at the Alameda field site. [Figure: mean, min, and max of the SSE objective are shown for each algorithm, including the gradient-based method and ours.]

  43. Conclusions on RBF Optimization of Calibration • Radial basis function approximation methods can be used effectively to find optimal solutions of costly functions. • “Our RBF” performed substantially better than the previous RBF method by Gutmann on the difficult chlorinated ethene remediation problem, especially because Our RBF is robust (small variance). • Both genetic algorithms and derivative-based search did very poorly. • The two RBF methods did much better on the Alameda field data problem than the other methods.

  44. However, 300 hours is a long time to wait! Solution: Parallel Algorithms • We would like to be able to speed up calculations for costly functions by using parallel computers. • To get a good speed-up on a parallel computer, you need an algorithm that parallelizes efficiently. • We are developing such an algorithm.

  45. New Project 2: Parallel Optimization Algorithms • Funded by the Computer Science (CISE) Directorate at NSF • The method is general and can be used for a wide range of problems including other engineering systems in addition to environmental systems. • This research is underway.

  46. Development of Parallel Algorithms • This proposal focuses on developing function approximation algorithms that will be very efficient when run in parallel. • One can re-code any optimization algorithm to run in parallel, but it might not be very efficient (e.g. it runs only twice as fast with 10 processors as with one processor).



  49. MAPO: Multi-Algorithm Parallel Optimization • MAPO is an optimization algorithm we are developing that is designed, with parallel computation, to find good solutions with relatively few function evaluations. • The basic idea is that in each iteration there is a “committee” of algorithm experts, each of whom recommends several candidate points. • All candidate points are compared, and P are selected for evaluation (P = number of processors); a selection sketch follows below. • The f(x) values from all P processors are then available to all the algorithms to build their response surfaces in the next generation.
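As an illustration of how the P points might be chosen, here is a greedy sketch: take candidates in order of surrogate value, keeping each only if it stays far enough from everything already evaluated or selected. The function name, the greedy rule, and d_min are illustrative assumptions, not the exact MAPO rule.

```python
import numpy as np

def select_parallel_points(candidates, surrogate_values, X_evaluated, P, d_min):
    """Pick P well-spaced points for P processors from the committee's
    pooled candidates, preferring low surrogate values."""
    chosen, anchors = [], list(X_evaluated)
    for idx in np.argsort(surrogate_values):       # best surrogate value first
        x = candidates[idx]
        if all(np.linalg.norm(x - p) >= d_min for p in anchors):
            chosen.append(x)
            anchors.append(x)                      # keep later picks spread out
            if len(chosen) == P:
                break
    return np.array(chosen)
```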

  50. Evaluation Points in MAPO [Figure: points in the (Variable 1, Variable 2) plane: previously evaluated points, candidate points not evaluated, the new points chosen for evaluation, and the minimum distance between them. For P parallel processors we want to evaluate P points at once; P = 4 in the picture.]
