Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

CHAPTER 17: OPTIMAL DESIGN FOR EXPERIMENTAL INPUTS

Organization of chapter in ISSO*
• Background
  • Motivation
  • Finite-sample and asymptotic (continuous) designs
  • Precision matrix and D-optimality
• Linear models
  • Connections to D-optimality
  • Key equivalence theorem
  • Response surface methods
• Nonlinear models
*Note: Appendix to these slides is a brief discussion of factorial design (not in ISSO)
Optimal Design in Simulation
• Two roles for experimental design in simulation:
  • Building an approximation to an existing large-scale simulation via a "metamodel"
  • Building the simulation model itself
• Metamodels are "curve fits" that approximate simulation input/output
  • Usual form is a low-order polynomial in the inputs; linear in the parameters
  • Linear design theory useful (a fitting sketch follows below)
• Building the simulation model
  • Typically needs nonlinear design theory
• Some terminology distinctions:
  • "Factors" (statistics term) ↔ "Inputs" (modeling and simulation term)
  • "Levels" ↔ "Values"
  • "Treatments" ↔ "Runs"
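As a concrete illustration of the metamodel idea, the sketch below fits a quadratic polynomial (linear in its parameters) to input/output data from a hypothetical one-input stochastic simulation; the simulate function and all coefficients are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-input stochastic simulation (stands in for a large-scale model).
def simulate(x):
    return 2.0 + 1.5 * x - 0.8 * x**2 + rng.normal(scale=0.1)

# Metamodel: quadratic in the input x, linear in the parameters beta.
x = np.linspace(-1.0, 1.0, 20)
z = np.array([simulate(xi) for xi in x])
H = np.column_stack([np.ones_like(x), x, x**2])  # regression design matrix
beta, *_ = np.linalg.lstsq(H, z, rcond=None)     # least-squares "curve fit"
print("fitted metamodel coefficients:", np.round(beta, 3))
```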
Unique Advantages of Design in Simulation
• Simulation experiments may be considered a special case of general experiments
• Some unique benefits occur due to simulation structure
  • Can control factors that are not generally controllable (e.g., arrival rates into a network)
  • Direct repeatability due to deterministic nature of random number generators
  • Variance reduction via common random numbers (CRNs), etc., may be helpful (demonstrated below)
• Not necessary to randomize runs to avoid systematic variation due to inherent conditions
  • E.g., randomization in run order and input levels in a biological experiment to reduce effects of changes in ambient humidity in the laboratory
  • In simulation, systematic effects can be eliminated since the analyst controls nature
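A minimal sketch of the CRN idea, assuming a toy throughput simulation invented for the example: driving both configurations with identically seeded random-number streams makes their outputs positively correlated, which shrinks the variance of the estimated difference between configurations.

```python
import numpy as np

def throughput(rate, rng):
    # Toy "simulation": mean of 100 exponential service times at a given rate.
    return np.mean(rng.exponential(1.0 / rate, size=100))

diffs_crn, diffs_indep = [], []
for rep in range(1000):
    # CRN: both configurations driven by identically seeded streams.
    d_crn = (throughput(1.0, np.random.default_rng(rep))
             - throughput(1.2, np.random.default_rng(rep)))
    # Independent streams for comparison.
    d_ind = (throughput(1.0, np.random.default_rng(2 * rep))
             - throughput(1.2, np.random.default_rng(2 * rep + 1)))
    diffs_crn.append(d_crn)
    diffs_indep.append(d_ind)

print("variance of difference, CRN:        ", np.var(diffs_crn))
print("variance of difference, independent:", np.var(diffs_indep))
```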
Design of Computer Experiments in Statistics
• There is significant activity among statisticians on experimental design for computer experiments
  • T. J. Santner et al. (2003), The Design and Analysis of Computer Experiments, Springer-Verlag
  • J. Sacks et al. (1989), "Design and Analysis of Computer Experiments (with discussion)," Statistical Science, 409–435
  • Etc.
• Above statistical work differs from experimental design with Monte Carlo simulations
  • Above work assumes deterministic function evaluations via computer (e.g., solution to a complicated ODE)
• One implication of deterministic function evaluations: no need to replicate experiments for a given set of inputs
  • Contrasts with Monte Carlo, where replication provides variance reduction
General Optimal Design Formulation (Simulation or Non-Simulation)
• Assume model z = h(θ, x) + v, where x is an input we are trying to pick optimally
• Experimental design consists of N specific input values x = xi and proportions (weights) wi assigned to these input values, with the wi summing to 1
• Finite-sample design allocates the n available measurements among the N input values exactly (each n·wi an integer); asymptotic (continuous) design allocates proportions without regard to n (interpreted as n → ∞); a rounding sketch follows below
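A small sketch of the finite-sample versus asymptotic distinction: an asymptotic design is just support points with weights, and a finite-sample design for n runs can be obtained by rounding. The largest-remainder rounding used here is one simple apportionment rule among several; the support points and weights are invented.

```python
import numpy as np

# An asymptotic (continuous) design: support points with weights summing to 1.
x_support = np.array([-1.0, 0.0, 1.0])
w = np.array([0.35, 0.30, 0.35])

# Round to a finite-sample design that allocates n = 12 runs exactly.
n = 12
counts = np.floor(n * w).astype(int)
# Hand any leftover runs to the support points with the largest remainders.
remainders = n * w - counts
for i in np.argsort(-remainders)[: n - counts.sum()]:
    counts[i] += 1
print("runs per input value:", dict(zip(x_support.tolist(), counts.tolist())))
```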
D-Optimal Criterion
• Picking an optimal design requires a criterion for optimization
• Most popular criterion is the D-optimal measure
• Let M(θ, ξ) denote the "precision matrix" for an estimate of θ based on a design ξ
  • M(θ, ξ) is the inverse of the covariance matrix for the estimate and/or
  • M(θ, ξ) is the Fisher information matrix for the estimate
• D-optimal solution maximizes the determinant of the precision matrix: ξ* = arg maxξ det M(θ, ξ) (a numerical sketch follows below)
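A numerical sketch of D-optimality for a linear model, assuming (as an invented example) a quadratic-in-x regression on [-1, 1] with three fixed support points; a crude grid search over the weights recovers the classical equal-weight solution.

```python
import numpy as np

# Linear model z = h(x)^T theta + v with h(x) = (1, x, x^2), inputs in [-1, 1].
def h(x):
    return np.array([1.0, x, x**2])

x_support = np.array([-1.0, 0.0, 1.0])  # fixed candidate support points

def log_det_M(w):
    # Precision matrix M = sum_i w_i h(x_i) h(x_i)^T (unit noise variance).
    M = sum(wi * np.outer(h(xi), h(xi)) for wi, xi in zip(w, x_support))
    sign, logdet = np.linalg.slogdet(M)
    return logdet if sign > 0 else -np.inf

# Crude grid search over the weight simplex.
best = max((log_det_M((w1, w2, 1.0 - w1 - w2)), (w1, w2, 1.0 - w1 - w2))
           for w1 in np.arange(0.05, 0.95, 0.05)
           for w2 in np.arange(0.05, 1.0 - w1, 0.05))
print("best log det M:", round(best[0], 3), "weights:", np.round(best[1], 2))
```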
Equivalence Theorem
• Consider linear model z = hTθ + v
• Prediction based on parameter estimate θ̂ and "future" measurement vector h is ẑ = hTθ̂
• Kiefer-Wolfowitz equivalence theorem states: the D-optimal design for determining the θ̂ used in forming ẑ is the same as the design that minimizes the maximum (over inputs) variance of the predictor ẑ
• Useful in practical determination of the optimal design (a numerical check follows below)
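The sketch below checks the equivalence theorem numerically for the quadratic example above: at the D-optimal design, the standardized predictor variance d(x) = h(x)T M^-1 h(x) attains its maximum value dim(θ) = 3, with the maximum occurring at the support points.

```python
import numpy as np

def h(x):
    return np.array([1.0, x, x**2])

# D-optimal design for the quadratic model on [-1, 1]: weight 1/3 at {-1, 0, 1}.
x_support, w = np.array([-1.0, 0.0, 1.0]), np.array([1/3, 1/3, 1/3])
M = sum(wi * np.outer(h(xi), h(xi)) for wi, xi in zip(w, x_support))
M_inv = np.linalg.inv(M)

# Standardized predictor variance d(x) = h(x)^T M^{-1} h(x) over the input range.
xs = np.linspace(-1.0, 1.0, 201)
d = np.array([h(x) @ M_inv @ h(x) for x in xs])
print("max_x d(x) =", round(d.max(), 6))  # equals dim(theta) = 3 at the optimum
```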
Variance Function as it Depends on Input: Optimal Asymptotic Design for Example 17.6 in ISSO
Orthogonal Designs
• With linear models, usually more than one solution is D-optimal
• Orthogonality is a means of reducing the number of solutions
• Orthogonality also introduces desirable secondary properties
  • Separates effects of input factors (avoids "aliasing")
  • Makes estimates for elements of θ uncorrelated (illustrated below)
• Orthogonal designs are not generally D-optimal; D-optimal designs are not generally orthogonal
  • However, some designs are both
  • Classical factorial ("cubic") designs are orthogonal (and often D-optimal)
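A quick illustration of orthogonality for a 2^2 factorial design with an interaction term: the columns of the design matrix are mutually orthogonal, so XTX is diagonal and the least-squares estimates of the elements of θ are uncorrelated.

```python
import numpy as np

# 2^2 factorial design: all combinations of low (-1) and high (+1) levels.
levels = np.array([[-1, -1], [+1, -1], [-1, +1], [+1, +1]])
# Design matrix for z = b0 + b1*x1 + b2*x2 + b3*x1*x2 + v.
X = np.column_stack([np.ones(4), levels[:, 0], levels[:, 1],
                     levels[:, 0] * levels[:, 1]])
# Diagonal X^T X: orthogonal columns, so least-squares estimates are uncorrelated.
print(X.T @ X)
```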
Example Orthogonal Designs, r = 2 Factors
[Figure: cube (2^r) design and star (2r) design, each plotted on the (xk1, xk2) axes]
Example Orthogonal Designs, r = 3 Factors
[Figure: cube (2^r) design and star (2r) design, each plotted on the (xk1, xk2, xk3) axes]
Response Surface Methodology (RSM)
• Suppose we want to determine the inputs x that minimize the mean response E(z) of some process
  • There are also other (nonoptimization) uses for RSM
• RSM can be used to build local models with the aim of finding the optimal x
  • Based on building a sequence of local models as one moves through factor (x) space
  • Each response surface is typically a simple regression polynomial
• Experimental design can be used to determine input values for building response surfaces
Steps of RSM for Optimizing x (a minimal sketch follows below)
Step 0 (Initialization) Make an initial guess at the optimal value of x.
Step 1 (Collect data) Collect responses z from several x values in a neighborhood of the current estimate of the best x value (can use experimental design).
Step 2 (Fit model) From the (x, z) pairs in step 1, fit a regression model in the region around the current best estimate of the optimal x.
Step 3 (Identify steepest descent path) Based on the response surface in step 2, estimate the path of steepest descent in factor space.
Step 4 (Follow steepest descent path) Perform a series of experiments at x values along the path of steepest descent until no additional improvement in the z response is obtained. This x value represents the new estimate of the best vector of factor levels.
Step 5 (Stop or return) Go to step 1 and repeat the process until the final best factor level is obtained.
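A minimal sketch of steps 0 through 5, assuming an invented noisy quadratic response with true optimum at (2, -1): each pass fits a first-order surface from a small 2^2 design around the current point, then marches along the estimated steepest-descent path until the response stops improving.

```python
import numpy as np

rng = np.random.default_rng(1)
def response(x):
    # Invented noisy process; the true mean response is minimized at (2, -1).
    return (x[0] - 2.0)**2 + (x[1] + 1.0)**2 + rng.normal(scale=0.05)

x_best = np.zeros(2)                                   # step 0: initial guess
for _ in range(5):
    # Step 1: collect responses on a small 2^2 design around the current point.
    D = x_best + 0.25 * np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]])
    z = np.array([response(d) for d in D])
    # Step 2: fit a first-order regression surface z ~ b0 + b1*x1 + b2*x2.
    H = np.column_stack([np.ones(4), D])
    b, *_ = np.linalg.lstsq(H, z, rcond=None)
    # Step 3: steepest-descent direction for the fitted surface is -(b1, b2).
    g = -b[1:] / np.linalg.norm(b[1:])
    # Step 4: march along the path while the response keeps improving.
    step, z_best = 0.1, response(x_best)
    while True:
        x_try = x_best + step * g
        z_try = response(x_try)
        if z_try >= z_best:
            break                                      # no further improvement
        x_best, z_best = x_try, z_try
print("estimated optimal x:", np.round(x_best, 2))     # near (2, -1)
```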
Conceptual Illustration of RSM for Two Variables in x; Shows More Refined Experimental Design Near Solution Adapted from: Montgomery (2005), Design and Analysis of Experiments, Fig. 11-3
Nonlinear Design
• Assume model z = h(θ, x) + v, where θ enters nonlinearly and x is an r-dimensional input vector
• D-optimality remains the dominant measure
  • Maximization of the determinant of the Fisher information matrix (from Chapter 13 of ISSO; Fn(θ, X) is the Fisher information matrix based on the n inputs in the n×r matrix X)
• Fundamental distinction from the linear case is that the D-optimal criterion depends on θ
• Leads to conundrum: choosing X to best estimate θ, yet need to know θ to determine X (illustrated below)
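The sketch below makes the conundrum concrete for an invented exponential-decay model with Gaussian noise (unit variance assumed): the same set of inputs X yields different values of det Fn(θ, X) at different θ, so the D-optimal X depends on the unknown θ.

```python
import numpy as np

# Invented nonlinear model h(theta, x) = theta0 * exp(-theta1 * x), unit noise.
def grad_h(theta, x):
    # Gradient of h with respect to theta.
    return np.array([np.exp(-theta[1] * x),
                     -theta[0] * x * np.exp(-theta[1] * x)])

def fisher_info(theta, X):
    # F_n(theta, X) = sum over the inputs of grad_h grad_h^T (unit variance).
    return sum(np.outer(grad_h(theta, x), grad_h(theta, x)) for x in X)

X = np.array([0.5, 1.0, 2.0])  # one fixed set of n = 3 inputs
for theta in (np.array([1.0, 1.0]), np.array([1.0, 2.0])):
    print("theta =", theta, "-> det F =", np.linalg.det(fisher_info(theta, X)))
```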
Strategies for Coping with Dependence on θ
• Assume a nominal value of θ and develop an optimal design based on this fixed value
• Sequential design strategy based on an iterated design and model-fitting process
• Bayesian strategy where a prior distribution is assigned to θ, reflecting uncertainty in the knowledge of the true value of θ
Sequential Approach for Parameter Estimation and Optimal Design (a sketch follows below)
Step 0 (Initialization) Make an initial guess at θ. Allocate n0 measurements to the initial design Xn. Set k = 0 and n = n0.
Step 1 (D-optimal maximization) Given Xn and the current estimate of θ, choose the nk new inputs X = [xn+1, …, xn+nk]T to maximize the determinant of the Fisher information matrix for the combined inputs.
Step 2 (Update estimate) Collect nk measurements based on the inputs from step 1. Use the measurements to update the estimate of θ.
Step 3 (Stop or return) Stop if the estimate of θ in step 2 is satisfactory. Else return to step 1 with the new k set to the former k + 1 and the new n set to the former n + nk (updated Xn now includes the inputs from step 1).
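A minimal full-sequential sketch (nk = 1) for the exponential-decay example above; the candidate grid, noise level, and Gauss-Newton refitting in step 2 are invented stand-ins for whatever estimator one prefers. Step 1 picks each new input to maximize det F at the current estimate; step 2 refits θ with the enlarged data set.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = np.array([1.0, 1.5])           # unknown in practice
def h(theta, x):
    return theta[0] * np.exp(-theta[1] * x)
def grad_h(theta, x):
    return np.array([np.exp(-theta[1] * x),
                     -theta[0] * x * np.exp(-theta[1] * x)])
def fisher(theta, X):
    return sum(np.outer(grad_h(theta, x), grad_h(theta, x)) for x in X)

# Step 0: initial guess and a small initial design with measurements.
theta_hat = np.array([0.5, 1.0])
X = [0.5, 2.0]
z = [h(theta_true, x) + rng.normal(scale=0.05) for x in X]
candidates = np.linspace(0.1, 3.0, 30)

for k in range(20):
    # Step 1: choose the next input to maximize det F at the current estimate.
    x_next = max(candidates,
                 key=lambda x: np.linalg.det(fisher(theta_hat, X + [x])))
    # Step 2: collect the measurement and refit theta (Gauss-Newton steps).
    X.append(x_next)
    z.append(h(theta_true, x_next) + rng.normal(scale=0.05))
    for _ in range(5):
        J = np.array([grad_h(theta_hat, x) for x in X])
        r = np.array(z) - np.array([h(theta_hat, x) for x in X])
        theta_hat = theta_hat + np.linalg.lstsq(J, r, rcond=None)[0]

print("theta_hat after sequential design:", np.round(theta_hat, 3))
```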
Comments on Sequential Design
• Note two optimization problems being solved: one for θ, one for X
• Determine the next nk input values (step 1) conditioned on the current value of θ̂
• Each step is analogous to nonlinear design with a fixed (nominal) value of θ
• "Full sequential" mode (nk = 1) updates θ̂ based on each new input-output pair (xk, zk)
• Can use stochastic approximation to update θ̂, e.g., the root-finding form θ̂k+1 = θ̂k + ak × [∂h(θ̂k, xk+1)/∂θ] × [zk+1 − h(θ̂k, xk+1)], where {ak} is a decaying gain sequence (see the sketch below)
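A sketch of that SA recursion for the same invented exponential-decay model; the gain sequence and input distribution are arbitrary illustrative choices, and the estimate only drifts toward the true θ gradually (SA trades per-step accuracy for very cheap updates).

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true = np.array([1.0, 1.5])
theta_hat = np.array([0.5, 1.0])
def h(theta, x):
    return theta[0] * np.exp(-theta[1] * x)
def grad_h(theta, x):
    return np.array([np.exp(-theta[1] * x),
                     -theta[0] * x * np.exp(-theta[1] * x)])

for k in range(5000):
    x = rng.uniform(0.1, 3.0)                      # stand-in for a designed input
    z = h(theta_true, x) + rng.normal(scale=0.05)  # new measurement z_{k+1}
    a_k = 2.0 / (k + 20)                           # decaying SA gain
    # Root-finding SA step: move theta_hat along grad_h times the residual.
    theta_hat = theta_hat + a_k * grad_h(theta_hat, x) * (z - h(theta_hat, x))

print("SA estimate after 5000 pairs:", np.round(theta_hat, 3))
```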
Bayesian Design Strategy
• Assume a prior distribution (density) p(θ) for θ, reflecting uncertainty in the knowledge of the true value of θ
• There exist multiple versions of the D-optimal criterion
• One possible D-optimal criterion: choose X to maximize ∫ log det Fn(θ, X) p(θ) dθ, the prior-averaged log-determinant
• Above criterion related to Shannon information
• While the log transform makes no difference with fixed θ, it does affect the integral-based solution
• To simplify the integral, may be useful to choose a discrete prior p(θ) (as in the sketch below)
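A sketch of the discrete-prior case for the same invented exponential-decay model: with a two-point prior, the integral collapses to a probability-weighted sum of log-determinants, and a small grid search picks the two-run design maximizing the averaged criterion.

```python
import numpy as np

# Same invented model as above: h(theta, x) = theta0 * exp(-theta1 * x).
def grad_h(theta, x):
    return np.array([np.exp(-theta[1] * x),
                     -theta[0] * x * np.exp(-theta[1] * x)])

def log_det_F(theta, X):
    F = sum(np.outer(grad_h(theta, x), grad_h(theta, x)) for x in X)
    sign, logdet = np.linalg.slogdet(F)
    return logdet if sign > 0 else -np.inf

# Discrete two-point prior on theta (equal probabilities).
prior = [(0.5, np.array([1.0, 1.0])), (0.5, np.array([1.0, 2.0]))]

def bayes_criterion(X):
    # Integral reduces to a probability-weighted sum over the prior support.
    return sum(p * log_det_F(theta, X) for p, theta in prior)

# Grid search for the best two-run design.
grid = np.linspace(0.1, 3.0, 15)
best = max((bayes_criterion([x1, x2]), (x1, x2))
           for x1 in grid for x2 in grid if x1 < x2)
print("best two-run design:", np.round(best[1], 2),
      "criterion:", round(best[0], 3))
```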
Appendix to Slides for Chapter 17: Factorial Design (not in ISSO)
• Classical experimental design deals with linear models
• Factorial design is the most popular classical method
  • All r inputs ("factors") changed at one time
• Factorial design provides two key advantages over one-at-a-time changes:
  • Greater efficiency in extracting information from a given number of experiments
  • Ability to determine if there are interaction effects
• Standard method is the 2^r factorial; "2" comes about by looking at each input at two levels: low (−) and high (+)
  • E.g., if r = 3, then have 2^3 = 8 input combinations: (− − −), (+ − −), (− + −), (− − +), (+ + −), (+ − +), (− + +), (+ + +)
Appendix to Slides (cont'd): Factorial Design with 3 Inputs
• Consider the r = 3 linear model zk = β0 + β1xk1 + β2xk2 + β3xk3 + β4xk1xk2 + β5xk1xk3 + β6xk2xk3 + β7xk1xk2xk3 + noise, where β = [β0, β1, …, β7]T represents the vector of (unknown) parameters and xki represents the ith term in the input vector xk
• 2^3 factorial design allows for efficient estimation of all parameters in β (see the sketch below)
• In contrast, one-at-a-time provides no information for estimating β4 to β7
• However, the 2^3 factorial design must be augmented in some way if we wish to add quadratic (e.g., xk1²) or other higher-order polynomial terms to the model
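A sketch showing that the eight runs of a 2^3 factorial suffice to estimate all eight parameters, interactions included; the true β values and noise level are invented for the demonstration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
beta_true = np.array([1.0, 0.5, -0.3, 0.2, 0.7, 0.0, -0.1, 0.4])  # invented

# 2^3 factorial: all eight (+/-1) combinations of the three inputs.
levels = np.array(list(product([-1, 1], repeat=3)))

def design_row(x):
    x1, x2, x3 = x
    return np.array([1, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3])

H = np.array([design_row(x) for x in levels])        # 8x8, orthogonal columns
z = H @ beta_true + rng.normal(scale=0.05, size=8)   # simulated responses
beta_hat = np.linalg.lstsq(H, z, rcond=None)[0]      # estimates all 8 betas
print("beta_hat:", np.round(beta_hat, 2))
```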
Appendix to Slides (cont'd): Illustration of Interaction with 2 Inputs
• Example responses for r = 2: no interaction and interaction between input variables
• Left plot (no interaction) shows that the change in zk with a change in xk2 does not depend on xk1; right plot (interaction) shows that the change in zk does depend on xk1
[Figure: two panels plotting zk versus xk2, each with curves for xk1 = low (−) and xk1 = high (+); panel titles "No interaction" and "Interaction"]