
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall. Chapter 14: Simulation-Based Optimization I: Regeneration, Common Random Numbers, and Related Methods.


Presentation Transcript


  1. Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall. CHAPTER 14: SIMULATION-BASED OPTIMIZATION I: REGENERATION, COMMON RANDOM NUMBERS, AND RELATED METHODS • Organization of chapter in ISSO: • Background • Simulation-based optimization vs. model building • Regenerative processes • Special structure for loss estimation and optimization • FDSA and SPSA in simulation-based optimization • Improved convergence through common random numbers • Discrete optimization via statistical selection

  2. Background: Simulation-Based Optimization • Optimization arises in two ways in simulation: • (A) Building the simulation model (parameter estimation) • (B) Using the simulation for optimization of the real system, given that problem (A) has been solved • Focus here is problem (B) • Fundamental goal is to optimize a design vector θ in the real system; the simulation is a proxy in the optimization process • Loss function to be minimized, L(θ), represents average system performance at a given θ; simulation runs produce noisy (approximate) values of L(θ) • Appropriate stochastic optimization method yields "intelligent" trial-and-error in choosing how to run the simulation to find the best θ

  3. Background (cont'd) • Many modern processes are studied by Monte Carlo simulation (manufacturing, defense, epidemiological, transportation, etc.) • Loss functions for such systems typically have the form L(θ) = E[Q(θ, V)], where Q(•) represents a function describing the output of the process based on the Monte Carlo random effects in V • Simulation produces sample replications of Q(θ, V) (typically one simulation run produces one value of Q(•)) • Examples of Q(•) might be the number of defective products in a manufacturing process, inaccuracy of a weapon system, disease incidence in a particular population, cumulative vehicle wait time at traffic signals, etc.
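
As a concrete (and entirely hypothetical) illustration of this setup, the sketch below averages replications of a toy Q(θ, V) to form a noisy estimate of L(θ) = E[Q(θ, V)]; the quadratic form of Q and the exponential distribution of V are illustrative assumptions, not the examples in ISSO.

```python
import numpy as np

def Q(theta, rng):
    # Hypothetical stand-in for one simulation replication: the Monte Carlo
    # randomness V is drawn inside the run; the return value plays the role
    # of one noisy observation of system performance at this theta.
    V = rng.exponential(scale=1.0, size=theta.shape)
    return float(np.sum((theta - V) ** 2))

def loss_estimate(theta, n_reps, seed=0):
    # Estimate L(theta) = E[Q(theta, V)] by averaging independent replications.
    rng = np.random.default_rng(seed)
    return float(np.mean([Q(theta, rng) for _ in range(n_reps)]))

print(loss_estimate(np.full(10, 1.0), n_reps=2000))
```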

  4. Background (cont'd) • Important assumption is that the simulation is a faithful representation of the true system • Recall that the overall goal is to find the θ that minimizes the mean value of Q(θ, V) • Equivalent to optimizing average performance of the true system • Simulation-based optimization rests critically on the simulation and the true system being statistically equivalent • As with earlier chapters, need an optimization method that copes with noise in the input information • Noisy measurements of the loss function and/or the gradient of the loss function • Focus in this chapter is simulation-based optimization without direct (noisy or noise-free) gradient information

  5. Comments on Gradient-Based and Gradient-Free Methods • In complex simulations, ∂L/∂θ (for use in deterministic optimization such as steepest descent) or ∂Q/∂θ (for use in stochastic gradient search [Chap. 5]) is often not available • "Automatic differentiation" techniques (e.g., Griewank and Corliss, 1991) are also usually infeasible due to software and storage requirements • Optimize θ by using simulations to produce Q(θ, V) for varying θ and V • Unlike ∂Q/∂θ (and E[Q(θ, V)]), Q(θ, V) is available in even the most complex simulations • Can use gradient-free optimization that allows for noisy loss measurements (since Q(θ, V) ≠ E[Q(θ, V)] = L(θ), i.e., Q(θ, V) = L(θ) + noise) • Appropriate stochastic approximation methods (e.g., FDSA, SPSA, etc.) may be used based on measurements Q(θ, V)
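
The point that only Q(θ, V) itself is available suggests gradient-free schemes built purely from input/output pairs. Below is a minimal sketch of a two-sided finite-difference gradient estimate (the building block of FDSA) from noisy loss measurements; the loss y and its quadratic form are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def y(theta):
    # Hypothetical noisy simulation output: y(theta) = L(theta) + noise,
    # with L(theta) = ||theta - 1||^2 used purely for illustration.
    return float(np.sum((theta - 1.0) ** 2)) + rng.normal(scale=0.1)

def fdsa_gradient(theta, ck=0.1):
    # Two-sided finite-difference gradient estimate built only from noisy
    # measurements y(.): 2*p simulation runs, no access to dQ/dtheta needed.
    p = theta.size
    g = np.zeros(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = 1.0
        g[i] = (y(theta + ck * e) - y(theta - ck * e)) / (2.0 * ck)
    return g

# At theta = 0 the true gradient is 2*(theta - 1) = -2 in every coordinate.
print(np.round(fdsa_gradient(np.zeros(5)), 2))
```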

  6. Regenerative Systems • Common issue in simulation of dynamic systems is the choice of the amount of time to be represented • Regeneration is useful for addressing this issue • Regenerative systems have the property of returning periodically to some particular probabilistic state; the system effectively starts anew with each period • Queuing systems are common examples • Day-to-day traffic flow; inventory control; communications networks; etc. • Advantage is that regeneration periods may be treated as i.i.d. random processes • Typical loss has the form L(θ) = E(Q)/E(τ), the ratio of the expected cost accumulated over a regeneration period (Q) to the expected period length (τ)

  7. Queuing System with Regeneration; Periods Begin with Arrivals 1,3,4,7,11,16 (Example 14.2 in ISSO)

  8. Care Needed in Loss Estimators for Optimization of Regenerative Systems • Optimization of θ is commonly based on unbiased estimators of L(θ) and/or its gradient • A straightforward estimator of L(θ) is the ratio of sample means, Q̄/τ̄, computed over the observed regeneration periods • The above estimator is biased in general (i.e., E(Q̄/τ̄) ≠ E(Q)/E(τ) = L(θ)) • Biasedness follows from the relationship E(1/X) ≥ 1/E(X) for a positive random variable X • Hence Q̄/τ̄ is not an acceptable estimator of L(θ) in general • Special cases may eliminate or minimize the bias (e.g., when the length of the period is deterministic; see Sect. 14.2 of ISSO) • For such special cases, Q̄/τ̄ is an acceptable estimator for use in optimization
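
A quick synthetic check of the bias: with independent, exponentially distributed cycle costs Q_i and cycle lengths τ_i (chosen only to make the effect visible, not taken from ISSO), the ratio-of-sample-means estimator overshoots E(Q)/E(τ) when few periods are used, exactly as the relationship E(1/X) ≥ 1/E(X) predicts.

```python
import numpy as np

rng = np.random.default_rng(1)

def ratio_estimate(n_cycles):
    # One realization of the "straightforward" estimator Qbar / taubar
    # computed from n_cycles regeneration periods.
    tau = rng.exponential(scale=2.0, size=n_cycles)  # cycle lengths tau_i
    Q = rng.exponential(scale=3.0, size=n_cycles)    # cycle costs Q_i
    return Q.sum() / tau.sum()

true_ratio = 3.0 / 2.0                               # E(Q)/E(tau) = 1.5
estimates = [ratio_estimate(n_cycles=5) for _ in range(50000)]
print("mean of Qbar/taubar:", round(float(np.mean(estimates)), 3),
      "  true E(Q)/E(tau):", true_ratio)             # noticeably above 1.5
```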

  9. FDSA and SPSA in Simulation-Based Optimization • Stochastic approximation provides an ideal framework for carrying out simulation-based optimization • Rigorous means for handling the noisy loss information inherent in Monte Carlo simulation: y(θ) = Q(θ, V) = L(θ) + noise • Most other optimization methods (GAs, nonlinear programming, etc.) apply only on an ad hoc basis • "…FDSA, or some variant of it, remains the method of choice for the majority of practitioners" (Fu and Hu, 1997) • No need to know the "inner workings" of the simulation, as in gradient-based methods such as IPA, LR/SF, etc. • FDSA- and SPSA-type methods are much easier to use than gradient-based methods, as they require only simulation inputs and outputs
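
A minimal SPSA sketch using only noisy measurements y(θ) = L(θ) + noise: the quadratic loss, the noise level, and the gain constants here are illustrative choices (the exponents 0.602 and 0.101 follow common SPSA practice), not the settings used in ISSO's examples.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 10

def y(theta):
    # Noisy measurement of a hypothetical loss L(theta) = ||theta||^2.
    return float(theta @ theta) + rng.normal(scale=0.5)

theta = np.ones(p)
a, A, alpha = 0.2, 100.0, 0.602     # step-size gains a_k = a / (k + A)^alpha
c, gamma = 0.2, 0.101               # perturbation sizes c_k = c / k^gamma
for k in range(1, 5001):
    ak = a / (k + A) ** alpha
    ck = c / k ** gamma
    delta = rng.choice([-1.0, 1.0], size=p)                       # Bernoulli +/-1 perturbation
    g_hat = (y(theta + ck * delta) - y(theta - ck * delta)) / (2.0 * ck * delta)
    theta = theta - ak * g_hat                                    # SPSA iterate update
print("final theta (true minimizer is 0):", np.round(theta, 3))
```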

  10. Common Random Numbers • Common random numbers (CRNs) provide a way of improving simulation-based optimization by reusing the Monte-Carlo-generated random variables • CRNs are based on the well-known formula for two random variables X, Y: var(X − Y) = var(X) + var(Y) − 2 cov(X, Y) • Maximizing the covariance minimizes the variance of the difference • The aim of CRNs is to reduce the variability of the gradient estimate • Improves convergence of the algorithm
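
A toy numerical check of the variance identity (an assumed setup, not from ISSO): when two exponential outputs are generated from the same uniforms via the inverse-transform method, their covariance is large and var(X − Y) drops sharply relative to independent sampling.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

U = rng.random(n)                      # common random numbers (shared uniforms)
X     = -np.log(U) / 1.0               # exponential output at rate 1.0
Y_crn = -np.log(U) / 1.2               # nearby setting (rate 1.2), SAME uniforms
Y_ind = -np.log(rng.random(n)) / 1.2   # same setting, independent uniforms

print("var(X - Y) with CRN:   ", round(float(np.var(X - Y_crn)), 4))
print("var(X - Y) independent:", round(float(np.var(X - Y_ind)), 4))
```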

  11. CRNs (cont'd) • For SPSA, the gradient variability is largely driven by the numerator y(θ̂_k + c_kΔ_k) − y(θ̂_k − c_kΔ_k) • Two effects contribute to variability: (i) the difference due to the perturbations (desirable), and (ii) the difference due to noise effects in the measurements (undesirable) • CRNs are useful for reducing the undesirable variability in (ii) • Using CRNs maximizes the covariance between the two y(•) values in the numerator • This minimizes the variance of the difference

  12. CRNs (cont'd) • In simulation (vs. most real systems) some form of CRNs is often feasible • The essence of CRN is to use the same random numbers in both y(θ̂_k + c_kΔ_k) and y(θ̂_k − c_kΔ_k) • Achieved by using the same random number seed for both simulations and synchronizing the random numbers • Optimal rate of convergence of the iterate θ̂_k to θ* (à la k^(−β/2)) is k^(−1/2) (Kleinman et al., 1999); this rate is the same as for stochastic gradient-based methods • The rate is an improvement on the optimal non-CRN rate of k^(−1/3) • Unfortunately, "pure CRN" may not be feasible in large-scale simulations due to violation of the synchronization requirement • e.g., if θ represents service rates in a queuing system, the difference between θ̂_k + c_kΔ_k and θ̂_k − c_kΔ_k may allow additional (stochastic) arrivals to be serviced in one case
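
A sketch of how the two SPSA measurements can share random numbers in practice: both perturbed runs are driven by the same seed, and a fresh seed is drawn at each iteration (as noted for partial CRN on slide 15). The toy "simulation" and its exponential inputs are assumptions for illustration.

```python
import numpy as np

def y(theta, seed):
    # Toy simulation whose randomness is completely determined by the seed,
    # so two calls with the same seed reuse the same Monte Carlo draws
    # (a stand-in for a simulation with synchronized random number streams).
    run_rng = np.random.default_rng(seed)
    V = run_rng.exponential(scale=1.0, size=theta.shape)
    return float(np.sum((theta - V) ** 2))

rng = np.random.default_rng(4)
theta = np.ones(10)
ck = 0.1

delta = rng.choice([-1.0, 1.0], size=theta.size)
seed_k = int(rng.integers(2**31))                # new seed at iteration k ...
y_plus = y(theta + ck * delta, seed_k)           # ... shared by BOTH measurements
y_minus = y(theta - ck * delta, seed_k)
g_hat = (y_plus - y_minus) / (2.0 * ck * delta)  # CRN gradient estimate
print(np.round(g_hat, 3))
```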

  13. Numerical Illustration (Example 14.8 in ISSO) • Simulation using exponentially distributed random variables and a loss function with p = dim(θ) = 10 • Goal is to compare CRN and non-CRN • θ* is the minimizing value for L(θ) • Table in Example 14.8 of ISSO shows the improved accuracy of the solution under CRNs; the plot on the next slide compares rates of convergence

  14. Rates of Convergence for CRN and Non-CRN (Example 14.9 in ISSO) • [Figure: mean solution values plotted against n on a log scale (roughly 100 to 100,000), comparing a CRN curve with two non-CRN curves]

  15. Partial CRNs • By using the same random number seed for y(θ̂_k + c_kΔ_k) and y(θ̂_k − c_kΔ_k), it is possible to achieve a partial CRN • Some of the events in the two simulations will be synchronized due to the common seed • Synchronization is likely to break down during the course of the simulation, especially for small k, when c_k is relatively large • Asymptotic analysis produces a convergence rate identical to pure CRN, since synchronization occurs as c_k → 0 • Also require a new seed for the simulations at each iteration (common to both y(•) values) to ensure convergence to min L(θ) = min E[Q(θ, V)] • In partial CRN, the practical finite-sample rate of convergence for SPSA tends to be lower than in the pure CRN setting

  16. Numerical Example: Partial CRNs (Kleinman et al., 1999; see p. 398 of ISSO) • A simulation using exponentially distributed random variables was conducted in Kleinman et al. (1999) for p = 10 • Simulation was designed so that it is possible to implement pure CRN (not available in most practical simulations) • Purpose is to evaluate the relative performance of non-CRN, partial CRN, and pure CRN

  17. Numerical Example (cont'd) • Numerical results for 100 replications of SPSA and FDSA (no. of y(•) measurements in SPSA and FDSA are equal, with total iterations of 10,000 and 1,000 respectively):
      • Non-CRN: SPSA 0.0190, FDSA 0.0410
      • Partial CRN: SPSA 0.0071, FDSA 0.0110
      • Pure CRN: SPSA 0.0065, FDSA 0.0064
  • Partial CRN offers significant improvement over non-CRN, and SPSA outperforms FDSA (except in the idealized pure CRN case)

  18. Indifference Zone Methods for Choosing Best Option • Consider the use of simulation to determine the best of K possible options, represented by θ_1, θ_2, …, θ_K • Simulation produces noisy loss measurements y_k(θ_i) • Other methods for discrete optimization (e.g., random search, simulated annealing, genetic algorithms, etc.) are generally inappropriate here • Suppose the analyst is willing to accept any θ_i such that L(θ_i) lies in the indifference zone [L(θ*), L(θ*) + δ) • The analyst can specify δ such that P(correct selection) ≥ 1 − α whenever L(θ_i) − L(θ*) ≥ δ for all non-optimal θ_i • Can use independent sampling or common random numbers (steps for independent sampling on the next slide)

  19. Two-Stage Indifference Zone Selection with Independent Sampling • Step 0 (initialization) Choose δ, α, and the initial sample size n_0. • Step 1 (first stage) Run the simulation n_0 times at each θ_i. • Step 2 (variance estimation) Compute the sample variance at each θ_i. • Step 3 (sample sizes) Using the above variance estimates and a table look-up, compute the total sample size n_i at each θ_i. • Step 4 (second stage) Run the simulation n_i − n_0 additional times at each θ_i. • Step 5 (sample means) Compute the sample means of the simulation outputs at each θ_i over all n_i runs. • Step 6 (decision step) Select the θ_i corresponding to the lowest sample mean from Step 5.
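
A sketch of the procedure above in code, under stated assumptions: `simulate(i, rng)` stands for one noisy loss measurement at θ_i, and the constant `h` stands for the value that Step 3 would take from a table look-up (e.g., a Rinott-type constant for the chosen α, K, and n_0); the value 3.0 used below is only a placeholder, and the sample-size rule n_i = max(n_0, ⌈h²·s_i²/δ²⌉) is one common form of that step, not necessarily the exact rule in ISSO.

```python
import numpy as np
from math import ceil

def two_stage_select(simulate, K, n0, delta, h, seed=0):
    # Two-stage indifference-zone selection with independent sampling.
    rng = np.random.default_rng(seed)
    # Step 1: first-stage runs at each option theta_i.
    first = [np.array([simulate(i, rng) for _ in range(n0)]) for i in range(K)]
    # Step 2: sample variance at each option.
    s2 = [float(np.var(runs, ddof=1)) for runs in first]
    # Step 3: total sample size per option (h plays the role of the tabled constant).
    n = [max(n0, ceil(h ** 2 * v / delta ** 2)) for v in s2]
    # Steps 4-5: second-stage runs and sample means over all n_i runs.
    means = []
    for i in range(K):
        extra = np.array([simulate(i, rng) for _ in range(n[i] - n0)])
        means.append(float(np.concatenate([first[i], extra]).mean()))
    # Step 6: pick the option with the lowest sample mean.
    return int(np.argmin(means)), means

# Toy usage: 4 hypothetical options with true losses 1.0, 1.2, 1.5, 2.0.
true_loss = [1.0, 1.2, 1.5, 2.0]
sim = lambda i, rng: true_loss[i] + rng.normal(scale=0.3)
best, means = two_stage_select(sim, K=4, n0=20, delta=0.1, h=3.0)
print("selected option:", best, "sample means:", np.round(means, 3))
```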

  20. Two-Stage Indifference Zone Selection with CRN (Dependent) Sampling • Step 0 (initialization) Choose δ, α, and the initial sample size n_0. • Step 1 (first stage) Run the simulation n_0 times at each θ_i. The kth simulation runs for the different θ_i are dependent. • Step 2 (variance estimation) Compute the overall sample variance for the Kn_0 runs. • Step 3 (sample sizes) Using the above variance estimate and a table look-up, compute the total sample size n; n applies for all θ_i. • Step 4 (second stage) Run the simulation n − n_0 additional times at each θ_i. • Step 5 (sample means) Compute the sample means of the simulation outputs at each θ_i over all n runs. • Step 6 (decision step) Select the θ_i corresponding to the lowest sample mean from Step 5.
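
And a matching sketch of the CRN (dependent-sampling) variant, with the same hypothetical `h` and toy loss as before: the kth replication of every option shares a seed, one overall variance is computed from all K·n_0 first-stage runs, and a single n applies to every option.

```python
import numpy as np
from math import ceil

def two_stage_select_crn(simulate, K, n0, delta, h, seed0=1000):
    # Two-stage indifference-zone selection with CRN (dependent) sampling:
    # the k-th run of every option uses the same seed (seed0 + k).
    first = np.array([[simulate(i, seed0 + k) for k in range(n0)] for i in range(K)])
    s2 = float(first.var(ddof=1))                    # overall variance of the K*n0 runs
    n = max(n0, ceil(h ** 2 * s2 / delta ** 2))      # a single n for all options
    extra = np.array([[simulate(i, seed0 + k) for k in range(n0, n)] for i in range(K)])
    means = np.hstack([first, extra]).mean(axis=1)   # sample means over all n runs
    return int(np.argmin(means)), means

# Toy usage mirroring the previous sketch; sharing the seed makes the k-th
# noise draw identical across options.
true_loss = [1.0, 1.2, 1.5, 2.0]
def sim(i, seed):
    return true_loss[i] + np.random.default_rng(seed).normal(scale=0.3)
best, means = two_stage_select_crn(sim, K=4, n0=20, delta=0.1, h=3.0)
print("selected option:", best, "sample means:", np.round(means, 3))
```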
