
Topologically Adaptive Stochastic Search

TASS is a global optimization method that finds the global minimum inside a bounded domain by finding all local minima and selecting the global one. It incorporates a stochastic modification to estimate probabilities and adaptively update regions of attraction.


Presentation Transcript


  1. Topologically Adaptive Stochastic Search • I.E. Lagaris & C. Voglis • Department of Computer Science, University of Ioannina, Greece (Title-slide map labels: Thessaloniki, Ioannina, Athens.)

  2. Global Optimization • The goal is to find the global minimum (or minima) of an objective function inside a bounded domain. • One way to do that is to find all the local minima and choose among them the global one (or ones). • A popular method of that kind is the so-called “Multistart”.

  3. Local Optimization • Let x be a point in the domain. • Starting from x, a local search procedure L reaches a minimum y. • This may be denoted as $y = L(x)$. • Multistart repeatedly applies a local optimization procedure.
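The plain Multistart idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-D objective `f`, the crude step-halving descent used as the local search, the domain [-3, 3], and the tolerance used to decide whether two minima coincide are all hypothetical choices, not taken from the slides.

```python
import math
import random

def f(x):
    # Hypothetical 1-D objective with several local minima on [-3, 3].
    return math.cos(3.0 * x) + 0.1 * x * x

def local_search(x, lo=-3.0, hi=3.0, step=0.1, tol=1e-8):
    # Crude descent: move to the better neighbour, halve the step when stuck.
    while step > tol:
        left, right = max(lo, x - step), min(hi, x + step)
        best = min((x, left, right), key=f)
        if best == x:
            step *= 0.5
        else:
            x = best
    return x

def multistart(n_starts, seed=0):
    rng = random.Random(seed)
    minima = []
    for _ in range(n_starts):
        # Every sampled point starts a local search (the costly part).
        y = local_search(rng.uniform(-3.0, 3.0))
        # Minima closer than 1e-4 are treated as the same minimum.
        if all(abs(y - m) > 1e-4 for m in minima):
            minima.append(y)
    return sorted(minima)

minima = multistart(200)
global_min = min(minima, key=f)
```

In this toy setting 200 local searches are spent to locate four minima, which is the kind of repeated rediscovery that an improved scheme tries to avoid.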

  4. Regions of Attraction • For a local search procedure L, the region of attraction of the minimum yi is the set of points in the domain from which L reaches yi: $A_i = \{x : L(x) = y_i\}$. • Observe the dependence on L.

  5. “IDEAL” MultiStart (IMS) • This is a version in which every local minimum is found only once. • It assumes that from the position of a minimum, its region of attraction may be directly determined. • Since this is a false assumption, IMS is of no practical value. • However, it offers a framework and a target.

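  6. Ideal MultiStart (IMS) • Step 1, Initialize: set k=1, sample a point x in the domain, and set $y_1 = L(x)$. • Step 2: Terminate if a stopping rule applies. • Step 3: Sample a point x in the domain. • Step 4, Main step: if x does not belong to the union of the regions of attraction found so far, $x \notin \cup_{i=1}^{k} A_i$, apply the local search, $y = L(x)$, and set k ← k+1, $y_k = y$. • Step 5, Iterate: go back to step 2.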

  7. Making IMS practical • Since the regions of attraction of the minima discovered so far are not known, it is not possible to determine whether a point belongs to their union. • However, a probability may be estimated, based on several assumptions. • Hence, a stochastic modification may render IMS useful.

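  8. Stochastic modification of the main step • Main step: estimate the probability p that $x \notin \cup_{i=1}^{k} A_i$. • Apply a local search from x with probability p. • If the search reaches a minimum y that is not among y1, ..., yk, then set k ← k+1 and $y_k = y$. • Endif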
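A minimal Python sketch of this stochastic main step follows. Everything concrete in it is an assumption made for illustration: the 1-D objective `f`, the step-halving `local_search`, and in particular `prob_outside`, a placeholder estimate that grows linearly with the distance to the closest known minimum and is not the probability model of the presentation.

```python
import math
import random

def f(x):
    # Hypothetical 1-D objective, used only for illustration.
    return math.cos(3.0 * x) + 0.1 * x * x

def local_search(x, lo=-3.0, hi=3.0, step=0.1, tol=1e-8):
    # Crude descent: move to the better neighbour, halve the step when stuck.
    while step > tol:
        left, right = max(lo, x - step), min(hi, x + step)
        best = min((x, left, right), key=f)
        if best == x:
            step *= 0.5
        else:
            x = best
    return x

def prob_outside(x, minima, radius=0.5):
    # Placeholder estimate (an assumption): linear in the distance to the
    # closest known minimum, capped at 1.
    z = min(abs(x - y) for y in minima)
    return min(1.0, z / radius)

def stochastic_multistart(n_samples, seed=1):
    rng = random.Random(seed)
    minima = [local_search(rng.uniform(-3.0, 3.0))]   # initialization
    searches = 1
    for _ in range(n_samples):
        x = rng.uniform(-3.0, 3.0)                    # sample a point
        p = prob_outside(x, minima)                   # estimate p
        if rng.random() < p:                          # search with probability p
            y = local_search(x)
            searches += 1
            if all(abs(y - m) > 1e-4 for m in minima):
                minima.append(y)                      # a new minimum was found
    return sorted(minima), searches

minima, searches = stochastic_multistart(300)
```

Samples that land close to a known minimum are usually skipped, so `searches` ends up below the number of samples, while points far from every known minimum are always searched.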

  9. The probability estimation • An overestimated probability (p→1) increases the computational cost and turns the algorithm into the standard MultiStart. • An underestimated probability causes an iteration delay without significant computational cost (only sampling, no local search).

  10. Probability model • If a sample point is close to an already known minimizer, the probability that it does not belong to its region of attraction is small, and it is zero in the limit of complete coincidence. • It follows that $p(x \notin A_i) \to 0$ as $\|x - y_i\| \to 0$.

  11. Probability model • Let $z_i = \|x - y_i\|$. • If $z_i > R_i$, Ri being a radius such that Ai is contained in the sphere (yi, Ri), then certainly $x \notin A_i$. • Hence $p(x \notin A_i) = 1$ for $z_i > R_i$.

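  12. Probability model • The probability is modeled as a function of the distance z in three pieces: a quadratic that rises from 0 at z = 0 to a at z = r, a cubic polynomial P3(z) for r < z < R, and the constant 1 for z ≥ R. • P3(z) is chosen so that both the model and its derivative are continuous.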
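One way such a piecewise model could look in Python is sketched below. The exact formulas are not preserved in this transcript, so two assumptions are made: the quadratic branch is taken as a·(z/r)², and the cubic is taken as the Hermite interpolant that matches the value and slope of the quadratic at z = r and reaches 1 with zero slope at z = R. The function name `model_probability` is hypothetical.

```python
def model_probability(z, a, r, R):
    """Estimated probability that a point at distance z from a known
    minimum lies outside its region of attraction (assumed form)."""
    if z <= r:
        return a * (z / r) ** 2        # quadratic branch, equals a at z = r
    if z >= R:
        return 1.0                     # beyond the sphere of radius R
    # Cubic Hermite branch on [r, R]: matches value and slope at both ends.
    h = R - r
    t = (z - r) / h
    m0 = (2.0 * a / r) * h             # slope of the quadratic at z = r, scaled by h
    h00 = 2 * t**3 - 3 * t**2 + 1      # weight of the value at z = r
    h10 = t**3 - 2 * t**2 + t          # weight of the slope at z = r
    h01 = -2 * t**3 + 3 * t**2         # weight of the value at z = R
    # The slope at z = R is zero, so its basis term drops out.
    value = h00 * a + h10 * m0 + h01 * 1.0
    # Clamp: a steep slope at r can make the cubic overshoot 1 slightly.
    return min(1.0, value)
```

The clamp is a safeguard added for this sketch; for parameters such as a = 0.5, r = 1, R = 3 the unclamped cubic rises slightly above 1 before returning to it at z = R.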

  13. Defining the model parameters • There are three parameters to specify for each z, namely a, r, and R. • All of them depend on the associated minimum yi and the iteration count k, i.e. a=ai(k), r=ri(k), and R=Ri(k).

  14. Interpreting the model parameters • ri is the distance below which the probability is descending quadratically and depends on the size of the “valley”. • As the algorithm proceeds, yi may be discovered repeatedly. Every time it is rediscovered, ri is increased in order to adapt to the local geometry.

  15. Interpreting the model parameters • ai is the probability at zi=ri. • As yi is rediscovered, ai should be decreased to render a future rediscovery less probable. • If li is the number of times yi has been discovered so far, ai is set as a decreasing function of li.

  16. Choosing the model parameters • ri is increased on each rediscovery and is safeguarded by a bound that involves η, the machine precision. • Ri is updated every time a local search rediscovers yi.

  17. Gradient Information • In the case where d=yi-x is a descent direction, the probability is reduced by a factor pg ∈ [0,1]. • pg is zero when d is parallel to the gradient at x, and one when it is perpendicular to it. • This factor is used only when zi ∈ [0.7ri, 0.9ri].

  18. Ascending Gradient Rule • If the direction d=yi-x is not a descent direction at x, this signals that x is not “attracted” towards yi, i.e. it does not fall inside its region of attraction. • In this case the probability that x lies outside Ai is taken to be one.
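The ascending gradient rule can be sketched as a small check in Python. The formula on the slide is not preserved, so the sketch assumes the standard definition of a descent direction (negative inner product with the gradient); the function names and the way the gradient is passed in are hypothetical.

```python
def is_descent_towards(x, y, grad):
    # d = y - x points from the sample x to the known minimum y.
    d = [yi - xi for xi, yi in zip(x, y)]
    # d is a descent direction when its inner product with the gradient
    # at x is negative.
    return sum(di * gi for di, gi in zip(d, grad)) < 0.0

def apply_ascending_rule(p, x, y, grad):
    # If moving towards y does not decrease the objective at x, treat x as
    # lying outside the region of attraction of y (probability one).
    # Otherwise keep the estimate p unchanged.
    return p if is_descent_towards(x, y, grad) else 1.0

# Example with the bowl x1^2 + x2^2, whose gradient at (1, 1) is (2, 2):
p_kept = apply_ascending_rule(0.3, (1.0, 1.0), (0.0, 0.0), (2.0, 2.0))
# Same point with a hypothetical gradient pointing the other way:
p_forced = apply_ascending_rule(0.3, (1.0, 1.0), (0.0, 0.0), (-2.0, -2.0))
```

In the first call the direction towards the minimum is downhill, so the estimate 0.3 is kept; in the second it is uphill, so the rule forces the probability to one.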

  19. Asymptotic guarantee • The previous gradient rule, together with the model s(x), guarantees that asymptotically all minima will be found with probability one. • Hence the global minimum will surely be recovered asymptotically.

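  20. Probability • Having estimated the probability $p(x \notin A_i)$ for each minimum, we can ideally estimate $p(x \notin \cup_{i=1}^{k} A_i)$ as the product $\prod_{i=1}^{k} p(x \notin A_i)$. • However, the product creates a problem, illustrated next.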

  21. The probability at x is reduced since it falls inside two spheres centered at yi and yj. Note that x will lead to a new minimum, and ideally its probability should have been high. This effect may be amplified in many dimensions. (Figure label: “Local minimum not discovered yet”.)

  22. Estimating the probability • To circumvent this problem we consider the following estimate: $p(x \notin \cup_{i=1}^{k} A_i) \approx p(x \notin A_{cn})$, where the index “cn” stands for Closest Neighbor. Namely, we take into account only the closest minimizer.
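The difference between the product estimate and the closest-neighbor estimate can be seen in a small Python example. The per-minimum estimate `p_single` is a placeholder (linear in the distance, capped at 1) chosen only to make the arithmetic easy to follow; it is an assumption, not the model of the presentation.

```python
import math

def p_single(x, y, radius):
    # Placeholder per-minimum estimate (assumption): linear in the
    # distance from x to y, capped at 1.
    return min(1.0, math.dist(x, y) / radius)

def p_product(x, minima, radius):
    # Product over all known minima: every nearby minimum lowers the result.
    p = 1.0
    for y in minima:
        p *= p_single(x, y, radius)
    return p

def p_closest(x, minima, radius):
    # Closest-neighbor estimate: only the nearest known minimum counts.
    y_cn = min(minima, key=lambda y: math.dist(x, y))
    return p_single(x, y_cn, radius)

minima = [(0.0, 0.0), (2.0, 0.0)]
x = (1.0, 0.0)                        # inside both spheres of radius 1.25
prod = p_product(x, minima, 1.25)     # 0.8 * 0.8
near = p_closest(x, minima, 1.25)     # 0.8
```

With two nearby minima the product already drops from about 0.8 to about 0.64; with more neighbors, as is common in many dimensions, it shrinks further, while the closest-neighbor estimate does not.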

  23. Local nature of the probability • The probability model is based on distances from the discovered minima. • It is implicitly assumed that the closer a point is to a minimum, the greater the probability that it falls inside its region of attraction (RA). • This is not true for all local search procedures L.

  24. Local search properties The local search dictates the shape of the regions of attraction. • Regions of attraction should contain the minimum and be contiguous. • Ideally the regions of attraction should resemble the ones produced by a descent method with infinitesimal step. • So the local search should be carefully chosen.

  25. Desired local search Simplex, with small initial opening

  26. Undesired local search BFGS with strong Wolfe line search

  27. Test functions (figure panels): Rastrigin, Ackley, Griewangk, Shubert. Image source: http://www.geatbx.com/docu/fcnindex-msh_f8_8-21.gif

  28. Rotated Quadratics • This test function is constructed so that its contours form non-convex domains. (C. Voglis, private communication)

  29. Preliminary results

  30. Parallel processing • The described process uses a single sample point and performs a local search with a certain probability. • If many points are sampled, multiple local searches may be performed in parallel, yielding a significant gain in performance.

  31. Parallel processing gain • Note however that the probability estimation will be based on data that are updated in batches. • This update delay is significant in the first few rounds only. • A further gain may be possible using a clustering technique before the local search is applied.

  32. Clustering filter • Sample M points. • Estimate the probability to start a local search (LS). • Decide from which points a LS will start. • Apply a clustering technique to these points and decide to start a LS from only one point of each cluster. • Send the selected points to the available processors that will perform the LS.
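The batch procedure with a clustering filter might be sketched in Python as follows. The slides do not say which clustering technique is used, so a simple greedy distance filter stands in for it here; `select_start_points`, the stub `local_search`, the placeholder probabilities, and the use of a thread pool are all assumptions made for illustration.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def select_start_points(samples, probs, cluster_radius, rng):
    # Keep each sample with its estimated probability of starting a search.
    candidates = [x for x, p in zip(samples, probs) if rng.random() < p]
    # Greedy clustering filter (a stand-in technique): a candidate is dropped
    # when it lies within cluster_radius of a point already selected, so each
    # cluster contributes a single starting point.
    selected = []
    for x in candidates:
        if all(math.dist(x, s) > cluster_radius for s in selected):
            selected.append(x)
    return selected

def local_search(x):
    # Hypothetical stand-in for a real local search: for the separable
    # objective sum((xi**2 - 1)**2), each coordinate ends at -1 or +1
    # according to its sign.
    return tuple(1.0 if xi >= 0.0 else -1.0 for xi in x)

rng = random.Random(0)
samples = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(20)]
probs = [1.0] * len(samples)           # placeholder probabilities
starts = select_start_points(samples, probs, cluster_radius=0.5, rng=rng)

# Send the selected points to the available workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    found = set(pool.map(local_search, starts))
```

Threads are used only to keep the sketch short; for a CPU-bound local search a process pool or a distributed scheduler would be the more natural choice.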
