This lecture discusses the use of stochastic relaxation and simulated annealing to escape local minima and find global minimizers. It surveys different types of relaxation, including variable-by-variable relaxation, and explains the Metropolis algorithm in detail. Simulated annealing is then applied to the 2D Ising model and to the bisectioning problem. The concept of the Lowest Common Configuration (LCC) is introduced to improve the global minimization process. Finally, the lecture addresses the difficulty of choosing a heating-cooling schedule and suggests the use of memory (the best-so-far configuration) to optimize the process.
Stochastic Relaxation, Simulated Annealing, Global Minimizers
Different types of relaxation
• Variable-by-variable relaxation: strict minimization
• Changing a small subset of variables simultaneously: window strict minimization relaxation
• Stochastic relaxation: may increase the energy; should be followed by strict minimization
How to escape local minima?
• First go uphill; then one may hit a lower basin
• In order to go uphill, an increase in E(x) must be allowed
• Add stochasticity: allow E(x) to increase with a probability governed by an external temperature-like parameter T

The Metropolis Algorithm (Metropolis et al. 1953; used for optimization as simulated annealing by Kirkpatrick et al. 1983)
Assume xold is the current state. Define xnew to be a neighboring state and delE = E(xnew) - E(xold). Then:
if delE < 0, replace xold by xnew;
else accept xnew with probability P(xnew) = exp(-delE/T) and keep xold with probability P(xold) = 1 - P(xnew)
The Metropolis Algorithm
• As T → 0, whenever delE > 0: P(xnew) → 0
• At T = 0: strict minimization
• High T randomizes the configuration away from the minimum
• Low T cannot escape local minima
• Starting from a high T, the slower T is decreased, the lower the E(x) that is achieved
• The slow reduction in T allows the material to reach a more ordered configuration: the size of its crystals increases and their defects are reduced
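The acceptance rule above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture: the function names, the toy energy E(x) = (x²-25)², and the geometric cooling factor are my own choices.

```python
import math
import random

def metropolis_accept(delE, T, rng=random.random):
    """One Metropolis test: always accept downhill moves; accept an
    uphill move with probability exp(-delE/T)."""
    if delE <= 0:
        return True
    if T <= 0:
        return False          # at T = 0 this is strict minimization
    return rng() < math.exp(-delE / T)

def anneal(E, neighbor, x0, T0=1.0, alpha=0.95, sweeps=200, seed=0):
    """Generic simulated-annealing loop with geometric cooling T <- alpha*T."""
    rnd = random.Random(seed)
    x, T = x0, T0
    for _ in range(sweeps):
        xnew = neighbor(x, rnd)
        if metropolis_accept(E(xnew) - E(x), T, rnd.random):
            x = xnew
        T *= alpha
    return x
```

For example, `anneal(lambda x: (x*x - 25)**2, lambda x, r: x + r.choice([-1, 1]), 0)` walks the integers toward one of the minimizers x = ±5; note that at T = 0 the rule reduces to strict minimization, exactly as the slide states.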
SA for the 2D Ising model: E = -Σ<ij> si sj, where i and j are nearest neighbors
[Figure: a small spin configuration before and after flipping one spin]
For the pictured flip: Eold = -2, Enew = 2, so delE = Enew - Eold = 4 > 0 and P(Enew) = exp(-4/T)
Demanding, e.g., P(Enew) = 0.3 gives T = -4/ln 0.3 ≈ 3.3
Cooling schedule: reduce T by a factor a, 0 < a < 1: Tn+1 = a·Tn
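A minimal single-spin-flip Metropolis sweep with geometric cooling can be sketched as follows. This is my own sketch, assuming periodic boundary conditions, no external field, and illustrative parameter values (T0 ≈ 3.3 as computed on the slide):

```python
import math
import random

def ising_energy(s):
    """E = -sum over nearest-neighbor pairs of s_i*s_j, periodic boundaries."""
    n = len(s)
    E = 0
    for i in range(n):
        for j in range(n):
            E -= s[i][j] * (s[i][(j + 1) % n] + s[(i + 1) % n][j])
    return E

def sa_ising(n=8, T0=3.3, alpha=0.9, sweeps=60, seed=1):
    """Single-spin-flip Metropolis with cooling T <- alpha*T after each sweep."""
    rnd = random.Random(seed)
    s = [[rnd.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
    T = T0
    for _ in range(sweeps):
        for _ in range(n * n):          # one sweep = n*n flip attempts
            i, j = rnd.randrange(n), rnd.randrange(n)
            nbr = (s[i][(j + 1) % n] + s[i][(j - 1) % n] +
                   s[(i + 1) % n][j] + s[(i - 1) % n][j])
            delE = 2 * s[i][j] * nbr    # energy change of flipping s[i][j]
            if delE <= 0 or rnd.random() < math.exp(-delE / T):
                s[i][j] = -s[i][j]
        T *= alpha
    return s
```

Starting from a random configuration, the energy drops far below zero as T is cooled; the fully aligned ground state has E = -2n².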
Exc#7: SA for the 2D Ising model (see Exc#1)
Consider the following cases:
1. For h1 = h2 = 0, set a stripe of width 3, 6 or 12 with opposite sign
2. For h1 = -0.1, h2 = 0.4, set -1 at h1 and +1 at h2
3. Repeat 2 with two 8x8 squares of plus spins with h2 = 0.4, located apart from each other
Calculate T0 to allow 10% flips of a spin surrounded by 4 neighbors of the same sign. Use faster / slower cooling schedules.
a. What were the starting T0 and E in each case?
b. How was T0 decreased; how many sweeps were employed?
c. What was the final configuration; was the global minimum achieved? If not, try a different T0.
d. Is it harder to flip a wider stripe?
e. Is it harder to flip 2 squares than just one?
SA for the bisectioning problem: "individual" temperature
[Figure: node i on the cut line between parts R and R']
The probability of node i to belong to R' depends on Si = Σj in R' aij / Σj aij:
P(i in R') = 1 if delE ≤ 0, and exp[-delE/(T·Si)] if delE > 0
The probability of i to move to R' should increase when a bigger change along the cut line is made: if delE is small enough, it is expected that further moves will indeed eventually produce a lower E
SA for the bisectioning problem: how to choose T
• Calculate delE/Si along the cut line and sort the values
• Decide upon the % of changes desired
• Find the appropriate T by demanding P(%) = 0.5
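One way to implement this recipe in Python (the function name is hypothetical; it assumes the ratios delE/Si for the candidate moves along the cut line have already been computed): demanding exp(-r/T) = 0.5 for the ratio r at the desired quantile gives T = r/ln 2.

```python
import math

def pick_temperature(ratios, frac):
    """Given the ratios r_i = delE_i / S_i of the candidate moves along the
    cut line, return T such that the move at the `frac` quantile of the
    positive ratios has acceptance probability exp(-r/T) = 0.5."""
    pos = sorted(r for r in ratios if r > 0)   # moves with delE <= 0 are always accepted
    if not pos:
        return 0.0
    k = min(int(frac * len(pos)), len(pos) - 1)
    return pos[k] / math.log(2.0)              # solves exp(-pos[k]/T) = 0.5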
SA for linear ordering problems: multiple choices for a variable
• Try to move node i up to k positions to the right and to the left: choose among the 2k+1 possibilities
• For j = -k,...,-1,1,...,k: P(j) = z·min[1, exp(-delE(j)/T(j))]
• For j = 0 (no move): P(0) = z·minj[1 - P(j)/z]
• z is calculated from the normalization Σj P(j) = 1
• T(j) is calculated a priori for each j, aiming at a certain acceptance rate (e.g. 60%)
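The rule can be sketched as follows, reading the slide's formulas literally (my own sketch; the function name and the dictionary interface are illustrative). Note a consequence of the P(0) formula: whenever some move is downhill its weight is 1, so P(0) = 0 and a strictly improving move is never refused.

```python
import math

def move_probabilities(delE, T):
    """delE[j], T[j] for j in -k..k, where j = 0 means 'stay put'.
    For j != 0: P(j) = z*min[1, exp(-delE(j)/T(j))].
    For j  = 0: P(0) = z*min_j[1 - P(j)/z].
    z is fixed by the normalization sum_j P(j) = 1."""
    w = {j: min(1.0, math.exp(-delE[j] / T[j]))
         for j in delE if j != 0}
    w[0] = min(1.0 - wj for wj in w.values())
    z = 1.0 / sum(w.values())
    return {j: z * wj for j, wj in w.items()}
```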
The Metropolis Algorithm (cont.)
• May result in very slow processing
• Still, SA is considered a powerful global minimizer
• Instead of a very slow cooling schedule, repeat heating-cooling several times and keep track of the best-so-far configuration:
• The best-so-far has a non-increasing E
• It is an outside observer: it records but does not participate in the relaxation
• The best-so-far is actually the calculated minimum
• Problem: heating may destroy already-achieved minima in various subregions
• Add "memory" of the best-so-far for those subregions

Heating-cooling scheduling
[Figure: a saw-tooth plot of T vs. #relaxation sweeps; at the end of each cooling leg (T = 0) the best-so-far is stored]
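The repeated heating-cooling scheme with an outside observer can be sketched as below (my own sketch; function names, the cycle counts and the toy energy in the usage example are illustrative). The observer's energy is non-increasing by construction, so reheating can never lose the calculated minimum.

```python
import math
import random

def repeated_annealing(E, neighbor, x0, cycles=5, T_hot=2.0, alpha=0.8,
                       sweeps_per_T=20, n_temps=10, seed=0):
    """Repeat heating-cooling cycles instead of one very slow cooling;
    the best-so-far configuration is kept by an outside observer."""
    rnd = random.Random(seed)
    x = x0
    best, bestE = x0, E(x0)
    for _ in range(cycles):
        T = T_hot                         # reheat at the start of each cycle
        for _ in range(n_temps):
            for _ in range(sweeps_per_T):
                xn = neighbor(x, rnd)
                d = E(xn) - E(x)
                if d <= 0 or rnd.random() < math.exp(-d / T):
                    x = xn
                if E(x) < bestE:          # the observer never loses ground
                    best, bestE = x, E(x)
            T *= alpha                    # cool
    return best, bestE
```

For example, minimizing E(x) = (x²-25)² over the integers from x0 = 0 returns a best-so-far energy of 0 even though the walk keeps being reheated.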
Lowest Common Configuration
[Figure: two configurations C1 and C2, each agreeing with the global minimum over different subregions]
Given two configurations C1 and C2, their lowest common configuration LCC(C1, C2) keeps, subregion by subregion, the better of the two, so that
E(LCC(C1, C2)) ≤ min[E(C1), E(C2)]
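The LCC inequality is easiest to see in the special case of an energy that decomposes over independent subregions: take each subregion from whichever configuration scores lower there. A toy sketch (my own; the function name and the block interface are illustrative, not the lecture's general construction, which must also handle interactions across subregion boundaries):

```python
def lcc_separable(c1, c2, block_E, blocks):
    """LCC for a fully separable energy E = sum over blocks of block_E:
    per block, keep the configuration with the lower block energy, so the
    total is the sum of per-block minima <= min(E(c1), E(c2))."""
    out = dict(c1)
    for b in blocks:
        if block_E(c2, b) < block_E(c1, b):
            for site in b:
                out[site] = c2[site]
    return out
```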
Heating-cooling scheduling with LCC
[Figure: the saw-tooth T vs. #relaxation sweeps plot; at the end of each cooling leg, update best-so-far ← LCC(best-so-far, the new T = 0 configuration)]
Exc#8: LCC for the bisectioning problem
Given 2 partitions, find a linear-time algorithm for the construction of their LCC
Exc#8*: LCC for linear ordering problems
Find a (nearly) linear-time algorithm (e.g. sorting is allowed) for the LCC of 2 permutations, in which common subpermutations are detected and chosen into the best-so-far
Multilevel Simulated Annealing
• Do not increase T by much: avoid destroying the global solution inherited from the coarser levels
• Reduce T quickly: typically 2-3 values of T > 0 (followed by strict minimization) are sufficient
• Repeat heating-cooling several times per level
• Accumulate the minimal solution into the best-so-far by applying the LCC at the end of each T = 0 stage
• Interpolate the best-so-far to the next level
Genetic algorithms
• A global search technique inspired by evolutionary biology
• Start from a population of (randomly generated) individuals: this is the 1st generation
• The next generation follows by:
1. selection of individuals from the current generation to breed the next generation, according to some fitness measure
2. crossover (recombination) of pairs of (randomly chosen) parents to produce offspring
3. mutations applied randomly to enhance the diversity of the individuals in the generation
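The three steps above can be sketched as a generic GA skeleton (my own sketch; the function names, population size, mutation rate and the elitist selection scheme are illustrative choices, not the lecture's):

```python
import random

def genetic_search(E, random_individual, crossover, mutate,
                   pop_size=20, generations=30, seed=0):
    """Generic GA: fitness-based selection (lower E is fitter),
    crossover of randomly chosen parents, occasional mutation."""
    rnd = random.Random(seed)
    pop = [random_individual(rnd) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=E)
        parents = pop[:pop_size // 2]          # selection by fitness E(x)
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rnd.sample(parents, 2)    # crossover of random parents
            child = crossover(p1, p2, rnd)
            if rnd.random() < 0.3:             # random mutation for diversity
                child = mutate(child, rnd)
            children.append(child)
        pop = parents + children
    return min(pop, key=E)

def one_point(p1, p2, rnd):
    """One-point crossover for fixed-length tuples."""
    c = rnd.randrange(1, len(p1))
    return p1[:c] + p2[c:]

def flip_bit(x, rnd):
    """Flip one random bit of a 0/1 tuple."""
    i = rnd.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]
```

As a toy usage, minimizing E(x) = number of ones over 8-bit strings drives the population toward the all-zero string.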
A genetic algorithm for the linear arrangement problem P=1
• Initial population:
1. select a starting vertex
2. build the permutation by the greedy frontal-increase-minimization algorithm: at each step choose, from the current front Fi, the node most connected to the already-placed nodes
• Selection of survivors is based on E(x)
• Recombine two randomly chosen parents:
Parent 1: 5 7 2 3 8 6 9 1 4
Parent 2: 6 1 2 4 3 5 9 8 7
The offspring is built up in stages (slide animation):
2 9 → _ 3 2 6 _ 7 9 5 _ → _ 3 2 6 8 7 9 5 4 → 1 3 2 6 8 7 9 5 4
A genetic algorithm for the linear arrangement problem P=1 (cont.)
The next generation is constructed by:
1. recombinations of 2 randomly chosen parents
2. improving the E(x) of each offspring by local processing, e.g. by simulated annealing
3. choosing the best individuals from the pool of parents and children
Spectral Sequencing: a global minimizer
• Given a weighted graph, where wij is the edge weight between nodes i and j
• Define the graph Laplacian A by aij = -wij (i ≠ j), aii = Σj wij
• A is symmetric positive semidefinite
• Consider the eigenvalue problem Ax = λx
• Arranging the nodes of the graph according to the eigenvector associated with the 2nd-smallest eigenvalue was shown by Hall (1970) to solve min Σij wij(xi - xj)² for real variables x (normalized and orthogonal to the constant vector, which excludes the trivial constant solution)
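For a tiny graph this Fiedler vector can be computed with plain power iteration on c·I − A, projecting out the constant vector (the eigenvector of the smallest eigenvalue, 0). This is a pure-Python sketch for illustration only; the shift c, the start vector and the iteration count are my choices, and real problems need a proper (e.g. multilevel) eigensolver, as the slide notes.

```python
def fiedler_vector(w, iters=500):
    """Eigenvector of the 2nd-smallest eigenvalue of the Laplacian
    a_ij = -w_ij, a_ii = sum_j w_ij, by power iteration on (c*I - A)
    with the constant vector deflated.  Assumes w symmetric, w[i][i] = 0."""
    n = len(w)
    deg = [sum(row) for row in w]
    c = 2.0 * max(deg) + 1.0              # shift so c*I - A is positive
    v = [((i * 7919) % 13) - 6.0 for i in range(n)]   # fixed start vector
    for _ in range(iters):
        mean = sum(v) / n                 # project out the constant vector
        v = [x - mean for x in v]
        # multiply by (c*I - A): ((c - deg_i)*v_i + sum_j w_ij * v_j)
        v = [(c - deg[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = max(abs(x) for x in v) or 1.0
        v = [x / norm for x in v]         # renormalize
    return v
```

Sorting the nodes of a path graph by their Fiedler-vector entries recovers the path order (up to reversal), which is exactly the spectral-sequencing idea.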
Spectral Sequencing: a global minimizer
• SS has been used extensively to solve a large variety of ordering problems: linear ordering problems (P=1,2), partitioning problems, embedding into lower dimensions, etc.
• To calculate the eigenvectors, use a multilevel solver
• However, the direct use of multilevel methods on the original problem produces better results than using the ordering dictated by SS
P=2: Multilevel approach vs. spectral method
[Chart: result ratios over the benchmark graphs]
The results of the multilevel approach were obtained without post-processing!
Ilya Safro, Dorit Ron, A. Brandt: J. Graph Alg. Appl. 10 (2006) 237-258