Optimizing with Simulated Annealing & Boltzmann Machines

Simulated Annealing & Boltzmann Machines
虞台文 (Tai-Wen Yu), Intelligent Multimedia Laboratory, Institute of Computer Science and Engineering, Tatung University

Contents
• Overview
• Simulated Annealing
• Deterministic Annealing
• Boltzmann Machines

Overview

E > 0 E < 0 Hill Climbing E E: cost (energy)

The Problem with Hill Climbing
• Gets stuck at local minima, for example:
    • the gradient descent approach
    • Hopfield neural networks
• Possible solutions:
    • Try different initial states.
    • Increase the size of the neighborhood (e.g., in TSP, try 3-opt rather than 2-opt).
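The failure mode and the first remedy can be seen on a tiny one-dimensional landscape. This Python sketch is illustrative only: the list `landscape`, the helpers `hill_climb` and `random_restarts`, and the ±1 neighborhood are hypothetical choices, not part of the original slides.

```python
import random

def hill_climb(energy, start):
    """Greedy descent on a 1-D list: move to the lowest neighbor until none is lower."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(energy)]
        best = min(neighbors, key=lambda j: energy[j])
        if energy[best] >= energy[i]:
            return i          # no downhill neighbor: a (possibly only local) minimum
        i = best

def random_restarts(energy, tries, seed=0):
    """Remedy: hill-climb from several random initial states and keep the best end point."""
    rng = random.Random(seed)
    ends = [hill_climb(energy, rng.randrange(len(energy))) for _ in range(tries)]
    return min(ends, key=lambda j: energy[j])

# Hypothetical landscape: local minimum at index 2, global minimum at index 7.
landscape = [5, 3, 1, 4, 6, 4, 2, 0, 3, 5]
print(hill_climb(landscape, 0))          # 2
print(random_restarts(landscape, 20))
```

Started at index 0, hill climbing stops at the local minimum at index 2; restarting from random states gives later runs a chance to land in the basin of the global minimum at index 7.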

Stochastic Approaches
Goal: escape from local minima.
• Stochastic optimization refers to the minimization (or maximization) of a function in the presence of randomness in the optimization process.
• The randomness may be present as noise in measurements, as Monte Carlo randomness in the search procedure, or both.


Two Important Methods
• Simulated Annealing (SA)
    • Motivated by the physical annealing process
    • Evolution from a single solution
• Genetic Algorithms (GA)
    • Motivated by biological evolution
    • Evolution from multiple solutions
References:
• Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). "Optimization by Simulated Annealing." Science, 220(4598), 671–680.
• Holland, J. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.

Simulated Annealing

Global Optimization

 +   + + +     + +  +  + +  + Statistical Mechanics in a Nutshell T • Statistical mechanics is the study of the behavior of very large systems of interacting components in thermal equilibrium at a temperature, say T.

 +   + + +     + +  +  + +  + Boltzmann Factor T kB : Boltzmann constant Z(T) : Boltzmann partition function

 +   + + +     + +  +  + +  + • Raising temperature • the system becomes more `active’ • the average energy becomes higher Boltzmann Factor T T1 < T2 < T3 E

E > 0 E < 0 SimulationMetropolis Acceptance Criterion E E: cost (energy)

1 SimulationMetropolis Acceptance Criterion T1 < T2 < T3
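The acceptance rule can be written as a one-line probability function; the sketch below (function name hypothetical) prints it for a few energy changes at three temperatures T1 < T2 < T3.

```python
import math

def metropolis_accept_prob(delta_e, T):
    """Probability of accepting a move with energy change delta_e at temperature T."""
    if delta_e < 0:
        return 1.0                      # downhill moves are always accepted
    return math.exp(-delta_e / T)       # uphill or flat moves: exp(-dE/T)

for T in (0.5, 1.0, 2.0):               # T1 < T2 < T3
    probs = [metropolis_accept_prob(d, T) for d in (-1.0, 0.0, 1.0, 2.0)]
    print(T, [round(p, 3) for p in probs])
```

In a simulation, a move is then accepted when this probability exceeds a uniform random number in [0, 1).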

Simulated Annealing Algorithm
• Create initial solution S
• Initialize temperature T
• repeat
    • for k = 1 to iteration-length do
        • Generate a random transition from S to S′
        • Let ΔE = E(S′) − E(S)
        • if ΔE < 0 then S = S′
        • else if exp(−ΔE/T) > rand(0,1) then S = S′
    • Reduce temperature T
• until no change in E(S)
• Return S

Hill Climbing as a special case: dropping the Metropolis branch (else if exp(−ΔE/T) > rand(0,1) then S = S′), or letting T → 0, leaves only the rule "if ΔE < 0 then S = S′", which is exactly hill climbing.
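The simulated annealing pseudocode can be sketched in Python as below. This is a minimal sketch, not the author's implementation: the geometric cooling rule, the parameter names (`T0`, `alpha`, `iteration_length`, `T_min`), and reading "no change in E(S)" as "no move accepted during a whole temperature stage" are assumptions.

```python
import math
import random

def simulated_annealing(initial, energy, neighbor, T0, alpha=0.9,
                        iteration_length=100, T_min=1e-3, seed=None):
    """Generic SA loop (a sketch).

    energy(S) returns the cost; neighbor(S, rng) returns a random transition S'.
    Assumed: geometric cooling T <- alpha * T; stop when a whole stage accepts
    no move (our reading of "no change in E(S)"); T_min is a safety stop.
    """
    rng = random.Random(seed)
    S, E = initial, energy(initial)
    T = T0
    while T > T_min:
        accepted = 0
        for _ in range(iteration_length):
            S_new = neighbor(S, rng)            # random transition S -> S'
            E_new = energy(S_new)
            dE = E_new - E                      # dE = E(S') - E(S)
            # Downhill: always accept. Uphill: accept if exp(-dE/T) > rand(0,1).
            if dE < 0 or math.exp(-dE / T) > rng.random():
                S, E = S_new, E_new
                accepted += 1
        if accepted == 0:
            break
        T *= alpha                              # reduce temperature
    return S, E

# Toy usage: minimize (x - 7)^2 over the integers with +/-1 moves.
best, cost = simulated_annealing(50, lambda x: (x - 7) ** 2,
                                 lambda x, rng: x + rng.choice((-1, 1)),
                                 T0=10.0, seed=1)
print(best, cost)
```

Choosing a very small `T0` makes uphill moves practically never accepted, so the same loop then behaves like hill climbing.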

Main Components of SA
• Solution representation
    • Appropriate for computing the energy (cost)
• Transition mechanism between solutions
    • Incremental changes of solutions
• Cooling schedule
    • Initial system temperature
    • Temperature decrement function
    • Number of iterations between temperature changes
• Acceptance criteria
• Stop criteria

Example: Traveling Salesman Problem (TSP)
Given the locations of n cities in a two-dimensional space, find the tour of minimum length. The salesman must visit every city exactly once and return to the starting city, forming a closed path.


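Solution Representation (TSP)
Assume the cities are fully connected with symmetric distances ($d_{ij} = d_{ji}$). A tour is represented as a permutation of the city labels; for the 11 cities in the figure, one tour is 1 2 3 4 6 5 7 9 11 8 10, after which the salesman returns from city 10 to city 1.
[Figure: 11 cities labeled 1 to 11.]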

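Energy (Cost) Computation (TSP)
The energy of a tour is its total length. For the tour 1 2 3 4 6 5 7 9 11 8 10:
$$E = d_{1,2} + d_{2,3} + d_{3,4} + d_{4,6} + d_{6,5} + d_{5,7} + d_{7,9} + d_{9,11} + d_{11,8} + d_{8,10} + d_{10,1}$$
In general, for a tour visiting cities $\pi(1), \ldots, \pi(n)$:
$$E = \sum_{k=1}^{n} d_{\pi(k),\pi(k+1)}, \qquad \pi(n+1) = \pi(1)$$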
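A minimal Python sketch of this energy computation, assuming the cities are stored in a dictionary of (x, y) coordinates keyed by label and that $d_{ij}$ is the Euclidean distance; the helper name `tour_length` and the unit-square coordinates are hypothetical.

```python
import math

def tour_length(tour, coords):
    """Energy of a closed tour: sum of d(tour[k], tour[k+1]) plus the edge back to tour[0]."""
    n = len(tour)
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % n]]) for k in range(n))

# Hypothetical 4-city example: the corners of a unit square.
square = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0)}
print(tour_length([1, 2, 3, 4], square))   # 4.0
print(tour_length([1, 3, 2, 4], square))   # crossing tour: 2 + 2*sqrt(2)
```

The `% n` index closes the tour, which corresponds to the final term $d_{10,1}$ in the 11-city example.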

State Transition (TSP)
1. Randomly select two edges.
2. Swap the path: reconnect the tour by reversing the sub-path between the two edges.
Example with 10 cities: the tour 1 2 3 4 5 6 7 8 9 10 uses the edges $d_{1,2}, d_{2,3}, \ldots, d_{9,10}, d_{10,1}$. Selecting the edges $d_{3,4}$ and $d_{8,9}$ and reversing the sub-path 4 5 6 7 8 gives the tour 1 2 3 8 7 6 5 4 9 10. The removed edges are replaced by $d_{3,8}$ and $d_{4,9}$, and the edges inside the reversed sub-path are traversed backwards ($d_{8,7}, d_{7,6}, d_{6,5}, d_{5,4}$).
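The two-step transition can be sketched as follows. Because distances are symmetric, only the two removed and two added edges enter ΔE, so each move costs a constant number of distance evaluations instead of a full tour recomputation. The function names and the 0-based convention (cut after positions i and j, with i < j) are assumptions of this sketch.

```python
import math

def two_opt_move(tour, i, j):
    """Cut edges (tour[i], tour[i+1]) and (tour[j], tour[j+1]) and reverse
    the path tour[i+1..j] between them (requires 0 <= i < j < len(tour))."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def two_opt_delta(tour, i, j, dist):
    """Energy change of the move: new edges (a,c), (b,d) minus old edges (a,b), (c,d)."""
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[(j + 1) % n]
    return dist(a, c) + dist(b, d) - dist(a, b) - dist(c, d)

# The slide's example: cut edges d34 and d89 of the tour 1..10.
tour = list(range(1, 11))
print(two_opt_move(tour, 2, 7))   # [1, 2, 3, 8, 7, 6, 5, 4, 9, 10]
```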

Cooling Schedules
Geometric schedule:
$$T_{k+1} = \alpha T_k$$
Empirical evidence shows that $0.8 \le \alpha \le 0.99$ typically yields successful applications (fairly slow cooling schedules).
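A geometric schedule is naturally an infinite generator. The sketch below (helper names hypothetical) also counts how many reductions are needed to reach a target temperature, which shows how much slower α = 0.99 is than α = 0.9.

```python
import itertools
import math

def geometric_schedule(T0, alpha):
    """Yield T0, alpha*T0, alpha**2*T0, ..., i.e. T_{k+1} = alpha * T_k."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    T = T0
    while True:
        yield T
        T *= alpha

def steps_to_reach(T0, T_final, alpha):
    """Smallest k with T0 * alpha**k <= T_final."""
    return math.ceil(math.log(T_final / T0) / math.log(alpha))

print(list(itertools.islice(geometric_schedule(100.0, 0.9), 4)))
print(steps_to_reach(100.0, 0.1, 0.9), steps_to_reach(100.0, 0.1, 0.99))   # 66 688
```

Cooling from $T_0 = 100$ to 0.1 takes 66 reductions with α = 0.9 but 688 with α = 0.99, which is why values near the top of the range count as slow schedules.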

100 cities are randomly chosen from 1010 square. 100-city TSP Simulation

100 cities are randomly chosen from 1010 square. 100-city TSP 1000N iterations are made for each test. Simulation Each temperature T is hold for 100Nreconfigurations or 10Nsuccessful reconfigurations, whichever comes first. T is reduced by 10% each time.


Deterministic Annealing


The Problems of SA
• SA techniques are inherently slow because of their randomized local search strategy.
• SA converges to the global optimum with probability one only if the cooling schedule is of the order of
$$T_k = \frac{c}{\log(1 + k)}$$
for a sufficiently large constant c [Geman and Geman 1984].
Reference: Geman, S., & Geman, D. (1984). "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images." IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
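To see why a logarithmic schedule is impractical, solve it for the step index k. The derivation assumes the form $T_k = c/\log(1+k)$ with the natural logarithm; the values $c = 1$, $T = 0.05$, $T_0 = 1$ and $\alpha = 0.9$ are hypothetical and chosen only for illustration.

```latex
\begin{aligned}
T_k = \frac{c}{\log(1+k)} \;&\Longrightarrow\; 1 + k = e^{c/T_k}, \\
T_{k'} = \tfrac{1}{2}\,T_k \;&\Longrightarrow\; 1 + k' = e^{2c/T_k} = (1+k)^2, \\
c = 1,\; T_k = 0.05 \;&\Longrightarrow\; k = e^{20} - 1 \approx 4.85 \times 10^{8}, \\
T_k = 0.9^{k}\,T_0,\; T_0 = 1,\; T_k \le 0.05 \;&\Longrightarrow\; k = \left\lceil \frac{\ln 0.05}{\ln 0.9} \right\rceil = 29.
\end{aligned}
```

Halving the temperature squares the number of steps already taken, whereas the geometric schedule reaches the same temperature in a few dozen reductions but loses the convergence guarantee.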

Review: Simulated Annealing Algorithm (binary states)
• Create initial solution S ∈ {0, 1}^n
• Initialize temperature T
• repeat
    • for k = 1 to iteration-length do
        • Generate a random transition from S to S′ by inverting a random bit s_i
        • Let ΔE = E(S′) − E(S)
        • if ΔE < 0 then S = S′
        • else if exp(−ΔE/T) > rand(0,1) then S = S′
    • Reduce temperature T
• until no change in E(S)
• Return S

Stochastic nature: both the choice of the bit s_i and the acceptance test exp(−ΔE/T) > rand(0,1) are random, so two runs from the same initial solution generally follow different paths.
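For binary states, the ΔE of inverting one bit can be computed from that unit's net input alone. This sketch assumes a Hopfield-style energy $E(s) = -\tfrac{1}{2} s^\top W s - \theta^\top s$ with symmetric W and zero diagonal, for which inverting $s_i$ gives $\Delta E = -(1 - 2s_i)\,\mathrm{net}_i$ with $\mathrm{net}_i = \sum_j w_{ij} s_j + \theta_i$. The 4-unit weights are a hypothetical example with a local minimum at all zeros (E = 0) and the global minimum at all ones (E = −4); the stopping rule and schedule parameters are assumptions.

```python
import math
import random

def net_input(W, theta, s, i):
    """net_i = sum_j w_ij s_j + theta_i (W symmetric, zero diagonal)."""
    return sum(W[i][j] * s[j] for j in range(len(s))) + theta[i]

def sa_bits(W, theta, s, T0=5.0, alpha=0.9, iteration_length=50, T_min=1e-3, seed=0):
    """SA on s in {0,1}^n for E(s) = -1/2 s^T W s - theta^T s, one random bit flip per step."""
    rng = random.Random(seed)
    s = list(s)
    T = T0
    while T > T_min:
        accepted = 0
        for _ in range(iteration_length):
            i = rng.randrange(len(s))                          # random bit s_i
            dE = -(1 - 2 * s[i]) * net_input(W, theta, s, i)   # dE of inverting s_i
            if dE < 0 or math.exp(-dE / T) > rng.random():
                s[i] = 1 - s[i]
                accepted += 1
        if accepted == 0:          # no change during a whole stage: stop
            break
        T *= alpha
    return s

# Hypothetical 4-unit network: all pairs coupled with w_ij = 1, theta_i = -0.5.
W = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
theta = [-0.5] * 4
print(sa_bits(W, theta, [0, 0, 0, 0]))            # annealed result
print(sa_bits(W, theta, [0, 0, 0, 0], T0=1e-2))   # T near 0: stays at [0, 0, 0, 0]
```

Starting from the trap at all zeros, the annealed run can climb over the +0.5 barrier while T is high; with T near 0 the same code behaves like hill climbing and stays put.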

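Deterministic Annealing (DA), also called mean-field annealing
• Create initial solution S ∈ [0, 1]^n
• Initialize temperature T
• repeat
    • for k = 1 to iteration-length do
        • Choose a random bit s_i
        • Set s_i to its mean (expected) value at temperature T: $s_i = \frac{1}{1 + e^{\Delta E_i / T}}$, where $\Delta E_i = E(S \mid s_i = 1) - E(S \mid s_i = 0)$
    • Reduce temperature T
• until convergence criterion met
• Return S
[Figure: the updated s_i as a function of ΔE_i for T1 < T2 < T3. The behavior is deterministic: no random acceptance test is used.]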
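A minimal mean-field annealing sketch, assuming the quadratic energy $E(s) = -\tfrac{1}{2} s^\top W s - \theta^\top s$, for which $\Delta E_i = E(s_i = 1) - E(s_i = 0) = -\mathrm{net}_i$ and the update becomes $s_i = 1/(1 + e^{-\mathrm{net}_i/T})$. The 4-unit weights, the start at $s_i = 0.5$, and the stopping rule ($T \le T_{min}$) are hypothetical choices.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def deterministic_annealing(W, theta, n, T0=5.0, alpha=0.9, sweeps=20, T_min=1e-2, seed=0):
    """Mean-field annealing on s in [0,1]^n for E(s) = -1/2 s^T W s - theta^T s.

    A randomly chosen s_i is set to 1 / (1 + exp(dE_i / T)) with
    dE_i = E(s_i=1) - E(s_i=0) = -net_i; there is no random acceptance test.
    """
    rng = random.Random(seed)
    s = [0.5] * n                       # start with every unit undecided
    T = T0
    while T > T_min:
        for _ in range(sweeps * n):
            i = rng.randrange(n)
            net = sum(W[i][j] * s[j] for j in range(n)) + theta[i]
            s[i] = sigmoid(net / T)     # equals 1 / (1 + exp(dE_i / T))
        T *= alpha
    return s

W = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
theta = [-0.5] * 4
print([round(x, 3) for x in deterministic_annealing(W, theta, 4)])
```

The values stay continuous while T is high and are driven toward 0 or 1 as T falls; the only randomness left is the order in which units are visited.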

Boltzmann Machine

Boltzmann Machines
Boltzmann Machine = Discrete Hopfield NN + Simulated Annealing

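Update Rules (unipolar neurons, $s_i \in \{0, 1\}$, $\mathrm{net}_i = \sum_j w_{ij} s_j + \theta_i$)
• Discrete Hopfield NN (deterministic): $s_i = 1$ if $\mathrm{net}_i > 0$, and $s_i = 0$ if $\mathrm{net}_i < 0$
• Boltzmann Machine (stochastic): $s_i = 1$ with probability $P(s_i = 1) = \frac{1}{1 + e^{-\mathrm{net}_i / T}}$
[Figure: P(s_i = 1) versus net_i for T = 0, 1, 2, 3, ∞. At T = 0 the rule reduces to the deterministic Hopfield step; as T → ∞, P(s_i = 1) → 1/2.]
A cooling schedule is required.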
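The two update rules can be compared on a small network. This sketch assumes $\mathrm{net}_i = \sum_j w_{ij} s_j + \theta_i$, a Hopfield tie rule of "0 when $\mathrm{net}_i \le 0$", a geometric cooling schedule, and hypothetical 4-unit weights whose all-zero state is a local minimum; none of these choices come from the original slides.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def net_input(W, theta, s, i):
    return sum(W[i][j] * s[j] for j in range(len(s))) + theta[i]

def hopfield_update(W, theta, s, i):
    """Discrete Hopfield rule (unipolar): s_i = 1 if net_i > 0 else 0 (tie rule assumed)."""
    return 1 if net_input(W, theta, s, i) > 0 else 0

def boltzmann_update(W, theta, s, i, T, rng):
    """Boltzmann machine rule: s_i = 1 with probability 1 / (1 + exp(-net_i / T))."""
    return 1 if rng.random() < sigmoid(net_input(W, theta, s, i) / T) else 0

def run_boltzmann_machine(W, theta, s, T0=5.0, alpha=0.9, sweeps=20, T_min=1e-2, seed=0):
    """Asynchronous stochastic updates while T is lowered geometrically (assumed schedule)."""
    rng = random.Random(seed)
    s = list(s)
    T = T0
    while T > T_min:
        for _ in range(sweeps * len(s)):
            i = rng.randrange(len(s))
            s[i] = boltzmann_update(W, theta, s, i, T, rng)
        T *= alpha
    return s

W = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
theta = [-0.5] * 4
print(hopfield_update(W, theta, [0, 0, 0, 0], 0))   # 0: Hopfield stays in the local minimum
print(run_boltzmann_machine(W, theta, [0, 0, 0, 0]))
for T in (0.01, 1.0, 100.0):                         # P(s_i = 1) for net_i = 1
    print(T, round(sigmoid(1.0 / T), 3))
```

The last loop shows the limits from the figure: for small T the probability approaches the Hopfield step, and for large T it approaches 1/2.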

Exercises
• Run computer simulations on the same TSP problem demonstrated previously using
    • Simulated Annealing,
    • Deterministic Annealing, and
    • a Boltzmann Machine.
• Analyze and compare your results.
