Stochastic Machines. CS679 Lecture Note by Jin Hyung Kim, Computer Science Department, KAIST
Statistical Machine • Rooted in statistical mechanics • derives the thermodynamic properties of macroscopic bodies from their microscopic elements • probabilistic nature due to the enormous number of degrees of freedom • the concept of entropy plays the central role • Gibbs distribution • Markov chains • Metropolis algorithm • Simulated annealing • Boltzmann machine • a device for modeling the underlying probability distribution of a data set
Statistical Mechanics • In thermal equilibrium, the probability of state i follows the Gibbs (Boltzmann) distribution: p_i = exp(-E_i / (k_B T)) / Z • E_i : energy of state i • T : absolute temperature • k_B : Boltzmann constant • Z : partition function (normalizing sum over all states) • In the NN model, k_B is absorbed into the temperature parameter, giving p_i = exp(-E_i / T) / Z
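As a concrete illustration (not part of the original slide), here is a minimal Python sketch of the Gibbs distribution above, with the Boltzmann constant absorbed into T as in the NN setting; the state energies are made-up values.

```python
import numpy as np

def gibbs_distribution(energies, T):
    """Probability of each state under the Gibbs distribution p_i = exp(-E_i/T) / Z."""
    e = np.asarray(energies, dtype=float)
    # Subtract the minimum energy before exponentiating for numerical stability.
    w = np.exp(-(e - e.min()) / T)
    return w / w.sum()

# Illustrative energies for four states; lower energy -> higher probability.
energies = [0.0, 1.0, 2.0, 3.0]
for T in (0.5, 1.0, 5.0):
    print(T, gibbs_distribution(energies, T))
# As T -> 0 the mass concentrates on the minimum-energy state;
# as T grows the distribution approaches uniform.
```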
Markov Chain • Stochastic process with the Markov property • the state X_{n+1} at time n+1 depends only on the state X_n: P(X_{n+1} = j | X_n = i, X_{n-1}, ...) = P(X_{n+1} = j | X_n = i) = p_ij • Transition probabilities p_ij and stochastic matrix P = [p_ij] (each row sums to 1) • m-step transition probability p_ij^(m): the (i, j) entry of P^m (see the sketch below)
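A minimal sketch (with an assumed, illustrative two-state stochastic matrix) showing how the m-step transition probabilities fall out of the matrix power P^m:

```python
import numpy as np

# Illustrative 2-state stochastic matrix: rows sum to 1, p_ij = P(X_{n+1}=j | X_n=i).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# m-step transition probabilities are the entries of P^m (Chapman-Kolmogorov).
m = 3
P_m = np.linalg.matrix_power(P, m)
print(P_m)              # P_m[i, j] = P(X_{n+m}=j | X_n=i)
print(P_m.sum(axis=1))  # each row still sums to 1
```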
Markov Chain • Recurrent state: P(ever returning to state i) = 1 • Transient state: P(ever returning to state i) < 1 • mean recurrence time of state i: expectation of T_i(k), the time elapsed between the (k-1)th and kth returns to state i • steady-state probability of state i: π_i = 1 / (mean recurrence time of state i) • ergodicity: the long-term proportion of time spent in state i approaches the steady-state probability π_i
Convergence to stationary distribution • State distribution vector π^(n), evolving as π^(n) = π^(0) P^n • for an ergodic Markov chain, starting from an arbitrary initial distribution, the state distribution converges to the stationary distribution • independent of the initial distribution (see the sketch below) • Examples 11.1 and 11.2
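A small sketch of the convergence claim, reusing the illustrative chain above: two different initial distributions are driven to the same stationary distribution.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])      # same illustrative ergodic chain as above

def evolve(pi0, n):
    """Apply pi^(k+1) = pi^(k) P for n steps (row-vector convention)."""
    pi = np.asarray(pi0, dtype=float)
    for _ in range(n):
        pi = pi @ P
    return pi

# Two very different initial distributions converge to the same stationary pi.
print(evolve([1.0, 0.0], 50))
print(evolve([0.0, 1.0], 50))
# For this chain the stationary distribution is pi = (0.8, 0.2).
```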
Principle of Detailed Balance • at thermal equilibrium, the rate of occurrence of any transition equals the rate of occurrence of the inverse transition: π_i p_ij = π_j p_ji • detailed balance implies that the distribution π_i is stationary • detailed balance is a sufficient condition for thermal equilibrium (checked numerically below)
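A quick numerical check of detailed balance for the same illustrative chain (any two-state chain is reversible, so the check passes):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])       # illustrative chain from the sketches above
pi = np.array([0.8, 0.2])        # its stationary distribution

flows = pi[:, None] * P              # flows[i, j] = pi_i * p_ij
print(np.allclose(flows, flows.T))   # True: pi_i p_ij == pi_j p_ji for all i, j
```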
Metropolis Algorithm • a modified Monte Carlo method • suppose our objective is to reach the state minimizing an energy function (a runnable sketch follows this list) • 1. Randomly generate a new state Y from the current state X • 2. If ΔE (the energy difference between Y and X) < 0, then move to Y (set X to Y) and go to 1 • 3. Else • 3.1 draw a random number r uniformly from [0, 1) • 3.2 if r < exp(-ΔE / T), then move to Y (set X to Y) and go to 1 • 3.3 else go to 1
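A runnable sketch of the loop above at a fixed temperature; energy() and propose() are hypothetical stand-ins for a problem-specific energy function and neighbourhood move, instantiated here for a toy quadratic.

```python
import math
import random

def metropolis(x0, energy, propose, T, n_steps, seed=0):
    """Metropolis sampling/minimization at a fixed temperature T.

    energy(x) -> float           (problem-specific, assumed given)
    propose(x, rng) -> new state (random neighbour of x, assumed given)
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    for _ in range(n_steps):
        y = propose(x, rng)
        dE = energy(y) - e
        # Always accept downhill moves; accept uphill moves with prob exp(-dE/T).
        if dE < 0 or rng.random() < math.exp(-dE / T):
            x, e = y, e + dE
    return x, e

# Toy usage: minimize E(x) = x^2 with Gaussian proposal steps.
x_min, e_min = metropolis(
    x0=5.0,
    energy=lambda x: x * x,
    propose=lambda x, rng: x + rng.gauss(0.0, 0.5),
    T=0.1,
    n_steps=5000,
)
print(x_min, e_min)   # typically close to 0
```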
Metropolis Algorithm and Markov Chains • choose the transition probabilities so that the Markov chain converges to the Gibbs distribution π_i = exp(-E_i / T) / Z; then π_j / π_i = exp(-ΔE / T), where ΔE = E_j - E_i • the Metropolis algorithm is equivalent to a random walk on such a stationary Markov chain • it can be shown that this choice satisfies the principle of detailed balance (see the derivation below)
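A standard two-line verification (not from the slide) that the Metropolis acceptance rule with a symmetric proposal q_ij = q_ji satisfies detailed balance with respect to the Gibbs distribution:

```latex
\[
\pi_i = \frac{1}{Z} e^{-E_i/T}, \qquad
p_{ij} = q_{ij}\,\min\!\bigl(1,\, e^{-(E_j - E_i)/T}\bigr), \qquad q_{ij} = q_{ji}
\]
\[
\pi_i p_{ij}
  = \frac{q_{ij}}{Z}\,\min\!\bigl(e^{-E_i/T},\, e^{-E_j/T}\bigr)
  = \pi_j p_{ji}
\]
```

The final expression is symmetric in i and j, so detailed balance holds and the Gibbs distribution is stationary.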
Simulated Annealing • solves combinatorial optimization problems • a variant of the Metropolis algorithm • by S. Kirkpatrick et al. (1983) • finding a minimum-energy solution of a neural network = finding a low-temperature state of a physical system • overcomes the local-minimum problem • key idea: instead of always going downhill, try to go downhill 'most of the time'
Iterative + Statistical • Simple Iterative Algorithm (TSP): 1. find a path p 2. make p’, a variation of p 3. if p’ is better than p, keep p’ as p 4. go to 2 • Metropolis Algorithm: replace step 3 with 3’: if (p’ is better than p) or (random < Prob), then keep p’ as p • a kind of Monte Carlo method • Simulated Annealing: T is reduced as time passes
About T • Metropolis Algorithm: Prob = p(ΔE) = exp(ΔE / T), with ΔE ≤ 0 for a worsening move (the convention of the pseudocode below) • Simulated Annealing: Prob = p_t(ΔE) = exp(ΔE / T_t) • if T_t is reduced too fast, the solution quality is poor • if T_t ≥ T_0 / log(1 + t) (Geman), the system will converge to the minimum-energy configuration • T_t = k / (1 + t) (Szu) • T_t = a · T_{t-1}, where a is between 0.8 and 0.99 (a cooling-schedule comparison follows this list)
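A small sketch comparing the three cooling schedules above; the constants T_0, k, and a are illustrative.

```python
import math

T0, k, a = 10.0, 10.0, 0.9    # illustrative constants

def T_log(t):    # Geman: T_t = T_0 / log(1 + t); slow, but guarantees convergence
    return T0 / math.log(1 + t)

def T_szu(t):    # Szu: T_t = k / (1 + t)
    return k / (1 + t)

def T_geom(t):   # geometric schedule T_t = a * T_{t-1} = a**t * T_0
    return T0 * a ** t

for t in (1, 10, 100, 1000):
    print(t, round(T_log(t), 3), round(T_szu(t), 3), round(T_geom(t), 6))
# The logarithmic schedule cools far more slowly than the other two, which is
# why faster schedules are preferred in practice despite the weaker guarantee.
```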
Function SIMULATED-ANNEALING returns a solution state
  current ← a node (initial state)
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← value[next] - value[current]
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(ΔE/T)
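A runnable Python rendering of the pseudocode above; value, successors, and schedule are hypothetical problem-specific callables, instantiated here for a toy one-dimensional maximization.

```python
import math
import random

def simulated_annealing(start, value, successors, schedule, seed=0):
    """Mirror of the pseudocode above: maximize value(), cooling per schedule[t]."""
    rng = random.Random(seed)
    current = start
    t = 1
    while True:
        T = schedule(t)
        if T <= 0:
            return current
        nxt = rng.choice(successors(current))
        dE = value(nxt) - value(current)
        if dE > 0:
            current = nxt                        # always take improving moves
        elif rng.random() < math.exp(dE / T):    # dE <= 0: accept with prob e^(dE/T)
            current = nxt
        t += 1

# Toy usage: maximize value(x) = -(x - 3)^2 over the integers via +/-1 moves.
best = simulated_annealing(
    start=-20,
    value=lambda x: -(x - 3) ** 2,
    successors=lambda x: [x - 1, x + 1],
    schedule=lambda t: 5.0 * 0.99 ** t - 0.001,  # eventually <= 0, ending the loop
)
print(best)   # typically close to 3
```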
Gibbs Sampling • generates a Markov chain with the Gibbs distribution as its equilibrium distribution • gives a numerical estimate of the marginal density of a random variable X_k • uses knowledge of the conditional distribution of X_k given all the other components • x_1(1) is drawn from the distribution of X_1, given x_2(0), x_3(0), ..., x_K(0) • x_2(1) is drawn from the distribution of X_2, given x_1(1), x_3(0), ..., x_K(0) • ... • x_k(1) is drawn from the distribution of X_k, given x_1(1), ..., x_{k-1}(1), x_{k+1}(0), ..., x_K(0) • ... • x_K(1) is drawn from the distribution of X_K, given x_1(1), x_2(1), ..., x_{K-1}(1) • converges to the true marginal probability distributions (see the sketch below)
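A minimal sketch of the component-wise sampling loop above for a bivariate Gaussian target with correlation rho (an assumed example, chosen because each full conditional is available in closed form):

```python
import random

def gibbs_bivariate_gaussian(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate Gaussian with correlation rho.

    Each full conditional is Gaussian: X1 | X2 = x2 ~ N(rho*x2, 1 - rho^2),
    and symmetrically for X2 | X1, so the components are resampled in turn.
    """
    rng = random.Random(seed)
    sigma = (1.0 - rho * rho) ** 0.5
    x1, x2 = 0.0, 0.0
    samples = []
    for i in range(n_samples + burn_in):
        x1 = rng.gauss(rho * x2, sigma)   # draw x1 given the current x2
        x2 = rng.gauss(rho * x1, sigma)   # draw x2 given the freshly drawn x1
        if i >= burn_in:
            samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20000)
# The empirical correlation of the samples should approach the target rho = 0.8.
n = len(samples)
m1 = sum(x for x, _ in samples) / n
m2 = sum(y for _, y in samples) / n
cov = sum((x - m1) * (y - m2) for x, y in samples) / n
v1 = sum((x - m1) ** 2 for x, _ in samples) / n
v2 = sum((y - m2) ** 2 for _, y in samples) / n
print(cov / (v1 * v2) ** 0.5)
```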
Gibbs Sampling • Convergence theorem: as the number of iterations goes to infinity, the samples x_k(n) generated by the Gibbs sampler converge in distribution to the true marginal distributions of the X_k, regardless of the starting values