
Stochastic Machines CS679 Lecture Note by Jin Hyung Kim Computer Science Department KAIST


Presentation Transcript


  1. Stochastic Machines CS679 Lecture Note by Jin Hyung Kim Computer Science Department KAIST

  2. Statistical Machine • Rooted in statistical mechanics • derives thermodynamic properties of macroscopic bodies from their microscopic elements • probabilistic in nature because of the enormous number of degrees of freedom • the concept of entropy plays the central role • Gibbs distribution • Markov chain • Metropolis algorithm • Simulated annealing • Boltzmann machine • a device for modeling the underlying probability distribution of a data set

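  3. Statistical Mechanics • In thermal equilibrium, the probability of state i is $p_i = \frac{1}{Z} \exp(-E_i / (k_B T))$, with partition function $Z = \sum_i \exp(-E_i / (k_B T))$ • $E_i$: energy of state i • $T$: absolute temperature • $k_B$: Boltzmann constant • In the NN model, $k_B$ is absorbed into the temperature: $p_i = \frac{1}{Z} \exp(-E_i / T)$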
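A minimal sketch of the Gibbs distribution in its neural-network form, $p_i = \exp(-E_i / T) / Z$, may help make the role of temperature concrete. The function name `gibbs_distribution` and the three energy values are assumptions chosen for illustration; they do not come from the lecture.

```python
import math

def gibbs_distribution(energies, temperature):
    """Return p_i = exp(-E_i / T) / Z for a list of state energies."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift by the minimum energy before exponentiating for numerical
    # stability; the shift cancels in the ratio and leaves p_i unchanged.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(weights)  # partition function, up to the constant shift
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0]  # hypothetical state energies
hot = gibbs_distribution(energies, temperature=10.0)
cold = gibbs_distribution(energies, temperature=0.1)
```

At the high temperature the three states are close to equally likely, while at the low temperature almost all probability mass sits on the lowest-energy state.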

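  4. Markov Chain • Stochastic process with the Markov property • state $X_{n+1}$ at time n+1 depends only on state $X_n$ • Transition probability $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ & stochastic matrix $P = [p_{ij}]$, with $p_{ij} \ge 0$ and $\sum_j p_{ij} = 1$ • m-step transition probability $p_{ij}^{(m)} = P(X_{n+m} = j \mid X_n = i)$, the (i, j) entry of $P^m$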
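The m-step transition probabilities are the entries of the m-th power of the stochastic matrix. The sketch below computes them with plain nested lists; the two-state matrix `P` and the helper names `mat_mul` and `m_step` are hypothetical and used only for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def m_step(p, m):
    """Return P^m; entry (i, j) is the m-step transition probability."""
    n = len(p)
    # Start from the identity matrix, i.e. the 0-step transition matrix.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, p)
    return result

# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
P2 = m_step(P, 2)
```

Each row of `P2` still sums to 1, since a product of stochastic matrices is again a stochastic matrix.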

  5. Markov Chain • Recurrent state • P(ever returning to state i) = 1 • Transient state • P(ever returning to state i) < 1 • mean recurrence time of state i: the expectation of $T_i(k)$, the time elapsed between the (k-1)th and kth returns • steady-state probability of state i, $\pi_i$ • $\pi_i$ = 1/(mean recurrence time) • ergodicity • the long-term proportion of time spent in state i approaches the steady-state probability

  6. Convergence to stationary distribution • State distribution vector • starting from an arbitrary initial distribution, the state distribution of an ergodic Markov chain converges to the stationary distribution • the limit is independent of the initial distribution • examples 11.1 and 11.2
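Convergence can be checked numerically by propagating the state distribution vector one step at a time. The sketch below does so for a hypothetical two-state ergodic chain (not one of the textbook examples cited on the slide) from two different starting points; the names `step` and `evolve` are illustrative.

```python
def step(dist, p):
    """One update of the state distribution: new_j = sum_i dist_i * p_ij."""
    n = len(dist)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]

def evolve(dist, p, n_steps):
    """Apply n_steps updates to an initial distribution."""
    for _ in range(n_steps):
        dist = step(dist, p)
    return dist

# Hypothetical ergodic chain; its stationary distribution is (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
from_state_0 = evolve([1.0, 0.0], P, 50)
from_state_1 = evolve([0.0, 1.0], P, 50)
```

Both runs end at the same vector to within floating-point error, which illustrates that the limit does not depend on the initial distribution.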

  7. Principle of Detailed Balance • At thermal equilibrium, the rate of occurrence of any transition equals the rate of occurrence of the inverse transition: $\pi_i p_{ij} = \pi_j p_{ji}$ • Detailed balance implies that the distribution $\{\pi_i\}$ is stationary • Detailed balance is a sufficient condition for thermal equilibrium

  8. Metropolis algorithm • Modified Monte Carlo method • Suppose our objective is to reach the state that minimizes the energy function • 1. Randomly generate a new state, Y, from state X • 2. If $\Delta E$ (energy difference between Y and X) < 0, then move to Y (set X to Y) and goto 1 • 3. Else • 3.1 select a random number $\xi$, uniformly distributed in [0, 1] • 3.2 if $\xi < \exp(-\Delta E / T)$, then move to Y (set X to Y) and goto 1 • 3.3 else goto 1
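The acceptance rule in these steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's own code: the quadratic energy function, the symmetric ±1 proposal on the integers, and the names `metropolis`, `energy`, and `propose` are all hypothetical.

```python
import math
import random
from collections import Counter

def metropolis(energy, propose, x0, temperature, n_steps, rng):
    """Run the Metropolis rule at a fixed temperature; return visited states."""
    x = x0
    samples = []
    for _ in range(n_steps):
        y = propose(x, rng)              # step 1: generate a new state Y
        delta_e = energy(y) - energy(x)  # energy difference between Y and X
        # Step 2: always accept a downhill move.
        # Step 3: accept an uphill move with probability exp(-delta_e / T).
        if delta_e < 0 or rng.random() < math.exp(-delta_e / temperature):
            x = y
        samples.append(x)
    return samples

def energy(x):
    return (x - 3) ** 2  # hypothetical energy with its minimum at x = 3

def propose(x, rng):
    return x + rng.choice((-1, 1))  # symmetric proposal (assumption)

samples = metropolis(energy, propose, x0=0, temperature=0.5,
                     n_steps=20000, rng=random.Random(0))
most_visited = Counter(samples).most_common(1)[0][0]
```

Because the temperature is fixed, the chain keeps fluctuating around the minimum rather than settling on it; the most frequently visited state is the minimum-energy one.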

  9. Metropolis algorithm and Markov Chain • choose the transition probabilities so that the Markov chain converges to a Gibbs distribution, $\pi_j = \frac{1}{Z} \exp(-E_j / T)$; then $\pi_j / \pi_i = \exp(-\Delta E / T)$, where $\Delta E = E_j - E_i$ • The Metropolis algorithm is equivalent to a random step in a stationary Markov chain • it can be shown that this choice satisfies the principle of detailed balance
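The claim that the Metropolis choice satisfies detailed balance with respect to the Gibbs distribution can be verified numerically on a small state space. The sketch below assumes a uniform, symmetric proposal over the other states and three hypothetical energies; `gibbs` and `metropolis_matrix` are illustrative names.

```python
import math

def gibbs(energies, temperature):
    """Gibbs distribution pi_i = exp(-E_i / T) / Z."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def metropolis_matrix(energies, temperature):
    """Transition matrix of a Metropolis chain with a uniform proposal."""
    n = len(energies)
    tau = 1.0 / (n - 1)  # symmetric proposal probability (assumption)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                delta_e = energies[j] - energies[i]
                p[i][j] = tau * min(1.0, math.exp(-delta_e / temperature))
        p[i][i] = 1.0 - sum(p[i])  # rejected proposals stay in state i
    return p

energies = [0.0, 1.0, 3.0]  # hypothetical energies
T = 1.0
pi = gibbs(energies, T)
P = metropolis_matrix(energies, T)
n = len(energies)

# Largest violation of pi_i * p_ij = pi_j * p_ji over all pairs.
balance_gap = max(abs(pi[i] * P[i][j] - pi[j] * P[j][i])
                  for i in range(n) for j in range(n))
# Largest change in pi after one step of the chain.
pi_next = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
stationarity_gap = max(abs(a - b) for a, b in zip(pi, pi_next))
```

Both gaps are zero up to rounding, which shows detailed balance holding and the Gibbs distribution being stationary for this chain.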

  10. Simulated Annealing • Solves combinatorial optimization problems • variant of the Metropolis algorithm • by S. Kirkpatrick (1983) • finding the minimum-energy solution of a neural network = finding the low-temperature state of a physical system • Overcomes the local-minimum problem • Key idea • Instead of always going downhill, try to go downhill ‘most of the time’

  11. Iterative + Statistical • Simple Iterative Algorithm (TSP) • 1. find a path p • 2. make p’, a variation of p • 3. if p’ is better than p, keep p’ as p • 4. goto 2 • Metropolis Algorithm • 3’: if (p’ is better than p) or (random < Prob), then keep p’ as p • a kind of Monte Carlo method • Simulated Annealing • T is reduced as time passes

  12. About T • Here $\Delta E > 0$ is the increase in cost when moving to a worse solution • Metropolis Algorithm • Prob = $p(\Delta E) = \exp(-\Delta E / T)$ • Simulated Annealing • Prob = $p_i(\Delta E) = \exp(-\Delta E / T_i)$ • if $T_i$ is reduced too fast, the solution quality is poor • if $T_t \ge T_0 / \log(1+t)$ (Geman), the system converges to the minimum-energy configuration • $T_t = k / (1+t)$ (Szu) • $T_t = a T_{t-1}$, where a is between 0.8 and 0.99
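The three cooling schedules can be compared side by side with a short sketch. The function names and the starting values used in the comparison are assumptions for illustration; the logarithmic form is defined for t ≥ 1, because log(1 + t) is zero at t = 0.

```python
import math

def logarithmic_schedule(t0, t):
    """Geman-style bound: T_t = T_0 / log(1 + t), for t >= 1."""
    return t0 / math.log(1 + t)

def fast_schedule(k, t):
    """Szu-style schedule: T_t = k / (1 + t)."""
    return k / (1 + t)

def geometric_schedule(t0, a, t):
    """T_t = a * T_(t-1), i.e. T_t = T_0 * a**t, with a in [0.8, 0.99]."""
    return t0 * a ** t

# Temperatures after 100 steps, all starting from a hypothetical value of 10.
slow = logarithmic_schedule(10.0, 100)
medium = fast_schedule(10.0, 100)
quick = geometric_schedule(10.0, 0.9, 100)
```

After 100 steps the logarithmic schedule is still well above the other two, which is why it is rarely used in practice despite its convergence guarantee, and the geometric schedule is the common compromise.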

  13. Function Simulated Annealing
      current ← select a node (initialize)
      for t ← 1 to ∞ do
          T ← schedule[t]
          if T = 0 then return current
          next ← a randomly selected successor of current
          ΔE ← value[next] − value[current]
          if ΔE > 0 then current ← next
          else current ← next only with probability e^(ΔE/T)
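The pseudocode on this slide translates almost line for line into Python. In this form `value` is maximized, so a move with ΔE > 0 is an improvement. The toy objective, the integer neighborhood, and the geometric schedule with a cutoff to zero are assumptions made only so that the sketch runs end to end.

```python
import itertools
import math
import random

def simulated_annealing(start, successors, value, schedule, rng):
    """Maximize `value`, following the slide's pseudocode."""
    current = start
    for t in itertools.count(1):          # for t <- 1 to infinity
        temperature = schedule(t)
        if temperature == 0:
            return current
        nxt = rng.choice(successors(current))
        delta_e = value(nxt) - value(current)
        # Accept improvements always; accept worse moves with
        # probability exp(delta_e / T), where delta_e <= 0.
        if delta_e > 0 or rng.random() < math.exp(delta_e / temperature):
            current = nxt

def value(x):
    return -(x - 7) ** 2  # hypothetical objective, maximum at x = 7

def successors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y <= 20]

def schedule(t):
    # Geometric cooling, cut off to 0 so that the loop terminates.
    temperature = 10.0 * 0.95 ** t
    return temperature if temperature > 1e-3 else 0.0

result = simulated_annealing(0, successors, value, schedule, random.Random(0))
```

Early on, the high temperature lets the search wander across the whole range; as the temperature falls, the acceptance rule approaches plain hill climbing and the search settles near the maximum.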

  14. Gibbs Sampling • Generates a Markov chain with the Gibbs distribution as its equilibrium distribution • Numerical estimate of the marginal density of RV $X_k$ • uses knowledge of the conditional distribution of $X_k$ given all the other components • $x_1(1)$ is drawn from the distribution of $X_1$, given $x_2(0), x_3(0), \ldots, x_K(0)$ • $x_2(1)$ is drawn from the distribution of $X_2$, given $x_1(1), x_3(0), \ldots, x_K(0)$ • ... • $x_k(1)$ is drawn from the distribution of $X_k$, given $x_1(1), \ldots, x_{k-1}(1), x_{k+1}(0), \ldots, x_K(0)$ • ... • $x_K(1)$ is drawn from the distribution of $X_K$, given $x_1(1), x_2(1), \ldots, x_{K-1}(1)$ • Converges to the true marginal probability distribution
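A small sketch of this sweep for binary random variables may help. Each component is redrawn in turn from its conditional distribution given the current values of all the others, and the marginal is then estimated from the samples. The joint table `JOINT`, the burn-in length, and the function names are hypothetical choices for illustration.

```python
import random

# Hypothetical joint distribution over two binary variables (X1, X2).
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def conditional_p1(joint, state, k):
    """P(X_k = 1 | all other components fixed at their values in state)."""
    s1 = list(state)
    s1[k] = 1
    s0 = list(state)
    s0[k] = 0
    p1 = joint[tuple(s1)]
    p0 = joint[tuple(s0)]
    return p1 / (p0 + p1)

def gibbs_sample(joint, n_sweeps, rng, burn_in=100):
    """Return one sample per sweep after discarding burn_in sweeps."""
    k_total = len(next(iter(joint)))
    state = [0] * k_total  # arbitrary initial values x_k(0)
    samples = []
    for sweep in range(n_sweeps + burn_in):
        # One sweep: update x_1, ..., x_K in order, always conditioning
        # on the most recent values of the other components.
        for k in range(k_total):
            p1 = conditional_p1(joint, state, k)
            state[k] = 1 if rng.random() < p1 else 0
        if sweep >= burn_in:
            samples.append(tuple(state))
    return samples

samples = gibbs_sample(JOINT, n_sweeps=20000, rng=random.Random(0))
estimate_x1 = sum(s[0] for s in samples) / len(samples)
true_x1 = JOINT[(1, 0)] + JOINT[(1, 1)]  # exact marginal P(X1 = 1) = 0.5
```

The sampler only ever uses the conditionals, yet the empirical frequency of X1 = 1 comes out close to the exact marginal computed directly from the table.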

  15. Gibbs Sampling • Convergence theorem
