
Inference V: MCMC Methods




  1. Inference V: MCMC Methods

  2. Stochastic Sampling • In the previous class, we examined methods that use independent samples to estimate P(X = x | e) • Problem: it is difficult to sample directly from P(X1, …, Xn | e) • We had to use likelihood weighting to reweight our samples • This introduced bias into the estimates • In some cases, such as when the evidence is on the leaves, these methods are inefficient

  3. MCMC Methods • We are going to discuss sampling methods that are based on Markov chains • Markov Chain Monte Carlo (MCMC) methods • Key ideas: • The sampling process is a Markov chain: the next sample depends on the previous one • These methods can approximate any posterior distribution • We start by reviewing key ideas from the theory of Markov chains

  4. Markov Chains • Suppose X1, X2, … take values in some set; w.l.o.g. these values are 1, 2, … • A Markov chain is a process that corresponds to the network X1 → X2 → X3 → … → Xn • To quantify the chain, we need to specify • Initial probability: P(X1) • Transition probability: P(Xt+1 | Xt) • A Markov chain has a stationary transition probability: P(Xt+1 | Xt) is the same for all times t
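A minimal sketch (not from the slides, hypothetical numbers) of how such a chain can be quantified in Python: an initial probability over the states and one transition table that is reused at every time step.

# Hypothetical 2-state chain, states numbered 0 and 1
P_init  = [0.5, 0.5]            # P(X1 = 0), P(X1 = 1)
P_trans = [[0.9, 0.1],          # row i gives P(X_{t+1} = j | X_t = i)
           [0.2, 0.8]]          # the same table is used for every t (stationary transition probability)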

  5. Irreducible Chains • A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0 • That is, there is a positive probability of reaching j from i after some number of steps • A chain is irreducible if every state is accessible from every state

  6. Ergodic Chains • A state i is positively recurrent if the expected time to return to state i, after being in state i, is finite • If X has a finite number of states, then it suffices that i is accessible from itself • A chain is ergodic if it is irreducible and every state is positively recurrent

  7. (A)periodic Chains • A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 when n is not divisible by d • A chain is aperiodic if it contains no periodic state

  8. Stationary Probabilities • Thm: If a chain is ergodic and aperiodic, then the limit lim_{n→∞} P(Xn = j | X1 = i) exists, and does not depend on i • Moreover, let P*(X = j) = lim_{n→∞} P(Xn = j | X1 = i); then P*(X) is the unique probability satisfying P*(X = j) = Σ_i P*(X = i) P(Xt+1 = j | Xt = i)
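A small numeric sketch of the theorem, reusing the hypothetical 2-state chain from above: iterating the transition probability from two different starting points converges to the same P*, here approximately (2/3, 1/3).

import numpy as np

# Hypothetical 2-state chain; T[i][j] = P(X_{t+1} = j | X_t = i)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

for start in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    p = start
    for _ in range(200):        # P(X_{t+1} = j) = sum_i P(X_t = i) P(X_{t+1} = j | X_t = i)
        p = p @ T
    print(start, "->", p)       # both starting points give approximately [0.667, 0.333]

# P* is the unique distribution satisfying P* = P* @ T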

  9. Stationary Probabilities • The probability P*(X) is the stationary probability of the process • Regardless of the starting point, the process will converge to this probability • The rate of convergence depends on properties of the transition probability

  10. Sampling from the Stationary Probability • This theory suggests how to sample from the stationary probability: • Set X1 = i, for some random/arbitrary i • For t = 1, 2, …, n-1: sample a value xt+1 for Xt+1 from P(Xt+1 | Xt = xt) • Return xn • If n is large enough, then this is (approximately) a sample from P*(X)
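A direct sketch of this procedure in Python, again on the hypothetical 2-state chain; the frequency check at the end is only a sanity test, not part of the slide.

import random

P_trans = [[0.9, 0.1],          # P(X_{t+1} | X_t = 0)
           [0.2, 0.8]]          # P(X_{t+1} | X_t = 1)

def sample_next(state):
    # Sample X_{t+1} from P(X_{t+1} | X_t = state)
    return 0 if random.random() < P_trans[state][0] else 1

def sample_stationary(n):
    # Set X1 arbitrarily, simulate the chain, and return x_n
    x = random.choice([0, 1])
    for _ in range(n - 1):
        x = sample_next(x)
    return x

# If n is large enough, each returned value is (approximately) a draw from P*
draws = [sample_stationary(200) for _ in range(5000)]
print(draws.count(0) / len(draws))    # close to 2/3 for this chain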

  11. Designing Markov Chains • How do we construct the right chain to sample from? • Ensuring aperiodicity and irreducibility is usually easy • The problem is ensuring the chain has the desired stationary probability

  12. Designing Markov Chains • Key tool: If the transition probability satisfies the detailed balance condition Q(x) P(Xt+1 = y | Xt = x) = Q(y) P(Xt+1 = x | Xt = y) for all states x, y, then P*(X) = Q(X) • This gives a local criterion for checking that the chain will have the right stationary distribution
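A sketch of applying this criterion locally: the reconstructed condition Q(x) P(y | x) = Q(y) P(x | y) is checked for every pair of states, using the hypothetical chain and its Q from above.

import itertools

def satisfies_detailed_balance(Q, T, tol=1e-9):
    # Check Q(x) * P(y | x) == Q(y) * P(x | y) for every pair of states x, y
    states = range(len(Q))
    return all(abs(Q[x] * T[x][y] - Q[y] * T[y][x]) < tol
               for x, y in itertools.product(states, states))

T = [[0.9, 0.1],
     [0.2, 0.8]]
Q = [2/3, 1/3]
print(satisfies_detailed_balance(Q, T))   # True, so P*(X) = Q(X) for this chain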

  13. MCMC Methods • We can use these results to sample from P(X1,…,Xn|e) • Idea: Construct an ergodic and aperiodic Markov chain such that P*(X1,…,Xn) = P(X1,…,Xn|e) • Simulate the chain for n steps to get a sample

  14. MCMC Methods • Notes: • The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence • For simplicity, we will denote such a state using the vector of variables

  15. Gibbs Sampler • One of the simplest MCMC methods • At each transition, change the state of just one Xi • We can describe the transition probability as a stochastic procedure: • Input: a state x1,…,xn • Choose i at random (using a uniform probability) • Sample x’i from P(Xi | x1, …, xi-1, xi+1, …, xn, e) • Let x’j = xj for all j ≠ i • Return x’1,…,x’n
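A sketch of this transition as Python code; sample_conditional is a hypothetical helper standing in for drawing from P(Xi | x1, …, xi-1, xi+1, …, xn, e), which the following slides show how to compute in a BN.

import random

def gibbs_transition(x, sample_conditional):
    # One Gibbs step on a state x = (x_1, ..., x_n) consistent with the evidence.
    # sample_conditional(i, x) is a hypothetical helper that draws x'_i from
    # P(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e).
    i = random.randrange(len(x))           # choose i at random (uniformly)
    x_new = list(x)                        # x'_j = x_j for all j != i
    x_new[i] = sample_conditional(i, x)    # sample x'_i from the conditional
    return x_new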

  16. Correctness of Gibbs Sampler • By the chain rule, P(x1, …, xi-1, xi, xi+1, …, xn | e) = P(x1, …, xi-1, xi+1, …, xn | e) P(xi | x1, …, xi-1, xi+1, …, xn, e) • Thus, we get P(x1,…,xi,…,xn | e) P(x’i | x1,…,xi-1,xi+1,…,xn, e) = P(x1,…,x’i,…,xn | e) P(xi | x1,…,xi-1,xi+1,…,xn, e) • Since we choose i from the same (uniform) distribution at each step, this procedure satisfies the detailed balance criterion of slide 12 with Q(X1,…,Xn) = P(X1,…,Xn | e)

  17. Gibbs Sampling for Bayesian Networks • Why is the Gibbs sampler “easy” in BNs? • Recall that the Markov blanket Mbi of a variable Xi separates it from the other variables in the network • P(Xi | X1,…,Xi-1,Xi+1,…,Xn) = P(Xi | Mbi) • This property allows us to use local computations to perform the sampling at each transition

  18. Gibbs Sampling in Bayesian Networks • How do we evaluate P(Xi | x1,…,xi-1,xi+1,…,xn)? • Let Y1, …, Yk be the children of Xi • By definition of Mbi, the parents of each Yj are in Mbi ∪ {Xi} • It is easy to show that P(xi | x1,…,xi-1,xi+1,…,xn) ∝ P(xi | pa(Xi)) ∏j P(yj | pa(Yj)), where every factor depends only on xi and the values of the Markov blanket
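A sketch of this local computation for a discrete BN; the data structures (parents, children, cpt) are hypothetical stand-ins for whatever network representation is used, and the function simply evaluates the product above for every value of Xi and normalizes.

def conditional_given_rest(i, x, values, parents, children, cpt):
    # x: dict mapping each variable to its current value
    # values[i]: list of possible values of X_i
    # parents[v]: tuple of parents of variable v
    # children[i]: list of children Y_1, ..., Y_k of X_i
    # cpt[v]: function (value_of_v, tuple_of_parent_values) -> probability
    weights = []
    for v in values[i]:
        assignment = dict(x)
        assignment[i] = v
        # P(x_i | pa(X_i)) * product over children Y_j of P(y_j | pa(Y_j))
        w = cpt[i](v, tuple(assignment[p] for p in parents[i]))
        for y in children[i]:
            w *= cpt[y](assignment[y], tuple(assignment[p] for p in parents[y]))
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]    # distribution over the values of X_i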

  19. Sampling Strategy • How do we collect the samples? • Strategy I: Run the chain M times, each run for N steps • Each run starts from a different starting point • Return the last state in each run (giving one sample from each of the M chains)

  20. Sampling Strategy • Strategy II: Run one chain for a long time • After some “burn-in” period, sample points every fixed number of steps, giving M samples from one chain

  21. Comparing Strategies • Strategy I: Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity • But we have to perform the “burn-in” steps for each chain • Strategy II: Perform “burn-in” only once • But samples might be correlated (although only weakly) • Hybrid strategy: Run several chains, and take a few samples from each • Combines the benefits of both strategies
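A sketch of the two collection strategies (and the hybrid) as Python code; transition stands for one step of the chain (e.g. the gibbs_transition sketch above), and the burn-in and spacing constants are arbitrary assumptions.

def run_chain(x0, transition, n_steps):
    # Simulate the chain for n_steps transitions and return the whole trajectory
    x, path = x0, []
    for _ in range(n_steps):
        x = transition(x)
        path.append(x)
    return path

def strategy_one(random_start, transition, M, N):
    # Strategy I: M independent runs of N steps; keep only the last state of each
    return [run_chain(random_start(), transition, N)[-1] for _ in range(M)]

def strategy_two(random_start, transition, M, burn_in=1000, spacing=10):
    # Strategy II: one long run; discard the burn-in, then keep every spacing-th state
    path = run_chain(random_start(), transition, burn_in + M * spacing)
    return path[burn_in::spacing][:M]

def hybrid(random_start, transition, n_chains, samples_per_chain, burn_in=1000, spacing=10):
    # Hybrid: several chains, a few (possibly correlated) samples from each
    out = []
    for _ in range(n_chains):
        out += strategy_two(random_start, transition, samples_per_chain, burn_in, spacing)
    return out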
