Inference V: MCMC Methods
Stochastic Sampling
• In the previous class, we examined methods that use independent samples to estimate P(X = x | e)
• Problem: it is difficult to sample from P(X1, …, Xn | e)
• We had to use likelihood weighting to reweight our samples
• This introduced bias into the estimates
• In some cases, such as when the evidence is on the leaves, these methods are inefficient
MCMC Methods
• We now discuss sampling methods that are based on Markov chains: Markov Chain Monte Carlo (MCMC) methods
• Key ideas:
  • The sampling process is a Markov chain: the next sample depends on the previous one
  • These methods can approximate any posterior distribution
• We start by reviewing key ideas from the theory of Markov chains
Markov Chains
• Suppose X1, X2, … take values in some set; w.l.o.g. these values are 1, 2, …
• A Markov chain is a process that corresponds to the network X1 → X2 → X3 → … → Xn → …
• To quantify the chain, we need to specify:
  • The initial probability P(X1)
  • The transition probability P(Xt+1 | Xt)
• A Markov chain has stationary transition probabilities: P(Xt+1 | Xt) is the same for all times t
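As a concrete illustration, here is a minimal Python sketch of simulating such a chain from its initial and transition probabilities. The three-state chain and its numbers are assumptions made for the example, not something from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative chain over states {0, 1, 2}
p_init = np.array([0.5, 0.3, 0.2])      # initial probability P(X1)
P = np.array([[0.1, 0.6, 0.3],          # row i = P(X_{t+1} | X_t = i),
              [0.4, 0.4, 0.2],          # the same matrix for every t
              [0.5, 0.2, 0.3]])         # (stationary transition probability)

def simulate(n):
    """Draw a trajectory x1, ..., xn from the chain."""
    x = rng.choice(3, p=p_init)
    path = [x]
    for _ in range(n - 1):
        x = rng.choice(3, p=P[x])       # next state depends only on the current one
        path.append(x)
    return path

print(simulate(10))
```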
Irreducible Chains
• A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0
  • That is, there is a positive probability of reaching j from i after some number of steps
• A chain is irreducible if every state is accessible from every state
Ergodic Chains
• A state i is positive recurrent if the expected time to return to i after visiting i is finite
• If X has a finite number of states, it suffices that i is accessible from itself
• A chain is ergodic if it is irreducible and every state is positive recurrent
(A)periodic Chains
• A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 when n is not divisible by d
  • For example, a state that the chain can only revisit at even times has period 2
• A chain is aperiodic if it contains no periodic state
Stationary Probabilities
Thm:
• If a chain is ergodic and aperiodic, then the limit lim n→∞ P(Xn = j | X1 = i) exists, and does not depend on i
• Moreover, let P*(X = j) = lim n→∞ P(Xn = j | X1 = i); then P*(X) is the unique probability satisfying P*(X = j) = Σi P*(X = i) P(Xt+1 = j | Xt = i)
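A quick numerical check of the theorem, reusing the illustrative transition matrix from the sketch above (that chain is irreducible and aperiodic, so the theorem applies):

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.5, 0.2, 0.3]])

# Row i of P^n is the distribution P(Xn | X1 = i); as n grows the rows
# become identical, i.e. the limit does not depend on the start state i.
Pn = np.linalg.matrix_power(P, 50)
print(Pn)

# P* is the unique fixed point: P*(j) = sum_i P*(i) P(X_{t+1} = j | X_t = i)
p_star = Pn[0]
print(np.allclose(p_star, p_star @ P))  # True
```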
Stationary Probabilities
• The probability P*(X) is the stationary probability of the process
• Regardless of the starting point, the process converges to this probability
• The rate of convergence depends on properties of the transition probability
Sampling from the Stationary Probability
• This theory suggests how to sample from the stationary probability:
  • Set X1 = i, for some random/arbitrary i
  • For t = 1, 2, …, n − 1: sample a value xt+1 for Xt+1 from P(Xt+1 | Xt = xt)
  • Return xn
• If n is large enough, then this is (approximately) a sample from P*(X)
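A sketch of this procedure in Python, again with the illustrative matrix from above. Running it many times and histogramming the returned states should approximate the stationary probability:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.5, 0.2, 0.3]])

def mcmc_sample(P, n, x1=0):
    x = x1                                  # arbitrary starting state
    for _ in range(n - 1):
        x = rng.choice(len(P), p=P[x])      # sample x_{t+1} from P(X_{t+1} | X_t = x_t)
    return x                                # x_n is (approximately) a draw from P*

samples = [mcmc_sample(P, n=100) for _ in range(5000)]
print(np.bincount(samples) / len(samples))  # close to the stationary probability
```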
Designing Markov Chains
• How do we construct the right chain to sample from?
• Ensuring aperiodicity and irreducibility is usually easy
• The problem is ensuring the chain has the desired stationary probability
Designing Markov Chains
Key tool:
• If the transition probability satisfies the detailed balance condition Q(x) P(Xt+1 = y | Xt = x) = Q(y) P(Xt+1 = x | Xt = y) for all x, y, then P*(X) = Q(X)
• This gives a local criterion for checking that the chain will have the right stationary distribution
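To make the criterion concrete, here is a sketch that builds a chain with a given target Q and verifies the condition numerically. The construction (a Metropolis-style acceptance rule) is an illustrative choice on my part, not something from the slides:

```python
import numpy as np

Q = np.array([0.2, 0.5, 0.3])            # desired stationary probability

# Propose one of the two other states uniformly; accept with probability
# min(1, Q(y)/Q(x)); put the rejected mass on the self-loop.
P = np.zeros((3, 3))
for x in range(3):
    for y in range(3):
        if x != y:
            P[x, y] = 0.5 * min(1.0, Q[y] / Q[x])
    P[x, x] = 1.0 - P[x].sum()

B = Q[:, None] * P                       # B[x, y] = Q(x) P(y | x)
print(np.allclose(B, B.T))               # detailed balance holds: True
print(np.allclose(Q @ P, Q))             # hence Q is stationary: True
```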
MCMC Methods
• We can use these results to sample from P(X1, …, Xn | e)
Idea:
• Construct an ergodic & aperiodic Markov chain such that P*(X1, …, Xn) = P(X1, …, Xn | e)
• Simulate the chain for n steps to get a sample
MCMC Methods
Notes:
• The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence
• For simplicity, we will denote such a state using the vector of variables
Gibbs Sampler
• One of the simplest MCMC methods
• At each transition, change the state of just one Xi
• We can describe the transition probability as a stochastic procedure:
  • Input: a state x1, …, xn
  • Choose i at random (using a uniform probability)
  • Sample x'i from P(Xi | x1, …, xi-1, xi+1, …, xn, e)
  • Let x'j = xj for all j ≠ i
  • Return x'1, …, x'n
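A minimal sketch of this transition in code. Here `sample_conditional` is a hypothetical callback that draws a value for Xi from P(Xi | all other variables, e); how to compute that distribution in a Bayesian network is shown below:

```python
import random

def gibbs_step(x, sample_conditional):
    """One Gibbs transition: resample a single, uniformly chosen coordinate."""
    x = list(x)
    i = random.randrange(len(x))        # choose i uniformly at random
    x[i] = sample_conditional(i, x)     # x'_i ~ P(X_i | x_1..x_{i-1}, x_{i+1}..x_n, e)
    return tuple(x)                     # all other coordinates are unchanged
```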
Correctness of Gibbs Sampler
• By the chain rule, P(x1, …, xi-1, xi, xi+1, …, xn | e) = P(x1, …, xi-1, xi+1, …, xn | e) P(xi | x1, …, xi-1, xi+1, …, xn, e)
• Thus, we get P(x1, …, xi, …, xn | e) P(x'i | x1, …, xi-1, xi+1, …, xn, e) = P(x1, …, x'i, …, xn | e) P(xi | x1, …, xi-1, xi+1, …, xn, e): both sides equal P(x1, …, xi-1, xi+1, …, xn | e) times the two conditional factors
• Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion
Gibbs Sampling for Bayesian Networks
• Why is the Gibbs sampler “easy” in BNs?
• Recall that the Markov blanket of a variable separates it from the other variables in the network: P(Xi | X1, …, Xi-1, Xi+1, …, Xn) = P(Xi | Mbi)
• This property allows us to use local computations to perform the sampling at each transition
Gibbs Sampling in Bayesian Networks
• How do we evaluate P(Xi | x1, …, xi-1, xi+1, …, xn)?
• Let Y1, …, Yk be the children of Xi
• By definition of Mbi, the parents of each Yj are contained in Mbi ∪ {Xi}
• It is easy to show that P(Xi | x1, …, xi-1, xi+1, …, xn) ∝ P(Xi | pai) Πj P(yj | paYj)
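In code, this local computation might look like the following sketch. The representation is an assumption made for illustration: `x` is a dict mapping variable index to value, `cpt[v]` maps a full assignment to P(x_v | pa_v), `children[i]` lists the children of Xi, and `values_i` enumerates the values Xi can take:

```python
def gibbs_conditional(i, x, values_i, cpt, children):
    """Return P(X_i = v | everything else) for each v in values_i."""
    scores = []
    for val in values_i:
        x_try = {**x, i: val}          # the assignment with X_i set to val
        s = cpt[i](x_try)              # P(x_i | pa_i)
        for y in children[i]:
            s *= cpt[y](x_try)         # P(y_j | pa_{Y_j}); these CPTs mention X_i
        scores.append(s)
    z = sum(scores)                    # normalize over the values of X_i
    return [s / z for s in scores]
```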
Sampling Strategy
• How do we collect the samples?
Strategy I:
• Run the chain M times, each run for N steps, each run starting from a different point
• Return the last state in each run, giving M samples from M chains
Sampling Strategy
Strategy II:
• Run one chain for a long time
• After some “burn in” period, sample a point every fixed number of steps, giving M samples from one chain
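A sketch of Strategy II, assuming a `step` function that performs one transition of the chain (e.g. the `gibbs_step` sketch above):

```python
def run_chain(x0, step, burn_in=1000, thin=10, m=100):
    x = x0
    for _ in range(burn_in):       # "burn in": let the chain forget x0
        x = step(x)
    samples = []
    for _ in range(m):
        for _ in range(thin):      # skip `thin` steps between kept samples
            x = step(x)
        samples.append(x)          # M samples from one chain, weakly correlated
    return samples
```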
Comparing Strategies
Strategy I:
• Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity
• Has to perform the “burn in” steps for each chain
Strategy II:
• Performs “burn in” only once
• Samples might be correlated (although only weakly)
Hybrid strategy:
• Run several chains, and draw a few samples from each
• Combines the benefits of both strategies