340 likes | 357 Views
Learn about Markov chain Monte Carlo (MCMC) algorithms for high-dimensional sampling from arbitrary probability density functions, including Metropolis-Hastings and Gibbs sampler techniques. Understand concepts like recurrency, ergodicity, and detailed balance.
E N D
Advanced Statistical ComputingFall 2016 Steve Qin
Markov Chain Monte Carlo • The goal is to generates sequence of random samples from an arbitrary probability density function (usually high dimensional), • The sequence of random samples form a Markov chain, • The purpose is simulation (Monte Carlo).
Motivation • Generate iid r.v. from high-dimensional arbitrary distributions is extremely difficult. • Drop the “independent” requirement. • How about also drop the “identically distributed” requirement?
Markov chain • Assume a finite state, discrete Markov chain with N different states. • Random process Xn, n = 0,1,2,… • Markov property, • Time-homogeneous • Order • Future state depends on the past m states.
Key parameters • Transition matrix • Initial probability distribution π(0) • Stationary distribution (invariant/equilibrium)
Reducibility • A state j is accessible from state i (written i j) if • A Markov chain is irreducible if it is possible to get to any state from any state.
Recurrence • A state i is transient if given that we start in state i, there is a non-zero probability that we will never return to i. State i is recurrent if it is not transient.
Ergodicity • A state i is ergodic if it is aperiodic and positive recurrent. If all states in an irreducible Markov chain are ergodic, the chain is ergodic.
Reversible Markov chains • Consider an ergodic Markov chain that converges to an invariant distribution π. A Markov chain is reversible if for all x, y ϵS, which is known as the detailed balance equation. • An ergodic chain in equilibrium and satisfying the detailed balance condition has πas its unique stationary distribution.
Markov Chain Monte Carlo • The goal is to generates sequence of random samples from an arbitrary probability density function (usually high dimensional), • The sequence of random samples form a Markov chain, in Markov chain, P π in MCMC, π P.
Bayesian Inference Genotype Haplotype Frequency Prior
History • Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.
Metropolis algorithm • Direct sampling from the target distribution is difficult, • Generating candidate draws from a proposal distribution, • These draws then “corrected” so that asymptotically, they can be viewed as random samples from the desired target distribution.
Pseudo code • Initialize X0, • Repeat • Sample Y ~ q(x,.), • Sample U ~ Uniform (0,1), • If U ≤ α(X,Y), set Xi = y, • Otherwise Xi = x.
An informal derivation • Find α(X,Y): • Joint density of current Markov chain state and the proposal is g(x,y) = q(x,y)π(x) • Suppose q satisfies detail balance q(x,y) π(x) = q(y,x)π(y) • If q(x,y) π(x) > q(y,x)π(y), introduce α(x,y) < 1 and α(y,x) =1 hence • If q(x,y) π(x) > q(y,x)π(y), … • The probability of acceptance is
Remarks • Relies only on calculation of target pdf up to a normalizing constant.
Remarks • How to choose a good proposal function is crucial. • Sometimes tuning is needed. • Rule of thumb: 30% acceptance rate • Convergence is slow for high dimensional cases.
Illustration of Metropolis-Hastings • Suppose we try to sample from a bi-variate normal distributions. • Start from (0, 0) • Proposed move at each step is a two dimensional random walk
Illustration of Metropolis-Hastings • At each step, calculate since
Convergence • Trace plot Good
Convergence • Trace plot Bad
Convergence • Trace plot
Convergence • Autocorrelation plot Good
Convergence s = 0.5 • Autocorrelation plot Bad
Convergence s = 3.0 • Autocorrelation plot Okay
References • Metropolis et al. 1953, • Hastings 1973, • Tutorial paper: Chib and Greenberg (1995). Understanding the Metropolis--Hastings Algorithm. The American Statistician49, 327-335.
Gibbs Sampler • Purpose: Draw random samples form a joint distribution (high dimensional) • Method: Iterative conditional sampling
References • Geman and Geman 1984, • Gelfand and Smith 1990, • Tutorial paper: Casella and George (1992). Explaining the Gibbs sampler. The American Statistician, 46, 167-174.
Remarks • Gibbs Sampler is a special case of Metropolis-Hastings • Compare to EM algorithm, Gibbs sampler and Metropolis-Hastings are stochastic procedures • Verify convergence of the sequence • Require Burn in • Use multiple chains
References • Geman and Geman 1984, • Gelfand and Smith 1990, • Tutorial paper: Casella and George (1992). Explaining the Gibbs sampler. The American Statistician, 46, 167-174.