Ch 11. Sampling Models Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by I.-H. Lee Biointelligence Laboratory, Seoul National University http://bi.snu.ac.kr/
Contents • 11.0 Introduction • 11.1 Basic Sampling Algorithms • 11.2 Markov Chain Monte Carlo • 11.3 Gibbs Sampling • 11.4 Slice Sampling • 11.5 The Hybrid Monte Carlo Algorithm • 11.6 Estimating the Partition Function (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
0. Introduction • The problem: evaluating the expectation of some function f(z) with respect to a probability distribution p(z). • The expectation can be approximated by drawing independent samples z(l), l = 1,…,L, from p(z) and averaging: E[f] ≈ (1/L) Σl f(z(l)).
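The averaging estimator above can be sketched as follows; the choice of f(z) = z² and p = N(0, 1) is an illustrative assumption, not from the slides.

```python
import random

def mc_expectation(f, sampler, n=100_000):
    """Approximate E[f(z)] under p by averaging f over independent draws from p."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# For z ~ N(0, 1), E[z^2] is the variance, i.e. 1
est = mc_expectation(lambda z: z * z, lambda: random.gauss(0.0, 1.0))
```

The estimator is unbiased, and its standard error shrinks as 1/√L regardless of the dimensionality of z.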
1. Basic Sampling Algorithms • Transformation method • Use a uniform random generator and transform its output: if u ~ U(0,1) and h is the target CDF, then z = h⁻¹(u) is distributed according to the target. • Caveat: can we always obtain h⁻¹ in closed form?
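A minimal sketch of the transformation (inverse-CDF) method, using the exponential distribution as an example where h⁻¹ is available in closed form (the choice of distribution is illustrative):

```python
import random, math

def sample_exponential(lam):
    """Transformation method: z = h^{-1}(u) with u ~ U(0,1).
    For Exp(lam), h(z) = 1 - exp(-lam * z), so h^{-1}(u) = -ln(1 - u) / lam."""
    u = random.random()
    return -math.log(1.0 - u) / lam

random.seed(1)
draws = [sample_exponential(2.0) for _ in range(200_000)]
mean = sum(draws) / len(draws)  # should be close to 1/lam = 0.5
```

For many distributions of interest (e.g. a general Gaussian mixture) h⁻¹ has no closed form, which motivates the rejection and importance sampling methods that follow.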
Rejection sampling • Assumptions • Sampling directly from the target distribution p(z) is difficult. • Evaluating p(z), up to a normalizing constant, is easy for any value of z. • Sample from a proposal q(z) scaled so that k q(z) ≥ p(z) everywhere, and accept with probability p(z) / (k q(z)). How should q be chosen?
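A sketch of rejection sampling under illustrative assumptions: the unnormalized target is a standard Gaussian and the proposal q is Uniform(−5, 5), for which k = 10 gives a valid envelope (q(z) = 0.1 and the target peaks at 1):

```python
import random, math

def p_tilde(z):
    """Unnormalized target: a standard Gaussian without its constant."""
    return math.exp(-0.5 * z * z)

def rejection_sample(n):
    """Proposal q = Uniform(-5, 5); envelope k*q(z) >= p_tilde(z) with k = 10."""
    k, samples = 10.0, []
    while len(samples) < n:
        z = random.uniform(-5.0, 5.0)
        u = random.uniform(0.0, k * 0.1)   # uniform height under the envelope
        if u <= p_tilde(z):                # keep z if the point falls under p_tilde
            samples.append(z)
    return samples

random.seed(2)
zs = rejection_sample(50_000)
mean = sum(zs) / len(zs)
```

The acceptance rate equals the ratio of the area under p to the area under the envelope, which is why a tight q matters.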
Adaptive rejection sampling • Construct q on the fly based on measured values of p. • If a sample is rejected, it is added to the set of grid points and q is refined. • Limitation of rejection methods in general: the acceptance rate decreases exponentially with dimensionality.
Importance sampling • Approximates the expectation directly, without drawing samples from p. • Motivation • The expectation could be approximated by a finite sum over a grid of points, • but the number of grid points grows exponentially with dimensionality, • and not all regions of z space carry significant probability mass under p.
• Sample from an approximating distribution q and weight each sample by the importance ratio p(z)/q(z). • Accuracy depends strongly on the choice of q. • Can produce errors that are arbitrarily large, with no diagnostic indication, when q is small in regions where p·f is large.
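A self-normalized importance sampling sketch; the target N(0, 1) and the wider proposal N(0, 2²) are illustrative assumptions. Using the ratio of sums means only the unnormalized target is needed:

```python
import random, math

def importance_expectation(f, n=200_000):
    """Estimate E_p[f] for target p = N(0,1) using proposal q = N(0, 2^2).
    Each draw gets weight r = p(z)/q(z); the self-normalized estimator
    sum(w*f)/sum(w) tolerates unnormalized p and q."""
    num = den = 0.0
    for _ in range(n):
        z = random.gauss(0.0, 2.0)
        p = math.exp(-0.5 * z * z)                   # unnormalized N(0, 1)
        q = math.exp(-0.5 * (z / 2.0) ** 2) / 2.0    # unnormalized N(0, 4)
        w = p / q
        num += w * f(z)
        den += w
    return num / den

random.seed(3)
est = importance_expectation(lambda z: z * z)  # E[z^2] = 1 under N(0, 1)
```

Had the proposal been narrower than the target, a few rare samples would have carried enormous weights, the failure mode with "no diagnostic indication" noted above.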
Sampling-importance-resampling • Avoids the difficulty of choosing the envelope constant k in rejection sampling. 1. Sample from q. 2. Assign each sample a weight, as in importance sampling. 3. Resample from the weighted samples. • The final samples approximate p as the sample size increases. • Moments of p can be evaluated directly from the weighted samples at step 2. • Accuracy again depends on the choice of q.
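The three steps above can be sketched as follows, reusing the illustrative target N(0, 1) and proposal N(0, 2²) (these distributions are assumptions, not from the slides):

```python
import random, math

def sir(n_draw, n_keep):
    """Sampling-importance-resampling:
    1) draw from proposal q = N(0, 2^2),
    2) weight each draw by p(z)/q(z) for target p = N(0, 1),
    3) resample with replacement according to the normalized weights."""
    zs = [random.gauss(0.0, 2.0) for _ in range(n_draw)]
    ws = [math.exp(-0.5 * z * z) / (math.exp(-0.5 * (z / 2.0) ** 2) / 2.0)
          for z in zs]
    total = sum(ws)
    probs = [w / total for w in ws]
    return random.choices(zs, weights=probs, k=n_keep)

random.seed(4)
samples = sir(100_000, 20_000)
mean = sum(samples) / len(samples)
```

Unlike rejection sampling, no envelope constant k is needed; the price is that the resampled set only approximates p, with the approximation improving as n_draw grows.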
• Sampling and the EM algorithm • Sampling can be used to approximate the E-step of the EM algorithm: the Monte Carlo EM algorithm. • IP algorithm • I-step (Imputation step, analogous to the E-step): sample from the joint posterior over the hidden variables. • P-step (Posterior step, analogous to the M-step): compute a revised estimate of the parameter posterior using the samples from the I-step.
2. Markov Chain Monte Carlo • Allows sampling from a large class of distributions. • Scales well with the dimensionality of the sample space. • Basic Metropolis algorithm • Maintain a record of the current state z(t). • A candidate state z* is sampled from a proposal q(z|z(t)), which must be symmetric. • The candidate is accepted with probability min(1, p(z*)/p(z(t))), where p may be unnormalized. • If rejected, the current state is added to the record and becomes the next state. • The distribution of z(t) tends to p as t → ∞. • The sequence is autocorrelated; retain every Mth sample to obtain nearly independent samples. For large M, the retained samples are effectively independent.
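The basic Metropolis loop above can be sketched as follows; the standard Gaussian target, the step size, and the thinning interval M = 10 are illustrative assumptions:

```python
import random, math

def metropolis(p_tilde, z0, steps, step_size=1.0):
    """Basic Metropolis: symmetric Gaussian proposal around the current state,
    accept with probability min(1, p_tilde(z*) / p_tilde(z))."""
    z, chain = z0, []
    for _ in range(steps):
        z_new = random.gauss(z, step_size)           # symmetric proposal
        if random.random() < min(1.0, p_tilde(z_new) / p_tilde(z)):
            z = z_new                                # accept candidate
        chain.append(z)                              # on rejection, repeat current state
    return chain

random.seed(5)
chain = metropolis(lambda z: math.exp(-0.5 * z * z), 0.0, 100_000)
thinned = chain[::10]  # retain every Mth sample to reduce autocorrelation
mean = sum(thinned) / len(thinned)
```

Note the rejected-candidate case: unlike rejection sampling, a rejection still contributes a (repeated) sample to the chain.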
Random walk behavior • After t steps, the average distance covered by a random walk is proportional to √t. • This makes random walks very inefficient at exploring the state space. • Avoiding random walk behavior is central to the design of MCMC methods.
Markov chains • Homogeneous: the transition probabilities are the same at every step. • Invariant distribution: a distribution left unchanged by each step of the chain. • Detailed balance: p(z)T(z, z') = p(z')T(z', z), a sufficient condition for p to be invariant. • Ergodicity: the chain converges to the invariant distribution from any initial state. • Equilibrium distribution: the (unique) invariant distribution of an ergodic chain.
Metropolis-Hastings algorithm • Generalization of the Metropolis algorithm: q may be non-symmetric. • Acceptance probability: min(1, p(z*) q(z(t)|z*) / (p(z(t)) q(z*|z(t)))). • p is an invariant distribution of the Markov chain defined by the Metropolis-Hastings algorithm (detailed balance holds). • The common choice for q is a Gaussian centered on the current state. • Trade-off between step size and convergence time: small steps are accepted often but explore by slow random walk; large steps are frequently rejected.
3. Gibbs Sampling • Simple and widely applicable. • A special case of the Metropolis-Hastings algorithm. • Each step replaces the value of one of the variables by a value drawn from the distribution of that variable conditioned on the values of the remaining variables. • The procedure • Initialize {zi}. • For t = 1,…,T: sample each zi in turn from p(zi | z\i).
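The procedure can be sketched for a case where the conditionals are easy: a zero-mean bivariate Gaussian with correlation ρ, for which p(zi | zj) = N(ρ zj, 1 − ρ²). The target and ρ = 0.8 are illustrative assumptions:

```python
import random, math

def gibbs_bivariate_gaussian(rho, steps, burn_in=1000):
    """Gibbs sampling for a zero-mean bivariate Gaussian with correlation rho.
    Each full sweep resamples z1 | z2 and then z2 | z1 from N(rho * other, 1 - rho^2)."""
    z1 = z2 = 0.0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for t in range(steps):
        z1 = random.gauss(rho * z2, sd)   # sample z1 | z2
        z2 = random.gauss(rho * z1, sd)   # sample z2 | z1
        if t >= burn_in:
            samples.append((z1, z2))
    return samples

random.seed(6)
samples = gibbs_bivariate_gaussian(0.8, 60_000)
corr = sum(a * b for a, b in samples) / len(samples)  # E[z1 * z2] = rho
```

There is no accept/reject step: every conditional draw is kept, which is exactly the acceptance-probability-1 property discussed below.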
• p is invariant under each Gibbs sampling step, and hence under the whole Markov chain: • at each step, the marginal distribution p(z\i) is unchanged, • and each step samples correctly from the conditional p(zi | z\i). • The Markov chain is ergodic provided the conditional distributions are nowhere zero. • Therefore Gibbs sampling samples correctly from p. • Gibbs sampling as an instance of the Metropolis-Hastings algorithm: • a step involving zk, with z\k held fixed, • uses the proposal qk(z*|z) = p(zk*|z\k), for which the acceptance probability is always 1.
Random walk behavior • The number of steps needed to obtain independent samples is of order (L/l)², where L and l are the largest and smallest length scales of the distribution. • Over-relaxation can reduce this random walk behavior. • Practical applicability depends on the ease of sampling from the conditional distributions.
4. Slice Sampling • Provides an adaptive step size that is automatically adjusted to match the characteristics of the distribution. • Augment z with an auxiliary variable u and sample uniformly from the region under the unnormalized density.
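A one-dimensional slice sampler with the standard stepping-out and shrinkage scheme can be sketched as follows; the Gaussian target and initial bracket width w are illustrative assumptions:

```python
import random, math

def slice_sample(p_tilde, z0, steps, w=1.0):
    """1-D slice sampling: draw a height u ~ U(0, p_tilde(z)), step out an
    interval to cover the slice {z : p_tilde(z) > u}, then shrink the
    interval until a proposal lands inside the slice."""
    z, samples = z0, []
    for _ in range(steps):
        u = random.uniform(0.0, p_tilde(z))
        # stepping out: place a width-w bracket around z, then expand each
        # end until it leaves the slice
        left = z - w * random.random()
        right = left + w
        while p_tilde(left) > u:
            left -= w
        while p_tilde(right) > u:
            right += w
        # shrinkage: rejected proposals tighten the bracket, so the step
        # size adapts to the local scale of the distribution
        while True:
            z_new = random.uniform(left, right)
            if p_tilde(z_new) > u:
                z = z_new
                break
            if z_new < z:
                left = z_new
            else:
                right = z_new
        samples.append(z)
    return samples

random.seed(7)
chain = slice_sample(lambda z: math.exp(-0.5 * z * z), 0.0, 30_000)
mean = sum(chain) / len(chain)
```

The bracket expansion and shrinkage are what give slice sampling its self-tuning step size, in contrast to a fixed Metropolis proposal width.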
5. The Hybrid Monte Carlo Algorithm • Hamiltonian dynamics • Consider the joint distribution over phase space (z, r), with the total energy as the Hamiltonian H(z, r). • H is invariant under the dynamics; to explore different energy levels, replace r by a draw from its conditional distribution given z. • Hamiltonian dynamics + Metropolis algorithm • Update the momentum r by the stochastic resampling step above. • Update (z, r) by simulating the Hamiltonian dynamics with the leapfrog algorithm. • Accept the new state with probability min(1, exp{H(z, r) − H(z*, r*)}), which corrects for numerical integration error.
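The leapfrog update can be sketched as below, for a Hamiltonian of the form H = E(z) + r²/2 (the quadratic energy E(z) = z²/2, step size, and step count are illustrative assumptions). On exact dynamics H is conserved; leapfrog conserves it up to a small discretization error, which is what the Metropolis acceptance step corrects:

```python
def leapfrog(z, r, grad_E, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics for H = E(z) + r^2/2:
    half-step the momentum, alternate full steps in z and r, then a
    final momentum half-step."""
    r = r - 0.5 * eps * grad_E(z)
    for _ in range(n_steps - 1):
        z = z + eps * r
        r = r - eps * grad_E(z)
    z = z + eps * r
    r = r - 0.5 * eps * grad_E(z)
    return z, r

# Illustrative energy E(z) = z^2 / 2, so grad_E(z) = z and H = (z^2 + r^2) / 2
H = lambda z, r: 0.5 * z * z + 0.5 * r * r
z0, r0 = 1.0, 0.5
z1, r1 = leapfrog(z0, r0, lambda z: z, 0.01, 100)
drift = abs(H(z1, r1) - H(z0, r0))  # small: H is nearly conserved
```

Because the leapfrog scheme is time-reversible and volume-preserving, the Metropolis correction makes the combined update leave the joint distribution over (z, r) invariant.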
6. Estimating the Partition Function • The normalization constant (partition function) of a density is needed for tasks such as model comparison and model averaging. • The ratio of the partition functions of two distributions can be estimated by importance sampling from a proposal with energy function G. • The absolute value of the partition function for a complex distribution can then be obtained by chaining a sequence of such ratios, starting from a simple distribution whose partition function is known.
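The ratio estimate can be sketched as follows: for p(z) ∝ exp(−E(z)) and q(z) ∝ exp(−G(z)), we have Z_p / Z_q = E_q[exp(−E(z) + G(z))]. The particular energies E(z) = z²/2 and G(z) = z²/8 (so q is samplable as N(0, 2²) and the true ratio is √(2π)/√(8π) = 0.5) are illustrative assumptions:

```python
import random, math

def partition_ratio(E, G, sample_q, n=200_000):
    """Estimate Z_p / Z_q = E_q[exp(-E(z) + G(z))] by importance sampling,
    where p(z) ∝ exp(-E(z)) and q(z) ∝ exp(-G(z)) is easy to sample."""
    total = 0.0
    for _ in range(n):
        z = sample_q()
        total += math.exp(-E(z) + G(z))
    return total / n

random.seed(8)
ratio = partition_ratio(lambda z: 0.5 * z * z,        # target energy E
                        lambda z: z * z / 8.0,        # proposal energy G
                        lambda: random.gauss(0.0, 2.0))  # draws from q = N(0, 4)
```

In chaining, a sequence of intermediate distributions is introduced so that each adjacent pair overlaps well, keeping the variance of every ratio estimate small.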