
Ch 11. Sampling Methods. Pattern Recognition and Machine Learning, C. M. Bishop, 2006.




  1. Ch 11. Sampling Methods. Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by I.-H. Lee, Biointelligence Laboratory, Seoul National University, http://bi.snu.ac.kr/

  2. Contents
• 11.0 Introduction
• 11.1 Basic Sampling Algorithms
• 11.2 Markov Chain Monte Carlo
• 11.3 Gibbs Sampling
• 11.4 Slice Sampling
• 11.5 The Hybrid Monte Carlo Algorithm
• 11.6 Estimating the Partition Function
(C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  3. 11.0 Introduction
• The problem: evaluating the expectation of some function f(z) with respect to a probability distribution p(z).
• The expectation can be approximated by drawing independent samples z^(l) from p(z) and averaging: E[f] ≈ (1/L) Σ_l f(z^(l)).
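
A minimal sketch of this Monte Carlo estimate (illustrative code, not from the slides; the Gaussian target and the function f(z) = z² are assumptions chosen so the true answer is known):

```python
import random

def mc_expectation(f, sampler, n=100_000):
    """Approximate E[f(z)] under p by averaging f over i.i.d. samples from p."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# E[z^2] under a standard normal is its variance, 1.
est = mc_expectation(lambda z: z * z, lambda: random.gauss(0.0, 1.0))
print(est)  # close to 1.0
```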

  4. 11.1 Basic Sampling Algorithms
• Transformation method
• Use a uniform random number generator and transform its output: draw u ~ Uniform(0, 1) and set z = h^-1(u), where h is the cumulative distribution function of the target.
• Can we always obtain h^-1 in closed form?
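
A sketch of the transformation method for a case where h^-1 is available in closed form; the exponential distribution and its rate are illustrative choices, not from the slides:

```python
import random
import math

def sample_exponential(lam):
    """Transformation method: z = h^-1(u), with h the CDF of Exp(lam)."""
    u = random.random()              # u ~ Uniform(0, 1)
    return -math.log(1.0 - u) / lam  # inverse CDF: h(z) = 1 - exp(-lam * z)

random.seed(0)
samples = [sample_exponential(2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to the true mean 1/lam = 0.5
```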

  5. Rejection sampling
• Assumptions
• Sampling directly from the target distribution p(z) is difficult.
• Evaluating p(z) (up to a normalizing constant) is easy for any value of z.
• Draw z from a proposal q(z) with k q(z) ≥ p̃(z), and accept z with probability p̃(z) / (k q(z)).
• How should q be chosen?
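
A sketch of rejection sampling under illustrative assumptions: an unnormalized Beta(2, 2) target and a uniform proposal, neither of which appears in the slides. Here k = 0.25 suffices, since z(1 − z) ≤ 1/4 on (0, 1).

```python
import random

def p_tilde(z):
    """Unnormalized target: Beta(2, 2) up to a constant, i.e. z(1 - z) on (0, 1)."""
    return z * (1.0 - z)

def rejection_sample(k=0.25):
    """Draw z ~ q (uniform here) and accept with probability p_tilde(z) / (k q(z))."""
    while True:
        z = random.random()                    # proposal q(z) = 1 on (0, 1)
        if random.random() * k <= p_tilde(z):  # u ~ Uniform(0, k q(z)); accept if u <= p_tilde(z)
            return z

random.seed(0)
samples = [rejection_sample() for _ in range(50_000)]
mean = sum(samples) / len(samples)
print(mean)  # Beta(2, 2) has mean 0.5
```

The choice of k matters: the acceptance rate is the area under p̃ divided by the area under k q, so a loose k wastes proposals.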

  6. Adaptive rejection sampling
• Construct q on the fly, based on measured values of p.
• If a sample is rejected, it is added to the set of grid points and q is refined.
• The acceptance rate decreases exponentially with dimensionality.

  7. Importance sampling
• Approximates the expectation directly, without drawing samples from p.
• Motivation
• The expectation can be approximated by a finite sum over grid points.
• But the number of terms in such a sum grows exponentially with dimensionality.
• Not all regions of z space have significant probability under p.

  8. • Sample from an approximating distribution q, and weight each sample by the importance weight p(z)/q(z).
• The quality of the estimate depends on the choice of q.
• A poor q can produce large errors with no diagnostic indication.
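
The weighted estimator described on slides 7-8 can be sketched as follows; the Gaussian target and the wider Gaussian proposal are illustrative assumptions:

```python
import random
import math

def normal_pdf(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

random.seed(0)
n = 100_000
zs = [random.gauss(0.0, 2.0) for _ in range(n)]  # samples from the proposal q = N(0, 2^2)
# Importance weights p(z)/q(z) for the target p = N(0, 1).
ws = [normal_pdf(z, 0.0, 1.0) / normal_pdf(z, 0.0, 2.0) for z in zs]

# Self-normalized importance-sampling estimate of E_p[z^2] (true value: 1).
est = sum(w * z * z for w, z in zip(ws, zs)) / sum(ws)
print(est)
```

Swapping the roles (a proposal narrower than the target) is exactly the failure mode the slide warns about: a few huge weights dominate and the error goes undetected.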

  9. Sampling-importance-resampling
• Motivation: it is difficult to set the constant k in rejection sampling.
• Draw L samples from q.
• Set a weight on each sample, as in importance sampling.
• Resample from the L samples with probabilities given by the weights.
• The final samples approximate p as the sample size L increases.
• Moments of p can be estimated directly at the weighting step.
• The quality again depends on the choice of q.
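
The three SIR steps above can be sketched as follows; the Gaussian target and proposal are illustrative assumptions, not from the slides:

```python
import random
import math

def normal_pdf(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

random.seed(0)
L = 50_000
# Step 1: draw L samples from the proposal q = N(0, 2^2).
zs = [random.gauss(0.0, 2.0) for _ in range(L)]
# Step 2: weight each sample by p(z)/q(z), for the target p = N(1, 1).
ws = [normal_pdf(z, 1.0, 1.0) / normal_pdf(z, 0.0, 2.0) for z in zs]
# Step 3: resample with probability proportional to the weights.
resampled = random.choices(zs, weights=ws, k=L)

mean = sum(resampled) / L
print(mean)  # close to the target mean 1.0
```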

  10. Sampling and the EM algorithm
• Sampling can be used to approximate the E step of the EM algorithm: the Monte Carlo EM algorithm.
• IP algorithm
• I-step (imputation step, analogous to the E step): sample from the joint posterior over the latent variables.
• P-step (posterior step, analogous to the M step): compute a revised estimate of the posterior using the samples from the I-step.

  11. 11.2 Markov Chain Monte Carlo
• Allows sampling from a large class of distributions.
• Scales well with the dimensionality of the sample space.
• Basic Metropolis algorithm
• Maintain a record of the current state z(τ).
• The next candidate state z* is sampled from a proposal q(z | z(τ)), where q must be symmetric.
• The candidate is accepted with probability min(1, p̃(z*) / p̃(z(τ))), where p̃ is the unnormalized target.
• If rejected, the current state is added to the record and becomes the next state.
• The distribution of z(τ) tends to p as τ → ∞.
• The sequence is autocorrelated; retain only every Mth sample to obtain approximately independent samples. For large M, the retained samples are nearly independent.
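
A minimal sketch of the basic Metropolis algorithm with a symmetric Gaussian proposal; the unnormalized Gaussian target, step size, and thinning interval are all illustrative assumptions:

```python
import random
import math

def p_tilde(z):
    """Unnormalized target: a standard normal up to a constant."""
    return math.exp(-0.5 * z * z)

def metropolis(n_steps, step=1.0, z0=0.0):
    z, chain = z0, []
    for _ in range(n_steps):
        z_star = random.gauss(z, step)  # symmetric proposal q(z* | z)
        # Accept with probability min(1, p_tilde(z_star) / p_tilde(z));
        # on rejection the current state is recorded again.
        if random.random() <= p_tilde(z_star) / p_tilde(z):
            z = z_star
        chain.append(z)
    return chain

random.seed(0)
chain = metropolis(100_000)
thinned = chain[::50]  # keep every Mth sample to reduce autocorrelation
mean = sum(thinned) / len(thinned)
print(mean)  # close to the target mean 0.0
```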

  12. Random walk behavior
• After t steps, the average distance covered by a random walk is proportional to the square root of t.
• This makes random walks very inefficient at exploring the state space.
• Avoiding random walk behavior is essential to designing effective MCMC methods.
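
A quick simulation illustrating the square-root scaling; the ±1 step distribution and the sizes are arbitrary choices for illustration:

```python
import random
import math

random.seed(0)
t, trials = 2_500, 800
# For a symmetric +/-1 walk, E[distance^2] after t steps is exactly t,
# so the root-mean-square distance is sqrt(t).
sq_dists = []
for _ in range(trials):
    pos = sum(random.choice((-1, 1)) for _ in range(t))
    sq_dists.append(pos * pos)
rms = math.sqrt(sum(sq_dists) / trials)
print(rms, math.sqrt(t))  # rms should be close to sqrt(t) = 50
```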

  13. Markov chains
• Homogeneous: the transition probabilities are the same at every step.
• Invariant distribution: a distribution left unchanged by each step of the chain.
• Detailed balance: p*(z) T(z, z') = p*(z') T(z', z); a sufficient condition for p* to be invariant.
• Ergodicity: the chain converges to the invariant distribution irrespective of the initial distribution.
• Equilibrium distribution: the invariant distribution of an ergodic chain.

  14. Metropolis-Hastings algorithm
• Generalization of the Metropolis algorithm: q can be non-symmetric.
• Acceptance probability: min(1, p̃(z*) q(z(τ) | z*) / (p̃(z(τ)) q(z* | z(τ)))).
• p is an invariant distribution of the Markov chain defined by the Metropolis-Hastings algorithm.
• The common choice for q is a Gaussian centered on the current state.
• Trade-off: step size vs. convergence time.

  15. 11.3 Gibbs Sampling
• Simple and widely applicable.
• A special case of the Metropolis-Hastings algorithm.
• Each step replaces the value of one of the variables by a value drawn from the distribution of that variable conditioned on the values of the remaining variables.
• The procedure
• Initialize {z_i : i = 1, …, M}.
• For τ = 1, …, T: for each i in turn, sample z_i^(τ+1) ~ p(z_i | z_1^(τ+1), …, z_(i-1)^(τ+1), z_(i+1)^(τ), …, z_M^(τ)).
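
A sketch of Gibbs sampling for a case where the conditionals are known in closed form: a zero-mean, unit-variance bivariate Gaussian, whose conditionals are N(ρ·other, 1 − ρ²). The correlation value is an illustrative assumption.

```python
import random
import math

def gibbs_bivariate_normal(n_steps, rho=0.8):
    """Gibbs sampling for a zero-mean bivariate Gaussian with correlation rho."""
    z1, z2 = 0.0, 0.0
    cond_sd = math.sqrt(1.0 - rho * rho)  # std dev of each conditional
    chain = []
    for _ in range(n_steps):
        z1 = random.gauss(rho * z2, cond_sd)  # sample z1 ~ p(z1 | z2)
        z2 = random.gauss(rho * z1, cond_sd)  # sample z2 ~ p(z2 | z1)
        chain.append((z1, z2))
    return chain

random.seed(0)
chain = gibbs_bivariate_normal(50_000)
# With unit marginal variances, E[z1 z2] equals the correlation rho.
cov = sum(a * b for a, b in chain) / len(chain)
print(cov)  # close to 0.8
```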

  16. • p is invariant under each Gibbs sampling step, and hence under the whole Markov chain.
• At each step, the marginal distribution p(z\i) is unchanged, since z\i is not altered.
• Each step samples correctly from the conditional distribution p(zi | z\i).
• The Markov chain defined is ergodic.
• A sufficient condition: none of the conditional distributions is zero anywhere.
• Therefore Gibbs sampling samples correctly from p.
• Gibbs sampling as an instance of the Metropolis-Hastings algorithm:
• A step involving zk, in which z\k remains fixed.
• Transition probability qk(z* | z) = p(z*k | z\k); the acceptance probability is then always 1.

  17. Random walk behavior
• For a distribution whose largest and smallest length scales are L and l, the number of steps needed to obtain approximately independent samples is of order (L/l)^2.
• Over-relaxation can reduce this random walk behavior.
• The practical applicability of Gibbs sampling depends on the ease of sampling from the conditional distributions.

  18. 11.4 Slice Sampling
• Provides an adaptive step size that is automatically adjusted to match the characteristics of the distribution.
• Augments z with an auxiliary variable u and samples uniformly from the region under the unnormalized density p̃(z).
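
A sketch of slice sampling for a target whose slice {z : p̃(z) > u} is available in closed form, so the slice can be sampled exactly; practical implementations instead locate the slice with a stepping-out and shrinkage procedure. The Gaussian target is an illustrative assumption.

```python
import random
import math

def p_tilde(z):
    """Unnormalized target: a standard normal up to a constant (max value 1 at z = 0)."""
    return math.exp(-0.5 * z * z)

def slice_sample(n_steps, z0=0.0):
    z, chain = z0, []
    for _ in range(n_steps):
        u = random.uniform(0.0, p_tilde(z))          # auxiliary height under the curve
        half_width = math.sqrt(-2.0 * math.log(u))   # p_tilde(z') > u  <=>  |z'| < half_width
        z = random.uniform(-half_width, half_width)  # uniform draw from the slice
        chain.append(z)
    return chain

random.seed(0)
chain = slice_sample(50_000)
var = sum(z * z for z in chain) / len(chain)
print(var)  # close to 1.0 for a standard normal
```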

  19. 11.5 The Hybrid Monte Carlo Algorithm
• Hamiltonian dynamics
• Joint distribution over phase space (z, r), with the total energy as the Hamiltonian H(z, r).
• H is invariant under the dynamics; r is refreshed by drawing from its conditional distribution given z, which leaves the joint distribution invariant.
• Hamiltonian dynamics + Metropolis algorithm
• Stochastically update the momentum r.
• Update (z, r) by simulating the Hamiltonian dynamics with the leapfrog algorithm.
• Accept the new state with probability min(1, exp{H(z, r) − H(z*, r*)}).
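
The leapfrog-plus-Metropolis scheme above can be sketched for a standard Gaussian target, where H(z, r) = z²/2 + r²/2; the step size and leapfrog count are illustrative tuning choices:

```python
import random
import math

def grad_E(z):
    """Gradient of the potential energy E(z) = z^2 / 2 (standard normal target)."""
    return z

def leapfrog(z, r, eps, n_leap):
    """Leapfrog integration: half step in r, alternating full steps, half step in r."""
    r -= 0.5 * eps * grad_E(z)
    for _ in range(n_leap - 1):
        z += eps * r
        r -= eps * grad_E(z)
    z += eps * r
    r -= 0.5 * eps * grad_E(z)
    return z, r

def hmc(n_steps, eps=0.2, n_leap=10):
    z, chain = 0.0, []
    for _ in range(n_steps):
        r = random.gauss(0.0, 1.0)  # refresh momentum from its conditional given z
        z_new, r_new = leapfrog(z, r, eps, n_leap)
        h_old = 0.5 * z * z + 0.5 * r * r
        h_new = 0.5 * z_new * z_new + 0.5 * r_new * r_new
        if random.random() < math.exp(h_old - h_new):  # accept prob min(1, exp{H - H*})
            z = z_new
        chain.append(z)
    return chain

random.seed(0)
chain = hmc(20_000)
var = sum(z * z for z in chain) / len(chain)
print(var)  # close to 1.0
```

The Metropolis correction makes the sampler exact despite the discretization error of the leapfrog integrator.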

  20. 11.6 Estimating the Partition Function
• The normalization constant (partition function) of a density is needed, e.g., to compare probabilistic models.
• Estimating the ratio of partition functions
• Used for model comparison or model averaging.
• Can be done by importance sampling, with the distributions expressed through their energy functions E and G.
• Finding the absolute value of the partition function of a complex distribution: chaining, i.e., linking it to a simple distribution through a sequence of intermediate distributions.
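
The simplest case, estimating one partition function by importance sampling from a normalized proposal, can be sketched as follows; the Gaussian target (whose true partition function is √(2π)) and the proposal are illustrative assumptions:

```python
import random
import math

def p_tilde(z):
    """Unnormalized target: exp(-z^2/2); its true partition function is sqrt(2*pi)."""
    return math.exp(-0.5 * z * z)

def q_pdf(z, sigma=2.0):
    """Normalized proposal: N(0, sigma^2)."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

random.seed(0)
n = 100_000
# Z_p = integral of p_tilde = E_q[p_tilde(z) / q(z)] for z ~ q.
zs = [random.gauss(0.0, 2.0) for _ in range(n)]
z_est = sum(p_tilde(z) / q_pdf(z) for z in zs) / n
print(z_est, math.sqrt(2 * math.pi))  # estimate vs. true value ~2.5066
```

Chaining applies the same idea repeatedly: each link estimates the ratio of two nearby distributions, and the product of the ratios connects the complex distribution to a simple one with a known partition function.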
