Particle Filtered MCMC-MLE with Connections to Contrastive Divergence
Arthur Asuncion, Qiang Liu, Alexander Ihler, Padhraic Smyth
Department of Computer Science, University of California, Irvine

1. Motivation
• Undirected models are useful in many settings. Consider models in exponential family form, p(x | θ) ∝ exp(θᵀφ(x)); the partition function is usually intractable.
• Task: given i.i.d. data, estimate the parameters accurately and quickly.
• Maximum likelihood estimation (MLE) requires the intractable partition function, so we need to resort to approximate techniques:
  • Pseudolikelihood / composite likelihoods
  • Sampling-based techniques (e.g. MCMC-MLE)
  • Contrastive divergence (CD) learning
• We propose particle filtered MCMC-MLE.

2. MCMC-MLE
• Widely used in statistics [Geyer, 1991].
• Idea: run MCMC under an alternate distribution p(x | θ0) until equilibrium, and use the samples as a Monte Carlo approximation of the likelihood via importance sampling.
• To optimize the approximate likelihood, use its gradient (a hedged reconstruction of these equations appears after Section 6).
• Degeneracy problems arise if θ moves far from the initial θ0.

3. Contrastive Divergence (CD)
• Widely used machine learning algorithm for learning undirected models [Hinton, 2002].
• CD can be motivated by taking the gradient of the log-likelihood directly.
• CD-n samples from the current model (approximately): initialize the chains at the empirical data distribution and run only n MCMC steps under θ.
• Persistent CD: initialize the chains at the samples from the previous iteration [Tieleman, 2008].
• A hedged CD-n code sketch appears after Section 6.

4. Particle Filtered MCMC-MLE (PF)
• Use sampling-importance-resampling (SIR) with MCMC rejuvenation to estimate the gradient.
• Monitor the effective sample size (ESS) of the particles (the standard formula is given after Section 6). If the ESS (the "health" of the particles) is low:
  • Resample particles in proportion to their weights w
  • Rejuvenate with n MCMC steps based on the current θ
• PF can avoid MCMC-MLE's degeneracy issues.
• PF can potentially be faster than CD since it only "rejuvenates" when the ESS is low.
• As the number of particles approaches infinity, PF recovers the exact log-likelihood gradient.
• A hedged code sketch of the PF loop also appears after Section 6.

5. Experimental Analysis
• Visible Boltzmann machines
• Exponential random graph models (ERGMs), with network statistics: # edges, # 2-stars, # triangles
• Conditional random fields (CRFs)
• Restricted Boltzmann machines (RBMs): experiments on MNIST data, with 500 hidden units

6. Conclusions
• Particle filtered MCMC-MLE can avoid the degeneracy issues of MCMC-MLE by performing resampling and rejuvenation.
• Particle filtered MCMC-MLE is sometimes faster than CD since it only rejuvenates when needed.
• There is a unified view of all these algorithms: each one estimates an approximate gradient and then updates θ. MCMC-MLE runs MCMC under p(x | θ0) until equilibrium and uses importance sampling to estimate the gradient; CD runs MCMC under the current θ for n steps; PF calculates weights, checks the ESS, resamples and rejuvenates only if the ESS is low, and then calculates new weights. In this view, PF can be seen as a "hybrid" between MCMC-MLE and CD.
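The equations on the original poster appear as images and are missing from this transcript. As a hedged reconstruction from standard definitions (not copied from the poster), the exponential-family model of Section 1, its log-likelihood gradient, and the MCMC-MLE importance-sampling approximation of Section 2 can be written as

\[
p(x \mid \theta) = \frac{1}{Z(\theta)} \exp\!\big(\theta^\top \phi(x)\big),
\qquad
Z(\theta) = \sum_{x} \exp\!\big(\theta^\top \phi(x)\big),
\]
\[
\nabla_\theta\, \ell(\theta)
= \mathbb{E}_{\text{data}}\big[\phi(x)\big] - \mathbb{E}_{p(x \mid \theta)}\big[\phi(x)\big],
\]

where \(\ell(\theta)\) is the average log-likelihood of the data. With MCMC samples \(x^{(1)}, \ldots, x^{(S)}\) drawn from \(p(x \mid \theta_0)\),

\[
\log \frac{Z(\theta)}{Z(\theta_0)}
\approx \log \frac{1}{S} \sum_{s=1}^{S} \exp\!\big((\theta - \theta_0)^\top \phi(x^{(s)})\big),
\qquad
w_s = \exp\!\big((\theta - \theta_0)^\top \phi(x^{(s)})\big),
\]

so the intractable model expectation in the gradient is replaced by the self-normalized importance estimate \(\sum_s \bar{w}_s\, \phi(x^{(s)})\), with \(\bar{w}_s = w_s / \sum_{s'} w_{s'}\). This approximation degrades when θ moves far from θ0, which is the degeneracy problem noted in Section 2.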
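The effective sample size monitored in Section 4 is likewise shown only as an image on the poster; the standard definition in terms of the (normalized) importance weights is

\[
\mathrm{ESS}
= \frac{\big(\sum_{s} w_s\big)^2}{\sum_{s} w_s^{2}}
= \frac{1}{\sum_{s} \bar{w}_s^{\,2}},
\qquad
1 \le \mathrm{ESS} \le S,
\]

with ESS close to S when the particles still represent p(x | θ) well, and close to 1 when a few particles carry almost all of the weight.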
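The following is a minimal sketch of CD-n for a fully visible Boltzmann machine (one of the model classes in Section 5). It is illustrative only: the Gibbs sampler, the pairwise statistics phi(x) = x xᵀ, and all function names are assumptions for this sketch, not code from the paper.

```python
import numpy as np

def gibbs_sweep(X, W, rng):
    """One Gibbs sweep over all units of a fully visible Boltzmann machine.
    X: (S, D) array of {0,1} states; W: (D, D) symmetric weights, zero diagonal.
    Each unit i is resampled from p(x_i = 1 | x_-i) = sigmoid(sum_j W[i, j] x_j)."""
    S, D = X.shape
    for i in range(D):
        field = X @ W[:, i]                        # local field at unit i
        p = 1.0 / (1.0 + np.exp(-field))
        X[:, i] = (rng.random(S) < p).astype(float)
    return X

def cd_n_gradient(data, W, n, rng):
    """CD-n gradient estimate: initialize chains at the empirical data
    distribution, run only n MCMC (Gibbs) steps under the current W, and
    return E_data[phi] - E_model[phi] for phi(x) = x x^T."""
    X = data.copy()
    for _ in range(n):
        X = gibbs_sweep(X, W, rng)
    pos = data.T @ data / len(data)                # positive (data) statistics
    neg = X.T @ X / len(X)                         # negative (sample) statistics
    grad = pos - neg
    np.fill_diagonal(grad, 0.0)                    # no self-connections in this toy model
    return grad

# Toy usage: one gradient ascent step on random binary data.
rng = np.random.default_rng(0)
data = (rng.random((200, 10)) < 0.3).astype(float)
W = np.zeros((10, 10))
W += 0.05 * cd_n_gradient(data, W, n=1, rng=rng)
```

Persistent CD [Tieleman, 2008] would keep the chain states X across iterations instead of re-initializing them at the data every time.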
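Below is a hedged sketch of the particle filtered MCMC-MLE loop for the same toy Boltzmann machine, reusing gibbs_sweep from the CD-n sketch above. The weight update, the ESS threshold of 0.5·S, the learning rate, and all names are illustrative assumptions about how the resample/rejuvenate scheme of Section 4 could be realized; this is not the authors' code.

```python
import numpy as np  # gibbs_sweep is the helper defined in the CD-n sketch above

def pf_mcmc_mle(data, num_iters=100, S=100, n=5, lr=0.05, ess_frac=0.5, seed=0):
    """Particle filtered MCMC-MLE sketch: reweight a fixed particle set as theta
    moves, and only resample + rejuvenate (n Gibbs sweeps) when the ESS is low."""
    rng = np.random.default_rng(seed)
    D = data.shape[1]
    W = np.zeros((D, D))                            # current parameters theta
    W_anchor = W.copy()                             # parameters the particles were drawn under
    X = (rng.random((S, D)) < 0.5).astype(float)    # initial particles
    X = gibbs_sweep(X, W, rng)
    pos = data.T @ data / len(data)                 # data statistics E_data[phi]
    for _ in range(num_iters):
        # Importance weights w_s ∝ exp((theta - theta_anchor)^T phi(x_s)), phi(x) = x x^T.
        log_w = np.einsum('sd,de,se->s', X, W - W_anchor, X)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)                  # effective sample size
        if ess < ess_frac * S:                      # particles unhealthy:
            idx = rng.choice(S, size=S, p=w)        #   resample in proportion to w
            X = X[idx].copy()
            for _ in range(n):
                X = gibbs_sweep(X, W, rng)          #   rejuvenate under current theta
            W_anchor = W.copy()
            w = np.full(S, 1.0 / S)
        neg = (w[:, None] * X).T @ X                # weighted model statistics
        grad = pos - neg
        np.fill_diagonal(grad, 0.0)
        W = W + lr * grad                           # approximate gradient ascent step
    return W
```

In this sketch the per-iteration cost is just a reweighting unless the ESS drops, which is where the potential speedup over CD-n (which runs n MCMC steps every iteration) comes from.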
