Particle Filtered MCMC-MLE with Connections to Contrastive Divergence
Arthur Asuncion, Qiang Liu, Alexander Ihler, Padhraic Smyth
Department of Computer Science, University of California, Irvine

1. Motivation
• Undirected models are useful in many settings. Consider models in exponential family form:
  p(x|\theta) = \exp(\theta^\top S(x)) / Z(\theta), where the partition function Z(\theta) = \sum_x \exp(\theta^\top S(x)) is usually intractable.
• Task: given i.i.d. data x_1, ..., x_M, estimate the parameters \theta accurately and quickly.
• Maximum likelihood estimation (MLE): \hat{\theta} = \arg\max_\theta \sum_{i=1}^M \log p(x_i|\theta).
• Because Z(\theta) is intractable, we need to resort to approximate techniques:
  • Pseudolikelihood / composite likelihoods
  • Sampling-based techniques (e.g. MCMC-MLE)
  • Contrastive divergence (CD) learning
• We propose particle filtered MCMC-MLE.

2. MCMC-MLE
• Widely used in statistics [Geyer, 1991].
• Idea: draw samples x_1, ..., x_N from an alternate distribution p(x|\theta_0) using MCMC, then approximate the likelihood via a Monte Carlo approximation of the log partition function ratio:
  \log (Z(\theta) / Z(\theta_0)) \approx \log \frac{1}{N} \sum_{s=1}^N w_s, with importance weights w_s = \exp((\theta - \theta_0)^\top S(x_s)).
• To optimize the approximate likelihood, use its gradient; MCMC-MLE uses importance sampling to estimate the gradient:
  \nabla \ell(\theta) \approx \hat{E}_{data}[S(x)] - \sum_s w_s S(x_s) / \sum_s w_s.
• Degeneracy problems arise if \theta moves far from the initial \theta_0: the importance weights collapse onto a few samples. (A minimal Python sketch of this estimator appears after Section 6.)
[Flowchart: run MCMC under p(x|\theta_0) until equilibrium; calculate new weights; update \theta using the approximate gradient.]

3. Contrastive Divergence (CD)
• A widely used machine learning algorithm for learning undirected models [Hinton, 2002].
• CD can be motivated by taking the gradient of the log-likelihood directly:
  \nabla \ell(\theta) = \hat{E}_{data}[S(x)] - E_{p(x|\theta)}[S(x)].
• CD-n samples from the current model approximately (sketched in code below):
  • Initialize the chains at the empirical data distribution.
  • Run only n MCMC steps under \theta.
• Persistent CD: initialize the chains at the samples from the previous iteration [Tieleman, 2008].
[Flowchart: run MCMC under \theta for n steps; update \theta using the approximate gradient.]

4. Particle Filtered MCMC-MLE (PF)
• Use sampling-importance-resampling (SIR) with MCMC rejuvenation to estimate the gradient (a sketch of the full loop follows the conclusions).
• Monitor the effective sample size (ESS): ESS = (\sum_s w_s)^2 / \sum_s w_s^2.
• If the ESS (the "health" of the particles) is low:
  • Resample the particles in proportion to their weights w.
  • Rejuvenate them with n MCMC steps under the current \theta.
• PF can avoid MCMC-MLE's degeneracy issues.
• PF can potentially be faster than CD, since it only rejuvenates when the ESS is low.
• As the number of particles approaches infinity, PF recovers the exact log-likelihood gradient.
[Flowchart: calculate the weights and check the ESS; if the ESS is low, resample and rejuvenate.]

5. Experimental Analysis
• Visible Boltzmann machines
• Exponential random graph models (ERGMs), with network statistics such as the number of edges, 2-stars, and triangles
• Conditional random fields (CRFs)
• Restricted Boltzmann machines (RBMs): experiments on MNIST data, with 500 hidden units

6. Conclusions
• Particle filtered MCMC-MLE can avoid the degeneracy issues of MCMC-MLE by performing resampling and rejuvenation.
• Particle filtered MCMC-MLE is sometimes faster than CD, since it only rejuvenates when needed.
• There is a unified view of all these algorithms: PF can be viewed as a "hybrid" between MCMC-MLE and CD.
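
A minimal Python sketch of the MCMC-MLE gradient estimator from Section 2, assuming a toy fully visible Boltzmann machine over d binary variables with pairwise sufficient statistics; the helper names (suff_stats, gibbs_sweep, mcmc_mle_gradient) and all constants are our illustration choices, not from the poster.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # number of binary variables in the toy model

def suff_stats(x):
    """Pairwise sufficient statistics S(x): products x_i * x_j for i < j."""
    return np.outer(x, x)[np.triu_indices(d, k=1)]

def gibbs_sweep(x, theta):
    """One systematic Gibbs sweep under p(x|theta) proportional to exp(theta^T S(x))."""
    W = np.zeros((d, d))
    W[np.triu_indices(d, k=1)] = theta
    W = W + W.T
    for i in range(d):
        p1 = 1.0 / (1.0 + np.exp(-W[i] @ x))   # P(x_i = 1 | rest)
        x[i] = float(rng.random() < p1)
    return x

def mcmc_mle_gradient(data, theta, theta0, samples):
    """Gradient estimate E_data[S] - sum_s w_s S(x_s), where the samples
    x_s ~ p(x|theta0) carry weights w_s prop. to exp((theta - theta0)^T S(x_s))."""
    S_data = np.mean([suff_stats(x) for x in data], axis=0)
    S_samp = np.array([suff_stats(x) for x in samples])
    logw = S_samp @ (theta - theta0)
    w = np.exp(logw - logw.max())              # stabilize the exponentials
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                 # effective sample size
    return S_data - w @ S_samp, ess

# Usage: sample from p(x|theta0) once "until equilibrium", then ascend.
data = [rng.integers(0, 2, d).astype(float) for _ in range(100)]
theta0 = np.zeros(d * (d - 1) // 2)
samples = []
for _ in range(200):
    x = rng.integers(0, 2, d).astype(float)
    for _ in range(50):                        # long MCMC run under theta0
        x = gibbs_sweep(x, theta0)
    samples.append(x)
theta = theta0.copy()
for _ in range(100):
    grad, ess = mcmc_mle_gradient(data, theta, theta0, samples)
    theta += 0.05 * grad   # as theta drifts from theta0, ess collapses
```

The degeneracy the poster warns about is visible here: the samples are drawn once under \theta_0, so as \theta moves away, the ESS returned by mcmc_mle_gradient drops and the gradient estimate rests on a handful of particles.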
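
A matching sketch of a CD-n update for Section 3, continuing the code above (it reuses rng, suff_stats, and gibbs_sweep); the function name and step size are ours.

```python
def cd_n_update(data, theta, n=1, lr=0.05):
    """One CD-n parameter update: chains start at the empirical data, run
    only n Gibbs sweeps under the current theta, and the chain average of S
    approximates the intractable model expectation in the gradient."""
    S_data = np.mean([suff_stats(x) for x in data], axis=0)
    chains = [x.copy() for x in data]          # initialize at the data
    for _ in range(n):                         # only n MCMC steps
        chains = [gibbs_sweep(x, theta) for x in chains]
    S_model = np.mean([suff_stats(x) for x in chains], axis=0)
    return theta + lr * (S_data - S_model)     # approximate gradient ascent
```

Persistent CD [Tieleman, 2008] would instead keep `chains` alive across calls, initializing each update's MCMC at the previous update's samples rather than at the data.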
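
Finally, a sketch of the particle filtered MCMC-MLE loop from Section 4, again reusing the helpers above. The poster specifies reweighting, an ESS check, and resample-plus-rejuvenate when the ESS is low; the concrete threshold (N/2), step size, and bookkeeping via theta_ref are our assumptions for illustration.

```python
def pf_mcmc_mle(data, theta, particles, iters=100, lr=0.05, n_rejuv=1):
    """Particle filtered MCMC-MLE: reweight the particles as theta moves,
    and resample + rejuvenate with MCMC only when the ESS drops below N/2
    (the threshold is our choice, not taken from the poster)."""
    S_data = np.mean([suff_stats(x) for x in data], axis=0)
    N = len(particles)
    theta_ref = theta.copy()   # distribution the particles currently target
    for _ in range(iters):
        S_part = np.array([suff_stats(x) for x in particles])
        logw = S_part @ (theta - theta_ref)    # importance weights vs theta_ref
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < N / 2:       # ESS low: particle "health" is poor
            idx = rng.choice(N, size=N, p=w)   # resample in proportion to w
            particles = [particles[i].copy() for i in idx]
            for _ in range(n_rejuv):           # rejuvenate under current theta
                particles = [gibbs_sweep(x, theta) for x in particles]
            theta_ref = theta.copy()           # particles now target theta
            w = np.full(N, 1.0 / N)
            S_part = np.array([suff_stats(x) for x in particles])
        theta = theta + lr * (S_data - w @ S_part)   # weighted gradient step
    return theta, particles
```

This makes the "hybrid" view from Section 6 concrete: when the ESS never drops, the loop behaves like MCMC-MLE with a fixed sample; when it drops every iteration, each step resamples and runs n MCMC sweeps from the current particles, much like persistent CD.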