Learning with Blocks: Composite Likelihood and Contrastive Divergence
Arthur Asuncion¹, Qiang Liu¹, Alexander Ihler, Padhraic Smyth
Department of Computer Science, University of California, Irvine
¹ Both authors contributed equally.

1. Motivation: Efficient Parameter Estimation
• Assume an exponential family: p(x; θ) = exp(θᵀφ(x)) / Z(θ).
• Suppose we have N independent observations: x⁽¹⁾, …, x⁽ᴺ⁾.
• Our task is to perform parameter estimation for θ.
• Maximum likelihood estimation (MLE) is the standard approach: θ̂ = argmax_θ Σᵢ log p(x⁽ⁱ⁾; θ).
• Likelihood gradient: ∇_θ ℓ(θ) = E_data[φ(x)] − E_θ[φ(x)], i.e., the expectation of φ under the empirical data distribution minus the expectation under the model.
• MLE has nice theoretical properties: asymptotic consistency and normality, statistical efficiency.
• Difficulty: the partition function Z(θ) and its gradient are generally intractable for many models.
• Our approach: composite likelihood + contrastive divergence.

2. Pseudolikelihood and Composite Likelihood
• Pseudolikelihood (MPLE) approximates the (log-)likelihood using conditional probabilities: ℓ_PL(θ) = Σᵢ Σⱼ log p(xⱼ⁽ⁱ⁾ | x₋ⱼ⁽ⁱ⁾; θ), where x₋ⱼ denotes all variables except xⱼ; each conditional's partition function is easy to calculate.
• Properties: asymptotically consistent; computationally fast; not as statistically efficient as MLE; underestimates the dependency structure of the model.
• Composite likelihood (MCLE) fills the gap between MLE and MPLE: ℓ_CL(θ) = Σᵢ Σ_c log p(x_{A_c}⁽ⁱ⁾ | x_{B_c}⁽ⁱ⁾; θ), where the A_c are blocks of variables and the B_c are their conditioning sets. We focus on conditional composite likelihoods.
• Properties: asymptotically consistent; computational cost greater than MPLE and less than MLE (exponential in the size of the largest block A_c); statistical efficiency greater than MPLE and less than MLE; generally provides more accurate solutions than MPLE.

3. Contrastive Divergence
• Contrastive divergence (CD) approximates the second term in the likelihood gradient using MCMC (for efficiency reasons): ∇_θ ℓ(θ) ≈ E_data[φ(x)] − E_n[φ(x)], where E_n is the expectation over samples obtained from the nth step of Gibbs sampling, initialized at the empirical data distribution.
• CD-1 corresponds to MPLE [Hyvärinen, 2006].
• CD-∞ (i.e., the chain has reached equilibrium) corresponds to MLE.
• CD-n is an algorithmic variant between CD-1 and CD-∞.
• We propose blocked contrastive divergence (BCD).

4. Blocked Contrastive Divergence
• The gradient of the composite likelihood is: ∇_θ ℓ_CL(θ) = Σ_c ( E_data[φ(x)] − E_{θ,c}[φ(x)] ), where E_{θ,c}[φ(x)] is the expectation of φ with the block x_{A_c} resampled from the model conditional p_θ(x_{A_c} | x_{B_c}) and the remaining variables held at their observed values.
• The second term of the gradient can be approximated using a random-scan blocked Gibbs sampler (RSBG); a minimal code sketch appears after Section 6 below:
• 1. Randomly select a data point i (from the empirical data distribution).
• 2. Randomly select a block c (with probability 1/C).
• 3. Update x_{A_c}⁽ⁱ⁾ by performing one blocked Gibbs step using p_θ(x_{A_c} | x_{B_c}⁽ⁱ⁾).
• Blocked contrastive divergence (BCD) is a stochastic version of MCLE (see paper for derivation).
• The connection between CD and composite likelihoods allows for cross-fertilization between machine learning and statistics.

5. Tree Structured Blocks
• BCD (and CL) scales exponentially with block size; large blocks (>15 variables) are too computationally expensive in practice.
• We can instead use tree-structured blocks.
• Forward-backward sampling can be performed (to obtain a blocked sample) with time complexity linear in the block size; see the sketch at the end.
• [Figure: example of tree-structured blocks on a 2D lattice.]

6. Experimental Analysis
• Visible Boltzmann machine with higher-order potentials.
• Conditional random field (CRF).
• Exponential random graph model (ERGM), whose sufficient statistics are network statistics such as edge, 2-star, and triangle counts; we ran BCD on the Lazega social network data.
• [Figures: each dot is a model with random parameters; performance is shown as a function of the coupling strength.]
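As a concrete illustration of the RSBG update in Section 4, the following is a minimal, hypothetical Python sketch (not the authors' code) of one BCD step for a small fully-visible binary Boltzmann machine. The model size, block partition, learning rate, and all function names are illustrative assumptions; blocks are kept small so the conditional p_θ(x_{A_c} | x_{B_c}) can be sampled exactly by enumeration.

# Illustrative BCD sketch: fully-visible binary Boltzmann machine
# p(x) ∝ exp(0.5·x'Wx + b'x), x in {0,1}^d, W symmetric with zero diagonal.
# One blocked Gibbs step samples x[block] ~ p(x[block] | x[rest]) exactly.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def log_pot(x, W, b):
    # Unnormalized log-probability of a full configuration x.
    return 0.5 * x @ W @ x + b @ x

def sample_block(x, block, W, b):
    # Draw x[block] ~ p(x[block] | x[rest]) by enumerating 2^|block| states.
    x = x.copy()
    configs = np.array(list(itertools.product([0.0, 1.0], repeat=len(block))))
    scores = np.empty(len(configs))
    for k, cfg in enumerate(configs):
        x[block] = cfg
        scores[k] = log_pot(x, W, b)
    p = np.exp(scores - scores.max())
    p /= p.sum()
    x[block] = configs[rng.choice(len(configs), p=p)]
    return x

def bcd_step(X, W, b, blocks, lr=0.05):
    # One stochastic BCD update: data statistics minus one-step sample statistics.
    x_pos = X[rng.integers(len(X))]              # step 1: random data point
    block = blocks[rng.integers(len(blocks))]    # step 2: random block (prob 1/C)
    x_neg = sample_block(x_pos, block, W, b)     # step 3: one blocked Gibbs step
    gW = np.outer(x_pos, x_pos) - np.outer(x_neg, x_neg)
    np.fill_diagonal(gW, 0.0)
    W += lr * gW
    b += lr * (x_pos - x_neg)
    return W, b

# Toy usage: d = 6 binary variables, three blocks of size 2, random "data".
d = 6
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
X = rng.integers(0, 2, size=(100, d)).astype(float)
W, b = np.zeros((d, d)), np.zeros(d)
for _ in range(1000):
    W, b = bcd_step(X, W, b, blocks)

Exact enumeration inside a block is what makes the cost exponential in the block size, which is the limitation Section 5 addresses with tree-structured blocks.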
7. Conclusions
• Blocked contrastive divergence (which combines CL and CD) is computationally efficient and accurate, especially when there are strong dependencies between blocks of variables.
• Composite likelihoods allow one to trade off computation for accuracy; tree-structured blocks allow for enhanced efficiency.
• Come to ICML 2010 to see our paper on CD + particle filtering!

[Figure: spectrum of algorithms, from MPLE (CD-1) through MCLE (BCD, our contribution) to MLE ("CD-∞").]
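To illustrate the forward-backward sampling mentioned in Section 5, here is a hypothetical sketch (assumed names and potentials, not the authors' code) that draws an exact blocked sample for a chain-structured block of K discrete variables in time linear in K. A chain is the simplest tree, and the unary and pairwise log-potentials are assumed to already absorb the conditioning on variables outside the block.

# Forward-filtering / backward-sampling on a chain-structured block (sketch).
# unary[t, v] and pair[t, v, v'] are log-potentials; pair[t] couples t and t+1.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

def sample_chain_block(unary, pair):
    # Exact sample from p(x_0, ..., x_{K-1}) ∝ exp(sum of log-potentials)
    # on a chain, in O(K * S^2) time.
    K, S = unary.shape
    alpha = np.zeros((K, S))
    alpha[0] = unary[0]
    for t in range(1, K):                           # forward filtering
        alpha[t] = unary[t] + logsumexp(alpha[t - 1][:, None] + pair[t - 1], axis=0)
    x = np.empty(K, dtype=int)
    p = np.exp(alpha[-1] - logsumexp(alpha[-1]))
    x[-1] = rng.choice(S, p=p)                      # sample the last variable
    for t in range(K - 2, -1, -1):                  # backward sampling
        logits = alpha[t] + pair[t][:, x[t + 1]]
        p = np.exp(logits - logsumexp(logits))
        x[t] = rng.choice(S, p=p)
    return x

# Toy usage: a block of 10 binary variables with random potentials.
K, S = 10, 2
unary = rng.normal(size=(K, S))
pair = rng.normal(size=(K - 1, S, S))
print(sample_chain_block(unary, pair))

The same two-pass idea extends to general tree-structured blocks by passing messages from the leaves to a root and then sampling back down, which keeps the per-block cost linear in the number of variables in the block.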