Fast and Accurate Inference for Topic Models James Foulds University of California, Santa Cruz Presented at eBay Research Labs
Motivation • There is an ever-increasing wealth of digital information available • Wikipedia • News articles • Scientific articles • Literature • Debates • Blogs, social media … • We would like automatic methods to help us understand this content
Motivation • Personalized recommender systems • Social network analysis • Exploratory tools for scientists • The digital humanities • …
Dimensionality reduction
The quick brown fox jumps over the sly lazy dog
As word IDs: [5 6 37 1 4 30 9 22 570 12]
As topic proportions: Foxes, Dogs, Jumping [40% 40% 20%]
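To make the two representations concrete, here is a minimal Python sketch; the vocabulary, the word IDs it produces, and the topic names are illustrative assumptions, not the talk's actual data.

```python
# Minimal sketch: a document as a high-dimensional list of word IDs,
# versus a low-dimensional vector of topic proportions.
# The vocabulary and topic names are made up for illustration.
sentence = "the quick brown fox jumps over the sly lazy dog"

# High-dimensional representation: one vocabulary ID per token.
vocab = {w: i for i, w in enumerate(sorted(set(sentence.split())))}
word_ids = [vocab[w] for w in sentence.split()]
print(word_ids)  # ten IDs, one per word

# Low-dimensional representation: proportions over a few latent topics.
topic_proportions = {"foxes": 0.4, "dogs": 0.4, "jumping": 0.2}
print(topic_proportions)  # sums to one
```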
Latent Variable Models
[Diagram: latent variables Z and parameters Φ generating the observed data X, with a plate over the data points]
dimensionality(X) >> dimensionality(Z)
Z is a bottleneck, which finds a compressed, low-dimensional representation of X
Latent Feature Models for Social Networks
[Diagram: Alice, Bob, and Claire in a small social network, each annotated with binary latent features such as Tango, Salsa, Cycling, Fishing, Running, and Waltz]
Miller, Griffiths, Jordan (2009): Latent Feature Relational Model
[Diagram: the network above encoded as a binary feature matrix Z, with one row per person (Alice, Bob, Claire) and one column per feature (Tango, Salsa, Cycling, Fishing, Running, Waltz)]
Latent Representations • Binary latent feature • Latent class • Mixed membership
Miller, Griffiths, Jordan (2009): Latent Feature Relational Model
E[Y] = σ(ZWZᵀ), where Z is the binary feature matrix above, W is a matrix of feature-interaction weights, and σ is the logistic function
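As a concrete illustration of E[Y] = σ(ZWZᵀ), here is a minimal numpy sketch; the feature assignments and random weights are assumptions for demonstration, not parameters from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Rows: Alice, Bob, Claire. Columns: hypothetical binary features
# (Tango, Salsa, Cycling, Fishing, Running, Waltz).
Z = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 1]])

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 6))  # feature-interaction weights

# Expected adjacency matrix: entry (i, j) is the probability of a link
# between person i and person j.
link_probs = sigmoid(Z @ W @ Z.T)
print(link_probs)
```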
Topics
[Figure: three example topics, each shown by its top 10 words: Topic 1 (reinforcement learning), Topic 2 (learning algorithms), Topic 3 (character recognition)]
Each topic is a distribution over all words in the dictionary: a vector of discrete probabilities that sums to one
Latent Dirichlet Allocation (Blei et al., 2003)
• For each topic k
• Draw its distribution over words φ(k) ~ Dirichlet(β)
• For each document d
• Draw its topic proportions θ(d) ~ Dirichlet(α)
• For each word wd,n
• Draw a topic assignment zd,n ~ Discrete(θ(d))
• Draw a word from the chosen topic wd,n ~ Discrete(φ(zd,n))
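The generative process translates almost line for line into code. A minimal sketch, with corpus sizes and hyperparameters chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 3, 1000, 5, 50   # topics, vocabulary, documents, words per doc
alpha, beta = 0.1, 0.01       # Dirichlet hyperparameters

# For each topic k: draw its distribution over words phi(k) ~ Dirichlet(beta).
phi = rng.dirichlet(np.full(V, beta), size=K)

docs = []
for d in range(D):
    # For each document d: draw topic proportions theta(d) ~ Dirichlet(alpha).
    theta = rng.dirichlet(np.full(K, alpha))
    words = []
    for n in range(N):
        z = rng.choice(K, p=theta)             # topic assignment z_{d,n}
        words.append(rng.choice(V, p=phi[z]))  # word w_{d,n} from topic z
    docs.append(words)
```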
LDA as Matrix Factorization
[Diagram: the documents-by-words matrix of word probabilities factorizes as θ × φᵀ]
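Continuing the sketch above, stacking the per-document θ vectors row-wise against the topic matrix makes the factorization explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.dirichlet(np.full(3, 0.1), size=5)    # D x K topic proportions
phi = rng.dirichlet(np.full(1000, 0.01), size=3)  # K x V topics

# Each row of theta @ phi is one document's distribution over the vocabulary,
# so LDA factorizes the D x V word-probability matrix.
word_probs = theta @ phi
assert np.allclose(word_probs.sum(axis=1), 1.0)
```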
LDA on Wikipedia
[Plot: topics learned on Wikipedia against wall-clock time (10 mins, 1 hour, 6 hours, 12 hours); 1 full iteration = 3.5 days!]
[Plot: stochastic variational inference on the same time axis]
[Plot: stochastic collapsed variational inference on the same time axis]
Collapsed Inference for LDA (Griffiths and Steyvers, 2004)
• Marginalize out the parameters, and perform inference on the latent variables only
• Simpler and faster, with fewer update equations
• Better mixing for Gibbs sampling
Collapsed Inference for LDA (Griffiths and Steyvers, 2004)
• Collapsed Gibbs sampler: resample each topic assignment zd,n from its conditional distribution

$P(z_{d,n} = k \mid \mathbf{z}^{\neg dn}, \mathbf{w}) \propto \frac{n^{\neg dn}_{k, w_{d,n}} + \beta}{n^{\neg dn}_{k} + V\beta} \left( n^{\neg dn}_{d,k} + \alpha \right)$

where n_{k,w} are the word-topic counts, n_{d,k} the document-topic counts, n_k the topic counts, and ¬dn means the current token is excluded from the counts
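A minimal sketch of one sweep of this sampler, assuming docs is a list of word-ID lists; the count arrays mirror the terms in the equation, and the variable names are mine, not the original implementation's.

```python
import numpy as np

def gibbs_sweep(docs, z, n_kw, n_dk, n_k, alpha, beta, rng):
    """One pass of collapsed Gibbs sampling for LDA.

    z[d][n] is the current topic of token n in document d;
    n_kw, n_dk, n_k are the word-topic, document-topic, and topic counts.
    """
    K, V = n_kw.shape
    for d, words in enumerate(docs):
        for n, w in enumerate(words):
            k = z[d][n]
            # Remove the current assignment from the counts.
            n_kw[k, w] -= 1
            n_dk[d, k] -= 1
            n_k[k] -= 1
            # Conditional probability of each topic for this token.
            p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
            k = rng.choice(K, p=p / p.sum())
            # Add the new assignment back into the counts.
            n_kw[k, w] += 1
            n_dk[d, k] += 1
            n_k[k] += 1
            z[d][n] = k
```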
Stochastic Optimization for ML Stochastic algorithms • While (not converged) • Process a subset of the dataset, to estimate the update • Update parameters
Stochastic Optimization for ML • Stochastic gradient descent • Estimate the gradient • Stochastic variational inference (Hoffman et al., 2010, 2013) • Estimate the natural gradient of the variational parameters • Online EM (Cappé and Moulines, 2009) • Estimate E-step sufficient statistics
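All three methods instantiate the same loop. A minimal sketch of the recipe as plain stochastic gradient descent; grad_fn, data, and the fixed step size are placeholder assumptions:

```python
import numpy as np

def stochastic_optimize(params, data, grad_fn,
                        step_size=0.01, batch_size=64, n_steps=1000, seed=0):
    """Generic stochastic loop: estimate an update from a subset, then apply it."""
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        # Process a subset of the dataset to estimate the update...
        batch = data[rng.choice(len(data), size=batch_size)]
        # ...then update the parameters with the noisy estimate.
        params = params - step_size * grad_fn(params, batch)
    return params
```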
Goal: Build a Fast, Accurate, Scalable Algorithm for LDA • Collapsed LDA • Easy to implement • Fast • Accurate • Mixes well / propagates information quickly • Stochastic algorithms • Scalable • Quickly forgets the random initialization • Memory requirements and update time independent of the size of the data set • Can estimate topics before a single pass over the data is complete • Our contribution: an algorithm that gets the best of both worlds
Variational Bayesian Inference • An optimization strategy for performing posterior inference, i.e. estimating Pr(Z|X)
[Diagram: a family of tractable distributions Q fit to the intractable posterior P by minimizing KL(Q || P)]
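The picture corresponds to the standard variational identity: the log evidence splits into the evidence lower bound (ELBO) plus the KL divergence, so maximizing the ELBO over Q minimizes KL(Q || P).

```latex
\log p(X)
  = \underbrace{\mathbb{E}_{q(Z)}\!\left[\log \frac{p(X, Z)}{q(Z)}\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\bigl(q(Z) \,\|\, p(Z \mid X)\bigr)
```

Since log p(X) does not depend on q, the two terms trade off exactly: a tighter bound means a smaller divergence.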