Correlated Topic Models
By Blei and Lafferty (NIPS 2005)
Presented by Chunping Wang, ECE, Duke University, August 4th, 2006
Outline • Introduction • Latent Dirichlet Allocation (LDA) • Correlated Topic Models (CTM) • Experimental Results • Conclusions
Introduction (1) Topic models: generative probabilistic models that use a small number of distributions over a vocabulary to describe text collections and other discrete data (such as images). Typically, latent variables are introduced to capture abstract notions such as topics. Applications: document modeling, text classification, image processing, collaborative filtering, etc. Latent Dirichlet Allocation (LDA): allows each document to exhibit multiple topics, but ignores the correlation between topics. Correlated Topic Models (CTM): builds on LDA and addresses this limitation.
Introduction (2) Notation and terminology (text collections) • Word: the basic unit, drawn from a vocabulary of V distinct words. The vth word is represented by a V-dimensional unit-basis vector $w$ such that $w^v = 1$ and $w^u = 0$ for $u \neq v$. • Document: a sequence of N words. • Corpus: a collection of M documents. Assumptions: • The words in a document are exchangeable; • Documents are also exchangeable. A toy encoding of this notation is sketched below.
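A minimal Python sketch of the notation (the vocabulary size and word indices are illustrative assumptions, not from the slides):

```python
import numpy as np

V = 5                                   # vocabulary of V distinct words

def one_hot(v, V=V):
    """The v-th word: a V-dimensional unit-basis vector with a 1 in position v."""
    w = np.zeros(V)
    w[v] = 1.0
    return w

document = [one_hot(v) for v in [0, 3, 3, 1]]   # a document: a sequence of N = 4 words
corpus = [document, [one_hot(2), one_hot(4)]]   # a corpus: a collection of M = 2 documents
print(document[1])                              # [0. 0. 0. 1. 0.]
```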
Latent Dirichlet Allocation (LDA) (1) The document length N is a fixed known parameter; $\alpha$ (the Dirichlet prior) and $\beta$ (the topic–word distributions) are fixed unknown parameters; $\theta$, z, and w are random variables (only w is observable). Generative process for each document W in a corpus D: 1. Choose $\theta \sim \mathrm{Dir}(\alpha)$ 2. For each of the N words $w_n$: (a) Choose a topic index $z_n \sim \mathrm{Mult}(\theta)$ (b) Choose a word $w_n \sim p(w_n \mid z_n, \beta)$ $\theta$ is a document-level variable; z and w are word-level variables. A generative sketch of this process follows.
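A minimal numpy sketch of the LDA generative process (K, V, N, $\alpha$, and $\beta$ are toy values of my own choosing, not estimates from data):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 10, 8                        # topics, vocabulary size, words per document
alpha = np.full(K, 0.5)                   # toy Dirichlet prior over topic proportions
beta = rng.dirichlet(np.ones(V), size=K)  # each row: a toy topic's distribution over words

theta = rng.dirichlet(alpha)              # 1. theta ~ Dir(alpha), drawn once per document
words = []
for _ in range(N):                        # 2. for each of the N words
    z = rng.choice(K, p=theta)            #    (a) topic index z_n ~ Mult(theta)
    w = rng.choice(V, p=beta[z])          #    (b) word w_n ~ Mult(beta_{z_n})
    words.append(w)
print(theta.round(2), words)
```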
Latent Dirichlet Allocation (LDA) (2) • Pros: • The Dirichlet distribution is in the exponential family and conjugate to the multinomial distribution --- variational inference is tractable. • $\theta$ is document-specific, so the variational parameters of $\theta$ can be regarded as the representation of a document --- the feature set is reduced. • The topic indices $z_n$ are sampled repeatedly within a document --- one document can be associated with multiple topics. • Cons: • Because of the independence assumption implicit in the Dirichlet distribution, LDA is unable to capture the correlation between different topics.
Correlated Topic Models (CTM) (1) Key point: the topic proportions are drawn from a logistic normal distribution rather than a Dirichlet distribution. Definition of the logistic normal distribution: let $\mathbb{R}^k$ denote k-dimensional real space and $\mathcal{S}_{k-1}$ the (k−1)-dimensional positive simplex defined by $\mathcal{S}_{k-1} = \{\theta : \theta_i > 0,\ \sum_{i=1}^{k}\theta_i = 1\}$. Suppose that $\eta = (\eta_1, \dots, \eta_k)$ follows a multivariate normal distribution over $\mathbb{R}^k$. The logistic transformation $\theta_i = \exp(\eta_i)/\sum_{j}\exp(\eta_j)$ from $\mathbb{R}^k$ to $\mathcal{S}_{k-1}$ can be used to define a logistic normal distribution over $\mathcal{S}_{k-1}$ (a sketch of the transformation follows).
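A minimal sketch of the logistic transformation, assuming the softmax parameterization above ($\mu$ and $\Sigma$ are toy values; the off-diagonal entries of $\Sigma$ are what encode correlation between topics):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])      # off-diagonals: correlation between topics

eta = rng.multivariate_normal(mu, Sigma) # eta ~ N(mu, Sigma) in R^k
theta = np.exp(eta - eta.max())          # subtract max for numerical stability
theta /= theta.sum()                     # logistic transformation onto the simplex
print(theta, theta.sum())                # a point on the simplex, summing to 1
```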
Correlated Topic Models (CTM) (2) The density function of $\theta$: writing $\log(\theta_{-k}/\theta_k) = (\log(\theta_1/\theta_k), \dots, \log(\theta_{k-1}/\theta_k))$ for the inverse (log-ratio) transformation, the logistic normal density is
$$p(\theta \mid \mu, \Sigma) = |2\pi\Sigma|^{-1/2}\Big(\prod_{i=1}^{k}\theta_i\Big)^{-1}\exp\Big\{-\tfrac{1}{2}\big(\log(\theta_{-k}/\theta_k)-\mu\big)^{\top}\Sigma^{-1}\big(\log(\theta_{-k}/\theta_k)-\mu\big)\Big\}$$
(Aitchison and Shen, 1980). The logistic normal distribution is defined over the same simplex as the Dirichlet distribution, and it allows correlation between components.
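To make the correlation claim concrete, here is a small numerical check under assumed toy parameters: with a positive off-diagonal covariance, a logistic normal can yield positively correlated simplex components, whereas for any Dirichlet the components are pairwise negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.zeros(3)
Sigma = np.array([[ 1.0,  0.9, -0.5],
                  [ 0.9,  1.0, -0.5],
                  [-0.5, -0.5,  1.0]])   # components 1 and 2 move together

eta = rng.multivariate_normal(mu, Sigma, size=5000)
theta = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
print(np.corrcoef(theta[:, 0], theta[:, 1])[0, 1])    # positive here

dirich = rng.dirichlet(np.ones(3), size=5000)
print(np.corrcoef(dirich[:, 0], dirich[:, 1])[0, 1])  # negative (-0.5 for Dir(1,1,1))
```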
Correlated Topic Models (CTM) (3) Generative process for each document W in a corpus D: 1. Choose $\eta \sim \mathcal{N}(\mu, \Sigma)$ 2. For each of the N words $w_n$: (a) Choose a topic $z_n \sim \mathrm{Mult}(f(\eta))$, where $f(\eta)_i = \exp(\eta_i)/\sum_j \exp(\eta_j)$ (b) Choose a word $w_n \sim \mathrm{Mult}(\beta_{z_n})$ A numpy sketch of this process follows.
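The same sampler as the LDA sketch above, with the Dirichlet draw replaced by a logistic normal draw (again with toy $\mu$, $\Sigma$, $\beta$ of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(3)
K, V, N = 3, 10, 8
mu, Sigma = np.zeros(K), np.eye(K)
beta = rng.dirichlet(np.ones(V), size=K)   # toy topic-word distributions

eta = rng.multivariate_normal(mu, Sigma)   # 1. eta ~ N(mu, Sigma)
theta = np.exp(eta) / np.exp(eta).sum()    #    theta = f(eta): map onto the simplex
words = []
for _ in range(N):                         # 2. for each of the N words
    z = rng.choice(K, p=theta)             #    (a) z_n ~ Mult(f(eta))
    w = rng.choice(V, p=beta[z])           #    (b) w_n ~ Mult(beta_{z_n})
    words.append(w)
print(words)
```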
Correlated Topic Models (CTM) (4) Posterior inference (for $\eta$ and z in each document) – variational inference with a fully factorized variational distribution
$$q(\eta, z \mid \lambda, \nu^2, \phi) = \prod_{i=1}^{K} q(\eta_i \mid \lambda_i, \nu_i^2)\ \prod_{n=1}^{N} q(z_n \mid \phi_n),$$
where each $q(\eta_i \mid \lambda_i, \nu_i^2)$ is a univariate Gaussian and each $q(z_n \mid \phi_n)$ is a multinomial. Difficulty: the logistic normal is not conjugate to the multinomial, so the expected log normalizer $\mathrm{E}_q[\log\sum_i \exp(\eta_i)]$ has no closed form. Solution: bound it with a Taylor expansion of the concave log function, which yields a tractable lower bound on the likelihood (worked out below).
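A worked version of the bound, following the construction above ($\zeta > 0$ is the extra variational parameter introduced by the approximation). The first-order Taylor expansion of the concave log at $\zeta$ is an upper bound,
$$\log x \;\le\; \zeta^{-1}x - 1 + \log\zeta \qquad \text{for all } x, \zeta > 0,$$
and taking expectations under $q$, with the Gaussian moment $\mathrm{E}_q[e^{\eta_i}] = e^{\lambda_i + \nu_i^2/2}$,
$$\mathrm{E}_q\Big[\log\sum_i e^{\eta_i}\Big] \;\le\; \zeta^{-1}\sum_i e^{\lambda_i + \nu_i^2/2} - 1 + \log\zeta.$$
Substituting this upper bound on the log normalizer gives a tractable lower bound on the likelihood, which is then tightened by minimizing over $\zeta$.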
Correlated Topic Models (CTM) (5) Parameter estimation (for $\beta$, $\mu$, and $\Sigma$) – maximizing the likelihood of the entire corpus of documents via variational EM: 1. (E-step) For each document, maximize the lower bound with respect to the variational parameters $\lambda$, $\nu^2$, $\phi$, and $\zeta$; 2. (M-step) Maximize the lower bound of the likelihood of the entire corpus with respect to the model parameters $\beta$, $\mu$, and $\Sigma$ (a sketch of these updates follows).
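A hedged numpy sketch of the M-step updates under the variational family above (array shapes and names are my own conventions, not the paper's notation): the Gaussian parameters are fit by maximum likelihood to the variational means and variances, and $\beta$ is fit to the expected word–topic counts.

```python
import numpy as np

def m_step(lam, nu2, phi_counts):
    """lam, nu2: (D, K) variational means/variances of eta, one row per document.
    phi_counts: (K, V) expected word-topic counts, sum over documents and words
    of phi_dnk * w_dn^v. Returns updated (mu, Sigma, beta)."""
    mu = lam.mean(axis=0)                 # mean of the variational means
    diff = lam - mu
    Sigma = (diff.T @ diff + np.diag(nu2.sum(axis=0))) / lam.shape[0]
    beta = phi_counts / phi_counts.sum(axis=1, keepdims=True)  # normalize rows
    return mu, Sigma, beta

# Toy usage with random stand-in sufficient statistics:
rng = np.random.default_rng(4)
D, K, V = 6, 3, 10
mu, Sigma, beta = m_step(rng.normal(size=(D, K)),
                         rng.uniform(0.1, 1.0, size=(D, K)),
                         rng.uniform(size=(K, V)))
print(mu.shape, Sigma.shape, beta.shape)   # (3,) (3, 3) (3, 10)
```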
Experimental Results (1) Example: Modeling Science
Experimental Results (2) Comparison with LDA – Document modeling
Experimental Results (3) Comparison with LDA – Collaborative filtering To evaluate how well the models predict the remaining words after observing a portion of a document, we need a measure to compare $p(\mathbf{w}^{\text{new}} \mid \mathbf{w}^{\text{obs}})$ across models: the predictive perplexity, the exponentiated negative average log probability of the held-out words given the observed ones. Lower numbers denote more predictive power.
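A small sketch of this measure, assuming the predictive-perplexity form described above (the probabilities below are made-up illustrations):

```python
import numpy as np

def predictive_perplexity(log_probs):
    """log_probs: one log p(held-out word | observed words) per held-out word,
    pooled over all test documents. Lower perplexity = more predictive power."""
    log_probs = np.asarray(log_probs)
    return float(np.exp(-log_probs.mean()))

print(predictive_perplexity(np.log([0.02, 0.01, 0.05])))  # ~ 46.4
```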
Conclusions • The main contribution of this paper is that the CTM directly models correlation between topics via the logistic normal distribution. • At the same time, the nonconjugacy of the logistic normal distribution adds complexity to the variational inference procedure. • Like LDA, the CTM allows multiple topics for each document, and its variational parameters can serve as features of the document.
References: J. Aitchison and S. M. Shen. Logistic-normal distributions: some properties and uses. Biometrika, 67(2):261–272, 1980. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.