
Correlated Topic Models By Blei and Lafferty (NIPS 2005)

Presented by Chunping Wang, ECE, Duke University, August 4th, 2006.


Presentation Transcript


  1. Correlated Topic Models, by Blei and Lafferty (NIPS 2005). Presented by Chunping Wang, ECE, Duke University, August 4th, 2006

  2. Outline • Introduction • Latent Dirichlet Allocation (LDA) • Correlated Topic Models (CTM) • Experimental Results • Conclusions

  3. Introduction (1)
     Topic models: generative probabilistic models that use a small number of distributions over a vocabulary to describe text collections and other discrete data (such as images). Typically, latent variables are introduced to capture abstract notions such as topics.
     Applications: document modeling, text classification, image processing, collaborative filtering, etc.
     Latent Dirichlet Allocation (LDA): allows each document to exhibit multiple topics, but ignores the correlation between topics.
     Correlated Topic Models (CTM): build on LDA and address this limitation.

  4. Introduction (2)
     Notation and terminology (text collections)
     • Word: the basic unit, drawn from a vocabulary of size V (V distinct words). The vth word is represented by a V-dimensional unit vector $w$ with $w^v = 1$ and $w^u = 0$ for $u \neq v$.
     • Document: a sequence of N words (denoted W).
     • Corpus: a collection of M documents (denoted D).
     Assumptions:
     • The words in a document are exchangeable;
     • Documents are also exchangeable.
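The notation above can be made concrete with a small sketch. The vocabulary and document below are hypothetical toy data, and `one_hot` is an illustrative helper rather than anything taken from the slides.

```python
import numpy as np

def one_hot(v: int, V: int) -> np.ndarray:
    """Return the V-dimensional unit vector for vocabulary index v (0-based)."""
    w = np.zeros(V, dtype=int)
    w[v] = 1
    return w

# Hypothetical toy vocabulary and document (not from the slides).
vocab = ["gene", "dna", "neuron", "brain"]
doc = ["dna", "gene", "dna", "brain"]

index = {word: i for i, word in enumerate(vocab)}
W = np.stack([one_hot(index[t], len(vocab)) for t in doc])  # shape (N, V)

# Under word exchangeability the order of the rows of W carries no
# information, so the per-word counts summarize the document.
counts = W.sum(axis=0)
print(counts)
```

Shuffling `doc` changes `W` but leaves `counts` unchanged, which is what the exchangeability assumption amounts to in practice.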

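  5. Latent Dirichlet Allocation (LDA) (1)
     [Figure: graphical model of LDA. Legend: fixed known parameters; fixed unknown parameters; random variables (w is observed).]
     Generative process for each document W in a corpus D:
     • Choose $\theta \sim \mathrm{Dirichlet}(\alpha)$.
     • For each of the N words $w_n$:
       (a) Choose a topic index $z_n \sim \mathrm{Multinomial}(\theta)$;
       (b) Choose a word $w_n \sim p(w_n \mid z_n, \beta)$, a multinomial over the vocabulary conditioned on the topic.
     • The topic proportions $\theta$ are document-level variables; z and w are word-level variables.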
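The LDA generative process can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: `beta` is taken to be a K x V matrix whose rows are topic-word distributions, and the sizes K, V, N and the value of `alpha` are arbitrary choices, not values from the slides.

```python
import numpy as np

def sample_lda_document(alpha, beta, N, rng):
    """Draw one document from the LDA generative process.

    alpha: (K,) Dirichlet parameter; beta: (K, V) topic-word probabilities.
    """
    theta = rng.dirichlet(alpha)                 # document-level topic proportions
    z = rng.choice(len(alpha), size=N, p=theta)  # one topic index per word
    # Each word is drawn from the vocabulary distribution of its topic.
    w = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])
    return theta, z, w

rng = np.random.default_rng(0)
K, V, N = 3, 6, 20                               # hypothetical sizes
alpha = np.full(K, 0.5)
beta = rng.dirichlet(np.ones(V), size=K)         # random topics for illustration
theta, z, w = sample_lda_document(alpha, beta, N, rng)
```

Here `theta` is drawn once per document while `z` and `w` are drawn once per word, matching the document-level versus word-level distinction on the slide.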

  6. Latent Dirichlet Allocation (LDA) (2)
     Pros:
     • The Dirichlet distribution is in the exponential family and is conjugate to the multinomial distribution, so variational inference is tractable.
     • The topic proportions θ are document-specific, so the variational parameters of θ can be regarded as a representation of the document, which reduces the feature set.
     • The topic indices z are sampled repeatedly within a document, so one document can be associated with multiple topics.
     Cons:
     • Because of the independence assumption implicit in the Dirichlet distribution, LDA is unable to capture the correlation between different topics.

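  7. Correlated Topic Models (CTM) (1)
     Key point: the topic proportions are drawn from a logistic normal distribution rather than a Dirichlet distribution.
     Definition of the logistic normal distribution:
     Let $\mathbb{R}^{k}$ denote k-dimensional real space and $S^{k-1}$ the (k-1)-dimensional positive simplex defined by
     $S^{k-1} = \{\theta \in \mathbb{R}^{k} : \theta_i > 0,\ \sum_{i=1}^{k} \theta_i = 1\}$.
     Suppose that $\eta$ follows a multivariate normal distribution $N(\mu, \Sigma)$ over $\mathbb{R}^{k-1}$. The logistic transformation from $\mathbb{R}^{k-1}$ to $S^{k-1}$ can be used to define a logistic normal distribution over $S^{k-1}$.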

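  8. Correlated Topic Models (CTM) (2)
     Logistic transformation: $\theta_i = \dfrac{\exp(\eta_i)}{1 + \sum_{j=1}^{k-1} \exp(\eta_j)}$ for $i = 1, \dots, k-1$, and $\theta_k = \dfrac{1}{1 + \sum_{j=1}^{k-1} \exp(\eta_j)}$.
     Log ratio transformation: $\eta_i = \log(\theta_i / \theta_k)$ for $i = 1, \dots, k-1$.
     The density function of $\theta$, for $\eta \sim N(\mu, \Sigma)$ on $\mathbb{R}^{k-1}$: $p(\theta \mid \mu, \Sigma) = |2\pi\Sigma|^{-1/2} \left(\prod_{j=1}^{k} \theta_j\right)^{-1} \exp\left\{-\tfrac{1}{2} (\eta - \mu)^{\top} \Sigma^{-1} (\eta - \mu)\right\}$, where $\eta_i = \log(\theta_i / \theta_k)$.
     The logistic normal distribution is defined over the same simplex as the Dirichlet distribution, and it allows correlation between components.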
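A minimal sketch of the two transformations, assuming the parameterization in which η has k-1 components and the last topic acts as the reference. The covariance matrix is a hypothetical choice made to show that two components of θ can be positively correlated, something a Dirichlet cannot express.

```python
import numpy as np

def logistic(eta):
    """Map eta in R^{k-1} to theta on the simplex with k components."""
    e = np.exp(eta)
    denom = 1.0 + e.sum(axis=-1, keepdims=True)
    return np.concatenate([e / denom, 1.0 / denom], axis=-1)

def log_ratio(theta):
    """Inverse map: eta_i = log(theta_i / theta_k)."""
    return np.log(theta[..., :-1] / theta[..., -1:])

rng = np.random.default_rng(0)
mu = np.zeros(2)
# Hypothetical covariance: the first two topics tend to rise and fall together.
Sigma = np.array([[4.0, 3.8],
                  [3.8, 4.0]])
eta = rng.multivariate_normal(mu, Sigma, size=5000)
theta = logistic(eta)                            # shape (5000, 3)

# Sample correlation between the first two topic proportions.
corr = np.corrcoef(theta[:, 0], theta[:, 1])[0, 1]
print(round(corr, 2))
```

Under a Dirichlet, any two components are negatively correlated because they compete for the same unit of mass; here the off-diagonal entry of `Sigma` controls the sign and strength of the dependence.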

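  9. Correlated Topic Models (CTM) (3)
     Generative process for each document W in a corpus D:
     • Choose $\eta \sim N(\mu, \Sigma)$ and set $\theta$ to the logistic transformation of $\eta$.
     • For each of the N words $w_n$:
       (a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$;
       (b) Choose a word $w_n \sim p(w_n \mid z_n, \beta)$.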
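The CTM generative process differs from LDA only in how the topic proportions are drawn. The sketch below assumes η has K-1 components with the last topic as reference, and `beta` is a K x V matrix of topic-word probabilities; all sizes and parameter values are illustrative.

```python
import numpy as np

def sample_ctm_document(mu, Sigma, beta, N, rng):
    """Draw one document from the CTM generative process.

    mu: (K-1,), Sigma: (K-1, K-1), beta: (K, V) topic-word probabilities.
    """
    eta = rng.multivariate_normal(mu, Sigma)       # Gaussian draw in R^{K-1}
    e = np.exp(eta)
    theta = np.append(e, 1.0) / (1.0 + e.sum())    # logistic transformation
    z = rng.choice(len(theta), size=N, p=theta)    # one topic per word
    w = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])
    return eta, theta, z, w

rng = np.random.default_rng(1)
K, V, N = 3, 6, 20                                 # hypothetical sizes
mu = np.zeros(K - 1)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
beta = rng.dirichlet(np.ones(V), size=K)
eta, theta, z, w = sample_ctm_document(mu, Sigma, beta, N, rng)
```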

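  10. Correlated Topic Models (CTM) (4)
      Posterior inference (for η and z in each document) uses variational inference: a factorized variational distribution is fitted by maximizing a lower bound on the document's log likelihood.
      Difficulty: the logistic normal is not conjugate to the multinomial, so the expected log normalizer of the topic proportions has no closed form.
      Solution: because the logarithm is concave, a first-order Taylor expansion gives an upper bound on the log normalizer, which preserves a lower bound on the likelihood.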
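The equations on this slide were lost in the transcript. The block below is a sketch of the standard form of the bound, following the CTM paper's notation as closely as it can be recalled. The variational parameters λ, ν², φ and the auxiliary scalar ζ are assumed symbols that do not appear in the transcript, and η is taken in ℝ^{k-1} with the last topic as reference.

```latex
\begin{align*}
\log p(W \mid \mu, \Sigma, \beta)
  &\ge \mathbb{E}_q[\log p(\eta \mid \mu, \Sigma)]
     + \sum_{n=1}^{N} \mathbb{E}_q[\log p(z_n \mid \eta)]
     + \sum_{n=1}^{N} \mathbb{E}_q[\log p(w_n \mid z_n, \beta)]
     + H(q), \\
q(\eta, z \mid \lambda, \nu^2, \phi)
  &= \prod_{i=1}^{k-1} \mathcal{N}(\eta_i \mid \lambda_i, \nu_i^2)
     \prod_{n=1}^{N} q(z_n \mid \phi_n), \\
\mathbb{E}_q\Big[\log\Big(1 + \sum_{i=1}^{k-1} e^{\eta_i}\Big)\Big]
  &\le \zeta^{-1}\Big(1 + \sum_{i=1}^{k-1} e^{\lambda_i + \nu_i^2/2}\Big) - 1 + \log \zeta,
  \qquad \zeta > 0.
\end{align*}
```

The last line is the tangent-line inequality $\log x \le \zeta^{-1} x - 1 + \log \zeta$ applied inside the expectation, combined with $\mathbb{E}_q[e^{\eta_i}] = e^{\lambda_i + \nu_i^2/2}$ for a Gaussian $q$. Since the log normalizer enters $\mathbb{E}_q[\log p(z_n \mid \eta)]$ with a negative sign, bounding it from above keeps the overall expression a lower bound.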

  11. Correlated Topic Models (CTM) (5)
      Parameter estimation (for the model parameters μ, Σ and β): maximize the likelihood of the entire corpus of documents.
      Variational EM:
      1. (E-step) For each document, maximize the lower bound with respect to that document's variational parameters;
      2. (M-step) Maximize the lower bound on the likelihood of the entire corpus with respect to the model parameters β, μ and Σ.
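The M-step has closed-form updates once the E-step has produced per-document variational parameters. The sketch below assumes the notation of the CTM paper, which is not preserved in the transcript: `lam` and `nu2` are the variational means and variances of η, and `phi` holds each word's topic responsibilities. The function names are illustrative.

```python
import numpy as np

def m_step_gaussian(lam, nu2):
    """Update mu and Sigma from variational means and variances.

    lam, nu2: arrays of shape (M, d), one row per document.
    """
    M = lam.shape[0]
    mu = lam.mean(axis=0)
    centered = lam - mu
    # Average of diag(nu2_d) plus the outer product of the centered means.
    Sigma = (np.diag(nu2.sum(axis=0)) + centered.T @ centered) / M
    return mu, Sigma

def m_step_topics(phi_list, docs, V):
    """Update beta: beta[k, v] is proportional to the sum of phi over words equal to v.

    phi_list[d]: (N_d, K) responsibilities; docs[d]: (N_d,) word indices.
    Assumes every topic receives some mass, otherwise a row would be 0/0.
    """
    K = phi_list[0].shape[1]
    counts = np.zeros((V, K))
    for phi, w in zip(phi_list, docs):
        np.add.at(counts, w, phi)                # counts[w_n, :] += phi[n, :]
    beta = counts.T
    return beta / beta.sum(axis=1, keepdims=True)

# Tiny hypothetical inputs to exercise the updates.
lam = np.array([[0.0, 0.0], [2.0, 2.0]])
nu2 = np.ones((2, 2))
mu, Sigma = m_step_gaussian(lam, nu2)

docs = [np.array([0, 1, 1]), np.array([2, 0])]
phi_list = [np.full((3, 2), 0.5), np.full((2, 2), 0.5)]
beta = m_step_topics(phi_list, docs, V=3)
```

A full variational EM loop would alternate these updates with a per-document E-step until the corpus-level bound stops improving; the E-step is omitted here because it requires numerical optimization.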

  12. Experimental Results (1) Example: Modeling Science

  13. Experimental Results (2) Comparison with LDA - Document modeling

  14. Experimental Results (3) Comparison with LDA: collaborative filtering. To evaluate how well the models predict the remaining words of a document after observing a portion of it, we need a measure to compare their predictive distributions (the paper uses perplexity). Lower numbers denote more predictive power.
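Assuming the measure is predictive perplexity, it can be computed from the log probabilities a model assigns to the held-out words of each document. The function below is an illustrative sketch; how those log probabilities are obtained from a fitted LDA or CTM model is outside its scope.

```python
import numpy as np

def perplexity(log_probs_per_doc):
    """Exponentiated negative mean log probability of held-out words.

    log_probs_per_doc: list of 1-D arrays; entry d holds log p(w_i | observed
    part of document d) for each held-out word w_i.
    """
    total_log_prob = sum(float(np.sum(lp)) for lp in log_probs_per_doc)
    total_words = sum(len(lp) for lp in log_probs_per_doc)
    return float(np.exp(-total_log_prob / total_words))

# Sanity check: a model that is uniform over 10 words has perplexity 10.
uniform = [np.log(np.full(5, 0.1)), np.log(np.full(3, 0.1))]
print(perplexity(uniform))
```

Perplexity can be read as the effective number of equally likely words the model is choosing among, which is why lower values indicate better prediction.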

  15. Conclusions
      • The main contribution of this paper is that the CTM directly models correlation between topics via the logistic normal distribution.
      • At the same time, the nonconjugacy of the logistic normal distribution adds complexity to the variational inference process.
      • Like LDA, the CTM allows multiple topics for each document, and its variational parameters can serve as features of the document.

  16. References
      J. Aitchison and S. M. Shen. Logistic-normal distributions: some properties and uses. Biometrika, 67(2):261-272, 1980.
      D. Blei, A. Ng and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
