Nonparametric Bayes Pachinko Allocation, by Li, Blei and McCallum (UAI 2007). Presented by Lihan He, ECE, Duke University, March 3rd, 2008.
Outline • Review of Topic Models (LDA, CTM) • Pachinko Allocation (PAM) • Nonparametric Pachinko Allocation • Experimental Results • Conclusions
Review of Topic Models – Notation
Notation and terminology:
• Word: the basic unit, from a vocabulary of size V (containing V distinct words). The vth word is represented by a V-dimensional unit-basis vector whose vth component equals 1.
• Document: a sequence of N words.
• Corpus: a collection of M documents.
• Topic: a multinomial distribution over words.
Assumptions:
• The words in a document are exchangeable;
• Documents are also exchangeable.
Review of Topic Models – Latent Dirichlet Allocation (LDA)
[Graphical model in plate notation, with plates over the N words and M documents; legend distinguishes fixed known parameters, fixed unknown parameters, and random variables (w are observable).]
Generative process for each document W in a corpus D:
• Choose topic proportions θ ~ Dir(α)
• For each of the N words in the document W:
(a) Choose a topic zn ~ Multinomial(θ)
(b) Choose a word wn ~ p(wn | zn, β), the word distribution of topic zn
θ is a document-level variable; z and w are word-level variables.
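The LDA generative process above can be sketched in a few lines of numpy; this is an illustrative toy (variable names α, β, θ, z follow the slides; the specific k = 2, V = 4 values are made up for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lda_document(alpha, beta, N):
    """Sample one document from the LDA generative process.

    alpha: Dirichlet parameter of length k (number of topics)
    beta:  k x V matrix; row z is topic z's distribution over words
    N:     number of words in the document
    """
    theta = rng.dirichlet(alpha)                  # document-level topic proportions
    words = []
    for _ in range(N):
        z = rng.choice(len(alpha), p=theta)       # word-level topic assignment
        w = rng.choice(beta.shape[1], p=beta[z])  # word drawn from topic z
        words.append(int(w))
    return words

# toy run: k = 2 topics over a vocabulary of V = 4 words
beta = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
doc = sample_lda_document(alpha=[1.0, 1.0], beta=beta, N=10)
```

Each sampled document is just a list of word indices; exchangeability holds because the word order never enters the probability of the sample.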
Review of Topic Models – Latent Dirichlet Allocation (LDA)
Limitations:
1. Because of the independence assumption implicit in the Dirichlet distribution, LDA is unable to capture correlations between different topics.
2. The number of topics k must be selected manually.
3. The computational cost of posterior inference is usually very large; exact computation of the posterior is intractable.
Review of Topic Models – Correlated Topic Models (CTM)
[Graphical model in plate notation, with plates over the N words and M documents.]
Key point: the topic proportions are drawn from a logistic-normal distribution rather than a Dirichlet distribution.
Generative process for each document W in a corpus D:
• Choose η ~ N(μ, Σ), and set θ = f(η), where f(η) = exp(η) / Σi exp(ηi) maps η onto the simplex
• For each of the N words in the document W:
(a) Choose a topic zn ~ Multinomial(θ)
(b) Choose a word wn ~ p(wn | zn, β)
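The only change from the LDA sketch is how θ is produced: a Gaussian draw pushed through a softmax, so the covariance Σ can encode topic correlations. A minimal numpy sketch (toy μ, Σ, β values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ctm_document(mu, sigma, beta, N):
    """CTM: topic proportions come from a logistic-normal, not a Dirichlet."""
    eta = rng.multivariate_normal(mu, sigma)   # eta ~ N(mu, Sigma)
    theta = np.exp(eta) / np.exp(eta).sum()    # softmax maps eta to the simplex
    words = []
    for _ in range(N):
        z = rng.choice(len(mu), p=theta)       # topic assignment
        w = rng.choice(beta.shape[1], p=beta[z])
        words.append(int(w))
    return theta, words

# toy run: 2 correlated topics over a vocabulary of 4 words
mu, sigma = np.zeros(2), np.eye(2)
beta = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
theta, words = sample_ctm_document(mu, sigma, beta, N=8)
```

Off-diagonal entries in Σ make some pairs of topics tend to co-occur, which the Dirichlet cannot express.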
Review of Topic Models – Correlated Topic Models (CTM)
Limitations:
1. Limited to pairwise correlations between topics, and the number of parameters in the covariance matrix grows as the square of the number of topics.
2. The number of topics k must still be selected manually.
Pachinko Allocation Model (PAM)
In PAM, the concept of a topic is extended to a distribution not only over words (as in LDA and CTM), but also over other topics. The structure of PAM is extremely flexible.
Pachinko: a Japanese game in which metal balls bounce down through a complex collection of pins until they land in various bins at the bottom.
Pachinko Allocation Model (PAM) – Four-level PAM
[Graphical model of four-level PAM: a root, S super-topics, sub-topics, and words, with plates over the N words and M documents; legend distinguishes fixed known parameters, fixed unknown parameters, and random variables; θr gives the mixing weights for super-topics and θs the mixing weights for sub-topics.]
Generative process for each document W in a corpus D:
• Choose super-topic mixing weights θr ~ Dir(αr)
• For each of the S super-topics s, choose sub-topic mixing weights θs ~ Dir(αs)
• For each of the N words in the document W:
(a) Choose a super-topic zr ~ Multinomial(θr)
(b) Choose a sub-topic z ~ Multinomial(θzr)
(c) Choose a word w ~ p(w | z, β)
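The extra layer of the four-level PAM shows up as one additional sampling step per word. A numpy sketch under toy dimensions (S = 2 super-topics, k = 3 sub-topics, V = 4 words; all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pam_document(alpha_r, alpha, beta, N):
    """Four-level PAM: root -> S super-topics -> k sub-topics -> words.

    alpha_r: length-S Dirichlet parameter for the super-topic weights
    alpha:   S x k matrix; row s is the Dirichlet parameter over
             sub-topics for super-topic s
    beta:    k x V matrix of sub-topic word distributions
    """
    theta_r = rng.dirichlet(alpha_r)                     # super-topic mixing weights
    theta = np.array([rng.dirichlet(a) for a in alpha])  # per-super-topic sub-topic weights
    words = []
    for _ in range(N):
        s = rng.choice(len(alpha_r), p=theta_r)          # choose a super-topic
        z = rng.choice(alpha.shape[1], p=theta[s])       # choose a sub-topic
        w = rng.choice(beta.shape[1], p=beta[z])         # choose a word
        words.append(int(w))
    return words

beta = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.1, 0.7, 0.1, 0.1],
                 [0.1, 0.1, 0.7, 0.1]])
doc = sample_pam_document(np.ones(2), np.ones((2, 3)), beta, N=10)
```

Because each super-topic carries its own Dirichlet over sub-topics, correlations between sub-topics are captured by which super-topics favor them together.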
Pachinko Allocation Model (PAM)
Advantage: captures correlations between topics through a super-topic layer.
Limitation: the number of super-topics S and the number of sub-topics k must be selected manually.
Nonparametric Pachinko Allocation
• Assumes an HDP-based prior for PAM
• Based on a five-level hierarchical Chinese restaurant process
• Automatically decides the number of super-topics S and the number of sub-topics k
Chinese restaurant process (with concentration parameter γ, after n customers have been seated):
P(a new customer sits at an occupied table t) = nt / (n + γ), where nt is the number of customers already at table t;
P(a new customer sits at an unoccupied table) = γ / (n + γ).
This process is denoted as CRP(γ).
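The two CRP seating probabilities can be simulated directly; this short numpy sketch seats customers one at a time (the function name and toy settings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_seating(n_customers, gamma):
    """Seat customers one by one under a CRP with concentration gamma.

    Customer i (0-indexed) joins occupied table t with probability
    n_t / (i + gamma) and opens a new table with probability
    gamma / (i + gamma).
    """
    counts = []  # customers per occupied table
    for i in range(n_customers):
        probs = np.array(counts + [gamma], dtype=float)
        probs /= i + gamma               # normalizer: i customers seated so far
        t = rng.choice(len(probs), p=probs)
        if t == len(counts):
            counts.append(1)             # a new table is opened
        else:
            counts[t] += 1
    return counts

counts = crp_seating(50, gamma=1.0)
```

The "rich get richer" effect is visible in the output: a few tables accumulate most customers, while the expected number of tables grows only logarithmically in n.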
Nonparametric Pachinko Allocation
Notation (restaurant metaphor ↔ model):
restaurant ↔ root; category ↔ super-topic; dish ↔ sub-topic; customer ↔ word
• There are infinitely many super-topics and sub-topics.
• Both super-topics (categories) and sub-topics (dishes) are globally shared among all documents.
• Sampling a super-topic involves a two-level CRP.
• Sampling a sub-topic involves a three-level CRP.
Nonparametric Pachinko Allocation
Generative process:
• A customer x arrives at restaurant rj.
• He chooses the kth entryway ejk in the restaurant from a CRP over the entryways of rj.
• If ejk is a new entryway, a category cl is associated with it, drawn from a global CRP over categories.
• After choosing the category, the customer decides which table he will sit at: he chooses table tjln from a CRP over the tables of that category.
• If the customer sits at an existing table, he shares the menu and dish with the other customers at the same table. Otherwise, he chooses a menu mlp for the new table from a CRP over menus.
• If the customer gets an existing menu, he eats the dish on the menu. Otherwise, he samples a dish dm for the new menu from a CRP over dishes, with new dishes drawn from the base distribution H.
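The two-level part of this process (entryways backed by a global CRP over categories) can be sketched as nested CRP draws. This is a simplified illustration, not the paper's full five-level sampler; names like `entryway_counts` and the γ = 1.0 settings are assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_draw(counts, gamma):
    """One CRP draw: return an existing index with prob n_t / (n + gamma),
    or len(counts) (a brand-new one) with prob gamma / (n + gamma)."""
    probs = np.array(counts + [gamma], dtype=float)
    return int(rng.choice(len(probs), p=probs / probs.sum()))

entryway_counts = []    # this document: customers per entryway
entryway_category = []  # this document: category attached to each entryway
category_counts = []    # global: how many entryways point at each category

for _ in range(20):     # 20 words (customers) in one document
    e = crp_draw(entryway_counts, gamma=1.0)
    if e == len(entryway_counts):            # new entryway: draw its category
        c = crp_draw(category_counts, gamma=1.0)
        if c == len(category_counts):
            category_counts.append(0)        # a brand-new category (super-topic)
        category_counts[c] += 1
        entryway_counts.append(0)
        entryway_category.append(c)
    entryway_counts[e] += 1
```

Because categories are drawn from a shared global CRP, different documents can reuse the same super-topics, while the number of super-topics remains unbounded.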
Nonparametric Pachinko Allocation – Graphical Model
[Graphical model in plate notation, with plates over the N words and M documents.]
• Model parameters: scalar concentration parameters and the base distribution H.
• Two-level clustering of indicator variables: the first level clusters using a 2-layer CRP, the second level using a 3-layer CRP.
• Atoms are all drawn from the base distribution H.
Experimental Results
Datasets:
• 20 newsgroup comp5 dataset: 5 different newsgroups, 4,836 documents, including 468,252 words and 35,567 unique words.
• Rexa dataset: a digital library of computer science. 5,000 randomly chosen documents, including 350,760 words and 25,597 unique words.
• NIPS dataset: 1,647 abstracts of NIPS papers from 1987–1999, including 114,142 words and 11,708 unique words.
Likelihood comparison: [figure comparing likelihoods across models]
Experimental Results Topic Examples 20 newsgroup comp5 dataset
Experimental Results Topic Examples NIPS dataset Nonparametric Bayes PAM discovers a sparse topic structure.
Conclusions
• A nonparametric Bayesian prior for pachinko allocation is presented, based on a variant of the hierarchical Dirichlet process.
• Nonparametric PAM automatically discovers topic correlations as well as determining the numbers of topics at different levels.
• The topic structure discovered by nonparametric PAM is usually sparse.
Appendix: Hierarchical Latent Dirichlet Allocation (hLDA)
[Graphical model in plate notation, with plates over the N words and M documents.]
Key difference from LDA: topics are organized in an L-level tree structure, instead of a k×V matrix. L is prespecified manually.
Generative process for each document W in a corpus D:
• Choose a path from the root of the topic tree to a leaf (via the nested Chinese restaurant process). The path includes L topics.
• Choose level proportions θ ~ Dir(α)
• For each of the N words in the document W:
(a) Choose a topic zn ~ Multinomial(θ), zn ∈ {1, …, L}
(b) Choose a word wn ~ Multinomial(βzn), where βzn is a V-dim vector: the multinomial parameter of the znth topic along the path from root to leaf chosen in step 1.
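Given a path, a document in hLDA only mixes the L topics on that path. A numpy sketch of the within-document sampling, assuming the path's topics have already been chosen (the L = 3, V = 4 toy values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hlda_document(path_betas, alpha, N):
    """hLDA: a document mixes only the L topics along one root-to-leaf path.

    path_betas: L x V matrix; row l is the word distribution of the topic
                at level l of the chosen path
    alpha:      length-L Dirichlet parameter over path levels
    """
    L = path_betas.shape[0]
    theta = rng.dirichlet(alpha)             # proportions over the L levels
    words = []
    for _ in range(N):
        z = rng.choice(L, p=theta)           # pick a level on the path
        w = rng.choice(path_betas.shape[1], p=path_betas[z])
        words.append(int(w))
    return words

path_betas = np.array([[0.7, 0.1, 0.1, 0.1],
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.7, 0.1]])
doc = sample_hlda_document(path_betas, alpha=np.ones(3), N=12)
```

The path itself is drawn from the nested CRP, so the tree's width grows with the data even though its depth L is fixed in advance.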
References:
• W. Li, D. M. Blei, and A. McCallum. Nonparametric Bayes pachinko allocation. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2007.
• W. Li and A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In Proceedings of the International Conference on Machine Learning (ICML), 2006.
• D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
• D. M. Blei and J. D. Lafferty. Correlated topic models. In Advances in Neural Information Processing Systems (NIPS), 2006.
• D. M. Blei, T. L. Griffiths, M. I. Jordan, and J. B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems (NIPS), 2004.
• J. Aitchison and S. M. Shen. Logistic-normal distributions: some properties and uses. Biometrika, 67(2):261-272, 1980.