Discriminative Topic Modeling based on Manifold Learning
Seungil Huh and Stephen E. Fienberg
SIGKDD'10, July 26, 2010
Overview
[Overview figure: manifold learning + topic modeling. Two example documents are mapped to two topics with their top words (Topic 1: drug 0.081, body 0.066, pain 0.064, …; Topic 2: pay 0.074, give 0.063, buy 0.061, …), illustrating the models compared in this talk: Laplacian pLSI (LapPLSI), the Locally-consistent Topic Model (LTM), and the proposed Discriminative Topic Model (DTM).]
Contents
• Background and Notation
  • Probabilistic Latent Semantic Analysis (pLSA)
  • Laplacian Eigenmaps (LE)
• Previous Models
• Discriminative Topic Model
• Experiments
Probabilistic Latent Semantic Analysis (pLSA)
Formulation and graphical representation
• N documents: {d1, d2, …, dN}
• M words: {w1, w2, …, wM}
• K topics: {z1, z2, …, zK}
[Graphical model: d → z → w, with plates over the M words and N documents]
Each document-word pair is modeled as $P(d, w) = P(d) \sum_{z} P(w \mid z)\, P(z \mid d)$, and MLE maximizes the log-likelihood
$\mathcal{L} = \sum_{d} \sum_{w} n(d, w) \log \sum_{z} P(w \mid z)\, P(z \mid d)$,
where n(d, w) is the count of word w in document d.
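To make the formulation concrete, here is a minimal NumPy sketch of pLSA fit by EM (function and variable names are my own, not from the paper):

```python
import numpy as np

def plsa_em(n_dw, K, iters=100, seed=0):
    """Fit pLSA by EM on a document-word count matrix n_dw of shape (N, M)."""
    rng = np.random.default_rng(seed)
    N, M = n_dw.shape
    # P(w|z): (K, M) and P(z|d): (N, K), randomly initialized, rows normalized
    p_w_z = rng.random((K, M)); p_w_z /= p_w_z.sum(1, keepdims=True)
    p_z_d = rng.random((N, K)); p_z_d /= p_z_d.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: posterior P(z|d,w) proportional to P(w|z) P(z|d)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (N, K, M)
        post /= post.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate from expected counts n(d,w) P(z|d,w)
        exp_counts = n_dw[:, None, :] * post              # (N, K, M)
        p_w_z = exp_counts.sum(0)
        p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
        p_z_d = exp_counts.sum(2)
        p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
    return p_w_z, p_z_d
```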
Probabilistic Latent Semantic Analysis (pLSA)
Geometric interpretation
• Documents live in the simplex spanned by the words.
• pLSA projects each document, under KL divergence, onto the sub-simplex spanned by the topics (Hofmann '99).
• Limitation: pLSA does not consider the manifold structure of the data.
Laplacian Eigenmaps (LE)
Intuition: local consistency. Documents that have similar word distributions tend to be near one another on the manifold, and so should have similar topic distributions.
Formulation: minimize $\sum_{i,j} \lVert x_i - x_j \rVert^2 W_{ij}$
• W: local similarity matrix over the nearest-neighbor graph
• x: low-rank representation (the topic distribution, in topic modeling)
with constraints that keep the distances between non-neighboring pairs.
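A small sketch of the LE ingredients used here: building the nearest-neighbor similarity matrix W and evaluating the local-consistency criterion. The binary kNN weighting is an illustrative assumption; other weightings (e.g., heat kernel) are common:

```python
import numpy as np

def knn_graph(X, k=10):
    """Symmetric binary k-nearest-neighbor similarity matrix W
    from row-wise document representations X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(sq, np.inf)                         # exclude self-neighbors
    W = np.zeros_like(sq)
    rows = np.arange(X.shape[0])[:, None]
    W[rows, np.argsort(sq, axis=1)[:, :k]] = 1.0
    return np.maximum(W, W.T)                            # symmetrize

def le_criterion(x, W):
    """LE objective sum_ij ||x_i - x_j||^2 W_ij, written as 2 tr(x^T L x)
    with the graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return 2.0 * np.trace(x.T @ L @ x)
```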
Regularized Topic Model
General form: $\mathcal{F} = \mathcal{L} - \lambda \mathcal{R}$
• $\mathcal{L}$: log-likelihood of pLSA
• $\mathcal{R}$: regularization term based on Laplacian Eigenmaps
• $\lambda$: regularization parameter
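A sketch of this combined objective, assuming the squared-Euclidean (LapPLSI-style) penalty on the topic distributions P(z|d); the choice of distance is exactly what varies across the models on the next slide:

```python
import numpy as np

def plsa_loglik(n_dw, p_w_z, p_z_d):
    """pLSA log-likelihood: sum_{d,w} n(d,w) log sum_z P(w|z) P(z|d)."""
    return (n_dw * np.log(p_z_d @ p_w_z + 1e-12)).sum()

def regularized_objective(n_dw, p_w_z, p_z_d, W, lam):
    """F = L - lambda * R, with R the LE penalty over neighboring pairs."""
    i, j = np.nonzero(W)  # symmetric W lists each pair twice, hence the 1/2
    R = 0.5 * (W[i, j] * ((p_z_d[i] - p_z_d[j]) ** 2).sum(axis=1)).sum()
    return plsa_loglik(n_dw, p_w_z, p_z_d) - lam * R
```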
Previous Models
• Laplacian pLSI (LapPLSI) (Cai '08): regularizer uses squared Euclidean distance between topic distributions
• Locally-consistent Topic Model (LTM) (Cai '09): regularizer uses KL divergence
Limitation: both regularize only neighboring pairs and do not consider non-neighboring relationships.
Discriminative Topic Model (DTM)
Regularization term (to be maximized):
$\mathcal{R} = \dfrac{\text{sum of distances between non-neighboring pairs}}{\text{sum of distances between neighboring pairs}}$
Maximizing $\mathcal{R}$ pulls neighboring documents together while pushing non-neighboring documents apart.
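A sketch of this term, under the assumption of squared Euclidean distances between the topic distributions (the slide leaves the distance measure unspecified here):

```python
import numpy as np

def dtm_regularizer(p_z_d, W):
    """Ratio of total distance between non-neighboring pairs to total
    distance between neighboring pairs; larger values mean neighbors
    are close while non-neighbors are far apart."""
    N = p_z_d.shape[0]
    sq = ((p_z_d[:, None, :] - p_z_d[None, :, :]) ** 2).sum(-1)  # (N, N)
    non_nbr = (W == 0) & ~np.eye(N, dtype=bool)                  # exclude self-pairs
    return (sq * non_nbr).sum() / ((sq * (W > 0)).sum() + 1e-12)
```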
Model Fitting using Generalized EM
• E-Step: compute the posterior probabilities P(z|d,w) (same as pLSA)
• M-Step: find P(w|z) and P(z|d) that improve the objective $\mathcal{F}$
  • M-Step (1): re-estimate P(w|z) (same as pLSA)
  • M-Step (2): re-estimate P(z|d) by finding a Pareto improvement between the log-likelihood $\mathcal{L}$ and the regularization term $\mathcal{R}$, as sketched below
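The generalized-EM loop, sketched with hypothetical helper names (init_params, e_step, m_step_words, and m_step_topics_pareto are my own labels for the steps above, not functions from the paper):

```python
def fit_dtm_gem(n_dw, K, W, lam, iters=50):
    """Generalized EM skeleton: each M-step only needs to improve the
    objective, not maximize it, so convergence guarantees still hold."""
    p_w_z, p_z_d = init_params(n_dw, K)        # hypothetical: random, row-normalized
    for _ in range(iters):
        post = e_step(p_w_z, p_z_d)            # P(z|d,w), exactly as in pLSA
        p_w_z = m_step_words(n_dw, post)       # closed form, exactly as in pLSA
        # Re-estimate P(z|d) with a step that is a Pareto improvement
        # with respect to both the log-likelihood and the regularizer.
        p_z_d = m_step_topics_pareto(n_dw, post, p_z_d, W, lam)
    return p_w_z, p_z_d
```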
Pareto Improvement
M-Step (2): find P(z|d) that improves the objective, i.e., a Pareto improvement between $\mathcal{L}$ and $\mathcal{R}$.
[Figure: candidate re-estimates A-E plotted against the current $(\mathcal{L}, \mathcal{R})$: A improves both criteria (valid, preferred); B and C improve one without degrading the other (valid); D and E degrade a criterion (invalid).]
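In code, the acceptance rule from the figure is simply (a minimal sketch; both criteria are maximized):

```python
def is_pareto_improvement(L_new, R_new, L_old, R_old):
    """Accept a candidate P(z|d) only if neither the log-likelihood L nor
    the regularization term R decreases, and at least one increases."""
    no_worse = L_new >= L_old and R_new >= R_old
    strictly_better = L_new > L_old or R_new > R_old
    return no_worse and strictly_better
```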
Parameter Re-estimation
M-Step (2): find P(z|d) that improves the objective.
• Update 1, justified by Theorem 1 (the step does not decrease the log-likelihood $\mathcal{L}$)
• Update 2, justified by Theorem 2 (the step does not decrease the regularization term $\mathcal{R}$)
[Update equations and theorem statements are given in the paper.]
Parameter Re-estimation
M-Step (2): find P(z|d) that improves the objective.
(1) Start from the current parameters.
(2) Apply update 1, then apply update 2.
(3) Perform a line search from the update-1 estimate to the update-2 estimate.
(4) Take as the next P(z|d) a point on that segment that achieves a Pareto improvement.
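A sketch of step (3), reusing is_pareto_improvement from above; the convex-combination parameterization and row re-normalization are my assumptions about how the segment is traversed:

```python
import numpy as np

def pareto_line_search(theta, upd1, upd2, f_L, f_R, steps=10):
    """Scan the segment from the update-1 estimate (upd1) to the update-2
    estimate (upd2) for a point that Pareto-improves on the current theta."""
    L0, R0 = f_L(theta), f_R(theta)
    for t in np.linspace(0.0, 1.0, steps + 1):
        cand = (1.0 - t) * upd1 + t * upd2
        cand /= cand.sum(axis=1, keepdims=True)   # keep each row a distribution
        if is_pareto_improvement(f_L(cand), f_R(cand), L0, R0):
            return cand
    return theta   # no improving point found; keep the current estimate
```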
Experimental Setup
• Datasets
  • 20 newsgroups
  • Yahoo! News K-series
• Task: semi-supervised classification with a small number (1, 3, 5, or 10) of labeled documents per class
• Compared approaches
  • Topic models: pLSA, Latent Dirichlet Allocation (LDA), Laplacian pLSI (LapPLSI), Locally-consistent Topic Model (LTM)
  • Traditional dimension reduction methods: Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF)
Experimental Results (20 newsgroups)
[Figure: classification performance of the compared methods; higher is better.]
Experimental Results (Yahoo! News K-series)
[Figure: classification performance of the compared methods; higher is better.]
Insensitivity to Parameters
[Figure: performance as the model parameters are varied.]
Summary
• A topic model incorporating the complete manifold learning formulation (both neighboring and non-neighboring relationships)
• Effective for semi-supervised classification
• Model fitting via generalized EM with Pareto improvement
• Low sensitivity to parameters