Integrated Instance- and Class-based Generative Modeling for Text Classification
Antti Puurula, University of Waikato
Sung-Hyon Myaeng, KAIST
5/12/2013, Australasian Document Computing Symposium
Instance vs. Class-based Text Classification
• Class-based learning
  • Multinomial Naive Bayes, Logistic Regression, Support Vector Machines, …
  • Pros: compact models, efficient inference, accurate with text data
  • Cons: document-level information discarded
• Instance-based learning
  • K-Nearest Neighbors, Kernel Density Classifiers, …
  • Pros: document-level information preserved, efficient learning
  • Cons: data sparsity reduces accuracy
Instance vs. Class-based Text Classification 2
• Proposal: Tied Document Mixture
  • integrated instance- and class-based model
  • retains benefits from both types of modeling
  • exact linear-time algorithms for estimation and inference
• Main ideas:
  • replace the Multinomial class-conditional in MNB with a mixture over documents
  • smooth document models hierarchically with class and background models
Multinomial Naive Bayes
• Standard generative model for text classification
• Result of simple generative assumptions:
  • Bayes
  • Naive
  • Multinomial
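The equations that accompanied these three assumptions did not survive extraction; the standard MNB decision rule they lead to is, in the usual notation (word-count vector $W$, class prior $p(c)$, class-conditional unigram model $p(w \mid c)$):

```latex
% Bayes:        p(c \mid W) \propto p(c)\, p(W \mid c)
% Naive:        words are conditionally independent given the class
% Multinomial:  p(W \mid c) \propto \prod_w p(w \mid c)^{W_w}
\hat{c} = \arg\max_{c} \Big[ \log p(c) + \sum_{w} W_w \log p(w \mid c) \Big]
```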
Tied Document Mixture
• Replace the Multinomial in MNB by a mixture over all documents
• Document models are smoothed hierarchically with class and background models
• Class models are estimated by averaging the documents
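The slide's formulas were lost in extraction; the following is a hedged reconstruction of the model as the bullets describe it, assuming the mixture for class $c$ ranges over that class's training documents $D_c$ with uniform weights, and writing the (unspecified) interpolation weights as $\alpha_d$ and $\alpha_c$:

```latex
% Class-conditional replaced by a mixture over the class's documents (uniform weights assumed)
p(W \mid c) = \frac{1}{|D_c|} \sum_{d \in D_c} \prod_{w} p(w \mid d)^{W_w}

% Document models smoothed hierarchically toward class and background models
p(w \mid d) = (1-\alpha_d)\,\hat{p}(w \mid d) + \alpha_d \big[(1-\alpha_c)\,\hat{p}(w \mid c) + \alpha_c\, p(w)\big]

% Class models estimated by averaging the class's document models
\hat{p}(w \mid c) = \frac{1}{|D_c|} \sum_{d \in D_c} \hat{p}(w \mid d)
```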
Tied Document Mixture 3
• Can be described as constraints on a two-level mixture
• Document-level mixture:
  • Number of components = number of training documents
  • Components assigned to instances
  • Component weights tied (uniform over the class's documents)
• Word-level mixture:
  • Number of components = hierarchy depth (document, class, background)
  • Components assigned to hierarchy levels
  • Component weights = the hierarchical smoothing weights
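To make the two-level mixture concrete, here is a minimal dense (pre-sparse-inference) scoring sketch in Python; the uniform document weights, the two interpolation weights, and the dict-based model layout are illustrative assumptions, not the authors' implementation:

```python
import math

def tdm_log_score(test_counts, class_docs, class_model, background, a_doc=0.5, a_cls=0.5):
    """Log p(W | c) for one class: a uniform mixture over the class's document models,
    each smoothed toward the class model and then the background model.
    Models are dicts mapping word -> probability; test_counts maps word -> count."""
    doc_log_likes = []
    for doc_model in class_docs:                      # one mixture component per training document
        log_like = 0.0
        for word, count in test_counts.items():
            p_cls = (1 - a_cls) * class_model.get(word, 0.0) + a_cls * background.get(word, 1e-9)
            p = (1 - a_doc) * doc_model.get(word, 0.0) + a_doc * p_cls
            log_like += count * math.log(p)
        doc_log_likes.append(log_like)
    # log of the uniform mixture: log-sum-exp over components minus log |D_c|
    m = max(doc_log_likes)
    return m + math.log(sum(math.exp(x - m) for x in doc_log_likes)) - math.log(len(doc_log_likes))
```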
Tied Document Mixture 4
• Can be described as a class-smoothed Kernel Density Classifier
• The document mixture is equivalent to a Multinomial kernel density
• Hierarchical smoothing corresponds to mean shift or data sharpening with class centroids
Hierarchical Sparse Inference
• Reduces the complexity of exact inference from dense evaluation over all documents and features to sparse evaluation over inverted indices
• Same complexity as K-Nearest Neighbors based on inverted indices (Yang, 1994)
Hierarchical Sparse Inference 2
• Precompile values so that the smoothed log-likelihood decomposes into shared and document-specific terms
• Store the class- and document-specific terms in inverted indices
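The precompiled quantities themselves were lost with the slide's equations; the following is an assumed reconstruction of the standard decomposition such a scheme relies on, with $p_s(w \mid c)$ denoting the smoothed class/background model:

```latex
% For words w not occurring in training document d, the smoothed model collapses to the
% class-level term p_s(w \mid c), so the log-likelihood of a test document W decomposes:
\sum_{w} W_w \log p(w \mid d)
  = \underbrace{\sum_{w} W_w \log p_s(w \mid c)}_{\text{shared by all } d \in D_c}
  + \underbrace{\sum_{w \in d} W_w \big[\log p(w \mid d) - \log p_s(w \mid c)\big]}_{\text{sparse corrections stored per } (w, d)}
```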
Hierarchical Sparse Inference 3
• Compute the shared background term first
• Update with the class-specific terms from the inverted index
• Update with the document-specific terms to get the document scores
• Compute the joint likelihood for each class
• Apply Bayes' rule to obtain the class posteriors
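A minimal Python sketch of this inverted-index update order (background term first, then class-level updates, then document-level updates); the index layout, the precomputed correction terms and all names are assumptions for illustration:

```python
import math

def sparse_doc_scores(test_counts, background, class_index, doc_index, doc_class):
    """Score training documents against one test document via inverted indices.
    background     : dict word -> background probability
    class_index[w] : list of (class_id, log p_s(w|c) - log p(w)) postings
    doc_index[w]   : list of (doc_id, log p(w|d) - log p_s(w|c)) postings
    doc_class[d]   : class of training document d
    Only postings of words occurring in the test document are touched."""
    # 1) shared background score
    base = sum(n * math.log(background.get(w, 1e-9)) for w, n in test_counts.items())
    # 2) class-level sparse updates
    cls_scores = {}
    for w, n in test_counts.items():
        for c, corr in class_index.get(w, ()):
            cls_scores[c] = cls_scores.get(c, base) + n * corr
    # 3) document-level sparse updates, each document starting from its class score
    doc_scores = {}
    for w, n in test_counts.items():
        for d, corr in doc_index.get(w, ()):
            start = doc_scores.get(d, cls_scores.get(doc_class[d], base))
            doc_scores[d] = start + n * corr
    return doc_scores
```

A class posterior would then follow by a log-sum-exp over each class's document scores (documents with no postings hit keep their class-level score), adding the class prior, and normalizing with Bayes' rule.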
Experimental Setup
• 14 classification datasets used:
  • 3 spam classification
  • 3 sentiment analysis
  • 5 multi-class classification
  • 3 multi-label classification
• Scripts and datasets in LIBSVM format: http://sourceforge.net/projects/sgmweka/
Experimental Setup 2
• Classifiers compared:
  • Multinomial Naive Bayes (MNB)
  • Tied Document Mixture (TDM)
  • K-Nearest Neighbors (KNN) (Multinomial distance, distance-weighted vote)
  • Kernel Density Classifier (KDC) (smoothed Multinomial kernel)
  • Logistic Regression (LR, LR+) (L2-regularized)
  • Support Vector Machine (SVM, SVM+) (L2-regularized, L2-loss)
• LR+ and SVM+ weighted feature vectors by TFIDF
• Smoothing parameters optimized for micro-averaged F-score on held-out development sets using Gaussian random searches
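As a point of reference only, the TFIDF-weighted linear baseline (SVM+) could be approximated in scikit-learn roughly as below; the vectorizer settings and the regularization constant are assumptions, not the tuned configuration used in the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# L2-regularized, L2-loss (squared hinge) linear SVM over TFIDF-weighted feature vectors
svm_plus = make_pipeline(
    TfidfVectorizer(sublinear_tf=True),   # weighting scheme assumed, not the paper's exact one
    LinearSVC(penalty="l2", loss="squared_hinge", C=1.0),
)
# usage (hypothetical variable names):
# svm_plus.fit(train_texts, train_labels)
# predictions = svm_plus.predict(test_texts)
```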
Results
• Training times for MNB, TDM, KNN and KDC are linear
  • At most 70 s for MNB on OHSU-TREC, 170 s for the others
• SVM and LR require iterative algorithms
  • At most 936 s, for LR on Amazon12
  • Did not scale to the multi-label datasets in practical time
• Classification times for the instance-based classifiers are higher
  • At most 226 ms mean for TDM on OHSU-TREC, compared to 70 ms for MNB (with 290k terms, 196k labels, 197k documents)
Results 2
• TDM significantly improves on MNB, KNN and KDC
• Across comparable datasets, TDM is on par with SVM+
  • SVM+ is significantly better on multi-class datasets
  • TDM is significantly better on spam classification
Results 3
• TDM reduces classification errors compared to MNB by:
  • >65% in spam classification
  • >26% in sentiment analysis
• Some correlation between error reduction and the number of instances per class
• Task types form clearly separate clusters
Conclusion
• Tied Document Mixture
  • Integrated instance- and class-based model for text classification
  • Exact linear-time algorithms, with the same complexities as KNN and KDC
  • Accuracy substantially improved over MNB, KNN and KDC
  • Competitive with optimized SVM, depending on task type
• Many improvements to the basic model possible
• Sparse inference scales to hierarchical mixtures of >340k components
• Toolkit, datasets and scripts available: http://sourceforge.net/projects/sgmweka/
Sparse Inference
• Sparse Inference (Puurula, 2012)
• Use inverted indices to reduce the complexity of computing the joint likelihood for a given test document
• Instead of computing the scores as dense dot products, compute a shared base score and update it for each non-zero test word using the inverted index
• Reduces joint inference time complexity from dense to sparse in the non-zero features
Sparse Inference 2
• Dense representation: time complexity proportional to (number of classes) × (number of features)
Sparse Inference 3
• Sparse representation: time complexity proportional to the test document's non-zero words and the lengths of their inverted-index posting lists
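The slide's complexity formulas were lost; a hedged summary of the intended comparison, with assumed symbols ($M$ models, $N$ features, $I_w$ the posting list of word $w$):

```latex
% Dense: every class/document model scored against every feature
T_{\text{dense}} = O(M \cdot N), \quad M = \text{number of models},\; N = \text{number of features}

% Sparse: only the test document's non-zero words and their posting lists I_w are touched
T_{\text{sparse}} = O\Big(M + \sum_{w : W_w > 0} |I_w|\Big)
```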