Integrated Instance- and Class-based Generative Modeling for Text Classification

Presentation Transcript


  1. Integrated Instance- and Class-based Generative Modeling for Text Classification Antti Puurula, University of Waikato; Sung-Hyon Myaeng, KAIST; 5/12/2013, Australasian Document Computing Symposium

  2. Instance vs. Class-based Text Classification • Class-based learning • Multinomial Naive Bayes, Logistic Regression, Support Vector Machines, … • Pros: compact models, efficient inference, accurate with text data • Cons: document-level information discarded • Instance-based learning • K-Nearest Neighbors, Kernel Density Classifiers, … • Pros: document-level information preserved, efficient learning • Cons: data sparsity reduces accuracy

  3. Instance vs. Class-based Text Classification 2 • Proposal: Tied Document Mixture • integrated instance- and class-based model • retains benefits from both types of modeling • exact linear time algorithms for estimation and inference • Main ideas: • replace Multinomial class-conditional in MNB with a mixture over documents • smooth document models hierarchically with class and background models

  4. Multinomial Naive Bayes • Standard generative model for text classification • Result of simple generative assumptions • Bayes: P(c|W) ∝ P(c) P(W|c) • Naive: P(W|c) = Π_n P(w_n|c), word occurrences conditionally independent given the class • Multinomial: each P(w_n|c) is a class-specific Multinomial distribution over the vocabulary

  5. Multinomial Naive Bayes 2
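
A minimal sketch of MNB scoring, to make the classification rule concrete; the add-one (Laplace) smoothing and all function and variable names here are illustrative assumptions rather than the paper's exact estimator:

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    """Estimate MNB parameters with add-one (Laplace) smoothing.

    docs: list of token lists, labels: list of class labels.
    Returns (log_prior, log_cond, vocabulary).
    """
    vocab = {w for doc in docs for w in doc}
    class_docs = defaultdict(list)
    for doc, c in zip(docs, labels):
        class_docs[c].append(doc)

    log_prior, log_cond = {}, {}
    for c, cdocs in class_docs.items():
        log_prior[c] = math.log(len(cdocs) / len(docs))
        counts = Counter(w for doc in cdocs for w in doc)
        total = sum(counts.values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_cond, vocab

def classify_mnb(doc, log_prior, log_cond, vocab):
    """Return the class maximizing log P(c) + sum_n log P(w_n | c)."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(log_cond[c][w] for w in doc if w in vocab)
    return max(scores, key=scores.get)
```

Usage follows the pattern params = train_mnb(train_docs, train_labels); predicted = classify_mnb(test_doc, *params).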

  6. Tied Document Mixture • Replace the Multinomial in MNB by a mixture over all documents of the class: P(W|c) = 1/|D_c| Σ_{d ∈ D_c} Π_n P(w_n|d) • P(w|d) = (1 − a_d) P̂(w|d) + a_d P(w|c), where document models are smoothed hierarchically with the class and background models (P̂ denotes an unsmoothed maximum-likelihood estimate) • P(w|c) = (1 − a_c) P̂(w|c) + a_c P̂(w), where class models are estimated by averaging the documents: P̂(w|c) = 1/|D_c| Σ_{d ∈ D_c} P̂(w|d)

  7. Tied Document Mixture 2
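
A minimal sketch of the TDM class-conditional as described above, assuming Jelinek-Mercer interpolation with weights a_d (document vs. class) and a_c (class vs. background) and a uniform mixture over the class's documents; the function names, default weights and corpus-unigram background are illustrative assumptions, not the toolkit's implementation:

```python
import math
from collections import Counter, defaultdict

def ml_estimate(counter):
    """Unsmoothed maximum-likelihood unigram distribution."""
    total = sum(counter.values())
    return {w: n / total for w, n in counter.items()}

def train_tdm(docs, labels):
    """Build per-document, per-class and background (corpus) unigram models."""
    background = ml_estimate(Counter(w for doc in docs for w in doc))
    doc_models, class_counts = defaultdict(list), defaultdict(Counter)
    for doc, c in zip(docs, labels):
        doc_models[c].append(ml_estimate(Counter(doc)))
        class_counts[c].update(doc)
    class_models = {c: ml_estimate(cnt) for c, cnt in class_counts.items()}
    prior = {c: len(ms) / len(docs) for c, ms in doc_models.items()}
    return prior, doc_models, class_models, background

def tdm_log_likelihood(doc, c, model, a_d=0.5, a_c=0.5):
    """log P(W|c): uniform mixture over the class's hierarchically smoothed document models."""
    prior, doc_models, class_models, background = model
    lls = []
    for dm in doc_models[c]:
        ll = 0.0
        for w in doc:
            # hierarchical smoothing: document -> class -> background
            p_c = (1 - a_c) * class_models[c].get(w, 0.0) + a_c * background.get(w, 0.0)
            p = (1 - a_d) * dm.get(w, 0.0) + a_d * p_c
            ll += math.log(p) if p > 0 else float("-inf")
        lls.append(ll)
    m = max(lls)
    if m == float("-inf"):
        return m
    # log of the mean document likelihood (log-sum-exp for stability)
    return m + math.log(sum(math.exp(x - m) for x in lls) / len(lls))

def classify_tdm(doc, model, a_d=0.5, a_c=0.5):
    """Bayes rule: argmax_c log P(c) + log P(W|c)."""
    prior = model[0]
    return max(prior, key=lambda c: math.log(prior[c]) + tdm_log_likelihood(doc, c, model, a_d, a_c))
```

Note that this direct evaluation touches every document model of every class for every query; that is exactly the cost the hierarchical sparse inference on the following slides removes.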

  8. Tied Document Mixture 3 • Can be described as constraints on a two-level mixture • Document-level mixture: • Number of components = |D_c|, one per training document of the class • Components assigned to instances • Component weights = uniform, 1/|D_c| • Word-level mixture: • Number of components = 3 (hierarchy depth: document, class, background) • Components assigned to the hierarchy • Component weights = (1 − a_d), a_d (1 − a_c), and a_d a_c

  11. Tied Document Mixture 4 • Can be described as a class-smoothed Kernel Density Classifier • Document mixture equivalent to a Multinomial kernel density • Hierarchical smoothing corresponds to mean shift or data sharpening with class-centroids

  12. Hierarchical Sparse Inference • Reduces inference complexity from dense O(F·|D|) to sparse O(|D| + Σ_n |D_n|), where F is the number of words in the input, |D| the number of training documents and |D_n| the number of documents containing word n • Same complexity as K-Nearest Neighbors based on inverted indices (Yang, 1994)

  13. Hierarchical Sparse Inference 2 • Precompile values: a background-only baseline log-probability per word, a class correction for each (word, class) pair with non-zero counts, and a document correction for each (word, document) pair with non-zero counts • log P(w_n|d) decomposes: background baseline + class correction + document correction, with the corrections equal to zero when the word does not occur in the class or in the document • Store the class corrections and the document corrections in inverted indices

  14. Hierarchical Sparse Inference 3 • Compute the background baseline score first • Update by the class corrections retrieved from the class inverted index • Update by the document corrections from the document inverted index to get log P(W|d) for every training document • Compute P(W|c) by averaging the document likelihoods within each class • Bayes rule gives the class posterior P(c|W)
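
Since the formulas on slides 12-14 did not survive the transcript, the following is only a sketch of one way the precompile-and-update scheme could look: word log-probabilities are decomposed into a background-only baseline plus class and document corrections, the corrections are stored in inverted indices, and at classification time only the posting lists of the query words are touched. The decomposition details, the Jelinek-Mercer weights a_d/a_c and the handling of unseen words are assumptions for illustration, not the paper's exact algorithm.

```python
import math
from collections import Counter, defaultdict

def build_indices(docs, labels, a_d=0.5, a_c=0.5):
    """Precompile background log-probs and class/document corrections into inverted indices."""
    token_counts = Counter(w for doc in docs for w in doc)
    total = sum(token_counts.values())
    background = {w: n / total for w, n in token_counts.items()}

    class_counts, class_sizes = defaultdict(Counter), Counter()
    for doc, c in zip(docs, labels):
        class_counts[c].update(doc)
        class_sizes[c] += 1

    # smoothed class models p(w|c) = (1 - a_c) p_ml(w|c) + a_c p_ml(w)
    class_index = defaultdict(dict)     # word -> {class: correction over the background baseline}
    smoothed_class = defaultdict(dict)
    for c, counts in class_counts.items():
        ctotal = sum(counts.values())
        for w, n in counts.items():
            smoothed_class[c][w] = (1 - a_c) * (n / ctotal) + a_c * background[w]
            class_index[w][c] = math.log(smoothed_class[c][w] / (a_c * background[w]))

    doc_index = defaultdict(dict)       # word -> {doc id: correction over the class baseline}
    class_of = {}
    for i, (doc, c) in enumerate(zip(docs, labels)):
        class_of[i] = c
        dcounts = Counter(doc)
        dtotal = sum(dcounts.values())
        for w, n in dcounts.items():
            p_wc = smoothed_class[c].get(w, a_c * background[w])
            p_wd = (1 - a_d) * (n / dtotal) + a_d * p_wc
            doc_index[w][i] = math.log(p_wd / (a_d * p_wc))
    return background, class_index, doc_index, class_of, class_sizes, a_d, a_c

def sparse_log_likelihoods(query, indices):
    """Sparse computation of log P(W|c) for all classes, touching only posting lists."""
    background, class_index, doc_index, class_of, class_sizes, a_d, a_c = indices
    query = [w for w in query if w in background]            # ignore unseen words in this sketch
    base = sum(math.log(a_d * a_c * background[w]) for w in query)

    class_delta, doc_delta = defaultdict(float), defaultdict(float)
    for w in query:                                          # update only indexed classes/documents
        for c, corr in class_index.get(w, {}).items():
            class_delta[c] += corr
        for d, corr in doc_index.get(w, {}).items():
            doc_delta[d] += corr

    # log P(W|c) = log mean over the class's documents of exp(base + class_delta[c] + doc_delta[d])
    touched = defaultdict(list)
    for d, delta in doc_delta.items():
        touched[class_of[d]].append(delta)
    log_p = {}
    for c, size in class_sizes.items():
        deltas = touched.get(c, [])
        untouched = size - len(deltas)                       # documents sharing no word with the query
        m = max(deltas + [0.0])
        s = untouched * math.exp(-m) + sum(math.exp(x - m) for x in deltas)
        log_p[c] = base + class_delta[c] + m + math.log(s / size)
    return log_p
```

Classification would then apply Bayes rule on top of these scores, e.g. the argmax over c of log P(c) + log_p[c].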

  21. Experimental Setup • 14 classification datasets used: • 3 spam classification • 3 sentiment analysis • 5 multi-class classification • 3 multi-label classification • Scripts and datasets in LIBSVM format: • http://sourceforge.net/projects/sgmweka/

  22. Experimental Setup 2 • Classifiers compared: • Multinomial Naive Bayes (MNB) • Tied Document Mixture (TDM) • K-Nearest Neighbors (KNN) (Multinomial distance, distance-weighted vote) • Kernel Density Classifier (KDC) (smoothed Multinomial kernel) • Logistic Regression (LR, LR+) (L2-regularized) • Support Vector Machine (SVM, SVM+) (L2-regularized L2-loss) • LR+ and SVM+ weighted feature vectors by TF-IDF • Smoothing parameters optimized for Micro-Fscore on held-out development sets using Gaussian random searches
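
The slide only names Gaussian random search as the tuning procedure, so the sketch below shows a generic version of the idea under assumed details: repeatedly perturb the current best smoothing parameters with Gaussian noise and keep a candidate only if it improves the held-out score. The function names, bounds and schedule are illustrative.

```python
import random

def gaussian_random_search(evaluate, init_params, n_iters=100, sigma=0.2, bounds=(0.0, 1.0)):
    """Tune smoothing parameters by Gaussian perturbation around the current best.

    evaluate: callable mapping a parameter dict to a development-set score (e.g. Micro-Fscore).
    init_params: dict of parameter name -> initial value.
    """
    best_params = dict(init_params)
    best_score = evaluate(best_params)
    for _ in range(n_iters):
        candidate = {
            name: min(bounds[1], max(bounds[0], value + random.gauss(0.0, sigma)))
            for name, value in best_params.items()
        }
        score = evaluate(candidate)
        if score > best_score:                  # keep the perturbation only if it helps
            best_params, best_score = candidate, score
    return best_params, best_score

# Illustrative usage: dev_micro_fscore would train and score TDM with the given weights.
# best, score = gaussian_random_search(dev_micro_fscore, {"a_d": 0.5, "a_c": 0.5})
```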

  23. Results • Training times for MNB, TDM, KNN and KDC are linear • At most 70 s for MNB on OHSU-TREC, 170 s for the others • SVM and LR require iterative algorithms • At most 936 s, for LR on Amazon12 • Did not scale to the multi-label datasets in practical time • Classification times for the instance-based classifiers are higher • At most a mean of 226 ms for TDM on OHSU-TREC, compared to 70 ms for MNB • (with 290k terms, 196k labels, 197k documents)

  24. Results 2 • TDM significantly improves on MNB, KNN and KDC • Across comparable datasets, TDM is on par with SVM+ • SVM+ is significantly better on multi-class datasets • TDM is significantly better on spam classification

  26. Results 3 • TDM reduces classification errors compared to MNB by: • >65% in spam classification • >26% in sentiment analysis • Some correlation between error reduction and the number of instances per class • Task types form clearly separated clusters

  27. Conclusion • Tied Document Mixture • Integrated instance- and class-based model for text classification • Exact linear time algorithms, with same complexities as KNN and KDC • Accuracy substantially improved over MNB, KNN and KDC • Competitive with optimized SVM, depending on task type • Many improvements to the basic model possible • Sparse inference scales to hierarchical mixtures of >340k components • Toolkit, datasets and scripts available: • http://sourceforge.net/projects/sgmweka/

  28. Sparse Inference • Sparse Inference (Puurula, 2012) • Use inverted indices to reduce the complexity of computing the joint P(c, W) for a given document W • Instead of computing the class scores as dense dot products, compute a precompiled baseline score for each class and update it by the correction for each input word retrieved from the inverted index • Reduces joint inference time complexity from dense O(F·M) to sparse O(M + Σ_n M_n)

  29. Sparse Inference 2 • Dense representation: every class parameter vector is scored against every feature of the input • Time complexity: O(F·M), where F = number of features and M = number of classes

  30. Sparse Inference 3 • Sparse representation: only the classes in each input word's posting list are updated over the precompiled baselines • Time complexity: O(M + Σ_n M_n), where M_n = number of classes with non-zero counts for word n
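
As a rough illustration of the complexity contrast on slides 29-30, the two functions below compute the same class scores from precompiled baselines plus per-word corrections; the first loops over every (feature, class) pair, O(F·M), while the second touches only the posting lists of the query words, O(M + Σ_n M_n). The dictionary-based data layout is an assumption for readability, not the toolkit's representation.

```python
def dense_scores(query_counts, per_class_corrections, baseline):
    """Dense scoring: every class is scored against every query feature, O(F * M)."""
    return {
        c: baseline[c] + sum(n * per_class_corrections[c].get(w, 0.0)
                             for w, n in query_counts.items())
        for c in baseline
    }

def sparse_scores(query_counts, inverted_index, baseline):
    """Sparse scoring: start from the precompiled baselines and update only the classes
    in each query word's posting list, O(M + sum_n M_n)."""
    scores = dict(baseline)
    for w, n in query_counts.items():
        for c, correction in inverted_index.get(w, {}).items():
            scores[c] += n * correction
    return scores
```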
