
Effective Multi-Label Active Learning for Text Classification



Presentation Transcript


  1. Effective Multi-Label Active Learning for Text Classification Bishan Yang1, Jian-Tao Sun2, Tengjiao Wang1, and Zheng Chen2 Computer Science Department, Peking University1 Microsoft Research Asia2 KDD 2009, Paris

  2. Outline • Motivation • Related Work • SVM-Based Active Learning for Multi-Label Text Classification • Experiments • Summary

  3. Motivation • Text classification is everywhere • Web search • News classification • Email classification • …… • Many text documents are multi-labeled, e.g., a single news story may belong to Business, Politics, Travel, World news, Entertainment, Local news, …

  4. Labeling Effort is Huge • Supervised learning approach • The model is trained on a set of randomly selected, manually labeled data • Requires a sufficient amount of labeled data to ensure the quality of the model. The more categories there are, the more judging effort each document takes, and the more data needs to be labeled. [Figure: each document must be judged against many candidate categories C1, C2, C3, C4, C5, …]

  5. Active Learning – Reduce Labeling Effort • Loop (sketched below): train the classifier on the labeled set → use the selection strategy to select an optimal set from the unlabeled data pool → query for the true labels → augment the labeled set and retrain • With an effective selection strategy, an active learner can obtain accuracy comparable to a supervised learner using much less labeled data. This is especially important for multi-label text classification.
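A minimal sketch of this pool-based loop; `train`, `select_batch` (the selection strategy), and `query_labels` (the human oracle) are placeholder callables, not names from the paper:

```python
# Illustrative pool-based active learning loop (all names are placeholders).
def active_learning_loop(labeled, unlabeled, n_iterations, batch_size,
                         train, select_batch, query_labels):
    model = train(labeled)                                  # train classifier on current labeled set
    for _ in range(n_iterations):
        batch = select_batch(model, unlabeled, batch_size)  # selection strategy picks an optimal set
        labels = query_labels(batch)                        # ask the annotator for true labels
        labeled.extend(zip(batch, labels))                  # augment the labeled set
        for x in batch:
            unlabeled.remove(x)
        model = train(labeled)                              # retrain on the enlarged labeled set
    return model
```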

  6. Challenges for Multi-Label Active Learning • How to select the most informative multi-labeled data? • Can we reuse a selection strategy designed for the single-label case? – No • E.g., suppose the classifier's confidence scores on (C1, C2, C3) are x1: (0.8, 0.5, 0.1) and x2: (0.6, 0.1, 0.1). Judged only by the most confident class, x2 looks more informative (0.6 vs. 0.8). But what if x1 actually has two labels? Then the 0.5 score on C2 is exactly where the model is uncertain, and a single-label strategy misses it.
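A tiny numeric illustration of this point, using the scores from the slide; the `1 - max(scores)` uncertainty measure is just one common single-label choice, used here only as an example:

```python
# Decision scores on (C1, C2, C3), taken from the slide.
scores = {"x1": [0.8, 0.5, 0.1], "x2": [0.6, 0.1, 0.1]}

# A single-label uncertainty criterion looks only at the most confident class.
for name, s in scores.items():
    print(name, "single-label uncertainty =", round(1 - max(s), 2))
# x1 -> 0.2, x2 -> 0.4: x2 looks more informative.
# But if x1 actually has two labels (C1 and C2), the 0.5 score on C2 is
# exactly where the model is unsure, so a multi-label strategy should prefer x1.
```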

  7. Related Work • Single-label active learning • Uncertainty sampling [SIGIR’94, JMLR’05] • Aims to label the most uncertain data • Expected-error reduction [NIPS’95, ICML’01, ICCV’03] • Labels data to minimize the expected error • Committee-based [COLT’92, JMLR’02] • Labels data with the largest disagreement among several committee members (classifiers) from the version space • Multi-label active learning • BinMin [Springer’06]: Minimizes the loss on the most uncertain category for each data point • MML [ICIP’04]: Optimizes the mean of the SVM hinge loss over the predicted classes • Two-dimensional active learning [ICCV’08, TPAMI’08]: Minimizes the classification error on image-label pairs

  8. Our approach: SVM-Based Active Learning for Multi-Label Text Classification • Optimization goal • Maximize the reduction of the expected model loss when a newly labeled example is added to the training set • The label vector of x is y = (y1, …, yk), where yi = 1 if x belongs to category ci and yi = −1 otherwise.
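Written out as a formula (a reconstruction from the surrounding slides rather than a quote from the paper; here D_l is the labeled set, D_u the unlabeled pool, and L the model loss):

```latex
x^* = \arg\max_{x \in \mathcal{D}_u}
      \Bigl( L(\mathcal{D}_l) - \mathbb{E}_{\mathbf{y}\mid x}\,
             L\bigl(\mathcal{D}_l \cup \{(x,\mathbf{y})\}\bigr) \Bigr),
\qquad
y_i = \begin{cases} +1 & \text{if } x \text{ belongs to category } c_i,\\
                    -1 & \text{otherwise.} \end{cases}
```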

  9. Sample Selection Strategy with SVM • Two main issues • How to measure the loss reduction of the multi-label classifier? • How to provide a good estimate of the conditional label probability P(y|x)? • The selection criterion thus combines two components: loss reduction and probability estimation.

  10. Estimation of Loss Reduction • Decompose the multi-label problem into several binary classifiers • For each binary classifier, the model loss is measured by the size of its version space • The SVM version space [S. Tong 02] is the region of parameter space consistent with the labeled data; its size is defined as the surface area it occupies on the unit hypersphere ‖w‖ = 1.
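For reference, the version-space definition along the lines of Tong and Koller (stated from memory, notation mine, with Φ the kernel feature map):

```latex
\mathcal{V} = \{\, \mathbf{w} \;:\; \|\mathbf{w}\| = 1,\;\;
                   y\,(\mathbf{w}\cdot\Phi(\mathbf{x})) > 0
                   \;\;\forall\, (\mathbf{x}, y) \in \mathcal{D}_l \,\},
\qquad
\text{size}(\mathcal{V}) = \operatorname{Area}(\mathcal{V})
\ \text{on the hypersphere } \|\mathbf{w}\| = 1.
```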

  11. Estimation of Loss Reduction (Cont.) • By version space duality, the loss reduction rate of each binary classifier can be approximated using the SVM output margin f(x) • Select the example that maximizes the sum of loss reduction rates over all binary classifiers • Notation: Li is the loss of the binary classifier built on the labeled set for class ci; Vi is the size of the version space of classifier fi; yi = 1 if x belongs to class ci, and −1 otherwise • Intuition: if fi predicts x correctly, then the larger |fi(x)| is, the lower the uncertainty (smaller loss reduction); if fi predicts x incorrectly, the larger |fi(x)| is, the higher the uncertainty (larger loss reduction).
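In symbols (my reconstruction of the slide's criterion from the definitions above; the exact constants may differ from the paper):

```latex
x^* = \arg\max_{x \in \mathcal{D}_u} \;\sum_{i=1}^{k} \frac{1 - y_i f_i(x)}{2},
\qquad y_i \in \{+1, -1\},
```

where f_i is the SVM decision function of the binary classifier for class c_i.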

  12. Probability Estimation • It is intractable to directly compute the expected loss function • Limited training data • Large number of possible label vectors for each x (exponential in the number of classes) • Approximate the expectation by the loss at the single label vector ys(x) with the largest conditional probability P(y|x) • ys(x): the label vector with the largest conditional probability.
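Combining this approximation with the previous slide, the practical selection criterion becomes (again a hedged reconstruction; y_{s,i}(x) denotes the i-th component of y_s(x)):

```latex
x^* \;\approx\; \arg\max_{x \in \mathcal{D}_u}
\;\sum_{i=1}^{k} \frac{1 - y_{s,i}(x)\, f_i(x)}{2},
\qquad
\mathbf{y}_s(x) = \arg\max_{\mathbf{y}} P(\mathbf{y}\mid x).
```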

  13. How to predict ys(x)? • Main ideas: • First build a classification model to predict the number of labels each data point may have. • Then determine the label vector based on the predicted label count.

  14. How to predict ys(x)? (Cont.) • Step 1: Assign a probability output to each class for every example. • Step 2: For each x, sort the class probabilities in decreasing order and normalize them so that they sum to 1; use the sorted, normalized probabilities as features. • Step 3: Train a logistic regression model whose target is the true label number of each labeled example. • Step 4: For each unlabeled data point, predict the probabilities of having different numbers of labels; if label number j has the largest probability, assign the j classes with the highest probabilities as the labels of x and mark the rest negative (see the sketch below).
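A minimal sketch of this label-number prediction step, assuming the per-class probabilities come from the binary SVMs (e.g., sigmoid-calibrated outputs); scikit-learn's LogisticRegression stands in for the model on the slide, and all function names are mine:

```python
# Sketch of the label-number prediction step (my reading of the slide;
# the source of the class probabilities and the feature layout are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_count_features(class_probs):
    """Sort each example's class probabilities in decreasing order and
    normalize them to sum to 1, as described on the slide."""
    p = np.sort(class_probs, axis=1)[:, ::-1]    # decreasing order
    return p / p.sum(axis=1, keepdims=True)      # normalize to sum to 1

def fit_label_count_model(class_probs_train, n_labels_train):
    """class_probs_train: (n_labeled, k) per-class probabilities;
    n_labels_train: true number of labels of each labeled example."""
    model = LogisticRegression(max_iter=1000)
    model.fit(label_count_features(class_probs_train), n_labels_train)
    return model

def predict_label_vector(model, class_probs_x):
    """Predict y_s(x): take the most likely label count j, then set the
    j highest-probability classes to +1 and the rest to -1."""
    feats = label_count_features(class_probs_x.reshape(1, -1))
    j = int(model.predict(feats)[0])
    y = -np.ones(len(class_probs_x), dtype=int)
    y[np.argsort(class_probs_x)[::-1][:j]] = 1
    return y
```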

  15. Experiments • Data sets: • RCV1-V2 [D. D. Lewis 04] • Reuters newswire stories • Yahoo!’s webpage collection [N. Ueda 02, H. Kazawa 05] • webpages linked from Yahoo!’s top directory

  16. Experiment Setup • Comparing methods: • MMC (Maximum loss reduction with Maximal Confidence) – the proposed method • BinMin • MML • Random • SVMLight [T. Joachims 02] is used as the base classifier. • Performance measure: micro-averaged F1 score over the predicted labels of all classes.
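The micro-averaged F1 score pools true positives, false positives, and false negatives over all k classes (standard definition, stated here for completeness; notation mine):

```latex
\text{Micro-F1} =
\frac{2\sum_{i=1}^{k} TP_i}
     {2\sum_{i=1}^{k} TP_i + \sum_{i=1}^{k} FP_i + \sum_{i=1}^{k} FN_i}
```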

  17. Results on RCV1-V2 Data set • Compare the label prediction methods: • The proposed prediction method • SCut [D. D. Lewis 04] – tunes a threshold for each class • SCut (threshold = 0)
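A generic sketch of per-class SCut threshold tuning, only to illustrate the idea; the candidate grid and the use of held-out F1 are my assumptions, not the exact procedure of [D. D. Lewis 04]:

```python
# Illustrative per-class threshold tuning in the SCut style: for each class,
# pick the decision threshold that maximizes F1 on held-out data.
import numpy as np
from sklearn.metrics import f1_score

def tune_scut_thresholds(scores, y_true, candidates=np.linspace(-1.0, 1.0, 41)):
    """scores: (n, k) SVM decision values; y_true: (n, k) binary labels.
    The candidate grid is arbitrary and only meant for illustration."""
    thresholds = []
    for i in range(scores.shape[1]):
        f1s = [f1_score(y_true[:, i], (scores[:, i] >= t).astype(int),
                        zero_division=0) for t in candidates]
        thresholds.append(candidates[int(np.argmax(f1s))])
    return np.array(thresholds)
```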

  18. Results on RCV1-V2 Data set (Cont.) • Initial labeled set: 500 examples • 50 iterations, S = 20

  19. Results on RCV1-V2 Data set (Cont.) • Vary the size of initial labeled set, 50 iterations, S = 20

  20. Results on RCV1-V2 Data set (Cont.) • Vary the sampling size per run; initial labeled set: 500 examples • Stop after adding 1,000 labeled examples

  21. Results on Yahoo! Data set • Initial labeled set: 500 examples • 50 iterations, S = 50

  22. Summary • Multi-Label Active Learning for Text Classification • Important for reducing human labeling effort • A challenging task • SVM-based Multi-Label Active Learning • Optimizes the loss reduction rate based on the SVM version space • Effective label prediction method • Successfully reduces labeling effort on real-world datasets • Future work • More efficient evaluation over the unlabeled pool • More multi-label classification tasks, e.g., image classification

  23. Thank you!
