
Effective Multi-Label Active Learning for Text Classification



Presentation Transcript


  1. Effective Multi-Label Active Learning for Text Classification Bishan Yang1, Jian-Tao Sun2, Tengjiao Wang1, and Zheng Chen2 Computer Science Department, Peking University1 Microsoft Research Asia2 KDD 2009, Paris

  2. Outline • Motivation • Related Work • SVM-Based Active Learning for Multi-Label Text Classification • Experiments • Summary

  3. Motivation • Text classification is everywhere • Web search • News classification • Email classification • …… • Many text documents are multi-labeled, e.g., a single news story may belong to Business, Politics, Travel, World news, Entertainment, Local news, …

  4. Labeling Effort is Huge • Supervised learning approach • The model is trained on a set of randomly selected, manually labeled data • Requires a sufficient amount of labeled data to ensure the quality of the model. The more categories there are, the more judging effort each document takes, and the more data needs to be labeled. [Figure: each document must be judged against many candidate categories C1, C2, C3, C4, C5, …]

  5. Active Learning – Reduce Labeling Effort • Loop (sketched below): train the classifier on the labeled set → use the selection strategy to select an optimal set from the unlabeled data pool → query for the true labels → augment the labeled set and retrain • With an effective selection strategy, an active learner can obtain accuracy comparable to a supervised learner using much less labeled data. This is especially important for multi-label text classification.
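A minimal sketch of this pool-based loop; `train`, `select_batch` (the selection strategy), and `query_labels` (the human oracle) are placeholder callables, not names from the paper:

```python
# Illustrative pool-based active learning loop (all names are placeholders).
def active_learning_loop(labeled, unlabeled, n_iterations, batch_size,
                         train, select_batch, query_labels):
    model = train(labeled)                                  # train classifier on current labeled set
    for _ in range(n_iterations):
        batch = select_batch(model, unlabeled, batch_size)  # selection strategy picks an optimal set
        labels = query_labels(batch)                        # ask the annotator for true labels
        labeled.extend(zip(batch, labels))                  # augment the labeled set
        for x in batch:
            unlabeled.remove(x)
        model = train(labeled)                              # retrain on the enlarged labeled set
    return model
```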

  6. Challenges for Multi-Label Active Learning • How to select the most informative multi-labeled data? • Can we reuse a selection strategy designed for the single-label case? – No • E.g., suppose the classifier's confidence scores on (C1, C2, C3) are x1: (0.8, 0.5, 0.1) and x2: (0.6, 0.1, 0.1). Judged only by the most confident class, x2 looks more informative (0.6 vs. 0.8). But what if x1 actually has two labels? Then the 0.5 score on C2 is exactly where the model is uncertain, and a single-label strategy misses it.
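A tiny numeric illustration of this point, using the scores from the slide; the `1 - max(scores)` uncertainty measure is just one common single-label choice, used here only as an example:

```python
# Decision scores on (C1, C2, C3), taken from the slide.
scores = {"x1": [0.8, 0.5, 0.1], "x2": [0.6, 0.1, 0.1]}

# A single-label uncertainty criterion looks only at the most confident class.
for name, s in scores.items():
    print(name, "single-label uncertainty =", round(1 - max(s), 2))
# x1 -> 0.2, x2 -> 0.4: x2 looks more informative.
# But if x1 actually has two labels (C1 and C2), the 0.5 score on C2 is
# exactly where the model is unsure, so a multi-label strategy should prefer x1.
```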

  7. Related Work • Single-label active learning • Uncertainty sampling [SIGIR’94, JMLR’05] • Aims to label the most uncertain data • Expected-error reduction [NIPS’95, ICML’01, ICCV’03] • Labels data to minimize the expected error • Committee-based [COLT’92, JMLR’02] • Labels data with the largest disagreement among several committee members (classifiers) from the version space • Multi-label active learning • BinMin [Springer’06]: Minimizes the loss on the most uncertain category for each data point • MML [ICIP’04]: Optimizes the mean of the SVM hinge loss over the predicted classes • Two-dimensional active learning [ICCV’08, TPAMI’08]: Minimizes the classification error on image-label pairs

  8. Our approach: SVM-Based Active Learning for Multi-Label Text Classification • Optimization goal • Maximize the reduction of the expected model loss when a newly labeled example is added to the training set • The label vector of x is y = (y1, …, yk), where yi = 1 if x belongs to category ci and yi = −1 otherwise.
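Written out as a formula (a reconstruction from the surrounding slides rather than a quote from the paper; here D_l is the labeled set, D_u the unlabeled pool, and L the model loss):

```latex
x^* = \arg\max_{x \in \mathcal{D}_u}
      \Bigl( L(\mathcal{D}_l) - \mathbb{E}_{\mathbf{y}\mid x}\,
             L\bigl(\mathcal{D}_l \cup \{(x,\mathbf{y})\}\bigr) \Bigr),
\qquad
y_i = \begin{cases} +1 & \text{if } x \text{ belongs to category } c_i,\\
                    -1 & \text{otherwise.} \end{cases}
```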

  9. Sample Selection Strategy with SVM • Two main issues • How to measure the loss reduction of the multi-label classifier? • How to provide a good estimate of the conditional label probability P(y|x)? • The selection criterion thus combines two components: loss reduction and probability estimation.

  10. Estimation of Loss Reduction • Decompose the multi-label problem into several binary classifiers • For each binary classifier, the model loss is measured by the size of its version space • The SVM version space [S. Tong 02] is the region of parameter space consistent with the labeled data; its size is defined as the surface area it occupies on the unit hypersphere ‖w‖ = 1.
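For reference, the version-space definition along the lines of Tong and Koller (stated from memory, notation mine, with Φ the kernel feature map):

```latex
\mathcal{V} = \{\, \mathbf{w} \;:\; \|\mathbf{w}\| = 1,\;\;
                   y\,(\mathbf{w}\cdot\Phi(\mathbf{x})) > 0
                   \;\;\forall\, (\mathbf{x}, y) \in \mathcal{D}_l \,\},
\qquad
\text{size}(\mathcal{V}) = \operatorname{Area}(\mathcal{V})
\ \text{on the hypersphere } \|\mathbf{w}\| = 1.
```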

  11. Estimation of Loss Reduction (Cont.) • By version space duality, the loss reduction rate of each binary classifier can be approximated using the SVM output margin f(x) • Select the example that maximizes the sum of loss reduction rates over all binary classifiers • Notation: Li is the loss of the binary classifier built on the labeled set for class ci; Vi is the size of the version space of classifier fi; yi = 1 if x belongs to class ci, and −1 otherwise • Intuition: if fi predicts x correctly, then the larger |fi(x)| is, the lower the uncertainty (smaller loss reduction); if fi predicts x incorrectly, the larger |fi(x)| is, the higher the uncertainty (larger loss reduction).
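In symbols (my reconstruction of the slide's criterion from the definitions above; the exact constants may differ from the paper):

```latex
x^* = \arg\max_{x \in \mathcal{D}_u} \;\sum_{i=1}^{k} \frac{1 - y_i f_i(x)}{2},
\qquad y_i \in \{+1, -1\},
```

where f_i is the SVM decision function of the binary classifier for class c_i.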

  12. Probability Estimation • It is intractable to directly compute the expected loss function • Limited training data • Large number of possible label vectors for each x (exponential in the number of classes) • Approximate the expectation by the loss at the single label vector ys(x) with the largest conditional probability P(y|x) • ys(x): the label vector with the largest conditional probability.
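Combining this approximation with the previous slide, the practical selection criterion becomes (again a hedged reconstruction; y_{s,i}(x) denotes the i-th component of y_s(x)):

```latex
x^* \;\approx\; \arg\max_{x \in \mathcal{D}_u}
\;\sum_{i=1}^{k} \frac{1 - y_{s,i}(x)\, f_i(x)}{2},
\qquad
\mathbf{y}_s(x) = \arg\max_{\mathbf{y}} P(\mathbf{y}\mid x).
```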

  13. How to predict ys(x)? • Main ideas: • First build a classification model to predict the number of labels each data point may have. • Then determine the label vector based on the predicted label count.

  14. How to predict ys(x)? (Cont.) • Step 1: Assign a probability output to each class for every example. • Step 2: For each x, sort the class probabilities in decreasing order and normalize them so that they sum to 1; use the sorted, normalized probabilities as features. • Step 3: Train a logistic regression model whose target is the true label number of each labeled example. • Step 4: For each unlabeled data point, predict the probabilities of having different numbers of labels; if label number j has the largest probability, assign the j classes with the highest probabilities as the labels of x and mark the rest negative (see the sketch below).
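A minimal sketch of this label-number prediction step, assuming the per-class probabilities come from the binary SVMs (e.g., sigmoid-calibrated outputs); scikit-learn's LogisticRegression stands in for the model on the slide, and all function names are mine:

```python
# Sketch of the label-number prediction step (my reading of the slide;
# the source of the class probabilities and the feature layout are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_count_features(class_probs):
    """Sort each example's class probabilities in decreasing order and
    normalize them to sum to 1, as described on the slide."""
    p = np.sort(class_probs, axis=1)[:, ::-1]    # decreasing order
    return p / p.sum(axis=1, keepdims=True)      # normalize to sum to 1

def fit_label_count_model(class_probs_train, n_labels_train):
    """class_probs_train: (n_labeled, k) per-class probabilities;
    n_labels_train: true number of labels of each labeled example."""
    model = LogisticRegression(max_iter=1000)
    model.fit(label_count_features(class_probs_train), n_labels_train)
    return model

def predict_label_vector(model, class_probs_x):
    """Predict y_s(x): take the most likely label count j, then set the
    j highest-probability classes to +1 and the rest to -1."""
    feats = label_count_features(class_probs_x.reshape(1, -1))
    j = int(model.predict(feats)[0])
    y = -np.ones(len(class_probs_x), dtype=int)
    y[np.argsort(class_probs_x)[::-1][:j]] = 1
    return y
```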

  15. Experiments • Data sets: • RCV1-V2 [D. D. Lewis 04] • Reuters newswire stories • Yahoo!’s webpage collection [N. Ueda 02, H. Kazawa 05] • webpages linked from Yahoo!’s top directory

  16. Experiment Setup • Comparing methods: • MMC (Maximum loss reduction with Maximal Confidence) – the proposed method • BinMin • MML • Random • SVMLight [T. Joachims 02] is used as the base classifier. • Performance measure: micro-averaged F1 score over the predicted labels of all classes.
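The micro-averaged F1 score pools true positives, false positives, and false negatives over all k classes (standard definition, stated here for completeness; notation mine):

```latex
\text{Micro-F1} =
\frac{2\sum_{i=1}^{k} TP_i}
     {2\sum_{i=1}^{k} TP_i + \sum_{i=1}^{k} FP_i + \sum_{i=1}^{k} FN_i}
```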

  17. Results on RCV1-V2 Data set • Compare the label prediction methods: • The proposed prediction method • SCut [D. D. Lewis 04] – tunes a threshold for each class • SCut (threshold = 0)
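A generic sketch of per-class SCut threshold tuning, only to illustrate the idea; the candidate grid and the use of held-out F1 are my assumptions, not the exact procedure of [D. D. Lewis 04]:

```python
# Illustrative per-class threshold tuning in the SCut style: for each class,
# pick the decision threshold that maximizes F1 on held-out data.
import numpy as np
from sklearn.metrics import f1_score

def tune_scut_thresholds(scores, y_true, candidates=np.linspace(-1.0, 1.0, 41)):
    """scores: (n, k) SVM decision values; y_true: (n, k) binary labels.
    The candidate grid is arbitrary and only meant for illustration."""
    thresholds = []
    for i in range(scores.shape[1]):
        f1s = [f1_score(y_true[:, i], (scores[:, i] >= t).astype(int),
                        zero_division=0) for t in candidates]
        thresholds.append(candidates[int(np.argmax(f1s))])
    return np.array(thresholds)
```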

  18. Results on RCV1-V2 Data set (Cont.) • Initial labeled set: 500 examples • 50 iterations, S = 20

  19. Results on RCV1-V2 Data set (Cont.) • Vary the size of initial labeled set, 50 iterations, S = 20

  20. Results on RCV1-V2 Data set (Cont.) • Vary the sampling size per run; initial labeled set: 500 examples • Stop after adding 1,000 labeled examples

  21. Results on Yahoo! Data set • Initial labeled set: 500 examples • 50 iterations, S = 50

  22. Summary • Multi-Label Active Learning for Text Classification • Important for reducing human labeling effort • A challenging task • SVM-based Multi-Label Active Learning • Optimizes the loss reduction rate based on the SVM version space • Effective label prediction method • Successfully reduces labeling effort on real-world datasets • Future work • More efficient evaluation over the unlabeled pool • More multi-label classification tasks, e.g., image classification

  23. Thank you!
