2009 ICDMW
Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm
Chen, Yi-wen (陳憶文)
Graduate Institute of Computer Science & Information Engineering, National Taiwan Normal University
Main Reference: Z. Xu, C. Hogan, R. Bauer, "Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm," ICDM Workshops, pp. 326-331, 2009.
2012/8/21
I. INTRODUCTION
• Effective active learning algorithms reduce human labeling effort as well as produce better learning results.
• However, efficient active learning algorithms for real-world large-scale data have not yet been well addressed, either in the machine learning community or in practical industrial applications.
• The existing batch mode active learning algorithms, however, cannot overcome the computational bottleneck of the greedy algorithm, which takes O(KN) time, where K is the number of examples in the batch and N is the total number of unlabeled examples in the collection.
• We prove that the selection objective function is a submodular function, which exhibits the diminishing returns property: labeling a datum when we have only a small amount of labeled data yields more learnable information for the underlying classifier than labeling it when we already have a large amount of labeled data.
II. RELATED WORKS
• Several active learning algorithms [3], [4] have been proposed to improve the support vector machine classifier. Coincidentally, these active learning approaches use the same selection scheme: choosing the next unlabeled example closest to the current decision hyperplane in the kernel space.
• Brinker [5] incorporated a diversity measure into the batch mode support vector machine active learning problem. This active learning algorithm employs a scoring function to select unlabeled data, which combines distance to the decision boundary [4] and diversity (that is, distance to the already selected data).
• Batch mode active learning considering diversity has also been applied to relevance feedback in information retrieval.
• Common ground:
  • They explicitly or implicitly model the diversity of the selected dataset.
  • They solve the NP-hard combinatorial optimization problem with a greedy algorithm.
Submodular Objective Function
• In the batch mode active learning problem, we aim to select a subset A of unlabeled examples from the full unlabeled pool N to acquire labels. We formulate batch mode active learning as a constrained optimization problem: select the set of data that maximizes the reward objective function while staying within the defined cost constraint:

  A* = argmax_{A ⊆ N} R(A)  subject to  C(A) ≤ B    (1)

• R(A): the reward function of a candidate unlabeled set A.
• C(A): the cost of labeling A.
• B: the cost constraint.
• The informativeness of unlabeled examples to the classifier is well captured by their uncertainty and diversity.
Submodular Objective Function
• Uncertainty is a widely used selection criterion for pool-based active learning algorithms.
• Uncertainty can be measured by different heuristics, including uncertainty sampling in the logistic regression classifier [9], query by committee in the Naïve Bayes classifier [10], and version space reduction in the support vector machine classifier [4].
• We focus only on support vector machine classifiers in this paper.
• Among these, the MaxMin margin and ratio margin algorithms need to retrain the SVM classifier, which requires significant computational overhead.
• So we use the simple margin algorithm, which measures the uncertainty of an unlabeled example by its distance to the current separating hyperplane (see the sketch below).
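A minimal sketch of the simple margin criterion, assuming a scikit-learn LinearSVC as the underlying classifier (an assumption of ours; the paper does not prescribe an implementation). The signed decision value is proportional to the distance to the hyperplane, so its negated absolute value ranks the closest examples highest:

```python
import numpy as np
from sklearn.svm import LinearSVC

def simple_margin_uncertainty(clf: LinearSVC, X_pool: np.ndarray) -> np.ndarray:
    """Simple margin uncertainty: examples closest to the current separating
    hyperplane are the most uncertain, so negate the absolute decision value
    (larger value = more uncertain)."""
    return -np.abs(clf.decision_function(X_pool))
```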
Submodular Objective Function
• The diversity of an unlabeled example is defined as the minimum distance between the unlabeled example and all the selected examples in the batch.
• The distance metric we use here is the cosine distance.
• Finally, in order to combine both requirements, viz. uncertainty and diversity, we formulate the objective function as Equation (2):

  R(A) = Σ_{x_i ∈ A} [ α·U(x_i) + (1 − α)·D(x_i) ]    (2)

• U(x_i): the uncertainty of example x_i to the current classifier.
• D(x_i): the diversity of example x_i.
• D(x_i) is defined as the minimum cosine distance between example x_i and the previously selected examples x_1, …, x_{i−1}.
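A sketch of this scoring, with α as the mixing parameter; the exact linear-combination form is our reading of the slide rather than a verbatim transcription of the paper:

```python
import numpy as np

def batch_score(uncertainty: np.ndarray, X_pool: np.ndarray,
                selected: list[int], alpha: float = 0.5) -> np.ndarray:
    """Score each pool example as alpha * uncertainty + (1 - alpha) * diversity,
    where diversity is the minimum cosine distance to the selected examples."""
    if not selected:
        return uncertainty.copy()              # first pick: uncertainty only
    Xn = X_pool / np.linalg.norm(X_pool, axis=1, keepdims=True)
    # cosine distance = 1 - cosine similarity, minimized over the batch so far
    diversity = (1.0 - Xn @ Xn[selected].T).min(axis=1)
    return alpha * uncertainty + (1.0 - alpha) * diversity
```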
Submodular Objective Function
• Intuitively, if we add a new example to a smaller set A, the diversity of the set increases at least as much as if we add it to a larger set B ⊇ A.
• A set function with this diminishing returns property is called a submodular function.
• Proof of the submodularity of R: let A ⊆ B ⊆ N and let x be an unselected example, x ∈ N \ B. The reward increase of R from adding x to the selected set A is defined as:

  Δ(x | A) = R(A ∪ {x}) − R(A)

Since the minimum distance to a larger set can only be smaller, Δ(x | A) ≥ Δ(x | B).
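The diminishing-returns intuition can be checked numerically on toy vectors (an illustration of ours, not from the paper): the minimum cosine distance to a superset B can never exceed the minimum cosine distance to its subset A.

```python
import numpy as np

rng = np.random.default_rng(0)
pool = rng.normal(size=(8, 5))                      # toy feature vectors
norm = pool / np.linalg.norm(pool, axis=1, keepdims=True)

def min_cos_dist(i: int, S: list[int]) -> float:
    """Minimum cosine distance from example i to the selected set S."""
    return float((1.0 - norm[S] @ norm[i]).min())

x, A, B = 0, [1, 2], [1, 2, 3, 4, 5]                # A is a subset of B
assert min_cos_dist(x, A) >= min_cos_dist(x, B)     # the gain shrinks as the set grows
```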
Submodular Objective Function
• More formally, based on the proof above, we obtain the following theorem.
• Theorem: the reward function R(A) in Equation (2) is submodular, i.e., for all A ⊆ B ⊆ N and x ∈ N \ B, R(A ∪ {x}) − R(A) ≥ R(B ∪ {x}) − R(B).
Lazy Active Learning Algorithm
• The greedy algorithm selects the first example with the largest uncertainty, then calculates the diversity of the remaining examples and selects the example with the largest combination score.
• However, the total complexity of the greedy algorithm is O(KN) when we select a subset of K examples from a pool of N candidate examples; a sketch of this baseline follows.
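A sketch of the greedy baseline (our own illustration): keeping a running minimum distance lets one O(N) pass per selection suffice, for O(KN) distance computations over the whole batch.

```python
import numpy as np

def greedy_select(uncertainty: np.ndarray, X_pool: np.ndarray,
                  K: int, alpha: float = 0.5) -> list[int]:
    """Greedy batch selection in O(K*N): one O(N) diversity-update pass per
    selected example, via a running minimum cosine distance."""
    Xn = X_pool / np.linalg.norm(X_pool, axis=1, keepdims=True)
    min_dist = np.full(len(X_pool), np.inf)   # min distance to the batch so far
    selected = []
    pick = int(np.argmax(uncertainty))        # first pick: largest uncertainty
    for _ in range(K):
        selected.append(pick)
        # O(N) pass: fold the newly selected example into every diversity value.
        min_dist = np.minimum(min_dist, 1.0 - Xn @ Xn[pick])
        scores = alpha * uncertainty + (1.0 - alpha) * min_dist
        scores[selected] = -np.inf            # never re-select
        pick = int(np.argmax(scores))
    return selected
```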
Lazy Active Learning Algorithm
• We further explore the submodularity of the objective function to reduce the number of pairwise distance calculations. We first find an example with the largest marginal reward. If the distances between this example and any of the previously selected examples have not been calculated, we update its diversity by calculating these distances. If the updated marginal reward of this example is still the largest, we select this example.
Lazy Active Learning Algorithm
• A counter in Table 2, here written c_i, stores the number of selected examples whose cosine distances to the unselected example x_i have already been calculated. By tracking c_i, the algorithm only needs to calculate |A| − c_i additional cosine distance pairs to bring the diversity of x_i up to date (see the sketch below).
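Putting the pieces together, here is a sketch of the lazy selection loop: a max-heap of cached scores plus the per-example counter (the names counted and min_dist are ours, not the paper's Table 2 notation). Because a cached score can only shrink as the batch grows, an example whose refreshed score still tops the heap is the true maximizer.

```python
import heapq
import numpy as np

D_MAX = 2.0  # upper bound on cosine distance, so cached scores start as upper bounds

def lazy_greedy_select(uncertainty: np.ndarray, X_pool: np.ndarray,
                       K: int, alpha: float = 0.5) -> list[int]:
    """Lazy batch selection: distances to the batch are computed only when an
    example surfaces at the top of the heap, typically far fewer than the K*N
    pairs the plain greedy algorithm evaluates."""
    Xn = X_pool / np.linalg.norm(X_pool, axis=1, keepdims=True)
    min_dist = np.full(len(X_pool), D_MAX)       # min cosine distance to the batch
    counted = np.zeros(len(X_pool), dtype=int)   # c_i: distance pairs already computed
    selected: list[int] = []
    heap = [(-(alpha * u + (1.0 - alpha) * D_MAX), i)
            for i, u in enumerate(uncertainty)]  # cached upper-bound scores
    heapq.heapify(heap)
    while len(selected) < K and heap:
        _, i = heapq.heappop(heap)
        if counted[i] == len(selected):
            selected.append(i)                   # score is exact: safe greedy pick
            continue
        # Refresh: compute only the len(selected) - c_i missing distance pairs.
        new = 1.0 - Xn[selected[counted[i]:]] @ Xn[i]
        min_dist[i] = min(min_dist[i], float(new.min()))
        counted[i] = len(selected)
        score = alpha * uncertainty[i] + (1.0 - alpha) * min_dist[i]
        heapq.heappush(heap, (-score, i))        # re-insert with the exact score
    return selected
```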
IV. EXPERIMENTS
• The algorithm behaves differently on different datasets, so we selected three datasets for our experiments to cover a wide range of properties.
• We consider the binary text classification task CCAT from the Reuters RCV1 collection [13]. This task has an almost balanced class ratio.
• We use the C11 category from the RCV1 collection [13], since it has an unbalanced class ratio.
• We include topic 103 from the TREC Legal 2008 interactive task [1]. This task models a real-world e-discovery task, which aims to find relevant information with respect to a legal subject matter. The final TREC Legal dataset we use contains 6421 labeled documents, among which 3440 are non-relevant and 2981 are relevant. We randomly sample 3000 documents as the test set and use the remaining 3421 documents as the training set. We use the three text-only fields (text body, title, brand).
IV. EXPERIMENTS
• The first part of our experiments is set up to explore the effectiveness of our active learning algorithm and the influence of the mixing parameter α.
• Figure 1 demonstrates that the active learning algorithm achieves better classification accuracy than random sampling.
How Efficient is the Lazy Active Learning Algorithm?
• We compare the running time of our lazy active learning algorithm with two versions of the greedy algorithm: the greedy algorithm using an inverted index and the greedy algorithm using pairwise cosine distance calculation.
How Efficient is the Lazy Active Learning Algorithm?
• We fixed the number of feedback documents at 100 and varied the total number of training documents in the pool.
• For all three datasets, we use 12.5%, 25%, 50%, and 100% of the training data as the sampling pool, and compare the speed of lazy active learning, inverted-index greedy active learning, and pairwise greedy active learning.
V. CONCLUSIONS
To summarize, the major contributions of this paper are:
• We propose a generalized objective function for batch mode active learning, which is shown to be a submodular function. Based on the submodularity of the objective function, we propose an extremely fast algorithm, the lazy active learning algorithm.
• We extensively evaluate our new approach on several real-world text classification tasks in terms of classification accuracy and computational efficiency.