Multi-Criteria-based Active Learning for Named Entity Recognition
Dan Shen†‡, Jie Zhang†‡, Jian Su†, Guodong Zhou†, Chew-Lim Tan‡
†Institute for Infocomm Research ‡National University of Singapore
Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion
Motivation • Named Entity Recognition (NER) • Most current work: supervised learning • Requires a large annotated corpus • MUC-6 / MUC-7 corpora (newswire domain) • GENIA corpus (biomedical domain) • Limitations of supervised NER • Corpus annotation: tedious and time-consuming • Adaptability to new domains: limited • Target of our work • Explore active learning in NER • Minimize the human annotation effort • Without degrading performance
Active Learning Framework • Given • a small labeled data set L • a large unlabeled data set U • Repeat • Train a model M on L • Use M to test U • Select the most useful example from U (our research focus) • Ask a human expert to label it • Add the labeled example to L • Until M achieves a certain performance level (a sketch of this loop follows below)
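As a minimal sketch of the pool-based loop above (Python; `train_model`, `usefulness`, `ask_human_to_label`, and `reached_target` are hypothetical caller-supplied callables, not the paper's code — its actual model is an SVM-based NER tagger):

```python
def active_learning(labeled, unlabeled, train_model, usefulness,
                    ask_human_to_label, reached_target, batch_size=1):
    """Pool-based active learning loop. All callables are caller-supplied."""
    while True:
        model = train_model(labeled)          # train a model M on L
        if reached_target(model):             # stop at the target performance
            return model
        # score every unlabeled example with M and pick the most useful batch
        ranked = sorted(unlabeled, key=lambda x: usefulness(model, x),
                        reverse=True)
        for example in ranked[:batch_size]:
            unlabeled.remove(example)
            labeled.append(ask_human_to_label(example))   # human annotation
```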
Related Work • Active Learning Criteria • Most current work: informativeness • Little work considers representativeness or diversity • E.g. [McCallum and Nigam 1998] and [Tang et al. 2002] • E.g. [Brinker 2003] • No work has explored multiple criteria in active learning • Active Learning in NLP • E.g. POS tagging / text classification / statistical parsing • No work has explored active learning in NER
Active Learning in NER • SVM-based NER system • Recognizes one class of NEs at a time • Features – different from supervised learning! • Cannot be produced statistically from the training data set • No gazetteers or dictionaries • Active learning in the NER system • Select the most useful batch of examples • An example – a word sequence • E.g. a named entity and its context • Measurements – from words to NEs • Only word-based scores are available from the SVM (a sketch follows below)
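A hedged illustration of per-class, word-level SVM scoring (the features below are toy orthographic indicators, not the paper's actual feature set, which is richer):

```python
import numpy as np
from sklearn.svm import SVC

def word_features(word):
    # toy orthographic features; the real system uses a much richer set
    return np.array([
        word[0].isupper(),                  # initial capital
        word.isupper(),                     # all caps
        any(c.isdigit() for c in word),     # contains a digit
        len(word) > 3,                      # longish token
    ], dtype=float)

# one binary SVM per NE class, e.g. Protein vs. not-Protein
words = ["IL-2", "gene", "binds", "GATA-1"]
X = np.array([word_features(w) for w in words])
y = np.array([1, 0, 0, 1])

clf = SVC(kernel="linear").fit(X, y)
scores = clf.decision_function(X)   # the word-based score the SVM provides
```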
1. Informativeness Criterion Most informative example: the one the existing model is most uncertain about. Most previous work is based only on this criterion.
Informativeness Measurement • Informativeness for a word • Would it change / induce support vectors in the SVM? • Measured by the distance of the word's feature vector to the separating hyperplane • Informativeness for an NE • An NE is a sequence of words: NE = w1w2…wN, where wi is the ith word of the NE • Heuristic scoring functions over the word-level distances, e.g. the average word distance, the minimum word distance, or the proportion of words within a short distance of the hyperplane (sketched below)
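A sketch of these measurements (the threshold `a` and the exact normalization are assumptions; `clf` is a fitted SVM as in the earlier sketch):

```python
import numpy as np

def word_distance(clf, x):
    # |w·x + b|, proportional to the distance of the word's feature
    # vector to the separating hyperplane
    return abs(clf.decision_function(x.reshape(1, -1))[0])

def info_avg(clf, ne_words):
    # shorter average distance over the NE's words -> more informative
    return -np.mean([word_distance(clf, x) for x in ne_words])

def info_min(clf, ne_words):
    # the single most uncertain word dominates
    return -min(word_distance(clf, x) for x in ne_words)

def info_short(clf, ne_words, a=1.0):
    # proportion of words lying within distance `a` of the hyperplane
    d = [word_distance(clf, x) for x in ne_words]
    return sum(di < a for di in d) / len(d)
```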
2. Representativeness Criterion Most representative example: the one that represents the most other examples. Only a few works [McCallum and Nigam 1998; Tang et al. 2002] consider this criterion.
Similarity Measurement • Similarity between words • Cosine-similarity measurement • Adapted to the SVM feature space • Similarity between NEs • Alignment of two word sequences • Dynamic Time Warping (DTW) algorithm • Given a point-by-point distance between words • Find an optimal alignment path • Minimizing the accumulated distance along the path (see the sketch below)
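A minimal DTW sketch over word feature vectors (the conversion from accumulated distance to a similarity in (0, 1] is an assumption; the slide does not show the paper's normalization):

```python
import numpy as np

def cosine_dist(u, v):
    # point-by-point distance: 1 - cosine similarity of word vectors
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def dtw_similarity(ne1, ne2):
    """ne1, ne2: lists of word feature vectors for two NEs."""
    n, m = len(ne1), len(ne2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_dist(ne1[i - 1], ne2[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # normalize by path length and map the accumulated distance to (0, 1]
    return 1.0 / (1.0 + D[n, m] / (n + m))
```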
Representativeness Measurement • Representativeness of NEi in NESet • NESet = {NE1, …, NEi, …, NEN} • Quantified by its density • The average similarity between NEi and all other NEj (j ≠ i) in NESet • Most representative NE • Largest density among all NEs in NESet • The centroid of NESet (sketched below)
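Density and centroid then follow directly (a sketch; `sim` is a precomputed similarity matrix, e.g. built with `dtw_similarity` above):

```python
import numpy as np

def density(i, sim):
    # average similarity between NE_i and every other NE_j (j != i)
    n = sim.shape[0]
    return (sim[i].sum() - sim[i, i]) / (n - 1)

def centroid_index(sim):
    # the NE with the largest density is the centroid of NESet
    return int(np.argmax([density(i, sim) for i in range(sim.shape[0])]))
```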
3. Diversity Criterion Maximize the training utility of a batch: the members of the batch should have high variance relative to each other. Only one work [Brinker 2003] has considered this criterion.
Global Consideration • Consider the examples in the whole sample space • K-means clustering • Cluster all named entities in NESet • Assumption: the examples within one cluster are quite similar to each other • Select examples from different clusters in each batch • Time-consuming, so for efficiency we filter out NEs before clustering (a clustering sketch follows below)
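A clustering sketch (assumption: each NE is embedded as the mean of its word vectors, since k-means needs points in a vector space rather than raw sequences):

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_by_clustering(ne_set, k):
    # ne_set: list of NEs, each a list of word feature vectors
    points = np.array([np.mean(ne, axis=0) for ne in ne_set])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(points)
    # one example per cluster; Strategy 1 below refines this by taking
    # the centroid (densest member) of each cluster instead
    return [int(np.flatnonzero(labels == c)[0]) for c in range(k)]
```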
Local Consideration • Consider only the examples in the current batch • For an example candidate: • Compare it with the previously selected examples in the batch one by one • Add it to the batch only if its similarity to every one of them is below a threshold • More efficient! (see the sketch below)
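A sketch of the local check (`sim` is any NE similarity function, e.g. the DTW one above; the threshold value is a tunable assumption):

```python
def add_if_diverse(candidate, batch, sim, threshold):
    """Add `candidate` only if it is sufficiently dissimilar to every
    example already selected into the batch."""
    if all(sim(candidate, b) < threshold for b in batch):
        batch.append(candidate)
        return True
    return False
```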
Strategy 1 • From the unlabeled data set, select the M most informative examples (informativeness criterion) → intermediate set • Cluster the intermediate set into K clusters (diversity criterion) • Select the centroid of each cluster to form the batch (representativeness criterion)
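Putting the pieces together, a sketch of Strategy 1 (the `info`, `cluster_into_k`, and `centroid_of` callables stand for the measurements sketched earlier):

```python
def strategy1(unlabeled, m, k, info, cluster_into_k, centroid_of):
    # 1. informativeness: keep the M most informative NEs (intermediate set)
    intermediate = sorted(unlabeled, key=info, reverse=True)[:m]
    # 2. diversity: split the intermediate set into K clusters
    clusters = cluster_into_k(intermediate, k)
    # 3. representativeness: the centroid of each cluster forms the batch
    return [centroid_of(cluster) for cluster in clusters]
```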
Strategy 2 • From the unlabeled data set, select the example candidate with the maximum combined score λ·Info + (1 − λ)·Rep (informativeness & representativeness criteria) • Compare the candidate with each example already in the batch: if any similarity value exceeds a threshold, reject it; otherwise add it to the batch (diversity criterion) • Repeat until the batch is full
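And a sketch of Strategy 2 (assumes `info` and `rep` are normalized to a comparable range so the linear interpolation is meaningful; `sim` and `threshold` as in the local diversity check above):

```python
def strategy2(unlabeled, k, lam, threshold, info, rep, sim):
    def score(ne):
        return lam * info(ne) + (1.0 - lam) * rep(ne)   # λ·Info + (1−λ)·Rep

    batch = []
    for ne in sorted(unlabeled, key=score, reverse=True):
        # local diversity: reject if too similar to anything already chosen
        if all(sim(ne, b) < threshold for b in batch):
            batch.append(ne)
        if len(batch) == k:
            break
    return batch
```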
Experimental Setting 1 • Corpus • MUC-6 corpus: to recognize Person, Location and Organization • GENIA corpus V1.1: to recognize Protein • Corpus split • Initial training data set • Test data set • Unlabeled data set • Size of each data set • Batch size K • K = 50 in the biomedical domain • K = 10 in the newswire domain
Experimental Setting 2 • Supervised learning baseline • Trained on the entire annotated corpus • Newswire: 408 WSJ articles • Biomedical: 590 MEDLINE abstracts • Random selection baseline • A batch of examples is randomly selected in each round • Evaluation metric: F-measure
Experimental Results 1 • Effectiveness of single-criterion-based active learning (learning curves) • Info_based reaches the supervised performance with 52K words – 62% of Random (83K) and 23% of Supervised (223K)
Experimental Results 2 • Overall results of multi-criteria-based active learning • Newswire domain (MUC-6): on average 28% of the words Random needs – Avg(3.5K/11.5K, 2.1K/13.6K, 7.8K/20.2K) – and 5% of the 157K-word supervised corpus – Avg(3.5K/157K, 2.1K/157K, 7.8K/157K) • Biomedical domain (GENIA): 31K words – 37% of Random (31K/83K) and 14% of Supervised (31K/223K) • Strategy 1 needs 9K more words than Strategy 2
Experimental Results 3 • Effectiveness of multi-criteria-based active learning (learning curves) • Strategy 2 (31K words, 60% of Info_based) and Strategy 1 (40K words, 77% of Info_based) vs. Info_based (52K) and Supervised (223K)
Conclusion • The first work to use multiple criteria in active learning • Informativeness, representativeness & diversity • Two active learning strategies • Outperform the single-criterion-based method (Strategy 2 needs only 60% of its training data) • The first work to apply active learning to NER • Measurements for the multiple criteria • Greatly reduce the annotation cost • The measurements and strategies are general • Can be easily adapted to other NLP tasks
Future Work • How to automatically decide the optimal values of the parameters? • Batch size K • Linear interpolation parameter λ • When to stop the active learning process? • E.g. monitor the change of support vectors
Acknowledgement Thanks to the International Post-Graduate College (IGK) at Saarland University for the generous travel support. The first author is now a Ph.D. student in this program.
The End. Thank you!