Unsupervised Word Sense Disambiguation Rivaling Supervised Methods (David Yarowsky). G22.2591 Presentation, Sonjia Waxmonsky
Introduction • Presents an unsupervised learning algorithm for word sense disambiguation that can be applied to completely untagged text • Based on a supervised machine learning algorithm that uses decision lists • Performance matches that of a supervised system
Properties of Language • One sense per collocation: Nearby words provide strong and consistent clues as to the sense of a target word • One sense per discourse: The sense of a target word is highly consistent within a single document
Decision List Algorithm • Supervised algorithm • Based on the 'One sense per collocation' property • Start with a large set of possible collocations • Calculate the log-likelihood ratio of word-sense probability for each collocation: Log( Pr(Sense-A | Collocation_i) / Pr(Sense-B | Collocation_i) ) • Higher log-likelihood = more predictive evidence • Collocations are ordered in a decision list, with the most predictive collocations ranked highest
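A minimal sketch of how such a decision list might be built. It assumes each tagged sample is a set of collocation features, and it uses simple add-alpha smoothing where the paper uses a more elaborate smoothing procedure; all names here are illustrative, not the paper's code.

```python
from collections import defaultdict
from math import log

def build_decision_list(tagged_samples, alpha=0.1):
    """Rank collocation features by the log-likelihood ratio
    Log( Pr(Sense-A | c) / Pr(Sense-B | c) ), estimated from counts.

    tagged_samples: iterable of (context_words, sense) pairs, where
    context_words is a set of collocation features and sense is 'A' or 'B'.
    alpha: add-alpha smoothing (assumption; the paper smooths differently).
    """
    counts = defaultdict(lambda: {'A': 0, 'B': 0})
    for context_words, sense in tagged_samples:
        for c in context_words:
            counts[c][sense] += 1

    decision_list = []
    for c, n in counts.items():
        llr = log((n['A'] + alpha) / (n['B'] + alpha))
        sense = 'A' if llr > 0 else 'B'
        # Store |llr| so the most predictive collocations sort first.
        decision_list.append((abs(llr), c, sense))

    decision_list.sort(reverse=True)
    return decision_list
```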
Decision List Algorithm • The decision list is used to classify instances of the target word, e.g. “the loss of animal and plant species through extinction …” • Classification is based on the highest-ranking rule that matches the target context
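Continuing the sketch above, classification scans the ranked list and applies only the first rule that matches the context; the returned strength can double as a confidence score later in the bootstrapping loop. This is a hypothetical helper, not the paper's implementation.

```python
def classify(context_words, decision_list, default=None):
    """Classify one occurrence of the target word using only the
    single highest-ranked rule that matches its context."""
    for strength, colloc, sense in decision_list:
        if colloc in context_words:
            return sense, strength
    return default, 0.0  # no rule matched
```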
Advantage of Decision Lists • Multiple collocations may match a single context • But only the single most predictive piece of evidence is used to classify the target word • Result: the classification procedure combines a large amount of non-independent information without complex modeling
Bootstrapping Algorithm • Example target plant, with Sense-A: life and Sense-B: factory • All occurrences of the target word are identified • A small training set of seed data is tagged with word sense
Selecting Training Seeds • Initial training set should accurately distinguish among possible senses • Strategies: • Select a single, defining seed collocation for each possible sense. Ex: “life” and “manufacturing” for target plant • Use words from dictionary definitions • Hand-label most frequent collocates
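A sketch of the first seeding strategy, assuming each sample is the set of words in the target's context window. The seed mapping mirrors the paper's plant example ("life" vs. "manufacturing"), but the function itself is illustrative.

```python
def tag_seeds(samples, seed_words):
    """Label only the samples that contain exactly one defining
    seed collocation; everything else goes to the residual set.

    samples: list of context-word sets, one per occurrence of the target.
    seed_words: e.g. {'life': 'A', 'manufacturing': 'B'} for target 'plant'.
    """
    seed_set, residual = [], []
    for context_words in samples:
        senses = {s for w, s in seed_words.items() if w in context_words}
        if len(senses) == 1:
            seed_set.append((context_words, senses.pop()))
        else:
            residual.append(context_words)
    return seed_set, residual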
Bootstrapping Algorithm • Iterative procedure: • Train decision list algorithm on seed set • Classify residual data with decision list • Create new seed set by identifying samples that are tagged with a probability above a certain threshold • Retrain classifier on new seed set
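A runnable sketch of this loop, reusing build_decision_list, classify, and tag_seeds from the earlier sketches. Two simplifications to note: this version only grows the seed set, whereas the paper also allows previously tagged samples to be reclassified, and the threshold value is an arbitrary assumption.

```python
def bootstrap(samples, seed_words, threshold=2.0, max_iters=50):
    """Simplified bootstrapping loop: grow the seed set with residual
    samples classified above a confidence threshold, then retrain."""
    seed_set, residual = tag_seeds(samples, seed_words)
    for _ in range(max_iters):
        decision_list = build_decision_list(seed_set)
        newly_tagged, still_residual = [], []
        for context_words in residual:
            sense, strength = classify(context_words, decision_list)
            if sense is not None and strength >= threshold:
                newly_tagged.append((context_words, sense))
            else:
                still_residual.append(context_words)
        if not newly_tagged:  # residual set has stabilized
            break
        seed_set += newly_tagged
        residual = still_residual
    return build_decision_list(seed_set), residual
```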
Bootstrapping Algorithm • With each iteration, the seed set grows and the residual set shrinks … • Convergence: stop when the residual set stabilizes
Final Decision List • The original seed collocations may not necessarily be at the top of the list • It is possible for a sample in the original seed data to be reclassified • Initial misclassifications in the seed data can be corrected
One Sense per Discourse • The algorithm can be improved by applying the “One sense per discourse” constraint … • After the algorithm has converged: identify tokens tagged with low confidence and label them with the dominant tag of their document • After each iteration: extend a tag to all examples in a single document once enough examples in it are tagged with a single sense
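A sketch of the post-convergence variant of the constraint. This simplified version relabels every token with the dominant sense of its document, which is coarser than the paper's approach of relabeling only low-confidence tokens; the helper name and data layout are assumptions.

```python
from collections import Counter, defaultdict

def apply_one_sense_per_discourse(tags, doc_ids):
    """Relabel each occurrence with the dominant sense of its document.

    tags: sense assigned to each occurrence ('A', 'B', or None).
    doc_ids: document identifier for each occurrence, aligned with tags.
    """
    per_doc = defaultdict(Counter)
    for sense, doc in zip(tags, doc_ids):
        if sense is not None:
            per_doc[doc][sense] += 1
    dominant = {doc: c.most_common(1)[0][0] for doc, c in per_doc.items()}
    # Fall back to the original tag for documents with no tagged tokens.
    return [dominant.get(doc, sense) for sense, doc in zip(tags, doc_ids)]
```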
Evaluation • Test corpus: extracted from a 460 million word corpus drawn from multiple sources (news articles, transcripts, novels, etc.) • Performance compared against: • a supervised decision list algorithm • the unsupervised learning algorithm of Schütze (1992), based on aligning clusters with word senses
Results • Applying the “One sense per discourse” constraint improves performance [chart: accuracy (%)]
Results • Accuracy exceeds that of the Schütze algorithm for all target words, and matches that of the supervised algorithm [chart: accuracy (%)]