
Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, David Yarowsky

Presentation Transcript


  1. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, David Yarowsky. G22.2591 Presentation, Sonjia Waxmonsky

  2. Introduction • Presents an unsupervised learning algorithm for word sense disambiguation that can be applied to completely untagged text • Based on a supervised machine learning algorithm that uses decision lists • Performance matches that of the supervised system

  3. Properties of Language • One sense per collocation: Nearby words provide strong and consistent clues as to the sense of a target word • One sense per discourse: The sense of a target word is highly consistent within a single document

  4. Decision List Algorithm • Supervised algorithm • Based on the ‘One sense per collocation’ property • Start with a large set of possible collocations • Calculate the log-likelihood ratio of the word-sense probabilities for each collocation: Log( Pr(Sense-A | Collocation_i) / Pr(Sense-B | Collocation_i) ) • Higher log-likelihood = more predictive evidence • Collocations are ordered in a decision list, with the most predictive collocations ranked highest
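A minimal sketch of how such a decision list could be built, assuming the training data is available as sense-tagged (collocation, sense) pairs; the two-sense restriction, the input format, and the smoothing constant are simplifying assumptions for illustration, not details taken from the slides.

```python
from collections import defaultdict
import math

def build_decision_list(tagged_examples, smoothing=0.1):
    """Rank collocations by the magnitude of their log-likelihood ratio.

    tagged_examples: iterable of (collocation, sense) pairs with sense "A" or "B"
    (a hypothetical input format). The smoothing constant avoids division by
    zero for collocations observed with only one sense.
    """
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for colloc, sense in tagged_examples:
        counts[colloc][sense] += 1

    decision_list = []
    for colloc, c in counts.items():
        # log( Pr(Sense-A | collocation) / Pr(Sense-B | collocation) ), smoothed
        ratio = math.log((c["A"] + smoothing) / (c["B"] + smoothing))
        predicted = "A" if ratio > 0 else "B"
        decision_list.append((abs(ratio), colloc, predicted))

    # most predictive collocations (largest |log-likelihood ratio|) ranked highest
    decision_list.sort(reverse=True)
    return decision_list
```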

  5. Decision List Algorithm • The decision list is used to classify instances of the target word, e.g. “… the loss of animal and plant species through extinction …” • Classification is based on the highest-ranking rule that matches the target context
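A sketch of the corresponding classification step, reusing the output of `build_decision_list` above; the collocation feature strings in the usage comment are hypothetical.

```python
def classify(context_collocations, decision_list, default_sense=None):
    """Return the sense predicted by the single highest-ranking matching rule."""
    for strength, colloc, sense in decision_list:  # already sorted, strongest rule first
        if colloc in context_collocations:
            return sense, strength
    return default_sense, 0.0                      # no rule matched

# e.g. for "... the loss of animal and plant species through extinction ...",
# classify({"plant species", "word-to-left=and"}, dlist) returns the sense of
# the strongest matching collocation, such as Sense-A (life) for "plant species".
```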

  6. Advantage of Decision Lists • Multiple collocations may match a single context • But, only the single most predictive piece of evidence is used to classify the target word • Result: The classification procedure combines a large amount of non-independent information without complex modeling

  7. Bootstrapping Algorithm (example target word: plant, with Sense-A: life, Sense-B: factory) • All occurrences of the target word are identified • A small training set of seed data is tagged with word sense

  8. Selecting Training Seeds • Initial training set should accurately distinguish among possible senses • Strategies: • Select a single, defining seed collocation for each possible sense. Ex: “life” and “manufacturing” for target plant • Use words from dictionary definitions • Hand-label most frequent collocates
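A sketch of the first seed-selection strategy (one defining collocation per sense), assuming each occurrence is represented as a dict with a 'context' string; that data layout is an assumption made for illustration.

```python
def seed_label(occurrences, seed_words):
    """Tag occurrences whose context contains a defining seed collocation.

    seed_words: e.g. {"life": "A", "manufacturing": "B"} for the target 'plant'.
    Everything else is left in the untagged residual set.
    """
    seeds, residual = [], []
    for occ in occurrences:
        for word, sense in seed_words.items():
            if word in occ["context"]:
                seeds.append((occ, sense))
                break
        else:
            residual.append(occ)
    return seeds, residual
```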

  9. Bootstrapping Algorithm • Iterative procedure: • Train decision list algorithm on seed set • Classify residual data with decision list • Create new seed set by identifying samples that are tagged with a probability above a certain threshold • Retrain classifier on new seed set
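A sketch of the iterative loop, reusing `seed_label`, `build_decision_list`, and `classify` from the earlier sketches; the occurrence layout (a 'collocations' field), the confidence threshold on rule strength (standing in for the probability threshold mentioned on the slide), and the iteration cap are all assumptions.

```python
def bootstrap(occurrences, seed_words, threshold=1.0, max_iters=20):
    """Grow the training set from seeds until the residual set stabilizes."""
    tagged, residual = seed_label(occurrences, seed_words)
    dlist = []
    for _ in range(max_iters):
        # 1. train the decision list on the current seed/training set
        dlist = build_decision_list(
            (colloc, sense) for occ, sense in tagged for colloc in occ["collocations"]
        )
        # 2. classify the residual data, keeping only confident predictions
        newly_tagged, still_residual = [], []
        for occ in residual:
            sense, strength = classify(occ["collocations"], dlist)
            if sense is not None and strength >= threshold:
                newly_tagged.append((occ, sense))
            else:
                still_residual.append(occ)
        # 3. convergence: stop when the residual set stops shrinking
        if not newly_tagged:
            break
        tagged += newly_tagged
        residual = still_residual
    return dlist, tagged, residual
```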

  10. Bootstrapping Algorithm Seed set grows and residual set shrinks …

  11. Bootstrapping Algorithm Convergence: Stop when residual set stabilizes

  12. Final Decision List • Original seed collocations may not necessarily be at the top of the list • It is possible for samples in the original seed data to be reclassified • Initial misclassifications in the seed data can be corrected

  13. One Sense per Discourse Algorithm can be improved by applying “One Sense per Discourse” constraint … • After algorithm has converged: Identify tokens tagged with low confidence, label with dominant tag of that document • After each iteration: Extend tag to all examples in a single document after enough examples are tagged with a single sense
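A sketch of the first variant (relabeling low-confidence tokens after convergence); the token layout with 'doc_id', 'sense', and 'confidence' fields and the confidence cutoff are assumptions made for illustration.

```python
from collections import Counter

def one_sense_per_discourse(tokens, confident=1.0):
    """Relabel low-confidence tokens with the dominant sense of their document."""
    by_doc = {}
    for tok in tokens:
        by_doc.setdefault(tok["doc_id"], []).append(tok)

    for doc_tokens in by_doc.values():
        senses = [t["sense"] for t in doc_tokens if t["confidence"] >= confident]
        if not senses:
            continue                                  # no reliable tag in this document
        dominant = Counter(senses).most_common(1)[0][0]
        for t in doc_tokens:
            if t["confidence"] < confident:
                t["sense"] = dominant                 # extend the document's dominant tag
    return tokens
```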

  14. Evaluation • Test corpus: extracted from a 460-million-word corpus drawn from multiple sources (news articles, transcripts, novels, etc.) • Performance of multiple models compared with: • supervised decision lists • the unsupervised learning algorithm of Schütze (1992), based on aligning clusters with word senses

  15. Results Applying the “One sense per discourse” constraint improves performance. [Accuracy (%) chart not reproduced in the transcript.]

  16. Results Accuracy exceeds the Schütze algorithm for all target words and matches that of the supervised algorithm. [Accuracy (%) chart not reproduced in the transcript.]
