
WSD using Optimized Combination of Knowledge Sources

Authors: Yorick Wilks and Mark Stevenson. Presenter: Marian Olteanu.


Presentation Transcript


  1. WSD using Optimized Combination of Knowledge Sources
     Authors: Yorick Wilks and Mark Stevenson
     Presenter: Marian Olteanu

  2. Introduction
     • Regular approaches
       • All words
       • Sample (small trial section)
     • Problems
       • Ambiguity, especially at fine granularity
       • New senses in text that are not in the dictionary

  3. Approach
     • Integrates partial sources of information
       • Part-of-speech
       • Dictionary definitions
       • Pragmatic codes
       • Selectional restrictions
     • Integration
       • Filters
       • Partial selectors (taggers)

  4. Dictionary for senses
     • Longman Dictionary of Contemporary English (LDOCE)
     • Two levels:
       • Homograph
       • Sense

  5. Methodology
     • Preprocessing
       • Part-of-speech tagger (Brill)
     • Part-of-speech
       • Filter: eliminate all incompatible homographs
       • If no sense remains, keep all senses
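
The part-of-speech filter on this slide can be sketched in a few lines. This is only an illustration: the dictionary entries, the `pos` field, and the function name `filter_by_pos` are hypothetical and do not reflect the real LDOCE format or the authors' code.

```python
def filter_by_pos(homographs, tagged_pos):
    """Keep only the homographs compatible with the tagger's part of speech.

    If nothing survives the filter, fall back to the full list, as the
    slide describes ("if no sense remains, keep all senses").
    """
    compatible = [h for h in homographs if h["pos"] == tagged_pos]
    return compatible if compatible else list(homographs)


# Hypothetical homograph entries for one word (not real LDOCE data).
bank = [
    {"id": "bank_1", "pos": "n"},
    {"id": "bank_2", "pos": "v"},
    {"id": "bank_3", "pos": "n"},
]

nouns_only = filter_by_pos(bank, "n")    # keeps bank_1 and bank_3
fallback = filter_by_pos(bank, "adj")    # nothing matches, so all are kept
```

The fallback matters because a tagger error would otherwise leave the later stages with no candidates at all.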

  6. Methodology (cont.)
     • Dictionary definitions
       • Partial tagger:
         • Count the number of words that appear both in the definition and in the context
         • Normalize by the length of the definition
         • Return a list of candidate senses
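
A minimal sketch of the definition-overlap scoring described on this slide. The tokenization (lowercasing and splitting on whitespace), the choice to count each shared word once, and the sample definitions are all assumptions made for illustration; the slide does not specify them.

```python
def overlap_score(definition, context):
    """Shared words between definition and context, divided by definition length."""
    def_words = definition.lower().split()
    context_words = set(context.lower().split())
    if not def_words:
        return 0.0
    # Assumption: each distinct definition word is counted at most once.
    shared = sum(1 for w in set(def_words) if w in context_words)
    return shared / len(def_words)


def rank_senses(senses, context):
    """Return (sense_id, score) pairs, best score first."""
    scored = [(sense_id, overlap_score(text, context))
              for sense_id, text in senses.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up definitions, not taken from LDOCE.
senses = {
    "bank_money": "a place where money is kept",
    "bank_river": "land along the side of a river",
}
ranking = rank_senses(senses, "he sat on the side of the river")
```

Normalizing by definition length keeps long definitions from winning simply because they contain more words.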

  7. Methodology (cont.)
     • Pragmatic codes
       • Partial tagger: uses the hierarchy of LDOCE pragmatic codes (subject area)
       • Modified simulated annealing
         • Optimize the number of pragmatic codes of the same type in the sentence
       • Whole paragraph; only for nouns?
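
The slide does not say how the simulated annealing was modified, so the following is a plain, textbook annealing loop rather than the authors' variant. The objective (number of word pairs sharing a code), the cooling schedule, and the two-letter codes are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def agreement(assignment):
    """Number of word pairs that were assigned the same code (assumed objective)."""
    counts = Counter(assignment)
    return sum(c * (c - 1) // 2 for c in counts.values())


def anneal(candidates, steps=2000, start_temp=1.0, cooling=0.995, seed=0):
    """Pick one code per word so that as many words as possible agree."""
    rng = random.Random(seed)
    current = [rng.choice(codes) for codes in candidates]
    current_score = agreement(current)
    best, best_score = list(current), current_score
    temp = start_temp
    for _ in range(steps):
        # Propose changing the code of one randomly chosen word.
        i = rng.randrange(len(candidates))
        proposal = list(current)
        proposal[i] = rng.choice(candidates[i])
        delta = agreement(proposal) - current_score
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, current_score + delta
            if current_score > best_score:
                best, best_score = list(current), current_score
        temp = max(temp * cooling, 1e-6)
    return best, best_score


# Hypothetical candidate codes for four nouns in one paragraph.
candidates = [["EC", "GO"], ["EC", "SP"], ["EC"], ["MD", "EC"]]
best, best_score = anneal(candidates)
```

In this toy input every word can take the code `EC`, so the best assignment has all four words agreeing (six agreeing pairs).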

  8. Methodology (cont.)
     • Selectional restrictions
       • Filter
       • LDOCE senses: 35 semantic classes (H = human, M = human male, P = plant, etc.)
       • Nouns: their own type
       • Adjectives: the type of the object they modify
       • Adverbs: the type of their modifier
       • Verbs: the types of their subject (S), direct object (DO), and indirect object (IO)
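
A selectional-restriction filter for verbs could look roughly like the sketch below. Only the class letters H, M, and P come from the slide; the parent link from M to H, the sense records, and the function names are assumptions, and what happens when no sense survives is left open because the slide does not say.

```python
# Toy fragment of a class hierarchy. Assumption: "human male" (M) counts
# as a kind of "human" (H); the real LDOCE hierarchy is not reproduced here.
PARENT = {"M": "H"}


def satisfies(noun_class, required_class):
    """True if noun_class equals required_class or is one of its descendants."""
    cls = noun_class
    while cls is not None:
        if cls == required_class:
            return True
        cls = PARENT.get(cls)
    return False


def filter_verb_senses(verb_senses, subject_class, object_class):
    """Drop verb senses whose subject or direct-object restriction is violated."""
    return [
        s for s in verb_senses
        if satisfies(subject_class, s["subj"]) and satisfies(object_class, s["dobj"])
    ]


# Hypothetical verb senses with restrictions on subject and direct object.
verb_senses = [
    {"id": "sense_a", "subj": "H", "dobj": "P"},
    {"id": "sense_b", "subj": "H", "dobj": "H"},
]
kept = filter_verb_senses(verb_senses, subject_class="M", object_class="P")
```

Here a male human subject satisfies the `H` restriction through the hierarchy, and only the sense expecting a plant as direct object survives.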

  9. Methodology (cont.)
     • Combine knowledge sources
       • Decision lists
       • Can assign a sense to unknown words, if there is a definition in LDOCE
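
A decision list is an ordered list of rules in which the first matching rule decides the answer. The sketch below shows only that mechanism; the feature names, the rule order, and the sense labels are hypothetical, and the slide does not describe how the actual rules were learned or ranked.

```python
def apply_decision_list(rules, features, default=None):
    """Return the sense of the first rule whose test matches the features.

    `rules` is assumed to be already sorted from most to least reliable.
    """
    for feature, value, sense in rules:
        if features.get(feature) == value:
            return sense
    return default


# Hypothetical rules: each tests the output of one knowledge source.
rules = [
    ("pragmatic_top", "bank_1", "bank_1"),
    ("definition_top", "bank_2", "bank_2"),
    ("definition_top", "bank_1", "bank_1"),
]

# The two sources disagree; the higher-ranked rule wins.
features = {"pragmatic_top": "bank_1", "definition_top": "bank_2"}
chosen = apply_decision_list(rules, features, default="bank_1")
```

Because rules fire in order, a reliable source overrides a weaker one whenever they disagree, and the default covers words for which no rule matches.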

  10. Evaluation
      • Create a corpus based on SemCor (200,000 words, tagged with WordNet senses)
      • SENSUS: a merger of LDOCE and WordNet (built for machine translation)
      • Ambiguity still remains
        • 36,869 out of 85,747 words (personal opinion: strongly biased)

  11. Results
      • Baseline: 49.8%
      • 70% of the 1st sense: correctly tagged
      • 83.4% accuracy = 92.8% accuracy on all words (!!!)
      • Test by voting
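
The two accuracy figures on this slide appear to be linked by simple arithmetic. The check below assumes that the 36,869 words from the evaluation slide are the ambiguous ones, that 83.4% is the accuracy on those words, and that every remaining word counts as correct; none of this is stated explicitly in the slides.

```python
def accuracy_on_all_words(total_words, ambiguous_words, accuracy_on_ambiguous):
    """Overall accuracy if every unambiguous word is counted as correct."""
    errors = ambiguous_words * (1 - accuracy_on_ambiguous)
    return 1 - errors / total_words


overall = accuracy_on_all_words(
    total_words=85_747,
    ambiguous_words=36_869,
    accuracy_on_ambiguous=0.834,
)
print(f"{overall:.1%}")  # prints 92.9%
```

Under these assumptions the result is about 92.9%, which agrees with the slide's 92.8% up to rounding.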
