1 / 23

Word Sense Disambiguation

Word Sense Disambiguation. Foundations of Statistical NLP Chapter 7. 2002. 1. 18. Kyung-Hee Sung. Contents. Methodological Preliminaries Supervised Disambiguation Bayesian classification / An information-theoretic approach

Download Presentation

Word Sense Disambiguation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Word Sense Disambiguation Foundations of Statistical NLP Chapter 7 2002. 1. 18. Kyung-Hee Sung

  2. Contents • Methodological Preliminaries • Supervised Disambiguation • Bayesian classification / An information-theoretic approach • Disambiguation based on dictionaries, thesauri and bilingual dictionaries • One sense per discourse, one sense per collocation • Unsupervised Disambiguation

  3. Introduction • Word sense disambiguation • Many words have several meanings or senses. • Many words have different usages. Ex) Butter may be used as noun, or as a verb. • The task of disambiguation is done by looking at the context of the word’s use.

  4. Methodological Preliminaries (1/2)

  5. Methodological Preliminaries (2/2) • Pseudowards : artificial ambiguous words • In order to test the performance of disambiguation algorithms. Ex) banana-door • Performance estimation • Upper bound : human performance • Lower bound (baseline) : the performance of the simplest possible algorithm, usually the assignment of all contexts to the most frequent sense.

  6. Supervised Disambiguation

  7. Notations

  8. Bayesian classification (1/2) • Assumption : we have a corpus where each use of ambiguous words is labeled with its correct sense. • Bayes decision rule : minimizes the probability of errors • Decide s´ if P(s´| c) > P(sk| c) ← using Bayes’s rule ← P(c) is constant for all senses

  9. Bayesian classification (2/2) • Naive Bayes independence assumption • All the structure and linear ordering of words within the context is ignored. → bag of words model • The presence of one word in the bag is independent of another. • Decision rule for Naive Bayes • Decide s´ if ← computed by MLE

  10. An Information-theoretic approach (1/2) • The Flip-Flop algorithm applied to finding indicators for disambiguation between two senses.

  11. An Information-theoretic approach (2/2) • To translate prendre (French) based on its object • Translation, {t1, … tm} = { take, make, rise, speak } • Values of Indicator, {x1, … xm} = { mesure, note, exemple, décision, parole }

  12. Dictionary-based disambiguation (1/2) • A word’s dictionary definitions are good indicators for the senses they define.

  13. Dictionary-based disambiguation (2/2) • Simplified example : ash • The score is number of words that are shared by the sense definition and the context.

  14. Thesaurus-based disambiguation (1/2) • Semantic categories of the words in a context determine the semantic categories of the context as a whole, and that this category in turn determines which word senses are used. • Each word is assigned one or more subject codes in the dictionary. • t(sk) : subject code of sense sk. • The score is the number of words that compatible with the subject code of sense sk.

  15. Thesaurus-based disambiguation (2/2) • Self-interest (advantage) is not topic-specific. • When a sense is spread out over several topics, the topic-based classification algorithm fails.

  16. Disambiguation based on translations in a second-language corpus • In order to disambiguate an occurrence of interest in English (first language), we identify the phrase it occurs in and search a German (second language) corpus for instances of the phrase. • The English phrase showed interest: show(E) → ‘zeigen’(G) • ‘zeigen’(G) will only occur with Interesse(G) since ‘legal shares’ are usually not shown. • We can conclude that interest in the phrase to show interest belongs to the sense attention, concern.

  17. One sense per discourse constraint • The sense of a target word is highly consistent within any given document. • If the first occurrence of plant is a use of the sense ‘living being’, then later occurrences are likely to refer to living beings too.

  18. One sense per collocation constraint • Word senses are strongly correlated with certain contextual features like other words in the same phrasal unit. • Collocational features are ranked according to the ratio: • Relying on only the strongest feature has the advantage that no integration of different sources of evidence is necessary. ← The number of occurrences of sense sk with collocation fm

  19. Unsupervised Disambiguation (1/3) • There are situations in which even such a small amount of information is not available. • Sense tagging requires that some characterization of the senses be provided. However, sense discrimination can be performed in a completely unsupervised fashion. • Context-group discrimination : a completely unsupervised algorithm that clusters unlabeled occurrences.

  20. An EM algorithm (1/2) 1.Initialize the parameters of the model μ randomly. Compute the log likelihood of the corpus C 2. While l(C|μ) is improving repeat: (a) E-step. Estimatehik ← Naive Bayes assumption

  21. An EM algorithm (2/2) (b) M-step. Re-estimate the parameters P(vj|sk) and P(sk) by way of MLE

  22. Unsupervised Disambiguation (2/3) • Unsupervised disambiguation can be easily adapted to produce distinctions between usage types. • Ex) The distinction between physical bank ( in the context of bank robberies ) banks as abstract corporations ( in the context of corporate mergers ) • The unsupervised algorithm splits dictionary senses into fine-grained contextual variants. • Usually, the induced clusters do not line up well with dictionary senses. Ex) ‘lawsuit’ → ‘civil suit’, ‘criminal suit’

  23. Unsupervised Disambiguation (3/3) • Infrequent senses and senses that have few collocations are hard to isolate in unsupervised disambiguation. • Results of the EM algorithm • The algorithm fails for words whose senses are topic-independent such as ‘to teach’ for train. ← for ten experiments with different initial conditions

More Related