CPSC 503 Computational Linguistics

Computational Lexical Semantics, Lecture 12, Giuseppe Carenini. Three well-defined semantic tasks: Word Sense Disambiguation (corpus and thesaurus), Word Similarity (thesaurus and corpus), and Semantic Role Labeling.


  1. CPSC 503 Computational Linguistics: Computational Lexical Semantics, Lecture 12. Giuseppe Carenini. CPSC503 Winter 2007

  2. Today 22/10
     Three well-defined Semantic Tasks
     • Word Sense Disambiguation
       • Corpus and Thesaurus
     • Word Similarity
       • Thesaurus and Corpus
     • Semantic Role Labeling

  3. WSD example: table + ?? -> [1-6]
     The noun "table" has 6 senses in WordNet.
     1. table, tabular array -- (a set of data …)
     2. table -- (a piece of furniture …)
     3. table -- (a piece of furniture with tableware…)
     4. mesa, table -- (flat tableland …)
     5. table -- (a company of people …)
     6. board, table -- (food or meals …)

  4. WSD methods
     • Machine Learning
       • Supervised
       • Unsupervised
     • Dictionary / Thesaurus (Lesk)

  5. Supervised ML Approaches to WSD
     • Training Data: ((word + context1) -> sense1) …… ((word + contextn) -> sensen)
     • Training Data -> Machine Learning -> Classifier
     • Classifier: (word + context) -> sense

  6. Training Data Example
     • ((word + context) -> sense)i
     • Context: ..after the soup she had bass with a big salad…
     • Sense, examples:
       • One of 8 possible senses for “bass” in WordNet
       • One of the 2 key distinct senses for “bass” in WordNet

  7. WordNet Bass: music vs. fish
     The noun “bass” has 8 senses in WordNet
     • bass - (the lowest part of the musical range)
     • bass, bass part - (the lowest part in polyphonic music)
     • bass, basso - (an adult male singer with …)
     • sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae)
     • freshwater bass, bass - (any of various North American lean-fleshed ………)
     • bass, bass voice, basso - (the lowest adult male singing voice)
     • bass - (the member with the lowest range of a family of musical instruments)
     • bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

  8. Representations for Context
     • GOAL: Informative characterization of the window of text surrounding the target word
     • Supervised ML requires a very simple representation for the training data: vectors of feature/value pairs
     • TASK: Select the relevant linguistic information and encode it as a feature vector

  9. Relevant Linguistic Information (1)
     • Collocational: info about the words that appear in specific positions to the right and left of the target word
       • Typically words and their POS: [word in position -n, part-of-speech position -n, … word in position +n, part-of-speech position +n]
       • Assume a window of +/- 2 from the target
     • Example text (WSJ): An electric guitar and bass player stand off to one side not really part of the scene, …
       [guitar, NN, and, CJC, player, NN, stand, VVB]
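
The collocational encoding on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the function name `collocational_features`, the `<PAD>` filler for positions that fall outside the sentence, and the POS tags for words not shown in the slide's vector (An, electric, bass, off) are all hypothetical.

```python
def collocational_features(tagged_tokens, target_index, window=2):
    """Return [word, POS, ...] for positions -window..+window around the target.

    The target itself is skipped; positions outside the sentence are padded.
    """
    features = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # do not encode the target word itself
        position = target_index + offset
        if 0 <= position < len(tagged_tokens):
            word, pos = tagged_tokens[position]
        else:
            word, pos = "<PAD>", "<PAD>"  # assumed filler value
        features.extend([word, pos])
    return features


# Tags for "guitar", "and", "player", "stand" follow the slide; the rest are assumed.
sentence = [("An", "AT0"), ("electric", "AJ0"), ("guitar", "NN"), ("and", "CJC"),
            ("bass", "NN"), ("player", "NN"), ("stand", "VVB"), ("off", "AVP")]

print(collocational_features(sentence, target_index=4))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```

The flat list keeps position information implicit in the ordering, which is what distinguishes collocational features from position-free co-occurrence counts.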

  10. Relevant Linguistic Information (2)
      • Co-occurrence: info about the words that occur anywhere in the window, regardless of position
      • Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, …, guitar, band)
      • Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), …, c(guitar), c(band)]
      • Example text (WSJ): An electric guitar and bass player stand off to one side not really part of the scene, …
        [0,0,0,1,0,0,0,0,0,0,1,0]
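
A co-occurrence vector of this kind can be computed with a simple count over the window. The sketch below is illustrative only: the function name `cooccurrence_vector` is hypothetical, and the vocabulary contains just the seven words the slide spells out (the slide's full list of twelve is elided), so the resulting vector is shorter than the one shown above.

```python
from collections import Counter


def cooccurrence_vector(window_tokens, vocabulary):
    """Count how often each vocabulary word occurs anywhere in the window."""
    counts = Counter(token.lower() for token in window_tokens)
    return [counts[word] for word in vocabulary]


# Only the vocabulary words listed on the slide; the elided ones are left out.
vocabulary = ["fishing", "big", "sound", "player", "fly", "guitar", "band"]

window = ("An electric guitar and bass player stand off to one side "
          "not really part of the scene").split()

print(cooccurrence_vector(window, vocabulary))
# [0, 0, 0, 1, 0, 1, 0]
```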

  11. Training Data Examples
      Let’s assume: bass-music encoded as 0, bass-fish encoded as 1
      • Inputs to classifiers
      • Co-occurrence vectors (last element is the sense):
        [0,0,0,1,0,0,0,0,0,0,1,0,0]
        [1,0,0,0,0,0,0,0,0,0,0,0,1]
        [1,0,0,0,0,0,0,0,0,0,0,1,1]
        [……… ]
      • Collocational vectors (last element is the sense):
        [guitar, NN, and, CJC, player, NN, stand, VVB, 0]
        [a, AT0, sea, CJC, to, PRP, me, PNP, 1]
        [play, VVB, the, AT0, with, PRP, others, PNP, 0]
        […………………..]
      • Vectors without a sense label:
        [1,1,0,0,0,1,0,0,0,0,0,0]
        [guitar, NN, and, CJC, could, VM0, be, VVI]

  12. ML for Classifiers
      • Training Data:
        • Co-occurrence
        • Collocational
      • Training Data -> Machine Learning -> Classifier
      • Machine Learning:
        • Naïve Bayes
        • Decision lists
        • Decision trees
        • Neural nets
        • Support vector machines
        • Nearest neighbor methods…

  13. Naïve Bayes
      • Independence assumption
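
The equations on this slide did not survive in the transcript. As a reference point, the standard Naïve Bayes formulation for WSD, which the "independence" label presumably annotates, chooses the sense $\hat{s}$ that is most probable given the feature vector $\vec{f} = (f_1, \dots, f_n)$. The symbols below are the conventional ones and are an assumption about what the slide showed.

```latex
\begin{align*}
\hat{s} &= \operatorname*{argmax}_{s \in S} P(s \mid \vec{f})
         = \operatorname*{argmax}_{s \in S} \frac{P(\vec{f} \mid s)\, P(s)}{P(\vec{f})}
         = \operatorname*{argmax}_{s \in S} P(\vec{f} \mid s)\, P(s) \\
        &\approx \operatorname*{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(f_j \mid s)
         \qquad \text{(features assumed independent given the sense)}
\end{align*}
```

The denominator $P(\vec{f})$ is dropped because it is the same for every candidate sense; both $P(s)$ and $P(f_j \mid s)$ can be estimated by relative frequency from the sense-labeled training data.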

  14. Naïve Bayes: Evaluation
      Experiment comparing different classifiers [Mooney 96]
      • Naïve Bayes and Neural Network achieved highest performance
      • 73% in assigning one of six senses to “line”
      • Is this good?
        • Simplest baseline: “most frequent sense”
        • Ceiling: human inter-annotator agreement
          • 75%-80% on refined sense distinctions (WordNet)
          • Closer to 90% for binary distinctions

  15. Bootstrapping
      • What if you don’t have enough data to train a system…
      • Diagram labels: Seeds; Small Training Data; Machine Learning; Classifier; More Data; More Classified Data

  16. Bootstrapping: how to pick the seeds
      • Hand-labeling (Hearst 1991):
        • Likely correct
        • Likely to be prototypical
      • One sense per collocation (Yarowsky 1995): e.g., for bass, “play” is strongly associated with the music sense, whereas “fish” is strongly associated with the fish sense
      • One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense
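
The "one sense per collocation" heuristic can be turned into seed labels with a few lines of Python. This is a minimal sketch, not the procedure from the cited papers: the function name `label_seeds`, the whitespace tokenization, and the three example contexts are hypothetical; only the two cue words ("play", "fish") come from the slide.

```python
def label_seeds(contexts, seed_collocations):
    """Split contexts into (context, sense) seeds and still-unlabeled contexts.

    A context becomes a seed only if its cue words point to exactly one sense.
    """
    labeled, unlabeled = [], []
    for context in contexts:
        tokens = set(context.lower().split())
        senses = {sense for cue, sense in seed_collocations.items() if cue in tokens}
        if len(senses) == 1:
            labeled.append((context, next(iter(senses))))
        else:
            unlabeled.append(context)  # no cue, or conflicting cues
    return labeled, unlabeled


seeds = {"play": "music", "fish": "fish"}  # cue words from the slide
contexts = [  # invented sentences, for illustration only
    "they play bass in a small club",
    "we fish for bass in the lake",
    "the bass was very large",
]

labeled, unlabeled = label_seeds(contexts, seeds)
print(labeled)
print(unlabeled)
```

In a bootstrapping loop, the labeled seeds would train a first classifier, and its confident predictions on the unlabeled contexts would be added to the training set for the next round.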

  17. Unsupervised Methods [Schutze ’98]
      • Training Data: (word + vector)1 …… (word + vector)n
      • Training Data -> Machine Learning (Clustering) -> K Clusters ci
      • Hand-labeling: (c1 -> sense1) ……
      • (word + vector) -> Vector/cluster Similarity -> sense

  18. Agglomerative Clustering
      • Assign each instance to its own cluster
      • Repeat
        • Merge the two clusters that are most similar
      • Until (specified # of clusters is reached)
      • If there are too many training instances -> random sampling
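
The loop on this slide can be sketched directly in Python. The slide does not say how cluster similarity is measured, so this sketch assumes cosine similarity between cluster centroids; the function names and the toy vectors are hypothetical, and the quadratic search over cluster pairs is chosen for clarity rather than speed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def agglomerative(vectors, k):
    """Merge the two most similar clusters until only k clusters remain."""
    clusters = [[v] for v in vectors]  # each instance starts in its own cluster
    while len(clusters) > max(k, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters


toy_vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
print(agglomerative(toy_vectors, k=2))
```

Because every iteration compares all pairs of clusters, the cost grows quickly with the number of instances, which is why the slide suggests random sampling when the training set is large.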

  19. Problems
      • Given these general ML approaches, how many classifiers do I need to perform WSD robustly?
        • One for each ambiguous word in the language
      • How do you decide what set of tags/labels/senses to use for a given word?
        • Depends on the application

  20. WSD: Dictionary and Thesaurus Methods
      Most common: Lesk method
      • Choose the sense whose dictionary gloss shares the most words with the target word’s neighborhood
      • Exclude stop-words
      Def: The words in the gloss for a sense are called the signature of that sense

  21. Lesk: Example
      Two SENSES for channel
      • S1: (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rainwater into a series of channels under the street"
      • S2: (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels"
      ….. “most streets close to the TV station were flooded because the main channel was clogged by heavy rain.”
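
The gloss-overlap computation for this example can be sketched in Python. This is a simplified illustration, not a reference implementation of Lesk: the function names, the regular-expression tokenizer, the hand-picked stop list, and the choice to build each signature from the gloss plus its example sentences (without stemming) are all assumptions.

```python
import re

# Small hand-picked stop list, an assumption made for this example only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "or", "other", "with",
              "were", "was", "by", "its", "into", "off", "under", "through",
              "than", "more", "most", "one", "they", "because"}


def content_words(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def simplified_lesk(target, context, signatures):
    """Return (sense, overlap) for the signature sharing most words with the context."""
    context_words = content_words(context) - {target}
    best_sense, best_overlap = None, -1
    for sense, signature_text in signatures.items():
        overlap = len(content_words(signature_text) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


signatures = {
    "S1": "a passage for water (or other fluids) to flow through; "
          "the fields were crossed with irrigation channels; gutters carried "
          "off the rainwater into a series of channels under the street",
    "S2": "a television station and its programs; a satellite TV channel; "
          "surfing through the channels; they offer more than one hundred channels",
}
context = ("most streets close to the TV station were flooded because "
           "the main channel was clogged by heavy rain")

print(simplified_lesk("channel", context, signatures))
# ('S2', 2)
```

With exact word matching, the sketch picks S2, because "TV" and "station" overlap with the television gloss while "streets" and "rain" fail to match "street" and "rainwater". The water-passage sense S1 is the one this sentence calls for, so the example shows how sensitive plain gloss overlap is to a small signature.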

  22. Corpus Lesk
      Best performer
      • If a corpus with annotated senses is available
      • For each sense: add all the words in the sentences containing that sense to the signature for that sense
      CORPUS …… “most streets close to the TV station were flooded because the main <S1> channel </S1> was clogged by heavy rain.” ….. ?

  23. WSD: More Recent Trends • Better ML techniques (e.g., Combining Classifiers) • Combining ML and Lesk • Other Languages • Building better/larger corpora

  24. Today 22/10 • Word Sense Disambiguation • Word Similarity • Semantic Role Labeling

  25. Word Similarity
      • Actually a relation between two senses
      • Similarity vs. Relatedness: sun vs. moon; mouth vs. food; hot vs. cold
      • Applications?
      • Thesaurus methods: measure distance in online thesauri (e.g., WordNet)
      • Distributional methods: find out whether the two words appear in similar contexts

  26. WS: Thesaurus Methods (1)
      • Path-length based similarity on hypernym/hyponym hierarchies
      • Information content word similarity (not all edges are equal)
      • Formula labels: probability; information; Lowest Common Subsumer
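
The formulas on this slide were lost in the transcript; only their labels (probability, information, Lowest Common Subsumer) remain. The standard definitions those labels usually refer to are given below as an assumption about the slide's content. Here $c$, $c_1$, $c_2$ are concepts in the hierarchy, $\mathrm{words}(c)$ is the set of words subsumed by $c$, $N$ is the number of word tokens in the corpus that appear in the thesaurus, and $\mathrm{LCS}(c_1, c_2)$ is the lowest node that subsumes both concepts.

```latex
\begin{align*}
\mathrm{sim}_{\mathrm{path}}(c_1, c_2) &= -\log \mathrm{pathlen}(c_1, c_2) \\
P(c) &= \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N}
  && \text{(probability)} \\
\mathrm{IC}(c) &= -\log P(c)
  && \text{(information content)} \\
\mathrm{sim}_{\mathrm{resnik}}(c_1, c_2) &= -\log P\bigl(\mathrm{LCS}(c_1, c_2)\bigr)
  && \text{(information of the Lowest Common Subsumer)}
\end{align*}
```

A concept high in the hierarchy subsumes many words, so its probability is high and its information content low; two concepts whose lowest common subsumer is very general therefore come out as not very similar.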

  27. WS: Thesaurus Methods (2)
      • One of the best performers: Jiang-Conrath distance
      • This is a measure of distance; take the reciprocal for similarity!
      • See also Extended Lesk
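
The formula itself is missing from the transcript. The usual statement of the Jiang-Conrath distance, written with concept probabilities $P(c)$ and the lowest common subsumer $\mathrm{LCS}(c_1, c_2)$ (notation assumed here, not taken from the slide), is:

```latex
\begin{align*}
\mathrm{dist}_{\mathrm{JC}}(c_1, c_2)
  &= 2 \log P\bigl(\mathrm{LCS}(c_1, c_2)\bigr) - \bigl(\log P(c_1) + \log P(c_2)\bigr) \\
\mathrm{sim}_{\mathrm{JC}}(c_1, c_2)
  &= \frac{1}{\mathrm{dist}_{\mathrm{JC}}(c_1, c_2)}
\end{align*}
```

Since $P(\mathrm{LCS}(c_1, c_2)) \ge P(c_1)$ and $P(\mathrm{LCS}(c_1, c_2)) \ge P(c_2)$, the distance is non-negative, and it is zero when both concepts coincide with their common subsumer.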

  28. WS: Distributional Methods
      • You may not have any thesaurus for the target language
      • Even if you have a thesaurus, it may still be:
        • Missing domain-specific words (e.g., technical words)
        • Poor in hyponym knowledge for verbs, with nothing for adjectives and adverbs
        • Difficult to use for comparing senses from different hierarchies
      • Solution: extract similarity from corpora
      • Basic idea: two words are similar if they appear in similar contexts

  29. WS Distributional Methods (1)
      • Simple context: feature vector (with a stop list)
        Example: f_i = how many times w_i appeared in the neighborhood of w
      • More complex context: feature matrix
        a_ij = how many times w_i appeared in the neighborhood of w and was related to w by the syntactic relation r_j

  30. WS Distributional Methods (2) • More informative values (referred to as weights or measure of association in the literature) • Point-wise Mutual Information • t-test
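
The two association measures named on this slide can be computed from raw counts. The slide's formulas are not in the transcript, so the sketch below uses the standard definitions, PMI as $\log_2 \frac{P(w,f)}{P(w)P(f)}$ and the t-test weight as $\frac{P(w,f) - P(w)P(f)}{\sqrt{P(w)P(f)}}$; the function name, argument names, and example counts are hypothetical.

```python
import math


def association_weights(pair_count, word_count, feature_count, total):
    """Return (PMI, t-test) association of a word w with a context feature f.

    pair_count: times w and f were observed together
    word_count, feature_count: marginal counts (assumed > 0)
    total: number of observations overall
    """
    p_wf = pair_count / total
    p_w = word_count / total
    p_f = feature_count / total
    expected = p_w * p_f  # joint probability if w and f were independent
    pmi = math.log2(p_wf / expected) if p_wf > 0 else float("-inf")
    t_test = (p_wf - expected) / math.sqrt(expected)
    return pmi, t_test


# Invented counts, for illustration only.
pmi, t_test = association_weights(pair_count=10, word_count=100,
                                  feature_count=50, total=10_000)
print(round(pmi, 3), round(t_test, 4))
```

Both weights compare the observed co-occurrence probability with what independence would predict; PMI takes the log ratio, while the t-test weight takes the difference scaled by an estimate of its standard deviation.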

  31. WS Distributional Methods (3)
      • Similarity between vectors v and w
      • Formula annotations: not sensitive to extreme values; normalized (weighted) number of overlapping features
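
The similarity formulas themselves are missing from the transcript. As a hedged sketch, the three measures commonly used in this setting are cosine, weighted Jaccard, and weighted Dice; the next slide names Jaccard and Dice explicitly, while the inclusion of cosine here is an assumption. The definitions below take vectors of non-negative association weights, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(v, w):
    """Dot product normalized by the vector lengths."""
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0


def jaccard(v, w):
    """Weighted overlap (sum of minima) normalized by the sum of maxima."""
    denominator = sum(max(a, b) for a, b in zip(v, w))
    return sum(min(a, b) for a, b in zip(v, w)) / denominator if denominator else 0.0


def dice(v, w):
    """Twice the weighted overlap, normalized by the total weight of both vectors."""
    denominator = sum(a + b for a, b in zip(v, w))
    return 2 * sum(min(a, b) for a, b in zip(v, w)) / denominator if denominator else 0.0


v = [1, 2, 0]
w = [2, 1, 1]
print(round(cosine(v, w), 4), jaccard(v, w), round(dice(v, w), 4))
```

In the Jaccard and Dice measures the numerator is the weighted number of overlapping features, and the denominator is the normalizing factor.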

  32. WS Distributional Methods (4) • Best combination overall • t-test for weights • Jaccard (or Dice) for vector similarity

  33. Today 22/10 • Word Sense Disambiguation • Word Similarity • Semantic Role Labeling

  34. Semantic Role Labeling
      Typically framed as a classification problem [Gildea, Jurafsky 2002]
      • Assign a parse tree to the input
      • Find all predicate-bearing words (PropBank, FrameNet)
      • For each predicate: determine, for each syntactic constituent, which role (if any) it plays with respect to the predicate
      Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position… and many others

  35. Semantic Role Labeling: Example [issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, …..]

  36. Next Time • Discourse and Dialog • Overview of Chapters 21 and 24
