CPSC 503 Computational Linguistics
Computational Lexical Semantics
Lecture 12, Giuseppe Carenini
CPSC503 Winter 2007
Today 22/10: Three well-defined semantic tasks
• Word Sense Disambiguation
  • Corpus and thesaurus methods
• Word Similarity
  • Thesaurus and corpus methods
• Semantic Role Labeling
WSD example: table + ?? -> [1-6]
The noun "table" has 6 senses in WordNet:
1. table, tabular array -- (a set of data …)
2. table -- (a piece of furniture …)
3. table -- (a piece of furniture with tableware …)
4. mesa, table -- (flat tableland …)
5. table -- (a company of people …)
6. board, table -- (food or meals …)
WSD methods
• Machine Learning
  • Supervised
  • Unsupervised
• Dictionary / Thesaurus (Lesk)
Supervised ML Approaches to WSD
Training Data: ((word + context)_1, sense_1) … ((word + context)_n, sense_n)
Training Data -> Machine Learning -> Classifier
Classifier: (word + context) -> sense
Training Data Example: ((word + context) sense)_i
Context: "…after the soup she had bass with a big salad…"
Sense, for example:
• One of the 8 possible senses for "bass" in WordNet
• One of the 2 key distinct senses for "bass" in WordNet (music vs. fish)
WordNet Bass: music vs. fish
The noun "bass" has 8 senses in WordNet:
1. bass - (the lowest part of the musical range)
2. bass, bass part - (the lowest part in polyphonic music)
3. bass, basso - (an adult male singer with …)
4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass - (any of various North American lean-fleshed …)
6. bass, bass voice, basso - (the lowest adult male singing voice)
7. bass - (the member with the lowest range of a family of musical instruments)
8. bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Representations for Context
Supervised ML requires a very simple representation for the training data: vectors of feature/value pairs.
• GOAL: an informative characterization of the window of text surrounding the target word
• TASK: select relevant linguistic information and encode it as a feature vector
Relevant Linguistic Information (1)
• Collocational: info about the words that appear in specific positions to the right and left of the target word; typically the words themselves and their POS
  [word in position -n, part-of-speech in position -n, …, word in position +n, part-of-speech in position +n]
  Assume a window of +/- 2 around the target.
• Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, …"
  [guitar, NN, and, CJC, player, NN, stand, VVB]
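As a concrete illustration, here is a minimal sketch of extracting that collocational vector from a POS-tagged sentence. The tagged example and the +/- 2 window come from the slide; the function name, the padding token, and the tags for the surrounding words ("an", "electric", "off") are assumptions added for illustration.

```python
def collocational_features(tagged_sent, target_index, window=2):
    """Words and POS tags at positions -window..+window around the target
    (the target itself is excluded): [w-2, t-2, w-1, t-1, w+1, t+1, w+2, t+2]."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        i = target_index + offset
        if 0 <= i < len(tagged_sent):
            word, tag = tagged_sent[i]
            feats.extend([word, tag])
        else:
            feats.extend(["<PAD>", "<PAD>"])   # position falls outside the sentence
    return feats

# "An electric guitar and bass player stand off to one side ..."
tagged = [("an", "AT0"), ("electric", "AJ0"), ("guitar", "NN"), ("and", "CJC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, target_index=4))
# -> ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```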
Relevant Linguistic Information (2)
• Co-occurrence: info about the words that occur anywhere in the window, regardless of position
• Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, …, guitar, band)
  Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), …, c(guitar), c(band)]
• Example text (WSJ): "An electric guitar and bass player stand off to one side, not really part of the scene, …"
  [0,0,0,1,0,0,0,0,0,0,1,0]
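A matching sketch for the co-occurrence (bag-of-words) representation. The vocabulary entries fishing, big, sound, player, fly, guitar and band are from the slide; rod, pound, double, runs and playing are filler added only so the vector has the 12 dimensions shown above, and the window size is an assumption.

```python
def cooccurrence_vector(tokens, target, vocab, window=10):
    """Count how often each vocabulary word occurs within +/- window tokens
    of any occurrence of the target word."""
    counts = dict.fromkeys(vocab, 0)
    for p, tok in enumerate(tokens):
        if tok != target:
            continue
        for i in range(max(0, p - window), min(len(tokens), p + window + 1)):
            if i != p and tokens[i] in counts:
                counts[tokens[i]] += 1
    return [counts[w] for w in vocab]

vocab = ["fishing", "big", "sound", "player", "fly", "rod", "pound",
         "double", "runs", "playing", "guitar", "band"]
tokens = "an electric guitar and bass player stand off to one side".split()
print(cooccurrence_vector(tokens, "bass", vocab))
# -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```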
Training Data Examples: Inputs to Classifiers
Let's assume: bass-music encoded as 0, bass-fish encoded as 1.
Collocational vectors (the final element, where present, is the sense label):
[guitar, NN, and, CJC, player, NN, stand, VVB, 0]
[a, AT0, sea, CJC, to, PRP, me, PNP, 1]
[play, VVB, the, AT0, with, PRP, others, PNP, 0]
[guitar, NN, and, CJC, could, VM0, be, VVI]
[………]
Co-occurrence vectors (the final element, where present, is the sense label):
[0,0,0,1,0,0,0,0,0,0,1,0,0]
[1,0,0,0,0,0,0,0,0,0,0,0,1]
[1,0,0,0,0,0,0,0,0,0,0,1,1]
[1,1,0,0,0,1,0,0,0,0,0,0]
[………]
ML for Classifiers
Training Data (collocational or co-occurrence vectors) -> Machine Learning -> Classifier
Learning methods:
• Naïve Bayes
• Decision lists
• Decision trees
• Neural nets
• Support vector machines
• Nearest neighbor methods
• …
Naïve Bayes
Choose the sense ŝ that is most probable given the feature vector f = (f_1, …, f_m). Applying Bayes' rule and assuming the features are conditionally independent given the sense:
ŝ = argmax_s P(s | f) = argmax_s P(f | s) P(s) ≈ argmax_s P(s) ∏_j P(f_j | s)
The independence assumption is what makes the model "naïve"; the probabilities are estimated from the sense-labeled training data.
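A toy, hand-rolled version of this classifier for the bass example might look as follows; the training contexts are invented bag-of-words instances, and add-one smoothing is just one reasonable choice for handling unseen features.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(examples):
    """examples: list of (feature_list, sense) pairs.  Collects the counts
    needed for the argmax_s P(s) * prod_j P(f_j | s) decision rule."""
    sense_counts = Counter(sense for _, sense in examples)
    feat_counts = defaultdict(Counter)           # sense -> feature -> count
    vocab = set()
    for feats, sense in examples:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab, len(examples)

def predict(feats, sense_counts, feat_counts, vocab, n):
    best_sense, best_logprob = None, float("-inf")
    for sense, count in sense_counts.items():
        logprob = math.log(count / n)            # log prior P(s)
        total = sum(feat_counts[sense].values())
        for f in feats:                          # add-one smoothed P(f_j | s)
            logprob += math.log((feat_counts[sense][f] + 1) / (total + len(vocab)))
        if logprob > best_logprob:
            best_sense, best_logprob = sense, logprob
    return best_sense

train = [(["guitar", "player", "stand"], "music"),
         (["play", "band", "sound"],     "music"),
         (["sea", "fishing", "caught"],  "fish"),
         (["river", "fishing", "rod"],   "fish")]
model = train_naive_bayes(train)
print(predict(["fishing", "boat"], *model))      # -> 'fish'
```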
Naïve Bayes: Evaluation
Experiment comparing different classifiers [Mooney 96]:
• Naïve Bayes and Neural Network achieved the highest performance
• 73% accuracy in assigning one of six senses to "line"
Is this good?
• Simplest baseline: "most frequent sense"
• Ceiling: human inter-annotator agreement
  • 75%-80% on refined sense distinctions (WordNet)
  • Closer to 90% for binary distinctions
Bootstrapping
• What if you don't have enough data to train a system?
• seeds -> small training data -> Machine Learning -> Classifier
• The classifier labels more data; the newly classified data is added to the training set, and the process repeats.
Bootstrapping: how to pick the seeds
• Hand-labeling (Hearst 1991):
  • Likely correct
  • Likely to be prototypical
• One sense per collocation (Yarowsky 1995): e.g., for bass, "play" is strongly associated with the music sense whereas "fish" is strongly associated with the fish sense
• One sense per discourse: multiple occurrences of a word in one discourse tend to have the same sense
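A minimal sketch of the resulting Yarowsky-style loop. The seed collocations are the ones from the slide; train_fn, predict_fn, the confidence threshold and the number of rounds are placeholders standing in for whatever supervised learner is plugged in.

```python
def bootstrap_wsd(unlabeled, seed_rules, train_fn, predict_fn,
                  threshold=0.9, rounds=5):
    """Yarowsky-style bootstrapping: label a few instances via seed
    collocations, train, add the classifier's most confident predictions to
    the training set, and repeat.  predict_fn(classifier, x) is assumed to
    return a (sense, confidence) pair."""
    labeled = [(x, sense) for x in unlabeled
               for word, sense in seed_rules.items() if word in x]
    already = {id(x) for x, _ in labeled}
    for _ in range(rounds):
        classifier = train_fn(labeled)
        newly = []
        for x in unlabeled:
            if id(x) in already:
                continue
            sense, confidence = predict_fn(classifier, x)
            if confidence >= threshold:
                newly.append((x, sense))
                already.add(id(x))
        if not newly:
            break
        labeled.extend(newly)
    return train_fn(labeled)

# seed collocations from the slide: "play" -> music sense, "fish" -> fish sense
seeds = {"play": "music", "fish": "fish"}
```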
Unsupervised Methods [Schütze '98]
Training Data: (word + vector)_1 … (word + vector)_n -> Machine Learning (Clustering) -> K clusters c_i
Hand-labeling: (c_1, sense_1) … (c_K, sense_K)
Disambiguation: (word + vector) -> vector/cluster similarity -> sense
Agglomerative Clustering
• Assign each instance to its own cluster
• Repeat
  • Merge the two clusters that are most similar
• Until the specified number of clusters is reached
• If there are too many training instances -> random sampling
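A naive sketch of this procedure, assuming an average-link notion of cluster similarity (single- or complete-link are equally valid choices); the 1-d toy data and the negative-distance similarity are only for illustration.

```python
def agglomerative_cluster(instances, similarity, k):
    """Start with one cluster per instance and repeatedly merge the most
    similar pair of clusters until only k clusters remain.  Cluster
    similarity here is average pairwise similarity (average-link)."""
    clusters = [[x] for x in instances]
    while len(clusters) > k:
        best_pair, best_sim = None, float("-inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [similarity(a, b) for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if avg > best_sim:
                    best_pair, best_sim = (i, j), avg
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))
    return clusters

# toy 1-d instances, with negative distance as the similarity
points = [1.0, 1.2, 5.0, 5.2, 9.0]
print(agglomerative_cluster(points, lambda a, b: -abs(a - b), k=3))
# -> [[1.0, 1.2], [5.0, 5.2], [9.0]]
```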
Problems
• Given these general ML approaches, how many classifiers do I need to perform WSD robustly?
  • One for each ambiguous word in the language
• How do you decide what set of tags/labels/senses to use for a given word?
  • Depends on the application
WSD: Dictionary and Thesaurus Methods
Most common: the Lesk method
• Choose the sense whose dictionary gloss shares the most words with the target word's neighborhood
• Exclude stop-words
Def: the set of words in the gloss for a sense is called its signature
Lesk: Example
Two senses for "channel":
S1: (n) channel (a passage for water (or other fluids) to flow through) "the fields were crossed with irrigation channels"; "gutters carried off the rainwater into a series of channels under the street"
S2: (n) channel, television channel, TV channel (a television station and its programs) "a satellite TV channel"; "surfing through the channels"; "they offer more than one hundred channels"
…
"most streets close to the TV station were flooded because the main channel was clogged by heavy rain."
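A minimal sketch of simplified Lesk on this example. The signatures are just the glosses and example phrases from the slide, and the stop-word list is a token placeholder rather than a real one.

```python
STOP = {"a", "an", "the", "of", "to", "for", "and", "or", "its", "with",
        "was", "were", "be", "by", "in", "through", "than"}

def content_words(text):
    return {w.strip('".,;()').lower() for w in text.split()} - STOP - {""}

def simplified_lesk(context_sentence, signatures):
    """Pick the sense whose signature shares the most words with the context."""
    context = content_words(context_sentence)
    return max(signatures, key=lambda s: len(content_words(signatures[s]) & context))

signatures = {
    "S1 (water passage)": "a passage for water (or other fluids) to flow through; "
                          "the fields were crossed with irrigation channels; gutters carried "
                          "off the rainwater into a series of channels under the street",
    "S2 (TV channel)":    "a television station and its programs; a satellite TV channel; "
                          "surfing through the channels; they offer more than one hundred channels",
}
sentence = ("most streets close to the TV station were flooded "
            "because the main channel was clogged by heavy rain")
print(simplified_lesk(sentence, signatures))     # -> 'S2 (TV channel)'
```

On this sentence the gloss overlap (station, TV, channel) favours S2 even though the intended sense is S1, which illustrates why the corpus-based variant on the next slide can help.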
Corpus Lesk: best performer
• If a corpus with annotated senses is available:
• For each sense, add all the words in the sentences containing that sense to the signature for that sense
CORPUS: … "most streets close to the TV station were flooded because the main <S1> channel </S1> was clogged by heavy rain." …
WSD: More Recent Trends
• Better ML techniques (e.g., combining classifiers)
• Combining ML and Lesk
• Other languages
• Building better/larger corpora
Today 22/10
• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling
Word Similarity
• Actually a relation between two senses
• Similarity vs. relatedness: sun vs. moon; mouth vs. food; hot vs. cold
• Applications?
• Thesaurus methods: measure distance in online thesauri (e.g., WordNet)
• Distributional methods: find out whether the two words appear in similar contexts
WS: Thesaurus Methods (1)
• Path-length based similarity on hypernym/hyponym hierarchies: the fewer edges between two senses, the more similar they are (e.g., similarity as the inverse, or negative log, of the path length)
• Information-content word similarity (not all edges are equal): IC(c) = -log P(c), where P(c) is the probability that a randomly selected word from a corpus is an instance of concept c; sim_Resnik(c1, c2) = IC(LCS(c1, c2)), where LCS(c1, c2) is the Lowest Common Subsumer of c1 and c2
WS: Thesaurus Methods (2)
• One of the best performers: Jiang-Conrath distance
  dist_JC(c1, c2) = IC(c1) + IC(c2) - 2 · IC(LCS(c1, c2))
• This is a measure of distance; take the reciprocal for similarity.
• See also Extended Lesk
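These thesaurus-based measures can be tried directly with NLTK's WordNet interface; a short sketch, assuming the nltk package with the wordnet and wordnet_ic corpora is installed, and using two arbitrary synsets for illustration.

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')      # information content estimated from the Brown corpus

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

print(dog.path_similarity(cat))               # path-length based similarity
print(dog.res_similarity(cat, brown_ic))      # Resnik: IC of the lowest common subsumer
print(dog.jcn_similarity(cat, brown_ic))      # Jiang-Conrath, returned as a similarity (1 / distance)
```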
WS: Distributional Methods
• You may not have a thesaurus for the target language
• Even if you have one:
  • Domain-specific (e.g., technical) words are missing
  • Hyponym knowledge is poor for verbs, and absent for adjectives and adverbs
  • It is difficult to compare senses from different hierarchies
• Solution: extract similarity from corpora
• Basic idea: two words are similar if they appear in similar contexts
WS Distributional Methods (1)
• Simple context: a feature vector, where f_i = how many times word w_i appeared in the neighborhood of w (words on a stop list are excluded)
• More complex context: a feature matrix, where a_ij = how many times w_i appeared in the neighborhood of w and was related to w by the syntactic relation r_j
WS Distributional Methods (2)
• More informative values (referred to as weights or measures of association in the literature):
• Pointwise Mutual Information: assoc_PMI(w, f) = log2 [ P(w, f) / (P(w) P(f)) ]
• t-test: how far the observed co-occurrence probability P(w, f) is from the value expected under independence, P(w) P(f), normalized by an estimate of its variance (one common formulation: [P(w, f) - P(w) P(f)] / sqrt(P(w, f) / N))
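A small sketch of the PMI weight computed from raw counts; the toy counts for bass/fishing are invented purely to show the arithmetic.

```python
import math

def pmi(count_wf, count_w, count_f, total):
    """Pointwise mutual information from raw counts:
    PMI(w, f) = log2( P(w, f) / (P(w) P(f)) ),
    with P(w, f) = count_wf / total, P(w) = count_w / total, P(f) = count_f / total."""
    p_wf = count_wf / total
    p_w, p_f = count_w / total, count_f / total
    return math.log2(p_wf / (p_w * p_f))

# invented toy counts: "bass" and "fishing" co-occur 20 times; "bass" occurs 100 times,
# "fishing" 200 times, out of 100,000 counted events
print(pmi(20, 100, 200, 100_000))             # log2(100) ≈ 6.64
```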
WS Distributional Methods (3)
• Similarity between (weighted) feature vectors v and w. Common choices:
• Cosine: sim_cosine(v, w) = (v · w) / (|v| |w|)
• Jaccard: sim_Jaccard(v, w) = Σ_i min(v_i, w_i) / Σ_i max(v_i, w_i)
• Dice: sim_Dice(v, w) = 2 Σ_i min(v_i, w_i) / Σ_i (v_i + w_i)
• Jaccard and Dice give a normalized (weighted) count of overlapping features; because they use min, they are not sensitive to extreme values.
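A direct sketch of these three measures; the two toy weighted vectors are invented for illustration and assume the same feature ordering.

```python
def jaccard(v, w):
    """Weighted Jaccard: normalized number of overlapping features."""
    return sum(min(a, b) for a, b in zip(v, w)) / sum(max(a, b) for a, b in zip(v, w))

def dice(v, w):
    return 2 * sum(min(a, b) for a, b in zip(v, w)) / (sum(v) + sum(w))

def cosine(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    norm = (sum(a * a for a in v) ** 0.5) * (sum(b * b for b in w) ** 0.5)
    return dot / norm

# toy weighted co-occurrence vectors over the same feature set
v = [2.0, 0.0, 1.0, 3.0]
w = [1.0, 1.0, 0.0, 2.0]
print(jaccard(v, w), dice(v, w), cosine(v, w))
```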
WS Distributional Methods (4)
• Best combination overall:
  • t-test for weights
  • Jaccard (or Dice) for vector similarity
Today 22/10
• Word Sense Disambiguation
• Word Similarity
• Semantic Role Labeling
Semantic Role Labeling
Typically framed as a classification problem [Gildea, Jurafsky 2002]:
• Assign a parse tree to the input
• Find all predicate-bearing words (PropBank, FrameNet)
• For each predicate, determine for each syntactic constituent which role (if any) it plays with respect to the predicate
Common constituent features: predicate, phrase type, head word and its POS, path, voice, linear position… and many others
Semantic Role Labeling: Example
Feature vector for one constituent: [issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, …]
(predicate = issued, phrase type = NP, head word = Examiner, head POS = NNP, path = NP↑S↓VP↓VBD, voice = active, linear position = before the predicate)
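A small sketch of how the path feature can be read off a parse tree with nltk.Tree; the tree is a simplified version of the Examiner example, and the function name and tree positions are illustrative.

```python
from nltk import Tree

def path_feature(tree, const_pos, pred_pos):
    """Category path from a constituent up to its lowest common ancestor with
    the predicate, then down to the predicate, e.g. NP↑S↓VP↓VBD."""
    lca = 0                                    # length of the shared position prefix = the LCA
    while (lca < min(len(const_pos), len(pred_pos))
           and const_pos[lca] == pred_pos[lca]):
        lca += 1
    up = [tree[const_pos[:i]].label() for i in range(len(const_pos), lca - 1, -1)]
    down = [tree[pred_pos[:i]].label() for i in range(lca + 1, len(pred_pos) + 1)]
    return "↑".join(up) + "".join("↓" + label for label in down)

t = Tree.fromstring(
    "(S (NP (DT The) (NNP Examiner)) (VP (VBD issued) (NP (DT a) (JJ special) (NN edition))))")
print(path_feature(t, const_pos=(0,), pred_pos=(1, 0)))   # -> NP↑S↓VP↓VBD
```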
Next Time
• Discourse and Dialog
• Overview of Chapters 21 and 24