Collective Word Sense Disambiguation
David Vickrey, Ben Taskar, Daphne Koller
Word Sense Disambiguation
• Clue words in the context distinguish the senses:
  The electricity plant supplies 500 homes with power.
  vs.
  A plant requires water and sunlight to survive.
• Tricky: That plant produces bottled water.
WSD as Classification
• Senses s1, s2, …, sk correspond to classes c1, c2, …, ck
• Features: properties of the context of a word occurrence
  • Subject or verb of the sentence
  • Any word occurring within 4 words of the occurrence
• "Document": the set of features corresponding to one occurrence
  Example: The electricity plant supplies 500 homes with power.
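A rough sketch of the bag-of-words part of this feature set: collect the words within 4 positions of the target occurrence. The helper name, tokenization, and example usage are illustrative choices, and the subject/verb features (which would need a parser) are omitted.

```python
# Collect context words within a +/-4 window of the target occurrence.
def context_features(tokens, target_index, window=4):
    """Return the set of words within `window` positions of the target word."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    return {tokens[i].lower() for i in range(lo, hi) if i != target_index}

tokens = "The electricity plant supplies 500 homes with power".split()
print(context_features(tokens, tokens.index("plant")))
# e.g. {'the', 'electricity', 'supplies', '500', 'homes', 'with'}
```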
Simple Approaches
• Only features are which words appear in the context
  • Naïve Bayes
  • Discriminative, e.g. SVM
• Problems:
  • Feature set not rich enough
  • Data extremely sparse
    • "space" occurs only 38 times in a corpus of 200,000 words
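A minimal hand-rolled sketch of the Naïve Bayes baseline over bag-of-words context features, with add-one smoothing. The two-example training set and the sense labels ("factory", "organism") are invented purely for illustration, not taken from the corpus.

```python
from collections import Counter, defaultdict
import math

def train_nb(examples):
    """examples: list of (set_of_context_words, sense)."""
    priors = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        priors[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return priors, word_counts, vocab

def classify_nb(words, priors, word_counts, vocab):
    total = sum(priors.values())
    best_sense, best_lp = None, float("-inf")
    for sense in priors:
        lp = math.log(priors[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)  # add-one smoothing
        for w in words:
            lp += math.log((word_counts[sense][w] + 1) / denom)
        if lp > best_lp:
            best_sense, best_lp = sense, lp
    return best_sense

model = train_nb([({"electricity", "power", "supplies"}, "factory"),
                  ({"water", "sunlight", "grow"}, "organism")])
print(classify_nb({"power", "homes"}, *model))  # -> factory
```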
Available Data
• WordNet – electronic thesaurus
  • Words grouped by meaning into synsets
  • Slightly over 100,000 synsets
  • For nouns and verbs, a hierarchy over synsets
    (example hierarchy: Animal → Bird, Mammal; Mammal → Dog/Hound/Canine → Retriever, Terrier)
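For concreteness, the synsets and hypernym hierarchy can be inspected with NLTK, assuming its WordNet data has been downloaded (nltk.download('wordnet')); this only illustrates the resource, it is not part of the system described here.

```python
from nltk.corpus import wordnet as wn

# Several senses (synsets) of the ambiguous word "plant".
for syn in wn.synsets("plant")[:4]:
    print(syn.name(), "-", syn.definition())

# One step up the hypernym hierarchy from the "retriever" synset.
retriever = wn.synset("retriever.n.01")
print([h.name() for h in retriever.hypernyms()])  # its parent synset(s)
```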
Available Data
• A corpus of around 400,000 words labeled with WordNet synsets
• Sample sentences from WordNet
• Very sparse for most words
What Hasn’t Worked
• Intuition: the context of “dog” is similar to the context of “retriever”
• Use the hierarchy to determine possibly useful data
• Using cross-validation, learn which data is actually useful
• This hasn’t worked out very well
Why?
• Lots of parameters (not even counting parameters estimated using MLE)
  • > 100K for one model, ~ 20K for another
• Not much data (400K words)
  • a, the, and, of, to together account for ~ 65K of them
• Hierarchy may not be very useful
  • Hand-built; not designed for this task
• Features not very expressive
  • Luke is looking at this more closely using an SVM
Collective WSD
Ideas:
• Determine the senses of all words in a document simultaneously
  • Allows for richer features
• Train on unlabeled data as well as labeled
  • Lots and lots of unlabeled text available
Model
• Variables:
  • S1, S2, …, Sn – synsets
  • W1, W2, …, Wn – words, always observed
  (diagram: hidden synset variables S1 … S5, each emitting its observed word W1 … W5)
Model
• Each synset is generated from the previous context – the size of the context is a parameter (4)

  P(S,W) = ∏_{i=1..n} P(Wi | Si) · P(Si | Si-3, Si-2, Si-1)

  P(Si = s | Si-3, Si-2, Si-1) = exp(λs(si-3) + λs(si-2) + λs(si-1) + λs) / Z(si-3, si-2, si-1)

  P(W) = Σ_S P(S,W)
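A sketch of the transition distribution above: each candidate synset s is scored by its bias plus one pairwise parameter per preceding synset, and the exponentiated scores are normalized by Z. The names `lam`, `lam_bias`, and the toy synset strings are placeholders, not the authors' parameterization or data.

```python
import math
from collections import defaultdict

def transition_probs(prev_synsets, domain, lam, lam_bias):
    """P(S_i = s | s_{i-3}, s_{i-2}, s_{i-1}) for every s in domain."""
    scores = {s: lam_bias[s] + sum(lam[s][p] for p in prev_synsets) for s in domain}
    z = sum(math.exp(v) for v in scores.values())  # Z(s_{i-3}, s_{i-2}, s_{i-1})
    return {s: math.exp(v) / z for s, v in scores.items()}

lam = defaultdict(lambda: defaultdict(float))
lam_bias = defaultdict(float)
lam["plant_factory"]["electricity"] = 1.5  # toy parameter value
print(transition_probs(["electricity", "supply", "home"],
                       ["plant_factory", "plant_organism"], lam, lam_bias))
```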
Learning
• Two sets of parameters
  • P(Wi | Si) – given the current estimates of the marginals P(Si), re-estimate from expected counts
  • λs(s’) – for s’ ∈ Domain(Si-1), s ∈ Domain(Si), gradient ascent on the log likelihood gives:

    λs(s’) += Σ_{si-3, si-2} [ P(w, si-3, si-2, s’, s) – P(w, si-3, si-2, s’) · P(s | si-3, si-2, s’) ]
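The first update can be sketched as an EM-style re-estimation: accumulate expected (synset, word) counts under the current posterior marginals and normalize per synset. The data layout (`posteriors[i]` mapping each candidate synset of position i to its marginal) is an assumption for illustration; the λ gradient step is not shown.

```python
from collections import defaultdict

def reestimate_emissions(words, posteriors):
    """Re-estimate P(W | S) from expected counts under posterior marginals."""
    counts = defaultdict(lambda: defaultdict(float))
    for w, marginal in zip(words, posteriors):
        for s, p in marginal.items():
            counts[s][w] += p  # expected count of (synset, word)
    emissions = {}
    for s, word_counts in counts.items():
        total = sum(word_counts.values())
        emissions[s] = {w: c / total for w, c in word_counts.items()}
    return emissions
```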
Efficiency
• Only need to calculate marginals over contexts
  • Forwards-backwards
• Issue: some words have many possible synsets (40–50) – want very fast inference
  • Possibly prune values?
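A first-order forwards-backwards sketch for the per-position sense marginals. The real model conditions on three previous synsets and would need a larger state space (or pruning); a first-order chain keeps the sketch short, and long documents would call for log-space arithmetic to avoid underflow.

```python
def forward_backward(domains, emis, trans):
    """domains[i]: candidate synsets at position i; emis[i][s] = P(w_i | s);
       trans(p, s) = P(s | p). Returns a list of normalized marginals."""
    n = len(domains)
    alpha = [dict() for _ in range(n)]
    beta = [dict() for _ in range(n)]
    for s in domains[0]:
        alpha[0][s] = emis[0][s] / len(domains[0])  # uniform initial distribution
    for i in range(1, n):
        for s in domains[i]:
            alpha[i][s] = emis[i][s] * sum(alpha[i - 1][p] * trans(p, s)
                                           for p in domains[i - 1])
    for s in domains[n - 1]:
        beta[n - 1][s] = 1.0
    for i in range(n - 2, -1, -1):
        for s in domains[i]:
            beta[i][s] = sum(trans(s, nxt) * emis[i + 1][nxt] * beta[i + 1][nxt]
                             for nxt in domains[i + 1])
    marginals = []
    for i in range(n):
        unnorm = {s: alpha[i][s] * beta[i][s] for s in domains[i]}
        z = sum(unnorm.values())
        marginals.append({s: v / z for s, v in unnorm.items()})
    return marginals
```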
WordNet and Synsets
• The model uses WordNet to determine the domain of Si
  • Synset information should be more reliable
• This allows us to learn without any labeled data
  • Consider the synsets {eagle, hawk}, {eagle (golf shot)}, and {hawk (to sell)}
  • Since parameters depend only on the synset, the correct clustering can be found even without labeled data
Richer Features
• Heuristic: “one sense per discourse” – usually, within a document, a given word takes only one of its possible senses
• Can capture this using long-range links
  • Could assume each occurrence is independent of all other occurrences of the word besides the ones immediately before and after
  • Or, could use approximate inference (Kikuchi)
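One simple way to get the effect of "one sense per discourse" is as a post-processing step rather than long-range links in the model: sum each word's sense posteriors over its occurrences in the document and give every occurrence the top-scoring sense. The (word, posterior-dict) layout is an assumed interface for illustration.

```python
from collections import defaultdict

def one_sense_per_discourse(occurrences):
    """occurrences: list of (word, {sense: probability}) for one document."""
    totals = defaultdict(lambda: defaultdict(float))
    for word, posterior in occurrences:
        for sense, p in posterior.items():
            totals[word][sense] += p
    best = {w: max(senses, key=senses.get) for w, senses in totals.items()}
    return [(word, best[word]) for word, _ in occurrences]
```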
Richer Features
• Can reduce feature sparsity using the hierarchy (e.g., replace all occurrences of “dog” and “cat” with “animal”)
  • Need collective classification to do this
• Could add “global” hidden variables to try to capture the document subject
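A sketch of hierarchy-based backoff with NLTK's WordNet: map a context word to the ancestor synset at a fixed depth of its hypernym path, so that related words share a feature. The depth cutoff and the first-sense choice are arbitrary illustrative assumptions.

```python
from nltk.corpus import wordnet as wn

def hypernym_backoff(word, depth=6):
    """Replace a noun with an ancestor synset name at a fixed depth."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return word
    path = max(synsets[0].hypernym_paths(), key=len)  # root (entity) down to the synset
    return path[min(depth, len(path) - 1)].name()

print(hypernym_backoff("dog"), hypernym_backoff("cat"))  # typically both map to 'animal.n.01'
```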
Advanced Parameters
• Lots of parameters
  • Regularization likely helpful
• Could tie parameters together based on similarity in the WordNet hierarchy
  • Ties in with what I was working on before
  • More data in this situation (unlabeled)
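A minimal sketch of the regularization option only: an L2 penalty folded into the gradient step for the λ parameters. Here `grad` is assumed to hold the data gradient from the previous slide, the learning rate and strength are illustrative, and parameter tying via WordNet similarity is not shown.

```python
def regularized_step(lam, grad, lr=0.1, strength=0.01):
    """One gradient-ascent step with an L2 penalty pulling parameters toward zero."""
    return {k: v + lr * (grad.get(k, 0.0) - strength * v) for k, v in lam.items()}
```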
Experiments
• Soon