Collective Word Sense Disambiguation
David Vickrey, Ben Taskar, Daphne Koller
Word Sense Disambiguation
• Clue words in the context distinguish the senses:
  The electricity plant supplies 500 homes with power.
  vs.
  A plant requires water and sunlight to survive.
• Tricky: That plant produces bottled water.
WSD as Classification
• Senses s1, s2, …, sk correspond to classes c1, c2, …, ck
• Features: properties of the context of a word occurrence
  • Subject or verb of the sentence
  • Any word occurring within 4 words of the occurrence
• "Document": the set of features corresponding to one occurrence
  Example: The electricity plant supplies 500 homes with power.
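A rough sketch of the bag-of-words part of this feature set: collect the words within 4 positions of the target occurrence. The helper name, tokenization, and example usage are illustrative choices, and the subject/verb features (which would need a parser) are omitted.

```python
# Collect context words within a +/-4 window of the target occurrence.
def context_features(tokens, target_index, window=4):
    """Return the set of words within `window` positions of the target word."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    return {tokens[i].lower() for i in range(lo, hi) if i != target_index}

tokens = "The electricity plant supplies 500 homes with power".split()
print(context_features(tokens, tokens.index("plant")))
# e.g. {'the', 'electricity', 'supplies', '500', 'homes', 'with'}
```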
Simple Approaches
• Only features are which words appear in the context
  • Naïve Bayes
  • Discriminative, e.g. SVM
• Problems:
  • Feature set not rich enough
  • Data extremely sparse
    • "space" occurs only 38 times in a corpus of 200,000 words
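A minimal hand-rolled sketch of the Naïve Bayes baseline over bag-of-words context features, with add-one smoothing. The two-example training set and the sense labels ("factory", "organism") are invented purely for illustration, not taken from the corpus.

```python
from collections import Counter, defaultdict
import math

def train_nb(examples):
    """examples: list of (set_of_context_words, sense)."""
    priors = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        priors[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return priors, word_counts, vocab

def classify_nb(words, priors, word_counts, vocab):
    total = sum(priors.values())
    best_sense, best_lp = None, float("-inf")
    for sense in priors:
        lp = math.log(priors[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)  # add-one smoothing
        for w in words:
            lp += math.log((word_counts[sense][w] + 1) / denom)
        if lp > best_lp:
            best_sense, best_lp = sense, lp
    return best_sense

model = train_nb([({"electricity", "power", "supplies"}, "factory"),
                  ({"water", "sunlight", "grow"}, "organism")])
print(classify_nb({"power", "homes"}, *model))  # -> factory
```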
Available Data
• WordNet – electronic thesaurus
  • Words grouped by meaning into synsets
  • Slightly over 100,000 synsets
  • For nouns and verbs, a hierarchy over synsets
    (example hierarchy: Animal → Bird, Mammal; Mammal → Dog/Hound/Canine → Retriever, Terrier)
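For concreteness, the synsets and hypernym hierarchy can be inspected with NLTK, assuming its WordNet data has been downloaded (nltk.download('wordnet')); this only illustrates the resource, it is not part of the system described here.

```python
from nltk.corpus import wordnet as wn

# Several senses (synsets) of the ambiguous word "plant".
for syn in wn.synsets("plant")[:4]:
    print(syn.name(), "-", syn.definition())

# One step up the hypernym hierarchy from the "retriever" synset.
retriever = wn.synset("retriever.n.01")
print([h.name() for h in retriever.hypernyms()])  # its parent synset(s)
```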
Available Data
• A corpus of around 400,000 words labeled with WordNet synsets
• Sample sentences from WordNet
• Very sparse for most words
What Hasn’t Worked
• Intuition: the context of “dog” is similar to the context of “retriever”
• Use the hierarchy to determine possibly useful data
• Using cross-validation, learn which data is actually useful
• This hasn’t worked out very well
Why?
• Lots of parameters (not even counting parameters estimated using MLE)
  • > 100K for one model, ~ 20K for another
• Not much data (400K words)
  • a, the, and, of, to together account for ~ 65K of them
• Hierarchy may not be very useful
  • Hand-built; not designed for this task
• Features not very expressive
  • Luke is looking at this more closely using an SVM
Collective WSD
Ideas:
• Determine the senses of all words in a document simultaneously
  • Allows for richer features
• Train on unlabeled data as well as labeled
  • Lots and lots of unlabeled text available
Model
• Variables:
  • S1, S2, …, Sn – synsets
  • W1, W2, …, Wn – words, always observed
  (diagram: hidden synset variables S1 … S5, each emitting its observed word W1 … W5)
Model
• Each synset is generated from the previous context – the size of the context is a parameter (4)

  P(S,W) = ∏_{i=1..n} P(Wi | Si) · P(Si | Si-3, Si-2, Si-1)

  P(Si = s | Si-3, Si-2, Si-1) = exp(λs(si-3) + λs(si-2) + λs(si-1) + λs) / Z(si-3, si-2, si-1)

  P(W) = Σ_S P(S,W)
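A sketch of the transition distribution above: each candidate synset s is scored by its bias plus one pairwise parameter per preceding synset, and the exponentiated scores are normalized by Z. The names `lam`, `lam_bias`, and the toy synset strings are placeholders, not the authors' parameterization or data.

```python
import math
from collections import defaultdict

def transition_probs(prev_synsets, domain, lam, lam_bias):
    """P(S_i = s | s_{i-3}, s_{i-2}, s_{i-1}) for every s in domain."""
    scores = {s: lam_bias[s] + sum(lam[s][p] for p in prev_synsets) for s in domain}
    z = sum(math.exp(v) for v in scores.values())  # Z(s_{i-3}, s_{i-2}, s_{i-1})
    return {s: math.exp(v) / z for s, v in scores.items()}

lam = defaultdict(lambda: defaultdict(float))
lam_bias = defaultdict(float)
lam["plant_factory"]["electricity"] = 1.5  # toy parameter value
print(transition_probs(["electricity", "supply", "home"],
                       ["plant_factory", "plant_organism"], lam, lam_bias))
```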
Learning
• Two sets of parameters
  • P(Wi | Si) – given the current estimates of the marginals P(Si), re-estimate from expected counts
  • λs(s’) – for s’ ∈ Domain(Si-1), s ∈ Domain(Si), gradient ascent on the log likelihood gives:

    λs(s’) += Σ_{si-3, si-2} [ P(w, si-3, si-2, s’, s) – P(w, si-3, si-2, s’) · P(s | si-3, si-2, s’) ]
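The first update can be sketched as an EM-style re-estimation: accumulate expected (synset, word) counts under the current posterior marginals and normalize per synset. The data layout (`posteriors[i]` mapping each candidate synset of position i to its marginal) is an assumption for illustration; the λ gradient step is not shown.

```python
from collections import defaultdict

def reestimate_emissions(words, posteriors):
    """Re-estimate P(W | S) from expected counts under posterior marginals."""
    counts = defaultdict(lambda: defaultdict(float))
    for w, marginal in zip(words, posteriors):
        for s, p in marginal.items():
            counts[s][w] += p  # expected count of (synset, word)
    emissions = {}
    for s, word_counts in counts.items():
        total = sum(word_counts.values())
        emissions[s] = {w: c / total for w, c in word_counts.items()}
    return emissions
```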
Efficiency
• Only need to calculate marginals over contexts
  • Forwards-backwards
• Issue: some words have many possible synsets (40–50) – want very fast inference
  • Possibly prune values?
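A first-order forwards-backwards sketch for the per-position sense marginals. The real model conditions on three previous synsets and would need a larger state space (or pruning); a first-order chain keeps the sketch short, and long documents would call for log-space arithmetic to avoid underflow.

```python
def forward_backward(domains, emis, trans):
    """domains[i]: candidate synsets at position i; emis[i][s] = P(w_i | s);
       trans(p, s) = P(s | p). Returns a list of normalized marginals."""
    n = len(domains)
    alpha = [dict() for _ in range(n)]
    beta = [dict() for _ in range(n)]
    for s in domains[0]:
        alpha[0][s] = emis[0][s] / len(domains[0])  # uniform initial distribution
    for i in range(1, n):
        for s in domains[i]:
            alpha[i][s] = emis[i][s] * sum(alpha[i - 1][p] * trans(p, s)
                                           for p in domains[i - 1])
    for s in domains[n - 1]:
        beta[n - 1][s] = 1.0
    for i in range(n - 2, -1, -1):
        for s in domains[i]:
            beta[i][s] = sum(trans(s, nxt) * emis[i + 1][nxt] * beta[i + 1][nxt]
                             for nxt in domains[i + 1])
    marginals = []
    for i in range(n):
        unnorm = {s: alpha[i][s] * beta[i][s] for s in domains[i]}
        z = sum(unnorm.values())
        marginals.append({s: v / z for s, v in unnorm.items()})
    return marginals
```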
WordNet and Synsets
• The model uses WordNet to determine the domain of Si
  • Synset information should be more reliable
• This allows us to learn without any labeled data
  • Consider the synsets {eagle, hawk}, {eagle (golf shot)}, and {hawk (to sell)}
  • Since parameters depend only on the synset, the correct clustering can be found even without labeled data
Richer Features
• Heuristic: “one sense per discourse” – usually, within a document, a given word takes only one of its possible senses
• Can capture this using long-range links
  • Could assume each occurrence is independent of all other occurrences of the word besides the ones immediately before and after
  • Or, could use approximate inference (Kikuchi)
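One simple way to get the effect of "one sense per discourse" is as a post-processing step rather than long-range links in the model: sum each word's sense posteriors over its occurrences in the document and give every occurrence the top-scoring sense. The (word, posterior-dict) layout is an assumed interface for illustration.

```python
from collections import defaultdict

def one_sense_per_discourse(occurrences):
    """occurrences: list of (word, {sense: probability}) for one document."""
    totals = defaultdict(lambda: defaultdict(float))
    for word, posterior in occurrences:
        for sense, p in posterior.items():
            totals[word][sense] += p
    best = {w: max(senses, key=senses.get) for w, senses in totals.items()}
    return [(word, best[word]) for word, _ in occurrences]
```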
Richer Features
• Can reduce feature sparsity using the hierarchy (e.g., replace all occurrences of “dog” and “cat” with “animal”)
  • Need collective classification to do this
• Could add “global” hidden variables to try to capture the document subject
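A sketch of hierarchy-based backoff with NLTK's WordNet: map a context word to the ancestor synset at a fixed depth of its hypernym path, so that related words share a feature. The depth cutoff and the first-sense choice are arbitrary illustrative assumptions.

```python
from nltk.corpus import wordnet as wn

def hypernym_backoff(word, depth=6):
    """Replace a noun with an ancestor synset name at a fixed depth."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return word
    path = max(synsets[0].hypernym_paths(), key=len)  # root (entity) down to the synset
    return path[min(depth, len(path) - 1)].name()

print(hypernym_backoff("dog"), hypernym_backoff("cat"))  # typically both map to 'animal.n.01'
```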
Advanced Parameters
• Lots of parameters
  • Regularization likely helpful
• Could tie parameters together based on similarity in the WordNet hierarchy
  • Ties in with what I was working on before
  • More data in this situation (unlabeled)
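A minimal sketch of the regularization option only: an L2 penalty folded into the gradient step for the λ parameters. Here `grad` is assumed to hold the data gradient from the previous slide, the learning rate and strength are illustrative, and parameter tying via WordNet similarity is not shown.

```python
def regularized_step(lam, grad, lr=0.1, strength=0.01):
    """One gradient-ascent step with an L2 penalty pulling parameters toward zero."""
    return {k: v + lr * (grad.get(k, 0.0) - strength * v) for k, v in lam.items()}
```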
Experiments
• Soon