Learning a joint model of word sense and syntactic preference
Galen Andrew and Teg Grenager
NLP Lunch, December 3, 2003
Motivation
• Divide and conquer: NLP has divided the problem of language processing into smaller subproblems (tagging, parsing, WSD, anaphora resolution, information extraction, …)
• Traditionally, we build separate models for each problem
• But linguistic phenomena are correlated!
• In some cases, a joint model can better represent the phenomena
• We may be better able to perform one task if we have a joint model and can use many kinds of evidence
Motivation
• In particular, syntactic and semantic properties of language are (of course!) highly correlated
• For example, semantic information (word sense, coreference resolution, etc.) is useful when doing syntactic processing (tagging, parsing, movement, etc.), and vice versa
• Evidence that humans use this information (e.g., Clifton et al. 1984, Ferreira & McClure 1997, Garnsey et al. 1997)
• Evidence that it is useful in NLP (e.g., Yarowsky 2000, Lin 1997, Bikel 2000)
Verb Sense and Subcat
• We’ve chosen to focus on modelling two specific phenomena: verb sense and verb subcategorization preference
• Roland and Jurafsky (1998) demonstrate that models which condition on verb sense are better able to predict verb subcategorization
• Others (e.g., Yarowsky, Lin) have shown that models which condition on syntactic information are better able to predict word sense
• We believe that a joint model of verb sense and subcategorization may be more accurate than separate models on either task
Example
• The word “admit” has 8 senses in WordNet, with different distributions over subcategorizations:

Sense | Definition     | Subcategorization                                 | Example
1     | Acknowledge    | Somebody admits that something. (Sfin)            | The defendant admitted that he had lied.
      |                | Somebody admits something. (NP)                   | He admitted his guilt.
2     | Allow in       | Somebody admits somebody. (NP)                    | I will admit only those with passes.
      |                | Somebody admits someone (in)to somewhere. (NP PP) | The student was admitted.
6     | Give access to | Something admits to somewhere. (PP)               | The main entrance admits to the foyer.
Lack of Training Data
• To learn a joint model over verb sense and subcategorization preference, we’d ideally have a (large) dataset marked for both
• No such dataset exists (although parts of the Brown corpus appear in both Semcor and the PTB, the overlap is small and the two are not aligned)
• However, we have some datasets marked for sense (Senseval, Semcor), and others that can easily be marked for subcategory (PTB)
• We can think of this as one big corpus with missing data
Lack of Training Data

Dataset                                    | seq | bow | sense     | subcat
Semcor data (marked for sense)             |  x  |  x  |     x     | (missing)
Penn Treebank data (marked for subcat)     |  x  |  x  | (missing) |    x
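One way to picture this merged corpus in code — a minimal sketch only; the field names and example values are hypothetical and not from the actual system:

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Instance:
        seq: List[str]          # word sequence containing the target verb
        bow: List[str]          # bag-of-words context features
        sense: Optional[str]    # sense tag; None for PTB instances
        subcat: Optional[str]   # subcategorization frame; None for Semcor instances

    # Semcor gives us sense but not subcat; the PTB gives us subcat but not sense.
    corpus = [
        Instance(seq=["he", "admitted", "his", "guilt"],
                 bow=["defendant", "guilt"], sense="acknowledge", subcat=None),
        Instance(seq=["the", "entrance", "admits", "to", "the", "foyer"],
                 bow=["entrance", "foyer"], sense=None, subcat="PP"),
    ]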
EM to the Rescue
• How do people usually deal with model parameter estimation when there is missing data? The expectation-maximization (EM) algorithm
• Big idea: it’s easy to
  • E: fill in missing data if you have a good model, and
  • M: compute maximum-likelihood model parameters if you have complete data
• So you initialize somehow, and then loop over the above two steps until convergence
EM to the Rescue
• More formally, for data x, missing data z, and parameters θ:
• E-step: For each instance i, set Q_i(z^(i)) := P(z^(i) | x^(i); θ)
• M-step: Set θ := argmax_θ Σ_i Σ_{z^(i)} Q_i(z^(i)) log P(x^(i), z^(i); θ)
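To make the loop concrete, here is a minimal, self-contained sketch of EM for a toy model P(z)·P(x|z) with a single discrete hidden label z. It is illustrative only and much simpler than the actual sense/subcat model; all names are ours:

    from collections import defaultdict

    def em(instances, z_values, x_values, n_iters=20, smooth=1e-3):
        """instances: list of (x, z) pairs where z may be None (missing)."""
        # Initialize parameters uniformly.
        p_z = {z: 1.0 / len(z_values) for z in z_values}
        p_x_given_z = {z: {x: 1.0 / len(x_values) for x in x_values} for z in z_values}

        for _ in range(n_iters):
            # E-step: posterior Q_i(z) = P(z | x_i); trivial if z is observed.
            posteriors = []
            for x, z_obs in instances:
                if z_obs is not None:
                    q = {z: (1.0 if z == z_obs else 0.0) for z in z_values}
                else:
                    scores = {z: p_z[z] * p_x_given_z[z][x] for z in z_values}
                    total = sum(scores.values())
                    q = {z: s / total for z, s in scores.items()}
                posteriors.append(q)

            # M-step: re-estimate parameters from expected (fractional) counts.
            z_counts = defaultdict(float)
            xz_counts = defaultdict(lambda: defaultdict(float))
            for (x, _), q in zip(instances, posteriors):
                for z, w in q.items():
                    z_counts[z] += w
                    xz_counts[z][x] += w
            n = len(instances)
            p_z = {z: (z_counts[z] + smooth) / (n + smooth * len(z_values))
                   for z in z_values}
            p_x_given_z = {z: {x: (xz_counts[z][x] + smooth) /
                                  (z_counts[z] + smooth * len(x_values))
                               for x in x_values}
                           for z in z_values}
        return p_z, p_x_given_z

In the actual setting, z would be whichever of sense or subcat is missing from a given instance, and x the observed sequence, bag of words, and (if present) the other label.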
The Model
[Figure: a graphical model over four variables: Subcat, Sense, Seq, and BOW]

The Model: E-step
[Figure: the same network with nodes marked as observed, unobserved, or query]

The Model: M-step
[Figure: the same network annotated with how each component is estimated: a prior estimated from counts, a conditional estimated from counts, a deterministic link, a multinomial naive Bayes model, and the sequence model encoded as a PCFG grammar that is computed only once]
Subcategory Grammars
• In order to represent P(seq|subcat), we needed to learn separate grammars/lexicons for each subcategorization of the target verb
• When reading in PTB trees, we first make a separate copy of the tree for each target verb
• Then, for each tree, we mark the selected verb for its subcategory (using Tgrep expressions) and propagate the marking to the top of the tree (see the sketch below)
• Then the trees are annotated (tag-split, for accuracy) and binarized, and we read off the grammars and lexicons
• Thus at parse time, each root symbol must parse some verb to its specified subcategory
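A rough sketch of the marking-and-propagation step using NLTK's Tree class. The "^" marker convention, the subcat label, and the example tree are illustrative; the actual system identifies the verb and its frame with Tgrep expressions:

    import nltk  # used here only for its Tree class

    def mark_subcat(tree, verb_pos, subcat):
        """Append a subcat marker (e.g. '^NP') to the selected verb's preterminal
        and propagate it up to the root, so grammars read off these trees are
        specific to that subcat. verb_pos is the tree position (an index tuple)
        of the verb's preterminal; subcat is a string label such as 'NP'."""
        # Relabel every node on the path from the verb's preterminal to the root.
        for depth in range(len(verb_pos), -1, -1):
            node = tree[verb_pos[:depth]]
            if isinstance(node, nltk.Tree):
                node.set_label(node.label() + "^" + subcat)
        return tree

    # Toy example: 'admitted' takes a plain NP complement here.
    t = nltk.Tree.fromstring(
        "(S (NP (PRP He)) (VP (VBD admitted) (NP (PRP$ his) (NN guilt))))")
    marked = mark_subcat(t, (1, 0), "NP")
    print(marked)
    # (S^NP (NP (PRP He)) (VP^NP (VBD^NP admitted) (NP (PRP$ his) (NN guilt))))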
Model Testing
• Once we’ve trained a model with EM, we can use it to predict sense and/or subcat in a completely unmarked instance
• For example, to infer sense given only the sequence (and bow), take the sense that maximizes the joint probability after summing out the subcat: sense* = argmax_sense Σ_subcat P(sense, subcat, seq, bow)
• Inferring subcat given only the sequence is similar (sum out the sense instead)
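A minimal sketch of that inference step. The exact factorization of the joint is not spelled out in these notes, so the code assumes P(sense, subcat, seq, bow) = P(sense) · P(subcat|sense) · P(bow|sense) · P(seq|subcat); the conditioning directions and all names are our assumptions. The seq_score entries would come from the per-subcat PCFGs described above, and bow_score from the naive Bayes word model:

    def infer_sense(seq_score, bow_score, p_sense, p_subcat_given_sense,
                    senses, subcats):
        """Pick the sense maximizing the joint probability, summing out subcat.
        seq_score[c]: P(seq | subcat=c), e.g. the inside probability under the
                      subcat-c grammar (assumed precomputed)
        bow_score[s]: P(bow | sense=s) under the bag-of-words model"""
        best_sense, best_prob = None, float("-inf")
        for s in senses:
            total = 0.0
            for c in subcats:
                total += (p_sense[s] * p_subcat_given_sense[s][c]
                          * bow_score[s] * seq_score[c])
            if total > best_prob:
                best_sense, best_prob = s, total
        return best_sense

Inferring subcat is symmetric: maximize over subcat while summing out the sense.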
Results • None yet, but we should have them soon
Future Work
• More features, and a more complex model
• Learn separate distributions over words inside the VP and outside the VP, conditioned on sense
• Learn a distribution over the words contained in particular arguments and adjuncts, conditioned on sense