
Learning a joint model of word sense and syntactic preference

  1. Learning a joint model of word sense and syntactic preference Galen Andrew and Teg Grenager NLP Lunch December 3, 2003

  2. Motivation • Divide and conquer: NLP has divided the problem of language processing into small problems (tagging, parsing, WSD, anaphora resolution, information extraction…) • Traditionally, we build separate models for each problem • But linguistic phenomena are correlated! • In some cases, a joint model can better represent the phenomena • We may be better able to perform one task if we have a joint model and can use many kinds of evidence

  3. Motivation • In particular, syntactic and semantic properties of language are (of course!) very correlated • For example, semantic information (word sense, coreference resolution, etc.) is useful when doing syntactic processing (tagging, parsing, movement, etc.) and vice versa • Evidence that humans use this information (e.g., Clifton et al. 1984, Ferreira & McClure 1997, Garnsey et al. 1997) • Evidence that it is useful in NLP (e.g., Yarowsky 2000, Lin 1997, Bikel 2000)

  4. Verb Sense and Subcat • We’ve chosen to focus on modelling two specific phenomena: verb sense and verb subcategorization preference • Roland and Jurafsky (1998) demonstrate that models which condition on verb sense are better able to predict verb subcategorization • Others (e.g., Yarowsky, Lin) have shown that models that condition on syntactic information are better able to predict word sense • We believe that a joint model of verb sense and subcategorization may be more accurate than separate models on either task

  5. Example • The word “admit” has 8 senses in WordNet, with different distributions over subcategories:

     Sense | Definition     | Subcategorization                                 | Example
     1     | Acknowledge    | Somebody admits that something. (Sfin)            | The defendant admitted that he had lied.
     1     | Acknowledge    | Somebody admits something. (NP)                   | He admitted his guilt.
     2     | Allow in       | Somebody admits somebody. (NP)                    | I will admit only those with passes.
     2     | Allow in       | Somebody admits someone (in)to somewhere. (NP PP) | The student was admitted.
     6     | Give access to | Something admits to somewhere. (PP)               | The main entrance admits to the foyer.

  6. Lack of Training Data • To learn a joint model over verb sense and subcategorization preference, we’d ideally have a (large) dataset marked for both • No such dataset exists (parts of the Brown corpus appear in both SemCor and the PTB, but that overlap is small and the annotations are not aligned) • However, we have some datasets marked for sense (Senseval, SemCor), and others that can easily be marked for subcategory (PTB) • We can think of this as one big corpus with missing data

  7. Lack of Training Data

     [Table: each row is an instance with fields seq, bow, sense, subcat]
     SemCor data (marked for sense): seq, bow, and sense observed; subcat missing
     Penn Treebank data (marked for subcat): seq, bow, and subcat observed; sense missing
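To make the “one big corpus with missing data” view concrete, here is a minimal sketch of how the merged corpus might be represented; the Instance class and the example field encodings are our assumptions, not from the slides:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instance:
    seq: list                      # word sequence, observed in both corpora
    bow: set                       # bag of context words, observed in both corpora
    sense: Optional[int] = None    # WordNet sense number; SemCor instances only
    subcat: Optional[str] = None   # subcat frame label; Penn Treebank instances only

corpus = [
    Instance(["He", "admitted", "his", "guilt"], {"he", "his", "guilt"}, sense=1),
    Instance(["The", "student", "was", "admitted"], {"the", "student", "was"}, subcat="NP"),
]
```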

  8. EM to the Rescue • How do people usually deal with model parameter estimation when there is missing data? The expectation-maximization algorithm. • Big idea: it’s easy to • E: fill in missing data if you have a good model, and • M: compute maximum likelihood model parameters if you have complete data • So you initialize somehow, and then loop over the above two steps until convergence

  9. EM to the Rescue • More formally, for data x, missing data z, and parameters θ: • E-step: For each instance i, set q_i(z) = P(z | x_i; θ) • M-step: Set θ ← argmax_θ Σ_i Σ_z q_i(z) log P(x_i, z; θ)
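A minimal skeleton of that loop, to fix ideas; the e_step and m_step callables stand in for the model-specific computations and are hypothetical:

```python
def em(corpus, params, e_step, m_step, max_iters=50, tol=1e-4):
    """Generic EM loop. e_step returns, for each instance, a posterior
    q_i(z) over its missing fields plus the data log-likelihood;
    m_step re-estimates parameters from the expected counts."""
    prev_ll = float("-inf")
    for _ in range(max_iters):
        posteriors, ll = e_step(corpus, params)   # E: fill in missing data
        params = m_step(corpus, posteriors)       # M: maximum-likelihood re-estimate
        if ll - prev_ll < tol:                    # stop when likelihood converges
            break
        prev_ll = ll
    return params
```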

  10. The Model [Graphical model: nodes Subcat, Sense, Seq, BOW]

  11. The Model: E-step [Graphical model: nodes Subcat, Sense, Seq, BOW; legend distinguishes observed, unobserved, and query nodes]

  12. The Model: E-step [Graphical model: nodes Subcat, Sense, Seq, BOW; legend distinguishes observed, unobserved, and query nodes]

  13. The Model: M-step [Graphical model: nodes Subcat, Sense, Seq, BOW; annotation: Deterministic]

  14. The Model: M-step [Graphical model as above; annotation: prior, estimated from counts]

  15. The Model: M-step [Graphical model as above; annotation: estimated from counts]

  16. The Model: M-step [Graphical model as above; annotation: multinomial NB model]

  17. The Model: M-step [Graphical model as above; annotations: encoded as PCFG grammar, only computed once]
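Reading slides 13 through 17 together, the joint distribution appears to factor as P(sense) · P(subcat | sense) · P(bow | sense) · P(seq | subcat), with a count-based prior on sense, count-based P(subcat | sense), a multinomial naive Bayes model for the bag of words, and a PCFG inside probability for the sequence. A sketch under that assumed factorization; all names on the params container are hypothetical:

```python
import math

def log_joint(sense, subcat, seq, bow, params):
    """Log P(sense, subcat, seq, bow) under the factorization suggested
    by slides 13-17 (a sketch; `params` is a hypothetical container)."""
    lp = math.log(params.sense_prior[sense])                   # sense prior (slide 14)
    lp += math.log(params.subcat_given_sense[sense][subcat])   # counts (slide 15)
    for w in bow:                                              # multinomial NB (slide 16)
        lp += math.log(params.word_given_sense[sense][w])
    lp += params.pcfg_logprob(subcat, seq)                     # PCFG inside score (slide 17)
    return lp
```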

  18. Subcategory Grammars • In order to represent P(seq|subcat) we needed to learn separate grammars/lexicons for each subcategory of the target verb • When reading in PTB trees, we first make a separate copy of the tree for each verb • Then for each tree, we mark the selected verb for subcategory (using Tgrep expressions) and propagate the markings to the top of the tree • Then trees are annotated (tag split, for accuracy) and binarized, and we read off the grammars and lexicons • Thus at parse time, each root symbol must parse some verb to its specified subcategory
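A small sketch of the marking-and-propagation step, using NLTK trees. Identifying the frame itself (done with Tgrep expressions in the talk) is not shown, and the ^TAG suffix convention is our assumption:

```python
from nltk.tree import Tree

def mark_for_subcat(tree, verb_pos, subcat):
    """Copy `tree` and suffix the labels on the path from the selected
    verb's preterminal up to the root with the subcat tag, so the root
    symbol records the frame."""
    marked = tree.copy(deep=True)
    for i in range(len(verb_pos), -1, -1):
        node = marked[verb_pos[:i]]        # node on the path to the root
        if isinstance(node, Tree):
            node.set_label(node.label() + "^" + subcat)
    return marked

t = Tree.fromstring("(S (NP (PRP He)) (VP (VBD admitted) (NP (PRP$ his) (NN guilt))))")
print(mark_for_subcat(t, (1, 0), "NP"))    # (1, 0) is the VBD preterminal
```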

  19. Model Testing • Once we’ve trained a model with EM, we can use it to predict sense and/or subcat in a completely unmarked instance • For example, to infer sense given only the sequence (and bow): sense* = argmax_s P(s) P(bow|s) Σ_c P(c|s) P(seq|c) • Inferring subcat given only the sequence is similar
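A sketch of that inference, marginalizing subcat out of the joint; it reuses the same hypothetical params container as the model sketch above:

```python
import math

def predict_sense(seq, bow, params):
    """argmax_s P(s) P(bow|s) sum_c P(c|s) P(seq|c): infer the sense
    with the subcategorization frame marginalized out."""
    best_sense, best_lp = None, float("-inf")
    for s in params.senses:
        lp = math.log(params.sense_prior[s])
        lp += sum(math.log(params.word_given_sense[s][w]) for w in bow)
        marg = sum(params.subcat_given_sense[s][c]
                   * math.exp(params.pcfg_logprob(c, seq))
                   for c in params.subcats)        # marginalize over frames
        lp += math.log(marg)
        if lp > best_lp:
            best_sense, best_lp = s, lp
    return best_sense
```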

  20. Results • None yet, but we should have them soon

  21. Future Work • More features, and a more complex model • Learn separate distributions over words inside the VP and outside the VP, conditioned on sense • Learn distributions over the words contained in particular arguments and adjuncts, conditioned on sense
