
Learning and Inference for Hierarchically Split PCFGs


Presentation Transcript


  1. Learning and Inference for Hierarchically Split PCFGs Slav Petrov and Dan Klein

  2. The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson ’98]

  3. The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson ’98] • Head lexicalization [Collins ’99, Charniak ’00]

  4. The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson ’98] • Head lexicalization [Collins ’99, Charniak ’00] • Automatic clustering?
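To make the annotation idea concrete, here is a minimal sketch of parent annotation in Python. It assumes trees are nested (label, children) tuples with string leaves; the tree encoding and the function name are illustrative, not the format used by the authors' parser.

```python
# A minimal sketch of parent annotation [Johnson '98], assuming trees are
# nested (label, children) tuples whose leaves are plain strings.
def parent_annotate(tree, parent_label="ROOT"):
    """Refine each treebank symbol with its parent's symbol, e.g. NP -> NP^S."""
    label, children = tree
    new_children = [
        child if isinstance(child, str) else parent_annotate(child, label)
        for child in children
    ]
    return (f"{label}^{parent_label}", new_children)

# Example: (S (NP (PRP He)) (VP (VBD was) (ADJP (JJ right))))
tree = ("S", [("NP", [("PRP", ["He"])]),
              ("VP", [("VBD", ["was"]), ("ADJP", [("JJ", ["right"])])])])
print(parent_annotate(tree))   # root becomes S^ROOT, its NP child NP^S, and so on
```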

  5. Learning Latent Annotations [Matsuzaki et al. ‘05] • EM algorithm: • Brackets are known • Base categories are known • Only induce subcategories • Just like Forward-Backward for HMMs. [figure: forward and backward passes over latent subcategories X1 … X7 for the example sentence “He was right .”]
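A minimal sketch of the E-step described on this slide, under the assumption that the grammar is stored as NumPy arrays over k latent subcategories per base symbol (`binary_probs` and `lex_probs` below are illustrative toy parameters, not the authors' data structures). Because brackets and base categories are fixed by the treebank, inside and outside vectors are computed bottom-up and top-down over the observed tree, in the same spirit as forward-backward for HMMs; the M-step would renormalize the accumulated counts.

```python
# A minimal sketch of the E-step for latent annotations on one observed tree.
import numpy as np

k = 2  # latent subcategories per base symbol

# P(A_x -> B_y C_z) as a (k, k, k) array per base rule; P(A_x -> word) as a length-k array
binary_probs = {("S", "NP", "VP"): np.full((k, k, k), 1.0 / (k * k))}
lex_probs = {("NP", "He"): np.array([0.6, 0.4]),
             ("VP", "left"): np.array([0.5, 0.5])}

def inside(tree):
    """Bottom-up pass: inside score of each subcategory of the tree's root symbol."""
    label, children = tree
    if isinstance(children[0], str):                        # preterminal over a word
        return lex_probs[(label, children[0])]
    (bl, _), (cl, _) = children
    in_b, in_c = inside(children[0]), inside(children[1])
    rule = binary_probs[(label, bl, cl)]
    return np.einsum("xyz,y,z->x", rule, in_b, in_c)

def expected_counts(tree, out_vec, counts, total):
    """Top-down pass: accumulate posterior counts of every (sub)rule used in the tree."""
    label, children = tree
    if isinstance(children[0], str):
        key = (label, children[0])
        counts[key] = counts.get(key, 0.0) + out_vec * lex_probs[key] / total
        return
    (bl, _), (cl, _) = children
    in_b, in_c = inside(children[0]), inside(children[1])   # recomputed here for brevity
    rule = binary_probs[(label, bl, cl)]
    post = np.einsum("x,xyz,y,z->xyz", out_vec, rule, in_b, in_c) / total
    counts[(label, bl, cl)] = counts.get((label, bl, cl), 0.0) + post
    expected_counts(children[0], np.einsum("x,xyz,z->y", out_vec, rule, in_c), counts, total)
    expected_counts(children[1], np.einsum("x,xyz,y->z", out_vec, rule, in_b), counts, total)

tree = ("S", [("NP", ["He"]), ("VP", ["left"])])            # brackets and symbols are known
sentence_prob = inside(tree).sum()                          # assumes a uniform root prior
counts = {}
expected_counts(tree, np.ones(k), counts, sentence_prob)
print(counts)   # the M-step would renormalize these counts into new subcategory rules
```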

  6. Overview - Hierarchical Training - Adaptive Splitting - Parameter Smoothing [plot annotation: limit of computational resources]

  7. Refinement of the DT tag [figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]

  8. Refinement of the DT tag [figure: the DT tag and its subcategories]

  9. Hierarchical refinement of the DT tag [figure: DT split in two, then four, then eight subcategories]
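A minimal sketch of one split step in the hierarchical refinement, assuming a tag's emission probabilities are rows of a NumPy array; the 1% jitter used to break symmetry before the next round of EM is an illustrative value.

```python
# A minimal sketch of one hierarchical split: every subcategory becomes two
# jittered copies, which EM can then specialize.
import numpy as np

def split_in_two(emissions, noise=0.01, rng=np.random.default_rng(0)):
    """DT-1 .. DT-k  ->  DT-1 .. DT-2k: each child starts as a jittered copy of its parent."""
    doubled = np.repeat(emissions, 2, axis=0)                    # copy every subcategory
    doubled *= 1.0 + noise * rng.uniform(-1, 1, doubled.shape)   # break the symmetry for EM
    return doubled / doubled.sum(axis=1, keepdims=True)          # renormalize each row

dt = np.array([[0.7, 0.2, 0.1]])   # start: one DT over a 3-word toy vocabulary
dt2 = split_in_two(dt)             # 2 subcategories; retrain with EM, then split again
dt4 = split_in_two(dt2)            # 4 subcategories, and so on
print(dt4.shape)                   # (4, 3)
```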

  10. Hierarchical Estimation Results

  11. Refinement of the , (comma) tag • Splitting all categories the same amount is wasteful:

  12. Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful • Compare the likelihood with the split kept against the likelihood with the split reversed

  13. Adaptive Splitting (continued from slide 12)
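A minimal sketch of the roll-back criterion from slides 12-13: for every node labeled with a split category, compare the tree score with the split kept against the score with the split reversed, and sum the log-likelihood that would be lost by merging. It assumes inside/outside scores for the two subcategories have already been collected; the data layout and variable names are illustrative.

```python
# A minimal sketch of the adaptive-splitting merge criterion.
import numpy as np

def split_loss(p1, p2, occurrences):
    """Approximate log-likelihood lost by reversing (merging) one split.

    p1, p2: relative frequencies of the two subcategories over the whole treebank.
    occurrences: list of (in1, in2, out1, out2) tuples, one per tree node labeled
    with the split category. With fixed brackets, in1*out1 + in2*out2 at any node
    equals the likelihood of that tree."""
    loss = 0.0
    for in1, in2, out1, out2 in occurrences:
        kept = in1 * out1 + in2 * out2                    # tree score with the split
        merged = (p1 * in1 + p2 * in2) * (out1 + out2)    # tree score with the split reversed
        loss += np.log(kept / merged)                     # larger: this node prefers the split
    return loss

# Splits with the smallest loss are rolled back; complex categories keep more splits.
print(split_loss(0.5, 0.5, [(0.2, 0.3, 0.1, 0.4)]))       # ~0.113 for this toy node
```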

  14. Adaptive Splitting Results

  15. Number of Phrasal Subcategories

  16. Number of Phrasal Subcategories [bar chart of subcategory counts; NP, VP, and PP highlighted as the most heavily split]

  17. Number of Phrasal Subcategories [bar chart; NAC and X highlighted among the least split]

  18. Number of Lexical Subcategories [bar chart of subcategory counts; POS, TO, and the comma tag highlighted]

  19. Number of Lexical Subcategories [bar chart; NNP, JJ, NNS, and NN highlighted as the most heavily split]

  20. Smoothing • Heavy splitting can lead to overfitting • Idea: Smoothing allows us to pool statistics
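A minimal sketch of the pooling idea, assuming the rule probabilities of a symbol's k subcategories are rows of a NumPy array: each row is linearly interpolated with the mean over its siblings, so rare subcategories fall back on shared statistics. The interpolation weight alpha is an illustrative value.

```python
# A minimal sketch of smoothing subcategory distributions toward their shared mean.
import numpy as np

def smooth(rule_probs, alpha=0.01):
    """Interpolate every subcategory's distribution with the mean over its siblings."""
    mean = rule_probs.mean(axis=0, keepdims=True)
    return (1.0 - alpha) * rule_probs + alpha * mean

dt = np.array([[0.90, 0.10],   # DT-1
               [0.20, 0.80]])  # DT-2
print(smooth(dt))              # each row moves slightly toward the shared mean [0.55, 0.45]
```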

  21. Result Overview

  22. Linguistic Candy • Proper Nouns (NNP): [examples of learned subcategories shown on the slide] • Personal pronouns (PRP): [examples shown on the slide]

  23. Linguistic Candy • Relative adverbs (RBR): [examples shown on the slide] • Cardinal Numbers (CD): [examples shown on the slide]

  24. Inference • Example sentence: “She heard the noise.” • Exhaustive parsing: 1 min per sentence

  25. Coarse-to-Fine Parsing [Goodman ‘97, Charniak & Johnson ‘05] [diagram: Treebank → coarse grammar (NP, VP, …) → parse → prune → refined grammar (NP-1, NP-12, NP-17, …, VP-6, VP-31, …) → parse]

  26. Hierarchical Pruning • Consider again the span 5 to 12: coarse pass, then split in two, split in four, split in eight • items whose posterior falls below a threshold t are pruned before the next, more refined pass
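A minimal sketch of one pruning decision in the hierarchy, with hypothetical names (`coarse_posterior`, `project`, and the threshold value are all illustrative assumptions): a refined symbol is only built over a span if its coarser version survived the previous pass.

```python
# A minimal sketch of posterior pruning between two levels of the split hierarchy.

T = 1e-4  # pruning threshold (illustrative value)

def project(symbol):
    """Map a subcategory to its parent in the split hierarchy, e.g. DT-3 -> DT-1
    (0-based indices: children 2i and 2i+1 both came from coarse subcategory i)."""
    base, idx = symbol.rsplit("-", 1)
    return f"{base}-{int(idx) // 2}"

def allowed(refined_symbol, span, coarse_posterior):
    """Keep the refined item only if its coarse projection had posterior >= T."""
    return coarse_posterior.get((project(refined_symbol), span), 0.0) >= T

# Posteriors from the two-subcategory pass over the span 5 to 12:
coarse_posterior = {("DT-1", (5, 12)): 0.3}
print(allowed("DT-3", (5, 12), coarse_posterior))   # True: DT-3 projects to DT-1
print(allowed("DT-0", (5, 12), coarse_posterior))   # False: DT-0's projection was pruned
```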

  27. Intermediate Grammars • X-Bar = G0 • G1, G2, G3, G4, G5, G6 = G, learned by repeatedly splitting (e.g. DT → DT1 DT2 → DT1 … DT4 → DT1 … DT8)

  28. 0(G) 1(G) 2(G) 3(G) 4(G) 5(G) G1 G2 G3 G4 G5 G6 G1 G2 G3 G4 G5 G6 Learning Learning Projection i G Projected Grammars X-Bar=G0 G=

  29. Final Results (Efficiency) • Parsing the development set (1600 sentences) • Berkeley Parser: • 10 min • Implemented in Java • Charniak & Johnson ‘05 Parser • 19 min • Implemented in C

  30. Final Results (Accuracy)

  31. Extensions • Acoustic modeling [Petrov, Pauls & Klein ‘07] • Infinite Grammars / Nonparametric Bayesian Learning [Liang, Petrov, Jordan & Klein ‘07]

  32. Conclusions • Split & Merge Learning • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Hierarchical Coarse-to-Fine Inference • Projections • Marginalization • Multi-lingual Unlexicalized Parsing

  33. Thank You! http://nlp.cs.berkeley.edu
