Learning and Inference for Hierarchically Split PCFGs Slav Petrov and Dan Klein
The Game of Designing a Grammar
• Annotation refines base treebank symbols to improve the statistical fit of the grammar
• Parent annotation [Johnson '98] (see the sketch below)
• Head lexicalization [Collins '99, Charniak '00]
• Automatic clustering?
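Parent annotation, for example, is a one-line tree transform: each symbol is concatenated with its parent's symbol before rules are read off, so an NP under S (typically a subject) is distinguished from an NP under VP (an object). A toy sketch of the transform, using a nested-tuple tree encoding of my own choosing:

```python
def parent_annotate(tree, parent="ROOT"):
    """tree: (label, children); children of a preterminal are strings.
    Rewrites e.g. an NP under an S node as NP^S [Johnson '98]."""
    label, children = tree
    new_children = [child if isinstance(child, str)   # words stay as-is
                    else parent_annotate(child, label)
                    for child in children]
    return (label + "^" + parent, new_children)

# parent_annotate(("S", [("NP", ["she"]), ("VP", ["ran"])]))
# -> ("S^ROOT", [("NP^S", ["she"]), ("VP^S", ["ran"])])
```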
Learning Latent Annotations [Matsuzaki et al. '05]
[Figure: parse tree with latent subcategories X1 … X7 over the sentence "He was right.", annotated with forward and backward passes]
EM algorithm:
• Brackets are known
• Base categories are known
• Only induce subcategories
Just like Forward-Backward for HMMs.
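Because the brackets and base categories are fixed, the E-step is an inside-outside pass over each treebank tree that only sums over latent subcategory assignments. A minimal sketch of the inside pass, assuming binarized trees and probability tables in a layout of my own (Node, rule_probs and lex_probs are illustrative, not the Berkeley Parser's API):

```python
from dataclasses import dataclass, field
from typing import Optional
import numpy as np

@dataclass
class Node:
    category: str                      # base treebank symbol, e.g. "NP"
    children: list = field(default_factory=list)
    word: Optional[str] = None         # set for leaves only
    def is_leaf(self):
        return not self.children

def inside(node, rule_probs, lex_probs):
    """Vector over subcategories k of node.category:
    P(observed yield under node | node is subcategory k)."""
    if node.is_leaf():
        return lex_probs[(node.category, node.word)]   # shape [n_A]
    left = inside(node.children[0], rule_probs, lex_probs)
    right = inside(node.children[1], rule_probs, lex_probs)
    key = (node.category,
           node.children[0].category,
           node.children[1].category)
    G = rule_probs[key]                # [n_A, n_B, n_C] split-rule probs
    # sum over child subcategories, just like the forward recursion
    return np.einsum('abc,b,c->a', G, left, right)
```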
Overview
[Figure: parsing accuracy vs. grammar size, up to the limit of computational resources]
• Hierarchical Training
• Adaptive Splitting
• Parameter Smoothing
Refinement of the DT tag
[Figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]
Refinement of the ',' tag
• Splitting all categories the same amount is wasteful:
[Figure: the ',' tag gains little from the same number of splits as DT]
Adaptive Splitting
• Want to split complex categories more
• Idea: split everything, roll back splits which were least useful (see the sketch below)
[Figure: likelihood with the split vs. likelihood with the split reversed]
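The rollback step scores each split by the approximate loss in training likelihood if its two subcategories were merged back, computed from inside/outside scores at every occurrence of the category; the least useful splits are then reversed. A hedged sketch of that score, under a data layout of my own rather than the paper's code:

```python
import math

def merge_loss(occurrences, p1, p2):
    """Approximate log-likelihood loss from merging subcategories
    A_1 and A_2 back into A. occurrences: list of (in1, in2, out1, out2)
    inside/outside scores at each tree node labeled A; p1, p2: relative
    frequencies of the two subcategories (p1 + p2 == 1)."""
    loss = 0.0
    for in1, in2, out1, out2 in occurrences:
        split = in1 * out1 + in2 * out2                  # contribution as-is
        merged = (p1 * in1 + p2 * in2) * (out1 + out2)   # after merging
        loss += math.log(split / merged)
    return loss   # splits with the smallest loss get rolled back
```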
Number of Phrasal Subcategories
[Bar chart over phrasal categories: NP, VP and PP receive the most subcategories]
Number of Lexical Subcategories
[Bar chart over POS tags: open classes such as NNP, JJ, NNS and NN receive many subcategories; closed classes such as POS, TO and ',' receive few]
Smoothing • Heavy splitting can lead to overfitting • Idea: Smoothing allows us to pool statistics
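In the paper this pooling is a linear interpolation: each subcategory's rule probabilities are shrunk toward the mean over all subcategories of the same base category. A minimal sketch, assuming a numpy layout of my choosing (the value of alpha here is illustrative):

```python
import numpy as np

def smooth(rule_probs, alpha=0.01):
    """rule_probs: array [n_subcategories, n_rules]; each row holds
    P(rule | subcategory) for subcategories of one base category.
    Shrinking each row toward the across-subcategory mean lets rare
    subcategories pool statistics with their siblings."""
    mean = rule_probs.mean(axis=0, keepdims=True)
    return (1 - alpha) * rule_probs + alpha * mean
```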
Linguistic Candy
• Proper nouns (NNP): [table of learned subcategories]
• Personal pronouns (PRP): [table of learned subcategories]
• Relative adverbs (RBR): [table of learned subcategories]
• Cardinal numbers (CD): [table of learned subcategories]
Inference
[Figure: parse chart for the sentence "She heard the noise."]
• Exhaustive parsing: 1 min per sentence
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]
[Diagram: Treebank → coarse grammar (NP, VP, …) → parse → prune → refined grammar (NP-1, NP-12, NP-17, …, VP-6, VP-31, …) → parse]
Hierarchical Pruning
Consider again the span 5 to 12: parse with each grammar in turn (coarse, split in two, split in four, split in eight), and after each pass prune chart items whose posterior falls below a threshold t, as sketched below.
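A hedged sketch of one pruning pass between levels, assuming a chart keyed by span with per-category posteriors and a map from each category to its refinements in the next grammar (all names are mine):

```python
def prune_next_level(chart, threshold, refinements):
    """chart: dict (i, j) -> {category: posterior} from parsing with
    the current, coarser grammar. refinements: dict mapping each
    category to its subcategories in the next grammar. Returns, per
    span, the set of refined categories still allowed."""
    allowed = {}
    for span, cats in chart.items():
        keep = set()
        for cat, posterior in cats.items():
            if posterior >= threshold:        # item survives pruning
                keep.update(refinements[cat]) # license its refinements
        allowed[span] = keep
    return allowed
```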
Intermediate Grammars
[Diagram: X-Bar = G0, then G1, G2, G3, G4, G5, G6 = G learned in sequence; DT splits into DT1 DT2, then DT1 … DT4, then DT1 … DT8]
A sketch of the training loop follows.
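A compact sketch of the resulting training loop, with xbar_grammar, split_in_two, run_em, merge_back and smooth as hypothetical stand-ins for the components above:

```python
def train_hierarchically(treebank, n_rounds=6):
    grammar = xbar_grammar(treebank)        # G0: one subcategory each
    for _ in range(n_rounds):
        grammar = split_in_two(grammar)     # double every subcategory,
                                            # with jitter to break symmetry
        grammar = run_em(grammar, treebank) # re-estimate parameters
        grammar = merge_back(grammar, treebank)  # adaptive splitting
        grammar = smooth(grammar)                # parameter smoothing
    return grammar                          # G6 = G
```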
Projected Grammars
[Diagram: X-Bar = G0; G1 … G6 = G learned in sequence; coarser grammars π0(G), π1(G), …, π5(G) recovered from G by projection πi]
A sketch of the projection follows.
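Each projection can be computed by collapsing refined rules onto their coarse symbols, weighting every refined left-hand side by how often it occurs among its siblings. A hedged sketch, assuming these weights are given (the paper derives them from the grammar's own tree distribution):

```python
from collections import defaultdict

def project_rules(refined_rules, weights, coarse_of):
    """refined_rules: dict (A, B, C) -> P(A -> B C) over refined symbols.
    weights: dict A -> P(A | coarse_of[A]), the refinement's share
    among its siblings. coarse_of: refined symbol -> coarse symbol.
    Returns coarse rule probabilities P(A' -> B' C')."""
    projected = defaultdict(float)
    for (A, B, C), p in refined_rules.items():
        key = (coarse_of[A], coarse_of[B], coarse_of[C])
        projected[key] += weights[A] * p   # sum over collapsed rules
    return dict(projected)
```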
Final Results (Efficiency)
Parsing the development set (1600 sentences):
• Berkeley Parser: 10 min (implemented in Java)
• Charniak & Johnson '05 parser: 19 min (implemented in C)
Extensions
• Acoustic modeling [Petrov, Pauls & Klein '07]
• Infinite grammars: nonparametric Bayesian learning [Liang, Petrov, Jordan & Klein '07]
Conclusions
• Split & Merge Learning
  • Hierarchical Training
  • Adaptive Splitting
  • Parameter Smoothing
• Hierarchical Coarse-to-Fine Inference
  • Projections
  • Marginalization
• Multilingual Unlexicalized Parsing
Thank You! http://nlp.cs.berkeley.edu