Learning and Inference for Hierarchically Split PCFGs Slav Petrov and Dan Klein
The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson '98] • Head lexicalization [Collins '99, Charniak '00] • Automatic clustering?
Learning Latent Annotations [Matsuzaki et al. '05] • EM algorithm: brackets are known, base categories are known, only induce subcategories • Just like Forward-Backward for HMMs [Figure: forward and backward passes over a binarized tree with latent subcategories X1–X7 above the sentence "He was right."]
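Because the tree structure and base categories are fixed, the E-step reduces to a forward-backward (inside-outside) pass over subcategory assignments at each tree node. Below is a minimal Python sketch of that pass for a single binarized tree; all names (Node, rule_probs, lex_probs) are hypothetical illustrations, not the actual parser's API.

```python
import numpy as np

class Node:
    def __init__(self, label, children=(), word=None):
        self.label = label        # base treebank category, e.g. "NP"
        self.children = children  # empty, or (left, right) in a binarized tree
        self.word = word          # terminal word if this is a leaf

def inside(node, rule_probs, lex_probs):
    """Bottom-up pass: inside[x] = P(yield below node | subcategory x)."""
    if node.word is not None:
        node.inside = lex_probs[node.label][node.word]      # shape (k,)
    else:
        left, right = node.children
        inside(left, rule_probs, lex_probs)
        inside(right, rule_probs, lex_probs)
        # R[x, y, z] = P(A_x -> B_y C_z) for A = node, B = left, C = right
        R = rule_probs[(node.label, left.label, right.label)]
        node.inside = np.einsum('xyz,y,z->x', R, left.inside, right.inside)
    return node.inside

def outside(node, rule_probs, out_score):
    """Top-down pass; at the root, out_score is a vector of ones."""
    node.outside = out_score
    if node.word is None:
        left, right = node.children
        R = rule_probs[(node.label, left.label, right.label)]
        outside(left, rule_probs,
                np.einsum('xyz,x,z->y', R, out_score, right.inside))
        outside(right, rule_probs,
                np.einsum('xyz,x,y->z', R, out_score, left.inside))

# The posterior over subcategories at each node is inside * outside divided
# by the tree likelihood; accumulating these posteriors gives the expected
# rule counts for the M-step, just as in forward-backward for HMMs.
```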
Overview • Hierarchical Training • Adaptive Splitting • Parameter Smoothing [Chart annotation: limit of computational resources]
Refinement of the DT tag [Figure: the DT tag splits into subcategories DT-1, DT-2, DT-3, DT-4]
Refinement of the , tag • Splitting all categories the same amount is wasteful.
Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful [Figure: comparing the likelihood with a split against the likelihood with that split reversed]
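One way to make "roll back the least useful splits" concrete: approximate, from the inside/outside scores already computed during EM, how much the training likelihood would drop if a pair of subcategories were merged back. The sketch below is a simplification with hypothetical names; it assumes the two subcategories are weighted by their relative frequencies in training.

```python
import numpy as np

def merge_loss(occurrences, w1, w2, s1, s2):
    """Approximate log-likelihood loss from merging subcategories s1, s2.

    occurrences: tree nodes labeled with this category, each carrying
        .inside and .outside score vectors from the E-step.
    w1, w2: relative frequencies of the two subcategories in training.
    """
    loss = 0.0
    for node in occurrences:
        i, o = node.inside, node.outside
        # likelihood contribution of this node before the merge
        before = i[s1] * o[s1] + i[s2] * o[s2]
        # after the merge: a single subcategory with a weighted-average
        # inside score and the summed outside score
        after = (w1 * i[s1] + w2 * i[s2]) * (o[s1] + o[s2])
        loss += np.log(before) - np.log(after)
    return loss

# Splits are rolled back in order of increasing merge_loss until the
# desired grammar size is reached.
```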
Number of Phrasal Subcategories [Bar chart of subcategory counts per phrasal category, highlighting NP, VP, and PP]
Number of Lexical Subcategories [Bar chart of subcategory counts per part-of-speech tag, highlighting POS, TO, and ,]
Number of Lexical Subcategories [Bar chart of subcategory counts per part-of-speech tag, highlighting NNP, JJ, NNS, and NN]
Smoothing • Heavy splitting can lead to overfitting • Idea: Smoothing allows us to pool statistics
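A minimal sketch of the pooling idea: each subcategory's rule probabilities are linearly interpolated with the average over its sibling subcategories, so rare subcategories share statistics with the rest of the split. The interpolation weight alpha is an assumed hyperparameter, not the talk's exact value.

```python
import numpy as np

def smooth(rule_probs, alpha=0.01):
    """rule_probs: shape (k, n_rules) -- one rule distribution per subcategory.

    Shrinking every subcategory toward the mean of its siblings pools
    statistics across the split and counteracts overfitting.
    """
    mean = rule_probs.mean(axis=0, keepdims=True)
    return (1 - alpha) * rule_probs + alpha * mean
```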
Linguistic Candy • Proper nouns (NNP): [table of example words per learned subcategory] • Personal pronouns (PRP): [table of example words per learned subcategory]
Linguistic Candy • Relative adverbs (RBR): [table of example words per learned subcategory] • Cardinal numbers (CD): [table of example words per learned subcategory]
Inference • Exhaustive parsing with the fully refined grammar: 1 min per sentence [Figure: parse of the example sentence "She heard the noise."]
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05] • Parse with a coarse grammar (NP, VP, …) learned from the treebank, prune, then parse with the refined grammar (NP-1, NP-12, NP-17, …, VP-6, VP-31, …) [Figure: treebank → coarse grammar → parse → prune → refined grammar → parse]
Hierarchical Pruning • Consider again the span 5 to 12: parse first with the coarse grammar, then with the grammar split in two, in four, and in eight, pruning chart items whose posterior falls below a threshold t at each stage.
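A minimal sketch of one pruning pass, under an assumed chart representation: a chart item survives to the next, more refined grammar only if its posterior under the current grammar exceeds the threshold t.

```python
def prune(chart, sentence_prob, t=1e-4):
    """chart maps (category, i, j) -> (inside, outside) scores under the
    current (coarser) grammar; sentence_prob is the root inside score."""
    allowed = set()
    for (cat, i, j), (inside, outside) in chart.items():
        posterior = inside * outside / sentence_prob
        if posterior > t:
            # all refinements of `cat` over span [i, j) are kept when
            # parsing with the next, twice-as-split grammar
            allowed.add((cat, i, j))
    return allowed
```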
Intermediate Grammars • X-Bar = G0; splitting yields G1, G2, G3, G4, G5, G6 = G • Learning proceeds through the hierarchy: DT → DT1, DT2 → DT1, …, DT4 → DT1, …, DT8
Projected Grammars • X-Bar = G0 and G = G6 as before, but the coarser grammars are projections of the final grammar: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), computed by projection rather than kept from learning
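A minimal sketch of a projection πi: each refined symbol is collapsed to its ancestor at a coarser level of the split hierarchy, and rule probabilities are combined weighted by the expected frequency of each refined symbol. The expectations are assumed precomputed here; the actual method derives them from the grammar itself.

```python
from collections import defaultdict

def pi(sym, levels):
    """Collapse a refined symbol (category, subcat index) by undoing
    `levels` rounds of binary splitting: with hierarchical splits,
    dropping low-order bits of the index recovers the ancestor."""
    category, sub = sym
    return (category, sub >> levels)

def project(rules, expectation, levels):
    """rules: {(A, B, C): P(A -> B C)} over refined symbols;
    expectation: {A: expected count of A under the refined grammar}."""
    projected = defaultdict(float)
    mass = defaultdict(float)
    for (a, b, c), p in rules.items():
        coarse = (pi(a, levels), pi(b, levels), pi(c, levels))
        projected[coarse] += expectation[a] * p
        mass[coarse[0]] += expectation[a] * p
    # renormalize so the rules for each coarse left-hand side sum to one
    return {rule: p / mass[rule[0]] for rule, p in projected.items()}
```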
Final Results (Efficiency) • Parsing the development set (1600 sentences) • Berkeley Parser: 10 min, implemented in Java • Charniak & Johnson '05 parser: 19 min, implemented in C
Extensions • Acoustic modeling [Petrov, Pauls & Klein '07] • Infinite grammars and nonparametric Bayesian learning [Liang, Petrov, Jordan & Klein '07]
Conclusions • Split & Merge Learning • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Hierarchical Coarse-to-Fine Inference • Projections • Marginalization • Multi-lingual Unlexicalized Parsing
Thank You! http://nlp.cs.berkeley.edu