Parsing German with Latent Variable Grammars. Slav Petrov and Dan Klein, UC Berkeley
The Game of Designing a Grammar • Annotation refines base treebank symbols to improve the statistical fit of the grammar • Parent annotation [Johnson '98] • Head lexicalization [Collins '99, Charniak '00] • Automatic clustering?
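Purely as an illustration of the first annotation strategy above (not part of the slides), here is a minimal sketch of parent annotation on a tiny tree; the encoding of trees as nested (label, children) tuples is an assumption made for this example.

```python
# Hypothetical sketch of parent annotation [Johnson '98]: each nonterminal
# is augmented with its parent's label, so an NP under S becomes NP^S and
# an NP under VP becomes NP^VP.

def parent_annotate(label, children, parent=None):
    """Return a (label, children) tree with parent-annotated nonterminals."""
    new_label = f"{label}^{parent}" if parent else label
    new_children = [
        child if isinstance(child, str)               # leave words unchanged
        else parent_annotate(child[0], child[1], parent=label)
        for child in children
    ]
    return (new_label, new_children)

# ("S", [("NP", ["He"]), ("VP", [("VBD", ["was"]), ("ADJP", ["right"])])])
# becomes S -> NP^S VP^S, VP^S -> VBD^VP ADJP^VP, and so on.
```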
Previous Work: Manual Annotation [Klein & Manning '03] • Manually split categories • NP: subject vs. object • DT: determiners vs. demonstratives • IN: sentential vs. prepositional • Advantages: • Fairly compact grammar • Linguistic motivations • Disadvantages: • Performance leveled out • Requires manual annotation
Previous Work: Automatic Annotation Induction [Matsuzaki et al. '05, Prescher '05] • Label all nodes with latent variables; same number k of subcategories for all categories • Advantages: • Automatically learned • Disadvantages: • Grammar gets too large • Most categories are oversplit while others are undersplit
Overview [Petrov, Barrett, Thibaux & Klein, ACL '06] • Learning: • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Inference: • Coarse-To-Fine Decoding • Variational Approximation • German Analysis [Petrov & Klein, NAACL '07]
Learning Latent Annotations [figure: forward and backward passes over the latent subcategories X1 ... X7 of the tree for "He was right"] • EM algorithm: • Brackets are known • Base categories are known • Only induce subcategories • Just like Forward-Backward for HMMs.
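A minimal sketch of the inside (upward) pass that the E-step needs, assuming binarized trees with known brackets; the rule_prob and lex_prob tables indexed by base categories are hypothetical, and the matching outside pass and count accumulation for the M-step are omitted.

```python
import numpy as np

def inside(node, rule_prob, lex_prob):
    """Inside scores over latent subcategories of a fixed treebank tree.

    node is ("NP", left_child, right_child) for a binary rule or
    ("DT", "the") for a preterminal; rule_prob[(A, B, C)] is an
    (a, b, c) array of P(A_x -> B_y C_z), and lex_prob[(A, word)]
    is an (a,) array of P(A_x -> word).
    """
    label = node[0]
    if isinstance(node[1], str):                     # preterminal over a word
        return lex_prob[(label, node[1])]
    left_in = inside(node[1], rule_prob, lex_prob)
    right_in = inside(node[2], rule_prob, lex_prob)
    rule = rule_prob[(label, node[1][0], node[2][0])]
    # sum out the children's subcategories; with the tree fixed this is the
    # same style of recursion as forward-backward, just run over a tree
    return np.einsum("xyz,y,z->x", rule, left_in, right_in)
```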
Starting Point [figure omitted; annotation: limit of computational resources]
Refinement of the DT tag [figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]
Refinement of the , tag • Splitting all categories the same amount is wasteful.
The DT tag revisited • Oversplit?
Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful
Adaptive Splitting • Evaluate the loss in likelihood from removing each split: loss(split) = (data likelihood with the split reversed) / (data likelihood with the split) • No loss in accuracy when 50% of the splits are reversed.
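A minimal sketch of the roll-back step, assuming the per-split likelihood ratios above have already been (approximately) computed; the dictionary layout and split naming are hypothetical.

```python
def splits_to_reverse(split_losses, fraction=0.5):
    """Pick the splits whose removal costs the least likelihood.

    split_losses maps each split, e.g. ("NP", 3), to the ratio
        (data likelihood with the split reversed) / (data likelihood with the split),
    so a ratio close to 1.0 means the split bought almost nothing.
    """
    ranked = sorted(split_losses, key=split_losses.get, reverse=True)
    return ranked[: int(len(ranked) * fraction)]   # least useful half, rolled back
```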
Smoothing • Heavy splitting can lead to overfitting • Idea: Smoothing allows us to pool statistics
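One simple way to realize this pooling is linear interpolation of each subcategory's rule probabilities toward the mean over its sibling subcategories; a minimal sketch follows, with the interpolation weight as an assumed value rather than a reported one.

```python
import numpy as np

def smooth_subcategories(probs, alpha=0.01):
    """Shrink each subcategory's rule probabilities toward their common mean.

    probs: (n_subcategories, n_rules) array for one base category.
    Pooling statistics across the subcategories keeps heavily split
    categories from overfitting rare rules; alpha controls the pooling.
    """
    mean = probs.mean(axis=0, keepdims=True)
    return (1.0 - alpha) * probs + alpha * mean
```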
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05] [figure: parse the treebank sentence with a coarse grammar (NP, VP, ...), prune the chart, then parse with a refined grammar (latent subcategories NP-1, NP-12, NP-17, VP-6, VP-31, ... or lexicalized symbols NP-apple, NP-dog, NP-cat, NP-eat, VP-run, ...)]
Hierarchical Pruning • Consider the span 5 to 12: parse and prune first with the coarse grammar, then with the grammar split in two, then in four, then in eight.
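A minimal sketch of the pruning step between two passes, assuming span posteriors under the coarser grammar are already available; the data layout and the threshold value are assumptions, not the parser's actual settings.

```python
def prune_chart(posteriors, threshold=1e-6):
    """Keep only the chart items the next, more refined pass may build on.

    posteriors maps (start, end, label) items -- e.g. the span 5 to 12 with
    each label of the current grammar -- to their posterior probability.
    Items below the threshold are dropped, so none of their refinements
    are ever proposed by the next grammar in the hierarchy.
    """
    return {item for item, p in posteriors.items() if p >= threshold}
```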
Intermediate Grammars [figure: hierarchy of grammars learned by repeated splitting, from X-Bar = G0 through G1 ... G6 = G; e.g. DT becomes DT1, DT2, then DT1 ... DT4, then DT1 ... DT8]
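A minimal sketch of one split step in this hierarchy, under the common assumption that each new subcategory starts as a slightly perturbed copy of its parent so that EM can pull the copies apart; the array shapes and noise level are assumptions for illustration.

```python
import numpy as np

def split_in_two(rule_probs, noise=0.01, rng=None):
    """Split every subcategory of a binary rule A -> B C in two.

    rule_probs: (a, b, c) array over the current subcategories.
    Each copy starts as a renormalized copy of its parent's distribution,
    plus a little random noise so EM can break the symmetry between copies.
    """
    rng = rng or np.random.default_rng(0)
    doubled = rule_probs
    for axis in (0, 1, 2):                        # duplicate A, B and C subcategories
        doubled = np.repeat(doubled, 2, axis=axis)
    doubled = doubled / 4.0                       # B and C were each doubled
    return doubled * (1.0 + noise * rng.uniform(-1.0, 1.0, size=doubled.shape))
```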
EM State Drift (DT tag) [figure: the DT subcategories covering "the", "that", "this", "these", "some", "That", "This" shift membership across EM iterations]
Projected Grammars [figure: instead of storing the intermediate grammars G1 ... G6 from training, derive each one from the final grammar G by projection, Gi = πi(G), giving π0(G) ... π5(G) between X-Bar = G0 and G6 = G]
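A minimal sketch of one way to compute such a projection: weight each refined rule by a prior over its parent's fine subcategories and marginalize the children. The prior is taken as given here (the paper derives it from expected counts under G), and fine subcategories of the same coarse symbol are assumed to have consecutive indices.

```python
import numpy as np

def project_rule(rule_probs, parent_prior, group=2):
    """Collapse a refined rule P(A_x -> B_y C_z) onto coarser subcategories.

    rule_probs:   (a, b, c) array over fine subcategories.
    parent_prior: (a,) array of how much mass each fine A_x carries.
    group:        how many fine subcategories map onto one coarse one.
    """
    a, b, c = rule_probs.shape
    weighted = rule_probs * parent_prior[:, None, None]
    # sum fine subcategories inside each coarse block, then renormalize parents
    coarse = weighted.reshape(a // group, group, b // group, group,
                              c // group, group).sum(axis=(1, 3, 5))
    parent_mass = parent_prior.reshape(a // group, group).sum(axis=1)
    return coarse / parent_mass[:, None, None]
```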
Bracket Posteriors (after G0)
Bracket Posteriors (Movie) (Final Chart)
Parse Selection [figure: one parse (unsplit tree) corresponds to many split derivations] • Computing the most likely unsplit tree is NP-hard: • Settle for the best derivation. • Rerank an n-best list. • Use an alternative objective function / variational approximation.
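To make the first option concrete, here is a minimal sketch (reusing the hypothetical rule_prob and lex_prob tables from the inside-pass sketch above) of scoring the single best derivation rather than the marginal over derivations; the reranking and variational options are not shown.

```python
import numpy as np

def best_derivation_score(node, rule_prob, lex_prob):
    """Viterbi score over subcategory assignments for one fixed bracketing.

    Same recursion as the inside pass, but with max in place of sum: a full
    decoder maximizes this jointly over bracketings with CKY, which stays
    tractable, unlike maximizing the summed (unsplit-tree) score.
    """
    label = node[0]
    if isinstance(node[1], str):
        return lex_prob[(label, node[1])]
    left = best_derivation_score(node[1], rule_prob, lex_prob)
    right = best_derivation_score(node[2], rule_prob, lex_prob)
    rule = rule_prob[(label, node[1][0], node[2][0])]
    return (rule * left[None, :, None] * right[None, None, :]).max(axis=(1, 2))
```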
Efficiency Results • Berkeley Parser: 15 min, implemented in Java • Charniak & Johnson '05 Parser: 19 min, implemented in C
Parsing German Shared Task • Two-pass parsing: • Determine constituency structure (F1: 85/94) • Assign grammatical functions • One-pass approach: • Treat category + grammatical function combinations as labels
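A tiny illustration of the one-pass labeling, purely as an assumed encoding: the category and the grammatical function are concatenated into a single atomic treebank symbol before training, and the parser is otherwise unchanged.

```python
def merge_label(category, function=None):
    """One-pass approach: e.g. NP with grammatical function SB becomes the
    single atomic label 'NP-SB'; nodes without a function keep their category."""
    return f"{category}-{function}" if function else category
```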
Conclusions • Split & Merge Learning • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Hierarchical Coarse-to-Fine Inference • Projections • Marginalization • Multi-lingual Unlexicalized Parsing
Thank You! The parser is available at http://nlp.cs.berkeley.edu