Parsing German with Latent Variable Grammars. Slav Petrov and Dan Klein, UC Berkeley
The Game of Designing a Grammar • Annotation refines base treebank symbols to improve the statistical fit of the grammar • Parent annotation [Johnson '98] • Head lexicalization [Collins '99, Charniak '00] • Automatic clustering?
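Purely as an illustration of the first annotation strategy above (not part of the slides), here is a minimal sketch of parent annotation on a tiny tree; the encoding of trees as nested (label, children) tuples is an assumption made for this example.

```python
# Hypothetical sketch of parent annotation [Johnson '98]: each nonterminal
# is augmented with its parent's label, so an NP under S becomes NP^S and
# an NP under VP becomes NP^VP.

def parent_annotate(label, children, parent=None):
    """Return a (label, children) tree with parent-annotated nonterminals."""
    new_label = f"{label}^{parent}" if parent else label
    new_children = [
        child if isinstance(child, str)               # leave words unchanged
        else parent_annotate(child[0], child[1], parent=label)
        for child in children
    ]
    return (new_label, new_children)

# ("S", [("NP", ["He"]), ("VP", [("VBD", ["was"]), ("ADJP", ["right"])])])
# becomes S -> NP^S VP^S, VP^S -> VBD^VP ADJP^VP, and so on.
```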
Previous Work: Manual Annotation [Klein & Manning '03] • Manually split categories • NP: subject vs. object • DT: determiners vs. demonstratives • IN: sentential vs. prepositional • Advantages: • Fairly compact grammar • Linguistic motivations • Disadvantages: • Performance leveled out • Requires manual annotation
Previous Work: Automatic Annotation Induction [Matsuzaki et al. '05, Prescher '05] • Label all nodes with latent variables; same number k of subcategories for all categories • Advantages: • Automatically learned • Disadvantages: • Grammar gets too large • Most categories are oversplit while others are undersplit
Overview [Petrov, Barrett, Thibaux & Klein, ACL '06] • Learning: • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Inference: • Coarse-To-Fine Decoding • Variational Approximation • German Analysis [Petrov & Klein, NAACL '07]
Learning Latent Annotations [figure: forward and backward passes over the latent subcategories X1 ... X7 of the tree for "He was right"] • EM algorithm: • Brackets are known • Base categories are known • Only induce subcategories • Just like Forward-Backward for HMMs.
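A minimal sketch of the inside (upward) pass that the E-step needs, assuming binarized trees with known brackets; the rule_prob and lex_prob tables indexed by base categories are hypothetical, and the matching outside pass and count accumulation for the M-step are omitted.

```python
import numpy as np

def inside(node, rule_prob, lex_prob):
    """Inside scores over latent subcategories of a fixed treebank tree.

    node is ("NP", left_child, right_child) for a binary rule or
    ("DT", "the") for a preterminal; rule_prob[(A, B, C)] is an
    (a, b, c) array of P(A_x -> B_y C_z), and lex_prob[(A, word)]
    is an (a,) array of P(A_x -> word).
    """
    label = node[0]
    if isinstance(node[1], str):                     # preterminal over a word
        return lex_prob[(label, node[1])]
    left_in = inside(node[1], rule_prob, lex_prob)
    right_in = inside(node[2], rule_prob, lex_prob)
    rule = rule_prob[(label, node[1][0], node[2][0])]
    # sum out the children's subcategories; with the tree fixed this is the
    # same style of recursion as forward-backward, just run over a tree
    return np.einsum("xyz,y,z->x", rule, left_in, right_in)
```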
Starting Point [figure omitted; annotation: limit of computational resources]
Refinement of the DT tag [figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]
Refinement of the , tag • Splitting all categories the same amount is wasteful.
The DT tag revisited • Oversplit?
Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful
Adaptive Splitting • Evaluate the loss in likelihood from removing each split: loss(split) = (data likelihood with the split reversed) / (data likelihood with the split) • No loss in accuracy when 50% of the splits are reversed.
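A minimal sketch of the roll-back step, assuming the per-split likelihood ratios above have already been (approximately) computed; the dictionary layout and split naming are hypothetical.

```python
def splits_to_reverse(split_losses, fraction=0.5):
    """Pick the splits whose removal costs the least likelihood.

    split_losses maps each split, e.g. ("NP", 3), to the ratio
        (data likelihood with the split reversed) / (data likelihood with the split),
    so a ratio close to 1.0 means the split bought almost nothing.
    """
    ranked = sorted(split_losses, key=split_losses.get, reverse=True)
    return ranked[: int(len(ranked) * fraction)]   # least useful half, rolled back
```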
Smoothing • Heavy splitting can lead to overfitting • Idea: Smoothing allows us to pool statistics
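One simple way to realize this pooling is linear interpolation of each subcategory's rule probabilities toward the mean over its sibling subcategories; a minimal sketch follows, with the interpolation weight as an assumed value rather than a reported one.

```python
import numpy as np

def smooth_subcategories(probs, alpha=0.01):
    """Shrink each subcategory's rule probabilities toward their common mean.

    probs: (n_subcategories, n_rules) array for one base category.
    Pooling statistics across the subcategories keeps heavily split
    categories from overfitting rare rules; alpha controls the pooling.
    """
    mean = probs.mean(axis=0, keepdims=True)
    return (1.0 - alpha) * probs + alpha * mean
```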
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05] [figure: parse the treebank sentence with a coarse grammar (NP, VP, ...), prune the chart, then parse with a refined grammar (latent subcategories NP-1, NP-12, NP-17, VP-6, VP-31, ... or lexicalized symbols NP-apple, NP-dog, NP-cat, NP-eat, VP-run, ...)]
Hierarchical Pruning • Consider the span 5 to 12: parse and prune first with the coarse grammar, then with the grammar split in two, then in four, then in eight.
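A minimal sketch of the pruning step between two passes, assuming span posteriors under the coarser grammar are already available; the data layout and the threshold value are assumptions, not the parser's actual settings.

```python
def prune_chart(posteriors, threshold=1e-6):
    """Keep only the chart items the next, more refined pass may build on.

    posteriors maps (start, end, label) items -- e.g. the span 5 to 12 with
    each label of the current grammar -- to their posterior probability.
    Items below the threshold are dropped, so none of their refinements
    are ever proposed by the next grammar in the hierarchy.
    """
    return {item for item, p in posteriors.items() if p >= threshold}
```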
Intermediate Grammars [figure: hierarchy of grammars learned by repeated splitting, from X-Bar = G0 through G1 ... G6 = G; e.g. DT becomes DT1, DT2, then DT1 ... DT4, then DT1 ... DT8]
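A minimal sketch of one split step in this hierarchy, under the common assumption that each new subcategory starts as a slightly perturbed copy of its parent so that EM can pull the copies apart; the array shapes and noise level are assumptions for illustration.

```python
import numpy as np

def split_in_two(rule_probs, noise=0.01, rng=None):
    """Split every subcategory of a binary rule A -> B C in two.

    rule_probs: (a, b, c) array over the current subcategories.
    Each copy starts as a renormalized copy of its parent's distribution,
    plus a little random noise so EM can break the symmetry between copies.
    """
    rng = rng or np.random.default_rng(0)
    doubled = rule_probs
    for axis in (0, 1, 2):                        # duplicate A, B and C subcategories
        doubled = np.repeat(doubled, 2, axis=axis)
    doubled = doubled / 4.0                       # B and C were each doubled
    return doubled * (1.0 + noise * rng.uniform(-1.0, 1.0, size=doubled.shape))
```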
EM State Drift (DT tag) [figure: the DT subcategories covering "the", "that", "this", "these", "some", "That", "This" shift membership across EM iterations]
Projected Grammars [figure: instead of storing the intermediate grammars G1 ... G6 from training, derive each one from the final grammar G by projection, Gi = πi(G), giving π0(G) ... π5(G) between X-Bar = G0 and G6 = G]
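A minimal sketch of one way to compute such a projection: weight each refined rule by a prior over its parent's fine subcategories and marginalize the children. The prior is taken as given here (the paper derives it from expected counts under G), and fine subcategories of the same coarse symbol are assumed to have consecutive indices.

```python
import numpy as np

def project_rule(rule_probs, parent_prior, group=2):
    """Collapse a refined rule P(A_x -> B_y C_z) onto coarser subcategories.

    rule_probs:   (a, b, c) array over fine subcategories.
    parent_prior: (a,) array of how much mass each fine A_x carries.
    group:        how many fine subcategories map onto one coarse one.
    """
    a, b, c = rule_probs.shape
    weighted = rule_probs * parent_prior[:, None, None]
    # sum fine subcategories inside each coarse block, then renormalize parents
    coarse = weighted.reshape(a // group, group, b // group, group,
                              c // group, group).sum(axis=(1, 3, 5))
    parent_mass = parent_prior.reshape(a // group, group).sum(axis=1)
    return coarse / parent_mass[:, None, None]
```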
Bracket Posteriors (after G0)
Bracket Posteriors (Movie) (Final Chart)
Parse Selection [figure: one parse (unsplit tree) corresponds to many split derivations] • Computing the most likely unsplit tree is NP-hard: • Settle for the best derivation. • Rerank an n-best list. • Use an alternative objective function / variational approximation.
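To make the first option concrete, here is a minimal sketch (reusing the hypothetical rule_prob and lex_prob tables from the inside-pass sketch above) of scoring the single best derivation rather than the marginal over derivations; the reranking and variational options are not shown.

```python
import numpy as np

def best_derivation_score(node, rule_prob, lex_prob):
    """Viterbi score over subcategory assignments for one fixed bracketing.

    Same recursion as the inside pass, but with max in place of sum: a full
    decoder maximizes this jointly over bracketings with CKY, which stays
    tractable, unlike maximizing the summed (unsplit-tree) score.
    """
    label = node[0]
    if isinstance(node[1], str):
        return lex_prob[(label, node[1])]
    left = best_derivation_score(node[1], rule_prob, lex_prob)
    right = best_derivation_score(node[2], rule_prob, lex_prob)
    rule = rule_prob[(label, node[1][0], node[2][0])]
    return (rule * left[None, :, None] * right[None, None, :]).max(axis=(1, 2))
```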
Efficiency Results • Berkeley Parser: 15 min, implemented in Java • Charniak & Johnson '05 Parser: 19 min, implemented in C
Parsing German Shared Task • Two-pass parsing: • Determine constituency structure (F1: 85/94) • Assign grammatical functions • One-pass approach: • Treat category + grammatical function combinations as labels
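A tiny illustration of the one-pass labeling, purely as an assumed encoding: the category and the grammatical function are concatenated into a single atomic treebank symbol before training, and the parser is otherwise unchanged.

```python
def merge_label(category, function=None):
    """One-pass approach: e.g. NP with grammatical function SB becomes the
    single atomic label 'NP-SB'; nodes without a function keep their category."""
    return f"{category}-{function}" if function else category
```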
Conclusions • Split & Merge Learning • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Hierarchical Coarse-to-Fine Inference • Projections • Marginalization • Multi-lingual Unlexicalized Parsing
Thank You! The parser is available at http://nlp.cs.berkeley.edu