Learning Accurate, Compact, and Interpretable Tree Annotation Recent Advances in Parsing Technology WS 2011/2012 Saarland University in Saarbrücken Miloš Ercegovčević
Outline • Introduction • EM algorithm • Latent Grammars • Motivation • Learning Latent PCFG • Split-Merge Adaptation • Efficient inference with Latent Grammars • Pruning in Multilevel Coarse-to-Fine parsing • Parse Selection
Introduction : EM Algorithm • Iterative algorithm for finding MLE or MAP estimates of the parameters of statistical models with latent variables • X – observed data; Z – set of latent variables • Θ – a vector of unknown parameters • Likelihood function: L(Θ; X, Z) = p(X, Z | Θ) • MLE of the marginal likelihood: L(Θ; X) = p(X | Θ) = Σ_Z p(X, Z | Θ) • However, this sum is typically intractable, and we know neither Z nor Θ
Introduction : EM Algorithm • Find the MLE of the marginal likelihood by iteratively applying two steps: • Expectation step (E-step): compute the expected complete-data log-likelihood under the posterior of Z given the current parameters: Q(Θ | Θ^(t)) = E_{Z | X, Θ^(t)} [ log L(Θ; X, Z) ] • Maximization step (M-step): find the Θ that maximizes that quantity: Θ^(t+1) = argmax_Θ Q(Θ | Θ^(t))
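As a concrete illustration (not from the original slides), here is a minimal EM sketch in Python for a two-component Gaussian mixture with unit variances; the data, initialization, and iteration count are made up for the example:

import numpy as np

# X: observed data drawn from two Gaussians; Z (which component produced
# each point) is latent; Θ = (pi, mu) are the unknown parameters.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

pi, mu = 0.5, np.array([-1.0, 1.0])                # initial guess for Θ
for _ in range(50):
    # E-step: posterior responsibilities P(Z = k | X, Θ^(t))
    dens = np.exp(-0.5 * (X[:, None] - mu) ** 2)   # ∝ N(x; mu_k, 1)
    resp = dens * np.array([pi, 1.0 - pi])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: Θ^(t+1) maximizing the expected complete-data log-likelihood
    nk = resp.sum(axis=0)
    mu = (resp * X[:, None]).sum(axis=0) / nk
    pi = nk[0] / len(X)

print(pi, mu)   # approaches 0.4 and (-2, 3)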
Latent PCFG • Standard coarse Treebank tree • Baseline parser: F1 72.6
Latent PCFG • Parent-annotated trees [Johnson '98; Klein & Manning '03] • F1 86.3
Latent PCFG • Head-lexicalized trees [Collins '99; Charniak '00] • F1 88.6
Latent PCFG • Automatically clustered categories with F1 86.7 [Matsuzaki et al. ’05] • Same number of subcategories for all categories
Latent PCFG • At each step, split every category in two • After 6 split iterations the number of subcategories per category is 2^6 = 64 • Initialize EM with the results of the smaller grammar (a sketch of the splitting step follows below)
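A minimal sketch of that splitting step, assuming rules are stored as a dict mapping (parent, children) to probability; the ~1% symmetry-breaking noise follows the paper, while the data structures and names here are hypothetical:

import itertools, random

def split_grammar(rules):
    # Split every category A into A_0 / A_1, spread each rule's probability
    # uniformly over the child-split variants, and add ~1% noise so EM can
    # differentiate the otherwise identical halves. (Exact renormalization
    # after the noise is omitted in this sketch.)
    new_rules = {}
    for (parent, children), prob in rules.items():
        share = prob / 2 ** len(children)   # child variants share the mass
        for bits in itertools.product([0, 1], repeat=1 + len(children)):
            p = f"{parent}_{bits[0]}"
            cs = tuple(f"{c}_{b}" for c, b in zip(children, bits[1:]))
            new_rules[(p, cs)] = share * (1 + 0.01 * (random.random() - 0.5))
    return new_rules

g0 = {("S", ("NP", "VP")): 1.0, ("NP", ("DT", "NN")): 0.7}
g1 = split_grammar(g0)   # 8 split variants of each binary rule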
Learning Latent PCFG • Induce subcategories • Like forward-backward for HMMs, but with the tree brackets fixed • [Figure: forward/backward passes over the latent annotations X1–X7 of the fixed parse tree of "He was right"]
Learning Latent Grammar • Inside-Outside probabilities
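Reconstructed in LaTeX from the paper's setting (fixed treebank brackets; binarized rules A_x → B_y C_z with probabilities β), the recursions run over each treebank node n with children n_1, n_2:

\begin{align*}
P_{\mathrm{IN}}(n, A_x) &= \sum_{y,z} \beta(A_x \to B_y C_z)\, P_{\mathrm{IN}}(n_1, B_y)\, P_{\mathrm{IN}}(n_2, C_z) \\
P_{\mathrm{OUT}}(n_1, B_y) &= \sum_{x,z} \beta(A_x \to B_y C_z)\, P_{\mathrm{OUT}}(n, A_x)\, P_{\mathrm{IN}}(n_2, C_z)
\end{align*}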
Learning Latent Grammar • Expectation step (E-step): compute the posterior count of each annotated rule at each treebank node n with children n_1, n_2: P(A_x → B_y C_z, n | w, T) ∝ P_OUT(n, A_x) · β(A_x → B_y C_z) · P_IN(n_1, B_y) · P_IN(n_2, C_z) • Maximization step (M-step): re-estimate each rule probability as its relative expected frequency: β(A_x → B_y C_z) := E[c(A_x → B_y C_z)] / Σ E[c(A_x → ·)]
Latent Grammar : Adaptive splitting • Want to allocate splits where the data actually demands them • Solution: split everything, then merge back the splits whose removal causes the smallest loss in likelihood • Little to no loss in accuracy
Latent Grammar : Adaptive splitting • The likelihood of the data for tree T and sentence w can be computed at any node n: P(w, T) = Σ_x P_IN(n, A_x) · P_OUT(n, A_x) • Merging the two annotations A_1 and A_2 at node n changes this to P_merged(w, T) = [p_1 P_IN(n, A_1) + p_2 P_IN(n, A_2)] · [P_OUT(n, A_1) + P_OUT(n, A_2)], where p_1, p_2 are the relative frequencies of A_1, A_2 • The overall loss of the merge is then estimated as the product of the ratios P_merged(w, T) / P(w, T) over all occurrences of the category (see the sketch below)
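A small sketch of that merge criterion with hypothetical data structures (each node contributes its inside/outside scores for the two annotations; p1, p2 are their relative frequencies from the E-step counts):

def merge_loss(nodes, p1, p2):
    # nodes: list of (in1, out1, in2, out2) tuples, one per occurrence of
    # category A in the treebank. Returns the estimated likelihood ratio
    # P_merged / P_split; the closer to 1, the cheaper the merge.
    ratio = 1.0
    for in1, out1, in2, out2 in nodes:
        p_split = in1 * out1 + in2 * out2
        p_merged = (p1 * in1 + p2 * in2) * (out1 + out2)
        ratio *= p_merged / p_split
    return ratio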
Number of Phrasal Subcategories • [Bar chart over phrasal categories: NP, VP, and PP receive the most subcategories]
Number of Lexical Subcategories • [Bar charts over POS tags: closed-class tags such as TO and ',' receive very few subcategories; RB, VBx, IN, DT receive more; open-class tags NNP, JJ, NNS, NN receive the most]
Efficient inference with Latent Grammars • Latent grammar with 91.2 F1 score on the WSJ dev set (1600 sentences) • Parsing time 1621 minutes: more than a minute per sentence • For use in real-world applications this is too slow • Improve on inference: • Hierarchical Pruning • Parse Selection
Intermediate Grammars • X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G, each learned from the previous by split-and-EM • [Figure: DT is refined step by step, from DT through DT1–DT2 and DT1–DT4 up to DT1–DT8]
Projected Grammars • Instead of keeping the separately learned intermediate grammars G1 … G6, project the final grammar G down to each level: πi(G) • [Figure: projections π0(G) … π5(G) of G replace the learned sequence G1 … G6]
Estimating Projected Grammars • Rules in G and the corresponding rule in π(G):
S1 → NP1 VP1 0.20
S1 → NP1 VP2 0.12
S1 → NP2 VP1 0.02
S1 → NP2 VP2 0.03
S2 → NP1 VP1 0.11
S2 → NP1 VP2 0.05
S2 → NP2 VP1 0.08
S2 → NP2 VP2 0.12
⇒ S → NP VP 0.56
• Projected rule probabilities are estimated not from the treebank but from the infinite tree distribution that G itself induces
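A simplified sketch of the projection, assuming a weight for each parent subcategory is already available (in the paper these weights come from the infinite tree distribution; here they are simply passed in, and all names are hypothetical):

from collections import defaultdict

def project(rules, weight):
    # Collapse split rules A_x -> B_y C_z onto unsplit rules A -> B C,
    # weighting each split rule by the expected frequency of its parent
    # subcategory, then renormalizing per unsplit parent.
    base = lambda s: s.split("_")[0]          # "NP_1" -> "NP"
    proj = defaultdict(float)
    for (parent, children), prob in rules.items():
        key = (base(parent), tuple(base(c) for c in children))
        proj[key] += weight[parent] * prob
    totals = defaultdict(float)
    for (p, cs), pr in proj.items():
        totals[p] += pr
    return {k: v / totals[k[0]] for k, v in proj.items()}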
Hierarchical Pruning • Consider one span of the sentence: parse it first with the coarse grammar, then with the grammar split in two, in four, in eight, … • At each level, chart items whose posterior probability falls below a threshold are pruned before moving to the next, more refined level (see the sketch below) • [Figure: surviving vs. pruned chart items for one span at the coarse, 2-split, 4-split, and 8-split levels]
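A sketch of that pruning loop under a hypothetical chart-parser API (parse_with, chart.posteriors, and the threshold value are assumptions for illustration, not the authors' code):

THRESHOLD = 1e-4   # hypothetical pruning threshold

def coarse_to_fine(sentence, grammars, parse_with):
    # grammars: the projections pi_0(G), pi_1(G), ..., ending with G itself.
    # Each pass constrains the next one to the surviving (span, category)
    # items, so the fine grammars touch only a fraction of the chart.
    allowed = None                     # no constraints at the coarsest level
    for g in grammars:
        chart = parse_with(g, sentence, allowed)
        allowed = {(span, cat)
                   for (span, cat), post in chart.posteriors.items()
                   if post > THRESHOLD}
    return chart                       # chart of the most refined grammar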
Parse Selection • Given a sentence w and a split PCFG G, select the parse tree whose total posterior over all of its derivations t is highest: T* = argmax_T Σ_{t ∈ T} P(t | w) • Intractable: we cannot enumerate all derivations of all candidate trees
Parse Selection • Possible solutions: • best derivation (Viterbi) • generate n-best parses and re-rank them • sampling derivations of the grammar • select the minimum-risk candidate based on a loss function over posterior marginals (one tractable objective of this kind is sketched below)
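One such objective, reconstructed here from Petrov & Klein 2007 (the max-rule-product score: a tree is scored by the product of the posteriors of its unsplit rules, each computed from the split grammar's inside/outside scores):

\begin{align*}
T^{*} &= \operatorname*{arg\,max}_{T} \prod_{r \in T} P(r \mid w), \\
P(A \to B\,C, (i,k,j) \mid w) &= \frac{\sum_{x,y,z} P_{\mathrm{OUT}}(A_x, i, j)\,\beta(A_x \to B_y C_z)\,P_{\mathrm{IN}}(B_y, i, k)\,P_{\mathrm{IN}}(C_z, k, j)}{P_{\mathrm{IN}}(\mathrm{ROOT}, 0, n)}
\end{align*}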
References • S. Petrov, L. Barrett, R. Thibaux, and D. Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. COLING-ACL 2006 slides. • S. Petrov and D. Klein. Improved Inference for Unlexicalized Parsing. NAACL 2007 slides. • S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In COLING-ACL '06, pages 433–440. • S. Petrov and D. Klein. 2007. Improved inference for unlexicalized parsing. In HLT-NAACL '07. • T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with latent annotations. In ACL '05, pages 75–82.