Learning Accurate, Compact, and Interpretable Tree Annotation

Learning Accurate, Compact, and Interpretable Tree Annotation. Recent Advances in Parsing Technology WS 2011/2012 Saarland University in Saarbrücken Miloš Ercegovčević. Outline. Introduction EM algorithm Latent Grammars Motivation Learning Latent PCFG


Presentation Transcript


  1. Learning Accurate, Compact, and Interpretable Tree Annotation Recent Advances in Parsing Technology WS 2011/2012 Saarland University in Saarbrücken Miloš Ercegovčević

  2. Outline • Introduction • EM algorithm • Latent Grammars • Motivation • Learning Latent PCFG • Split-Merge Adaptation • Efficient inference with Latent Grammars • Pruning in Multilevel Coarse-to-Fine parsing • Parse Selection

  3. Introduction : EM Algorithm • Iterative algorithm for finding MLE or MAP estimates of parameters in statistical models • X – observed data; Z – set of latent variables • Θ – a vector of unknown parameters • Likelihood function: $L(\Theta; X, Z) = p(X, Z \mid \Theta)$ • MLE of the marginal likelihood: $L(\Theta; X) = p(X \mid \Theta) = \sum_Z p(X, Z \mid \Theta)$ • However, this quantity is often intractable • Typically neither Z nor Θ is known

  4. Introduction : EM Algorithm • Find the MLE of the marginal likelihood by iteratively applying two steps: • Expectation step (E-step): compute the posterior distribution of Z given X under the current Θ • Maximization step (M-step): find the Θ that maximizes the expected complete-data log-likelihood under that distribution
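The two steps can be illustrated on a toy problem that is not from the slides: two coins with unknown biases, where the latent variable Z is which coin produced each trial. This is a minimal sketch; the data, the fixed 50/50 prior over coins, and the names `em_step` and `log_likelihood` are assumptions made for illustration.

```python
from math import comb, log

def binomial(heads, tosses, theta):
    """Probability of seeing `heads` in `tosses` flips of a coin with bias theta."""
    return comb(tosses, heads) * theta**heads * (1 - theta)**(tosses - heads)

def log_likelihood(trials, theta_a, theta_b):
    """Marginal log-likelihood: the coin identity Z is summed out (prior 0.5 each)."""
    return sum(log(0.5 * binomial(h, n, theta_a) + 0.5 * binomial(h, n, theta_b))
               for h, n in trials)

def em_step(trials, theta_a, theta_b):
    """One EM iteration; returns the updated (theta_a, theta_b)."""
    heads_a = tosses_a = heads_b = tosses_b = 0.0
    for h, n in trials:
        # E-step: posterior probability that coin A produced this trial.
        like_a = binomial(h, n, theta_a)
        like_b = binomial(h, n, theta_b)
        resp_a = like_a / (like_a + like_b)
        heads_a += resp_a * h
        tosses_a += resp_a * n
        heads_b += (1 - resp_a) * h
        tosses_b += (1 - resp_a) * n
    # M-step: re-estimate each bias from its expected counts.
    return heads_a / tosses_a, heads_b / tosses_b

# Hypothetical data: (heads, tosses) per trial; the coin used is unobserved.
trials = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]
theta_a, theta_b = 0.6, 0.5
history = [log_likelihood(trials, theta_a, theta_b)]
for _ in range(10):
    theta_a, theta_b = em_step(trials, theta_a, theta_b)
    history.append(log_likelihood(trials, theta_a, theta_b))
```

Each iteration should leave the marginal log-likelihood in `history` no lower than before, which is the property that makes EM usable when the sum over Z cannot be maximized directly.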

  5. Latent PCFG • Standard coarse Treebank Tree • Baseline for parsing F1 72.6

  6. Latent PCFG • Parent annotated trees [Johnson ’98], [Klein & Manning ’03] • F1 86.3

  7. Latent PCFG • Head lexicalized trees [Collins ’99, Charniak ’00] • F1 88.6

  8. Latent PCFG • Automatically clustered categories with F1 86.7 [Matsuzaki et al. ’05] • Same number of subcategories for all categories

  9. Latent PCFG • At each step, split every category in two • After 6 iterations the number of subcategories per category is 64 • Initialize EM with the results of the smaller grammar
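The split step can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the grammar is a plain dictionary, subcategories are `(category, index)` pairs, and the 1% multiplicative noise used to break the symmetry between the two halves is an assumed value.

```python
import itertools
import random

def split_symbol(symbol):
    """Terminals (plain strings) are kept; (category, index) is split in two."""
    if isinstance(symbol, str):
        return [symbol]
    cat, idx = symbol
    return [(cat, 2 * idx), (cat, 2 * idx + 1)]

def split_grammar(rules, noise=0.01, seed=0):
    """rules maps (parent, children) to a probability; returns the split grammar."""
    rng = random.Random(seed)
    new_rules = {}
    for (parent, children), prob in rules.items():
        combos = list(itertools.product(*(split_symbol(c) for c in children)))
        for new_parent in split_symbol(parent):
            for combo in combos:
                # Share the mass evenly, then perturb slightly so EM can
                # move the two halves apart.
                jitter = 1.0 + rng.uniform(-noise, noise)
                new_rules[(new_parent, combo)] = prob / len(combos) * jitter
    # Renormalise so each parent subcategory is again a distribution.
    totals = {}
    for (parent, _), prob in new_rules.items():
        totals[parent] = totals.get(parent, 0.0) + prob
    return {rule: prob / totals[rule[0]] for rule, prob in new_rules.items()}

# Toy lexical grammar (hypothetical numbers): six splits give 2**6 subcategories.
grammar = {(("DT", 0), ("the",)): 0.6, (("DT", 0), ("a",)): 0.4}
for _ in range(6):
    grammar = split_grammar(grammar)
subcategories = {parent for parent, _ in grammar}
```

In the procedure the slide describes, EM would be run between splits, so each larger grammar starts from the trained parameters of the smaller one rather than from scratch.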

  10. Learning Latent PCFG • Induce subcategories • Like forward-backward for HMMs • Fixed brackets • [Figure: tree over the sentence "He was right ." with latent nodes X1 to X7 and forward and backward passes]

  11. Learning Latent Grammar • Inside-Outside probabilities
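The formulas on this slide did not survive extraction. As a reconstruction in the notation of the cited Petrov et al. (2006) paper, to be checked against that paper, the scores for a binary node with parent A spanning (r, t) and children B and C spanning (r, s) and (s, t) can be written as:

```latex
\begin{align*}
P_{\mathrm{IN}}(r,t,A_x)  &= \sum_{y,z} \beta(A_x \to B_y C_z)\, P_{\mathrm{IN}}(r,s,B_y)\, P_{\mathrm{IN}}(s,t,C_z) \\
P_{\mathrm{OUT}}(r,s,B_y) &= \sum_{x,z} \beta(A_x \to B_y C_z)\, P_{\mathrm{OUT}}(r,t,A_x)\, P_{\mathrm{IN}}(s,t,C_z) \\
P_{\mathrm{OUT}}(s,t,C_z) &= \sum_{x,y} \beta(A_x \to B_y C_z)\, P_{\mathrm{OUT}}(r,t,A_x)\, P_{\mathrm{IN}}(r,s,B_y)
\end{align*}
```

Because the brackets are fixed by the treebank tree, the sums run only over the subcategory indices x, y, z and not over split points.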

  12. Learning Latent Grammar • Expectation step (E-step): • Maximization step (M-step):
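A minimal sketch of one E-step and M-step on a single fixed-bracket tree, assuming the posterior of an annotated rule at a node is proportional to the parent's outside score times the rule probability times the children's inside scores. The data layout, the function names, and the toy numbers are assumptions for illustration; only binary rules are re-estimated here, and lexical rules would be handled the same way.

```python
from collections import defaultdict

def inside(tree, binary, lexical):
    """Bottom-up pass over a fixed-bracket tree.

    tree is (tag, word) for a preterminal or (label, left, right) otherwise.
    Returns (label, inside_vector, children), one score per subcategory.
    """
    label = tree[0]
    if isinstance(tree[1], str):
        return (label, list(lexical[(label, tree[1])]), [])
    left = inside(tree[1], binary, lexical)
    right = inside(tree[2], binary, lexical)
    beta = binary[(label, left[0], right[0])]  # beta[x][y][z] = P(A_x -> B_y C_z)
    vec = [sum(beta[x][y][z] * left[1][y] * right[1][z]
               for y in range(len(left[1]))
               for z in range(len(right[1])))
           for x in range(len(beta))]
    return (label, vec, [left, right])

def e_step(node, outside, binary, counts, tree_prob):
    """Top-down pass: accumulate posterior rule counts, then recurse."""
    label, _, children = node
    if not children:
        return
    left, right = children
    rule = (label, left[0], right[0])
    beta = binary[rule]
    out_left = [0.0] * len(left[1])
    out_right = [0.0] * len(right[1])
    for x, out_x in enumerate(outside):
        for y, in_y in enumerate(left[1]):
            for z, in_z in enumerate(right[1]):
                b = beta[x][y][z]
                counts[(rule, x, y, z)] += out_x * b * in_y * in_z / tree_prob
                out_left[y] += out_x * b * in_z    # outside score of B_y
                out_right[z] += out_x * b * in_y   # outside score of C_z
    e_step(left, out_left, binary, counts, tree_prob)
    e_step(right, out_right, binary, counts, tree_prob)

def m_step(counts):
    """Renormalise expected counts per parent subcategory A_x."""
    totals = defaultdict(float)
    for (rule, x, _, _), c in counts.items():
        totals[(rule[0], x)] += c
    return {key: c / totals[(key[0][0], key[1])]
            for key, c in counts.items() if c > 0}

# Toy grammar (hypothetical numbers): S is unsplit, NP and VP have two subcategories.
binary = {("S", "NP", "VP"): [[[0.4, 0.1], [0.2, 0.3]]]}
lexical = {("NP", "He"): [0.5, 0.2], ("VP", "sleeps"): [0.3, 0.6]}
root = inside(("S", ("NP", "He"), ("VP", "sleeps")), binary, lexical)
tree_prob = sum(root[1])  # the root's outside score is taken to be 1
counts = defaultdict(float)
e_step(root, [1.0] * len(root[1]), binary, counts, tree_prob)
new_beta = m_step(counts)
```

On a real treebank the counts would be accumulated over all trees before `m_step` is called once per EM iteration.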

  13. Latent Grammar : Adaptive splitting • Want to split only as much as the data supports • Solution: split everything, then merge back the splits whose removal costs the least likelihood • Without loss in accuracy

  14. Latent Grammar : Adaptive splitting • The likelihood of data for tree T and sentence w: • Then for two annotations the overall loss can be estimated as:
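The two formulas referred to on this slide were lost in extraction. The following is a reconstruction in the notation of the cited Petrov et al. (2006) paper and should be checked against it. Here P_IN and P_OUT are the inside and outside scores at a node n spanning (r, t), p_1 and p_2 are the relative frequencies of the subcategories A_1 and A_2, and P_n(w, T) is the likelihood obtained when A_1 and A_2 are merged at node n only.

```latex
\begin{align*}
P(w,T) &= \sum_x P_{\mathrm{IN}}(r,t,A_x)\, P_{\mathrm{OUT}}(r,t,A_x) \\
P_{\mathrm{IN}}(r,t,A)  &= p_1\, P_{\mathrm{IN}}(r,t,A_1) + p_2\, P_{\mathrm{IN}}(r,t,A_2) \\
P_{\mathrm{OUT}}(r,t,A) &= P_{\mathrm{OUT}}(r,t,A_1) + P_{\mathrm{OUT}}(r,t,A_2) \\
\Delta(A_1,A_2) &= \prod_i \prod_{n \in T_i} \frac{P_n(w_i,T_i)}{P(w_i,T_i)}
\end{align*}
```

The first line holds at any node of the tree. Substituting the merged scores of lines two and three into it gives P_n, and the product in the last line approximates the overall loss by treating every occurrence of the merge independently.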

  15. Number of Phrasal Subcategories

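  16. Number of Phrasal Subcategories [chart; labelled categories: NP, VP, PP]

  17. Number of Phrasal Subcategories [chart; labelled categories: NAC, X]

  18. Number of Lexical Subcategories [chart; labelled categories: POS, TO, and the comma tag]

  19. Number of Lexical Subcategories [chart; labelled categories: RB, VBx, IN, DT]

  20. Number of Lexical Subcategories [chart; labelled categories: NNP, JJ, NNS, NN]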

  21. Latent Grammar : Results

  22. Efficient inference with Latent Grammars • Latent Grammar with 91.2 F1 score on the WSJ dev set (1600 sentences) • Parsing time of 1621 minutes: more than a minute per sentence • For use in real-world applications this is too slow • Improve inference with: • Hierarchical Pruning • Parse Selection

  23. Intermediate Grammars • Learning yields a sequence of grammars: X-Bar = G0, G1, G2, G3, G4, G5, G6 = G • Example: DT is split into DT1, DT2, then into DT1 to DT4, then into DT1 to DT8

  24. Projected Grammars • Learning: X-Bar = G0, G1, G2, G3, G4, G5, G6 = G • Projection πi: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G)

  25. Estimating Grammars • Rules in G: S1 → NP1 VP1 0.20; S1 → NP1 VP2 0.12; S1 → NP2 VP1 0.02; S1 → NP2 VP2 0.03; S2 → NP1 VP1 0.11; S2 → NP1 VP2 0.05; S2 → NP2 VP1 0.08; S2 → NP2 VP2 0.12; … • Rules in π(G): S → NP VP 0.56 • [Figure labels: Treebank, Infinite tree distribution]
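One way to read this slide is that a projected rule probability is a ratio of expected counts under the tree distribution that G induces. The sketch below assumes the expected counts of the parent subcategories are already available (computing them is not shown), and all numbers in it are hypothetical rather than taken from the slide.

```python
def project_rule_probability(sub_rules, sub_counts):
    """Collapse annotated rules A_x -> B_y C_z into one coarse rule A -> B C.

    sub_rules maps (x, y, z) to P(A_x -> B_y C_z).
    sub_counts maps x to the expected count of A_x (assumed precomputed).
    """
    expected_rule_count = sum(sub_counts[x] * prob
                              for (x, _, _), prob in sub_rules.items())
    expected_parent_count = sum(sub_counts.values())
    return expected_rule_count / expected_parent_count

# Hypothetical annotated rules for one coarse rule, and hypothetical counts.
sub_rules = {(1, 1, 1): 0.5, (1, 1, 2): 0.1, (2, 1, 1): 0.2, (2, 2, 2): 0.2}
sub_counts = {1: 3.0, 2: 1.0}
coarse_prob = project_rule_probability(sub_rules, sub_counts)
```

Weighting by expected counts means that a frequently used subcategory contributes more to the coarse rule than a rare one, which a plain average over subcategories would not capture.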

  26. Hierarchical Pruning • Consider the same span at successive levels: coarse, split in two, split in four, split in eight
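The idea for a single span can be sketched as below: at each level, symbols whose posterior falls under a threshold are dropped, and only the refinements of the survivors are considered at the next level. The function `span_posterior` is a hypothetical stand-in for an inside-outside pass with that level's grammar, and the threshold value is an assumption.

```python
def hierarchical_prune(span_posterior, start_symbols, levels, threshold):
    """Return the subcategories that survive `levels` pruning passes for one span.

    span_posterior(level, candidates) -> {symbol: posterior probability}.
    Symbols are (category, index); (c, i) refines into (c, 2i) and (c, 2i + 1).
    """
    candidates = list(start_symbols)
    for level in range(levels):
        posteriors = span_posterior(level, candidates)
        survivors = [s for s in candidates if posteriors.get(s, 0.0) >= threshold]
        if level == levels - 1:
            return survivors
        # Only children of surviving symbols are scored at the next level.
        candidates = [(cat, 2 * idx + half) for cat, idx in survivors
                      for half in (0, 1)]
    return candidates

def toy_posterior(level, candidates):
    """Toy scores: QP is never plausible, odd-indexed subcategories barely are."""
    return {(cat, idx): 0.0 if cat == "QP" else (0.5 if idx % 2 == 0 else 1e-6)
            for cat, idx in candidates}

kept = hierarchical_prune(toy_posterior,
                          [("NP", 0), ("VP", 0), ("QP", 0)],
                          levels=3, threshold=1e-4)
```

Since QP is removed at the coarsest level, none of its refinements is ever scored, which is where the savings come from.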

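  27. Parse Selection • Given a sentence w and a split PCFG grammar G, select the parse that minimizes the expected loss under our beliefs: $T_P^* = \arg\min_{T_P} \sum_{T_T} P(T_T \mid w, G)\, L(T_P, T_T)$ • Intractable: we cannot enumerate all the candidate true trees T_T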

  28. Parse Selection • Possible solutions • best derivation • generate n-best parses and re-rank them • sampling derivations of the grammar • select the minimum risk candidate based on a loss function of posterior marginals:
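A minimal sketch of minimum-risk selection, assuming the expectation is approximated over an n-best list rather than over posterior marginals, and assuming the loss is the number of labelled brackets on which two trees disagree. Both choices, and the toy candidates, are illustrative and not taken from the slides.

```python
def min_risk_parse(candidates):
    """candidates: list of (bracket_set, posterior_probability) pairs.

    Returns the bracket set with the lowest expected loss against the list.
    """
    def expected_loss(brackets):
        # Symmetric difference counts brackets present in only one of two trees.
        return sum(prob * len(brackets ^ other) for other, prob in candidates)
    return min(candidates, key=lambda cand: expected_loss(cand[0]))[0]

# Toy candidates for a three-word sentence; brackets are (start, end, label).
tree_a = frozenset({(0, 3, "S"), (0, 1, "NP"), (1, 3, "VP")})
tree_b = frozenset({(0, 3, "S"), (0, 2, "NP"), (2, 3, "VP")})
tree_c = frozenset({(0, 3, "S"), (0, 2, "NP"), (2, 3, "PP")})
candidates = [(tree_a, 0.40), (tree_b, 0.35), (tree_c, 0.25)]
best = min_risk_parse(candidates)
```

In this toy case the single most probable candidate is `tree_a`, but `tree_b` has the lower expected loss because it shares most of its brackets with `tree_c`.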

  29. Results

  30. Thank You!

  31. References • S. Petrov, L. Barrett, R. Thibaux, and D. Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. COLING-ACL 2006 slides. • S. Petrov and D. Klein. Improved Inference for Unlexicalized Parsing. NAACL 2007 slides. • S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. In COLING-ACL ’06, pages 433–440. • S. Petrov and D. Klein. 2007. Improved Inference for Unlexicalized Parsing. In NAACL ’07. • T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with Latent Annotations. In ACL ’05, pages 75–82.
