Learning Accurate, Compact, and Interpretable Tree Annotation Recent Advances in Parsing Technology WS 2011/2012 Saarland University in Saarbrücken Miloš Ercegovčević
Outline • Introduction • EM algorithm • Latent Grammars • Motivation • Learning Latent PCFG • Split-Merge Adaptation • Efficient inference with Latent Grammars • Pruning in Multilevel Coarse-to-Fine parsing • Parse Selection
Introduction : EM Algorithm • Iterative algorithm for finding MLE or MAP estimates of the parameters of statistical models with latent variables • X – observed data; Z – set of latent variables • Θ – a vector of unknown parameters • Likelihood function: L(Θ; X, Z) = p(X, Z | Θ) • MLE of the marginal likelihood: L(Θ; X) = p(X | Θ) = Σ_Z p(X, Z | Θ) • However, this sum is typically intractable, and we know neither Z nor Θ
Introduction : EM Algorithm • Find the MLE of the marginal likelihood by iteratively applying two steps: • Expectation step (E-step): compute the expected complete-data log-likelihood under the posterior of Z given the current parameters: Q(Θ | Θ^(t)) = E_{Z | X, Θ^(t)} [ log L(Θ; X, Z) ] • Maximization step (M-step): find the Θ that maximizes that quantity: Θ^(t+1) = argmax_Θ Q(Θ | Θ^(t))
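As a concrete illustration (not from the original slides), here is a minimal EM sketch in Python for a two-component Gaussian mixture with unit variances; the data, initialization, and iteration count are made up for the example:

import numpy as np

# X: observed data drawn from two Gaussians; Z (which component produced
# each point) is latent; Θ = (pi, mu) are the unknown parameters.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

pi, mu = 0.5, np.array([-1.0, 1.0])                # initial guess for Θ
for _ in range(50):
    # E-step: posterior responsibilities P(Z = k | X, Θ^(t))
    dens = np.exp(-0.5 * (X[:, None] - mu) ** 2)   # ∝ N(x; mu_k, 1)
    resp = dens * np.array([pi, 1.0 - pi])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: Θ^(t+1) maximizing the expected complete-data log-likelihood
    nk = resp.sum(axis=0)
    mu = (resp * X[:, None]).sum(axis=0) / nk
    pi = nk[0] / len(X)

print(pi, mu)   # approaches 0.4 and (-2, 3)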
Latent PCFG • Standard coarse Treebank tree • Baseline parser: F1 72.6
Latent PCFG • Parent-annotated trees [Johnson '98; Klein & Manning '03] • F1 86.3
Latent PCFG • Head-lexicalized trees [Collins '99; Charniak '00] • F1 88.6
Latent PCFG • Automatically clustered categories with F1 86.7 [Matsuzaki et al. ’05] • Same number of subcategories for all categories
Latent PCFG • At each step, split every category in two • After 6 split iterations the number of subcategories per category is 2^6 = 64 • Initialize EM with the results of the smaller grammar (a sketch of the splitting step follows below)
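A minimal sketch of that splitting step, assuming rules are stored as a dict mapping (parent, children) to probability; the ~1% symmetry-breaking noise follows the paper, while the data structures and names here are hypothetical:

import itertools, random

def split_grammar(rules):
    # Split every category A into A_0 / A_1, spread each rule's probability
    # uniformly over the child-split variants, and add ~1% noise so EM can
    # differentiate the otherwise identical halves. (Exact renormalization
    # after the noise is omitted in this sketch.)
    new_rules = {}
    for (parent, children), prob in rules.items():
        share = prob / 2 ** len(children)   # child variants share the mass
        for bits in itertools.product([0, 1], repeat=1 + len(children)):
            p = f"{parent}_{bits[0]}"
            cs = tuple(f"{c}_{b}" for c, b in zip(children, bits[1:]))
            new_rules[(p, cs)] = share * (1 + 0.01 * (random.random() - 0.5))
    return new_rules

g0 = {("S", ("NP", "VP")): 1.0, ("NP", ("DT", "NN")): 0.7}
g1 = split_grammar(g0)   # 8 split variants of each binary rule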
Learning Latent PCFG • Induce subcategories • Like forward-backward for HMMs, but with the tree brackets fixed • [Figure: forward/backward passes over the latent annotations X1–X7 of the fixed parse tree of "He was right"]
Learning Latent Grammar • Inside-Outside probabilities
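Reconstructed in LaTeX from the paper's setting (fixed treebank brackets; binarized rules A_x → B_y C_z with probabilities β), the recursions run over each treebank node n with children n_1, n_2:

\begin{align*}
P_{\mathrm{IN}}(n, A_x) &= \sum_{y,z} \beta(A_x \to B_y C_z)\, P_{\mathrm{IN}}(n_1, B_y)\, P_{\mathrm{IN}}(n_2, C_z) \\
P_{\mathrm{OUT}}(n_1, B_y) &= \sum_{x,z} \beta(A_x \to B_y C_z)\, P_{\mathrm{OUT}}(n, A_x)\, P_{\mathrm{IN}}(n_2, C_z)
\end{align*}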
Learning Latent Grammar • Expectation step (E-step): compute the posterior count of each annotated rule at each treebank node n with children n_1, n_2: P(A_x → B_y C_z, n | w, T) ∝ P_OUT(n, A_x) · β(A_x → B_y C_z) · P_IN(n_1, B_y) · P_IN(n_2, C_z) • Maximization step (M-step): re-estimate each rule probability as its relative expected frequency: β(A_x → B_y C_z) := E[c(A_x → B_y C_z)] / Σ E[c(A_x → ·)]
Latent Grammar : Adaptive splitting • Want to allocate splits where the data actually demands them • Solution: split everything, then merge back the splits whose removal causes the smallest loss in likelihood • Little to no loss in accuracy
Latent Grammar : Adaptive splitting • The likelihood of the data for tree T and sentence w can be computed at any node n: P(w, T) = Σ_x P_IN(n, A_x) · P_OUT(n, A_x) • Merging the two annotations A_1 and A_2 at node n changes this to P_merged(w, T) = [p_1 P_IN(n, A_1) + p_2 P_IN(n, A_2)] · [P_OUT(n, A_1) + P_OUT(n, A_2)], where p_1, p_2 are the relative frequencies of A_1, A_2 • The overall loss of the merge is then estimated as the product of the ratios P_merged(w, T) / P(w, T) over all occurrences of the category (see the sketch below)
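A small sketch of that merge criterion with hypothetical data structures (each node contributes its inside/outside scores for the two annotations; p1, p2 are their relative frequencies from the E-step counts):

def merge_loss(nodes, p1, p2):
    # nodes: list of (in1, out1, in2, out2) tuples, one per occurrence of
    # category A in the treebank. Returns the estimated likelihood ratio
    # P_merged / P_split; the closer to 1, the cheaper the merge.
    ratio = 1.0
    for in1, out1, in2, out2 in nodes:
        p_split = in1 * out1 + in2 * out2
        p_merged = (p1 * in1 + p2 * in2) * (out1 + out2)
        ratio *= p_merged / p_split
    return ratio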
Number of Phrasal Subcategories • [Bar chart over phrasal categories: NP, VP, and PP receive the most subcategories]
Number of Lexical Subcategories • [Bar charts over POS tags: closed-class tags such as TO and ',' receive very few subcategories; RB, VBx, IN, DT receive more; open-class tags NNP, JJ, NNS, NN receive the most]
Efficient inference with Latent Grammars • Latent grammar with 91.2 F1 score on the WSJ dev set (1600 sentences) • Parsing time 1621 minutes: more than a minute per sentence • For use in real-world applications this is too slow • Improve on inference: • Hierarchical Pruning • Parse Selection
Intermediate Grammars • X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G, each learned from the previous by split-and-EM • [Figure: DT is refined step by step, from DT through DT1–DT2 and DT1–DT4 up to DT1–DT8]
Projected Grammars • Instead of keeping the separately learned intermediate grammars G1 … G6, project the final grammar G down to each level: πi(G) • [Figure: projections π0(G) … π5(G) of G replace the learned sequence G1 … G6]
Estimating Projected Grammars • Rules in G and the corresponding rule in π(G):
S1 → NP1 VP1 0.20
S1 → NP1 VP2 0.12
S1 → NP2 VP1 0.02
S1 → NP2 VP2 0.03
S2 → NP1 VP1 0.11
S2 → NP1 VP2 0.05
S2 → NP2 VP1 0.08
S2 → NP2 VP2 0.12
⇒ S → NP VP 0.56
• Projected rule probabilities are estimated not from the treebank but from the infinite tree distribution that G itself induces
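A simplified sketch of the projection, assuming a weight for each parent subcategory is already available (in the paper these weights come from the infinite tree distribution; here they are simply passed in, and all names are hypothetical):

from collections import defaultdict

def project(rules, weight):
    # Collapse split rules A_x -> B_y C_z onto unsplit rules A -> B C,
    # weighting each split rule by the expected frequency of its parent
    # subcategory, then renormalizing per unsplit parent.
    base = lambda s: s.split("_")[0]          # "NP_1" -> "NP"
    proj = defaultdict(float)
    for (parent, children), prob in rules.items():
        key = (base(parent), tuple(base(c) for c in children))
        proj[key] += weight[parent] * prob
    totals = defaultdict(float)
    for (p, cs), pr in proj.items():
        totals[p] += pr
    return {k: v / totals[k[0]] for k, v in proj.items()}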
Hierarchical Pruning • Consider one span of the sentence: parse it first with the coarse grammar, then with the grammar split in two, in four, in eight, … • At each level, chart items whose posterior probability falls below a threshold are pruned before moving to the next, more refined level (see the sketch below) • [Figure: surviving vs. pruned chart items for one span at the coarse, 2-split, 4-split, and 8-split levels]
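A sketch of that pruning loop under a hypothetical chart-parser API (parse_with, chart.posteriors, and the threshold value are assumptions for illustration, not the authors' code):

THRESHOLD = 1e-4   # hypothetical pruning threshold

def coarse_to_fine(sentence, grammars, parse_with):
    # grammars: the projections pi_0(G), pi_1(G), ..., ending with G itself.
    # Each pass constrains the next one to the surviving (span, category)
    # items, so the fine grammars touch only a fraction of the chart.
    allowed = None                     # no constraints at the coarsest level
    for g in grammars:
        chart = parse_with(g, sentence, allowed)
        allowed = {(span, cat)
                   for (span, cat), post in chart.posteriors.items()
                   if post > THRESHOLD}
    return chart                       # chart of the most refined grammar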
Parse Selection • Given a sentence w and a split PCFG G, select the parse tree whose total posterior over all of its derivations t is highest: T* = argmax_T Σ_{t ∈ T} P(t | w) • Intractable: we cannot enumerate all derivations of all candidate trees
Parse Selection • Possible solutions: • best derivation (Viterbi) • generate n-best parses and re-rank them • sampling derivations of the grammar • select the minimum-risk candidate based on a loss function over posterior marginals (one tractable objective of this kind is sketched below)
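One such objective, reconstructed here from Petrov & Klein 2007 (the max-rule-product score: a tree is scored by the product of the posteriors of its unsplit rules, each computed from the split grammar's inside/outside scores):

\begin{align*}
T^{*} &= \operatorname*{arg\,max}_{T} \prod_{r \in T} P(r \mid w), \\
P(A \to B\,C, (i,k,j) \mid w) &= \frac{\sum_{x,y,z} P_{\mathrm{OUT}}(A_x, i, j)\,\beta(A_x \to B_y C_z)\,P_{\mathrm{IN}}(B_y, i, k)\,P_{\mathrm{IN}}(C_z, k, j)}{P_{\mathrm{IN}}(\mathrm{ROOT}, 0, n)}
\end{align*}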
References • S. Petrov, L. Barrett, R. Thibaux, and D. Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. COLING-ACL 2006 slides. • S. Petrov and D. Klein. Improved Inference for Unlexicalized Parsing. NAACL 2007 slides. • S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In COLING-ACL '06, pages 433–440. • S. Petrov and D. Klein. 2007. Improved inference for unlexicalized parsing. In HLT-NAACL '07. • T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with latent annotations. In ACL '05, pages 75–82.