Improved Inference for Unlexicalized Parsing
Slav Petrov and Dan Klein
Unlexicalized Parsing [Petrov et al. '06]
Hierarchical, adaptive refinement: each treebank category is split repeatedly (DT → DT1, DT2 → DT1…DT4 → DT1…DT8).
91.2 F1 score on the development set (1600 sentences).
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]
From the treebank, learn both a coarse grammar (NP, VP, …) and a refined grammar (NP-apple, NP-dog, NP-cat, NP-eat, VP-run, VP-31, …). Parse with the coarse grammar, prune the chart, then parse with the refined grammar.
Prune?
For each chart item X[i, j], compute its posterior probability under the coarse grammar, proportional to inside(X, i, j) · outside(X, i, j), and prune the item if the posterior falls below a threshold.
E.g. consider the span 5 to 12: most coarse items over that span have negligible posterior, so the corresponding refined items never need to be built.
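The pruning step above can be sketched as follows. This is an illustrative implementation, not the Berkeley parser's actual code: the inside/outside scores are assumed to be precomputed for the coarse pass, and the names (`prune_chart`, `THRESHOLD`) are made up for the example.

```python
# Posterior pruning of one chart, given precomputed (hypothetical)
# inside and outside scores for the coarse grammar.
THRESHOLD = 1e-5

def prune_chart(inside, outside, sentence_prob, threshold=THRESHOLD):
    """Keep chart item (label, i, j) only if its posterior probability
    P(label spans i..j | sentence) = inside * outside / sentence_prob
    is at least `threshold`."""
    allowed = set()
    for (label, i, j), in_score in inside.items():
        out_score = outside.get((label, i, j), 0.0)
        posterior = in_score * out_score / sentence_prob
        if posterior >= threshold:
            allowed.add((label, i, j))
    return allowed
```

Only items in the returned `allowed` set are then expanded into their refined variants in the fine pass.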
Parsing time: 1621 min (exhaustive) → 111 min with coarse-to-fine pruning (no search error).
Multilevel Coarse-to-Fine Parsing [Charniak et al. '06]
Add more rounds of pre-parsing, with grammars even coarser than X-bar (e.g. a single symbol X standing in for clusters of categories A, B, …), before parsing with the refined grammar (NP-apple, NP-dog, NP-eat, VP-run, …).
Hierarchical Pruning
Consider again the span 5 to 12: parse with the coarse grammar, then with the grammar split in two, in four, and in eight, pruning after each pass so that each round only refines the items that survived the previous one.
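The key operation in hierarchical pruning is the level-to-level filter: an item at level k+1 is considered only if its coarser projection survived level k. A minimal sketch, with a hypothetical `project` function that maps a split label back to its coarser label:

```python
def allowed_items(prev_surviving, project, candidates):
    """Filter level-(k+1) chart items: keep (label, i, j) only if its
    projection to level k survived the previous round's pruning.

    prev_surviving: set of (coarse_label, i, j) that passed the threshold
    project:        maps a finer label to its coarser parent label
    candidates:     set of (fine_label, i, j) proposed at the next level
    """
    return {(lab, i, j) for (lab, i, j) in candidates
            if (project(lab), i, j) in prev_surviving}
```

Applying this filter at every round is what makes each successive pass cheap: the split-in-eight grammar only ever touches spans that the split-in-four pass did not rule out.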
Intermediate Grammars
X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G: one grammar per split round, produced during learning (DT → DT1, DT2 → DT1…DT4 → DT1…DT8).
Parsing time: 1621 min → 111 min → 35 min with hierarchical pruning (no search error).
EM State Drift (DT tag)
The intermediate grammars produced during learning are not nested: across split rounds, EM reassigns words between substates (e.g. a substate covering {that, this, these, some} at one round can drift apart at the next), so a coarser learned grammar is not a projection of the finer one.
Projected Grammars
Instead of using the intermediate grammars G1 … G5 produced during learning, compute each coarser grammar as a projection πi(G) of the final grammar G = G6 (with X-Bar = G0).
Estimating Projected Grammars
Nonterminals? Easy: each nonterminal in G maps to the symbol it was split from in π(G) (NP0, NP1 → NP; VP0, VP1 → VP; S0, S1 → S).
Rules? The refined variants of S → NP VP in G and their probabilities:
S1 → NP1 VP1 0.20    S2 → NP1 VP1 0.11
S1 → NP1 VP2 0.12    S2 → NP1 VP2 0.05
S1 → NP2 VP1 0.02    S2 → NP2 VP1 0.08
S1 → NP2 VP2 0.03    S2 → NP2 VP2 0.12
The projected probability of S → NP VP in π(G) (0.56 on the slide) is the sum of the refined rule probabilities, weighted by each refined parent's expected frequency in the infinite tree distribution induced by the treebank grammar [Corazza & Satta '06].
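The weighting described above can be sketched in a few lines. The parent weights below are hypothetical stand-ins for the expected relative frequencies of S1 and S2 (the slide does not give them), so the resulting number is illustrative, not the slide's 0.56:

```python
def project_rule_prob(fine_rules, parent_weights):
    """fine_rules:     {fine_parent: total prob of that parent's refined
                        variants of the coarse rule}
    parent_weights: {fine_parent: expected relative frequency of the
                        parent in the infinite tree distribution}, summing
                        to 1 over the refinements of the coarse parent.
    Returns the projected coarse rule probability."""
    return sum(parent_weights[p] * mass for p, mass in fine_rules.items())

# Refined variants of S -> NP VP, grouped by refined parent (from the slide):
s1_mass = 0.20 + 0.12 + 0.02 + 0.03   # all S1 -> NPx VPy variants
s2_mass = 0.11 + 0.05 + 0.08 + 0.12   # all S2 -> NPx VPy variants

# Hypothetical expected relative frequencies of S1 and S2:
p = project_rule_prob({"S1": s1_mass, "S2": s2_mass},
                      {"S1": 0.6, "S2": 0.4})
```

With real expected counts in place of the made-up 0.6/0.4 weights, this is the computation that yields the slide's projected probability.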
Calculating Expectations
• Nonterminals: ck(X), the expected count of X in trees up to depth k; iterate to the fixed point, which converges within 25 iterations (a few seconds).
• Rules: the expected count of a refined rule follows from its parent's expected count times the rule probability.
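The fixed-point iteration for the nonterminal expectations can be sketched on a toy grammar. This is an illustration of the recurrence, not the paper's code: the root contributes count 1, and each nonterminal additionally receives c(parent) · P(rule) for every occurrence on a rule's right-hand side.

```python
# Toy PCFG, purely illustrative: (parent, prob, children).
RULES = [
    ("S",  1.0, ("NP", "VP")),
    ("NP", 0.7, ("Det", "N")),
    ("NP", 0.3, ("N",)),
    ("VP", 0.4, ("V", "NP")),
    ("VP", 0.6, ("V",)),
]
SYMBOLS = {"S", "NP", "VP", "Det", "N", "V"}

def expected_counts(rules, symbols, root="S", iterations=25):
    """Iterate c_{k+1}(X) = [X == root] + sum over rule occurrences of
    c_k(parent) * P(rule), the expected count of X up to depth k+1."""
    c = {x: 0.0 for x in symbols}
    for _ in range(iterations):
        new = {x: (1.0 if x == root else 0.0) for x in symbols}
        for parent, prob, children in rules:
            for child in children:
                new[child] += c[parent] * prob
        c = new
    return c
```

For this grammar the iteration settles quickly, e.g. c(NP) = 1 (from S) + 0.4 · c(VP) = 1.4, matching the slide's observation that a few dozen iterations suffice.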
Parsing time: 1621 min → 111 min → 35 min → 15 min with projected grammars (no search error).
Parsing times per grammar
X-Bar = G0: 60%, G1: 12%, G2: 7%, G3: 6%, G4: 6%, G5: 5%, G6 = G: 4%.
Bracket Posteriors (after G0)
Bracket Posteriors (movie): final chart
Parse Selection
Computing the most likely unsplit tree is NP-hard: a single parse corresponds to many derivations over the refined symbols. Options:
• Settle for the best derivation.
• Rerank an n-best list.
• Use an alternative objective function.
Parse Risk Minimization [Titov & Henderson '06]
• Expected loss according to our beliefs: pick the TP minimizing Σ over TT of P(TT | sentence) · L(TP, TT).
• TT: true tree; TP: predicted tree; L: loss function (0/1, precision, recall, F1).
• Use an n-best candidate list and approximate the expectation with samples.
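The n-best approximation above can be sketched directly: treat the (renormalized) n-best list as the belief distribution over true trees, and return the candidate with the lowest expected loss. The set-of-brackets tree representation and the loss below are illustrative, not from the paper:

```python
def min_risk_parse(nbest, loss):
    """nbest: list of (tree, model_prob) pairs.
    loss:  loss(predicted, true) -> nonnegative number.
    Returns the candidate with the lowest expected loss under the
    renormalized n-best distribution."""
    total = sum(p for _, p in nbest)
    def risk(candidate):
        return sum((p / total) * loss(candidate, other)
                   for other, p in nbest)
    return min((tree for tree, _ in nbest), key=risk)
```

Note that with a non-trivial loss the minimum-risk tree need not be the highest-probability candidate, which is the point of moving beyond the single best derivation.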
Dynamic Programming [Matsuzaki et al. '05]
Approximate the posterior parse distribution à la [Goodman '98] and maximize the expected number of correct rules.
Final Results (Efficiency)
• Berkeley Parser: 15 min, 91.2 F-score, implemented in Java
• Charniak & Johnson '05 parser: 19 min, 90.7 F-score, implemented in C
Conclusions • Hierarchical coarse-to-fine inference • Projections • Marginalization • Multi-lingual unlexicalized parsing
Thank You! Parser available at http://nlp.cs.berkeley.edu