310 likes | 448 Views
Prototype-Driven Grammar Induction. Aria Haghighi and Dan Klein Computer Science Division University of California Berkeley. Grammar Induction. DT NN VBD DT NN IN NN The screen was a sea of red. First Attempt…. DT NN VBD DT NN IN NN
E N D
Prototype-Driven Grammar Induction Aria Haghighi and Dan Klein Computer Science Division University of California Berkeley
Grammar Induction DTNNVBD DT NN IN NN The screen was a sea of red
First Attempt….. DTNNVBD DT NN IN NN The screen was a sea of red
Central Questions • How do we specify what we want to learn? • How do we fix observed errors? What’s an NP? That’s not quite it!
Experimental Set-up • Binary Grammar { X1, X2, … Xn} plus POS tags • Data • WSJ-10 [7k sentences] • Evaluate on Labeled F1 • Grammar Upper Bound: 86.1 Xi Xj Xk
Experiment Roadmap • Unconstrained Induction • Need bracket constraint • Gold Bracket Induction • Prototypes and Similarity • CCM Bracket Induction
Unconstrained PCFG Induction (Outside) 0 i j n (Inside) • Learn PCFG with EM • Inside-Outside Algorithm • Lari & Young [90] • Results
Constrained PCFG Induction • Gold Brackets • Pereira & Schabes [93] • Result
Encoding Knowledge What’s an NP? Semi-Supervised Learning
Encoding Knowledge What’s an NP? For instance, DT NN JJ NNS NNP NNP Prototype Learning
Grammar Induction Experiments • Add Prototypes • Manually constructed
How to use prototypes? S ? VP ? PP ? NP ? NP ? NN koala VBD sat IN in DT the NN tree DT The ¦ ¦
How to use prototypes? S VP NP ? PP NP JJ hungry NN koala VBD sat IN in DT the NN tree DT The ¦ ¦
Distributional Similarity proto=NP (VBD DT NN) (IN NN) (DT NN) (VBN IN NN) (TO CD CD) VP PP (NNP NNP) (MD VB NN) (IN PRP) NP (JJ NNS) { ¦ __ VBD : 0.3, VBD __ ¦ : 0.2, IN __ VBD: 0.1, ….. } (DT JJ NN)
Distributional Similarity (VBD DT NN) (IN NN) (DT NN) (VBN IN NN) (TO CD CD) VP PP (NNP NNP) (MD VB NN) (IN PRP) NP proto=NONE (JJ NNS) (IN DT)
Prototype CFG+ Model CFG Rule S P (DT NP | NP) P (proto=NP | NP) VP NP Proto Feature PP NP NP JJ hungry NN koala DT the NN tree DT The VBD sat IN in ¦ ¦
Prototype CFG+ Induction • Experimental Set-Up • BLIPP corpus • Gold Brackets • Results
Summary So Far • Bracket constraint and prototypes give good performance!
Constituent-Context Model Yield + P(VBD IN DT NN | +) P(NN__ ¦ | +) Context JJ hungry NN koala DT the NN tree DT The VBD sat IN in ¦ ¦ P(NN VBD| -) P(JJ __IN| -) - Klein & Manning ‘02
Product Model • Different Aspects of Syntax • Intersected EM [Klein 2005, Liang et. al. ‘06] • Encourages mass on trees compatible with CCM and CFG DT NN + yield ¦ _VBD + context NP ! NP PP CFG CCM
Grammar Induction Experiments • Intersected CFG and CCM • No prototypes • Results
Grammar Induction Experiments • Intersected CFG+ and CCM • Add Prototypes • Results
Reacting to Errors Our Tree Correct Tree • Possessive NPs
Reacting to Errors New Analysis • Add Prototype: NP-POS NN POS
Error Analysis Our Tree Correct Tree • Modal VPs
Reacting to Errors New Analysis • Add Prototype: VP-INF VB NN
Fixing Errors • Supplement Prototypes • NP-POS and VP-INF • Results
Conclusion • Prototype-Driven Learning Flexible Weakly Supervised Framework • Merged distributional clustering techniques with structured models
Thank You! http://www.cs.berkeley.edu/~aria42
Unconstrained PCFG Induction (Outside) 0 i j n (Inside) Xi Xi Xi Xi • Binary Grammar { X1, X2, … Xn} • Learn PCFG with EM • Inside-Outside Algorithm • Lari & Young [93] Xj Xk N Xk Xj V N V