A Survey of Unsupervised Grammar Induction Baskaran Sankaran Senior Supervisor: Dr Anoop Sarkar School of Computing Science Simon Fraser University
Motivation
• Languages have hidden regularities
karuppu naay puunaiyai thurathiyathu
iruttil karuppu uruvam marainthathu
naay thurathiya puunai vekamaaka ootiyathu
Phrase-Structure Sometimes the bribed became partners in the company
Phrase-Structure
• Binarize, convert to CNF
• Sparsity issue with words: use POS tags instead
[figure: binarized CNF phrase-structure tree for “Sometimes the bribed became partners in the company”]
Evaluation Metric-1
• Unsupervised Induction
  • Binarized output tree, possibly unlabelled
• Evaluation
  • Gold treebank parse
  • Recall: % of true constituents found; also precision and F-score
• Wall Street Journal (WSJ) dataset
[figure: unlabelled binarized tree over the example POS sequence]
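The bracket metric can be made concrete. Below is a minimal sketch (mine, not from the talk) that computes unlabelled precision, recall and F-score from two trees encoded as nested Python lists, where leaves are token strings:

```python
def spans(tree, start=0, out=None):
    """Collect (start, end) spans of all constituents in a nested-list tree.
    Leaves are token strings; internal nodes are lists of children."""
    if out is None:
        out = []
    if isinstance(tree, str):          # a leaf consumes one token
        return start + 1, out
    pos = start
    for child in tree:
        pos, _ = spans(child, pos, out)
    out.append((start, pos))
    return pos, out

def unlabelled_prf(gold_tree, test_tree):
    """Unlabelled precision, recall and F-score over constituent spans."""
    gold = set(spans(gold_tree)[1])
    test = set(spans(test_tree)[1])
    hits = len(gold & test)
    p, r = hits / len(test), hits / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Example: a gold binarization vs. a different binarization of 5 tokens
gold = [["the", "dog"], ["bites", ["a", "man"]]]
test = [["the", ["dog", "bites"]], ["a", "man"]]
```

Only the two spans shared by both binarizations (the top span and "a man") count as hits, so both precision and recall come out at 0.5 here.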
Dependency Structure
[figure: split-head dependency derivation (VBD*, VBN*, NNS*, IN*, NN* nodes) for “Sometimes the bribed became partners in the company”]
Dependency Structure
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
Evaluation Metric-2
• Unsupervised Induction
  • Generates directed dependency arcs
  • Compute (directed) attachment accuracy
• Gold dependencies
• WSJ10 dataset
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
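Directed attachment accuracy is simple to state in code. A sketch (the gold head indices below are one plausible analysis of the example sentence, assumed here for illustration, not taken from a treebank):

```python
def attachment_accuracy(gold_heads, pred_heads):
    """Directed attachment accuracy: fraction of tokens whose predicted
    head index equals the gold head (0 denotes the artificial root)."""
    assert len(gold_heads) == len(pred_heads)
    hits = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return hits / len(gold_heads)

# "Sometimes the bribed became partners in the company"
# 1-indexed heads under one plausible (assumed) analysis:
gold = [4, 3, 4, 0, 4, 5, 8, 6]   # e.g. "Sometimes" -> "became", "in" -> "partners"
pred = [4, 3, 4, 0, 4, 4, 8, 6]   # differs only in the head chosen for "in"
```

With one of eight heads wrong, the predicted analysis scores 7/8 = 0.875.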
Unsupervised Grammar Induction
To learn the hidden structure of a language
• POS tag sequences as input
• Generates phrase-structure / dependencies
• No attempt to find the meaning
• Overview
  • Phrase-structure and dependency grammars
  • Mostly on English (a few on Chinese, German, etc.)
  • Learning restricted to shorter sentences
  • Significantly lags behind supervised methods
Toy Example
Corpus:
the dog bites a man
dog sleeps
a dog bites a bone
the man sleeps
Grammar:
S → NP VP    NP → Det N    NP → N
VP → V NP    VP → V
Det → a      Det → the
N → man      N → bone     N → dog
V → sleeps   V → bites
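A grammar this small can be checked mechanically. The sketch below (mine, not from the survey) runs CKY recognition over the toy grammar, with the unary chains NP → N and VP → V collapsed into extra lexical entries so that every remaining rule is binary:

```python
# Toy grammar from the slide; the unary rules NP -> N and VP -> V are
# collapsed into the lexicon so the grammar is effectively in CNF.
LEXICAL = {
    "a": {"Det"}, "the": {"Det"},
    "man": {"N", "NP"}, "bone": {"N", "NP"}, "dog": {"N", "NP"},
    "sleeps": {"V", "VP"}, "bites": {"V", "VP"},
}
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]

def cky_recognize(words):
    """CKY recognition: chart[(i, j)] holds the non-terminals that can
    span words[i:j]; the sentence is grammatical iff S spans it all."""
    n = len(words)
    chart = {}
    for i, w in enumerate(words):
        chart[(i, i + 1)] = set(LEXICAL.get(w, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        cell.add(parent)
            chart[(i, j)] = cell
    return "S" in chart[(0, n)]
```

All four corpus sentences are accepted, while an ungrammatical order such as "bites the dog" is rejected.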
EM for PCFG (Baker ’79; Lari and Young ’90) • Inside-Outside • EM instance for probabilistic CFG • Generalization of Forward-backward for HMMs • Non-terminals are fixed • Estimate maximum likelihood rule probabilities
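To make the Inside-Outside idea concrete, here is a sketch of just the inside pass on a tiny hand-set PCFG (the rule probabilities are illustrative, not estimated); the outside pass and the EM re-estimation step are omitted:

```python
from collections import defaultdict

# A tiny PCFG in CNF with made-up probabilities (illustrative only).
BINARY = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 1.0, ("VP", "V", "NP"): 1.0}
LEX = {("Det", "the"): 0.5, ("Det", "a"): 0.5,
       ("N", "dog"): 0.5, ("N", "man"): 0.5,
       ("V", "bites"): 1.0}

def inside(words):
    """Inside probabilities: beta[(A, i, j)] = P(A derives words[i:j]),
    filled bottom-up over increasing span widths (as in CKY)."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):
        for (A, word), p in LEX.items():
            if word == w:
                beta[(A, i, i + 1)] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for (A, B, C), p in BINARY.items():
                for k in range(i + 1, j):
                    beta[(A, i, j)] += p * beta[(B, i, k)] * beta[(C, k, j)]
    return beta

# The sentence probability is beta[("S", 0, n)].
```

For "the dog bites a man" the chain of choices is 0.5 · 0.5 for each NP and 1.0 elsewhere, giving a sentence probability of 0.0625.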
Inside-Outside
[figure: example tree annotated with rule probabilities, e.g. P(S → Sometimes @S), P(@S → NP VP), P(NP → the bribed), P(VP → became … company)]
Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93; Hwa ’99)
• Treebank bracketings
  • Bracketing boundaries constrain induction
• What happens with limited supervision?
  • More bracketed data exposed iteratively
  • 0% bracketed data: Recall 50.0
  • 100% bracketed data: Recall 78.0
  • Right-branching baseline: Recall 76.0
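The bracketing constraint reduces to a crossing test. A sketch (mine, not the authors' code) of the filter an induction loop would apply to each candidate span:

```python
def compatible(span, brackets):
    """A candidate span (i, j) is allowed iff it crosses no given bracket.
    Two spans cross when they overlap without one containing the other."""
    i, j = span
    for a, b in brackets:
        if (a < i < b < j) or (i < a < j < b):
            return False
    return True
```

During Inside-Outside, chart cells whose spans fail this test are simply assigned zero probability, so the search only considers analyses consistent with the given (partial) bracketing.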
Distributional clustering (Adriaans et al. ’00; Clark ’00; van Zaanen ’00)
• Cluster the word sequences
  • Context: adjacent words or boundaries
  • Relative frequency distribution of contexts
    the black dog bites the man
    the man eats an apple
• Identifies constituents
• Evaluation on ATIS corpus: Recall 35.6
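Gathering the context distributions is the first step of such methods. A minimal sketch (mine) that counts the (left word, right word) context of every word bigram, with "#" marking sentence boundaries:

```python
from collections import Counter

def context_counts(corpus, seq_len=2):
    """For every word n-gram of length seq_len, count its
    (left neighbour, right neighbour) contexts across the corpus;
    sentence boundaries are marked with '#'."""
    ctx = {}
    for sent in corpus:
        toks = ["#"] + sent.split() + ["#"]
        for i in range(1, len(toks) - seq_len):
            seq = tuple(toks[i:i + seq_len])
            ctx.setdefault(seq, Counter())[(toks[i - 1], toks[i + seq_len])] += 1
    return ctx
```

Sequences whose relative context frequencies are similar are then clustered together; clusters with high-frequency, sharply peaked context distributions are taken as constituent candidates.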
Constituent-Context Model (Klein and Manning ’02)
• Valid constituents in a tree should not cross
[figure: a valid binarized tree vs. a bracketing with crossing constituents over the example POS sequence]
Constituent-Context Model
• Recall: Right-branching baseline 70.0; CCM 81.6
[figure: CCM-induced tree for the example sentence]
Dependency Model w/ Valence (Klein and Manning ’04)
• Simple generative model (head outward)
  • Choose head: P(Root)
  • Argument: P(a | h, dir); attachment dir (right, left)
  • End: P(End | h, dir, v); valence v
• Directed Accuracy: CCM 23.8; DMV 43.2; Joint 47.5
Sometimes the bribed became partners in the company
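The head-outward generative story translates directly into a scoring routine. Below is a sketch with made-up parameters (not the learned Klein and Manning model) that scores one dependency analysis of "dog bites man" under DMV-style root, stop and attach distributions:

```python
import math

# Illustrative DMV-style parameters (assumed, not learned):
P_ROOT = {"V": 1.0}
P_ATTACH = {("V", "left", "N"): 1.0, ("V", "right", "N"): 1.0}

def p_stop(head, direction, adjacent):
    """P(End | h, dir, v): more likely to stop once an argument exists."""
    return 0.7 if adjacent else 0.9

def dmv_logprob(tag, left, right):
    """Log-probability of generating the arguments of `tag` head-outward.
    `left`/`right` list the dependents as (tag, left, right) triples."""
    lp = 0.0
    for direction, args in (("left", left), ("right", right)):
        adjacent = True                                   # no argument yet
        for a_tag, a_left, a_right in args:
            lp += math.log(1 - p_stop(tag, direction, adjacent))  # continue
            lp += math.log(P_ATTACH[(tag, direction, a_tag)])     # attach
            lp += dmv_logprob(a_tag, a_left, a_right)             # recurse
            adjacent = False
        lp += math.log(p_stop(tag, direction, adjacent))          # stop
    return lp

# "dog bites man": root V takes one N to its left and one to its right
logp = math.log(P_ROOT["V"]) + dmv_logprob("V", [("N", [], [])], [("N", [], [])])
```

Each side of the head contributes a continue, an attach, two stops for the bare noun, and a final stop, so the tree probability is (0.3 · 0.7 · 0.7 · 0.9) squared.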
DMV Extensions (Headden et al. ’09; Blunsom and Cohn ’10)
• Extended Valence Grammar (EVG)
  • Valence frames for the head
  • Allows different distributions over arguments
  • Dir Acc: 65.0
• Lexicalization (L-EVG): Dir Acc 68.8
• Tree Substitution Grammar
  • Tree fragments instead of CFG rules
  • Dir Acc: 67.7
Sometimes the bribed became partners in the company
Bilingual Alignment & Parsing (Wu ’97)
• Inversion Transduction Grammar (ITG)
• Allows reordering
[figure: ITG tree aligning e1 e2 e3 e4 with a reordered f1 f2 f3 f4]
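ITG reordering power can be checked by brute force for short spans. A sketch (mine) enumerating the target orders a binary ITG can derive; the classic "inside-out" permutations are exactly the ones it cannot produce:

```python
def itg_permutations(n):
    """All orderings of source positions 0..n-1 derivable by a binary ITG:
    each internal node either keeps its two children in source order
    (straight rule) or swaps them (inverted rule)."""
    def derive(lo, hi):
        if hi - lo == 1:
            return {(lo,)}
        out = set()
        for k in range(lo + 1, hi):
            for left in derive(lo, k):
                for right in derive(k, hi):
                    out.add(left + right)    # straight
                    out.add(right + left)    # inverted
        return out
    return derive(0, n)
```

Up to length 3 every permutation is reachable; at length 4 the ITG covers 22 of the 24 permutations, missing only the inside-out pair 2-4-1-3 and 3-1-4-2, which is why ITG constraints are a meaningful restriction on reordering.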
Bilingual Parsing (Snyder et al. ’09)
• PP attachment ambiguity
  I saw (the student (from MIT)1)2
• Not ambiguous in Urdu
  میں (یمآئٹیسے)1 (طالب علم)2 کو دیکھا
  I ((MIT of) student) saw
Summary & Overview
• Parametric search methods
  • EM for PCFG
  • Constrain with bracketing
  • DMV; Contrastive Estimation; EVG & L-EVG; TSG + DMV
• Structural search methods
  • Distributional Clustering
  • Data-oriented Parsing
  • Prototype
  • CCM
• State-of-the-art
  • Phrase-structure (CCM + DMV): Recall 88.0
  • Dependency (Lexicalized EVG): Dir Acc 68.8
Thanks! Questions?
Motivation
• Languages have hidden regularities
  • The guy in China
  • … new leader in China
  • That’s what I am asking you …
  • I am telling you …
Issues with EM (Carroll and Charniak ’92; Pereira and Schabes ’92; de Marcken ’05; Liang and Klein ’08; Spitkovsky et al. ’10)
• Phrase-structure
  • Finds local maxima instead of global
  • Multiple ordered adjunctions
• Both phrase-structure & dependency
  • Disconnect between likelihood and optimal grammar
Constituent-Context Model (Klein and Manning ’02) • CCM • Only constituent identity • Valid constituents in a tree should not cross
Bootstrap phrases (Haghighi and Klein ’06)
• Bootstrap with seed examples for constituent types
  • Chosen from most frequent treebank phrases
  • Induces labels for constituents
  • Recall: 59.6
• Integrate with CCM
  • CCM generates brackets (constituents); Proto labels them
  • Recall: 68.4
Dependency Model w/ Valence (Klein and Manning ’04)
• Simple generative model
  • Choose head; attachment dir (right, left)
  • Valence (head outward)
  • End of generation modelled separately
• Dir Acc: 43.2
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
Learn from how not to speak • Contrastive Estimation (Smith and Eisner ’05) • Log-linear Model of dependency • Features: f(q, T) • P(Root); P(a|h, dir); P(End | h, dir, v) • Conditional likelihood
Learn from how not to speak (Smith and Eisner ’05) • Contrastive Estimation • Ex. the brown cat vs. cat brown the • Neighborhoods • Transpose (Trans), delete & transpose (DelOrTrans) Dir Acc: 48.8
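The TRANS neighborhood is easy to sketch; the conditional likelihood then contrasts each observed sentence against these perturbed variants (DELORTRANS would additionally allow deleting a word, omitted here):

```python
def trans_neighborhood(words):
    """TRANS neighborhood: the observed sentence plus every sentence
    obtained by swapping one pair of adjacent words."""
    out = {tuple(words)}
    for i in range(len(words) - 1):
        w = list(words)
        w[i], w[i + 1] = w[i + 1], w[i]
        out.add(tuple(w))
    return out
```

For "the brown cat" the neighborhood contains the sentence itself plus "brown the cat" and "the cat brown", the implicit negative evidence the model learns to disprefer.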
DMV Extensions-1 (Cohen and Smith ’08, ’09)
• Tying parameters
  • Correlated Topic Model (CTM)
  • Correlation between different word types
• Two types of tying parameters
  • Logistic Normal (LN): Dir Acc 61.3
  • Shared LN: Dir Acc 61.3
DMV Extensions-2 (Blunsom and Cohn ’10)
• Tree Substitution Grammar (TSG)
  • Lexicalized trees
  • Hierarchical prior
  • Different levels of backoff
• Dir Acc: 67.7
[figure: lexicalized TSG fragments over the example split-head dependency tree]