
A Survey of Unsupervised Grammar Induction




  1. A Survey of Unsupervised Grammar Induction Baskaran Sankaran Senior Supervisor: Dr Anoop Sarkar School of Computing Science Simon Fraser University

  2. Motivation • Languages have hidden regularities • karuppu naay puunaiyai thurathiyathu ("the black dog chased the cat") • iruttil karuppu uruvam marainthathu ("a black figure disappeared in the dark") • naay thurathiya puunai vekamaaka ootiyathu ("the cat chased by the dog ran away fast")

  3. Motivation • Languages have hidden regularities • karuppu naay puunaiyai thurathiyathu • iruttil karuppu uruvam marainthathu • naay thurathiya puunai vekamaaka ootiyathu

  4. Formal Structures

  5. Phrase-Structure • Running example: Sometimes the bribed became partners in the company

  6. Phrase-Structure • Binarize, CNF • Sparsity issue with words • Use POS tags [tree diagram: binarized phrase-structure tree with nodes S, ADVP, @S, NP, VP, @VP, PP over the POS tags RB, DT, VBD, VBN, NNS, IN, NN]

  7. Evaluation Metric-1 • Unsupervised Induction • Binarized output tree • Possibly unlabelled • Evaluation • Gold treebank parse • Recall: % of true constituents found • Also precision and F-score • Wall Street Journal (WSJ) dataset [tree diagram: induced binarized tree with unlabelled X nodes over the example sentence's POS tags]
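The bracket metric described on this slide can be sketched in a few lines: each tree is reduced to a set of unlabelled (start, end) spans, and predicted spans are compared against the gold treebank spans. This is an illustrative sketch, not the official evaluation script, and the span sets below are toy data.

```python
# Unlabelled-bracket precision/recall/F1 over constituent spans.
# Span sets are toy data for illustration.

def bracket_scores(gold_spans, pred_spans):
    """Return (precision, recall, f1) over unlabelled constituent spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    matched = len(gold & pred)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: gold tree spans vs. spans of an induced binarized tree.
gold = [(0, 8), (1, 3), (3, 8), (4, 8), (5, 8), (6, 8)]
pred = [(0, 8), (1, 3), (3, 8), (5, 8), (2, 4)]
p, r, f = bracket_scores(gold, pred)
```

Since induced trees are binarized while gold trees are flatter, recall is the headline number on these slides: a binarized tree proposes more spans than the gold tree contains.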

  8. Dependency Structure [diagram: split-head dependency structure for "Sometimes the bribed became partners in the company", with starred head markers VBD*, VBN*, NNS*, IN*, NN*]

  9. Dependency Structure [diagram: dependency arcs over "Sometimes/RB the/DT bribed/VBN became/VBD partners/NNS in/IN the/DT company/NN"]

  10. Evaluation Metric-2 • Unsupervised Induction • Generates directed dependency arcs • Compute (directed) attachment accuracy • Gold dependencies • WSJ10 dataset [diagram: dependency arcs over the example sentence]
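Directed attachment accuracy can be sketched directly: for each token, compare the predicted head index against the gold head. The head indices below are illustrative (1-based positions, 0 for the root), not taken from any treebank file.

```python
# Directed dependency attachment accuracy: the fraction of tokens whose
# predicted head index matches the gold head (0 denotes the root).

def attachment_accuracy(gold_heads, pred_heads):
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Tokens: Sometimes(1) the(2) bribed(3) became(4) partners(5) in(6) the(7) company(8)
gold = [4, 3, 4, 0, 4, 5, 8, 6]
pred = [4, 3, 4, 0, 4, 4, 8, 6]   # one wrong arc: "in" attached to "became"
acc = attachment_accuracy(gold, pred)
```

One wrong arc out of eight gives 7/8 = 0.875; note the metric is *directed*, so a correct undirected attachment with the arrow reversed still counts as an error.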

  11. Unsupervised Grammar Induction To learn the hidden structure of a language • POS tag sequences as input • Generates phrase structures / dependencies • No attempt to find the meaning • Overview • Phrase-structure and dependency grammars • Mostly on English (a few on Chinese, German, etc.) • Learning restricted to shorter sentences • Significantly lags behind the supervised methods

  12. Phrase-Structure Induction

  13. Toy Example Corpus: "the dog bites a man", "dog sleeps", "a dog bites a bone", "the man sleeps" Grammar: S → NP VP; NP → Det N; NP → N; VP → V NP; VP → V; Det → a | the; N → man | bone | dog; V → sleeps | bites
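As a sketch of how such a grammar assigns structure to the corpus, here is a minimal CKY recogniser for the toy grammar above; the unary chains NP → N and VP → V are collapsed into the lexicon so the grammar is in CNF. All names are illustrative.

```python
# Minimal CKY recogniser for the toy grammar, with the unary chains
# NP -> N and VP -> V collapsed into the lexicon (so words like "dog"
# carry both N and NP; "sleeps" carries both V and VP).

LEXICAL = {
    "a": {"Det"}, "the": {"Det"},
    "dog": {"N", "NP"}, "man": {"N", "NP"}, "bone": {"N", "NP"},
    "sleeps": {"V", "VP"}, "bites": {"V", "VP"},
}
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]

def recognise(words):
    n = len(words)
    # chart[i][j] holds the non-terminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, b, c in BINARY:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(lhs)
    return "S" in chart[0][n]
```

All four corpus sentences are accepted, while scrambled strings like "bites the dog" are not, which is exactly the regularity an induction procedure must recover without being given the rules.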

  14. EM for PCFG (Baker ’79; Lari and Young ’90) • Inside-Outside • EM instance for probabilistic CFG • Generalization of Forward-backward for HMMs • Non-terminals are fixed • Estimate maximum likelihood rule probabilities
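The inside pass that Inside-Outside builds on can be sketched for a CNF grammar: inside[i, j, A] is the total probability that non-terminal A yields words i..j. The rule probabilities below are illustrative toy values, not estimates from data.

```python
# Inside probabilities for a toy CNF PCFG. The E-step of Inside-Outside
# combines these with the symmetric outside pass to collect expected
# rule counts; only the inside pass is sketched here.

import collections

BINARY = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 0.7,
          ("VP", "V", "NP"): 0.5}
LEXICAL = {("NP", "dog"): 0.3, ("Det", "the"): 1.0, ("N", "man"): 1.0,
           ("V", "bites"): 1.0, ("VP", "sleeps"): 0.5}

def inside(words):
    n = len(words)
    chart = collections.defaultdict(float)   # (i, j, A) -> inside prob
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                chart[i, i + 1, a] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    chart[i, j, a] += p * chart[i, k, b] * chart[k, j, c]
    return chart

chart = inside("the man sleeps".split())
# P(S yields "the man sleeps") = P(S->NP VP) * P(NP->Det N) * ... * P(VP->sleeps)
```

The sentence probability chart[0, n, S] is the quantity EM maximizes; with fixed non-terminals, Inside-Outside re-estimates rule probabilities until this likelihood converges (to a local maximum, as slide 31 notes).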

  15. Inside-Outside [diagram: decomposing the sentence probability around a span: P(S → Sometimes @S), P(@S → NP VP), P(NP → the bribed), P(VP → became … company)]

  16. Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93)

  17. Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93; Hwa ’99) • Treebank bracketings • Bracketing boundaries constrain induction • What happens with limited supervision? • More bracketed data exposed iteratively • 0% bracketed data: Recall 50.0 • 100% bracketed data: Recall 78.0 • Right-branching baseline: Recall 76.0
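The bracketing constraint can be sketched as a span filter: during the constrained Inside-Outside pass, a candidate span receives probability mass only if it crosses none of the given treebank brackets. This is an illustrative sketch, not the original implementation; the bracket set below is toy data.

```python
# Two spans cross when each one properly contains exactly one endpoint
# of the other; constrained Inside-Outside zeroes out crossing spans.

def crosses(span, bracket):
    (i, j), (k, l) = span, bracket
    return (i < k < j < l) or (k < i < l < j)

def allowed(span, brackets):
    """True iff `span` is compatible with every given bracket."""
    return not any(crosses(span, b) for b in brackets)

brackets = [(0, 8), (1, 3), (3, 8)]   # partial gold bracketing
ok = allowed((4, 8), brackets)    # nested inside (3, 8): permitted
bad = allowed((2, 5), brackets)   # straddles the boundary at 3: rejected
```

Even a partial bracketing prunes a large fraction of the chart, which is why exposing more bracketed data iteratively (as on this slide) steers EM toward better local maxima.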

  18. Distributional clustering (Adriaans et al. ’00; Clark ’00; van Zaanen ’00) • Cluster the word sequences • Context: adjacent words or boundaries • Relative frequency distribution of contexts, e.g. "the black dog bites the man", "the man eats an apple" • Identifies constituents • Evaluation on ATIS corpus: Recall 35.6
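The representation this method clusters can be sketched concretely: each candidate word sequence is described by the relative frequencies of its (left word, right word) contexts, with boundary markers for sentence edges; sequences with similar context distributions are then grouped. The tiny corpus is illustrative.

```python
# Context-distribution representation for distributional clustering:
# a candidate sequence is mapped to the relative frequencies of the
# (left neighbour, right neighbour) pairs it occurs between.

from collections import Counter

corpus = [
    "the black dog bites the man".split(),
    "the man eats an apple".split(),
    "a black dog eats the apple".split(),
]

def context_distribution(sequence, sentences):
    n, counts = len(sequence), Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]   # sentence-boundary contexts
        for i in range(1, len(padded) - n):
            if padded[i:i + n] == sequence:
                counts[(padded[i - 1], padded[i + n])] += 1
    total = sum(counts.values())
    return {ctx: c / total for ctx, c in counts.items()}

dist = context_distribution(["black", "dog"], corpus)
```

Sequences that are true constituents tend to recur in a coherent set of contexts, which is what lets clustering over these distributions identify them.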

  19. Constituent-Context Model (Klein and Manning ’02) • Valid constituents in a tree should not cross [diagram: two candidate binarized trees with unlabelled X nodes over the same POS sequence, illustrating crossing vs. non-crossing bracketings]

  20. Constituent-Context Model • Recall: Right-branch 70.0, CCM 81.6 [diagram: CCM-induced tree over the example sentence]

  21. Dependency Induction

  22. Dependency Model w/ Valence (Klein and Manning ’04) • Simple generative model • Choose head: P(Root) • Argument: P(a | h, dir) • Attachment dir (right, left); valence (head outward) • End: P(End | h, dir, v) • Dir Accuracy: CCM 23.8, DMV 43.2, Joint 47.5 [diagram: dependency arcs over "Sometimes the bribed became partners in the company"]
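The generative story on this slide can be sketched as a scoring function: pick a root, then let each head generate arguments outward in each direction, making a CONTINUE/STOP decision conditioned on the head, the direction, and an adjacency valence bit (whether an argument was already generated on that side). All probabilities below are illustrative toy parameters, not learned DMV values.

```python
# Scoring a dependency tree under a DMV-style generative model.
# P_ROOT, P_ARG, P_STOP are toy parameters for illustration only.

import math

P_ROOT = {"VBD": 0.6, "NN": 0.4}
P_ARG = {("VBD", "left", "NN"): 0.5, ("VBD", "right", "NN"): 0.5,
         ("NN", "left", "DT"): 0.9}
# P(STOP | head, direction, has_argument_already) -- the valence bit
P_STOP = {("VBD", "left", False): 0.3, ("VBD", "left", True): 0.8,
          ("VBD", "right", False): 0.2, ("VBD", "right", True): 0.9,
          ("NN", "left", False): 0.1, ("NN", "left", True): 0.95,
          ("NN", "right", False): 0.9, ("DT", "left", False): 0.9,
          ("DT", "right", False): 0.9}

def tree_logprob(root, deps):
    """deps maps head tag -> {'left': [...], 'right': [...]} argument tags."""
    lp = math.log(P_ROOT[root])
    for head, sides in deps.items():
        for direction, args in sides.items():
            has_arg = False
            for a in args:
                lp += math.log(1 - P_STOP[head, direction, has_arg])
                lp += math.log(P_ARG[head, direction, a])
                has_arg = True
            lp += math.log(P_STOP[head, direction, has_arg])
    return lp

# "the dog barked" as DT <- NN <- VBD: the root VBD takes NN on its
# left, and NN takes DT on its left.
deps = {"VBD": {"left": ["NN"], "right": []},
        "NN": {"left": ["DT"], "right": []},
        "DT": {"left": [], "right": []}}
lp = tree_logprob("VBD", deps)
```

The valence bit is what separates DMV from a plain head-argument model: a head's willingness to stop depends on whether it has already taken an argument in that direction.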

  23. DMV Extensions (Headden et al. ’09; Blunsom and Cohn ’10) • Extended Valence (EVG): valence frames for the head; allows different distributions over arguments (Dir Acc: 65.0) • Lexicalization (L-EVG) (Dir Acc: 68.8) • Tree Substitution Grammar: tree fragments instead of CFG rules (Dir Acc: 67.7) [diagram: dependency arcs over the example sentence]

  24. Multilingual setting

  25. Bilingual Alignment & Parsing (Wu ’97) • Inversion Transduction Grammar (ITG) • Allows reordering [diagram: ITG tree aligning source e1 e2 e3 e4 with target f1 f2 f3 f4 via straight and inverted nodes: e1↔f3, e2↔f4, e3↔f1, e4↔f2]
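The ITG reordering primitive can be sketched in a few lines: a straight rule concatenates the target sides of its two children in order, while an inverted rule swaps them, and composing the two kinds of nodes yields the permutations ITG allows. The tokens e1..e4 / f1..f4 are placeholders matching the slide's diagram.

```python
# ITG node combination. Each item is a (source tokens, target tokens)
# pair; straight rules keep target order, inverted rules swap it.

def straight(left, right):
    return (left[0] + right[0], left[1] + right[1])

def inverted(left, right):
    return (left[0] + right[0], right[1] + left[1])   # target swapped

# Leaves pair a source token with its translation.
e1, e2 = (["e1"], ["f3"]), (["e2"], ["f4"])
e3, e4 = (["e3"], ["f1"]), (["e4"], ["f2"])

# (e1 e2) and (e3 e4) each combine straight; the top node inverts them,
# so source e1 e2 e3 e4 aligns with target f1 f2 f3 f4.
top = inverted(straight(e1, e2), straight(e3, e4))
```

Because swapping only ever happens at a node's two children, ITG permits many but not all permutations, which is what makes bilingual parsing with it tractable.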

  26. Bilingual Parsing (Snyder et al. ’09) • PP Attachment ambiguity: I saw (the student (from MIT)1 )2 • Not ambiguous in Urdu: میں (ایم آئی ٹی سے)1 (طالب علم)2 کو دیکھا (gloss: I ((MIT of) student) saw)

  27. Summary & Overview • Parametric Search Methods: EM for PCFG, Constrain with bracketing, DMV, Contrastive Estimation, EVG & L-EVG, TSG + DMV • Structural Search Methods: Distributional Clustering, Data-oriented Parsing, Prototype, CCM • State-of-the-art • Phrase-structure (CCM + DMV): Recall 88.0 • Dependency (Lexicalized EVG): Dir Acc 68.8

  28. Thanks! Questions?

  29. Motivation • Languages have hidden regularities

  30. Motivation • Languages have hidden regularities • The guy in China • … new leader in China • That’s what I am asking you … • I am telling you …

  31. Issues with EM (Carroll and Charniak ’92; Pereira and Schabes ’92; de Marcken ’95; Liang and Klein ’08; Spitkovsky et al. ’10) • Phrase-structure • Finds local maxima instead of global • Multiple ordered adjunctions • Both phrase-structure & dependency • Disconnect between likelihood and optimal grammar

  32. Constituent-Context Model (Klein and Manning ’02) • CCM: models only constituent identity • Valid constituents in a tree should not cross

  33. Bootstrap phrases (Haghighi and Klein ’06) • Bootstrap with seed examples for constituent types • Chosen from most frequent treebank phrases • Induces labels for constituents • Integrate with CCM • CCM generates brackets (constituents) • Proto labels them • Recall: 59.6 (CCM) vs. 68.4 (with prototypes)

  34. Dependency Model w/ Valence (Klein and Manning ’04) • Simple generative model • Choose head; attachment dir (right, left) • Valence (head outward) • End of generation modelled separately • Dir Acc: 43.2 [diagram: dependency arcs over the example sentence]

  35. Learn from how not to speak • Contrastive Estimation (Smith and Eisner ’05) • Log-linear Model of dependency • Features: f(q, T) • P(Root); P(a|h, dir); P(End | h, dir, v) • Conditional likelihood

  36. Learn from how not to speak (Smith and Eisner ’05) • Contrastive Estimation • Ex. the brown cat vs. cat brown the • Neighborhoods • Transpose (Trans), delete & transpose (DelOrTrans) Dir Acc: 48.8
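The TRANS neighborhood on this slide can be sketched directly: the set of "near-miss" strings obtained by transposing one adjacent pair of words. Contrastive estimation moves probability mass from these implicit negative examples onto the observed sentence. The sketch is illustrative, not the authors' implementation.

```python
# TRANS neighborhood for contrastive estimation: every string obtained
# from the sentence by swapping one adjacent pair of words.

def trans_neighborhood(words):
    neighbors = []
    for i in range(len(words) - 1):
        swapped = list(words)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        neighbors.append(" ".join(swapped))
    return neighbors

hood = trans_neighborhood("the brown cat".split())
```

The DelOrTrans neighborhood adds single-word deletions to this set; either way the neighborhood is small and locally perturbed, so the learner is told *which* alternatives the observed sentence beat, not merely that it occurred.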

  37. DMV Extensions-1 (Cohen and Smith ’08, ’09) • Tying parameters • Correlated Topic Model (CTM) • Correlation between different word types • Two types of tying parameters • Logistic Normal (LN) • Shared LN • Dir Acc: 61.3

  38. DMV Extensions-2 (Blunsom and Cohn ’10) [diagram: TSG elementary trees, lexicalized fragments such as a VBD fragment anchored on "became" with NNS and VBN arguments, alongside the dependency structure of the example sentence]

  39. DMV Extensions-2 (Blunsom and Cohn ’10) • Tree Substitution Grammar (TSG) • Lexicalized trees • Hierarchical prior • Different levels of backoff • Dir Acc: 67.7 [diagram: TSG fragment anchored on "became"]
