A Survey of Unsupervised Grammar Induction Baskaran Sankaran Senior Supervisor: Dr Anoop Sarkar School of Computing Science Simon Fraser University
Motivation
• Languages have hidden regularities
karuppu naay puunaiyai thurathiyathu
iruttil karuppu uruvam marainthathu
naay thurathiya puunai vekamaaka ootiyathu
Phrase-Structure Sometimes the bribed became partners in the company
Phrase-Structure
• Binarize, convert to CNF
• Sparsity issue with words: use POS tags instead
[figure: binarized CNF phrase-structure tree for “Sometimes the bribed became partners in the company”]
Evaluation Metric-1
• Unsupervised Induction
  • Binarized output tree, possibly unlabelled
• Evaluation
  • Gold treebank parse
  • Recall: % of true constituents found; also precision and F-score
• Wall Street Journal (WSJ) dataset
[figure: unlabelled binarized tree over the example POS sequence]
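The bracket metric can be made concrete. Below is a minimal sketch (mine, not from the talk) that computes unlabelled precision, recall and F-score from two trees encoded as nested Python lists, where leaves are token strings:

```python
def spans(tree, start=0, out=None):
    """Collect (start, end) spans of all constituents in a nested-list tree.
    Leaves are token strings; internal nodes are lists of children."""
    if out is None:
        out = []
    if isinstance(tree, str):          # a leaf consumes one token
        return start + 1, out
    pos = start
    for child in tree:
        pos, _ = spans(child, pos, out)
    out.append((start, pos))
    return pos, out

def unlabelled_prf(gold_tree, test_tree):
    """Unlabelled precision, recall and F-score over constituent spans."""
    gold = set(spans(gold_tree)[1])
    test = set(spans(test_tree)[1])
    hits = len(gold & test)
    p, r = hits / len(test), hits / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Example: a gold binarization vs. a different binarization of 5 tokens
gold = [["the", "dog"], ["bites", ["a", "man"]]]
test = [["the", ["dog", "bites"]], ["a", "man"]]
```

Only the two spans shared by both binarizations (the top span and "a man") count as hits, so both precision and recall come out at 0.5 here.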
Dependency Structure
[figure: split-head dependency derivation (VBD*, VBN*, NNS*, IN*, NN* nodes) for “Sometimes the bribed became partners in the company”]
Dependency Structure
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
Evaluation Metric-2
• Unsupervised Induction
  • Generates directed dependency arcs
  • Compute (directed) attachment accuracy
• Gold dependencies
• WSJ10 dataset
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
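Directed attachment accuracy is simple to state in code. A sketch (the gold head indices below are one plausible analysis of the example sentence, assumed here for illustration, not taken from a treebank):

```python
def attachment_accuracy(gold_heads, pred_heads):
    """Directed attachment accuracy: fraction of tokens whose predicted
    head index equals the gold head (0 denotes the artificial root)."""
    assert len(gold_heads) == len(pred_heads)
    hits = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return hits / len(gold_heads)

# "Sometimes the bribed became partners in the company"
# 1-indexed heads under one plausible (assumed) analysis:
gold = [4, 3, 4, 0, 4, 5, 8, 6]   # e.g. "Sometimes" -> "became", "in" -> "partners"
pred = [4, 3, 4, 0, 4, 4, 8, 6]   # differs only in the head chosen for "in"
```

With one of eight heads wrong, the predicted analysis scores 7/8 = 0.875.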
Unsupervised Grammar Induction
To learn the hidden structure of a language
• POS tag sequences as input
• Generates phrase-structure / dependencies
• No attempt to find the meaning
• Overview
  • Phrase-structure and dependency grammars
  • Mostly on English (a few on Chinese, German, etc.)
  • Learning restricted to shorter sentences
  • Significantly lags behind supervised methods
Toy Example
Corpus:
the dog bites a man
dog sleeps
a dog bites a bone
the man sleeps
Grammar:
S → NP VP    NP → Det N    NP → N
VP → V NP    VP → V
Det → a      Det → the
N → man      N → bone     N → dog
V → sleeps   V → bites
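A grammar this small can be checked mechanically. The sketch below (mine, not from the survey) runs CKY recognition over the toy grammar, with the unary chains NP → N and VP → V collapsed into extra lexical entries so that every remaining rule is binary:

```python
# Toy grammar from the slide; the unary rules NP -> N and VP -> V are
# collapsed into the lexicon so the grammar is effectively in CNF.
LEXICAL = {
    "a": {"Det"}, "the": {"Det"},
    "man": {"N", "NP"}, "bone": {"N", "NP"}, "dog": {"N", "NP"},
    "sleeps": {"V", "VP"}, "bites": {"V", "VP"},
}
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]

def cky_recognize(words):
    """CKY recognition: chart[(i, j)] holds the non-terminals that can
    span words[i:j]; the sentence is grammatical iff S spans it all."""
    n = len(words)
    chart = {}
    for i, w in enumerate(words):
        chart[(i, i + 1)] = set(LEXICAL.get(w, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        cell.add(parent)
            chart[(i, j)] = cell
    return "S" in chart[(0, n)]
```

All four corpus sentences are accepted, while an ungrammatical order such as "bites the dog" is rejected.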
EM for PCFG (Baker ’79; Lari and Young ’90) • Inside-Outside • EM instance for probabilistic CFG • Generalization of Forward-backward for HMMs • Non-terminals are fixed • Estimate maximum likelihood rule probabilities
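To make the Inside-Outside idea concrete, here is a sketch of just the inside pass on a tiny hand-set PCFG (the rule probabilities are illustrative, not estimated); the outside pass and the EM re-estimation step are omitted:

```python
from collections import defaultdict

# A tiny PCFG in CNF with made-up probabilities (illustrative only).
BINARY = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 1.0, ("VP", "V", "NP"): 1.0}
LEX = {("Det", "the"): 0.5, ("Det", "a"): 0.5,
       ("N", "dog"): 0.5, ("N", "man"): 0.5,
       ("V", "bites"): 1.0}

def inside(words):
    """Inside probabilities: beta[(A, i, j)] = P(A derives words[i:j]),
    filled bottom-up over increasing span widths (as in CKY)."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):
        for (A, word), p in LEX.items():
            if word == w:
                beta[(A, i, i + 1)] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for (A, B, C), p in BINARY.items():
                for k in range(i + 1, j):
                    beta[(A, i, j)] += p * beta[(B, i, k)] * beta[(C, k, j)]
    return beta

# The sentence probability is beta[("S", 0, n)].
```

For "the dog bites a man" the chain of choices is 0.5 · 0.5 for each NP and 1.0 elsewhere, giving a sentence probability of 0.0625.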
Inside-Outside
[figure: example tree annotated with rule probabilities, e.g. P(S → Sometimes @S), P(@S → NP VP), P(NP → the bribed), P(VP → became … company)]
Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93; Hwa ’99)
• Treebank bracketings
  • Bracketing boundaries constrain induction
• What happens with limited supervision?
  • More bracketed data exposed iteratively
  • 0% bracketed data: Recall 50.0
  • 100% bracketed data: Recall 78.0
  • Right-branching baseline: Recall 76.0
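The bracketing constraint reduces to a crossing test. A sketch (mine, not the authors' code) of the filter an induction loop would apply to each candidate span:

```python
def compatible(span, brackets):
    """A candidate span (i, j) is allowed iff it crosses no given bracket.
    Two spans cross when they overlap without one containing the other."""
    i, j = span
    for a, b in brackets:
        if (a < i < b < j) or (i < a < j < b):
            return False
    return True
```

During Inside-Outside, chart cells whose spans fail this test are simply assigned zero probability, so the search only considers analyses consistent with the given (partial) bracketing.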
Distributional clustering (Adriaans et al. ’00; Clark ’00; van Zaanen ’00)
• Cluster the word sequences
  • Context: adjacent words or boundaries
  • Relative frequency distribution of contexts
    the black dog bites the man
    the man eats an apple
• Identifies constituents
• Evaluation on ATIS corpus: Recall 35.6
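Gathering the context distributions is the first step of such methods. A minimal sketch (mine) that counts the (left word, right word) context of every word bigram, with "#" marking sentence boundaries:

```python
from collections import Counter

def context_counts(corpus, seq_len=2):
    """For every word n-gram of length seq_len, count its
    (left neighbour, right neighbour) contexts across the corpus;
    sentence boundaries are marked with '#'."""
    ctx = {}
    for sent in corpus:
        toks = ["#"] + sent.split() + ["#"]
        for i in range(1, len(toks) - seq_len):
            seq = tuple(toks[i:i + seq_len])
            ctx.setdefault(seq, Counter())[(toks[i - 1], toks[i + seq_len])] += 1
    return ctx
```

Sequences whose relative context frequencies are similar are then clustered together; clusters with high-frequency, sharply peaked context distributions are taken as constituent candidates.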
Constituent-Context Model (Klein and Manning ’02)
• Valid constituents in a tree should not cross
[figure: a valid binarized tree vs. a bracketing with crossing constituents over the example POS sequence]
Constituent-Context Model
• Recall: Right-branching baseline 70.0; CCM 81.6
[figure: CCM-induced tree for the example sentence]
Dependency Model w/ Valence (Klein and Manning ’04)
• Simple generative model (head outward)
  • Choose head: P(Root)
  • Argument: P(a | h, dir); attachment dir (right, left)
  • End: P(End | h, dir, v); valence v
• Directed Accuracy: CCM 23.8; DMV 43.2; Joint 47.5
Sometimes the bribed became partners in the company
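The head-outward generative story translates directly into a scoring routine. Below is a sketch with made-up parameters (not the learned Klein and Manning model) that scores one dependency analysis of "dog bites man" under DMV-style root, stop and attach distributions:

```python
import math

# Illustrative DMV-style parameters (assumed, not learned):
P_ROOT = {"V": 1.0}
P_ATTACH = {("V", "left", "N"): 1.0, ("V", "right", "N"): 1.0}

def p_stop(head, direction, adjacent):
    """P(End | h, dir, v): more likely to stop once an argument exists."""
    return 0.7 if adjacent else 0.9

def dmv_logprob(tag, left, right):
    """Log-probability of generating the arguments of `tag` head-outward.
    `left`/`right` list the dependents as (tag, left, right) triples."""
    lp = 0.0
    for direction, args in (("left", left), ("right", right)):
        adjacent = True                                   # no argument yet
        for a_tag, a_left, a_right in args:
            lp += math.log(1 - p_stop(tag, direction, adjacent))  # continue
            lp += math.log(P_ATTACH[(tag, direction, a_tag)])     # attach
            lp += dmv_logprob(a_tag, a_left, a_right)             # recurse
            adjacent = False
        lp += math.log(p_stop(tag, direction, adjacent))          # stop
    return lp

# "dog bites man": root V takes one N to its left and one to its right
logp = math.log(P_ROOT["V"]) + dmv_logprob("V", [("N", [], [])], [("N", [], [])])
```

Each side of the head contributes a continue, an attach, two stops for the bare noun, and a final stop, so the tree probability is (0.3 · 0.7 · 0.7 · 0.9) squared.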
DMV Extensions (Headden et al. ’09; Blunsom and Cohn ’10)
• Extended Valence Grammar (EVG)
  • Valence frames for the head
  • Allows different distributions over arguments
  • Dir Acc: 65.0
• Lexicalization (L-EVG): Dir Acc 68.8
• Tree Substitution Grammar
  • Tree fragments instead of CFG rules
  • Dir Acc: 67.7
Sometimes the bribed became partners in the company
Bilingual Alignment & Parsing (Wu ’97)
• Inversion Transduction Grammar (ITG)
• Allows reordering
[figure: ITG tree aligning e1 e2 e3 e4 with a reordered f1 f2 f3 f4]
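ITG reordering power can be checked by brute force for short spans. A sketch (mine) enumerating the target orders a binary ITG can derive; the classic "inside-out" permutations are exactly the ones it cannot produce:

```python
def itg_permutations(n):
    """All orderings of source positions 0..n-1 derivable by a binary ITG:
    each internal node either keeps its two children in source order
    (straight rule) or swaps them (inverted rule)."""
    def derive(lo, hi):
        if hi - lo == 1:
            return {(lo,)}
        out = set()
        for k in range(lo + 1, hi):
            for left in derive(lo, k):
                for right in derive(k, hi):
                    out.add(left + right)    # straight
                    out.add(right + left)    # inverted
        return out
    return derive(0, n)
```

Up to length 3 every permutation is reachable; at length 4 the ITG covers 22 of the 24 permutations, missing only the inside-out pair 2-4-1-3 and 3-1-4-2, which is why ITG constraints are a meaningful restriction on reordering.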
Bilingual Parsing (Snyder et al. ’09)
• PP attachment ambiguity
  I saw (the student (from MIT)1)2
• Not ambiguous in Urdu
  میں (یمآئٹیسے)1 (طالب علم)2 کو دیکھا
  I ((MIT of) student) saw
Summary & Overview
• Parametric search methods
  • EM for PCFG
  • Constrain with bracketing
  • DMV; Contrastive Estimation; EVG & L-EVG; TSG + DMV
• Structural search methods
  • Distributional Clustering
  • Data-oriented Parsing
  • Prototype
  • CCM
• State-of-the-art
  • Phrase-structure (CCM + DMV): Recall 88.0
  • Dependency (Lexicalized EVG): Dir Acc 68.8
Thanks! Questions?
Motivation
• Languages have hidden regularities
  • The guy in China
  • … new leader in China
  • That’s what I am asking you …
  • I am telling you …
Issues with EM (Carroll and Charniak ’92; Pereira and Schabes ’92; de Marcken ’05; Liang and Klein ’08; Spitkovsky et al. ’10)
• Phrase-structure
  • Finds local maxima instead of global
  • Multiple ordered adjunctions
• Both phrase-structure & dependency
  • Disconnect between likelihood and optimal grammar
Constituent-Context Model (Klein and Manning ’02) • CCM • Only constituent identity • Valid constituents in a tree should not cross
Bootstrap phrases (Haghighi and Klein ’06)
• Bootstrap with seed examples for constituent types
  • Chosen from most frequent treebank phrases
  • Induces labels for constituents
  • Recall: 59.6
• Integrate with CCM
  • CCM generates brackets (constituents); Proto labels them
  • Recall: 68.4
Dependency Model w/ Valence (Klein and Manning ’04)
• Simple generative model
  • Choose head; attachment dir (right, left)
  • Valence (head outward)
  • End of generation modelled separately
• Dir Acc: 43.2
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
Learn from how not to speak • Contrastive Estimation (Smith and Eisner ’05) • Log-linear Model of dependency • Features: f(q, T) • P(Root); P(a|h, dir); P(End | h, dir, v) • Conditional likelihood
Learn from how not to speak (Smith and Eisner ’05) • Contrastive Estimation • Ex. the brown cat vs. cat brown the • Neighborhoods • Transpose (Trans), delete & transpose (DelOrTrans) Dir Acc: 48.8
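The TRANS neighborhood is easy to sketch; the conditional likelihood then contrasts each observed sentence against these perturbed variants (DELORTRANS would additionally allow deleting a word, omitted here):

```python
def trans_neighborhood(words):
    """TRANS neighborhood: the observed sentence plus every sentence
    obtained by swapping one pair of adjacent words."""
    out = {tuple(words)}
    for i in range(len(words) - 1):
        w = list(words)
        w[i], w[i + 1] = w[i + 1], w[i]
        out.add(tuple(w))
    return out
```

For "the brown cat" the neighborhood contains the sentence itself plus "brown the cat" and "the cat brown", the implicit negative evidence the model learns to disprefer.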
DMV Extensions-1 (Cohen and Smith ’08, ’09)
• Tying parameters
  • Correlated Topic Model (CTM)
  • Correlation between different word types
• Two types of tying parameters
  • Logistic Normal (LN): Dir Acc 61.3
  • Shared LN: Dir Acc 61.3
DMV Extensions-2 (Blunsom and Cohn ’10)
• Tree Substitution Grammar (TSG)
  • Lexicalized trees
  • Hierarchical prior
  • Different levels of backoff
• Dir Acc: 67.7
[figure: lexicalized TSG fragments over the example split-head dependency tree]