Multilingual Guidance for Unsupervised Linguistic Structure Prediction. Dipanjan Das, Carnegie Mellon University. CLSP Seminar, Johns Hopkins University, September 27, 2011. Joint work with Shay Cohen (Columbia University), Slav Petrov (Google), and Noah Smith (Carnegie Mellon University).
Goal: Learn linguistic structure without any labeled data in a target language. Tasks: Dependency Parsing and Part-of-Speech Tagging. Example: Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Multilingual Unsupervised Learning. Two axes: no parallel data (hard) vs. using parallel data; supervision in source language(s) vs. joint learning for multiple languages. Prior work: Yarowsky and Ngai (2001); Xi and Hwa (2005); Cohen and Smith (2009); Snyder et al. (2009); Smith and Eisner (2009); Berg-Kirkpatrick and Klein (2010); Naseem et al. (2010); McDonald et al. (2011).
Multilingual Unsupervised Learning. Same two axes: no parallel data (hard) vs. using parallel data; supervision in source language(s) vs. joint learning for multiple languages. This talk is placed within this taxonomy.
Part 1: Multilingual Unsupervised Learning using parallel data, with supervision in source language(s).
Part-of-Speech Tagging. Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Supervised POS Tagging Supervised setting: average accuracy is 96.2% with TnT (Brants, 2000)
Resource-Poor Languages. Several major languages have little or no annotated data. Native speakers, e.g.: Punjabi 109 million, Vietnamese 69 million, Indonesian-Malay 37 million, Oriya 32 million, Azerbaijani 20 million, Haitian 7.7 million. However, lots of parallel and unannotated data! Basic NLP tools like POS tagging are essential for the development of language technologies. See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size
(Nearly) Universal POS Tags. Example Penn Treebank tag maps: PRP → PRON, PRP$ → PRON, WP → PRON, WP$ → PRON, NN → NOUN, NNP → NOUN, NNPS → NOUN, NNS → NOUN. Example Spanish Treebank tag maps: p0 → PRON, pd → PRON, pe → PRON, pi → PRON, pn → PRON, pp → PRON, pr → PRON, pt → PRON, px → PRON, np → NOUN, nc → NOUN. See Petrov, Das and McDonald (2011)
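A tag map of this kind is a plain lookup table. The sketch below uses only the Penn Treebank entries listed on the slide; the function name `to_universal` and the fallback tag for unmapped labels are illustrative assumptions, not part of the published mapping files.

```python
# Penn Treebank entries exactly as listed on the slide (a small subset
# of the full mapping).
PTB_TO_UNIVERSAL = {
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
    "NN": "NOUN", "NNP": "NOUN", "NNPS": "NOUN", "NNS": "NOUN",
}

def to_universal(tags, tag_map, unknown="X"):
    """Map treebank-specific tags to coarse tags.

    Hypothetical helper: tags missing from `tag_map` fall back to
    `unknown`, which is an assumption made for this sketch.
    """
    return [tag_map.get(tag, unknown) for tag in tags]

mapped = to_universal(["PRP", "NNS", "VB"], PTB_TO_UNIVERSAL)
```

Here `VB` is not among the entries shown on the slide, so this sketch sends it to the fallback tag.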
(Nearly) Universal POS Tags. English: Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./. German: Baltimore/NOUN hat/VERB eine/DET prächtig/ADJ gedeihende/ADJ Musikszene/NOUN ./. Bengali: বাল্টিমোর/NOUN শহর/NOUN এর/ADP সঙ্গীত/NOUN পরিবেশ/NOUN বেশ/ADJ উন্নত/ADJ |/.
Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm. Observation sequence: Baltimore hat eine prächtig gedeihende Musikszene . State sequence: ? ? ? ? ? ? ? (each state is one of the 12 coarse tags). Merialdo (1994)
Unsupervised Part-of-Speech Tagging. The HMM has transition multinomials between adjacent states (? ? over "Baltimore hat"). Merialdo (1994)
Unsupervised Part-of-Speech Tagging. The HMM has emission multinomials from each state to its observed word (? ? over "Baltimore hat"). Merialdo (1994)
Unsupervised Part-of-Speech Tagging. Result chart (EM-HMM): poor average result. Johnson (2007)
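EM for an HMM increases the marginal likelihood of the observed words, summed over all hidden tag sequences. A minimal sketch of that quantity, computed with the forward algorithm, follows; the two-tag model and every probability in it are made-up toy values, not parameters from the talk.

```python
def forward_likelihood(obs, init, trans, emit):
    """Probability of `obs` under an HMM, summed over all state sequences."""
    states = list(init)
    # alpha[s]: probability of the words seen so far, ending in state s.
    alpha = {s: init[s] * emit[s].get(obs[0], 0.0) for s in states}
    for word in obs[1:]:
        alpha = {
            s: emit[s].get(word, 0.0)
            * sum(alpha[prev] * trans[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())

# Toy parameters (assumptions for illustration only).
init = {"NOUN": 0.6, "VERB": 0.4}
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit = {"NOUN": {"Baltimore": 0.9, "hat": 0.1},
        "VERB": {"Baltimore": 0.1, "hat": 0.9}}

likelihood = forward_likelihood(["Baltimore", "hat"], init, trans, emit)
```

The transition and emission tables here play the role of the transition and emission multinomials named on the slides.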
Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) with locally normalized log-linear models for the emission multinomials, estimated using gradient-based methods. Emission features: suffix, hyphen, capital letters, numbers, ... Berg-Kirkpatrick et al. (2010)
Unsupervised Part-of-Speech Tagging. Result chart (EM-HMM vs. Feature-HMM): improvements across all languages.
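A locally normalized log-linear emission model scores each word with a weighted feature sum and normalizes over the vocabulary separately for each tag. The sketch below assumes simple feature templates based on the cues named on the slide (suffix, hyphen, capital letters, numbers); the exact templates, the two-character suffix, the function names, and the weights are illustrative assumptions.

```python
import math

def features(word, tag):
    """Illustrative feature templates, each conjoined with the tag."""
    feats = [f"word={word}|{tag}", f"suffix={word[-2:]}|{tag}"]
    if "-" in word:
        feats.append(f"hyphen|{tag}")
    if word[0].isupper():
        feats.append(f"capital|{tag}")
    if any(ch.isdigit() for ch in word):
        feats.append(f"digit|{tag}")
    return feats

def emission_probs(tag, vocab, weights):
    """p(word | tag) as a softmax over the vocabulary (local normalization)."""
    scores = {w: sum(weights.get(f, 0.0) for f in features(w, tag)) for w in vocab}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {w: math.exp(s - top) / z for w, s in scores.items()}

vocab = ["Baltimore", "hat", "eine"]
uniform = emission_probs("NOUN", vocab, {})
with_cap = emission_probs("NOUN", vocab, {"capital|NOUN": 2.0})
```

With all weights at zero the distribution is uniform; a positive weight on the capitalization feature shifts probability mass toward "Baltimore" for the NOUN tag.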
Unsupervised POS Tagging with Dictionaries. Hidden Markov Model (HMM) with locally normalized log-linear models. State space constrained by possible gold tags. Figure: candidate tags (PRON, DET, ADJ, NUM, ADV, NOUN, VERB, .) listed over the words of "Baltimore hat eine prächtig gedeihende Musikszene .".
Unsupervised POS Tagging with Dictionaries. Result chart: EM-HMM, Feature-HMM, w/ gold dictionary.
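Constraining the state space with a tag dictionary means each word may only take the tags listed for it, while words missing from the dictionary keep the full tag set. A minimal sketch follows; the dictionary entries are hypothetical and the helper name `tag_lattice` is made up for illustration.

```python
import math

# The 12 coarse universal tags.
UNIVERSAL_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
                  "ADP", "NUM", "CONJ", "PRT", ".", "X"]

def tag_lattice(sentence, tag_dict, all_tags=UNIVERSAL_TAGS):
    """Allowed tags per token; words not in the dictionary stay unconstrained."""
    return [sorted(tag_dict.get(word, all_tags)) for word in sentence]

# Hypothetical dictionary entries, for illustration only.
toy_dict = {"hat": {"VERB"}, "eine": {"DET", "PRON", "NUM"}}

lattice = tag_lattice(["hat", "eine", "Musikszene"], toy_dict)
num_paths = math.prod(len(tags) for tags in lattice)
```

In this toy case the number of candidate tag sequences drops from 12 × 12 × 12 to 1 × 3 × 12, which is the sense in which the dictionary prunes the HMM's search space.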
Morphologically rich languages only have base forms in dictionaries. For most languages, access to high-quality tag dictionaries is not realistic. Ideas: • Use supervision in resource-rich languages • Use parallel data • Construct projected tag lexicons
Bilingual Projection. Automatic labels from a supervised tagger (97% accuracy): Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Bilingual Projection. Source: Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./. Target: Baltimore hat eine prächtig gedeihende Musikszene . Automatic unsupervised alignments from translation data (available for more than 50 languages).
Bilingual Projection. Baseline 1: direct projection. Tags are copied across the word alignments from "Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./." to "Baltimore hat eine prächtig gedeihende Musikszene ."; an unaligned word receives NOUN (most frequent tag). Yarowsky and Ngai (2001)
Bilingual Projection. Baseline 1: direct projection. Projected tags: Baltimore/NOUN hat/VERB eine/DET prächtig/NOUN gedeihende/ADJ Musikszene/NOUN ./. These, plus more projected tagged sentences, are used for supervised training of a tagger (Brants, 2000). Yarowsky and Ngai (2001)
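Direct projection can be sketched as copying each source tag across its alignment link and giving unaligned target words the default tag NOUN. The alignment pairs below are an assumption reconstructed from the example (with "prächtig" left unaligned), and `project_tags` is a hypothetical helper name.

```python
def project_tags(source_tags, target_len, alignments, default="NOUN"):
    """Copy source tags to aligned target positions.

    `alignments` holds (source_index, target_index) pairs. Unaligned
    target words keep `default`. If several source words align to one
    target word, the last pair wins (a simplification in this sketch).
    """
    projected = [default] * target_len
    for src_i, tgt_j in alignments:
        projected[tgt_j] = source_tags[src_i]
    return projected

english_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]
german = ["Baltimore", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."]
# Assumed alignments: "music" and "scene" both link to "Musikszene";
# nothing links to "prächtig" (index 3).
alignments = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 5), (6, 6)]

projected = project_tags(english_tags, len(german), alignments)
```

Under these assumed alignments the unaligned word "prächtig" ends up tagged NOUN, matching the projected sentence on the slide.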
Bilingual Projection. Baseline 1: direct projection. Result chart (EM-HMM, Feature-HMM, Direct projection): consistent improvements over unsupervised models. Yarowsky and Ngai (2001)
Bilingual Projection. Baseline 2: lexicon projection. Source: Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./. Target: Baltimore hat eine prächtig gedeihende Musikszene .
Bilingual Projection. Baseline 2: lexicon projection. Ignore the unaligned word (prächtig).
Bilingual Projection. Baseline 2: lexicon projection. Bag of alignments: the aligned target words Baltimore, hat, eine, gedeihende, Musikszene and . each collect the tags of the source words aligned to them.
Bilingual Projection. Baseline 2: lexicon projection. Further alignments add tags to the bag: eine is also aligned to "one", tagged PRON and NUM, and gedeihende is also aligned to "thriving", tagged VERB.
Bilingual Projection. Baseline 2: lexicon projection. After scanning all the parallel data, each target word (Baltimore, hat, eine, gedeihende, Musikszene, .) has a tag distribution: the probability of a tag given a word.
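One way to turn the bag of alignments into a projected lexicon is to count the tags that reach each target word and normalize the counts. Relative-frequency estimation is an assumption here, since the slide only says the result is the probability of a tag given a word; the counts and the helper name are made up.

```python
from collections import Counter, defaultdict

def build_projected_lexicon(aligned_pairs):
    """Estimate p(tag | target word) from (target_word, source_tag) pairs."""
    counts = defaultdict(Counter)
    for target_word, source_tag in aligned_pairs:
        counts[target_word][source_tag] += 1
    lexicon = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        lexicon[word] = {tag: n / total for tag, n in tag_counts.items()}
    return lexicon

# Toy bag of alignments (made-up counts).
bag = [("eine", "DET"), ("eine", "DET"), ("eine", "NUM"), ("eine", "PRON"),
       ("gedeihende", "ADJ"), ("gedeihende", "VERB"),
       ("Musikszene", "NOUN")]

lexicon = build_projected_lexicon(bag)
```

Unaligned words such as "prächtig" never enter the bag, so they get no entry at all.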
Bilingual Projection. Baseline 2: lexicon projection. Feature-HMM constrained with the projected dictionary. Result chart (EM-HMM, Feature-HMM, Direct projection, Projected Dictionary): improvements over simple projection for the majority of the languages.
No information about unaligned words. Source: Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./. Target: Baltimore hat eine prächtig gedeihende Musikszene . Can coverage be improved? Idea: projected lexicon expansion and refinement using label propagation.
How can label propagation help? Our Model: Graph-Based Projections. For a language: • Build graph over 2M trigram types as vertices • Compute similarity matrix using co-occurrence statistics • Label distribution at each vertex: tag distribution over the trigram’s middle word. Subramanya, Petrov and Pereira (2010)
Example Graph in German. Trigram vertices: gutem Essen zugetan | ist wichtig bei | zum Essen niederlassen | ist gut bei | fuers Essen drauf | ist fein bei | schlechtes Essen und | 1000 Essen pro | ist lebhafter bei | zu realisieren , | zu erreichen , | zu stecken , | zu essen ,
Example Graph in German. Same trigram vertices, with the labels NOUN (next to "gutem Essen zugetan") and VERB (next to "zu essen ,").
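The similarity graph over trigram types can be sketched as follows: describe each trigram by counts of context features, compare trigrams pairwise, and keep only the closest neighbours. Cosine similarity, k-nearest-neighbour pruning, the feature names, and all counts below are assumptions made for illustration; the slides only state that similarity comes from co-occurrence statistics.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def nearest_neighbours(vectors, k):
    """For every vertex, keep its k most similar other vertices."""
    graph = {}
    for a in vectors:
        sims = [(cosine(vectors[a], vectors[b]), b) for b in vectors if b != a]
        graph[a] = sorted(sims, reverse=True)[:k]
    return graph

# Hypothetical context-feature counts for three trigram types.
vectors = {
    "ist gut bei": {"left=Essen": 3, "right=uns": 2},
    "ist fein bei": {"left=Essen": 2, "right=uns": 2},
    "zu essen ,": {"left=viel": 4, "right=aber": 1},
}

graph = nearest_neighbours(vectors, k=1)
```

With these toy counts, "ist gut bei" and "ist fein bei" become each other's nearest neighbour, while "zu essen ," shares no context features with either.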
How can label propagation help? Our Model: Graph-Based Projections. For a target language: • Build graph over 2M trigram types as vertices • Compute similarity matrix using co-occurrence statistics • Label distribution at each vertex: tag distribution over the trigram’s middle word • Plug in auto-tagged words from a source language • Links between source and target language units are word alignments
Bilingual Graph. Auto-tagged English words (important/ADJ, nicely/ADV, good/ADJ, fine/ADJ, food/NOUN, eating/VERB, eat/VERB, eat/VERB) are linked by word alignments to the German trigram vertices: gutem Essen zugetan | ist wichtig bei | zum Essen niederlassen | ist gut bei | fuers Essen drauf | ist fein bei | schlechtes Essen und | 1000 Essen pro | ist lebhafter bei | zu realisieren , | zu erreichen , | zu stecken , | zu essen ,
How can label propagation help? Our Model: Graph-Based Projections. For a target language: • Plug in auto-tagged words from a source language • Links between source and target language units are word alignments • Run first stage of label propagation: source language → target language
First Stage of Label Propagation. Figure: the bilingual graph, with English words important/ADJ, nicely/ADV, good/ADJ, fine/ADJ, food/NOUN, eating/VERB, eat/VERB, eat/VERB and German trigram vertices gutem Essen zugetan | ist wichtig bei | zum Essen niederlassen | ist gut bei | fuers Essen drauf | ist fein bei | schlechtes Essen und | 1000 Essen pro | ist lebhafter bei | zu realisieren , | zu erreichen , | zu stecken , | zu essen ,
How can label propagation help? Our Model: Graph-Based Projections. For a target language: • Plug in auto-tagged words from a source language • Links between source and target language units are word alignments • Run first stage of label propagation: source language → target language • Run second stage of label propagation: within target language vertices, using a graph objective function with squared penalties
Second Stage of Label Propagation. Figure: the bilingual graph, with English words important/ADJ, nicely/ADV, good/ADJ, fine/ADJ, food/NOUN, eating/VERB, eat/VERB, eat/VERB and German trigram vertices gutem Essen zugetan | ist wichtig bei | zum Essen niederlassen | ist gut bei | fuers Essen drauf | ist fein bei | schlechtes Essen und | 1000 Essen pro | ist lebhafter bei | zu realisieren , | zu erreichen , | zu stecken , | zu essen ,
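Label propagation within the target-language vertices can be sketched as an iterative averaging scheme. The version below is a generic squared-penalty formulation and not necessarily the exact objective used in the talk: each unlabeled vertex is pulled toward its neighbours' tag distributions and, with weight `nu`, toward the uniform distribution, while seed vertices stay fixed. The edge weights, `nu`, the iteration count, and the seed are all illustrative assumptions.

```python
def propagate(neighbours, seeds, tags, nu=0.1, iterations=10):
    """Iterative label propagation with squared penalties (sketch).

    neighbours: vertex -> list of (neighbour_vertex, edge_weight)
    seeds: vertex -> fixed tag distribution covering every tag
    Each update is the closed-form minimizer, for one vertex with its
    neighbours held fixed, of
      sum_j w_ij * ||q_i - q_j||^2 + nu * ||q_i - uniform||^2.
    """
    uniform = 1.0 / len(tags)
    q = {v: dict(seeds.get(v, {t: uniform for t in tags})) for v in neighbours}
    for _ in range(iterations):
        new_q = {}
        for v, edges in neighbours.items():
            if v in seeds:  # seed vertices keep their distribution
                new_q[v] = q[v]
                continue
            total = nu + sum(w for _, w in edges)
            new_q[v] = {
                t: (nu * uniform + sum(w * q[u][t] for u, w in edges)) / total
                for t in tags
            }
        q = new_q
    return q

tags = ["NOUN", "VERB"]
# Hypothetical chain of three trigram vertices with unit edge weights.
neighbours = {
    "gutem Essen zugetan": [("zum Essen niederlassen", 1.0)],
    "zum Essen niederlassen": [("gutem Essen zugetan", 1.0),
                               ("fuers Essen drauf", 1.0)],
    "fuers Essen drauf": [("zum Essen niederlassen", 1.0)],
}
seeds = {"gutem Essen zugetan": {"NOUN": 1.0, "VERB": 0.0}}

q = propagate(neighbours, seeds, tags)
```

In this toy chain the NOUN label of the seed vertex spreads to "fuers Essen drauf" even though that vertex has no direct link to the seed, which is how propagation can extend coverage to words that were never aligned.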