Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Dipanjan Das (Carnegie Mellon University), Slav Petrov (Google Research)
ACL 2011, June 21

Part-of-Speech Tagging
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
(Nearly) Universal Part-of-Speech Tags See Petrov, Das and McDonald (2011)
(Nearly) Universal Part-of-Speech Tags
Example Penn Treebank tag maps:
NN → NOUN, NNP → NOUN, NNPS → NOUN, NNS → NOUN, PRP → PRON, PRP$ → PRON, WP → PRON, WP$ → PRON
Example Spanish Treebank tag maps:
p0 → PRON, pd → PRON, pe → PRON, pi → PRON, pn → PRON, pp → PRON, pr → PRON, pt → PRON, px → PRON, np → NOUN, nc → NOUN
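As a minimal sketch, the Penn Treebank examples above can be held in a lookup table. Only the eight pairs listed on the slide are included; the function name `to_universal` and the fallback tag `X` for unlisted tags are assumptions made for illustration, not part of the published mapping.

```python
# Partial map: only the Penn Treebank examples shown on the slide.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNP": "NOUN", "NNPS": "NOUN", "NNS": "NOUN",
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
}


def to_universal(tagged, mapping=PTB_TO_UNIVERSAL, unknown="X"):
    """Map (word, fine_tag) pairs to (word, coarse_tag) pairs.

    `unknown` is a hypothetical fallback for tags missing from this partial map.
    """
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged]


print(to_universal([("Portland", "NNP"), ("scene", "NN"), ("who", "WP")]))
```

A complete map would need one entry for every tag of the source treebank; the fallback only keeps this partial sketch from raising a `KeyError`.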
(Nearly) Universal Part-of-Speech Tags
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland/NOUN hat/VERB eine/DET prächtig/ADJ gedeihende/ADJ Musikszene/NOUN ./.
পোর্টল্যান্ড/NOUN শহর/NOUN এর/ADP সঙ্গীত/NOUN পরিবেশ/NOUN বেশ/ADJ উন্নত/ADJ |/.
Supervised training data available for ~20 languages.
Supervised Universal POS Tagging
TnT (Brants, 2000) generalizes well in the supervised setting: average accuracy is 96.2%.
Resource-Poor Languages
Several major languages have little or no annotated data, e.g.:
Language            Native speakers
Punjabi             109 million
Vietnamese          69 million
Polish              40 million
Indonesian-Malay    37 million
Oriya               32 million
Azerbaijani         20 million
Haitian             7.7 million
See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size
However, lots of parallel and unannotated data are available!
Basic NLP tools like POS tagging are essential for the development of language technologies.
Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Observation sequence (words): Portland hat eine prächtig gedeihende Musikszene .
State sequence (tags): ? ? ? ? ? ? ? (each state is one of the 12 coarse tags)
Parameters: transition multinomials (between adjacent states) and emission multinomials (from a state to its word)
[Chart: EM-HMM] Poor average result
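To make the roles of the transition and emission multinomials concrete, here is a small sketch of Viterbi decoding for an HMM tagger. It does not implement EM estimation; it only shows how a tag sequence would be read off once the multinomials are available. The two-tag, two-word model and all probabilities are invented for illustration.

```python
import math


def viterbi(obs, init, trans, emit):
    """Return the most likely state sequence for a list of word indices.

    init[k]     : P(first tag = k)
    trans[j][k] : P(next tag = k | current tag = j)   (transition multinomials)
    emit[k][w]  : P(word = w | tag = k)               (emission multinomials)
    """
    states = range(len(init))
    score = [[math.log(init[k]) + math.log(emit[k][obs[0]]) for k in states]]
    back = []
    for w in obs[1:]:
        prev = score[-1]
        row, ptr = [], []
        for k in states:
            # Best previous state for reaching state k at this position.
            best_j = max(states, key=lambda j: prev[j] + math.log(trans[j][k]))
            ptr.append(best_j)
            row.append(prev[best_j] + math.log(trans[best_j][k]) + math.log(emit[k][w]))
        score.append(row)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda k: score[-1][k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy model (invented numbers): tags 0 = NOUN, 1 = VERB; words 0 = Portland, 1 = hat.
init = [0.8, 0.2]
trans = [[0.3, 0.7], [0.6, 0.4]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi([0, 1], init, trans, emit))  # [0, 1], i.e. NOUN VERB
```

In the unsupervised setting the same three tables would be learned from raw text with EM rather than written by hand.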
Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models (Berg-Kirkpatrick et al., 2010)
Emission multinomials are parameterized with features of the word: suffix, hyphen, capital letters, numbers, ...
Estimated using gradient-based methods
[Chart: EM-HMM vs. Feature-HMM] Improvements across all languages
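A locally normalized log-linear emission can be sketched as a softmax over the vocabulary, where each word is scored by the summed weights of its active features. The feature templates below only echo the ones named on the slide (suffix, hyphen, capital letters, numbers); their exact form, the function names and the weight values are assumptions for illustration and are not taken from Berg-Kirkpatrick et al. (2010).

```python
import math


def word_features(word):
    """Hypothetical feature templates: identity, 2-char suffix, hyphen, capital, digit."""
    feats = ["word=" + word.lower(), "suffix2=" + word[-2:].lower()]
    if "-" in word:
        feats.append("has_hyphen")
    if word[:1].isupper():
        feats.append("capitalized")
    if any(ch.isdigit() for ch in word):
        feats.append("has_digit")
    return feats


def emission_probs(tag, vocab, weights):
    """P(word | tag): softmax over the vocabulary of summed feature weights."""
    scores = {w: sum(weights.get((tag, f), 0.0) for f in word_features(w)) for w in vocab}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {w: math.exp(s - top) / z for w, s in scores.items()}


vocab = ["Portland", "hat", "eine", "Musikszene"]
weights = {("NOUN", "capitalized"): 2.0, ("VERB", "suffix2=at"): 1.5}  # invented weights
probs = emission_probs("NOUN", vocab, weights)
print(probs)
```

Because the weights are shared across words, a feature such as capitalization lets probability mass move to unseen capitalized words as well, which a plain multinomial cannot do.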
Unsupervised POS Tagging with Dictionaries
Hidden Markov Model (HMM) with locally normalized log-linear models
State space constrained by possible gold tags
[Figure: each word of "Portland hat eine prächtig gedeihende Musikszene ." is shown with its set of possible tags, drawn from PRON, DET, ADJ, NUM, ADV, NOUN, VERB and "."]
[Chart: EM-HMM vs. Feature-HMM vs. Feature-HMM w/ gold dictionary] Average result close to supervised accuracy!
Morphologically rich languages only have base forms in dictionaries.
For most languages, access to high-quality tag dictionaries is not realistic.
• Ideas:
  • Use supervision in resource-rich languages
  • Use translated data
  • Construct projected tag lexicons
Bilingual Projection
English, with automatic labels from a supervised tagger (97% accuracy): Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
German: Portland hat eine prächtig gedeihende Musikszene .
Automatic unsupervised alignments from translation data (available for more than 50 languages)
Bilingual Projection
Idea 1: direct projection (Yarowsky and Ngai, 2001)
Tags are copied across the alignments; an unaligned word receives NOUN (most frequent tag).
Portland/NOUN hat/VERB eine/DET prächtig/NOUN gedeihende/ADJ Musikszene/NOUN ./. (prächtig is the unaligned word)
This sentence, plus more projected tagged sentences, serves as supervised training data for a tagger (Brants, 2000).
[Chart: EM-HMM vs. Feature-HMM vs. Direct projection] Consistent improvements over unsupervised models
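Direct projection can be sketched in a few lines: copy each source tag to the target word it aligns with, and give unaligned target words the fallback tag NOUN. The alignment pairs below are a plausible guess for the running example rather than the output of a real aligner, and the function name `project_tags` is hypothetical.

```python
def project_tags(source_tags, target_len, alignments, default="NOUN"):
    """Project tags along (source_index, target_index) alignment pairs.

    Unaligned target positions keep `default`. If a target word has several
    alignments, the last one wins; that is a simplification of this sketch.
    """
    target_tags = [default] * target_len
    for src, tgt in alignments:
        target_tags[tgt] = source_tags[src]
    return target_tags


english_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]
german = ["Portland", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."]
# Assumed alignment: "prächtig" (index 3) is left unaligned,
# and both "music" and "scene" align to "Musikszene".
alignments = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 5), (6, 6)]
projected = project_tags(english_tags, len(german), alignments)
print(list(zip(german, projected)))
```

The fallback is where this approach loses accuracy: every unaligned word becomes a NOUN regardless of its actual category.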
Bilingual Projection
Idea 2: lexicon projection
Ignore unaligned words (here, prächtig) and treat the remaining links as a bag of alignments.
[Figure: the German words Portland, hat, eine, gedeihende, Musikszene and "." linked to the tagged English words they align with]
As more parallel data is scanned, a word collects further tags: eine also aligns with "one" tagged PRON and NUM, and gedeihende also aligns with "thriving" tagged VERB.
After scanning all the parallel data, each word has a tag distribution: the probability of a tag given a word.
Feature-HMM constrained with the projected dictionary
[Chart: EM-HMM, Feature-HMM, Direct projection, Projected Dictionary] Improvements over simple projection for the majority of the languages
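The projected lexicon amounts to counting and normalizing: every alignment link contributes one (target word, source tag) observation, and the per-word counts are normalized into a probability of a tag given a word. The sketch below assumes the links have already been extracted; the counts are invented and the name `projected_lexicon` is hypothetical.

```python
from collections import Counter, defaultdict


def projected_lexicon(aligned_pairs):
    """Turn (target_word, source_tag) observations into P(tag | word)."""
    counts = defaultdict(Counter)
    for word, tag in aligned_pairs:
        counts[word][tag] += 1
    lexicon = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        lexicon[word] = {tag: c / total for tag, c in tag_counts.items()}
    return lexicon


# Invented counts for two German word types; unaligned words never appear here.
pairs = (
    [("eine", "DET")] * 6 + [("eine", "NUM")] * 3 + [("eine", "PRON")]
    + [("gedeihende", "ADJ")] * 3 + [("gedeihende", "VERB")]
)
lexicon = projected_lexicon(pairs)
print(lexicon["eine"])
print(lexicon["gedeihende"])
```

A word that is never aligned, such as prächtig in the running example, gets no entry at all, which is the coverage gap the next slides address.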
No information about unaligned words
Can coverage be improved?
Idea: projected lexicon expansion and refinement using a lot of unlabeled data
Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data
Label Propagation (Zhu, Ghahramani and Lafferty, 2003)
[Figure: labeled and unlabeled datapoints joined by weighted edges (0.9, 0.1, 0.01, 0.9, 0.8); the edge weights form a symmetric weight matrix]
Labeled datapoints carry supervised label distributions; the distributions at the unlabeled datapoints are to be found.
The objective is defined over the set of distributions over unlabeled vertices:
• one term brings the distributions of similar vertices closer
• another term brings the distributions of uncertain neighborhoods close to the uniform distribution over the label set
Iterative updates for optimization
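One common form of the iterative update can be sketched as follows: each unlabeled vertex takes the weighted average of its neighbors' distributions, pulled toward the uniform distribution by a small constant. The exact objective on the slides did not survive extraction, so this is an assumed variant; the parameter `nu`, the toy graph and the function name `propagate` are all illustrative.

```python
def propagate(weights, labeled, num_labels, nu=0.01, iterations=50):
    """Iterative label propagation with a uniform-distribution regularizer.

    weights : dict vertex -> {neighbor: weight}, assumed symmetric
    labeled : dict vertex -> fixed label distribution (list of floats)
    nu      : assumed strength of the pull toward the uniform distribution
    """
    uniform = 1.0 / num_labels
    q = {v: list(labeled.get(v, [uniform] * num_labels)) for v in weights}
    for _ in range(iterations):
        new_q = {}
        for v, nbrs in weights.items():
            if v in labeled:
                new_q[v] = q[v]  # labeled vertices stay fixed
                continue
            denom = nu + sum(nbrs.values())
            new_q[v] = [
                (sum(w * q[u][y] for u, w in nbrs.items()) + nu * uniform) / denom
                for y in range(num_labels)
            ]
        q = new_q
    return q


# Toy graph: "a" and "b" are labeled; "c" is strongly tied to "a"; "d" is weakly tied to "c".
weights = {
    "a": {"c": 0.9},
    "b": {"c": 0.1},
    "c": {"a": 0.9, "b": 0.1, "d": 0.01},
    "d": {"c": 0.01},
}
labeled = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
q = propagate(weights, labeled, num_labels=2)
print(q["c"], q["d"])
```

In this toy graph, vertex "c" ends up close to the label of its strong neighbor "a", while the weakly connected vertex "d" stays nearer to uniform, matching the two effects described above.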
Idea 3: Graph-Based Projections
How can label propagation help?
• For a language:
  • Build a graph over a lot of trigram types as vertices
  • Compute the similarity matrix using co-occurrence statistics
  • Label distribution at each vertex: tag distribution over the trigram's middle word
Subramanya, Petrov and Pereira (2010)
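The graph construction outlined above can be sketched by collecting a small context vector per trigram type and comparing vectors with cosine similarity. The slide only says "co-occurrence statistics", so the specific features (left word, right word, middle word) and the choice of cosine are assumptions for illustration, and the three sentences are made up.

```python
import math
from collections import Counter, defaultdict


def trigram_context_vectors(sentences):
    """Count, per trigram type, its left neighbor, right neighbor and middle word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 3):
            trigram = tuple(padded[i:i + 3])
            vectors[trigram]["left=" + padded[i - 1]] += 1
            vectors[trigram]["right=" + padded[i + 3]] += 1
            vectors[trigram]["center=" + padded[i + 1]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[f] * b[f] for f in a if f in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up toy corpus.
sentences = [
    ["das", "ist", "gut", "bei", "uns"],
    ["das", "ist", "fein", "bei", "uns"],
    ["wir", "zu", "essen", ",", "und"],
]
vectors = trigram_context_vectors(sentences)
sim_close = cosine(vectors[("ist", "gut", "bei")], vectors[("ist", "fein", "bei")])
sim_far = cosine(vectors[("ist", "gut", "bei")], vectors[("zu", "essen", ",")])
print(sim_close, sim_far)
```

Trigrams that occur in the same contexts receive a high edge weight, so label information can flow between them even when their middle words differ.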
Example Graph in German
[Figure: similarity graph over German trigram types]
Trigrams with middle word Essen (NOUN): gutem Essen zugetan; zum Essen niederlassen; fuers Essen drauf; schlechtes Essen und; 1000 Essen pro
Trigrams of the form "ist _ bei": ist wichtig bei; ist gut bei; ist fein bei; ist lebhafter bei
Trigrams of the form "zu _ ," (VERB): zu realisieren ,; zu essen ,; zu stecken ,; zu erreichen ,
Idea 3: Graph-Based Projections
How can label propagation help?
• For a target language:
  • Build a graph over 2M trigram types as vertices
  • Compute the similarity matrix using co-occurrence statistics
  • Label distribution at each vertex: tag distribution over the trigram's middle word
• Plug in auto-tagged words from a source language
• Links between source and target language units are word alignments
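Bilingual Graph
[Figure: the German trigram graph extended with auto-tagged English words, linked to the German trigrams by word alignments]
English words and their tags: important (ADJ), nicely (ADV), good (ADJ), fine (ADJ), food (NOUN), eating (VERB), eat (VERB), eat (VERB)
German trigrams: gutem Essen zugetan; zum Essen niederlassen; fuers Essen drauf; schlechtes Essen und; 1000 Essen pro; ist wichtig bei; ist gut bei; ist fein bei; ist lebhafter bei; zu realisieren ,; zu essen ,; zu stecken ,; zu erreichen ,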