Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Shay B. Cohen, Dipanjan Das, Noah A. Smith (Carnegie Mellon University)
July 27, EMNLP 2011
Goal: learn linguistic structure for a language without any labeled data in that language. Running example, "The Skibo Castle is close by .": part-of-speech tagging assigns the coarse tags DET NOUN NOUN VERB ADJ ADP (and . for the punctuation); dependency parsing attaches the words to their syntactic heads.
Multilingual Unsupervised Learning. Prior work varies along two axes: whether parallel data is used, and whether guidance comes from supervision in source language(s) or from joint learning over multiple languages.
Using parallel data, supervision in source language(s): Yarowsky and Ngai (2001), Xi and Hwa (2005), Smith and Eisner (2009), Das and Petrov (2011), McDonald et al. (2011)
Using parallel data, joint learning for multiple languages: Snyder et al. (2009)
No parallel data (hard), joint learning for multiple languages: Cohen and Smith (2009), Berg-Kirkpatrick and Klein (2010), Naseem et al. (2010)
No parallel data (hard), supervision in source language(s): this work!
In a Nutshell. Annotated data in helper languages (e.g., Spanish and Italian) yields coarse, universal parameters for each helper. These are interpolated, with mixture weights estimated by unsupervised training on unlabeled Portuguese data, into coarse parameters for Portuguese. A coarse-to-fine expansion of those parameters then initializes monolingual unsupervised training in Portuguese, producing the final Portuguese parameters.
Assumptions for a given problem. 1. The underlying model is generative and composed of multinomial distributions: for example, an HMM over "The Skibo Castle is close by" for POS tagging (Merialdo, 1994), or the DMV over the coarse tag sequence DET NOUN NOUN VERB ADJ ADP, headed by ROOT, for dependency parsing (Klein and Manning, 2004).
In general, an unlexicalized parameter is written $\theta_{k,i}$, the probability of the $i$th event of the $k$th multinomial in the model, e.g. the transition from ADJ to NOUN in an HMM. The lexicalized parameters take a similar form (the DMV has no lexicalized parameters). The probability of a derivation is a product of these parameters, unlexicalized and lexicalized, each raised to $f_{k,i}(\mathbf{x},\mathbf{y})$, the number of times event $i$ of multinomial $k$ fires in the derivation:
$p(\mathbf{x},\mathbf{y}) = \prod_k \prod_i \theta_{k,i}^{\,f_{k,i}(\mathbf{x},\mathbf{y})}$
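As a toy illustration of this factorization, the following sketch scores one tagged analysis of the running example under a bigram HMM; the tag inventory follows the slides, but every probability value and the helper code are invented for illustration.

import math
from collections import Counter

# Toy bigram-HMM multinomials over coarse tags (all values are illustrative;
# initial-state probabilities are omitted for brevity).
transition = {("DET", "NOUN"): 0.6, ("NOUN", "NOUN"): 0.3, ("NOUN", "VERB"): 0.4,
              ("VERB", "ADJ"): 0.3, ("ADJ", "ADP"): 0.2}
emission = {("DET", "The"): 0.3, ("NOUN", "Skibo"): 0.001, ("NOUN", "Castle"): 0.01,
            ("VERB", "is"): 0.4, ("ADJ", "close"): 0.05, ("ADP", "by"): 0.3}

def log_prob(words, tags):
    """log p(words, tags): each multinomial event contributes its log-parameter
    times the number of times the event fires in the derivation."""
    events = Counter()
    for prev, cur in zip(tags, tags[1:]):   # unlexicalized (transition) events
        events[("trans", prev, cur)] += 1
    for tag, word in zip(tags, words):      # lexicalized (emission) events
        events[("emit", tag, word)] += 1
    total = 0.0
    for (kind, a, b), count in events.items():
        theta = transition[(a, b)] if kind == "trans" else emission[(a, b)]
        total += count * math.log(theta)
    return total

print(log_prob("The Skibo Castle is close by".split(),
               ["DET", "NOUN", "NOUN", "VERB", "ADJ", "ADP"]))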
2. A coarse, universal part-of-speech tagset. For each language $\ell$, there is a mapping from that language's treebank tagset to the coarse tagset.
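For concreteness, a small illustrative fragment of such a mapping for the English Penn Treebank tagset (only a handful of tags shown; the real tables cover the full tagset).

# Illustrative fragment of a fine-to-coarse tag mapping for English
# Penn Treebank tags (the real mapping covers the whole tagset).
EN_TO_COARSE = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ",  "JJR": "ADJ",  "JJS": "ADJ",
    "DT": "DET",  "IN": "ADP",
}

def to_coarse(fine_tags, mapping=EN_TO_COARSE):
    return [mapping[t] for t in fine_tags]

print(to_coarse(["DT", "NNP", "NNP", "VBZ", "JJ", "IN"]))
# ['DET', 'NOUN', 'NOUN', 'VERB', 'ADJ', 'ADP']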
3. Helper languages with treebanks. For each helper language: convert its treebank to the coarse tagset, then estimate its coarse unlexicalized parameters by maximum likelihood (MLE).
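A minimal sketch of this MLE step for one kind of unlexicalized parameter (HMM transitions), assuming the treebank sentences have already been converted to coarse tags; the data and function name are illustrative.

from collections import Counter, defaultdict

def mle_transitions(coarse_tagged_sentences):
    """Relative-frequency (MLE) estimates of HMM transition multinomials
    from a treebank already converted to coarse tags."""
    counts = defaultdict(Counter)
    for tags in coarse_tagged_sentences:
        for prev, cur in zip(tags, tags[1:]):
            counts[prev][cur] += 1
    return {prev: {cur: c / sum(nxt.values()) for cur, c in nxt.items()}
            for prev, nxt in counts.items()}

# Toy "coarse treebank" for one helper language (two sentences).
helper = [["DET", "NOUN", "VERB", "ADJ", "ADP"],
          ["DET", "NOUN", "NOUN", "VERB", "ADJ"]]
print(mle_transitions(helper)["NOUN"])   # e.g. {'VERB': 0.666..., 'NOUN': 0.333...}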
Multilingual Modeling
For a target language, each unlexicalized multinomial is a mixture of the helper languages' versions of it:
$\theta_k = \sum_{\ell} \beta_{\ell,k}\,\theta_k^{(\ell)}$
where $\beta_{\ell,k}$ is the mixture weight of the $\ell$th helper language for the $k$th multinomial, and $\theta_k^{(\ell)}$ is the $k$th multinomial in the model (say, the transitions from the ADJ tag in an HMM) as estimated from helper language $\ell$.
For example, with two helper languages, Spanish and Italian, the target's ADJ transition distribution is a weighted combination of the Spanish and Italian ADJ transition distributions, with mixture weights of, say, 0.7 and 0.3. In practice the mixture weights are unknown and must be learned.
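A sketch of the interpolation for a single multinomial (the ADJ transition distribution) with two helper languages; all probability values here are invented.

# Interpolating one multinomial (the ADJ transition distribution) from two
# helper languages with per-multinomial mixture weights (values illustrative).
adj_spanish = {"NOUN": 0.5, "VERB": 0.2, "ADP": 0.3}
adj_italian = {"NOUN": 0.6, "VERB": 0.1, "ADP": 0.3}

def mix(multinomials, weights):
    """theta_k = sum_l beta_{l,k} * theta^{(l)}_k ; weights must sum to 1."""
    events = set().union(*multinomials)
    return {e: sum(w * m.get(e, 0.0) for w, m in zip(weights, multinomials))
            for e in events}

print(mix([adj_spanish, adj_italian], [0.7, 0.3]))
# e.g. {'NOUN': 0.53, 'VERB': 0.17, 'ADP': 0.3}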
Learning and Inference
In normal (monolingual) learning, the likelihood of the unlabeled data is maximized over all of the model's parameters. In multilingual learning, the helper languages' parameters $\theta_k^{(\ell)}$ are fixed: only the mixture coefficients $\beta$ (and the model's lexicalized parameters, where it has any) are estimated.
Multilingual learning uses EM: the E-step computes the expected number of times each $\theta_{k,i}^{(\ell)}$ is used in a derivation, and the M-step re-estimates the mixture coefficients $\beta$ from those expected usage counts.
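The slide only sketches the update, so the code below shows one standard way to fit the mixture weights of a single multinomial by EM, treating the choice of helper language as a latent variable; the details are an assumption rather than the paper's exact derivation, and all numbers are toy values.

def em_update_beta(helper_multinomials, beta, expected_counts):
    """One EM-style update of the mixture weights for a single multinomial,
    treating the choice of helper language as latent: the E-step computes,
    for each event, a posterior over which helper produced it, and the
    M-step renormalizes the expected number of times each helper was used."""
    used = [0.0] * len(beta)
    for event, count in expected_counts.items():
        mixed = sum(b * m[event] for b, m in zip(beta, helper_multinomials))
        for lang, (b, m) in enumerate(zip(beta, helper_multinomials)):
            used[lang] += count * b * m[event] / mixed   # responsibility * count
    total = sum(used)
    return [u / total for u in used]

# Toy expected counts of ADJ-transition events (as produced by the outer E-step).
adj_spanish = {"NOUN": 0.5, "VERB": 0.2, "ADP": 0.3}
adj_italian = {"NOUN": 0.6, "VERB": 0.1, "ADP": 0.3}
beta = [0.5, 0.5]
counts = {"NOUN": 30.0, "VERB": 5.0, "ADP": 10.0}
for _ in range(20):
    beta = em_update_beta([adj_spanish, adj_italian], beta, counts)
print(beta)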
What about feature-rich generative models? Each multinomial can instead be parameterized as a locally normalized log-linear model (Berg-Kirkpatrick et al., 2010).
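In that parameterization, each multinomial entry is a softmax over features of the event (the notation here is generic, not necessarily the paper's):
$\theta_{k,i} = \dfrac{\exp\big(\mathbf{w}^{\top}\mathbf{g}(k,i)\big)}{\sum_{i'} \exp\big(\mathbf{w}^{\top}\mathbf{g}(k,i')\big)}$
so a single weight vector $\mathbf{w}$ is shared across events through the feature function $\mathbf{g}$.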
Returning to the example with Spanish and Italian as helpers: after learning, the mixture weights of the ADJ transition multinomial come out to, e.g., 0.6237 and 0.3763.
Coarse-to-fine expansion (illustrated for English). Step 1: each coarse multinomial is copied identically to the fine tags that map to it, e.g. the ADJ transition distribution becomes the JJ, JJR, and JJS transition distributions. Step 2: the probability mass of each coarse event is divided equally among the corresponding new, fine events, and the result is used as the initializer for monolingual unsupervised training.
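One plausible reading of this expansion as code; the equal-split rule and the toy tagsets below are assumptions based on the slide, not the paper's exact recipe.

def coarse_to_fine(coarse_transitions, fine_to_coarse):
    """Expand coarse transition multinomials to a fine tagset: each fine tag
    receives an identical copy of its coarse tag's distribution (step 1), and
    the mass of each coarse target is split equally among its fine tags (step 2)."""
    fines_of = {}
    for fine, coarse in fine_to_coarse.items():
        fines_of.setdefault(coarse, []).append(fine)
    expanded = {}
    for fine_src, coarse_src in fine_to_coarse.items():
        row = {}
        for coarse_tgt, p in coarse_transitions[coarse_src].items():
            for fine_tgt in fines_of[coarse_tgt]:
                row[fine_tgt] = p / len(fines_of[coarse_tgt])  # equal division
        expanded[fine_src] = row
    return expanded

fine_to_coarse = {"JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ", "NN": "NOUN", "NNS": "NOUN"}
coarse = {"ADJ": {"NOUN": 0.8, "ADJ": 0.2}, "NOUN": {"NOUN": 0.3, "ADJ": 0.7}}
print(coarse_to_fine(coarse, fine_to_coarse)["JJ"])
# the coarse ADJ row copied to JJ: NOUN mass 0.8 split over NN/NNS, ADJ mass 0.2 over JJ/JJR/JJS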
Experiments
Two Problems
• Unsupervised part-of-speech tagging: feature-based HMM (Berg-Kirkpatrick et al., 2010), learned with L-BFGS
• Unsupervised dependency parsing: DMV (Klein and Manning, 2004), learned with EM
Languages. Target languages: Bulgarian, Danish, Dutch, Greek, Japanese, Portuguese, Slovene, Spanish, Swedish, and Turkish. Helper languages: English, German, Italian, and Czech (CoNLL treebanks from 2006 and 2007).
Results: POS Tagging (without tag dictionary). [Charts comparing the full model, a variant with uniform mixture parameters (no learning), and the monolingual baseline of Berg-Kirkpatrick et al. (2010).]
Results: Dependency Parsing. [Charts comparing against phylogenetic grammar induction (Berg-Kirkpatrick and Klein, 2010), posterior regularization (Gillenwater et al., 2010), and monolingual EM (Klein and Manning, 2004), and comparing variants of this approach: learned vs. uniform mixture parameters, with and without coarse-to-fine expansion followed by monolingual learning.]
Analyzing with Principal Component Analysis. [Plot of the projection onto two principal components.]
From Words to Dependencies
Use the induced tags to induce dependencies, either in a pipeline or by using the posteriors over tags in a sausage lattice (Cohen and Smith, 2007).
Joint decoding: the DMV parses a lattice in which each position carries the tagger's posterior distribution over coarse tags, e.g. for "The Skibo Castle": The (DET 0.95, ADJ 0.03, NOUN 0.02), Skibo (DET 0.0, ADJ 0.3, NOUN 0.7), Castle (DET 0.01, ADJ 0.1, NOUN 0.89).
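One way to write the objective that joint decoding maximizes; the exact form of the combination is an assumption here, since the slide only shows the DMV parsing a posterior-weighted sausage lattice:
$(\hat{\mathbf{t}}, \hat{\mathbf{y}}) = \arg\max_{\mathbf{t},\mathbf{y}} \; p_{\mathrm{DMV}}(\mathbf{t}, \mathbf{y}) \prod_i q(t_i \mid \mathbf{x})$
where $q(t_i \mid \mathbf{x})$ is the tagger's posterior over the tag at position $i$ (the lattice weight) and $\mathbf{y}$ ranges over dependency trees.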
Results: Words to Dependencies. [Charts of attachment accuracy.] The best average result with gold tags is 62.2. Interestingly, automatically induced tags perform better than gold tags for Turkish and Slovene.
Conclusions
• Improvements for two major tasks using non-parallel multilingual guidance
• In general, gains for grammar induction are larger than for POS tagging
• Joint POS tagging and dependency parsing performs surprisingly well
• For a few languages, results are better than using gold tags
• Joint decoding performs better than a pipeline
Questions?
Additional results: POS tagging without and with a tag dictionary. [Charts.]