
Translation Models: Taking Translation Direction into Account


Presentation Transcript


1. Translation Models: Taking Translation Direction into Account
Gennadi Lembersky, Noam Ordan, Shuly Wintner
ISCOL, 2011

2. Statistical Machine Translation (SMT)
• Given a foreign sentence f: "Maria no dio una bofetada a la bruja verde"
• Find the most likely English translation e: "Maria did not slap the green witch"
• The most likely English translation e is given by arg max P(e|f), where P(e|f) is the conditional probability of e given f
• How to estimate P(e|f)? Noisy channel:
• Decompose P(e|f) into P(f|e) * P(e) / P(f)
• Estimate P(f|e) from a parallel corpus (translation model)
• Estimate P(e) from a monolingual corpus (language model)
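A toy illustration of the noisy-channel decision rule; all probabilities below are invented for the example, not estimated from real corpora:

```python
import math

# Candidate translations with toy model scores (made-up numbers).
p_f_given_e = {  # translation model P(f|e)
    "Maria did not slap the green witch": 0.04,
    "Maria not give a slap to the witch green": 0.06,
}
p_e = {  # language model P(e)
    "Maria did not slap the green witch": 0.002,
    "Maria not give a slap to the witch green": 0.00001,
}

# P(f) is constant across candidates, so
# arg max P(e|f) = arg max P(f|e) * P(e); we work in log space.
best = max(p_f_given_e,
           key=lambda e: math.log(p_f_given_e[e]) + math.log(p_e[e]))
print(best)  # -> "Maria did not slap the green witch"
```

Note how the language model overrides the slightly higher translation-model score of the disfluent candidate.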

3. Translation Model
• How to model P(f|e)? Learn the parameters of P(f|e) from a parallel corpus
• Translation model parameters are estimated at the phrase level:
• explicit modeling of word context
• captures local reorderings and local dependencies
• The IBM Models define how words in a source sentence can be aligned to words in a parallel target sentence; EM is used to estimate their parameters
• Aligned words are extended to phrases
• Result: a phrase table
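A minimal sketch of the standard phrase-pair extraction heuristic over a word alignment; it is simplified (for instance, it does not extend phrases over unaligned target words as the full algorithm does):

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Collect phrase pairs consistent with a word alignment.
    `alignment` is a set of (src_index, tgt_index) pairs."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions aligned to the source span [i1, i2].
            tps = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            # Consistency: nothing in the target span [j1, j2] may be
            # aligned to a source word outside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]),
                              " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "Maria no dio una bofetada".split()
tgt = "Maria did not slap".split()
align = {(0, 0), (1, 2), (2, 3), (3, 3), (4, 3)}
print(extract_phrases(src, tgt, align))
# [('Maria', 'Maria'), ('Maria no', 'Maria did not'), ('no', 'not'), ...]
```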

4. Log-Linear Models
• Log-linear models: score(e, f) = Σ_i λ_i h_i(e, f), where the h_i are feature functions and the λ_i are model parameters
• Typical feature functions: phrase translation probabilities, lexical translation probabilities, language model probability, reordering model
• Model parameters are estimated (tuned) by discriminative training, using the MERT algorithm (Och, 2003)
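A minimal log-linear scorer as a sketch; the feature values and weights are invented for illustration (in practice the weights come from MERT tuning):

```python
import math

def loglinear_score(features, weights):
    """Model score: sum over features of lambda_i * log(feature value)."""
    return sum(weights[name] * math.log(value)
               for name, value in features.items())

# Toy feature values for one translation hypothesis (made up).
hypothesis = {
    "phrase_translation": 0.03,
    "lexical_translation": 0.01,
    "language_model": 0.0004,
    "reordering": 0.2,
}
# Toy weights, as tuning might set them (also made up).
lambdas = {"phrase_translation": 1.0, "lexical_translation": 0.5,
           "language_model": 1.2, "reordering": 0.6}
print(loglinear_score(hypothesis, lambdas))
```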

5. Evaluation
• Human evaluation is not practical: too slow and costly
• Automatic evaluation is based on a human reference translation: the output of an MT system is compared to the human translation of the same set of sentences
• A metric essentially calculates the distance between the MT output and the reference translation
• Dozens of metrics have been developed; BLEU is the most popular, with METEOR and TER close behind
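A simplified sentence-level BLEU sketch (single reference, no smoothing), just to make the metric concrete:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Geometric mean of modified n-gram precisions times a brevity
    penalty; returns 0 when any precision is zero (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngrams = Counter(tuple(cand[i:i + n])
                           for i in range(len(cand) - n + 1))
        r_ngrams = Counter(tuple(ref[i:i + n])
                           for i in range(len(ref) - n + 1))
        overlap = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        if overlap == 0:
            return 0.0
        log_prec += math.log(overlap / sum(c_ngrams.values())) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * math.exp(log_prec)

print(bleu("Maria did not slap the green witch",
           "Maria did not slap the green witch"))  # -> 1.0
```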

6. Original vs. Translated Texts
Given this simplified model (diagram: the source text passes through the TM to produce the target text, with the LM applied on the target side), two points are made with regard to the "intermediate component" (TM and LM):
• The TM is blind to direction (but see Kurokawa et al., 2009)
• LMs are based on originally written texts

7. Original vs. Translated Texts
Translated texts are ontologically different from non-translated texts; they generally exhibit:
• Simplification of the message, the grammar, or both (Al-Shabab, 1996; Laviosa, 1998)
• Explicitation, the tendency to spell out implicit utterances that occur in the source text (Blum-Kulka, 1986)

8. Original vs. Translated Texts
• Translated texts can be distinguished from non-translated texts with high accuracy (87% and more):
• for Italian (Baroni & Bernardini, 2006)
• for Spanish (Ilisei et al., 2010)
• for English (Koppel & Ordan, 2011)

9. How Does Translation Direction Affect MT?
• Language models: our work (accepted to EMNLP) shows that LMs trained on translated texts are better for MT systems than LMs trained on original texts
• Translation models: Kurokawa et al. (2009) showed that when translating French into English it is better to use a parallel corpus that was translated from French into English, and vice versa
• This work supports that claim and extends it (in review for WMT)

10. Our Setup
• Canadian Hansard corpus: a parallel French-English corpus
• 80% original English (EO), 20% original French (FO); the source language of each sentence pair is marked
• Two scenarios:
• Balanced: 750K FO sentences and 750K EO sentences
• Biased: 750K FO sentences and 3M EO sentences
• The MOSES phrase-based SMT toolkit
• Tuning and evaluation: 1,000 FO sentences for tuning and 5,000 FO sentences for evaluation
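A sketch of how the two training scenarios could be assembled, assuming each parallel sentence pair carries an "origin" tag for its source language; the dict layout here is invented for illustration:

```python
import random

def build_scenario(corpus, n_fo, n_eo, seed=0):
    """Sample n_fo French-original and n_eo English-original pairs."""
    fo = [pair for pair in corpus if pair["origin"] == "FO"]
    eo = [pair for pair in corpus if pair["origin"] == "EO"]
    random.Random(seed).shuffle(fo)
    random.Random(seed).shuffle(eo)
    return fo[:n_fo] + eo[:n_eo]

# balanced = build_scenario(hansard, n_fo=750_000, n_eo=750_000)
# biased   = build_scenario(hansard, n_fo=750_000, n_eo=3_000_000)
```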

11. Baseline Experiments
We translate French to English:
• EO – train the phrase table on the EO portion of the parallel corpus
• FO – train the phrase table on the FO portion of the parallel corpus
• FO+EO – train the phrase table on the entire parallel corpus

  12. Baseline Results

13. SystemA: Two Phrase-Tables
• EO – train a phrase table on the EO portion of the parallel corpus
• FO – train a phrase table on the FO portion of the parallel corpus
• SystemA – let MOSES use both phrase tables
• Log-linear model training assigns the two phrase tables different weights
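A toy sketch of the idea behind SystemA: two phrase tables compete inside one log-linear model, and each table's translation feature can carry its own weight. The tables, probabilities, and weights below are invented, and Moses' actual multiple-table mechanics are more involved:

```python
import math

# Two invented phrase tables for one source phrase.
fo_table = {"bruja verde": {"green witch": 0.7}}
eo_table = {"bruja verde": {"green witch": 0.4, "witch green": 0.3}}

def table_scores(phrase, w_fo, w_eo):
    """Weighted log translation scores, keeping the best score per
    target phrase across the two tables."""
    out = {}
    for table, w in ((fo_table, w_fo), (eo_table, w_eo)):
        for tgt, p in table.get(phrase, {}).items():
            out[tgt] = max(out.get(tgt, float("-inf")), w * math.log(p))
    return out

print(table_scores("bruja verde", w_fo=1.0, w_eo=1.0))
# {'green witch': -0.356..., 'witch green': -1.203...}
```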

14. SystemA Results
• In the balanced scenario we gained 1.29 BLEU
• In the biased scenario we gained 0.69 BLEU
• The cost is in decoding time and memory

15. Looking Inside…
• Complete table – the phrase table as produced by training
• Filtered table – a phrase table that contains only phrases that appear in the evaluation set
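A sketch of producing a filtered table, mimicking what Moses' filter-model-given-input script does; lines are assumed to use the standard "src ||| tgt ||| scores" phrase-table format:

```python
def filter_table(table_lines, eval_sentences, max_len=7):
    """Keep only entries whose source side occurs in the eval set."""
    eval_phrases = set()
    for sent in eval_sentences:
        words = sent.split()
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                eval_phrases.add(" ".join(words[i:j]))
    return [line for line in table_lines
            if line.split(" ||| ")[0] in eval_phrases]
```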

16. A Few Observations / 1
• Balanced set / complete tables:
• the FO table has many more unique French phrases (15.8M vs. 13M)
• the EO table has more translation options per source phrase (1.42 vs. 1.33)
• the source phrases in the intersection are shorter (3.76 vs. 5.07-5.16 words) but have more translations (3.08-3.21 vs. 1.09-1.10)

17. A Few Observations / 2
• Balanced set / filtered tables:
• the intersection comprises 96.1% of the phrase pairs in the FO table and 98.3% of the phrase pairs in the EO table

18. A Few Observations / 3
• Biased set – we added 2,250,000 English-original sentences. What happens?
• In the complete EO table, everything grows
• In the filtered tables:
• the number of phrase pairs increases by a factor of 3
• the number of unique source phrases increases by a third
• coverage of French phrases hardly improves
• the average number of translations per source phrase increases by a factor of 2.3 (from 13.2 to 30.3)
• Long tail – the probability mass is split among a larger number of translations, so good translations get lower probability than in the FO table
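A sketch of how the statistics above (unique source phrases, average translations per source phrase) could be computed, again assuming the "src ||| tgt ||| scores" format:

```python
from collections import defaultdict

def table_stats(table_lines):
    """Count unique source phrases and the average number of
    translation options per source phrase."""
    options = defaultdict(int)
    for line in table_lines:
        options[line.split(" ||| ")[0]] += 1
    n_src = len(options)
    avg = sum(options.values()) / n_src if n_src else 0.0
    return n_src, avg
```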

19. How Does MOSES Select Phrases?
• Balanced set:
• 96.5% of the selected phrases come from the FO table
• 99.3% of the phrase pairs selected from the intersection originate in the FO table
• Biased set:
• 94.5% of the selected phrases come from the FO table
• 98.2% of the phrase pairs selected from the intersection originate in the FO table

20. The Tuning Effect / 1
• The question: is the FO phrase table inherently better than the EO phrase table, or does it become better during tuning?
• We test SystemA with the initial (pre-tuning) configuration and with the configuration produced by tuning.

21. The Tuning Effect / 2
• Balanced set / before tuning:
• only 58% of the selected phrases come from the FO table
• 57.7% of the phrase pairs selected from the intersection originate in the FO table
• Balanced set / after tuning:
• 95.4% of the selected phrases come from the FO table
• 99.3% of the phrase pairs selected from the intersection originate in the FO table

22. The Tuning Effect / 3
• The decoder prefers the FO table already in the initial configuration (58%)
• The preference becomes much stronger after tuning (95.4%)
• Interestingly, the decoder doesn't just replace EO phrases with FO phrases; it searches for longer phrases:
• the average length of a phrase selected from the EO table increases by about 1.5 words

23. New Experiment: SystemB
• Based on these results, we can throw away the intersection subset of the EO phrase table
• We expect a small loss in quality but a significant improvement in translation speed
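A sketch of building the SystemB table under the same assumed phrase-table format: drop from the EO table every source phrase that also occurs in the FO table, since the decoder almost always prefers the FO entry anyway:

```python
def remove_intersection(eo_lines, fo_lines):
    """Return the EO phrase-table entries whose source phrase does
    not appear in the FO table."""
    fo_sources = {line.split(" ||| ")[0] for line in fo_lines}
    return [line for line in eo_lines
            if line.split(" ||| ")[0] not in fo_sources]
```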

  24. SystemB Results

25. What About a Classified Corpus?
• Annotation of the source language is rarely available in parallel corpora
• Will our SystemA and SystemB still outperform the FO+EO and FO MT systems?
• We use an SVM for classification; our features are punctuation marks and n-grams of part-of-speech tags
• We train the classifier on an English-French subset of the Europarl corpus
• Accuracy is about 73.5%
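A minimal sketch of such a classifier with scikit-learn: a linear SVM over n-grams of part-of-speech tags (punctuation survives tagging as its own tokens). The tagged sentences and labels below are placeholders, not the real Europarl training data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder POS-tagged sentences and their origin labels.
pos_tagged = ["PRP VBZ DT NN .", "DT NN VBD IN DT NN , CC PRP VBD ."]
labels = ["EO", "FO"]  # original language of each sentence pair

clf = make_pipeline(
    # POS-tag n-grams (1-3); the token pattern keeps punctuation tags.
    CountVectorizer(ngram_range=(1, 3), token_pattern=r"\S+"),
    LinearSVC(),
)
clf.fit(pos_tagged, labels)
print(clf.predict(["PRP VBZ DT NN ."]))
```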

  26. Classified System Results

  27. Thank You!
