Statistical Machine Translation System Stephan Vogel Interactive Systems Lab Language Technologies Institute Carnegie Mellon University
Overview • Statistical Machine Translation – A 5min Intro • Word Alignment Models – Another 5min Intro • Phrase Translation – A somewhat longer Intro • Different approaches • PESA: Phrase Pair Extraction as Sentence Splitting Algorithm • Scoring of phrase translations • Decoding – What this talk is about • Decoding strategies • 2 stage decoding: translation lattice generation, best path search • Recombination and pruning • N-best list generation
Translation Example: Chinese • Src: 今年前两月广东高新技术产品出口37.6亿美元 • Ref: Export of High-tech Products in Guangdong in First Two Months This Year Reached 3.76 billion US dollars • Hyp: In February this year , Guangdong's exports of high-tech products 3.76 billion dollars • Src: 新华社广州3月16日电(记者 陈冀)最新统计数字显示,今年1至2月,广东省高新技术产品出口37.6亿美元,同比增长34.8%,占全省出口总值的25.5%。 • Ref: Xinhua News Agency, Guangzhou, March 16 (Reporter Chen Ji) The latest statistics show that from January through February this year, the export of high-tech products in Guangdong Province reached 3.76 billion US dollars, up 34.8% over the same period last year and accounted for 25.5% of the total export in the province. • Hyp: Guangzhou , March 16 ( Xinhua ) -- Chen , the latest statistics show that in February this year , Guangdong , exports of high-tech products 3.76 billion U.S. dollars , compared with 34.8% , the growth of the province's total export value of 25.5% . • Src: 高新技术产品出口亮点频现,为广东对外贸易的增长做出了重要贡献。 • Ref: Export of high-tech products has frequently been in the spotlight, making a significant contribution to the growth of foreign trade in Guangdong. • Hyp: Exports of high-tech products now frequent wei-liang points for the growth of Guangdong's foreign trade has made important contributions .
Translation Example: Chinese (cont.) • Src: 去年,广东省高新技术产品出口222.94亿美元,同比增长31%,增幅高于全省出口增速27.2个百分点;高新技术产品出口净增加52.7亿美元,弥补了传统劳动密集型产品因价格下降带来的出口值减少。 • Ref: Last year, the export of high-tech products in Guangdong Province was 22.294 billion US dollars, up 31% over the same period the year before, which in turn was 27.2% above the average export growth of the entire province. The net increase of export of high-tech products was 5.27 billion US dollars, making up for the reduced value of exports as a result of the price drop of the traditional labor-intensive products. • Hyp: Guangdong last year , exports of high-tech products 22.294 billion U.S. dollars , compared with 31% growth rate of the province's total export growth rate of 27.2 percentage of exports of high-tech products in a net increase of 5.27 billion dollars , up traditional labor-intensive products due to price demand-reducing value of domestic exports decreased .
Arabic Translation Example • لقاء ثالث خلال يومين بين وزيري الخارجية المصري والسوداني • القاهرة 62-01 (اف ب)- علم لدى وزارة الخارجية المصرية ان وزير الخارجية عمرو موسى اجرى محادثات اليوم الثلاثاء مع نظيره السوداني مصطفى عثمان اسماعيل للمرة الثالثة خلال يومين. • وقال اسماعيل للصحافيين انه يحمل رسالة من الرئيس السوداني الفريق عمر البشير الى نظيره المصري حسني مبارك. • واوضح ان سيسلم مبارك الرسالة خلال لقائهما الخميس. • واضاف الوزير السوداني ان الرسالة تتعلق بدور مصر والجهود اللازمة للتوصل الى مصالحة في السودان. • وكان اسماعيل اعلن الاحد ان حكومته تؤيد الجهود المبذولة لتحقيق تكامل بين المبادرتين العربية والافريقية من اجل وضع حد للحرب الاهلية الدائرة في السودان منذ اكثر من 61 عاما. • وقد اعلن التجمع الوطني الديموقراطي الذي يضم احزاب المعارضة الشمالية السودانية والمتمردين الجنوبيين بعد اجتماعه الخميس الماضي في القاهرة انه قرر العمل للتوفيق بين المبادرة المصرية الليبية ومبادرة الهيئة الحكومية للتنمية (ايغاد) لانهاء الحرب الاهلية. • واستقبل رئيس الوزراء المصري عاطف عبيد اليوم الثلاثاء ايضا الوزير السوداني وبحثا في العلاقات الثنائية والتجارية المصرية-السودانية.
SMT Output • The third meeting two days between foreign ministers of the Egyptian and Sudanese • Cairo 10-26 ( AFP ) - informed the Ministry of Foreign Affairs of the Egyptian Foreign Minister Amr Moussa held talks Tuesday with his Sudanese counterpart Mustafa Osman Ismail , the first 3 days . • Ismail told reporters that carries a message from President Omar Bashir to his Egyptian counterpart Hosni Mubarak . • He delivered a message from Mubarak during their meeting on Thursday . • The minister added that the Sudanese letter on Egypt's role in efforts to reach national reconciliation in Sudan . • Once had Ismail announced Sunday that his country supports efforts to achieve integration between African and Arab initiatives to end the civil war in Sudan more than 16 years old . • It was announced by the National Democratic grouping which comprises the opposition northern Sudanese southerners insurgents after their meeting last Thursday in Cairo had decided to reconcile the Egyptian-Libyan initiative for Development ( IGAD ) to end the civil war . • Gabriel Egyptian Premier Dr. Atef Ebeid announced today Tuesday that the Sudanese Minister search in bilateral relations , the Egyptian commercial - SAF .
Reference Translation and SMT Output • Ref: Third meeting in two days between Egyptian and Sudanese Ministers of Foreign Affairs • Hyp: The third meeting two days between foreign ministers of the Egyptian and Sudanese Sudanese • Ref: Cairo 10-26 (AFP)- According to the Egyptian Ministry of Foreign Affairs, the Minister of Foreign Affairs, Amru Mussa, has held talks today, Tuesday, with his Sudanese counterpart, Mustapha Uthman Ismail, for the third time in two days. • Hyp: Cairo 10-26 ( AFP ) - informed the Ministry of Foreign Affairs of the Egyptian Foreign Minister Amr Moussa held talks Tuesday with his Sudanese counterpart Mustafa Osman Ismail , the first 3 days . • … • Ref: Ismail had declared on Sunday that his government supports the efforts made to achieve agreement between the Arab initiative and the African one, in order to put an end to the civil war taking place in Sudan for more than 16 years. • Hyp: Once had Ismail announced Sunday that his country supports efforts to achieve integration between African and Arab initiatives to end the civil war in Sudan more than 16 years old . • Ref: After its meeting last Thursday in Cairo, the Democratic National Gathering, which includes the Sudanese northern opposition and the southern rebels, declared that it has decided to work in order to reconcile the Egypto -Libyan initiative and that of the Government Authority for Development (GAD), to put an end to the civil war. • Hyp: It was announced by the National Democratic grouping which comprises the opposition northern Sudanese southerners insurgents after their meeting last Thursday in Cairo had decided to reconcile the Egyptian-Libyan initiative for Development ( IGAD ) to end the civil war . • Ref: The Egyptian Prime Minister, Atef Abid, also received today, Tuesday, the Sudanese Minister and discussed with him Egypto-Sudanese bilateral and commercial relations. • Hyp: Gabriel Egyptian Premier Dr. Atef Ebeid announced today Tuesday that the Sudanese Minister search in bilateral relations , the Egyptian commercial - SAF .
Overview • Statistical Machine Translation – A 5min Intro • Word Alignment Models – Another 5min Intro • Phrase Translation – A somewhat longer Intro • Different approaches • PESA: Phrase Pair Extraction as Sentence Splitting Algorithm • Scoring of phrase translations • Decoding – What this talk is about • Decoding strategies • 2 stage decoding: translation lattice generation, best path search • Recombination and pruning • N-best list generation
Machine Translation Approaches • Grammar-based • Interlingua-based • Transfer-based • Direct • Example-based • Statistical
Statistical versus Grammar-Based • Statistical and grammar-based MT are often seen as alternatives, even as opposing approaches – wrong! • The actual dichotomies are: • Use probabilities vs. treating everything as equally likely (in between: heuristics) • Rich (deep) structure vs. no or only flat structure • Both dimensions are continuous • Examples: • EBMT: flat structure and heuristics • SMT: flat structure and probabilities • XFER: deep(er) structure and heuristics • Goal: structurally rich probabilistic models
Statistical Approach • Using statistical models • Create many alternatives, called hypotheses • Give a score to each hypothesis • Select the best -> search • Advantages • Avoids hard decisions • Sometimes, optimality can be guaranteed • Speed can be traded against quality – no all-or-nothing • It works better! • Disadvantages • Difficulties in handling structurally rich models, mathematically and computationally (but that is also true for non-statistical systems) • Needs data to train the model parameters
SMT Architecture [Figure: SMT system architecture] • Based on Bayes' decision rule: ê = argmax_e { p(e | f) } = argmax_e { p(e) · p(f | e) }
Tasks in SMT • Modelling: build statistical models which capture the characteristic features of translation equivalences and of the target language • Training: train the translation model on a bilingual corpus, train the language model on a monolingual corpus • Decoding: find the best translation for new sentences according to the models • Evaluation • Subjective evaluation: fluency, adequacy • Automatic evaluation: WER, BLEU, etc. • And all the nitty-gritty stuff • Data cleaning • Parameter tuning
Overview • Statistical Machine Translation – A 5min Intro • Word Alignment Models – Another 5min Intro • Phrase Translation – A somewhat longer Intro • Different approaches • PESA: Phrase Pair Extraction as Sentence Splitting Algorithm • Scoring of phrase translations • Decoding – What this talk is about • Decoding strategies • 2 stage decoding: translation lattice generation, best path search • Recombination and pruning • N-best list generation
Word Alignment Models • We want to learn how to translate words and phrases • We can learn it from parallel corpora • Typically we work with sentence-aligned corpora • Available from LDC, ELRA, etc. • For specific applications, new data collection is required • Model the associations between the two languages • Word-to-word mapping -> lexicon • Differences in word order -> distortion model • 'Wordiness', i.e. how many words are needed to express a concept -> fertility • Statistical translation is based on word alignment models
Alignment Example [Figure: example sentence pair with word alignment links] Observations: • Often 1-1 • Often monotone • Some 1-to-many • Some 1-to-nothing • Not always clear-cut
Word Alignment Models • IBM1 – lexical probabilities only • IBM2 – lexicon plus absolute position • IBM3 – plus fertilities • IBM4 – 'inverted' position alignment • IBM5 – non-deficient version of model 4 • HMM – lexicon plus relative position • BiBr – Bilingual Bracketing, lexical probabilities plus reordering via parallel segmentation [Brown et al. 1993, Vogel et al. 1996, Och et al. 1999, Wu 1997]
Notation • Target language • e: target (English) word • I: length of target sentence • i: position in target sentence • e_1^I = e_1 … e_I: target sentence • Source language • f: source (French) word • J: length of source sentence • j: position in source sentence • f_1^J = f_1 … f_J: source sentence • Alignment: a relation a mapping source positions to target positions • i = a_j: target position i to which source position j is aligned • a_1^J = a_1 … a_J: whole alignment
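A minimal sketch of how this notation can be carried into code; the example sentence pair and alignment are invented for illustration, and indices are 0-based in code while the slides count from 1:

    # Sentence pair in the notation above.
    f = "das Haus ist klein".split()     # source sentence f_1 ... f_J
    e = "the house is small".split()     # target sentence e_1 ... e_I
    J, I = len(f), len(e)

    # Restricted alignment: one target position a[j] per source position j.
    a = [0, 1, 2, 3]                     # here simply monotone and 1-1

    for j, i in enumerate(a):
        print(f"f_{j+1} = {f[j]}  ->  e_{i+1} = {e[i]}")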
SMT – Principle • Translate a 'French' string f_1^J into an 'English' string e_1^I • Bayes' decision rule for translation: ê_1^I = argmax { p(e_1^I | f_1^J) } = argmax { p(e_1^I) · p(f_1^J | e_1^I) } • Why this inversion of the translation direction? • Decomposition of dependencies: makes modelling easier • Cooperation of two knowledge sources for the final decision • Note 1: this is the noisy channel model • Note 2: the alternative is direct translation with log-linear model combination
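The decision rule reads directly as code: score each candidate translation with both knowledge sources and keep the argmax. A minimal sketch, where lm_logprob (log p(e)) and tm_logprob (log p(f | e)) are hypothetical stand-ins for the two trained models:

    import math

    def best_translation(f, candidates, lm_logprob, tm_logprob):
        # Noisy-channel decision rule: argmax_e  log p(e) + log p(f | e).
        best, best_score = None, -math.inf
        for e in candidates:
            score = lm_logprob(e) + tm_logprob(f, e)
            if score > best_score:
                best, best_score = e, score
        return best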
Alignment as Hidden Variable • 'Hidden alignments' to capture word-to-word correspondences • Mapping: a subset A ⊆ [1, …, J] × [1, …, I] • Number of connections: J · I (each source word with each target word) • Number of alignments: 2^(J·I) (each connection yes/no) • Summation over all alignments: Pr(f_1^J | e_1^I) = Σ_A Pr(f_1^J, A | e_1^I)
Restricted Alignment • Each source word has exactly one connection • The alignment mapping becomes a function: j -> i = a_j • Number of alignments is now: (I+1)^J • Sum over all alignment probabilities: p(f_1^J | e_1^I) = Σ_{a_1^J} p(f_1^J, a_1^J | e_1^I) • Not possible to enumerate • In some situations full summation is possible through dynamic programming (see the sketch below) • In other situations: take only the best alignment, and perhaps some alignments close to the best one
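For IBM1-style models the (I+1)^J sum factorizes into a product of per-position sums (for first-order models the analogous trick is dynamic programming). A toy sketch verifying this numerically; the lexicon values are invented, and the constant 1/(I+1)^J alignment factor is dropped since it is the same for every alignment:

    import math
    from itertools import product

    # Toy lexicon t(f | e); e[0] plays the role of the empty word (NULL).
    t = {("das", "NULL"): 0.1, ("das", "the"): 0.7, ("das", "house"): 0.2,
         ("Haus", "NULL"): 0.1, ("Haus", "the"): 0.1, ("Haus", "house"): 0.8}
    f = ["das", "Haus"]
    e = ["NULL", "the", "house"]

    # Brute force: enumerate all (I+1)^J restricted alignments a_1^J.
    brute = sum(math.prod(t[(f[j], e[a[j]])] for j in range(len(f)))
                for a in product(range(len(e)), repeat=len(f)))

    # Factorized form: product over positions j of a sum over positions i.
    factored = math.prod(sum(t[(fj, ei)] for ei in e) for fj in f)

    print(brute, factored)   # identical up to floating-point rounding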
Translation Model • Sum over all alignments: p(f_1^J | e_1^I) = Σ_{a_1^J} p(f_1^J, a_1^J | e_1^I) • 3 probability distributions: • Length: p(J | e_1^I) • Alignment: p(a_j | a_1^{j-1}, f_1^{j-1}, J, e_1^I) • Lexicon: p(f_j | a_1^j, f_1^{j-1}, J, e_1^I)
Model Assumptions Decompose the interaction into pairwise dependencies • Length: source length depends only on target length (very weak): p(J | e_1^I) = p(J | I) • Alignment: • Zero-order model: target position depends only on the source position: p(a_j | …) = p(a_j | j, I) • First-order model: target position depends only on the previous target position: p(a_j | …) = p(a_j | a_{j-1}) • Lexicon: source word depends only on the aligned target word: p(f_j | …) = p(f_j | e_{a_j})
Mixture Model • Interpretation as a mixture model by direct decomposition: p(f_1^J | e_1^I) = Π_{j=1}^{J} Σ_{i=0}^{I} p(i | j, J, I) · p(f_j | e_i)
IBM1 Model • Assume a uniform probability for the position alignment: p(i | j, J, I) = 1 / (I + 1) • The alignment probability becomes: p(f_1^J | e_1^I) = 1/(I+1)^J · Π_{j=1}^{J} Σ_{i=0}^{I} p(f_j | e_i) • In training: collect counts for word pairs (see the EM sketch below)
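A minimal sketch of the 'collect counts' step: EM training of the IBM1 lexicon over a sentence-aligned corpus, with deliberately naive initialization and data handling:

    from collections import defaultdict

    def ibm1_em(corpus, iterations=5):
        # corpus: list of (source_words, target_words) sentence pairs.
        # Returns the lexicon t with t[(f, e)] = p(f | e).
        f_vocab = {f for fs, _ in corpus for f in fs}
        t = defaultdict(lambda: 1.0 / len(f_vocab))    # uniform initialization
        for _ in range(iterations):
            counts = defaultdict(float)                # expected counts c(f, e)
            totals = defaultdict(float)                # expected counts c(e)
            for fs, es in corpus:
                es = ["NULL"] + es                     # position 0: empty word
                for f in fs:
                    norm = sum(t[(f, e)] for e in es)  # E-step normalizer
                    for e in es:
                        p = t[(f, e)] / norm           # posterior of this link
                        counts[(f, e)] += p
                        totals[e] += p
            t = defaultdict(float,                     # M-step: re-estimate
                            {(f, e): c / totals[e] for (f, e), c in counts.items()})
        return t

    # Usage on a toy corpus:
    corpus = [("das Haus".split(), "the house".split()),
              ("das Buch".split(), "the book".split())]
    lex = ibm1_em(corpus)
    print(lex[("Haus", "house")])   # converges toward 1 with more iterations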
Overview • Statistical Machine Translation – A 5min Intro • Word Alignment Models – Another 5min Intro • Phrase Translation – A somewhat longer Intro • Different approaches • PESA: Phrase Pair Extraction as Sentence Splitting Algorithm • Scoring of phrase translations • Decoding – What this talk is about • Decoding strategies • 2 stage decoding: translation lattice generation, best path search • Recombination and pruning • N-best list generation
Alignment Example • One Chinese word aligned to a multi-word English phrase • In the lexicon: individual entries for 'the', 'development', 'of' • Difficult to generate from words • Main translation: 'development' • Test whether inserting 'the' and 'of' improves the LM probability • Easier to generate if we have phrase pairs available
Why Phrase to Phrase Translation • Captures n x m alignments • Encapsulates context • Local reordering • Compensates segmentation errors
How to Get Phrase Translations • Typically: train a word alignment model and extract phrase-to-phrase translations from the Viterbi path • IBM model 4 alignment • HMM alignment • Bilingual Bracketing • Genuine phrase translation models • Integrated segmentation and alignment (ISA) • Phrase pair extraction via full sentence alignment • Notes: • Often better results when training target-to-source for the extraction of phrase translations, due to the asymmetry of the alignment models • Phrases are not fully integrated into the alignment model; they are extracted only after training is completed – how to assign probabilities?
Phrase Pairs from Viterbi Path • Train your favorite word alignment (IBMn, HMM, …) • Calculate Viterbi path (i.e. path with highest probability or best score) • The details ….
Word Alignment Matrix • Alignment probabilities according to the lexicon [Figure: I × J alignment matrix, target words e_1 … e_I on the vertical axis, source words f_1 … f_J on the horizontal axis]
Viterbi Path • Calculate the Viterbi path (i.e. the path with the highest probability or best score) [Figure: alignment matrix with the Viterbi path marked]
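For IBM1 the Viterbi path decomposes per position: since the position alignment is uniform, the best alignment simply picks, for each source word, the target word with the highest lexicon probability. A minimal sketch, where t is a lexicon dictionary as trained in the EM sketch above:

    def viterbi_ibm1(f, e, t):
        # IBM1 Viterbi path: independently per source position j,
        # choose the target position i maximizing t(f_j | e_i).
        return [max(range(len(e)), key=lambda i: t.get((fj, e[i]), 0.0))
                for fj in f]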
Phrases from Viterbi Path • Read off source phrase – target phrase pairs [Figure: alignment matrix with rectangular blocks of phrase pairs along the Viterbi path]
Extraction of Phrases

    def extract_phrases(f, e, a, max_len):
        # f, e: source / target sentences as word lists;
        # a[j]: target position aligned to source position j (Viterbi path).
        pairs = set()
        for l in range(1, max_len + 1):              # source phrase length
            for j1 in range(len(f) - l + 1):         # start position
                j2 = j1 + l - 1                      # end position, fixed by length
                min_i = min(a[j] for j in range(j1, j2 + 1))
                max_i = max(a[j] for j in range(j1, j2 + 1))
                source_phrase = " ".join(f[j1:j2 + 1])
                target_phrase = " ".join(e[min_i:max_i + 1])
                # corresponds to storing SourcePhrase '#' TargetPhrase
                pairs.add((source_phrase, target_phrase))
        return pairs

• Train in both directions and combine the phrase pairs • Calculate probabilities • Pruning: take only the n-best translations for each source phrase
Dealing with Asymmetry • Word alignment models are asymmetric; the Viterbi path has: • multiple source words – one target word alignments • but no one source word – multiple target words alignments • Therefore train the alignment model also in the reverse direction, i.e. target -> source • Using both Viterbi paths: • Simple: extract phrases from both directions and merge the tables • 'Merge' the Viterbi paths and extract phrase pairs according to the resulting pattern
Combine Viterbi Paths [Figure: alignment matrix overlaying the F->E path, the E->F path, and their intersection]
Combine Viterbi Paths • Intersection: high precision, but low recall • Union: lower precision, but higher recall • Refined: start from the intersection and fill gaps according to points in the union (see the sketch below) • Different heuristics have been used • Och • Koehn • The quality of the phrase translation pairs depends on: • the quality of the word alignment • the quality of the combination of the Viterbi paths
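A minimal sketch of combining the two Viterbi paths as link sets. The gap-filling 'refined' step is simplified here to repeatedly adding union links that neighbor an already accepted link; this is only one of several published variants:

    def combine_paths(path_fe, path_ef):
        # path_fe[j] = i: F->E Viterbi path; path_ef[i] = j: E->F Viterbi path.
        links_fe = {(j, i) for j, i in enumerate(path_fe)}
        links_ef = {(j, i) for i, j in enumerate(path_ef)}
        inter = links_fe & links_ef          # high precision, low recall
        union = links_fe | links_ef          # lower precision, higher recall
        refined = set(inter)
        grew = True
        while grew:                          # grow outward from the intersection
            grew = False
            for (j, i) in union - refined:
                # accept a union link if it is adjacent to an accepted link
                if any(abs(j - j2) + abs(i - i2) == 1 for (j2, i2) in refined):
                    refined.add((j, i))
                    grew = True
        return inter, union, refined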
Number of (Source) Phrases • Small corpus: 40k sentences with 400k words • Number of phrases quickly exceeds number of words in corpus • Numbers are for source phrases only; each phrase typically has multiple translations (factor 5 – 20)
Dealing with Memory Limitations • Phrase translation tables are memory killers • The number of phrases quickly exceeds the number of words in the corpus • The memory required is a multiple of the memory for the corpus itself • We have corpora of 200 million words -> more than 1 billion source phrases and more than 10 billion phrase pairs • Restrict the phrases • Only take short ones • Only take frequent ones • Evaluation mode • Load only the phrases required for the test sentences, i.e. extract them from the large phrase translation table (see the sketch below) • Extract and store only the required phrase pairs (i.e. as part of the training cycle at evaluation time)
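A minimal sketch of the test-set filtering idea: keep only those phrase-table entries whose source side occurs as a contiguous substring of some test sentence. The 'src # tgt' line format and the maximum phrase length are assumptions for illustration:

    def filter_phrase_table(table_path, test_sentences, max_len=8):
        # Collect every source n-gram (up to max_len) occurring in the test set.
        needed = set()
        for sent in test_sentences:
            words = sent.split()
            for l in range(1, max_len + 1):
                for j in range(len(words) - l + 1):
                    needed.add(" ".join(words[j:j + l]))
        # Stream the large table and keep only the needed entries.
        with open(table_path, encoding="utf-8") as table:
            for line in table:
                src = line.split(" # ")[0]   # assumed 'src # tgt' format
                if src in needed:
                    yield line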
New Phrase Alignment • Desiderata: • Use phrases up to any length • Cannot store all phrase pairs -> search them on the fly • High-quality translation pairs • Balance with word-based translation
Phrase Alignment – New Approach • Search the translation for one source phrase f_j1 … f_j2 [Figure: alignment matrix with the source phrase f_j1 … f_j2 marked]
Phrase Alignment • What we would like to find [Figure: alignment matrix with a target phrase e_i1 … e_i2 aligned to the source phrase f_j1 … f_j2]
Phrase-Pair Extraction via Sentence Alignment • Calculate a modified IBM1 word alignment: do not sum over words in the 'forbidden' (grey) areas • Select the target phrase boundaries which maximize the sentence alignment probability • Modify the boundaries i1 and i2 • Calculate the sentence alignment • Take the best [Figure: alignment matrix partitioned by the boundaries j1, j2 and i1, i2 into phrase region and forbidden regions]
Phrase Alignment • Search for optimal boundaries [Figure: animation stepping through candidate target boundaries i1, i2 for the source phrase f_j1 … f_j2]
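A minimal sketch of this boundary search, assuming an IBM1-style lexicon t(f | e): every candidate boundary pair (i1, i2) is scored with a modified IBM1 sum in which the source words inside the phrase may only align to target words inside e_i1 … e_i2, and the remaining source words only to the remaining target words. This is a simplified reading of PESA, not the exact published algorithm:

    import math

    def pesa_boundaries(f, e, j1, j2, t):
        # Return target boundaries (i1, i2) for the source phrase f[j1..j2].
        def ibm1_logscore(f_words, e_words):
            # Modified IBM1: each source word sums only over the allowed
            # target words (the complement is the 'forbidden' grey area).
            if not f_words:
                return 0.0
            if not e_words:
                return -math.inf
            return sum(math.log(sum(t.get((fw, ew), 1e-10) for ew in e_words)
                                / len(e_words))
                       for fw in f_words)

        best, best_score = None, -math.inf
        for i1 in range(len(e)):
            for i2 in range(i1, len(e)):
                inside = ibm1_logscore(f[j1:j2 + 1], e[i1:i2 + 1])
                outside = ibm1_logscore(f[:j1] + f[j2 + 1:],
                                        e[:i1] + e[i2 + 1:])
                if inside + outside > best_score:
                    best, best_score = (i1, i2), inside + outside
        return best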