Statistical Machine Translation System Stephan Vogel Interactive Systems Lab Language Technologies Institute Carnegie Mellon University
Overview • Statistical Machine Translation – A 5min Intro • Word Alignment Models – Another 5min Intro • Phrase Translation – A somewhat longer Intro • Different approaches • PESA: Phrase Pair Extraction as Sentence Splitting Algorithm • Scoring of phrase translations • Decoding – What this talk is about • Decoding strategies • 2 stage decoding: translation lattice generation, best path search • Recombination and pruning • N-best list generation
Translation Example: Chinese • Src: 今年前两月广东高新技术产品出口37.6亿美元 • Ref: Export of High-tech Products in Guangdong in First Two Months This Year Reached 3.76 billion US dollars • Hyp: In February this year , Guangdong's exports of high-tech products 3.76 billion dollars • Src: 新华社广州3月16日电(记者 陈冀)最新统计数字显示,今年1至2月,广东省高新技术产品出口37.6亿美元,同比增长34.8%,占全省出口总值的25.5%。 • Ref: Xinhua News Agency, Guangzhou, March 16 (Reporter Chen Ji) The latest statistics show that from January through February this year, the export of high-tech products in Guangdong Province reached 3.76 billion US dollars, up 34.8% over the same period last year and accounted for 25.5% of the total export in the province. • Hyp: Guangzhou , March 16 ( Xinhua ) -- Chen , the latest statistics show that in February this year , Guangdong , exports of high-tech products 3.76 billion U.S. dollars , compared with 34.8% , the growth of the province's total export value of 25.5% . • Src: 高新技术产品出口亮点频现,为广东对外贸易的增长做出了重要贡献。 • Ref: Export of high-tech products has frequently been in the spotlight, making a significant contribution to the growth of foreign trade in Guangdong. • Hyp: Exports of high-tech products now frequent wei-liang points for the growth of Guangdong's foreign trade has made important contributions .
Translation Example: Chinese (cont.) • Src: 去年,广东省高新技术产品出口222.94亿美元,同比增长31%,增幅高于全省出口增速27.2个百分点;高新技术产品出口净增加52.7亿美元,弥补了传统劳动密集型产品因价格下降带来的出口值减少。 • Ref: Last year, the export of high-tech products in Guangdong Province was 22.294 billion US dollars, up 31% over the same period the year before, which in turn was 27.2% above the average export growth of the entire province. The net increase of export of high-tech products was 5.27 billion US dollars, making up for the reduced value of exports as a result of the price drop of the traditional labor-intensive products. • Hyp: Guangdong last year , exports of high-tech products 22.294 billion U.S. dollars , compared with 31% growth rate of the province's total export growth rate of 27.2 percentage of exports of high-tech products in a net increase of 5.27 billion dollars , up traditional labor-intensive products due to price demand-reducing value of domestic exports decreased .
Arabic Translation Example • لقاء ثالث خلال يومين بين وزيري الخارجية المصري والسوداني • القاهرة 62-01 (اف ب)- علم لدى وزارة الخارجية المصرية ان وزير الخارجية عمرو موسى اجرى محادثات اليوم الثلاثاء مع نظيره السوداني مصطفى عثمان اسماعيل للمرة الثالثة خلال يومين. • وقال اسماعيل للصحافيين انه يحمل رسالة من الرئيس السوداني الفريق عمر البشير الى نظيره المصري حسني مبارك. • واوضح ان سيسلم مبارك الرسالة خلال لقائهما الخميس. • واضاف الوزير السوداني ان الرسالة تتعلق بدور مصر والجهود اللازمة للتوصل الى مصالحة في السودان. • وكان اسماعيل اعلن الاحد ان حكومته تؤيد الجهود المبذولة لتحقيق تكامل بين المبادرتين العربية والافريقية من اجل وضع حد للحرب الاهلية الدائرة في السودان منذ اكثر من 61 عاما. • وقد اعلن التجمع الوطني الديموقراطي الذي يضم احزاب المعارضة الشمالية السودانية والمتمردين الجنوبيين بعد اجتماعه الخميس الماضي في القاهرة انه قرر العمل للتوفيق بين المبادرة المصرية الليبية ومبادرة الهيئة الحكومية للتنمية (ايغاد) لانهاء الحرب الاهلية. • واستقبل رئيس الوزراء المصري عاطف عبيد اليوم الثلاثاء ايضا الوزير السوداني وبحثا في العلاقات الثنائية والتجارية المصرية-السودانية.
SMT Output • The third meeting two days between foreign ministers of the Egyptian and Sudanese • Cairo 10-26 ( AFP ) - informed the Ministry of Foreign Affairs of the Egyptian Foreign Minister Amr Moussa held talks Tuesday with his Sudanese counterpart Mustafa Osman Ismail , the first 3 days . • Ismail told reporters that carries a message from President Omar Bashir to his Egyptian counterpart Hosni Mubarak . • He delivered a message from Mubarak during their meeting on Thursday . • The minister added that the Sudanese letter on Egypt's role in efforts to reach national reconciliation in Sudan . • Once had Ismail announced Sunday that his country supports efforts to achieve integration between African and Arab initiatives to end the civil war in Sudan more than 16 years old . • It was announced by the National Democratic grouping which comprises the opposition northern Sudanese southerners insurgents after their meeting last Thursday in Cairo had decided to reconcile the Egyptian-Libyan initiative for Development ( IGAD ) to end the civil war . • Gabriel Egyptian Premier Dr. Atef Ebeid announced today Tuesday that the Sudanese Minister search in bilateral relations , the Egyptian commercial - SAF .
Reference Translation and SMT Output • Ref: Third meeting in two days between Egyptian and Sudanese Ministers of Foreign Affairs • Hyp: The third meeting two days between foreign ministers of the Egyptian and Sudanese Sudanese • Ref: Cairo 10-26 (AFP)- According to the Egyptian Ministry of Foreign Affairs, the Minister of Foreign Affairs, Amru Mussa, has held talks today, Tuesday, with his Sudanese counterpart, Mustapha Uthman Ismail, for the third time in two days. • Hyp: Cairo 10-26 ( AFP ) - informed the Ministry of Foreign Affairs of the Egyptian Foreign Minister Amr Moussa held talks Tuesday with his Sudanese counterpart Mustafa Osman Ismail , the first 3 days . • … • Ref: Ismail had declared on Sunday that his government supports the efforts made to achieve agreement between the Arab initiative and the African one, in order to put an end to the civil war taking place in Sudan for more than 16 years. • Hyp: Once had Ismail announced Sunday that his country supports efforts to achieve integration between African and Arab initiatives to end the civil war in Sudan more than 16 years old . • Ref: After its meeting last Thursday in Cairo, the Democratic National Gathering, which includes the Sudanese northern opposition and the southern rebels, declared that it has decided to work in order to reconcile the Egypto -Libyan initiative and that of the Government Authority for Development (GAD), to put an end to the civil war. • Hyp: It was announced by the National Democratic grouping which comprises the opposition northern Sudanese southerners insurgents after their meeting last Thursday in Cairo had decided to reconcile the Egyptian-Libyan initiative for Development ( IGAD ) to end the civil war . • Ref: The Egyptian Prime Minister, Atef Abid, also received today, Tuesday, the Sudanese Minister and discussed with him Egypto-Sudanese bilateral and commercial relations. • Hyp: Gabriel Egyptian Premier Dr. Atef Ebeid announced today Tuesday that the Sudanese Minister search in bilateral relations , the Egyptian commercial - SAF .
Overview • Statistical Machine Translation – A 5min Intro • Word Alignment Models – Another 5min Intro • Phrase Translation – A somewhat longer Intro • Different approaches • PESA: Phrase Pair Extraction as Sentence Splitting Algorithm • Scoring of phrase translations • Decoding – What this talk is about • Decoding strategies • 2 stage decoding: translation lattice generation, best path search • Recombination and pruning • N-best list generation
Machine Translation Approaches • Grammar-based • Interlingua-based • Transfer-based • Direct • Example-based • Statistical
Statistical versus Grammar-Based • Statistical and grammar-based MT are often seen as alternatives, even as opposing approaches – wrong! • The actual dichotomies are: • Use probabilities vs. treating everything as equally likely (in between: heuristics) • Rich (deep) structure vs. no or only flat structure • Both dimensions are continuous • Examples: • EBMT: flat structure and heuristics • SMT: flat structure and probabilities • XFER: deep(er) structure and heuristics • Goal: structurally rich probabilistic models
Statistical Approach • Using statistical models • Create many alternatives, called hypotheses • Give a score to each hypothesis • Select the best -> search • Advantages • Avoids hard decisions • Sometimes, optimality can be guaranteed • Speed can be traded against quality – no all-or-nothing • It works better! • Disadvantages • Difficulties in handling structurally rich models, mathematically and computationally (but that is also true for non-statistical systems) • Needs data to train the model parameters
SMT Architecture [Figure: SMT system architecture] • Based on Bayes' decision rule: ê = argmax_e { p(e | f) } = argmax_e { p(e) · p(f | e) }
Tasks in SMT • Modelling: build statistical models which capture the characteristic features of translation equivalences and of the target language • Training: train the translation model on a bilingual corpus, train the language model on a monolingual corpus • Decoding: find the best translation for new sentences according to the models • Evaluation • Subjective evaluation: fluency, adequacy • Automatic evaluation: WER, BLEU, etc. • And all the nitty-gritty stuff • Data cleaning • Parameter tuning
Overview • Statistical Machine Translation – A 5min Intro • Word Alignment Models – Another 5min Intro • Phrase Translation – A somewhat longer Intro • Different approaches • PESA: Phrase Pair Extraction as Sentence Splitting Algorithm • Scoring of phrase translations • Decoding – What this talk is about • Decoding strategies • 2 stage decoding: translation lattice generation, best path search • Recombination and pruning • N-best list generation
Word Alignment Models • We want to learn how to translate words and phrases • We can learn it from parallel corpora • Typically we work with sentence-aligned corpora • Available from LDC, ELRA, etc. • For specific applications, new data collection is required • Model the associations between the two languages • Word-to-word mapping -> lexicon • Differences in word order -> distortion model • 'Wordiness', i.e. how many words are needed to express a concept -> fertility • Statistical translation is based on word alignment models
Alignment Example [Figure: example sentence pair with word alignment links] Observations: • Often 1-1 • Often monotone • Some 1-to-many • Some 1-to-nothing • Not always clear-cut
Word Alignment Models • IBM1 – lexical probabilities only • IBM2 – lexicon plus absolute position • IBM3 – plus fertilities • IBM4 – 'inverted' position alignment • IBM5 – non-deficient version of model 4 • HMM – lexicon plus relative position • BiBr – Bilingual Bracketing, lexical probabilities plus reordering via parallel segmentation [Brown et al. 1993, Vogel et al. 1996, Och et al. 1999, Wu 1997]
Notation • Target language • e: target (English) word • I: length of target sentence • i: position in target sentence • e_1^I = e_1 … e_I: target sentence • Source language • f: source (French) word • J: length of source sentence • j: position in source sentence • f_1^J = f_1 … f_J: source sentence • Alignment: a relation a mapping source positions to target positions • i = a_j: target position i to which source position j is aligned • a_1^J = a_1 … a_J: whole alignment
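A minimal sketch of how this notation can be carried into code; the example sentence pair and alignment are invented for illustration, and indices are 0-based in code while the slides count from 1:

    # Sentence pair in the notation above.
    f = "das Haus ist klein".split()     # source sentence f_1 ... f_J
    e = "the house is small".split()     # target sentence e_1 ... e_I
    J, I = len(f), len(e)

    # Restricted alignment: one target position a[j] per source position j.
    a = [0, 1, 2, 3]                     # here simply monotone and 1-1

    for j, i in enumerate(a):
        print(f"f_{j+1} = {f[j]}  ->  e_{i+1} = {e[i]}")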
SMT – Principle • Translate a 'French' string f_1^J into an 'English' string e_1^I • Bayes' decision rule for translation: ê_1^I = argmax { p(e_1^I | f_1^J) } = argmax { p(e_1^I) · p(f_1^J | e_1^I) } • Why this inversion of the translation direction? • Decomposition of dependencies: makes modelling easier • Cooperation of two knowledge sources for the final decision • Note 1: this is the noisy channel model • Note 2: the alternative is direct translation with log-linear model combination
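The decision rule reads directly as code: score each candidate translation with both knowledge sources and keep the argmax. A minimal sketch, where lm_logprob (log p(e)) and tm_logprob (log p(f | e)) are hypothetical stand-ins for the two trained models:

    import math

    def best_translation(f, candidates, lm_logprob, tm_logprob):
        # Noisy-channel decision rule: argmax_e  log p(e) + log p(f | e).
        best, best_score = None, -math.inf
        for e in candidates:
            score = lm_logprob(e) + tm_logprob(f, e)
            if score > best_score:
                best, best_score = e, score
        return best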
Alignment as Hidden Variable • 'Hidden alignments' to capture word-to-word correspondences • Mapping: a subset A ⊆ [1, …, J] × [1, …, I] • Number of connections: J · I (each source word with each target word) • Number of alignments: 2^(J·I) (each connection yes/no) • Summation over all alignments: Pr(f_1^J | e_1^I) = Σ_A Pr(f_1^J, A | e_1^I)
Restricted Alignment • Each source word has exactly one connection • The alignment mapping becomes a function: j -> i = a_j • Number of alignments is now: (I+1)^J • Sum over all alignment probabilities: p(f_1^J | e_1^I) = Σ_{a_1^J} p(f_1^J, a_1^J | e_1^I) • Not possible to enumerate • In some situations full summation is possible through dynamic programming (see the sketch below) • In other situations: take only the best alignment, and perhaps some alignments close to the best one
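For IBM1-style models the (I+1)^J sum factorizes into a product of per-position sums (for first-order models the analogous trick is dynamic programming). A toy sketch verifying this numerically; the lexicon values are invented, and the constant 1/(I+1)^J alignment factor is dropped since it is the same for every alignment:

    import math
    from itertools import product

    # Toy lexicon t(f | e); e[0] plays the role of the empty word (NULL).
    t = {("das", "NULL"): 0.1, ("das", "the"): 0.7, ("das", "house"): 0.2,
         ("Haus", "NULL"): 0.1, ("Haus", "the"): 0.1, ("Haus", "house"): 0.8}
    f = ["das", "Haus"]
    e = ["NULL", "the", "house"]

    # Brute force: enumerate all (I+1)^J restricted alignments a_1^J.
    brute = sum(math.prod(t[(f[j], e[a[j]])] for j in range(len(f)))
                for a in product(range(len(e)), repeat=len(f)))

    # Factorized form: product over positions j of a sum over positions i.
    factored = math.prod(sum(t[(fj, ei)] for ei in e) for fj in f)

    print(brute, factored)   # identical up to floating-point rounding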
Translation Model • Sum over all alignments: p(f_1^J | e_1^I) = Σ_{a_1^J} p(f_1^J, a_1^J | e_1^I) • 3 probability distributions: • Length: p(J | e_1^I) • Alignment: p(a_j | a_1^{j-1}, f_1^{j-1}, J, e_1^I) • Lexicon: p(f_j | a_1^j, f_1^{j-1}, J, e_1^I)
Model Assumptions Decompose the interaction into pairwise dependencies • Length: source length depends only on target length (very weak): p(J | e_1^I) = p(J | I) • Alignment: • Zero-order model: target position depends only on the source position: p(a_j | …) = p(a_j | j, I) • First-order model: target position depends only on the previous target position: p(a_j | …) = p(a_j | a_{j-1}) • Lexicon: source word depends only on the aligned target word: p(f_j | …) = p(f_j | e_{a_j})
Mixture Model • Interpretation as a mixture model by direct decomposition: p(f_1^J | e_1^I) = Π_{j=1}^{J} Σ_{i=0}^{I} p(i | j, J, I) · p(f_j | e_i)
IBM1 Model • Assume a uniform probability for the position alignment: p(i | j, J, I) = 1 / (I + 1) • The alignment probability becomes: p(f_1^J | e_1^I) = 1/(I+1)^J · Π_{j=1}^{J} Σ_{i=0}^{I} p(f_j | e_i) • In training: collect counts for word pairs (see the EM sketch below)
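A minimal sketch of the 'collect counts' step: EM training of the IBM1 lexicon over a sentence-aligned corpus, with deliberately naive initialization and data handling:

    from collections import defaultdict

    def ibm1_em(corpus, iterations=5):
        # corpus: list of (source_words, target_words) sentence pairs.
        # Returns the lexicon t with t[(f, e)] = p(f | e).
        f_vocab = {f for fs, _ in corpus for f in fs}
        t = defaultdict(lambda: 1.0 / len(f_vocab))    # uniform initialization
        for _ in range(iterations):
            counts = defaultdict(float)                # expected counts c(f, e)
            totals = defaultdict(float)                # expected counts c(e)
            for fs, es in corpus:
                es = ["NULL"] + es                     # position 0: empty word
                for f in fs:
                    norm = sum(t[(f, e)] for e in es)  # E-step normalizer
                    for e in es:
                        p = t[(f, e)] / norm           # posterior of this link
                        counts[(f, e)] += p
                        totals[e] += p
            t = defaultdict(float,                     # M-step: re-estimate
                            {(f, e): c / totals[e] for (f, e), c in counts.items()})
        return t

    # Usage on a toy corpus:
    corpus = [("das Haus".split(), "the house".split()),
              ("das Buch".split(), "the book".split())]
    lex = ibm1_em(corpus)
    print(lex[("Haus", "house")])   # converges toward 1 with more iterations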
Overview • Statistical Machine Translation – A 5min Intro • Word Alignment Models – Another 5min Intro • Phrase Translation – A somewhat longer Intro • Different approaches • PESA: Phrase Pair Extraction as Sentence Splitting Algorithm • Scoring of phrase translations • Decoding – What this talk is about • Decoding strategies • 2 stage decoding: translation lattice generation, best path search • Recombination and pruning • N-best list generation
Alignment Example • One Chinese word aligned to a multi-word English phrase • In the lexicon: individual entries for 'the', 'development', 'of' • Difficult to generate from words • Main translation: 'development' • Test whether inserting 'the' and 'of' improves the LM probability • Easier to generate if we have phrase pairs available
Why Phrase to Phrase Translation • Captures n x m alignments • Encapsulates context • Local reordering • Compensates segmentation errors
How to Get Phrase Translations • Typically: train a word alignment model and extract phrase-to-phrase translations from the Viterbi path • IBM model 4 alignment • HMM alignment • Bilingual Bracketing • Genuine phrase translation models • Integrated segmentation and alignment (ISA) • Phrase pair extraction via full sentence alignment • Notes: • Often better results when training target-to-source for the extraction of phrase translations, due to the asymmetry of the alignment models • Phrases are not fully integrated into the alignment model; they are extracted only after training is completed – how to assign probabilities?
Phrase Pairs from Viterbi Path • Train your favorite word alignment (IBMn, HMM, …) • Calculate Viterbi path (i.e. path with highest probability or best score) • The details ….
Word Alignment Matrix • Alignment probabilities according to the lexicon [Figure: I × J alignment matrix, target words e_1 … e_I on the vertical axis, source words f_1 … f_J on the horizontal axis]
Viterbi Path • Calculate the Viterbi path (i.e. the path with the highest probability or best score) [Figure: alignment matrix with the Viterbi path marked]
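For IBM1 the Viterbi path decomposes per position: since the position alignment is uniform, the best alignment simply picks, for each source word, the target word with the highest lexicon probability. A minimal sketch, where t is a lexicon dictionary as trained in the EM sketch above:

    def viterbi_ibm1(f, e, t):
        # IBM1 Viterbi path: independently per source position j,
        # choose the target position i maximizing t(f_j | e_i).
        return [max(range(len(e)), key=lambda i: t.get((fj, e[i]), 0.0))
                for fj in f]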
Phrases from Viterbi Path • Read off source phrase – target phrase pairs [Figure: alignment matrix with rectangular blocks of phrase pairs along the Viterbi path]
Extraction of Phrases

    def extract_phrases(f, e, a, max_len):
        # f, e: source / target sentences as word lists;
        # a[j]: target position aligned to source position j (Viterbi path).
        pairs = set()
        for l in range(1, max_len + 1):              # source phrase length
            for j1 in range(len(f) - l + 1):         # start position
                j2 = j1 + l - 1                      # end position, fixed by length
                min_i = min(a[j] for j in range(j1, j2 + 1))
                max_i = max(a[j] for j in range(j1, j2 + 1))
                source_phrase = " ".join(f[j1:j2 + 1])
                target_phrase = " ".join(e[min_i:max_i + 1])
                # corresponds to storing SourcePhrase '#' TargetPhrase
                pairs.add((source_phrase, target_phrase))
        return pairs

• Train in both directions and combine the phrase pairs • Calculate probabilities • Pruning: take only the n-best translations for each source phrase
Dealing with Asymmetry • Word alignment models are asymmetric; the Viterbi path has: • multiple source words – one target word alignments • but no one source word – multiple target words alignments • Therefore train the alignment model also in the reverse direction, i.e. target -> source • Using both Viterbi paths: • Simple: extract phrases from both directions and merge the tables • 'Merge' the Viterbi paths and extract phrase pairs according to the resulting pattern
Combine Viterbi Paths [Figure: alignment matrix overlaying the F->E path, the E->F path, and their intersection]
Combine Viterbi Paths • Intersection: high precision, but low recall • Union: lower precision, but higher recall • Refined: start from the intersection and fill gaps according to points in the union (see the sketch below) • Different heuristics have been used • Och • Koehn • The quality of the phrase translation pairs depends on: • the quality of the word alignment • the quality of the combination of the Viterbi paths
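A minimal sketch of combining the two Viterbi paths as link sets. The gap-filling 'refined' step is simplified here to repeatedly adding union links that neighbor an already accepted link; this is only one of several published variants:

    def combine_paths(path_fe, path_ef):
        # path_fe[j] = i: F->E Viterbi path; path_ef[i] = j: E->F Viterbi path.
        links_fe = {(j, i) for j, i in enumerate(path_fe)}
        links_ef = {(j, i) for i, j in enumerate(path_ef)}
        inter = links_fe & links_ef          # high precision, low recall
        union = links_fe | links_ef          # lower precision, higher recall
        refined = set(inter)
        grew = True
        while grew:                          # grow outward from the intersection
            grew = False
            for (j, i) in union - refined:
                # accept a union link if it is adjacent to an accepted link
                if any(abs(j - j2) + abs(i - i2) == 1 for (j2, i2) in refined):
                    refined.add((j, i))
                    grew = True
        return inter, union, refined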
Number of (Source) Phrases • Small corpus: 40k sentences with 400k words • Number of phrases quickly exceeds number of words in corpus • Numbers are for source phrases only; each phrase typically has multiple translations (factor 5 – 20)
Dealing with Memory Limitations • Phrase translation tables are memory killers • The number of phrases quickly exceeds the number of words in the corpus • The memory required is a multiple of the memory for the corpus itself • We have corpora of 200 million words -> more than 1 billion source phrases and more than 10 billion phrase pairs • Restrict the phrases • Only take short ones • Only take frequent ones • Evaluation mode • Load only the phrases required for the test sentences, i.e. extract them from the large phrase translation table (see the sketch below) • Extract and store only the required phrase pairs (i.e. as part of the training cycle at evaluation time)
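A minimal sketch of the test-set filtering idea: keep only those phrase-table entries whose source side occurs as a contiguous substring of some test sentence. The 'src # tgt' line format and the maximum phrase length are assumptions for illustration:

    def filter_phrase_table(table_path, test_sentences, max_len=8):
        # Collect every source n-gram (up to max_len) occurring in the test set.
        needed = set()
        for sent in test_sentences:
            words = sent.split()
            for l in range(1, max_len + 1):
                for j in range(len(words) - l + 1):
                    needed.add(" ".join(words[j:j + l]))
        # Stream the large table and keep only the needed entries.
        with open(table_path, encoding="utf-8") as table:
            for line in table:
                src = line.split(" # ")[0]   # assumed 'src # tgt' format
                if src in needed:
                    yield line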
New Phrase Alignment • Desiderata: • Use phrases up to any length • Cannot store all phrase pairs -> search them on the fly • High-quality translation pairs • Balance with word-based translation
Phrase Alignment – New Approach • Search the translation for one source phrase f_j1 … f_j2 [Figure: alignment matrix with the source phrase f_j1 … f_j2 marked]
Phrase Alignment • What we would like to find [Figure: alignment matrix with a target phrase e_i1 … e_i2 aligned to the source phrase f_j1 … f_j2]
Phrase-Pair Extraction via Sentence Alignment • Calculate a modified IBM1 word alignment: do not sum over words in the 'forbidden' (grey) areas • Select the target phrase boundaries which maximize the sentence alignment probability • Modify the boundaries i1 and i2 • Calculate the sentence alignment • Take the best [Figure: alignment matrix partitioned by the boundaries j1, j2 and i1, i2 into phrase region and forbidden regions]
Phrase Alignment • Search for optimal boundaries [Figure: animation stepping through candidate target boundaries i1, i2 for the source phrase f_j1 … f_j2]
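A minimal sketch of this boundary search, assuming an IBM1-style lexicon t(f | e): every candidate boundary pair (i1, i2) is scored with a modified IBM1 sum in which the source words inside the phrase may only align to target words inside e_i1 … e_i2, and the remaining source words only to the remaining target words. This is a simplified reading of PESA, not the exact published algorithm:

    import math

    def pesa_boundaries(f, e, j1, j2, t):
        # Return target boundaries (i1, i2) for the source phrase f[j1..j2].
        def ibm1_logscore(f_words, e_words):
            # Modified IBM1: each source word sums only over the allowed
            # target words (the complement is the 'forbidden' grey area).
            if not f_words:
                return 0.0
            if not e_words:
                return -math.inf
            return sum(math.log(sum(t.get((fw, ew), 1e-10) for ew in e_words)
                                / len(e_words))
                       for fw in f_words)

        best, best_score = None, -math.inf
        for i1 in range(len(e)):
            for i2 in range(i1, len(e)):
                inside = ibm1_logscore(f[j1:j2 + 1], e[i1:i2 + 1])
                outside = ibm1_logscore(f[:j1] + f[j2 + 1:],
                                        e[:i1] + e[i2 + 1:])
                if inside + outside > best_score:
                    best, best_score = (i1, i2), inside + outside
        return best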