Machine Translation: Word Alignment
Stephan Vogel, Spring Semester 2011
Overview
• IBM 3: Fertility
• IBM 4: Relative Distortion
Acknowledgement: These slides are based on slides by Hermann Ney and Franz Josef Och
Fertility Models
• Basic concept: each word in one language can generate multiple words in the other language
  deseo – I would like
  übermorgen – the day after tomorrow
  departed – fuhr ab
• The same word can generate a different number of words → we need a probability distribution over fertilities
• Alignment is a function → fertility only on one side
• In my terminology: target words have fertility, i.e. each target word can cover multiple source words
• Others say: a source word generates multiple target words
• Some source words are aligned to the NULL word, i.e. the NULL word has fertility
• Many target words are not aligned, i.e. have fertility 0
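As notation for what follows (my addition, following Brown et al., 1993, not on the original slide): the fertility distribution n(φ | e) gives the probability that word e generates exactly φ words on the other side,

    \sum_{\phi \ge 0} n(\phi \mid e) = 1

so for the example deseo – I would like, one would expect n(3 | deseo) to carry most of the probability mass.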
The Generative Story
Each English word e_i first draws a fertility, then generates that many French words, which are finally permuted into the target order:

    English words:            e0    e1       e2    e3    e4           e5
    Fertility generation:      1     2        0     1     3            0
    Word generation:          f01   f11 f12   –    f31   f41 f42 f43   –
    Permutation generation:   f1    f2   f3   f4    f5    f6    f7
Fertility Model
• Alignment model (see the reconstructed equations below):
  • Select a fertility for each English word
  • For each English word, select a tablet of French words
  • Select a permutation for the entire sequence of French words
• Sum over all realizations
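The equations on the original slide did not survive extraction; a reconstruction in the notation of Brown et al. (1993), with tableau τ and permutation π:

    p(\phi_1^I \mid e_1^I)                      (fertility generation)
    p(\tau \mid \phi_1^I, e_1^I)                (word generation: tablets)
    p(\pi \mid \tau, \phi_1^I, e_1^I)           (permutation generation)

    p(f_1^J \mid e_1^I) = \sum_{(\tau,\pi) \to f_1^J} p(\phi_1^I \mid e_1^I)\, p(\tau \mid \phi_1^I, e_1^I)\, p(\pi \mid \tau, \phi_1^I, e_1^I)

where the sum runs over all (τ, π) pairs that realize the French sentence f_1^J.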
Fertility Model: Constraints (see the reconstructed equations below)
• Fertility bound to alignment
• Permutation
• French words
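The three defining equations were lost in extraction; plausible reconstructions, standard in the IBM model literature (my wording, not the original slide):

    Fertility bound to alignment:   \phi_i = \sum_{j=1}^{J} \delta(a_j, i)
    Permutation:                    \pi: (i,k) \mapsto j, \text{ a bijection onto } \{1, \dots, J\}
    French words:                   f_{\pi(i,k)} = \tau_{i,k}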
Fertility Model: Decomposition into Factors
• Apply the chain rule to each factor, limit dependencies
• Fertility generation (IBM 3, 4, 5)
• Word generation (IBM 3, 4, 5)
• Permutation generation (only IBM 3)
• Note: the 1/Φ0! term results from the special model for i = 0
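The per-factor equations were images and are gone; the standard IBM 3 parameterization they correspond to (Brown et al., 1993; my reconstruction):

    Fertility generation:             p(\phi_i \mid e_i) = n(\phi_i \mid e_i)
    Word generation:                  p(f_j \mid e_{a_j}) = t(f_j \mid e_{a_j})
    Permutation generation (IBM 3):   p(j \mid a_j, J, I) = d(j \mid a_j, J, I)

Combined with the empty-word model of the following slides, this yields the familiar Model 3 likelihood

    p(f_1^J, a_1^J \mid e_1^I) = \binom{J-\phi_0}{\phi_0} p_0^{J-2\phi_0} p_1^{\phi_0} \prod_{i=1}^{I} \phi_i!\, n(\phi_i \mid e_i) \prod_{j=1}^{J} t(f_j \mid e_{a_j}) \prod_{j:\,a_j \neq 0} d(j \mid a_j, J, I)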
Fertility Model: Some Issues
• The permutation model cannot guarantee that π is a permutation
  → words can be stacked on top of each other
  → this leads to deficiency
• Position i = 0 is not a real position
  → special alignment and fertility model for the empty word
Fertility Model: Empty Position
• Alignment assumptions for the empty position i = 0
  • Uniform position distribution for each of the Φ0 French words generated from e0
  • Place these French words only after all other words have been placed
• Alignment model for the positions aligned to the empty position (see below):
  • One position
  • All positions
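The two missing formulas are, in the usual formulation (my reconstruction):

    One position:    p(j \mid i = 0) = \frac{1}{\phi_0}        (uniform over the \phi_0 remaining positions)
    All positions:   \frac{1}{\phi_0!}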
Fertility Model: Empty Position
• Fertility model for the words generated by e0, i.e. by the empty position
• We assume that each word from f_1^J requires the empty word with probability [1 – p0]
• Probability that exactly Φ0 of the J words in f_1^J require the empty word (see below):
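The binomial formula lost from the slide, in the standard form (Brown et al., 1993), with p_1 = 1 − p_0:

    p(\phi_0 \mid J) = \binom{J - \phi_0}{\phi_0}\, p_0^{\,J - 2\phi_0}\, p_1^{\,\phi_0}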
Deficiency
• The distortion model for real words is deficient
• The distortion model for the empty word is non-deficient
• Deficiency can be reduced by aligning more words to the empty word
• Training corpus likelihood can be increased by aligning more words with the empty word
• Play with p0!
IBM 4: First-Order Distortion Model
• Introduce more detailed dependencies into the alignment (permutation) model
• First-order dependency along the e-axis
[Figure: jump structure of the HMM vs. IBM 4]
Inverted Alignment
• Consider inverted alignments: each target position i maps to a set of source positions
• Dependency along the I axis: jumps along the J axis
• Two first-order models: one for aligning the first word in a set, one for aligning the remaining words
• We skip the math :-) (a reconstruction follows below for the curious)
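For reference, and flagged as my reconstruction rather than slide content: Brown et al. (1993) use a head model d_1 for the first word of each tablet and a non-head model d_{>1} for the remaining words, both conditioned on automatically learned word classes A(e) and B(f):

    d_1\big(j - c_{\rho(i)} \,\big|\, \mathcal{A}(e_{\rho(i)}),\, \mathcal{B}(f_j)\big)        (first word of tablet i)
    d_{>1}\big(j - \pi(i, k-1) \,\big|\, \mathcal{B}(f_j)\big)                                  (remaining words, k > 1)

Here ρ(i) is the nearest previous English word with nonzero fertility and c_{ρ(i)} is the (rounded) average position of its tablet.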
Characteristics of Alignment Models
[Table: characteristics of the alignment models]
Consideration: Overfitting
• Training on data always carries the danger of overfitting
  • The model describes the training data in too much detail
  • But does not perform well on unseen test data
• Solution: smoothing (a small sketch follows below)
  • Lexicon: distribute some of the probability mass from seen events to unseen events
    • For p( f | e ), do this for each e
    • For unseen e: uniform distribution or ???
  • Distortion: interpolate with a uniform distribution
  • Fertility: for many languages 'longer word' = 'more content'
    • E.g. compounds or agglutinative morphology
    • Train a model for fertility given word length and interpolate it with the word-based fertility model
    • Interpolate fertility estimates based on word frequency: for a frequent word use the word model, for a low-frequency word bias towards the length model
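A minimal sketch of the lexicon smoothing bullet above, in Python (illustrative only: the function name, the interpolation weight lam, and the data layout are my assumptions, not lecture material):

    # Interpolate relative-frequency lexicon probabilities p(f|e) with a
    # uniform distribution, shifting some mass to unseen events.
    def smooth_lexicon(counts, f_vocab, lam=0.9):
        """counts: dict mapping e -> {f: count}; returns dict e -> {f: p(f|e)}."""
        uniform = 1.0 / len(f_vocab)
        smoothed = {}
        for e, f_counts in counts.items():
            total = sum(f_counts.values())
            smoothed[e] = {f: lam * f_counts.get(f, 0) / total
                              + (1.0 - lam) * uniform
                           for f in f_vocab}
        return smoothed

    # Toy usage: 'gebaeude' was never seen with 'house' but still gets mass.
    counts = {"house": {"haus": 8, "heim": 2}}
    probs = smooth_lexicon(counts, f_vocab={"haus", "heim", "gebaeude"})
    print(probs["house"]["gebaeude"])   # (1 - 0.9) / 3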
Extension: Using Manual Dictionaries
• Adding manual dictionaries
  • Simple method 1: add as bilingual data
  • Simple method 2: interpolate the manual with the trained dictionary
  • Use constrained GIZA (Gao, Nguyen, Vogel, WMT 2010)
  • Can put a higher weight on word pairs from the dictionary (Och, ACL 2000)
  • Not so simple: "But dictionaries are data too" (Brown et al, HLT 93)
• Problem: manual dictionaries do not contain inflected forms
• Possible solution:
  • Generate additional word forms (Vogel and Monson, LREC 04)
Extension: Using POS
• Use POS in the distortion model
  • We had a first-order distortion model (reconstructed below)
  • Now we condition on the word class of the previous aligned target word
  • Available in GIZA++
    • Automatic clustering of the vocabulary into word classes with mkcls
    • Default: 50 classes
• Use POS as a 2nd 'lexicon' model (e.g. Zhao et al, ACL 2005)
  • Train p( C(f) | C(e) ), starting with an initial model trained with IBM1 just on word classes
  • Align sentence pairs using p( C(f) | C(e) ) and p( f | e )
  • Update both distributions from the Viterbi path
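The "we had" formula did not survive extraction; in HMM-style notation the extension presumably reads (my reconstruction, following Och & Ney, 2003):

    p(a_j \mid a_{j-1}, I) \quad \longrightarrow \quad p(a_j \mid a_{j-1}, C(e_{a_{j-1}}), I)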
And Much More …
• Add fertilities to the HMM model
• Symmetrize during training: i.e. update lexicon probabilities based on the symmetrized alignment
• Benefit from shorter sentence pairs
  • Split long sentences based on an initial alignment and retrain
  • Extract phrase pairs and add reliable ones to the training data
• And then all the work on discriminative word alignment
Alignment Results
• Unbalanced between wrong and missing links → unbalanced between precision and recall
• Chinese is harder, many missing links → low recall
• One direction seems harder: related to which side has more words
• Alignment models generate one link per source word
Unaligned Words
• NULL alignment is explicit, part of the model; non-aligned target words just happen (fertility 0)
• This is serious: the alignment model neglects about 1/3 of the target words
• Alignment is very asymmetric, therefore we combine alignments from both directions
Alignment Errors for Most Frequent Words (CH-EN)
[Table: alignment errors for the most frequent words]
Sentence Length Distribution
• Sentences are often unbalanced
  • Wrong sentence alignment
  • Bad translations
  • But also language divergences
• May want to remove unbalanced sentences
• The sentence length model is very weak
Table: Target sentence length distribution for source sentence length 10
Summary
• Word Alignment Models
  • Alignment is (mathematically) a function, i.e. many source words to 1 target word, but not the other way round
  • Symmetry by training in both directions
• Model IBM1
  • Word-word probabilities
  • Simple training with Expectation-Maximization (sketched below)
• Model IBM2
  • Position alignment
  • Training also with EM
• Model HMM
  • Relative positions (first-order model)
  • Training with the Viterbi or Forward-Backward algorithm
• Alignment errors reflect restrictions in the generative alignment models
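To make "simple training with Expectation-Maximization" concrete, a minimal IBM1 sketch in Python (my illustration, not lecture material; the toy corpus and variable names are assumptions):

    from collections import defaultdict

    def train_ibm1(corpus, iterations=10):
        # corpus: list of (f_sentence, e_sentence) pairs, each a list of words.
        # Returns t[(f, e)] = p(f | e); a "NULL" token is added to each e side.
        f_vocab = {f for fs, _ in corpus for f in fs}
        init = 1.0 / len(f_vocab)
        t = defaultdict(lambda: init)            # uniform initialization
        for _ in range(iterations):
            count = defaultdict(float)           # expected counts c(f, e)
            total = defaultdict(float)           # expected counts c(e)
            for fs, es in corpus:
                es_null = es + ["NULL"]
                for f in fs:
                    # E-step: distribute f over all e it could align to
                    z = sum(t[(f, e)] for e in es_null)
                    for e in es_null:
                        c = t[(f, e)] / z
                        count[(f, e)] += c
                        total[e] += c
            # M-step: re-estimate p(f | e) from the expected counts
            t = defaultdict(lambda: init)
            for (f, e), c in count.items():
                t[(f, e)] = c / total[e]
        return t

    # Toy usage: probabilities for cognate pairs rise over the iterations.
    corpus = [(["das", "haus"], ["the", "house"]),
              (["das", "buch"], ["the", "book"])]
    t = train_ibm1(corpus)
    print(t[("haus", "house")])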