CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 17 – Alignment in SMT)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
14th Feb, 2011
Language Divergence Theory: Lexico-Semantic Divergences (ref: Dave, Parikh, Bhattacharyya, Journal of MT, 2002)
• Conflational divergence
  • F: vomir; E: to be sick
  • E: stab; H: churaa se maaranaa (knife-with hit)
  • S (Swedish): utrymningsplan; E: escape plan
• Structural divergence
  • E: SVO; H: SOV
• Categorial divergence
  • Change is in POS category (many examples discussed)
• Head swapping divergence
  • E: Prime Minister of India; H: bhaarat ke pradhaan mantrii (India-of Prime Minister)
• Lexical divergence
  • E: advise; H: paraamarsh denaa (advice give); noun incorporation is a very common Indian-language phenomenon
Language Divergence Theory: Syntactic Divergences
• Constituent order divergence
  • E: Singh, the PM of India, will address the nation today; H: bhaarat ke pradhaan mantrii, Singh, … (India-of PM, Singh, …)
• Adjunction divergence
  • E: She will visit here in the summer; H: vah yahaa garmii meM aayegii (she here summer-in will come)
• Preposition-stranding divergence
  • E: Who do you want to go with?; H: kisake saath aap jaanaa chaahate ho? (who with …)
• Null subject divergence
  • E: I will go; H: jaauMgaa (subject dropped)
• Pleonastic divergence
  • E: It is raining; H: baarish ho rahii hai (rain happening is); the pleonastic 'it' has no translation
Alignment
• Completely aligned
  • Your answer is right
  • Votre réponse est juste
• Problematic alignment
  • We first met in Paris
  • Nous nous sommes rencontrés pour la première fois à Paris
The Statistical MT model: notation
• Source language: F
• Target language: E
• Source language sentence: f
• Target language sentence: e
• Source language word: w_f
• Target language word: w_e
The Statistical MT model
To translate f:
• Assume that all sentences in E are translations of f with some probability!
• Choose the translation with the highest probability
SMT Model
• What is a good translation?
  • Faithful to the source (faithfulness)
  • Fluent in the target (fluency)
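The faithfulness/fluency split corresponds to the standard noisy-channel formulation of SMT; the slide's diagram is not in the extracted text, so the following restatement is a reconstruction of that standard form rather than the slide's own equation:

```latex
% Noisy-channel view of SMT: pick the target sentence e maximizing P(e | f),
% which Bayes' rule splits into a translation model and a language model
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \; \underbrace{P(f \mid e)}_{\text{faithfulness (translation model)}} \;
                          \underbrace{P(e)}_{\text{fluency (language model)}}
```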
Language Modeling
• Task: to find P(e), i.e., assigning probabilities to sentences
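The usual way to make P(e) computable is the chain-rule expansion below; this is the standard decomposition, written out here since the slide's formula image did not survive extraction:

```latex
% Chain rule: the probability of a sentence e = w_1 w_2 ... w_n
P(e) = P(w_1, w_2, \ldots, w_n)
     = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})
```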
Language Modeling: The N-gram approximation
• Probability of a word given the previous N-1 words
• N = 2: bigram approximation
• N = 3: trigram approximation
• Bigram approximation: see the reconstruction below
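The bigram formula on the slide appears to have been an image; a standard reconstruction of the N-gram and bigram approximations is:

```latex
% N-gram approximation: condition each word only on the previous N-1 words
P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})

% Bigram approximation (N = 2)
P(e) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
```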
Translation Modeling
• Task: to find P(f|e)
• Cannot use the counts of f and e
• Approximate P(f|e) using the product of word translation probabilities (IBM Model 1)
• Problem: how to calculate word translation probabilities?
• Note: we do not have counts – the training corpus is sentence-aligned, not word-aligned
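As a minimal sketch of what "product of word translation probabilities" means here (a standard Model-1 style expression, not copied from the slide):

```latex
% Each foreign word f_j may be generated by any English word e_i (e_0 = NULL),
% so P(f | e) is approximated, up to constants, by a product of sums of t(f_j | e_i)
P(f \mid e) \;\propto\; \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
```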
Word-alignment example
• English: Ram (1) has (2) an (3) apple (4)
• Hindi: राम (1) के (2) पास (3) एक (4) सेब (5) है (6)
Expectation-Maximization algorithm
• Start with uniform word translation probabilities
• Use these probabilities to find the (fractional) counts
• Use these new counts to recompute the word translation probabilities
• Repeat the above steps till the values converge
• Works because of the co-occurrence of words that are actually translations
• It can be proven that EM converges
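A minimal Python sketch of the loop just described, following the per-sentence normalization used in the worked example on the later slides; the NULL word is omitted, and names such as em_word_alignment are illustrative, not from the lecture:

```python
from collections import defaultdict

def em_word_alignment(corpus, iterations=10):
    """EM for word translation probabilities t(e -> f).

    corpus: list of (english_words, foreign_words) sentence pairs.
    Returns a dict t[(e, f)] approximating t(e -> f).
    """
    e_vocab = {e for es, _ in corpus for e in es}
    f_vocab = {f for _, fs in corpus for f in fs}

    # Step 1: start with uniform word translation probabilities (1 / |F vocab|)
    t = {(e, f): 1.0 / len(f_vocab) for e in e_vocab for f in f_vocab}

    for _ in range(iterations):
        # Step 2 (E-step): collect fractional counts using the current t values
        count = defaultdict(float)   # expected count of the pair (e, f)
        total = defaultdict(float)   # expected count of e aligned to anything

        for es, fs in corpus:
            for e in es:
                # normalize over the foreign words of this sentence pair
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / z
                    count[(e, f)] += frac
                    total[e] += frac

        # Step 3 (M-step): recompute the probabilities from the fractional counts
        t = {(e, f): count[(e, f)] / total[e] for (e, f) in count}
        # Step 4: repeat (fixed iteration count here) until the values converge

    return t


if __name__ == "__main__":
    # The rabbits/lapins example from the following slides
    corpus = [
        ("three rabbits".split(), "trois lapins".split()),
        ("rabbits of Grenoble".split(), "lapins de Grenoble".split()),
    ]
    t = em_word_alignment(corpus)
    # t('rabbits' -> 'lapins'), i.e. the (b, x) binding, grows with each iteration
    print(t[("rabbits", "lapins")])
```

On the two rabbits/lapins sentence pairs, the value t(rabbits → lapins), the (b, x) binding of the later slides, should increase with each iteration, which is exactly the behaviour the lecture points out.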
The counts in IBM Model 1
• Works by maximizing P(f|e) over the entire corpus
• For IBM Model 1, we get the following relationship:
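The relationship itself appears to have been an image on the slide; it is reconstructed here so that it matches the worked example that follows (which, for a fixed English word e, normalizes over the French words of the sentence pair):

```latex
% Expected (fractional) count of the link e -> f in one sentence pair (E, F),
% computed from the current estimates t(e -> f); #( ) counts occurrences
C[e \to f;\; \mathbf{E} \leftrightarrow \mathbf{F}]
  = \frac{t(e \to f)}{\sum_{f' \in \mathbf{F}} t(e \to f')}
    \times \#(e \text{ in } \mathbf{E}) \times \#(f \text{ in } \mathbf{F})
```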
English-French example of alignment
• Completely aligned
  • Your(1) answer(2) is(3) right(4)
  • Votre(1) réponse(2) est(3) juste(4)
  • Alignment: 1↔1, 2↔2, 3↔3, 4↔4
• Problematic alignment
  • We(1) first(2) met(3) in(4) Paris(5)
  • Nous(1) nous(2) sommes(3) rencontrés(4) pour(5) la(6) première(7) fois(8) à(9) Paris(10)
  • Alignment: 1↔(1,2), 2↔(5,6,7,8), 3↔4, 4↔9, 5↔10
  • Fertility? Yes (e.g., 'first' maps to four French words)
EM for word alignment from sentence alignment: example
• English: three rabbits (a b); rabbits of Grenoble (b c d)
• French: trois lapins (w x); lapins de Grenoble (x y z)
Initial probabilities: each cell of the table denotes t(a → w), t(a → x), etc.; all entries start at the uniform value 1/4 (one over the number of French words w, x, y, z).
Example of expected count:
C[a → w; (a b) ↔ (w x)] = t(a → w) / (t(a → w) + t(a → x)) × #(a in 'a b') × #(w in 'w x')
                        = (1/4) / (1/4 + 1/4) × 1 × 1 = 1/2
Revised probability: example
t_revised(a → w) = (1/2) / [ (1/2 + 1/2 + 0 + 0) from (a b) ↔ (w x)  +  (0 + 0 + 0 + 0) from (b c d) ↔ (x y z) ] = 1/2
Re-revised probabilities table: continue until convergence; notice that the (b, x) binding gets progressively stronger.
Another example
• A corpus of two sentence pairs: a b ↔ x y (illustrated book ↔ livre illustré) and b c ↔ x z (book shop ↔ livre magasin)
• Assuming no null alignments, the possible alignments are:
  • a b ↔ x y: {a↔x, b↔y} or {a↔y, b↔x}
  • b c ↔ x z: {b↔x, c↔z} or {b↔z, c↔x}
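With no null alignments and equal sentence lengths, each possible alignment is a one-to-one assignment, so the four alignments above can be enumerated mechanically; a small illustrative Python sketch (the corpus and variable names are just for this example):

```python
from itertools import permutations

# Two sentence pairs from the slide (word labels a, b, c / x, y, z)
corpus = [
    (["a", "b"], ["x", "y"]),   # illustrated book <-> livre illustré
    (["b", "c"], ["x", "z"]),   # book shop <-> livre magasin
]

# With no NULL word and equal lengths, an alignment is a bijection, i.e. a
# permutation of the foreign words assigned to the English words in order.
for e_words, f_words in corpus:
    print(e_words, "<->", f_words)
    for perm in permutations(f_words):
        print("  possible alignment:", list(zip(e_words, perm)))
```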
Translation Model: Exact expression
• Five models for estimating the parameters in the expression [2]: Model-1, Model-2, Model-3, Model-4, Model-5
• The expression factors into three choices: choose the length of the foreign language string given e; choose the alignment given e and m; choose the identity of each foreign word given e, m, a
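The exact expression itself was an equation image; the standard factorization that the three choices annotate would be (a reconstruction, with e the English sentence, f the foreign sentence of length m, and a the alignment):

```latex
% Marginalize over alignments, then factor P(f, a | e) into: length choice,
% alignment choice per position, and foreign-word choice per position
P(\mathbf{f} \mid \mathbf{e}) = \sum_{\mathbf{a}} P(\mathbf{f}, \mathbf{a} \mid \mathbf{e}),
\qquad
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  = P(m \mid \mathbf{e})
    \prod_{j=1}^{m} P\!\left(a_j \mid a_1^{j-1}, f_1^{j-1}, m, \mathbf{e}\right)
                    P\!\left(f_j \mid a_1^{j}, f_1^{j-1}, m, \mathbf{e}\right)
```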
Proof of Translation Model: Exact expression
• P(f | e) = Σ_m P(f, m | e)  (marginalization)
• P(f, m | e) = Σ_a P(f, a, m | e)  (marginalization)
• m is fixed for a particular f, hence P(f | e) = Σ_a P(f, a | e), which expands into the exact expression above
Model-1
• Simplest model
• Assumptions:
  • Pr(m|e) is independent of m and e and is equal to ε
  • The alignment of each foreign language word (FLW) depends only on the length of the English sentence and equals (l+1)^(-1), where l is the length of the English sentence
• The likelihood function is reconstructed below
• Maximize the likelihood function subject to the constraint that, for each e, the t(f|e) values sum to 1
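The likelihood the slide refers to has the standard Model-1 form; since the equation image did not survive extraction, the following is a reconstruction of that standard form together with the normalization constraint:

```latex
% IBM Model 1 likelihood: l = |e| (with e_0 = NULL), m = |f|,
% and t(f_j | e_i) are the word translation probabilities
P(\mathbf{f} \mid \mathbf{e}) = \frac{\epsilon}{(l+1)^{m}}
    \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

% Constraint: for every English word e the translation probabilities sum to one
\sum_{f} t(f \mid e) = 1
```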
Model-1: Parameter estimation
• Using a Lagrange multiplier for the constrained maximization gives the solution for the Model-1 parameters (reconstructed below)
• λ_e: normalization constant; c(f|e; f, e): expected count; δ(f, f_j) is 1 if f and f_j are the same, zero otherwise
• Estimate t(f|e) using the Expectation Maximization (EM) procedure
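A reconstruction of the standard Model-1 solution the slide describes (the equation image is not in the extracted text); boldface f, e denote a sentence pair, and the sum in the first line runs over the training corpus:

```latex
% M-step: renormalized expected counts collected over all sentence pairs
t(f \mid e) = \lambda_e^{-1} \sum_{(\mathbf{f}, \mathbf{e})} c(f \mid e;\, \mathbf{f}, \mathbf{e})

% E-step: expected count of the pair (f, e) within one sentence pair,
% with e_0 = NULL, l = |e|, m = |f|
c(f \mid e;\, \mathbf{f}, \mathbf{e})
  = \frac{t(f \mid e)}{t(f \mid e_0) + \cdots + t(f \mid e_l)}
    \sum_{j=1}^{m} \delta(f, f_j) \sum_{i=0}^{l} \delta(e, e_i)
```

Note that this standard formulation normalizes over the English words of the sentence for a fixed French word, the mirror image of the simplified per-English-word normalization used in the earlier rabbits/lapins worked example.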