
Machine Translation and MT tools: Giza++ and Moses



Presentation Transcript


  1. Machine Translation and MT tools: Giza++ and Moses - Nirdesh Chauhan

  2. Outline • Problem statement in SMT • Translation models • Using Giza++ and Moses

  3. Introduction to SMT • Given a sentence F in a foreign language, find the most appropriate English translation E • P(F|E) – the translation model • P(E) – the language model
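  The two models combine through Bayes' rule in the standard noisy-channel decomposition; for reference (P(F) is fixed for a given input, so it drops out of the argmax):

    \hat{E} = \arg\max_E P(E \mid F)
            = \arg\max_E \frac{P(F \mid E)\, P(E)}{P(F)}
            = \arg\max_E P(F \mid E)\, P(E)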

  4. The Generation Process • Partition: consider all possible partitions of the source sentence into phrases • Lexicalization: for a given partition, translate each phrase into the foreign language • Reordering: permute the set of all foreign words, with words possibly moving across phrase boundaries • We need the notion of alignment to explain the mathematics behind the generation process

  5. Alignment

  6. Word-based alignment • For each word in the source language, align the target-language words that it could have produced • Based on IBM models 1-5 • Model 1 is the simplest • From model 1 to model 5, the models become more complex but more realistic • This is all that Giza++ does

  7. Alignment • An alignment is a function A from target position to source position: the alignment sequence 2,3,4,5,6,6,6 corresponds to A(1) = 2, A(2) = 3, ... • A different alignment function would give the sequence 1,2,1,2,3,4,3,4 for A(1), A(2), ... • To allow spurious insertion, alignment to word 0 (NULL) is permitted • Number of possible alignments: (I+1)^J, since each of the J target positions independently chooses one of the I source words or NULL
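  A minimal sketch of this representation in Python, assuming I = 6 source words and J = 7 target words to match the example sequence (the dictionary and lengths are illustrative only):

    # Alignment as a function A from target position j = 1..J to a source
    # position in 0..I, where position 0 is the special NULL word that
    # accounts for spurious insertions.
    A = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 6, 7: 6}   # the sequence 2,3,4,5,6,6,6

    I, J = 6, 7
    # Each of the J target positions picks one of I+1 source positions,
    # so the number of possible alignments is (I+1)^J.
    print((I + 1) ** J)   # 823543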

  8. IBM Model 1: Generative Process

  9. IBM Model 1: Details • No assumptions: the formula above is exact • Choosing length: P(J|E) = P(J|E,I) = P(J|I) = ε, a small constant • Choosing alignment: all alignments equiprobable, P(A|E,J) = 1/(I+1)^J • Translation probability: t(f_j | e_{a_j}) for each target word
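  For reference, these three choices yield the standard Model 1 likelihood, which factorizes because the alignment links are independent:

    P(F, A \mid E) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j})

    P(F \mid E) = \sum_A P(F, A \mid E) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j \mid e_i)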

  10. Training Alignment Models • Given a parallel corpus, for each sentence pair (F,E) learn the best alignment A and the component probabilities: • t(f|e) for Model 1 • lexicon probability P(f|e) and alignment probability P(a_i | a_{i-1}, I) for richer models • How do we compute these probabilities when all we have is a parallel corpus?

  11. Intuition: Interdependence of Probabilities • If you knew which words were probable translations of each other, you could guess which alignments are probable and which are improbable • If you were given alignments with probabilities, you could compute the translation probabilities • This looks like a chicken-and-egg problem • The EM algorithm comes to the rescue

  12. Expectation Maximization (EM) Algorithm • Used when we want a maximum-likelihood estimate of a model's parameters and the model depends on hidden variables • In the present case, the parameters are the translation probabilities and the hidden variables are the alignments • Init: start with an arbitrary estimate of the parameters • E-step: compute the expected values of the hidden variables • M-step: re-estimate the parameters to maximize the likelihood of the data, given the expected values from the E-step (a runnable sketch of this loop follows the worked example below)

  13. Example of EM Algorithm Parallel corpus: green house ↔ casa verde; the house ↔ la casa Init: assume that any word can generate any word with equal probability: P(la|house) = 1/3

  14. E-Step

  15. M-Step

  16. E-Step again: the re-estimated probabilities (1/3 vs. 2/3 in the worked figures) are no longer uniform. Repeat till convergence.
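  The sketch promised above: a compact EM loop for this two-sentence corpus under IBM Model 1. Variable names are mine, the NULL word is omitted for brevity, and initialization is uniform over the whole foreign vocabulary (1/4 here rather than the slides' 1/3), so intermediate numbers differ slightly while the fixed point is the same:

    from collections import defaultdict

    # Toy parallel corpus from the slides: (English words, foreign words)
    corpus = [(["green", "house"], ["casa", "verde"]),
              (["the", "house"], ["la", "casa"])]

    e_vocab = {e for es, _ in corpus for e in es}
    f_vocab = {f for _, fs in corpus for f in fs}

    # Init: uniform translation probabilities t(f|e)
    t = {(f, e): 1.0 / len(f_vocab) for e in e_vocab for f in f_vocab}

    for _ in range(50):
        count = defaultdict(float)   # expected count of e generating f
        total = defaultdict(float)   # expected count of e generating anything
        # E-step: collect expected counts over all alignments
        for es, fs in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalizer for this sentence
                for e in es:
                    p = t[(f, e)] / z            # posterior that e generated f
                    count[(f, e)] += p
                    total[e] += p
        # M-step: re-estimate t(f|e) as normalized expected counts
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]

    print(round(t[("casa", "house")], 3))   # approaches 1.0
    print(round(t[("la", "the")], 3))       # approaches 1.0

  As in the slides, "house" co-occurs with "casa" in both sentence pairs, so EM gradually concentrates its probability mass there.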

  17. Limitation: only 1-to-many alignments allowed (each target word aligns to exactly one source word, so a single source word can produce many target words but not the other way round)

  18. Phrase-based alignment • More natural translation units • Many-to-one mappings allowed

  19. Generating Bi-directional Alignments • Existing models only generate uni-directional alignments • Combine two uni-directional alignments to get many-to-many bi-directional alignments

  20. Hindi-Eng Alignment

  21. Eng-Hindi Alignment

  22. Combining Alignments: the candidate combinations trade precision against recall, e.g.
  P = 4/5 = .80, R = 4/7 = .57
  P = 2/3 = .67, R = 2/7 = .29
  P = 5/6 = .83, R = 5/7 = .71
  P = 6/9 = .67, R = 6/7 = .86
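  Precision and recall here compare a hypothesised set of alignment points against a reference (gold) set; a minimal Python sketch reproducing the first pair of figures above (the point sets themselves are invented for illustration):

    # Alignment points as (english_pos, foreign_pos) pairs.
    gold = {(0, 0), (1, 1), (2, 2), (3, 4), (4, 3), (5, 5), (6, 6)}  # 7 reference points
    hyp  = {(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)}                  # 5 hypothesised points

    correct = len(hyp & gold)            # 4 hypothesised points are in the reference
    precision = correct / len(hyp)       # 4/5 = .80
    recall    = correct / len(gold)      # 4/7 = .57
    print(precision, recall)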

  23. A Different Heuristic from the Moses Site

  GROW-DIAG-FINAL(e2f, f2e):
    neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
    alignment = intersect(e2f, f2e);
    GROW-DIAG(); FINAL(e2f); FINAL(f2e);

  GROW-DIAG():
    iterate until no new points added
      for english word e = 0 ... en
        for foreign word f = 0 ... fn
          if ( e aligned with f )
            for each neighboring point ( e-new, f-new ):
              if (( e-new, f-new ) in union( e2f, f2e )
                  and ( e-new not aligned and f-new not aligned ))
                add alignment point ( e-new, f-new )

  FINAL(a):
    for english word e-new = 0 ... en
      for foreign word f-new = 0 ... fn
        if ( ( ( e-new, f-new ) in alignment a )
             and ( e-new not aligned or f-new not aligned ) )
          add alignment point ( e-new, f-new )

  Proposed changes: after growing the diagonal, align the shorter sentence first, and use alignments only from the corresponding directional alignment.
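  The heuristic translates almost line-for-line into Python. A sketch under the assumptions that alignments are sets of (e, f) position pairs and en/fn are the last word indices; it follows the pseudocode above, including its 'and' condition in GROW-DIAG and 'or' in FINAL:

    NEIGHBORING = [(-1, 0), (0, -1), (1, 0), (0, 1),
                   (-1, -1), (-1, 1), (1, -1), (1, 1)]

    def grow_diag_final(e2f, f2e, en, fn):
        alignment = e2f & f2e              # start from intersect(e2f, f2e)
        union_af = e2f | f2e
        e_aligned = {e for e, f in alignment}
        f_aligned = {f for e, f in alignment}

        def grow_diag():
            added = True
            while added:                   # iterate until no new points added
                added = False
                for e, f in sorted(alignment):          # snapshot of current points
                    for de, df in NEIGHBORING:
                        e_new, f_new = e + de, f + df
                        if ((e_new, f_new) in union_af
                                and e_new not in e_aligned
                                and f_new not in f_aligned):
                            alignment.add((e_new, f_new))
                            e_aligned.add(e_new)
                            f_aligned.add(f_new)
                            added = True

        def final(a):
            for e_new in range(en + 1):
                for f_new in range(fn + 1):
                    if ((e_new, f_new) in a
                            and (e_new not in e_aligned or f_new not in f_aligned)):
                        alignment.add((e_new, f_new))
                        e_aligned.add(e_new)
                        f_aligned.add(f_new)

        grow_diag()
        final(e2f)
        final(f2e)
        return alignment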

  24. Generating Phrase Alignments
  premier beach vacation ↔ प्रमुख समुद्र-तटीय
  a premier beach vacation destination ↔ एक प्रमुख समुद्र-तटीय गंतव्य है ("is a premier beach destination")

  25. Using Moses and Giza++ • Refer to http://www.statmt.org/moses_steps.html

  26. Steps • Install all packages in Moses • Input: sentence-aligned parallel corpus • Training • Tuning • Generate output on test corpus (decoding)

  27. Example: a toy corpus for letter-to-phoneme "translation"; each line of train.en (letters) pairs with the same line of train.pr (phonemes).

  train.pr:
  hh eh l ow
  hh ah l ow w er l d
  k aa m p aw n d w er d
  hh ay f ah n ey t ih d
  ow eh n iy
  b uw m
  k w iy z l ah b aa t ah r

  train.en:
  h e l l o
  h e l l o w o r l d
  c o m p o u n d w o r d
  h y p h e n a t e d
  o n e
  b o o m
  k w e e z l e b o t t e r

  28. Sample from Phrase-table

  b o ||| b aa ||| (0) (1) ||| (0) (1) ||| 1 0.666667 1 0.181818 2.718
  b ||| b ||| (0) ||| (0) ||| 1 1 1 1 2.718
  c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) ||| 1 0.0486111 1 0.154959 2.718
  c ||| p ||| (0) ||| (0) ||| 1 1 1 1 2.718
  d w ||| d w ||| (0) (1) ||| (0) (1) ||| 1 0.75 1 1 2.718
  d ||| d ||| (0) ||| (0) ||| 1 1 1 1 2.718
  e b ||| ah b ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
  e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) ||| 1 1 0.5 0.5 2.718
  e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.111111 0.5 0.111111 2.718
  e l ||| eh ||| (0) (0) ||| (0,1) ||| 1 0.111111 1 0.133333 2.718
  e ||| ah ||| (0) ||| (0) ||| 1 1 0.666667 0.6 2.718
  h e ||| hh ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
  h ||| hh ||| (0) ||| (0) ||| 1 1 1 1 2.718
  l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) ||| 1 1 1 0.5 2.718
  l e ||| l ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.5 2.718
  l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) ||| 0.5 1 1 0.227273 2.718
  l l ||| l ||| (0) (0) ||| (0,1) ||| 0.25 1 1 0.833333 2.718
  l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718
  l ||| l ||| (0) ||| (0) ||| 0.75 1 1 0.833333 2.718
  m ||| m ||| (0) ||| (0) ||| 1 0.5 1 1 2.718
  n d ||| n d ||| (0) (1) ||| (0) (1) ||| 1 1 1 1 2.718
  n e ||| eh n iy ||| (1) (2) ||| () (0) (1) ||| 1 1 0.5 0.3 2.718
  n e ||| n iy ||| (0) (1) ||| (0) (1) ||| 1 1 0.5 0.3 2.718
  n ||| eh n ||| (1) ||| () (0) ||| 1 1 0.25 1 2.718
  o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.5 1 0.181818 2.718
  o o ||| uw ||| (0) (0) ||| (0,1) ||| 1 1 1 0.181818 2.718
  o ||| aa ||| (0) ||| (0) ||| 1 0.666667 0.2 0.181818 2.718
  o ||| ow eh ||| (0) ||| (0) () ||| 1 1 0.2 0.272727 2.718
  o ||| ow ||| (0) ||| (0) ||| 1 1 0.6 0.272727 2.718
  w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) ||| 1 0.1875 1 0.424242 2.718
  w ||| w ||| (0) ||| (0) ||| 1 0.75 1 1 2.718
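  Each row holds five |||-separated fields: source phrase, target phrase, two directional alignments, and the scores; in this older Moses format the five scores are, as far as the format documents, the phrase translation probabilities and lexical weights in both directions plus the constant phrase penalty exp(1) ≈ 2.718. A small parser sketch in Python:

    def parse_phrase_table_line(line):
        # source ||| target ||| alignment ||| alignment ||| scores
        fields = [f.strip() for f in line.split('|||')]
        src, tgt = fields[0].split(), fields[1].split()
        scores = [float(s) for s in fields[4].split()]
        return src, tgt, scores

    line = "l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718"
    print(parse_phrase_table_line(line))
    # (['l', 'o'], ['l', 'ow'], [0.5, 1.0, 1.0, 0.227273, 2.718])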

  29. Testing output
  • h o t → hh aa t
  • p h o n e → p|UNK hh ow eh n iy (the phrase table offers no source phrase for 'p' here, so the decoder copies it through marked as unknown)
  • b o o k → b uw k
