
Statistical Machine Translation

Statistical Machine Translation. Mohammad Taher Pilevar University of Tehran Winter 2010. Machine Translation?.


Presentation Transcript


  1. Statistical Machine Translation Mohammad Taher Pilevar University of Tehran Winter 2010

  2. Machine Translation? The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport. جزیره گوام ایالات متحده، پس از اینکه فرودگاه گوام و دفاتردولتی آن ایمیلی تهدیدآمیز در مورد حمله زیستی/شیمیایی به اماکن عمومی از جمله فرودگاه بود را از طرف شخصی که خود را اسامه بن لادن می نامید در یافت کردند، به حالت آماده باش کامل در آمده است.

  3. Translation? • Strictly speaking, it is sometimes impossible for a sentence in one language to be an exact translation of a sentence in another. • E.g.: "the Lord is my shepherd" • "The Lord will look after me" (fluent, at the cost of fidelity) • "The Lord is for me like somebody who looks after animals with cotton-like hair" (faithful to the original sentence, at the cost of fluency) • Translation is therefore a compromise.

  4. Fidelity + Fluency • So a true translation is one that is both • faithful to the source language and • fluent in the target language

  5. …Goal of Translation • The production of an output that maximizes some value function representing the importance of both faithfulness and fluency. • This is the basis of statistical MT.

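  6. Best translation: $\hat{T} = \arg\max_{T} \text{faithfulness}(T, S) \times \text{fluency}(T)$ • $\hat{E} = \arg\max_{E} P(F \mid E) \times P(E)$, where $P(F \mid E)$ is the translation model and $P(E)$ is the language model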

  7. Noisy channel

  8. …So, what do we need? • Language model P(E) • Translation model P(F|E) • Decoder: given F, produces the most probable E
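The three components above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the candidate translations and all probability values are made-up numbers, and a real decoder searches a large hypothesis space rather than a two-entry dictionary.

```python
import math

# Hypothetical scores for one foreign sentence F (illustrative numbers only).
# p_f_given_e plays the role of the translation model P(F|E),
# p_e plays the role of the language model P(E).
candidates = {
    "Mary slapped the green witch": {"p_f_given_e": 0.20, "p_e": 1e-2},
    "Mary gave a slap to the witch green": {"p_f_given_e": 0.25, "p_e": 1e-4},
}


def noisy_channel_score(scores):
    """Return log P(F|E) + log P(E) for one candidate E."""
    return math.log(scores["p_f_given_e"]) + math.log(scores["p_e"])


def decode(candidates):
    """Pick the candidate E that maximizes P(F|E) * P(E)."""
    return max(candidates, key=lambda e: noisy_channel_score(candidates[e]))


print(decode(candidates))
```

The word-for-word candidate has the higher assumed translation-model score, but the language model penalizes its unnatural word order, so the product favors the fluent candidate.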

  9. Language model: P(E) • Assigns a higher probability to fluent / grammatical sentences • Estimated using monolingual corpora • High P(e): این جمله از نظر دستور زبان فارسی، یک جمله صحیح محسوب می شود ("This sentence is considered a correct sentence according to Persian grammar") • Low P(e): این محسوب جمله از دستور زبان نظر می شود فارسی یک جمله صحیح (the same words in scrambled order)

  10. P(F|E): The Phrase-Based Translation Model • The job of the translation model: given an English sentence E and a foreign sentence F, assign a probability that E generates F.

  11. Translation model: P(f|e) • Assigns higher probability to sentence pairs that have corresponding meaning • Estimated using bilingual corpora • English: "Former president had a speech" • High P(f|e): رئیس جمهور سابق، سخنرانی کرد ("The former president gave a speech") • Low P(f|e): در سخنرانی رئیس جمهور شرکت کردم ("I attended the president's speech")
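As a rough sketch of how a language model can be estimated from a monolingual corpus, the code below trains a bigram model with add-one smoothing. The three training sentences, the smoothing choice, and the function names are assumptions made for illustration; the slides do not specify which kind of language model is used.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their left contexts over a tiny monolingual corpus."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    bigrams, contexts, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


# Toy training corpus (hypothetical).
model = train_bigram_lm([
    "this sentence is correct",
    "this is a correct sentence",
    "the sentence is short",
])

fluent = log_prob("this sentence is correct", model)
scrambled = log_prob("correct is sentence this", model)
print(fluent > scrambled)
```

The scrambled sentence uses the same words but only unseen bigrams, so it receives the lower score, mirroring the high and low P(e) examples on the slide.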

  12. Raw data to Bilingual corpus • Some books, websites, … in English • The same books, websites, … in Persian

  13. Bilingual corpus

  14. Breaking sentences into words • The Poor don’t have any money • [The] [Poor] [don’t] [have] [any] [money] • {انسان های} {فقیر} {هیچ} {پولی} {ندارند} • Align according to co-occurrence

  15. Some examples Spurious words

  16. Many-to-many

  17. Alignments (e.g. 1) • The poor don’t have any money • انسانهای فقیر هیچ پولی ندارند • [The poor] [don’t have] [any money] • [The poor] [don’t have any money]

  18. Alignments (e.g. 2) • He forgot to turn off the stove • او فراموش کرد که گاز خاموش کند • [He forgot to] [turn off] [the stove]

  19. IBM Model 1

  20. P(F,A|E) Story • English: null The quick fox jumps over the lazy dog • French: Le renard rapide saute par-dessus le chien paresseux

  21. P(F,A|E) Story • English: null The quick fox jumps over the lazy dog • French: Le renard rapide saute par-dessus le chien paresseux

  22. Getting Pt(f|e) • We need numbers for Pt(f|e) • Example: Pt(le|the) • Count lines in a large collection of aligned text [Figure: six copies of the sentence pair "null The quick fox jumps over the lazy dog" / "Le renard rapide saute par-dessus le chien paresseux", each with alignment lines]

  23. Where’s “heaven” in Vietnamese? English: In the beginning God created the heavens and the earth. Vietnamese: Ban dâu Dúc Chúa Tròi dung nên tròi dât. English: God called the expanse heaven. Vietnamese: Dúc Chúa Tròi dat tên khoang không la tròi. English: … you are this day like the stars of heaven in number. Vietnamese: … các nguoi dông nhu sao trên tròi. Example borrowed from Jason Eisner

  24. Where’s “heaven” in Vietnamese? English: In the beginning God created the heavens and the earth. Vietnamese: Ban dâu Dúc Chúa Tròi dung nên tròi dât. English: God called the expanse heaven. Vietnamese: Dúc Chúa Tròi dat tên khoang không la tròi. English: … you are this day like the stars of heaven in number. Vietnamese: … các nguoi dông nhu sao trên tròi. Example borrowed from Jason Eisner
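One way to make the co-occurrence intuition concrete is sketched below: collect the Vietnamese words that appear in every sentence pair whose English side contains a word starting with "heaven". The sentences are copied from the slide as transcribed; the helper names and the prefix-matching rule are assumptions for illustration.

```python
# Sentence pairs as they appear in the transcript.
pairs = [
    ("In the beginning God created the heavens and the earth.",
     "Ban dâu Dúc Chúa Tròi dung nên tròi dât."),
    ("God called the expanse heaven.",
     "Dúc Chúa Tròi dat tên khoang không la tròi."),
    ("… you are this day like the stars of heaven in number.",
     "… các nguoi dông nhu sao trên tròi."),
]


def tokens(sentence):
    """Lowercased word set with surrounding punctuation stripped."""
    cleaned = (word.strip(".,…").lower() for word in sentence.split())
    return {word for word in cleaned if word}


def candidates_for(english_prefix, pairs):
    """Foreign words present in every pair whose English side matches the prefix."""
    common = None
    for english, foreign in pairs:
        if any(word.startswith(english_prefix) for word in tokens(english)):
            words = tokens(foreign)
            common = words if common is None else common & words
    return common or set()


print(candidates_for("heaven", pairs))
```

On these three pairs only one Vietnamese word survives the intersection. A hard intersection like this is brittle on real data, which is why the following slides replace it with soft, probabilistic counts.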

  25. EM: Expectation Maximization • Assume a probability distribution (weights) over hidden events • Take counts of events based on this distribution • Use counts to estimate new parameters • Use parameters to re-weight examples • Rinse and repeat

  26. Good grief! We forgot about P(F|E)! • No worries, a little more stats gets us what we need:
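The formula on this slide did not survive extraction. Given the alignment story P(F,A|E) from the earlier slides, the standard step is to sum over all alignments A, so the following is offered as the likely intended identity rather than a verbatim copy of the slide:

```latex
P(F \mid E) = \sum_{A} P(F, A \mid E)
```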

  27. Big Example: Corpus • Sentence pair 1: fast car / voiture rapide • Sentence pair 2: fast / rapide

  28. Possible Alignments [Figure: 1a and 1b are the two alignments of sentence pair 1 (fast car / voiture rapide); 2 is the single alignment of sentence pair 2 (fast / rapide)]

  29. Parameters [Figure: alignments 1a, 1b, 2]

  30. Weight Calculations [Figure: alignments 1a, 1b, 2]

  31. Count Lines [Figure: alignments 1a, 1b, 2 with weights 1/2, 1/2, 1]

  32. Count Lines [Figure: alignments 1a, 1b, 2 with weights 1/2, 1/2, 1]

  33. Count Lines • Normalize [Figure: alignments 1a, 1b, 2 with weights 1/2, 1/2, 1]

  34. Parameters [Figure: alignments 1a, 1b, 2 with weights 1/2, 1/2, 1]

  35. Weight Calculations [Figure: alignments 1a, 1b, 2 with weights 1/2, 1/2, 1]

  36. Count Lines [Figure: alignments 1a, 1b, 2 with weights 1/4, 3/4, 1]

  37. Count Lines [Figure: alignments 1a, 1b, 2 with weights 1/4, 3/4, 1]

  38. Count Lines • Normalize [Figure: alignments 1a, 1b, 2 with weights 1/4, 3/4, 1]

  39. After many iterations: [Figure: alignments 1a, 1b, 2 with weights ~0, ~1, 1]
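The walkthrough in slides 27 to 39 can be reproduced with a short script. This is a sketch under assumptions: it enumerates only one-to-one alignments (no null word), starts from uniform parameters, and takes 1a to be the alignment fast↔voiture, car↔rapide and 1b to be fast↔rapide, car↔voiture. The function names are illustrative.

```python
from collections import defaultdict
from itertools import permutations
from math import prod

CORPUS = [
    (["fast", "car"], ["voiture", "rapide"]),  # sentence pair 1
    (["fast"], ["rapide"]),                    # sentence pair 2
]


def uniform_parameters(corpus):
    """Initial t(f|e): uniform over the foreign words that co-occur with e."""
    cooccurring = defaultdict(set)
    for e_words, f_words in corpus:
        for e in e_words:
            cooccurring[e].update(f_words)
    return {e: {f: 1.0 / len(fs) for f in fs} for e, fs in cooccurring.items()}


def alignment_weights(t, e_words, f_words):
    """Weight calculation: normalized probability of each one-to-one alignment."""
    alignments = [list(zip(e_words, perm)) for perm in permutations(f_words)]
    scores = [prod(t[e][f] for e, f in a) for a in alignments]
    total = sum(scores)
    return alignments, [s / total for s in scores]


def em_step(t, corpus):
    """Count alignment lines weighted by alignment probability, then normalize."""
    counts = defaultdict(lambda: defaultdict(float))
    for e_words, f_words in corpus:
        alignments, weights = alignment_weights(t, e_words, f_words)
        for alignment, weight in zip(alignments, weights):
            for e, f in alignment:
                counts[e][f] += weight
    return {e: {f: c / sum(fs.values()) for f, c in fs.items()}
            for e, fs in counts.items()}


t = uniform_parameters(CORPUS)
for iteration in range(10):
    _, weights = alignment_weights(t, *CORPUS[0])
    print(iteration, [round(w, 4) for w in weights])  # weights of 1a and 1b
    t = em_step(t, CORPUS)
```

With these assumptions the weights of 1a and 1b start at 1/2 and 1/2, move to 1/4 and 3/4 after one round of counting and normalizing, and then approach 0 and 1, in line with the figures on the slides.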

  40. Decoding

  41. Mary dió una bofetada a la bruja verde • Lattice of possible English translations for words and phrases in a particular sentence F

  42. Mary dió una bofetada a la bruja verde

  43. Generative story… • We group the English source words into phrases • Translate each phrase (translation probability) • Optionally reorder the translated phrases (distortion probability)

  44. Translation probability: $\phi(\bar{f}_i \mid \bar{e}_i)$ • Distortion: a word having a different ('distorted') position in the Spanish sentence than it had in the English sentence • The distortion is parameterized by $d(a_i - b_{i-1})$, where $a_i$ is the start position of the foreign phrase generated by the $i$-th English phrase $\bar{e}_i$, and $b_{i-1}$ is the end position of the foreign phrase generated by the $(i-1)$-th English phrase $\bar{e}_{i-1}$.

  45. Distortion probability • This distortion model penalizes large distortions by giving lower and lower probability the larger the distortion.
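The formula for this slide was lost in extraction. A simple distortion model that behaves as described raises a small constant to the size of the jump; the form below is an assumed reconstruction, with $\alpha$ a constant chosen for illustration, $a_i$ the start position of the foreign phrase generated by the $i$-th English phrase, and $b_{i-1}$ the end position of the foreign phrase generated by the previous English phrase:

```latex
d(a_i - b_{i-1}) = \alpha^{\,|a_i - b_{i-1} - 1|}, \qquad 0 < \alpha < 1
```

When the phrases are translated in order, $a_i = b_{i-1} + 1$, the exponent is zero, and no penalty is applied.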

  46. Final translation model for phrase-based MT is:
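The slide's formula is missing from the transcript. A common form of the phrase-based model multiplies, over all phrases, a phrase translation probability and a distortion probability: $P(F \mid E) = \prod_i \phi(\bar{f}_i, \bar{e}_i)\, d(a_i - b_{i-1})$. The sketch below computes that product under stated assumptions: the distortion is taken to be $\alpha^{|a_i - b_{i-1} - 1|}$, $b_0$ is taken to be 0, positions are 1-indexed, and all phrase probabilities are made-up numbers.

```python
def distortion(start, prev_end, alpha=0.5):
    """Assumed distortion model: alpha ** |a_i - b_{i-1} - 1|."""
    return alpha ** abs(start - prev_end - 1)


def phrase_model_probability(phrases, alpha=0.5):
    """Product over phrases of phi * d.

    `phrases` lists, in English phrase order, tuples (phi, start, end), where
    phi is the phrase translation probability and start/end are the 1-indexed
    positions of the generated foreign phrase.
    """
    probability = 1.0
    prev_end = 0  # assumption: b_0 = 0, so a first phrase at position 1 is not penalized
    for phi, start, end in phrases:
        probability *= phi * distortion(start, prev_end, alpha)
        prev_end = end
    return probability


# Monotone order: every foreign phrase starts right after the previous one ends.
monotone = [(0.5, 1, 2), (0.4, 3, 4)]
# Reordered: the second English phrase jumps ahead, the third jumps back.
reordered = [(1.0, 1, 2), (1.0, 5, 6), (1.0, 3, 4)]

print(phrase_model_probability(monotone))
print(phrase_model_probability(reordered))
```

In the monotone case every distortion factor is 1, so the result is just the product of the phrase probabilities. In the reordered case the phrase probabilities are all 1, so the whole result comes from the distortion penalties.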

  47. Alignment in MT • The Poor don’t have any money • [The] [Poor] [don’t] [have] [any] [money] • {انسان های} {فقیر} {هیچ} {پولی} {ندارند} • Align according to co-occurrence
