Statistical Machine Translation

Alona Fyshe. Based on slides from Colin Cherry and Dekang Lin.

Presentation Transcript


  1. Statistical Machine Translation Alona Fyshe Based on slides from Colin Cherry and Dekang Lin

  2. Basic statistics • 0 <= P(x) <= 1 • P(A): probability that A happens • P(A,B): probability that A and B happen • P(A|B): probability that A happens given that we know B happened

  3. Basic statistics • Conditional probability: P(A|B) = P(A,B) / P(B)

  4. Basic Statistics • Use definition of conditional probability to derive the chain rule: P(A,B) = P(A|B) P(B)

  5. Basic Statistics • Bayes Rule: P(A|B) = P(B|A) P(A) / P(B)

  6. Basic Statistics • Just remember • Definition of cond. prob. • Bayes rule • Chain rule
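
The three identities listed above can be checked numerically. The following is a minimal sketch, assuming a made-up joint distribution over two binary events A and B (the numbers are illustrative and not from the slides); exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution P(A, B) over two binary events.
joint = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(4, 10),
}

def p_a(a):
    """Marginal P(A = a)."""
    return sum(p for (x, _), p in joint.items() if x == a)

def p_b(b):
    """Marginal P(B = b)."""
    return sum(p for (_, y), p in joint.items() if y == b)

def p_a_given_b(a, b):
    """Definition of conditional probability: P(A|B) = P(A,B) / P(B)."""
    return joint[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    """Same definition with the roles swapped: P(B|A) = P(A,B) / P(A)."""
    return joint[(a, b)] / p_a(a)

# Chain rule: P(A,B) = P(A|B) P(B)
chain_rule_holds = joint[(True, True)] == p_a_given_b(True, True) * p_b(True)

# Bayes rule: P(A|B) = P(B|A) P(A) / P(B)
bayes_holds = p_a_given_b(True, True) == p_b_given_a(True, True) * p_a(True) / p_b(True)
```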

  7. Goal • Translate. • I’ll use French (F) into English (E) as the running example.

  8. Oh, Canada • I’m Canadian • Mandatory French class in school until grade 6 • I speak “Cereal Box French”: Gratuit, Gagner, Chocolat, Glaçage, Sans gras, Sans cholestérol, Élevé dans la fibre

  9. Oh, Canada

  10. Machine Translation • Translation is easy for (bilingual) people • Process: • Read the text in French • Understand it • Write it down in English

  11. Machine Translation • Translation is easy for (bilingual) people • Process: • Read the text in French • Understand it • Write it down in English

  12. Machine Translation • Understanding language • Writing well-formed text • Both are hard tasks for computers • The human process is invisible, intangible

  13. One approach: Babelfish • A rule-based approach to machine translation • A 30-year-old feat in Software Eng. • Programming knowledge in by hand is difficult and expensive

  14. Alternate Approach: Statistics • We are trying to model P(E|F) • I give you a French sentence • You give me back English • How are we going to model this? • We could use Bayes rule: P(E|F) = P(F|E) P(E) / P(F)

  15. Alternate Approach: Statistics

  16. Why Bayes rule at all? • Why not model P(E|F) directly? • P(F|E)P(E) decomposition allows us to be sloppy • P(E) worries about good English • P(F|E) worries about French that matches English • The two can be trained independently

  17. Crime Scene Analogy • F is a crime scene. E is a person who may have committed the crime • P(E|F) - look at the scene - who did it? • P(E) - who had a motive? (Profiler) • P(F|E) - could they have done it? (CSI - transportation, access to weapons, alibi) • Some people might have great motives, but no means - you need both!
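
The "you need both" point can be sketched as a ranking over candidate translations. This is an illustration only: the probabilities below are invented, and the candidate list is hypothetical. Because P(F) is the same for every candidate E, ranking by P(F|E) * P(E) gives the same order as ranking by P(E|F).

```python
# Hypothetical scores for candidate translations of "On voit Jon à la télévision".
# p_e is the language-model score P(E); p_f_given_e is the translation-model score P(F|E).
candidates = {
    "Jon appeared on TV.": {"p_e": 1e-6, "p_f_given_e": 1e-4},
    "Jon appeared in TV.": {"p_e": 1e-8, "p_f_given_e": 1e-4},
    "TV appeared on Jon.": {"p_e": 1e-9, "p_f_given_e": 1e-4},
    # Fluent English (good "motive") that does not match the French (no "means").
    "Jon is happy today.": {"p_e": 1e-5, "p_f_given_e": 1e-12},
}

def score(entry):
    """Unnormalized P(E|F): the product P(F|E) * P(E), with the constant P(F) dropped."""
    return entry["p_f_given_e"] * entry["p_e"]

# Pick the English sentence with the highest product of the two model scores.
best = max(candidates, key=lambda e: score(candidates[e]))
```

With these made-up numbers the most fluent candidate loses because its translation-model score is tiny, which is the situation the analogy describes.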

  18. On voit Jon à la télévision • Table borrowed from Jason Eisner

  19. On voit Jon à la télévision • Table borrowed from Jason Eisner

  20. I speak English good. • How are we going to model good English? • How do we know these sentences are not good English? • Jon appeared in TV. • It back twelve saw. • In Jon appeared TV. • TV appeared on Jon. • Je ne parle pas l'anglais.

  21. I speak English good. • Je ne parle pas l'anglais. • These aren’t English words. • It back twelve saw. • These are English words, but it’s gibberish. • Jon appeared in TV. • “appeared in TV” isn’t proper English

  22. I speak English good. • Let’s say we have a huge collection of documents written in English • Like, say, the Internet. • It would be a pretty comprehensive list of English words • Save for “named entities”: people, places, things • Might include some non-English words • Speling mitsakes! lol! • Could also tell if a phrase is good English

  23. Google, is this good English? • Jon appeared in TV. • “Jon appeared” 1,800,000 Google results • “appeared in TV” 45,000 Google results • “appeared on TV” 210,000 Google results • It back twelve saw. • “twelve saw” 1,100 Google results • “It back twelve” 586 Google results • “back twelve saw” 0 Google results • Imperfect counting… why?

  24. Google, is this good English? • Language is often modeled this way • Collect statistics about the frequency of words and phrases • N-gram statistics • 1-gram = unigram • 2-gram = bigram • 3-gram = trigram • 4-gram = four-gram • 5-gram = five-gram
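
Collecting n-gram statistics amounts to sliding a window of n tokens over the text and counting. A minimal sketch, assuming whitespace tokenization and a made-up sentence (the helper name `ngrams` is hypothetical):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy text; a real system would count over a very large corpus.
tokens = "jon appeared on tv and then jon appeared on stage".split()

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))
```

Looking up `bigram_counts[("appeared", "in")]` returns 0 here, since that bigram never occurs in the toy text.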

  25. Google, is this good English? • Seriously, you want to query google for every phrase in the translation? • Google created and released a 5-gram data set. • Now you can query Google locally • (kind of)

  26. Language Modeling • What’s P(e)? • P(English sentence) • P(e1, e2, e3 … ei) • Using the chain rule

  27. Language Modeling • Markov assumption • The choice of word ei depends only on the n words before ei • Definition of conditional probability
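
The formulas that slides 26 and 27 refer to are not part of this transcript. In standard notation they can be written as below, where l for the sentence length is an assumption made here (chosen to match the l used for the English sentence length on slide 44) and n is the number of preceding words kept by the Markov assumption.

```latex
\begin{align*}
P(e_1, \dots, e_l)
  &= \prod_{i=1}^{l} P(e_i \mid e_1, \dots, e_{i-1})
  && \text{(chain rule)} \\
  &\approx \prod_{i=1}^{l} P(e_i \mid e_{i-n}, \dots, e_{i-1})
  && \text{(Markov assumption)}
\end{align*}
```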

  28. Language Modeling

  29. Language Modeling • Approximate probability using counts • Use the n-gram corpus!
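
Approximating a probability with counts can be sketched for the bigram case as count(prev, word) / count(prev). The corpus below is a made-up stand-in for the n-gram corpus, and the `<s>` and `</s>` boundary markers are an assumption of this sketch, not something the slides specify.

```python
from collections import Counter

# Toy corpus standing in for a large n-gram collection.
corpus = [
    "i love to eat chocolate",
    "i love to eat",
    "you love to sing",
]

bigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

def p_mle(word, prev):
    """Relative-frequency estimate P(word | prev) = count(prev, word) / count(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

def sentence_prob(sentence):
    """Product of bigram estimates over the sentence (Markov assumption with n = 1)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= p_mle(word, prev)
    return prob
```

A sentence containing any unseen bigram, such as "i eat chocolate" here, gets probability 0 under this estimate.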

  30. Language Modeling • Use the n-gram corpus! • Not surprisingly, given that you love to eat, loving to eat chocolate is more probable (0.177)

  31. Language Modeling • But what if one of the n-gram counts is 0? • Then P(e) = 0 • Happens even if the sentence is grammatically correct • “Al Gore’s pink Hummer was stolen.”

  32. Language Modeling • Smoothing • Many techniques • Add-one smoothing • Add one to every count • No more zeros, no problems • Backoff • If P(e1, e2, e3, e4, e5) = 0 use something related to P(e1, e2, e3, e4)
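
Both ideas can be sketched for bigrams. This is a simplified illustration on a made-up corpus: the backoff function simply falls back to a unigram estimate with no discounting, so it is cruder than the backoff schemes used in practice and does not yield a properly normalized distribution.

```python
from collections import Counter

# Toy corpus; "pink hummer" never occurs as a bigram.
corpus = ["the hummer was stolen", "the pink car was stolen"]

unigram_counts = Counter()
bigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

vocab_size = len(unigram_counts)
total_tokens = sum(unigram_counts.values())

def p_add_one(word, prev):
    """Add-one estimate: (count(prev, word) + 1) / (count(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

def p_backoff(word, prev):
    """Use the bigram estimate if the bigram was seen, else fall back to the unigram estimate."""
    if bigram_counts[(prev, word)] > 0:
        return bigram_counts[(prev, word)] / context_counts[prev]
    return unigram_counts[word] / total_tokens
```

Under either function the unseen bigram "pink hummer" now receives a small nonzero probability instead of 0.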

  33. Language Modeling • Wait… Is this how people “generate” English sentences? • Do you choose your fifth word based on B • Admittedly, this is an approximation to a process which is both • intangible and • hard for humans themselves to explain • If you disagree, and care to defend yourself, consider a PhD in NLP

  34. Back to Translation • Anyway, where were we? • Oh right… • So, we’ve got P(e), let’s talk P(f|e)

  35. Where will we get P(F|E)? • Cereal boxes in English + same cereal boxes, in French → Machine Learning Magic → P(F|E) model

  36. Where will we get P(F|E)? • Books in English + same books, in French → Machine Learning Magic → P(F|E) model • We call collections stored in two languages parallel corpora or parallel texts • Want to update your system? Just add more text!

  37. Translated Corpora • The Canadian Parliamentary Debates • Available in both French and English • UN documents • Available in Arabic, Chinese, English, French, Russian and Spanish

  38. Problem: • How are we going to generalize from examples of translations? • I’ll spend the rest of this lecture telling you: • What makes a useful P(F|E) • How to obtain the statistics needed for P(F|E) from parallel texts

  39. Strategy: Generative Story • When modeling P(X|Y): • Assume you start with Y • Decompose the creation of X from Y into some number of operations • Track statistics of individual operations • For a new example X,Y: P(X|Y) can be calculated based on the probability of the operations needed to get X from Y

  40. The quick fox jumps over the lazy dog • Le renard rapide saute par - dessus le chien paresseux • What if…?

  41. New Information • Call this new info a word alignment (A) • With A, we can make a good story • The quick fox jumps over the lazy dog • Le renard rapide saute par - dessus le chien paresseux

  42. P(F,A|E) Story • null The quick fox jumps over the lazy dog

  43. P(F,A|E) Story • null The quick fox jumps over the lazy dog • f1 f2 f3 … f10 • Simplifying assumption: Choose the length of the French sentence f. All lengths have equal probability

  44. P(F,A|E) Story • null The quick fox jumps over the lazy dog • f1 f2 f3 … f10 • There are (l+1)^m = (8+1)^10 possible alignments
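
The alignment count follows from each of the m French tokens independently choosing one of the l English words or null. A small sketch that computes the count for the running example and verifies the formula by brute force on a tiny case (the function name `all_alignments` is hypothetical):

```python
from itertools import product

english = "The quick fox jumps over the lazy dog".split()
french = "Le renard rapide saute par - dessus le chien paresseux".split()

l, m = len(english), len(french)

# Each of the m French tokens picks one of l English words or null (index 0).
num_alignments = (l + 1) ** m

def all_alignments(l, m):
    """Yield every alignment as a tuple a, where a[j] is in 0..l and 0 means null."""
    return product(range(l + 1), repeat=m)

# Brute-force check on a tiny case: l = 2 English words, m = 3 French words.
small = list(all_alignments(2, 3))
```

For the example sentence pair this gives 9 to the power 10, about 3.5 billion alignments, which is why enumerating them one by one is not practical.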

  45. P(F,A|E) Story • null The quick fox jumps over the lazy dog • Le renard rapide saute par - dessus le chien paresseux

  46. P(F,A|E) Story • null The quick fox jumps over the lazy dog • Le renard rapide saute par - dessus le chien paresseux

  47. Getting Pt(f|e) • [Diagram: six copies of the aligned pair “null The quick fox jumps over the lazy dog” / “Le renard rapide saute par - dessus le chien paresseux”] • We need numbers for Pt(f|e) • Example: Pt(le|the) • Count lines in a large collection of aligned text
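
Counting lines can be sketched as a relative-frequency estimate over alignment links. The hand-aligned data below is invented for illustration, and the name `p_t` is a hypothetical stand-in for Pt(f|e).

```python
from collections import Counter

# Hypothetical hand-aligned data: each sentence is a list of (french_word, english_word) links.
aligned_links = [
    [("le", "the"), ("renard", "fox"), ("rapide", "quick")],
    [("le", "the"), ("chien", "dog"), ("paresseux", "lazy")],
    [("la", "the"), ("maison", "house")],
]

link_counts = Counter()
english_counts = Counter()
for links in aligned_links:
    for f, e in links:
        link_counts[(f, e)] += 1
        english_counts[e] += 1

def p_t(f, e):
    """Estimate Pt(f|e) as (lines from e to f) / (all lines leaving e)."""
    if english_counts[e] == 0:
        return 0.0
    return link_counts[(f, e)] / english_counts[e]
```

Here "the" is linked to "le" twice and to "la" once, so `p_t("le", "the")` comes out as two thirds.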

  48. Where do we get the lines? • That sure looked like a lot of monkeys… • Remember: sometimes the information hidden in the text just jumps out at you • We’ll get alignments out of unaligned text by treating the alignment as a hidden variable • We infer an A that maximizes the probability of our corpus • Generalization of ideas in HMM training: called EM
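
Treating the alignment as a hidden variable can be sketched with a small EM loop over a word-translation table. The slides do not name a specific model; this sketch assumes the simplest setup consistent with the story above (uniform alignment choice plus a null word, in the style of IBM Model 1), and the sentence pairs are made up.

```python
from collections import defaultdict

# Toy unaligned parallel corpus: (English tokens, French tokens).
pairs = [
    ("the dog".split(), "le chien".split()),
    ("the fox".split(), "le renard".split()),
    ("a fox".split(), "un renard".split()),
]

def train_em(pairs, iterations=10):
    """Estimate t[(f, e)], an approximation of Pt(f|e), from unaligned sentence pairs."""
    french_vocab = {f for _, fr in pairs for f in fr}
    # Start from a uniform table: every French word is equally likely for every English word.
    t = defaultdict(lambda: 1.0 / len(french_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for en, fr in pairs:
            en_null = ["<null>"] + en
            for f in fr:
                # E-step: spread one unit of count for f over the English words,
                # in proportion to the current t values.
                z = sum(t[(f, e)] for e in en_null)
                for e in en_null:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts for each English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

t = train_em(pairs)
```

Because "renard" co-occurs with "fox" in two pairs while "le" and "un" each co-occur with it only once, the table shifts probability toward `t[("renard", "fox")]` even though no alignment was ever given.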

  49. Where’s “heaven” in Vietnamese? Example borrowed from Jason Eisner

  50. Where’s “heaven” in Vietnamese? • English: In the beginning God created the heavens and the earth. • Vietnamese: Ban dâu Dúc Chúa Tròi dung nên tròi dât. • English: God called the expanse heaven. • Vietnamese: Dúc Chúa Tròi dat tên khoang không la tròi. • English: … you are this day like the stars of heaven in number. • Vietnamese: … các nguoi dông nhu sao trên tròi. • Example borrowed from Jason Eisner
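
One way to see the answer "jump out" is to intersect the word sets of the three Vietnamese sentences, since all three English sentences mention heaven. This is a rough sketch using the sentences exactly as they appear in the transcript (which lacks full diacritics); the helper name `words` is hypothetical.

```python
import string

# Vietnamese sides of the three sentence pairs, copied from the slide.
vietnamese = [
    "Ban dâu Dúc Chúa Tròi dung nên tròi dât.",
    "Dúc Chúa Tròi dat tên khoang không la tròi.",
    "… các nguoi dông nhu sao trên tròi.",
]

def words(sentence):
    """Lowercase the sentence, drop the ellipsis and ASCII punctuation, return the set of words."""
    cleaned = sentence.lower().replace("…", " ")
    return {w.strip(string.punctuation) for w in cleaned.split()} - {""}

# Words that appear in every Vietnamese sentence are candidates for "heaven".
common = set.intersection(*(words(s) for s in vietnamese))
```

Only one word survives the intersection, which is the kind of co-occurrence evidence that EM accumulates softly over a whole corpus.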
