
Learning to Translate

Presentation Transcript


  1. Learning to Translate 600.465 - Intro to NLP - J. Eisner

  2. (native speakers) Source: Global Reach (www.glreach.com)

  3. (number of people online in each “language zone”, I think) Source: Global Reach (www.glreach.com)

  4. Machine Translation in the 1950s
     • “We’ll have this up and running in a few years, it’ll be great, give us lots of $$$”
     • Oops! Foundered on word-sense disambiguation.
     • Nearly sank funding for all of AI.

  5. Currently available technology
     (L&H translator, via Japanese) At the beginning a god created Hajime for the sky and the earth. The earth is frozen as missing, formlessly, darkness was frozen as ceasing, and superficially, deeply, then a divine mind moved on a surface of water.
     (Babelfish translator, via Japanese) God drew up the heaven and the earth with beginning. The earth the formless and was invalid, as for the darkness there was a surface being deep, mind of God was moving to the surface of the water.

  6. The Rosetta Stone (196 BC)
     • Egyptian: hieroglyphs (used from 3300 BC – 400 AD)
     • Egyptian: Demotic (a late cursive script)
     • Greek (the language of Ptolemy V, ruler of Egypt)
     Found 1799; hieroglyphs decoded in 1822 by Champollion. The surviving stone is just under 4 feet (about 1.1 m) tall.

  7. The online Bible as Rosetta Stone
     English: In the beginning God created the heavens and the earth.
     Spanish: En el principio crió Dios los cielos y la tierra.
     French: Au commencement Dieu créa les cieux et la terre.
     Haitian: Nan konmansman, Bondye kreye syèl laak latèa.
     Danish: I Begyndelsen skabte Gud Himmelen og Jorden.
     Swedish: I begynnelsen skapade Gud himmel och jord.
     Finnish: Alussa loi Jumala taivaan ja maan.
     Greek: Ἐν ἀρχῇ ἐποίησεν ὁ Θεὸς τὸν οὐρανὸν καὶ τὴν γῆν.
     Latin: In principio creavit Deus caelum et terram.
     Vietnamese: Ban dâu Ðúc Chúa Tròi dung nên tròi dât.

  11. Where’s “heaven” in Vietnamese?
      English: In the beginning God created the heavens and the earth.
      Vietnamese: Ban dâu Ðúc Chúa Tròi dung nên tròi dât.
      English: God called the expanse heaven.
      Vietnamese: Ðúc Chúa Tròi dat tên khoang không la tròi.
      English: … you are this day like the stars of heaven in number.
      Vietnamese: … các nguoi dông nhu sao trên tròi.
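The lookup this slide invites can be sketched in a few lines of Python: keep only the Vietnamese words that occur in every sentence whose English side mentions "heaven". This is a minimal illustration, not the course's actual method; the function name `candidates` and the prefix-matching rule (so that "heavens" counts too) are assumptions made for the example.

```python
# Sentence pairs copied from the slide (Vietnamese kept exactly as transcribed).
pairs = [
    ("In the beginning God created the heavens and the earth",
     "Ban dâu Ðúc Chúa Tròi dung nên tròi dât"),
    ("God called the expanse heaven",
     "Ðúc Chúa Tròi dat tên khoang không la tròi"),
    ("you are this day like the stars of heaven in number",
     "các nguoi dông nhu sao trên tròi"),
]

def candidates(english_prefix, pairs):
    """Vietnamese words present in every pair whose English side has a word
    starting with english_prefix (hypothetical helper, case-insensitive)."""
    common = None
    for english, vietnamese in pairs:
        if any(w.startswith(english_prefix) for w in english.lower().split()):
            words = set(vietnamese.lower().split())
            common = words if common is None else common & words
    return common or set()

heaven = candidates("heaven", pairs)
print(heaven)
```

On these three pairs a single word survives the intersection, which is the guess the slide is steering toward.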

  13. “Created” in Vietnamese?
      English: In the beginning God created the heavens and the earth.
      Vietnamese: Ban dâu Ðúc Chúa Tròi dung nên tròi dât.
      English: God created the great sea monsters …
      Vietnamese: Ðúc Chúa Tròi dung nên các loài cá lón …
      English: God created man in His own image …
      Vietnamese: Ðúc Chúa Tròi dung nên loài nguòi nhu hình Ngài …

  14. “Created” in Vietnamese? Uh-oh: the three sentence pairs on the previous slide all share the whole string “Ðúc Chúa Tròi dung nên”, and every one of them also contains “God”, so co-occurrence alone cannot tell which Vietnamese words belong to “created”.

  15. “God” has a stronger claim … “Ðúc Chúa Tròi” also appears where the English has “God” without “created” (for example “God called the expanse heaven” / “Ðúc Chúa Tròi dat tên khoang không la tròi”), so it links to “God”.

  16. … “created” makes do with rest: once “God” has taken “Ðúc Chúa Tròi”, what remains in all three pairs is “dung nên”, which links to “created”.
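The competition between "God" and "created" can be reproduced with plain set operations. The sketch below is only an illustration under stated assumptions: the helper `shared_words` is hypothetical, the sentences are lowercased copies of the slide text, and the fourth pair ("God called the expanse heaven") is borrowed from the earlier "heaven" slide to give "God" evidence that does not involve "created".

```python
# Lowercased sentence pairs from the slides; the last pair has "god" but not "created".
pairs = [
    ("in the beginning god created the heavens and the earth",
     "ban dâu ðúc chúa tròi dung nên tròi dât"),
    ("god created the great sea monsters",
     "ðúc chúa tròi dung nên các loài cá lón"),
    ("god created man in his own image",
     "ðúc chúa tròi dung nên loài nguòi nhu hình ngài"),
    ("god called the expanse heaven",
     "ðúc chúa tròi dat tên khoang không la tròi"),
]

def shared_words(english_word, pairs):
    """Vietnamese words found in every pair whose English side contains english_word."""
    common = None
    for english, vietnamese in pairs:
        if english_word in english.split():
            words = set(vietnamese.split())
            common = words if common is None else common & words
    return common or set()

created = shared_words("created", pairs)  # too big: still includes the words for "God"
god = shared_words("god", pairs)          # narrowed by the pair without "created"
leftover = created - god                  # what "created" has to make do with
print(created, god, leftover)
```

Subtracting the words already claimed by "God" is the simplest possible form of the competition; the linking algorithm later in the deck does the same thing with scores instead of hard set membership.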

  17. What’s “bathroom” in Vietnamese?
      • Bible only gives you “begat,” not “bathroom” – but web is much bigger
      • Find bilingual web pages automatically
      • “Click for English / Français”
      • Government, tourist, commercial, tech …
      • Run this strategy on them automatically
      • Get a dictionary
      • Uses: multilingual search, translation aid …

  18. Competitive Linking Algorithm
      English: … nod your head … wag your tail … head of the class … swollen head …
      French: … hochez la tête … hochez la queue … en tête de la classe … bouffant d’orgueil …
      head ≠ hochez … but often paired
      head = tête … though not always
      nod = hochez … though not always
      1. Link words that look alike or often go together.
      2. Make a tentative French-English dictionary of linked words.
         (Or, if such a dictionary exists already, maybe you can convince the publisher to give you the typesetting files; that will work better.)

  19. Competitive Linking Algorithm (same examples as the previous slide)
      1. Link words that look alike or often go together.
      2. Make a tentative French-English dictionary of linked words.
      3. Use the dictionary to greedily guess each word’s best link.
      4. Use the links to get a better dictionary.
      5. Repeat!
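The five steps can be sketched as a short Python loop. This is a simplified illustration, not the algorithm as taught or published: it scores word pairs with a Dice coefficient over sentence-level co-occurrence (an assumption), ignores the "look alike" cue entirely, and uses hypothetical names (`dice`, `link_sentence`, `competitive_linking`). The toy corpus is the four phrase pairs from the slide.

```python
from collections import Counter
from itertools import product

# Toy phrase pairs taken from the slide; far too small for reliable links.
corpus = [
    ("nod your head", "hochez la tête"),
    ("wag your tail", "hochez la queue"),
    ("head of the class", "en tête de la classe"),
    ("swollen head", "bouffant d'orgueil"),
]
pairs = [(en.split(), fr.split()) for en, fr in corpus]

# Number of phrase pairs containing each word, used to normalise scores.
en_count = Counter(w for en, _ in pairs for w in set(en))
fr_count = Counter(w for _, fr in pairs for w in set(fr))

def dice(joint):
    """Turn joint counts {(e, f): n} into Dice scores between 0 and 1."""
    return {(e, f): 2 * n / (en_count[e] + fr_count[f])
            for (e, f), n in joint.items()}

def link_sentence(en, fr, score):
    """Step 3: greedily link the best-scoring pair of still-unlinked words."""
    ranked = sorted(product(en, fr),
                    key=lambda p: score.get(p, 0.0), reverse=True)
    used_en, used_fr, links = set(), set(), []
    for e, f in ranked:
        if score.get((e, f), 0.0) > 0 and e not in used_en and f not in used_fr:
            links.append((e, f))
            used_en.add(e)
            used_fr.add(f)
    return links

def competitive_linking(pairs, rounds=3):
    # Steps 1-2: tentative dictionary from words that often go together.
    cooc = Counter(p for en, fr in pairs for p in product(set(en), set(fr)))
    score = dice(cooc)
    for _ in range(rounds):  # Step 5: repeat
        linked = Counter(link for en, fr in pairs
                         for link in link_sentence(en, fr, score))
        score = dice(linked)  # Step 4: better dictionary from the links
    return score

score = competitive_linking(pairs)
best = {}
for (e, f), s in score.items():
    if s > best.get(e, ("", 0.0))[1]:
        best[e] = (f, s)
print(best["head"], best["class"])
```

With only four phrase pairs, "head" and "class" come out linked to "tête" and "classe", but other words do not: "your" and "hochez" co-occur in both of their phrases and end up linked, which is the "often paired, though not a translation" problem the slide warns about. More data, or the look-alike cue, would be needed to break such ties.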

  20. Translingual Knowledge Projection and Statistical Machine Translation [Figure: a POS-tagged, dependency-parsed example, “[National laws] applying in [Hong Kong]” (tags JJ NNS VBG IN NNP NNP; links labeled mod, subj, pobj; “Hong Kong” marked PLACE), a second example beginning “The urgent response to ...”, a box labeled “New Statistical Language Software”, and the caption “24 hours!”]

  21. Noisy Channel Model: Chinese as Garbled English
      E = The urgent response to …
      C = (the Chinese input)
      Given input C, the “New Statistical Language Software” chooses the E that maximizes p(English=E) × p(Chinese=C | English=E).
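Why this product is the right thing to maximize follows from Bayes' rule, spelled out here as a standard derivation (the hat notation is added for the example and is not on the slide). The denominator p(C) is the same for every candidate E, so it drops out of the argmax:

```latex
\hat{E} \;=\; \arg\max_{E}\; p(E \mid C)
        \;=\; \arg\max_{E}\; \frac{p(E)\, p(C \mid E)}{p(C)}
        \;=\; \arg\max_{E}\; p(E)\, p(C \mid E)
```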

  22. Latin as Garbled English
      L = summa cum laude (input to the “New Statistical Language Software”)
      • E = Topmost with praise? High p(L|E) but low p(E).
      • E = Burger with fries? High p(E) but low p(L|E).
      • E = With highest honors maximizes p(E)*p(L|E).
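The trade-off among the three candidates can be made concrete with a tiny decoder. Every probability below is invented purely for illustration (the slide gives only "high" and "low"), and the names `candidates` and `decode` are hypothetical.

```python
# Hypothetical probabilities, invented for illustration only.
candidates = {
    "Topmost with praise": {"p_E": 1e-9, "p_L_given_E": 1e-2},   # faithful, not fluent
    "Burger with fries":   {"p_E": 1e-4, "p_L_given_E": 1e-12},  # fluent, not faithful
    "With highest honors": {"p_E": 1e-5, "p_L_given_E": 1e-3},   # decent on both
}

def decode(candidates):
    """Return the English candidate E that maximizes p(E) * p(L | E)."""
    return max(candidates,
               key=lambda e: candidates[e]["p_E"] * candidates[e]["p_L_given_E"])

best = decode(candidates)
by_channel_only = max(candidates, key=lambda e: candidates[e]["p_L_given_E"])
by_source_only = max(candidates, key=lambda e: candidates[e]["p_E"])
print(best, "|", by_channel_only, "|", by_source_only)
```

Either model alone picks a bad answer under these made-up numbers; only the product prefers "With highest honors".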

  23. What are the models?
      • Source model p(E) could be trigram model
          • Guarantees semi-fluent English
      • Channel model p(C|E) or p(L|E) could be finite-state transducer
          • Stochastically translates each word + allows a little random rearrangement – with high prob, words stay more or less put
      • Maximizing p(C|E) would give really lousy Chinese translation of English
          • Random word translation is stupid – need word sense from context
          • Random word rearrangement is stupid – phrases rearrange!
          • This channel has no idea what fluent Chinese looks like
      • But maximizing p(E)*p(C|E) gives a better English translation of Chinese because p(E) knows what English should look like.
      • Currently trying to make these models less stupid.
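As a rough sketch of what a trigram source model p(E) does, the Python below counts trigrams in three English sentences from the earlier slides and scores new word sequences. Add-one smoothing, the `<s>`/`</s>` padding symbols, and the function names are assumptions chosen to keep the example short; a real system would train on far more text and smooth more carefully.

```python
import math
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their two-word contexts over padded sentences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        for a, b, c in zip(toks, toks[1:], toks[2:]):
            tri[a, b, c] += 1
            ctx[a, b] += 1
    return tri, ctx, len(vocab)

def log_prob(sentence, model):
    """Add-one-smoothed log p(E) under the trigram model."""
    tri, ctx, v = model
    toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((tri[a, b, c] + 1) / (ctx[a, b] + v))
               for a, b, c in zip(toks, toks[1:], toks[2:]))

model = train_trigram([
    "in the beginning god created the heavens and the earth",
    "god created man in his own image",
    "god called the expanse heaven",
])
fluent = log_prob("god created the heavens", model)
scrambled = log_prob("heavens the created god", model)
print(fluent > scrambled)
```

The model prefers the word order it has seen, which is all that "semi-fluent" means here: p(E) rewards plausible local word sequences and knows nothing about the foreign sentence.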
