
Machine Translation

Machine Translation. Om Damani (Ack: material taken from Jurafsky and Martin, 2nd Ed., and Brown et al. 1993). State of the Art. The spirit is willing but the flesh is weak. English-Russian Translation System. Дух охотно готов но плоть слаба. Russian-English Translation System.


Presentation Transcript


  1. Machine Translation. Om Damani (Ack: material taken from Jurafsky and Martin, 2nd Ed., and Brown et al. 1993)

  2. State of the Art. Input: The spirit is willing but the flesh is weak. English-Russian Translation System: Дух охотно готов но плоть слаба. Russian-English Translation System: The vodka is good, but the meat is rotten. Babelfish: Spirit is willingly ready but flesh it is weak. Google: The spirit is willing but the flesh is week.

  3. State of the Art (English-Hindi), March 19, 2009. Input: The spirit is willing but the flesh is weak. Google English-Hindi Translation System: आत्मा पर शरीर दुर्बल है. Google Hindi-English Translation System: Spirit on the flesh is weak.

  4. Is State of the Art (English-Hindi) so bad? Input: Is state of the art so bad. Google English-Hindi Translation System: कला की हालत इतनी खराब है. Google Hindi-English Translation System: The state of the art is so bad.

  5. State of the english-hindi translation is not so bad. Input: State of the english hindi translation is not so bad. Google English-Hindi Translation System: राज्य के अंग्रेज़ी हिन्दी अनुवाद का इतना बुरा नहीं है. Google Hindi-English Translation System: State of the English translation of English is not so bad. OK. Maybe it is __ bad.

  6. State of the English-Hindi translation is not so bad. Input: State of the English Hindi translation is not so bad. Google English-Hindi Translation System: राज्य में अंग्रेजी से हिंदी अनुवाद का इतना बुरा नहीं है; राज्य के अंग्रेज़ी हिन्दी अनुवाद का इतना बुरा नहीं है. Google Hindi-English Translation System: English to Hindi translation in the state is not so bad. OK. Maybe it is __ __ bad.

  7. Your Approach to Machine Translation

  8. Translation Approaches

  9. Direct Transfer – What Novices do

  10. Direct Transfer: Limitations. Input: कई बंगाली कवियों ने इस भूमि के गीत गाए हैं (Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain). Morph: कई बंगाली कवि-PL,OBL ने इस भूमि के गीत {गाए है}-PrPer,Pl (Kai Bangali kavi-PL,OBL ne is bhoomi ke geet {gaaye hai}-PrPer,Pl). Lexical Transfer: Many Bengali poet-PL,OBL this land of songs {sing has}-PrPer,Pl. Local Reordering: Many Bengali poet-PL,OBL of this land songs {has sing}-PrPer,Pl. Final: Many Bengali poets of this land songs have sung. Expected: Many Bengali poets have sung songs of this land.
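
The direct-transfer steps on this slide can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the chunking of the sentence, the `LEXICON` dictionary, and the two local reordering rules are hypothetical and hand-written for this one sentence, and morphological generation (poet-PL to "poets", "has sing" to "have sung") is left out.

```python
# Toy direct transfer: word-by-word lexical transfer plus reordering
# that is strictly local to each chunk. All tables are hypothetical.
LEXICON = {
    "kai": "many", "Bangali": "Bengali", "kavi": "poet",
    "ne": None,  # ergative marker: no English counterpart, dropped
    "is": "this", "bhoomi": "land", "ke": "of",
    "geet": "songs", "gaaye": "sing", "hai": "has",
}
POSTPOSITIONS = {"of"}   # Hindi postpositions become English prepositions
AUXILIARIES = {"has"}    # Hindi "verb aux" becomes English "aux verb"

def transfer_chunk(chunk):
    """Translate one chunk word by word, then reorder inside the chunk only."""
    words = [LEXICON[w] for w in chunk if LEXICON[w] is not None]
    if words and (words[-1] in POSTPOSITIONS or words[-1] in AUXILIARIES):
        words = [words[-1]] + words[:-1]  # move the trailing function word to the front
    return words

def direct_transfer(chunks):
    """Concatenate the translated chunks in their original order."""
    out = []
    for chunk in chunks:
        out.extend(transfer_chunk(chunk))
    return " ".join(out)

# Assumed chunking of the morphologically analysed sentence.
chunks = [["kai", "Bangali", "kavi"], ["ne"], ["is", "bhoomi", "ke"],
          ["geet"], ["gaaye", "hai"]]
result = direct_transfer(chunks)
print(result)
```

Because chunks never move relative to one another, the verb group stays at the end of the sentence, which is the limitation the slide points at.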

  11. Syntax Transfer (Analysis-Transfer-Generation). Here phrases (NP, VP, etc.) can be arbitrarily large.

  12. Syntax Transfer Limitations. He went to Patna -> Vah Patna gaya. He went to Patil -> Vah Patil ke pas gaya. The translation of "went" depends on the semantics of the object of "went". Fatima eats salad with spoon: what happens if you change "spoon"? Semantic properties need to be included in transfer rules: Semantic Transfer.

  13. Interlingua Based Transfer. [Figure: interlingua graph for the sentence below; node and relation labels: contact, obj, agt, you, pur, farmer, this, plc, :01, region, or, nam, taluka, manchar, nam, khatav] For this, you contact the farmers of Manchar region or of Khatav taluka. In theory: N analysis and N transfer modules instead of N^2. In practice: an amazingly complex system to tackle N^2 language pairs.

  14. Difficulties in Translation: Language Divergence (concepts from Dorr 1993; text/figures from Dave, Parikh and Bhattacharyya 2002). Constituent Order; Prepositional Stranding; Null Subject; Conflational Divergence; Categorical Divergence.

  15. Lost in Translation: We are talking mostly about syntax, not semantics or pragmatics. Image from http://inicia.es/de/rogeribars/blog/lost_in_translation.gif. You: Could you give me a glass of water? Robot: Yes. ... wait ... wait ... nothing happens ... wait ... Aha, I see ... You: Will you give me a glass of water? ... wait ... wait ... wait ...

  16. CheckPoint • State of the Art • Different Approaches • Translation Difficulty • Need for a novel approach

  17. Statistical Machine Translation: Most ridiculous idea ever. Consider all possible partitions of a sentence. For a given partition, consider all possible translations of each part. Consider all possible combinations of all possible translations. Consider all possible permutations of each combination. And somehow select the best partition/translation/permutation. Example: कई बंगाली कवियों ने इस भूमि के गीत गाए हैं (Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain) -> To this space have sung songs of many poets from Bangal.

  18. How many combinations are we talking about? Number of choices for an N-word sentence: N=20 ?? Number of possible chess games.
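
To get a feel for the size of this search space, here is a rough counting sketch. It assumes, purely for illustration, that every phrase has the same fixed number of candidate translations (`options_per_phrase`, a made-up parameter) and that translated phrases may be permuted freely; the slide itself gives no figure.

```python
from math import comb, factorial

def count_segmentations(n: int) -> int:
    """Ways to split an n-word sentence into contiguous phrases.

    Each of the n-1 gaps between words is either a phrase boundary or not.
    """
    return 2 ** (n - 1) if n > 0 else 0

def count_candidates(n: int, options_per_phrase: int) -> int:
    """Segmentations x translation choices x phrase orderings.

    A segmentation into k phrases can be chosen in C(n-1, k-1) ways,
    each phrase has `options_per_phrase` translations (assumed constant),
    and the k translated phrases can be ordered in k! ways.
    """
    return sum(
        comb(n - 1, k - 1) * options_per_phrase ** k * factorial(k)
        for k in range(1, n + 1)
    )

print(count_segmentations(20))   # partitions alone
print(count_candidates(20, 5))   # with 5 assumed translations per phrase
```

Even the partitions alone number 2^19 = 524288 for a 20-word sentence, before any translation choice or reordering is considered.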

  19. इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए -> For this you contact the farmers of Manchar region. How do we get the Phrase Table? Collect a large amount of bilingual parallel text. For each sentence pair, consider all possible partitions of both sentences. For a given partition pair, consider all possible mappings between parts (phrases) on the two sides. Somehow assign a probability to each phrase pair.
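
One simple way to "somehow assign" probabilities is relative frequency over the phrase pairs collected from the corpus. The sketch below assumes that estimator and a hand-made list of already-extracted pairs (`observed`, hypothetical); the slide does not say which estimator is used.

```python
from collections import Counter, defaultdict

def build_phrase_table(pairs):
    """Estimate P(target | source) by relative frequency of extracted pairs."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return table

# Hypothetical phrase pairs, as if extracted from aligned sentence pairs.
observed = [
    ("is eating", "khata hai"),
    ("is eating", "khati hai"),
    ("banana", "kela"),
    ("mango", "aam"),
]
phrase_table = build_phrase_table(observed)
print(phrase_table["is eating"])
```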

  20. Data Sparsity Problems in Creating Phrase Table. Sunil is eating mango -> Sunil aam khata hai. Noori is eating banana -> Noori kela khati hai. Sunil is eating banana -> ? We need examples of everyone eating everything!! We want to figure out that "eating" can be either "khata hai" or "khati hai", and let the Language Model select between 'Sunil kela khata hai' and 'Sunil kela khati hai'. Select well-formed sentences among all candidates using the LM.
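
The selection step can be sketched with a tiny add-one-smoothed trigram language model. The monolingual corpus below is hypothetical (only its first two sentences come from the slide); the assumption is that monolingual text is plentiful enough to contain the well-formed variant even when the parallel corpus does not.

```python
from collections import Counter
from math import log

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def pad(sentence):
    """Add two start symbols and one end symbol around the words."""
    return ["<s>", "<s>"] + sentence.split() + ["</s>"]

def train(corpus):
    """Count trigrams and their two-word contexts; return counts and vocabulary size."""
    trigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = pad(sentence)
        vocab.update(tokens)
        trigrams.update(ngrams(tokens, 3))
        contexts.update(ngrams(tokens[:-1], 2))  # one context per trigram
    return trigrams, contexts, len(vocab)

def log_prob(sentence, model):
    """Add-one-smoothed trigram log-probability of a sentence."""
    trigrams, contexts, v = model
    return sum(
        log((trigrams[g] + 1) / (contexts[g[:2]] + v))
        for g in ngrams(pad(sentence), 3)
    )

# Hypothetical monolingual Hindi corpus (romanized).
monolingual = [
    "Sunil aam khata hai",
    "Noori kela khati hai",
    "Sunil kela khata hai",
    "Sunil roti khata hai",
]
lm = train(monolingual)
candidates = ["Sunil kela khata hai", "Sunil kela khati hai"]
best = max(candidates, key=lambda s: log_prob(s, lm))
print(best)
```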

  21. Formulating the Problem • A language model to compute P(E) • A translation model to compute P(F|E) • A decoder, which is given F and produces the most probable E
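
The three components fit together through Bayes' rule. Written out in the noisy-channel form used by Brown et al. 1993, with E the output sentence and F the given input sentence as on the slide:

```latex
\hat{E} \;=\; \arg\max_{E} P(E \mid F)
        \;=\; \arg\max_{E} \frac{P(E)\, P(F \mid E)}{P(F)}
        \;=\; \arg\max_{E} P(E)\, P(F \mid E)
```

P(F) can be dropped because F is fixed during decoding, so it does not affect which E attains the maximum.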

  22. P(F|E) vs. P(E|F). P(F|E) is the translation probability: we need to look at the generation process by which the <F,E> pair is obtained. Parts of F correspond to parts of E. With suitable independence assumptions, P(F|E) measures whether all parts of E are covered by F. E can be quite ill-formed. It is OK if P(F|E) for an ill-formed E is greater than P(F|E) for a well-formed E; multiplication by P(E) should hopefully take care of it. We do not have that luxury in estimating P(E|F) directly: we would need to ensure that well-formed E score higher. Summary: for computing P(F|E), we may make several independence assumptions that are not valid; P(E) compensates for that. We need to estimate P(It is raining | बारिश हो रही है) vs. P(rain is happening | बारिश हो रही है). P(बारिश हो रही है | It is raining) = .02; P(बरसात आ रही है | It is raining) = .03; P(बारिश हो रही है | rain is happening) = .420
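
A small numeric sketch shows how P(E) can compensate. The P(F|E) values are the ones quoted on the slide; the P(E) values in `language_model` are hypothetical, chosen only to reflect that "It is raining" is far more fluent English than "rain is happening".

```python
F = "बारिश हो रही है"

# P(F|E): values taken from the slide.
translation_model = {
    ("It is raining", F): 0.02,
    ("rain is happening", F): 0.420,
}

# P(E): hypothetical language-model scores, not from the slide.
language_model = {
    "It is raining": 1e-3,
    "rain is happening": 1e-6,
}

def decode(f, candidates):
    """Return the candidate E maximizing P(E) * P(F|E)."""
    return max(candidates, key=lambda e: language_model[e] * translation_model[(e, f)])

candidates = list(language_model)
by_translation_model_only = max(candidates, key=lambda e: translation_model[(e, F)])
best = decode(F, candidates)
print(by_translation_model_only)
print(best)
```

On the translation model alone the ill-formed candidate wins; once multiplied by the assumed P(E), the well-formed one does.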

  23. CheckPoint • From a parallel corpus, generate a probabilistic phrase table • Given a sentence, generate various candidate translations using the phrase table • Evaluate the candidates using Translation and Language Models

  24. What is the meaning of Probability of Translation • What is the meaning of P(F|E)? • By magic: you simply know P(F|E) for every (E,F) pair, by counting in a parallel corpus • Or, each word in E generates one word of F, independent of every other word in E or F • Or, we need a 'random process' to generate F from E • A semantic graph G is generated from E, and F is generated from G • We are no better off. We now have to estimate P(G|E) and P(F|G) for various G and then combine them. How? • We may have a deterministic procedure to convert E to G, in which case we still need to estimate P(F|G) • A parse tree T_E is generated from E; T_E is transformed to T_F; finally T_F is converted into F • Can you write the mathematical expression?

  25. The Generation Process • Partition: think of all possible partitions of the source language sentence • Lexicalization: for a given partition, translate each phrase into the foreign language • Spurious insertion: add foreign words that are not attributable to any source phrase • Reordering: permute the set of all foreign words, with words possibly moving across phrase boundaries. Try writing the probability expression for the generation process. We need the notion of alignment.

  26. Generation Example: Alignment

  27. Simplify Generation: Only 1->Many Alignments allowed

  28. Alignment. A function from target position to source position. The alignment sequence is: 2,3,4,5,6,6,6. Alignment function A: A(1) = 2, A(2) = 3, ... A different alignment function will give the sequence 1,2,1,2,3,4,3,4 for A(1), A(2), ... To allow spurious insertion, allow alignment with word 0 (NULL). No. of possible alignments: (I+1)^J

  29. CheckPoint • From a parallel corpus, generate a probabilistic phrase table • Given a sentence, generate various candidate translations using the phrase table • Evaluate the candidates using Translation and Language Models • Understanding of the Generation Process is critical • Notion of Alignment is important
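
The alignment count can be checked by brute force on small cases. The sketch assumes I is the source length and J the target length, which is how the count reads given that each of the J target positions picks one of I source positions or NULL; the source length of 6 used for the slide's example sequence is likewise an assumption (it is the largest index that appears).

```python
from itertools import product

NULL = 0  # source position 0 is the NULL word used for spurious insertions

def enumerate_alignments(source_len, target_len):
    """Yield every alignment as a tuple a, where a[j-1] = A(j) in 0..source_len."""
    return product(range(source_len + 1), repeat=target_len)

def count_alignments(source_len, target_len):
    """Closed form: each target position has source_len + 1 choices."""
    return (source_len + 1) ** target_len

def is_valid_alignment(alignment, source_len):
    """Check that every target position maps to NULL or a real source position."""
    return all(NULL <= a <= source_len for a in alignment)

slide_alignment = (2, 3, 4, 5, 6, 6, 6)  # A(1)=2, A(2)=3, ... from the slide
print(is_valid_alignment(slide_alignment, source_len=6))
print(sum(1 for _ in enumerate_alignments(2, 3)), count_alignments(2, 3))
print(count_alignments(6, 7))
```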
