Machine Translation Prof. Alexandros Potamianos Dept. of Electrical & Computer Engineering Technical University of Crete, Greece May 2003
Outline • Machine Translation • Definition • Applications • Algorithms • Language Types • Algorithms • Transfer method • Inter-lingua • Direct translation • Statistical Machine Translation
Definitions • Machine Translation • Automatically translate text from one language to another • Speech-to-Speech Translation • Speech Recognition, Machine Translation, Text to Speech
Machine Translation: Applications • Rough translation, e.g., Systran Babel Fish • Computer-aided human translation • Post-editing by human • Multilingual systems in limited domains, e.g., • Dialogue systems: high quality speech to speech translation • Information kiosks • Cross-language information retrieval
Machine Translation: Algorithms • Transfer Model • Maps languages at the syntactic parse tree level • Interlingua • Maps language via a common new virtual language • Direct Translation • Maps language at the lexical, phrase and shallow syntactic level • Statistical Machine Translation • Statistical Direct Translation models • Models fluency of generated text and faithfulness of translation
Language Types • Universal features • Syntactic classes: nouns, verbs • Words have common meaning • Language typology • Morphological: • isolating (one morpheme per word) to polysynthetic (many morphemes per word) • agglutinative (clean morpheme boundaries) to fusional languages • Syntactic: • SVO (subject-verb-object) • SOV (e.g., Hindi, Japanese) • VSO (e.g., Irish)
Other Language Differences • Morphological richness: ENG The dog belongs to the tall child GR Ο σκύλος είναι του ψηλού παιδιού • Use of articles: ENG I am running GR Τρέχω • Lexical differences (aka lexical gap): ENG older, elder GR μεγαλύτερος • Format differences, e.g., dates, numbers
Transfer Models • Three stages • Analysis: syntactic parse (ambiguity might not be a problem) • Transfer: syntactic transformation rules • Generation: lexical and syntactic transfer • Lexical transfer • Function words are syntactically transferred, i.e., part of the rules • Content words are lexically transferred
Transfer Model: Example (English to French) • ENG: bad road • adjective: bad, noun: road (analysis) • noun: road, adjective: bad (transfer) • noun: route, adjective: mauvaise (generation) • FR: route mauvaise • Altavista Babel Fish (Systran): • http://babelfish.altavista.com/ • ENG: bad road FR: mauvaise route (wrong road) • ENG: wrong road FR: mauvaise route (wrong road)
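The three stages in the example above can be sketched in a few lines of Python. This is a toy illustration only: the tag table POS, the two-word LEXICON, the single reordering rule, and the function names analyze, transfer, generate and translate are all hypothetical, chosen for this sketch and not taken from any real system.

```python
# Toy English-to-French transfer pipeline (illustrative sketch only).
# The tags, the rule, and the lexicon are assumptions made for this example.

# Hypothetical part-of-speech lookup used by the analysis stage.
POS = {"bad": "ADJ", "road": "NOUN"}

# Hypothetical bilingual lexicon used by the generation stage.
LEXICON = {"bad": "mauvaise", "road": "route"}


def analyze(sentence):
    """Analysis: tag each English word with its syntactic class."""
    return [(word, POS[word]) for word in sentence.split()]


def transfer(tagged):
    """Transfer: apply the rule ADJ NOUN -> NOUN ADJ wherever it matches."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just reordered
        else:
            i += 1
    return out


def generate(tagged):
    """Generation: replace each English word with its French entry."""
    return " ".join(LEXICON[word] for word, _ in tagged)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("bad road"))  # route mauvaise
```

The single rule is deliberately naive. The Systran output quoted above keeps the adjective before the noun, which suggests that practical transfer rules need lexical exceptions, and the feminine form "mauvaise" is hard-coded here where a real generator would compute agreement.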
Inter-lingua • Motivation: • Translation between n languages needs transfer rules for n(n-1)/2 language pairs • With an inter-lingua, only 2n modules are needed (one analyzer and one generator per language) • Assertion: • There is a common semantic representation across languages • Algorithm: • Perform syntactic analysis • Perform semantic analysis in inter-lingua ontology representation • Generate syntactic tree in new language • Generate surface form using lexical transfer • Problem: the inter-lingua is often English! • Useful for small domains
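The motivation can be checked with a little arithmetic. The sketch below simply evaluates the two counts from the slide; the function names are hypothetical, and reading "2n" as one analyzer plus one generator per language is an assumption of this example.

```python
def transfer_pairs(n):
    """Unordered language pairs among n languages: n(n-1)/2."""
    return n * (n - 1) // 2


def interlingua_modules(n):
    """Assumed reading of 2n: one analyzer and one generator per language."""
    return 2 * n


# Print both counts side by side for a few values of n.
for n in (2, 3, 5, 10, 20):
    print(n, transfer_pairs(n), interlingua_modules(n))
```

The two counts are equal at n = 5, and the inter-lingua needs fewer components for every larger n, since n(n-1)/2 grows quadratically while 2n grows linearly.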
Direct Translation • Motivation • Syntactic and (especially) semantic parsing often fails • Inter-lingua and transfer models hard to build • Direct translation algorithm • Morphological analysis • Lexical transfer of content words • Various work relating to prepositions • SVO re-arrangements • Miscellany • Morphological generation • Example: Japanese to English
Direct Translation: Overall • A realistic approach • Uses syntax/semantics as needed • Robust (island) parsing • Shallow parsing • Works only for specific language pairs (each pair needs its own system) • Can be extended with (e.g.) English as inter-lingua
Statistical Machine Translation • Motivation • Why write rules? • Machine learning techniques can do the job for you • Requirement • Large bilingual (parallel) corpora • Alignment is typically required at the sentence level • Bayesian formulation (Brown et al., 1993)
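The Bayesian formulation is usually written as below. This is a sketch that assumes the deck follows the standard noisy-channel statement of Brown et al. (1993); the symbols G (Greek source string) and E (English target string) are borrowed from the later slides, and \hat{E} for the chosen translation is notation introduced here.

```latex
\hat{E} = \arg\max_{E} \Pr(E \mid G)
        = \arg\max_{E} \frac{\Pr(G \mid E)\,\Pr(E)}{\Pr(G)}
        = \arg\max_{E} \Pr(G \mid E)\,\Pr(E)
```

Pr(G) is constant for a given input and drops out of the maximization. Pr(G | E) is the translation model, which rewards faithfulness, and Pr(E) is the language model, which rewards fluency.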
Statistical Machine Translation • Step 1: Preprocessing (manual or semi-automatic) • Clean parallel corpus • Segment at the sentence level • Step 2: Alignment (automatic) • ENG: And the program has been implemented • GR: Το(1) πρόγραμμα(2) τέθηκε(3) σε(4) εφαρμογή(5) • Candidate alignment (numbers are Greek word positions): The(1) program(2) has(3) been(3) implemented(3,4,5) • Candidate alignment: The(1) program(2) has(3,4,5) been(3,4,5) implemented(3,4,5) • Step 3: Translation models: Pr(G, A | E) • G and E are the Greek and English strings • A is a random alignment between them
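The automatic alignment of Step 2 can be learned from sentence pairs alone. The sketch below uses expectation-maximization in the style of Brown et al.'s Model 1, the simplest of their models, and omits the empty (NULL) word for brevity. The three-sentence corpus, the names CORPUS, train_model1 and align, and the iteration count are all assumptions made for this illustration.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical, for illustration): (English, Greek) pairs.
CORPUS = [
    ("the program".split(), "το πρόγραμμα".split()),
    ("the house".split(), "το σπίτι".split()),
    ("a program".split(), "ένα πρόγραμμα".split()),
]


def train_model1(corpus, iterations=20):
    """Estimate t(g | e) with EM, Model 1 style (no NULL word here)."""
    greek_vocab = {g for _, gr in corpus for g in gr}
    # Uniform initialisation of the translation table t[(g, e)].
    t = defaultdict(lambda: 1.0 / len(greek_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (g, e) links
        total = defaultdict(float)  # expected count of links involving e
        for en, gr in corpus:
            for g in gr:
                # E-step: share one unit of count for g among all e.
                norm = sum(t[(g, e)] for e in en)
                for e in en:
                    frac = t[(g, e)] / norm
                    count[(g, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts into probabilities.
        for (g, e), c in count.items():
            t[(g, e)] = c / total[e]
    return t


def align(en, gr, t):
    """For each Greek word, the 1-based position of its best English word."""
    return [max(range(len(en)), key=lambda i: t[(g, en[i])]) + 1 for g in gr]


t = train_model1(CORPUS)
print(align("the program".split(), "το πρόγραμμα".split(), t))  # [1, 2]
```

No dictionary is supplied: the links emerge because "το" co-occurs with "the" and "πρόγραμμα" with "program" across sentence pairs. A corpus this small only demonstrates the mechanics; real systems train on very large parallel corpora.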
Statistical Machine Translation • Models (Brown Models 1, 2, 3, 4, 5): • Greek string g and English string e • Number of words in the Greek string: m • Alignment of the j-th Greek word: a_j
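With these symbols, the models start from a common decomposition of the probability of a Greek string and an alignment given the English string. The form below is the general identity from Brown et al. (1993), rewritten with g in place of their f; the shorthand g_1^{j-1} for the first j-1 Greek words and a_1^{j-1} for the first j-1 alignment values is theirs and is an assumption relative to this deck.

```latex
\Pr(g, a \mid e) = \Pr(m \mid e) \prod_{j=1}^{m}
    \Pr(a_j \mid a_1^{j-1}, g_1^{j-1}, m, e)\,
    \Pr(g_j \mid a_1^{j}, g_1^{j-1}, m, e)
```

Model 1 makes the strongest simplifications: the length probability is a constant \epsilon, every alignment position is equally likely among the l English words plus the empty word, and each Greek word depends only on the English word it is aligned to.

```latex
\Pr(g, a \mid e) = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} t(g_j \mid e_{a_j})
```

Here l is the number of words in the English string and t(g_j | e_{a_j}) is the word translation probability. The higher-numbered models relax these assumptions step by step.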
Statistical Machine Translation • Best translation balances • Faithfulness (translation model) • Fluency (language model) • Example-based machine translation • Ability to store phrases in a bilingual dictionary • Translation memory • Systran and most translation houses use this
Evaluation • Edit cost • Distance between standard (human-produced) and machine-generated translation
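One common way to compute such an edit cost is the word-level Levenshtein distance. The sketch below assumes unit cost for each insertion, deletion and substitution; the slide does not specify the cost scheme, and the function name and example sentences are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


reference = "the program has been implemented".split()    # human translation
hypothesis = "the program was implemented".split()        # machine output
print(edit_distance(reference, hypothesis))  # 2
```

The distance of 2 here corresponds to one substitution ("has" to "was") and one deletion ("been"). Dividing by the reference length would give a length-normalized score, which makes sentences of different lengths comparable.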