A Hybrid Machine Translation System from Turkish to English Ferhan Türe MSc Thesis, Sabancı University Advisor: Kemal Oflazer
Introduction • Goal: Create a machine translation system that translates Turkish text into English text • Turkish has an agglutinative morphology • ev+im+de+ki+ne → to the one at my home • Turkish has free word order • Ben eve gittim, Eve gittim ben, Gittim ben eve, ... → I went to the house • Idea: Write rules to translate an analyzed Turkish sentence into English
Outline • Machine Translation (MT) • Motivation • Challenges in MT • History of MT • Classical Approaches to MT • The Hybrid Approach • Challenges • Translation Steps • Analysis and Preprocessing • Transfer and Generation • Decoding • Evaluation • Methods • Experimental Results • Examples • Conclusions
Machine Translation • Translation • Given: input text s in source language S • Find: a well-formed text in target language T that is equivalent to s • Machine Translation (MT) • Any system that uses an electronic computer to perform translation
Motivation • Satisfy increasing demand for translation • 100 languages with 5 million or more native speakers • Reduce the cost and effort of human translation • 13% of EU budget • weeks vs. minutes • Make information available to more people in less time • translation of web sites automatically • Exploring limits to computers’ ability and linguistic challenges
Challenges in MT • Morphological issues • Each language has a different morphology • Syntactic issues • Word order in sentences and noun phrases • Language-specific features (narrative past tense in Turkish, distinguishing feminine and masculine nouns) • Semantic issues • Word sense ambiguities • bank: geographical term OR financial institution? • Idiomatic phrases • kafa çekmek: pull head OR drink alcohol?
History of MT • Idea proposed by Warren Weaver in his 1949 memorandum • 1950s: Russian-English MT research during the Cold War between the US and the USSR • 1960s: Research funding largely stopped after disappointing results (ALPAC report, 1966) • Mid-1970s • MÉTÉO: English-French MT in Canada • Systran and Eurotra: Multilingual MT in Europe • TITRAN and MU Project at Kyoto University, Japan • Since the 1990s • Statistical MT: Use statistics and large amounts of data
MT between English and Turkish • Morphological analyzer • Oflazer, 1993. • Morphological disambiguator • Oflazer & Kuruöz, 1994. • Hakkani-Tür et al., 2000. • Yuret & Türe, 2006. • English-to-Turkish MT • Sagay, 1981. • Hakkani et al., 1998. • Keyder Turhan, 1997. • No Turkish-to-English system
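Vauquois Triangle [Figure: triangle with Analysis rising on the source side and Generation descending on the target side, through the lexical, syntactic and semantic levels; Transfer crosses between the two sides and Interlingua sits at the apex]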
Word-by-word Translation Source sentence → Bilingual Dictionary → Target sentence • Source sentence: Ali evdeki kediyi çok sevmez • Translation: Ali home cat very like • Reference: Ali does not like the cat at home very much
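The word-by-word scheme above can be sketched in a few lines of Python. This is only an illustration: the dictionary entries and the function name translate_word_by_word are assumptions, and surface forms are used as keys because this approach performs no morphological analysis.

```python
# Toy bilingual dictionary. The entries are illustrative assumptions,
# keyed by full surface forms since no morphological analysis is done.
TOY_DICTIONARY = {
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def translate_word_by_word(sentence, dictionary):
    """Replace each token by its dictionary entry; unknown tokens pass through."""
    return " ".join(dictionary.get(token, token) for token in sentence.split())


word_by_word = translate_word_by_word("Ali evdeki kediyi çok sevmez", TOY_DICTIONARY)
print(word_by_word)
```

The result keeps the Turkish word order and drops negation and case, which is why it falls so far short of the reference translation.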
Direct Translation Source sentence → Morphological Analyzer → Lexical Transfer → Local Reordering → Target sentence • Source: Ali evde -ki kediyi çok sevmez • Analysis: Ali ev+Loc Rel+Adj kedi+Acc çok+Adv sev+Neg+Present • Lexical: Ali home+Loc at+Adj cat+Acc very much+Adv like+Neg+Present • Reorder: Ali at+Adj home+Loc cat+Acc like+Neg+Present very much+Adv • Generate: Ali at home cat not like very much
Transfer-based Translation Source sentence → (SL Grammar) → SL Representation → (Transfer rules / Dictionary) → TL Representation → (TL Grammar) → Target sentence
Transfer-based Translation Example: mavi evin duvarı → the wall of the blue house • SL tree: [NP [NP [AP [A mavi]] [NP [N ev+in]]] [NP [N duvar+ı]]] • TL tree: [NP [NP [Det the] [NP [N wall]]] [PP [Prep of] [NP [Det the] [NP [AP [A blue]] [N house]]]]]
Interlingual Translation Source sentence → Analysis → Interlingua → Generation → Target sentence • Source: Ali evdeki kediyi çok sevmez • Interlingua: ¬holds(in_general, like(subj: Ali, obj: cat(at: home), degree: very much)) • Translation: Ali does not like the cat at home very much
Statistical MT Given a Turkish sentence t, find the English sentence e that is the “most likely” translation of t
Statistical MT • Translation Model P(t|e): built from Turkish-English aligned text; scores whether an English text e is a good translation of a Turkish text t • Language Model P(e): built from English text; scores whether an English text e is well-formed English or not • Decoding: $\hat{e} = \arg\max_{e} P(e) \cdot P(t \mid e)$
Statistical MT Example: Ali çok açtı → Ali was so hungry
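The decoding step, choosing the e that maximizes P(e) * P(t|e), can be illustrated with a toy sketch. All probabilities and candidate sentences below are invented for illustration and are not taken from the thesis; a real decoder searches a far larger candidate space.

```python
import math

# Hypothetical toy probabilities, invented only to illustrate the argmax.
LANGUAGE_MODEL = {  # P(e): how well-formed is the English sentence?
    "Ali was so hungry": 0.020,
    "Ali so hungry was": 0.001,
    "Ali opened a lot": 0.010,
}
TRANSLATION_MODEL = {  # P(t | e) for the Turkish input t = "Ali çok açtı"
    "Ali was so hungry": 0.30,
    "Ali so hungry was": 0.30,
    "Ali opened a lot": 0.20,
}


def decode(candidates, lm, tm):
    """Return the candidate e that maximizes log P(e) + log P(t|e)."""
    return max(candidates, key=lambda e: math.log(lm[e]) + math.log(tm[e]))


best = decode(list(LANGUAGE_MODEL), LANGUAGE_MODEL, TRANSLATION_MODEL)
print(best)
```

Working in log space leaves the argmax unchanged and avoids numerical underflow when many small probabilities are multiplied.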
Why Hybrid? • Classical transfer-based approaches are good at representing the structural differences between the source and target languages • Statistical methods are good at extracting knowledge from large amounts of data about how well-formed a sentence is or how “meaningful” a translation is
Challenges Morphological differences • Avrupalılaştıramadıklarımızdanmışsınız → You were among the ones who we were not able to cause to become European • Extreme case of a word in an agglutinative language • Each Turkish morpheme corresponds to one or more words in English
Challenges Morphological differences • arkadaşımdakiler → the ones at my friend
Challenges Structural differences • dinle+miş+sin → (someone told me that) you listened • dinle+di+n → you listened • dinle+t+ti+n → you made (someone) listen • dinle+t+tir+di+n → you had (someone) make (someone) listen • dinle+r+im → I listen • dinle+r+di+m → I used to listen • dinle+t+ebil+ir+miş+im → ???
Challenges Structural differences • Adam evde kitap okuyordu (SUBJ ADJCT OBJ V) → The man was reading a book at home (SUBJ V OBJ ADJCT) • mavi kitap (AP NP) → blue book (AP NP) • evdeki kitap (AP NP) → the book at home (NP AP) • kitabımın kapağı (NP1 NP2) → my book’s cover (NP1 NP2) • arkadaşımın yüzünden (NP1 NP2) → because of my friend (NP2 NP1)
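The sentence-level reordering in the first example (SUBJ ADJCT OBJ V becoming SUBJ V OBJ ADJCT) can be sketched as a sort over role labels. This is a simplified illustration, not the thesis's transfer mechanism: it assumes each constituent has already been translated and labeled, and the names ENGLISH_ORDER and reorder are hypothetical.

```python
# Target order of grammatical roles for English, as in the example above.
ENGLISH_ORDER = ["SUBJ", "V", "OBJ", "ADJCT"]


def reorder(constituents, target_order=ENGLISH_ORDER):
    """Sort (role, phrase) pairs into the target role order (stable sort)."""
    rank = {role: position for position, role in enumerate(target_order)}
    return sorted(constituents, key=lambda pair: rank[pair[0]])


# Constituents in Turkish order (SUBJ ADJCT OBJ V), assumed already translated.
turkish_order = [
    ("SUBJ", "the man"),
    ("ADJCT", "at home"),
    ("OBJ", "a book"),
    ("V", "was reading"),
]
english = " ".join(phrase for _, phrase in reorder(turkish_order))
print(english)
```

A single global order cannot express the phrase-level cases in the list (for example NP1 NP2 becoming NP2 NP1 only in some constructions), which is one reason to use per-construction transfer rules instead.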
Challenges Ambiguities • koyun • sheep (or bosom) • your bay • your dark (one) • of the bay • put!
Challenges Ambiguities • silahını evine koy • put your gun to your home • put your gun to his home • put his gun to your home • put his gun to his home • put your gun to her home • put her gun to your home • put her gun to her home • ...
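The way these readings multiply can be made concrete: each of the two possessive suffixes is ambiguous independently, so the readings form a Cartesian product. The sketch below assumes only the three possessors that appear in the list (your, his, her); the function name readings is hypothetical.

```python
from itertools import product

# Possessor readings that appear in the list above (others, such as "its", are omitted).
POSSESSORS = ["your", "his", "her"]


def readings(possessors=POSSESSORS):
    """Enumerate every combination of possessor for 'gun' and for 'home'."""
    return [
        f"put {gun_owner} gun to {home_owner} home"
        for gun_owner, home_owner in product(possessors, repeat=2)
    ]


candidates = readings()
print(len(candidates))
for candidate in candidates:
    print(candidate)
```

With three possessors per noun this already yields nine candidates for a three-word sentence, which is why a statistical model is needed to rank them.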
Challenges Ambiguities • kitabın kapağı • the book’s cover • book’s cover • the cover of the book
Challenges Ambiguities • ev+Dative (gitti) → (went) to the house • masa+Dative (çıktı) → (jumped) on the table • adam+Dative (baktı) → (looked) at the man
Challenges • Morphological differences → Use morphological analysis on the Turkish side and generation on the English side • Structural differences → Transfer rules can represent such transformations • Ambiguities → An English language model can determine the most probable translation statistically
The Avenue Transfer System • Avenue Project initiated by CMU LTI Group • Grammar formalism, which allows one to manually create a parallel grammar between two languages and • Transfer engine, which transfers the source sentence into possible target sentence(s) using this parallel grammar
Overview of Our Approach Turkish sentence → Morphological Analyzer → Analysis → Preprocessor → Lattice → Avenue Transfer Engine (with Transfer rules) → English translations ... → English Language Model → Most probable English translation
I. Analysis and Preprocessing Morphological analyses of each word: A set of features, describing the structural properties of the word adam evde oğlunu yendi
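I. Analysis and Preprocessing Lattice representation of the sentence [Figure: lattice over states 0-6 whose arcs carry the candidate analyses of each word] • adam: adam+N+PNon, ada+N+P1Sg • evde: ev+N+Loc • oğlunu: oğul+N+P2Sg, oğul+N+P3Sg • yendi: yen+V+Past, ye+V +Pass+V+Past, yen+N Zero+V+Past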
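A lattice like this one compactly encodes every combination of per-word analyses as a path from the first state to the last. The sketch below enumerates those paths. The edge layout (which arc connects which pair of states) is an assumption reconstructed from the arc labels and state numbers, not something stated on the slide, and the names EDGES and paths are hypothetical.

```python
# Assumed edge layout: (from_state, to_state, analysis label).
EDGES = [
    (0, 1, "adam+N+PNon"), (0, 1, "ada+N+P1Sg"),
    (1, 2, "ev+N+Loc"),
    (2, 3, "oğul+N+P2Sg"), (2, 3, "oğul+N+P3Sg"),
    (3, 6, "yen+V+Past"),
    (3, 4, "ye+V"), (4, 6, "+Pass+V+Past"),
    (3, 5, "yen+N"), (5, 6, "Zero+V+Past"),
]


def paths(edges, start, goal):
    """Return every label sequence leading from start to goal (acyclic lattice)."""
    if start == goal:
        return [[]]
    found = []
    for source, target, label in edges:
        if source == start:
            found.extend([label] + rest for rest in paths(edges, target, goal))
    return found


all_paths = paths(EDGES, 0, 6)
print(len(all_paths))
```

Under this assumed layout the four-word sentence has 2 * 1 * 2 * 3 = 12 readings, all of which are handed to the transfer stage.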
I. Analysis and Preprocessing Representation of IGs
II. Transfer and Generation [Figure, built up over several slides: parallel parse trees for the Turkish sentence "adam evde oğlunu yendi" (N N N V) and its English counterpart (leaves: man, won, son, house, with "the", "his", "the" and "at" inserted). Words are grouped into NPs, the NPs are assigned the roles SUBJ, Adjct and OBJ, the verb is wrapped as Vc and then Vfin, and both sides are joined under S, where the constituents are reordered for English.]
II. Transfer and Generation Adjunct → NP (Turkish side); Adjunct → "at" NP (English side)
{Adjunct,3}
Adjunct::Adjunct : [NP] -> ["at" NP]
(
  (x1::y2)
  (x0 = x1)
  ((x1 CASE) =c Loc)
  ((x1 poss) =c yes)
  (y0 = x0)
)
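To make the rule easier to read, here is a rough Python sketch of what it appears to do. It is not the Avenue transfer engine. It assumes the usual reading of the formalism: x0 and y0 are the source and target parent nodes, x1 is the first source child, y2 is the second target child, "=" copies features, and "=c" is a constraint that must already hold. The function apply_adjunct_rule and the dictionary-based feature structures are hypothetical.

```python
def apply_adjunct_rule(np_features):
    """Sketch of Adjunct::Adjunct : [NP] -> ["at" NP] on a feature dictionary.

    Returns the English-side adjunct, or None when a constraint fails.
    """
    # ((x1 CASE) =c Loc) and ((x1 poss) =c yes): both values must already be present.
    if np_features.get("CASE") != "Loc" or np_features.get("poss") != "yes":
        return None
    # (x0 = x1) and (y0 = x0): the adjunct nodes inherit the NP's features.
    adjunct_features = dict(np_features)
    # (x1::y2): the Turkish NP is aligned with the second English constituent,
    # after the inserted literal "at".
    return {"features": adjunct_features, "children": ["at", np_features]}


locative_np = {"lex": "house", "CASE": "Loc", "poss": "yes"}
accusative_np = {"lex": "son", "CASE": "Acc", "poss": "yes"}

matched = apply_adjunct_rule(locative_np)
rejected = apply_adjunct_rule(accusative_np)
print(matched["children"][0], matched["children"][1]["lex"])
print(rejected)
```

Only the locative NP satisfies both constraints, so only it is rewritten as an "at" phrase; the accusative NP is left for other rules.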