These slides describe the development of an English-to-Turkish machine translation system that aims to reduce the scarcity issues caused by rich Turkish inflectional morphology and so improve translation quality. Customized translation-model heuristics and a suite of supporting tools are reported to improve on a general-purpose baseline.
TURKALATOR
A Suite of Tools for English to Turkish MT
Siddharth Jonathan
Gorkem Ozbek
CS224n Final Project
June 14, 2006
English - Turkish MT
• The challenge
  • Traditionally, statistical MT research has focused on language pairs with rich resources
  • Ambitious goal: a complete English-to-Turkish MT system on par with those on the Web (Google, Systran, etc.)
  • Realistic goal: outperform the general-purpose baseline
• The focus
  • Address scarcity issues stemming from rich Turkish inflectional morphology
• The strategy
  • Approximate a morphological analysis by exploiting certain aspects of Turkish morphology to get sub-lexical units
  • Customize translation model building heuristics to deal correctly with these units
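To make the idea of sub-lexical units concrete, here is a minimal sketch that assumes the approximation is a greedy stripping of known suffixes from the right edge of a word. The slides do not say how the segmentation was actually done, so the suffix list, the `segment` function, and the `min_stem` threshold are all hypothetical and chosen only for illustration.

```python
# Hypothetical, deliberately tiny suffix inventory (plural and two case markers).
# A usable system would need a far larger list plus vowel-harmony checks.
SUFFIXES = ["lar", "ler", "dan", "den", "da", "de"]


def segment(word, suffixes=SUFFIXES, min_stem=2):
    """Greedily strip known suffixes from the right.

    Returns [stem, "+suffix", ...] in surface order. This is a crude
    approximation of morphological analysis, not a real analyzer.
    """
    # Try longer suffixes first so "dan" wins over "da" where both could apply.
    ordered = sorted(suffixes, key=len, reverse=True)
    stem = word
    stripped = []  # suffixes collected right-to-left
    changed = True
    while changed:
        changed = False
        for suf in ordered:
            # Only strip if enough of the word remains to count as a stem.
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                stripped.append("+" + suf)
                stem = stem[: -len(suf)]
                changed = True
                break
    return [stem] + stripped[::-1]


print(segment("evlerde"))      # ['ev', '+ler', '+de']
print(segment("kitaplardan"))  # ['kitap', '+lar', '+dan']
```

A stripper like this over-segments words whose stem happens to end in a suffix string (for example, "dede" would be split), which is one reason the translation-model heuristics need to be adapted to the resulting units.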
Baseline English to Turkish MT System
• Translation model: Sentence Aligned English-Turkish → GIZA++ (aligner) → Word Aligned English-Turkish → Phrase building heuristics → Phrase translation table
• Language model: Turkish Corpus (training set) → SRILM → Turkish Language Model
• Decoding: English Sentences → Pharaoh (decoder), using the phrase translation table and the Turkish language model → Turkish Translations
• Corpus: Approx. 22,000 aligned sentence pairs covering several genres
The Turkalator Way…
• Inputs: Turkish Text, English Text
• Stages (as labeled in the diagram): Segmentation, Stem Alignment, General word Alignment, Phrase Extraction and Scoring
• Models and decoding: Phrase Translation table, Turkish Language Model, Pharaoh (decoder)
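As an illustration of the scoring half of "Phrase Extraction and Scoring", the sketch below assumes the common relative-frequency estimate p(tr | en) = count(en, tr) / count(en) over extracted phrase pairs. The slides do not state which scoring formula was used, so this is an assumption; `score_phrase_pairs` and the toy phrase pairs are hypothetical.

```python
from collections import Counter


def score_phrase_pairs(pairs):
    """Relative-frequency scores for (english, turkish) phrase pairs.

    `pairs` is a list of extracted phrase pairs, one entry per occurrence.
    Returns a dict mapping (english, turkish) to p(turkish | english).
    """
    pair_counts = Counter(pairs)
    english_counts = Counter(en for en, _ in pairs)
    return {
        (en, tr): count / english_counts[en]
        for (en, tr), count in pair_counts.items()
    }


# Toy data: the Turkish side mixes stems and suffix units ("+de").
extracted = [
    ("house", "ev"),
    ("house", "ev"),
    ("house", "konut"),
    ("in", "+de"),
]
table = score_phrase_pairs(extracted)
for (en, tr), p in sorted(table.items()):
    print(f"{en} ||| {tr} ||| {p:.3f}")
```

Because segmented text pairs English words with stems and suffix units rather than full inflected forms, counts are spread over fewer distinct Turkish entries, which is the scarcity reduction the approach is after.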
Evaluation
• Quantitative results
• Qualitative results
  • Scarcity reduced greatly: many more Turkish words are now translated
  • An example:
    • English input: “She thought it over.”
    • Reference translation: “Julia bunu iyice düşündü.”
    • Baseline translation: “Başvuran düşünce bu over.”
    • Turkalator translation: “Julia onun üzerinde düşündü.”