Машинен превод Strojový překlad Maskinoversættelse Maschinelle Übersetzung Maŝintradukado Traducción automática Itzulpengintza automatiko ترجمه ماشینی Konekäännin Traduction automatique תרגום מכונה Strojno prevođenje Gépi fordítás 機械翻訳 기계 번역 Terjemahan mesin Computervertaling Maskinoversettelse Tłumaczenie maszynowe Tradução automática Traducere automată Машинный перевод Maskinöversättning การแปลภาษาอัตโนมัติ 机器翻译 machine translation the Wiki way Bittlingmayer Adam Mathias 27 February 2007 University of Washington LING 575 – Machine Translation
machine translation the Wiki way: introduction to Wikipedia; technical details and editing; low-density languages; parallelness of corpora; named entities; other entities; disambiguation; categorization; problems; papers
introduction to Wikipedia en.wikipedia.org
introduction to Wikipedia en.wikipedia.org Wikipedia (IPA: /ˌwiːkiːˈpiːdi.ə/ or /ˌwɪːkiːˈpiːdi.ə/) is a multilingual, Web-based, free content encyclopedia project. Wikipedia is written collaboratively by volunteers; its articles can be edited by anyone with access to the Web site.
introduction to Wikipedia en.wikipedia.org: the Wiki family; lots of languages - unevenly distributed; lots of topics - unevenly distributed; growing fast; respectability
technical details and editing technical details structure layout content rules tags and templates redirect and disambiguation markup
technical details and editing: editing; anyone; locking and blocking; disputes; version control
technical details and editing Fei_Xia example
low-density languages: predictably lacking; X-English / English-X usually good; using related languages
parallelness of corpora: degrees; determinants of parallelness; mapping
named entities: article titles; abbreviations and acronyms; place names; company names; personal names
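A minimal sketch of how article titles might be turned into named-entity translations. It assumes (this is not stated on the slide) that the article's wikitext carries inline interlanguage links of the form [[de:Titel]], and that the caller supplies the set of language codes of interest. The function name title_translations and the sample text are hypothetical.

```python
import re
from typing import Dict, Set

# Matches inline links whose prefix is a lowercase code, e.g. [[de:Maschinelle Übersetzung]].
# Namespace links such as [[Category:...]] start with an uppercase letter and are skipped;
# other lowercase prefixes (e.g. image:) are filtered by the caller's language set.
INTERLANG_LINK = re.compile(r"\[\[([a-z][a-z-]*):([^\]|]+)\]\]")

def title_translations(wikitext: str, langs: Set[str]) -> Dict[str, str]:
    """Map language code -> linked article title for the requested languages."""
    found: Dict[str, str] = {}
    for code, title in INTERLANG_LINK.findall(wikitext):
        if code in langs:
            found[code] = title.strip()
    return found

sample = (
    "Machine translation is ... "
    "[[Category:Translation]] [[image:example.png]] "
    "[[de:Maschinelle Übersetzung]] [[fr:Traduction automatique]]"
)
print(title_translations(sample, {"de", "fr", "ja"}))
```

Passing the language set explicitly keeps the sketch from guessing which prefixes are language codes; a title pair found this way is only a candidate translation and would still need checking.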
other entities: events; dates; titles; technical terms
problems: incompleteness; inconsistency; foreign words; moving target
papers: monolingual; semantics; errors and reliability; WordNet; using Wikipedia’s structure; multilingual; named entities; parallel sentence generation
papers: parallel sentence generation
1. compare with Babelfished version: create aligned sentences with Babelfish; pair off with best scoring sentence from the Wiki article
2. bootstrap from article titles: create aligned sentences by replacing linked words with their equivalents; translate the rest by throwing shrinking N-grams into Wiki search; pair off with best scoring sentence from the Wiki article
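Two steps shared by the methods above can be sketched roughly as follows: pairing each machine-translated sentence with the best scoring sentence from the Wiki article, and producing shrinking N-grams to use as search queries. The slide does not say how sentences are scored, so the Jaccard token overlap here is an assumption, as are all function names. No translation service or Wiki search is called; the inputs are assumed to be already translated and already split into sentences.

```python
import re
from typing import Iterator, List, Set, Tuple

def tokens(sentence: str) -> Set[str]:
    """Lowercased word tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"\w+", sentence.lower()))

def overlap_score(a: str, b: str) -> float:
    """Jaccard overlap of token sets (assumed scoring function, not from the slides)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def pair_with_best(mt_sentences: List[str],
                   article_sentences: List[str]) -> List[Tuple[int, str, float]]:
    """For each machine-translated sentence, return (its index, best article sentence, score)."""
    pairs = []
    if not article_sentences:
        return pairs
    for i, mt in enumerate(mt_sentences):
        best = max(article_sentences, key=lambda s: overlap_score(mt, s))
        pairs.append((i, best, overlap_score(mt, best)))
    return pairs

def shrinking_ngrams(words: List[str]) -> Iterator[str]:
    """Yield N-grams from the longest down to single words, left to right within each length."""
    for n in range(len(words), 0, -1):
        for start in range(len(words) - n + 1):
            yield " ".join(words[start:start + n])

mt = ["the cat sat on the mat"]
article = ["a dog barked loudly", "the cat is sitting on a mat"]
print(pair_with_best(mt, article))
print(list(shrinking_ngrams(["a", "b", "c"])))
```

In practice a score threshold would probably be needed so that a sentence with no real counterpart in the article is left unpaired instead of being matched to a weak candidate.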
conclusions: seed or bootstrap with traditional methods; fill holes with Wikipedia; hybrid systems; lots of research to be done
questions: general; Chinese; company names; cn/hk/tw issues; abbreviations/acronyms; many languages with one writing system; using links to find word divisions
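The last item, using links to find word divisions, could be sketched as below. The assumption is that in text written without spaces, each wiki link [[target]] or [[target|display]] marks a span that an editor treated as one unit, so the visible link text is a candidate word. The function name link_anchors and the sample sentence are hypothetical.

```python
import re
from typing import List

# [[target]] or [[target|display]]; the visible text is display if present, else target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def link_anchors(wikitext: str) -> List[str]:
    """Return the visible text of each wiki link, in order of appearance."""
    return [display or target for target, display in WIKILINK.findall(wikitext)]

sample = "[[机器翻译]]是[[计算语言学|语言学]]的分支"
print(link_anchors(sample))
```

The anchors give evidence for word boundaries only at link edges; the unlinked text between them would still need a conventional segmenter.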