Explore influential projects, open source toolkits, conferences, MT evaluations, and literature resources in the field of Machine Translation research.
Machine Translation MT – Research Landscape Stephan Vogel Spring Semester 2011
Overview • Some influential projects • Open source toolkits • Conferences • MT evaluations • Literature and general resources • Disclaimer: all of this is incomplete, subjective, and biased! 11-711 Machine Translation
MT Projects • Verbmobil • Large speech translation project in Germany • Different translation paradigms • Success story for SMT • TIDES • DARPA-funded US MT project • SMT widely used, small and large data track evaluations • Chinese-English and Arabic-English • GALE • DARPA-funded • Follow-up to TIDES • TransTac • DARPA-funded • Speech-to-Speech Translation • Targeted towards force protection
MT Projects • TC-Star • European project with partners from different universities • Technology and Corpora for Speech-to-Speech Translation • http://tcstar.org/ • EuroMatrix • 2006-2009, EuroMatrixPlus 2009-2012 • Translate all European languages • Offshoots: WMT evaluations, MT Marathon • euromatrix.net • Quaero • French-German project • Kind of a TC-Star follow-up • http://www.quaero.org/modules/movie/scenes/home/index.php?FUSEBOX_LANG=2
Open Source Toolkits: Word Alignment • Game Changer • Lower barrier to enter the field • Transparency • Word Alignment • GIZA++ • Started out at a JHU workshop, subsequently extended by Franz Josef Och (at RWTH and ISI) • Most widely used alignment toolkit • mGIZA++ • Multi-threaded/multi-core extension of GIZA++ • By Qin Gao: http://geek.kyloo.net/software/doku.php/mgiza:overview • Berkeley Aligner • Word alignment via quadratic assignment • http://code.google.com/p/berkeleyaligner/ • PostCAT (Posterior Constrained Alignment Toolkit) • http://www.seas.upenn.edu/~strctlrn/CAT/CAT.html
Open Source Toolkits: WA cont. • Word Alignment tools • Alignment Set • Set of tools to manipulate and display alignments • From TALP research group • http://www.talp.upc.edu/talp/index.php/en/resources/tools/alingment-set
Open Source Toolkits: Decoders • Decoders • Moses (Edinburgh): phrase-based and recently also hierarchical • Joshua (JHU): Hiero reimplementation • sourceforge.net/projects/joshua • Jane (RWTH Aachen): hierarchical • http://www-i6.informatik.rwth-aachen.de/web/Software/index.html • cdec (UMD -> CMU): hierarchical and phrase-based • Marie (TALP): n-gram-based (kind of phrase-based) • www.talp.upc.edu/talp/index.php/en/resources/tools/marie • Apertium (University of Alicante): rule-based • Phrasal (Stanford): phrase-based • http://www-nlp.stanford.edu/wiki/Software/Phrasal
Open Source Toolkits: LMs • SRILM • Most widely known and used LM toolkit • SALM • Written by Joy Ying Zhang (while at LTI) • http://projectile.sv.cmu.edu/research/public/tools/salm/salm.htm • IRSTLM • http://sourceforge.net/projects/irstlm/ • KenLM • Smaller footprint than SRILM • Written by Kenneth Heafield (LTI PhD student) • http://kheafield.com/code/kenlm/
Conferences • General CL conferences • ACL • HLT • EMNLP • Coling • IJCNLP • Int. Joint Conf. on NLP • LREC • Language Resources and Evaluation • RANLP • Recent Advances in NLP • SALTMIL • Speech and Language Technology for Minority Languages • Specific MT conferences • MT Summit (every 2 years) • AMTA (US) • EAMT (Europe) • TMI • Translating and the Computer (organized by Aslib) • IWSLT (organized by C-Star consortium) • … • MT Workshops • WMT • Workshop on Machine Translation • SSST • Syntax, Semantics, and Structure in SMT • …
Evaluations • It all started with TIDES • Comparative evaluations • Defined training and test data • Automatic evaluation metrics (NIST mteval, BLEU) • Organized by NIST • NIST Open MT Evaluations • Continuation and expansion of TIDES MT evaluations • Chinese-English, Arabic-English, Urdu-English • Restricted and unrestricted track • Originally every year, now moving to a 2-year cycle • http://www.itl.nist.gov/iad/mig/tests/mt/2009/
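To make the automatic metrics mentioned on this slide concrete, here is a minimal sketch of corpus-level BLEU in Python. It assumes whitespace tokenization, exactly one reference per segment, and no smoothing; the function names `ngrams` and `corpus_bleu` are illustrative only and do not correspond to NIST's mteval script, which applies its own tokenization and supports multiple references.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per segment, no smoothing (a sketch)."""
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()  # assumption: whitespace tokens
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Scores from a sketch like this are not comparable to official evaluation numbers, since tokenization and reference handling differ; it only illustrates the clipped n-gram precision and brevity penalty that the metric combines.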
Evaluations (cont.) • WMT Evaluations • Organized in connection with EuroMatrix • Based on Europarl corpora • Many languages • Automatic and manual evaluation • http://www.statmt.org/wmt11/translation-task.html • IWSLT Evaluations • Spoken language • Languages vary: Chinese, Japanese, Arabic, Italian, … • Speech 1-best and lattices provided • Based on (small) BTEC corpus (Basic Travel Expression Corpus) • Last time also lecture translations • http://iwslt2010.fbk.eu/node/15
Evaluations (cont.) • Specific projects have evaluations • GALE • Arabic-English and Chinese-English • Broadcast news and broadcast conversations, newswire and blogs • Human evaluation (HTER) • Go/No-Go • Quaero • European languages, also Arabic-French • This year the WMT evaluation was used as the Quaero evaluation
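HTER scores a system output by the number of edits needed to turn it into a human post-edited version of itself. The Python sketch below is a deliberate simplification: it counts only word-level insertions, deletions, and substitutions (plain Levenshtein distance) and omits the block shifts that full TER also allows, and it normalizes by the length of a single post-edited reference. The function names are hypothetical and do not belong to any official scoring tool.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    previous = list(range(len(ref_tokens) + 1))
    for i, hyp_word in enumerate(hyp_tokens, start=1):
        current = [i]
        for j, ref_word in enumerate(ref_tokens, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete hyp_word
                               current[j - 1] + 1,       # insert ref_word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def simplified_hter(system_output, post_edited):
    """Edits per reference word; simplification: no shift operations."""
    hyp, ref = system_output.split(), post_edited.split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)
```

Because shifts are left out, this sketch can only overestimate the edit count relative to real TER when a phrase has been moved; lower values mean less post-editing effort.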
Journals • Machine Translation • Springer Science, formerly Kluwer Academic Publishers, vol. 4-, 1989- • Articles available online (abstracts free, full texts on payment of fee) from Springer • Chief editor: Andy Way • http://www.springer.com/computer/ai/journal/10590 • Computational Linguistics • MIT Press • Now open access • http://www.mitpressjournals.org/loi/coli • ACM TSLP • Online publication • Started in 2005 • http://tslp.acm.org/
Journals (cont.) • IEEE Transactions on Audio, Speech, and Language Processing • http://www.signalprocessingsociety.org/publications/periodicals/taslp/ • The Prague Bulletin of Mathematical Linguistics • Has papers from recent MT Marathons, especially descriptions of open source packages • http://ufal.mff.cuni.cz/pbml.html
Literature • MT-Archive: http://www.mt-archive.info/ • Compiled by John Hutchins for the EAMT • One-stop shop! • Also links to books, journals, conferences • Papers listed by author, language, organization • ACL Anthology: http://www.aclweb.org/anthology/