Statistical Machine Translation with Moses 0.6227 Hieu Hoang Localization World 2013
Agenda
• What is Statistical Machine Translation?
• What is Moses?
• Common misconceptions
• Coming up
• What can we do for you?
Moses by Hieu Hoang, University of Edinburgh
What is Statistical Machine Translation?
It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
Warren Weaver, 1949
What is Statistical Machine Translation?
• NLP application
  • search engines, text mining, etc.
• Big data
  • bi-text from the Internet
    • e.g. multilingual websites, documents
  • large monolingual data
• Learn to translate
  • from previous translations
  • models of language
What is Statistical Machine Translation?
[Diagram, two panels. Training: training data (bi-text, monolingual data, dictionary) and linguistic tools feed the SMT system (translation model, language model, lots of numbers…). Using: source text is passed to the SMT system (translation model, language model, lots of numbers…).]
What is a model?
• Translation Model
• Language Model
  • (of the target language)
thanks to Precision Translation Tools
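The slide names the two models but not how they are combined. One common textbook formulation, offered here as an assumption rather than something stated in the talk, is the noisy-channel decomposition, where f is the source sentence and e is a candidate target sentence:

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The first factor is scored by the translation model and the second by the language model of the target language. Moses itself combines weighted feature scores in a log-linear model, so this is an illustration of the idea rather than a description of its exact scoring.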
What is a model?
• Translation model
  • source translation
  • probability
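A minimal sketch of what "source, translation, probability" could look like in practice. The phrase pairs and counts below are invented for illustration, and relative-frequency estimation is assumed here as the simplest way to turn counts into probabilities; it is not presented as the exact Moses training procedure.

```python
from collections import Counter, defaultdict

# Hypothetical phrase-pair counts, as if extracted from a word-aligned bi-text.
phrase_pair_counts = Counter({
    ("das haus", "the house"): 8,
    ("das haus", "the home"): 2,
    ("klein", "small"): 6,
    ("klein", "little"): 4,
})


def build_phrase_table(pair_counts):
    """Estimate P(translation | source) by relative frequency."""
    # Total number of times each source phrase was seen.
    source_totals = Counter()
    for (source, _translation), count in pair_counts.items():
        source_totals[source] += count

    # Normalise each pair count by its source-phrase total.
    table = defaultdict(dict)
    for (source, translation), count in pair_counts.items():
        table[source][translation] = count / source_totals[source]
    return table


phrase_table = build_phrase_table(phrase_pair_counts)

# Print the entries for one source phrase, most probable first.
for translation, prob in sorted(
    phrase_table["das haus"].items(), key=lambda item: -item[1]
):
    print(f"das haus ||| {translation} ||| {prob:.2f}")
```

Each row pairs a source phrase with one candidate translation and a probability, and the probabilities for a given source phrase sum to one.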
What is a model?
• Language model
  • Likelihood of sentence
  • in target language
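To make "likelihood of a sentence in the target language" concrete, here is a toy bigram language model with add-one smoothing. The three-sentence corpus is invented, and the bigram order and smoothing method are assumptions chosen for brevity; production language models use much larger data and better smoothing.

```python
import math
from collections import Counter

# Hypothetical monolingual corpus in the target language.
corpus = [
    "the house is small",
    "the house is big",
    "the garden is small",
]


def tokenize(sentence):
    """Split on whitespace and add sentence boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


history_counts = Counter()  # how often each word appears as a bigram history
bigram_counts = Counter()   # how often each (previous, current) pair appears
for sentence in corpus:
    tokens = tokenize(sentence)
    history_counts.update(tokens[:-1])
    bigram_counts.update(zip(tokens, tokens[1:]))

vocabulary = {token for sentence in corpus for token in tokenize(sentence)}


def log_prob(sentence):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    tokens = tokenize(sentence)
    total = 0.0
    for previous, current in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(previous, current)] + 1
        denominator = history_counts[previous] + len(vocabulary)
        total += math.log(numerator / denominator)
    return total


fluent = log_prob("the house is small")
scrambled = log_prob("small the is house")
print(fluent > scrambled)
```

The same words in a fluent order score higher than in a scrambled order, which is how the language model steers the system toward natural-sounding output.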
What is Moses?
• Replacement for Pharaoh
• Academic software
• Closed-source
• Open source
• Re-written, clean code
• More features
• Large developer community
• Initiated by Hieu Hoang
• Developed at NLP Workshop
Agenda
• What is Statistical Machine Translation?
• What is Moses?
• Timeline
• Common misconceptions
• Coming up
• What can we do for you?
What is Moses? Common Misconceptions
• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow
Only works on Linux
• Tested on
  • Windows 7 (32-bit) with Cygwin 6.1
  • Mac OSX 10.7 with MacPorts
  • Ubuntu 12.10, 32 and 64-bit
  • Debian 6.0, 32 and 64-bit
  • Fedora 17, 32 and 64-bit
  • openSUSE 12.2, 32 and 64-bit
• Project files for
  • Visual Studio
  • Eclipse on Linux and Mac OSX
Difficult to use
• Easier compile and install
  • Boost bjam
• No installation required
  • Binaries available for
    • Linux
    • Mac
    • Windows/Cygwin
• Moses + Friends
  • IRSTLM
  • GIZA++ and MGIZA
• Ready-made models trained on Europarl
Unreliable
• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
  • Run end-to-end training
  • http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
  • Phrase-based, hierarchical, factored
  • 8 language-pairs
  • http://www.statmt.org/moses/RELEASE-1.0/models/
Only phrase-based model
• replacement for Pharaoh
• extension of Pharaoh
• From the beginning
  • Factored models
  • Lattice and confusion network input
  • Multiple LMs, multiple phrase-tables
• since 2009
  • Hierarchical model
  • Syntactic models
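As a small illustration of the factored models mentioned above: each token carries several factors separated by a vertical bar. The choice of three factors (surface form, lemma, part of speech), their order, and the `FactoredToken` helper are assumptions made for this sketch, not a fixed requirement of Moses.

```python
from typing import NamedTuple


class FactoredToken(NamedTuple):
    """One token with three illustrative factors (hypothetical layout)."""
    surface: str
    lemma: str
    pos: str


def parse_factored(sentence):
    """Split a sentence into tokens, then split each token into its factors."""
    return [FactoredToken(*token.split("|")) for token in sentence.split()]


tokens = parse_factored("the|the|DT houses|house|NNS are|be|VBP small|small|JJ")

# A factored model can translate or score on any factor, for example lemmas.
print([token.lemma for token in tokens])
print([token.pos for token in tokens])
```

Working on lemmas or part-of-speech tags instead of surface forms is one way such models cope with sparse data in morphologically rich languages.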
Developed by one person
• ANYONE can contribute
• 50 contributors
‘git blame’ of Moses repository
Slow Decoding
thanks to Ken!!
Slow Training
• Multithreaded
• Reduced disk IO
  • compress intermediate files
• Reduce disk space requirement
What is Moses? Common Misconceptions
• Only for Linux → Windows, Linux, Mac
• Difficult to use → Easier compile and install
• Unreliable → Multi-stage testing
• Only phrase-based → Hierarchical, syntax model
• Developed by one person → everyone
• Slow → Fastest decoder, multithreaded training, less IO
Coming up…
• Code cleanup
• Incremental training
• Better translation
  • smaller model
  • bigger data
  • faster training and decoding
• Applications
  • CAT tools
  • Speech translation
Applications
Computer-Aided Translation
• EU Project
• CASMACAT
• MATECAT
What can we do for you?
• simpler Moses
• graphical interface
• Windows compatibility
• terminology and glossary
• incremental training
What can you do for us?
• code
• data
• funding