“It works, but how far can it go?” Dallin Hardcastle
IBM CANDIDE • Statistical Machine Translation (SMT). • Project ran from 1987 to 1994. • IBM’s theory was that over 50% of Western languages are completely predictable through the use of algorithms (IBM claimed its algorithms were 80% accurate). • Translation accuracy was generally 60%.
IBM CANDIDE • It used only very large bilingual corpora. • It did not take into account any grammars, lexicons, phonological rules, etc. • The U.S. government’s DARPA (Defense Advanced Research Projects Agency) rated SYSTRAN higher than IBM’s new system, and used SYSTRAN frequently in the 1990s. • DARPA even helped fund CANDIDE.
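To make "only bilingual corpora, no grammar or lexicon" concrete, here is a minimal sketch of corpus-only word alignment in the style of IBM Model 1, the simplest of the alignment models published by the IBM group. It is an illustration, not CANDIDE's actual code: the three-sentence German-English toy corpus, the function names `train_model1` and `best_translation`, and the omission of the NULL word are all assumptions made for brevity.

```python
from collections import defaultdict

def train_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) from sentence pairs with EM.

    pairs: list of (source_tokens, target_tokens). No dictionary or grammar
    is used; everything is learned from co-occurrence in the corpus.
    Simplification (assumption): no NULL source word.
    """
    target_vocab = {f for _, fs in pairs for f in fs}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (target, source) links
        total = defaultdict(float)   # expected count of each source word
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm   # E-step: soft alignment weight
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e), c in count.items():       # M-step: renormalise
            t[(f, e)] = c / total[e]
    return t

def best_translation(t, source_word):
    """Return the target word with the highest learned probability."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus, source = German, target = English.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
table = train_model1(corpus)
print(best_translation(table, "Haus"))
print(best_translation(table, "Buch"))
```

The algorithm never sees a language-specific rule, which is the property the slide describes. A real system would need a far larger corpus plus reordering and language models on top of this.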
SYSTRAN • Founded in 1968 by Dr. Peter Toma. • Survived the major funding cuts that followed the ALPAC report. • Has offices in Paris and La Jolla. • During the Cold War, helped the US Air Force extensively. • Provides technology for Babel Fish and for the translation widget on Mac OS X.
SYSTRAN • Rule-Based Machine Translation (RbMT). • In his book, Yorick Wilks, an AI professor in England, claims that RbMT (like SYSTRAN) has outperformed SMT (like CANDIDE) up to this point.
So, SMT or RbMT? • SMT output seems to flow more “fluently”. • Generally, SMT reaches only 60% accuracy (on the high side). • SMT algorithms are not tailored to any specific languages. Benefit? Drawback? • Sometimes awkward constructions. • Once rules are established, RbMT gives much higher accuracy rates. • Translation between two languages with well-formed rules is easier (but costly).
Google Translate/Babel Fish • http://www.youtube.com/watch?v=_GdSC1Z1Kzs • Babel Fish is gone (for now), replaced by Bing Translator, which is also SMT. • Babel Fish was powered by SYSTRAN.
HYBRID • Dr. Wilks proposes that for MT technology to truly advance, there must be highly sophisticated HYBRID systems. • This means a mix of SMT and RbMT. • Trados uses translation memory (TM), whether local or from a server, but as far as very rapid, accurate, fully automated MT goes, we are not there yet.
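For readers unfamiliar with translation memory, the idea can be sketched as a fuzzy lookup against previously translated segments. This is only a rough illustration under stated assumptions, not how Trados itself works: the function name `tm_lookup`, the 0.75 threshold, and the sample memory entries are hypothetical, and similarity is measured with Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

def tm_lookup(segment, memory, threshold=0.75):
    """Return (stored_translation, score) for the closest stored segment.

    memory: dict mapping source segments to human translations.
    If no stored segment is similar enough, the translation is None and
    the segment would go to a human translator or an MT engine instead.
    """
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    if best_score >= threshold:
        return best_target, best_score
    return None, best_score

# Hypothetical memory with two previously translated segments.
memory = {
    "The printer is out of ink.": "L'imprimante n'a plus d'encre.",
    "Restart the computer.": "Redémarrez l'ordinateur.",
}
exact, exact_score = tm_lookup("Restart the computer.", memory)
fuzzy, fuzzy_score = tm_lookup("The printer is out of paper.", memory)
miss, miss_score = tm_lookup("Qqqq xxxx", memory)
print(exact, fuzzy, miss)
```

A fuzzy hit returns the stored translation of a similar sentence, which a translator still has to correct. That human step is why TM on its own is not fully automated MT.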
Some hybrid companies? • IBM: working with LinguaSys. • SYSTRAN: new hybrid software in 2010.