Machine Translation Surma Mukhopadhyay 29th March, 2007
Followed By:- • The Soldiers are in the Coffee – An Introduction to Machine Translation, by Marieke Napier, Cultivate Interactive, issue 2, 16 October 2000 • Language Technology: Machine Translation (from the course material for COMP248), by Rolf Schweitzer, Department of Computing, Macquarie University, NSW 2109, Australia
Introduction • Though research in Machine Translation (MT) has already celebrated its fiftieth birthday, understanding of its successes is still minimal. • The increased availability of Machine Translation software, due to the globalization of the Internet, has had little impact on this understanding. • Users' knowledge of the complexities behind translating remains limited, and judgments are based on one-off personal experiences.
What is Machine Translation? • The European Association for Machine Translation gives the following definition for MT: "Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another."
Types of Machine Translation Unassisted Machine Translation: • Unassisted MT takes pieces of text and translates them into output for immediate use, with no human involvement. • The result is unpolished text that gives only the gist of the source, hence the term 'gisting'. • The ultimate aim of this type of MT is sometimes known as Fully Automatic High Quality Translation (FAHQT): perfect translation created solely by a computer. • Examples of this form of MT include IBM alphaWorks native search, Babel Fish, Worldlingo and Dragon Systems.
Types of Machine Translation Assisted Machine Translation: • Assisted MT uses a human translator to clean up after, and sometimes before, translation in order to get better quality results. • The process is usually improved by limiting the vocabulary, through use of a dictionary, and by restricting the types of sentences and grammar allowed. • The use of such a 'controlled language' has been fairly successful. Some systems have also been set up to learn from corrections.
Assisted Machine Translation • Assisted MT can be divided into Human Aided Machine Translation (HAMT), a machine that uses human help, and Machine Aided Human Translation (MAHT), a human that uses machine help. • Computer Aided Translation (CAT) is a more recent form of MAHT.
Natural Language Processing • A related area that is worth mentioning here is Natural Language Processing (NLP). • In this context, NLP parses sentences and determines their underlying meaning, for example so that a database can answer a query entered in the form of a natural-language question rather than as SQL. • For further information on the structure of MT systems, see the recent special report on the future of translation featured in ‘Wired’ magazine (www.wired.com).
Concept of Transfer Component • The structure of MT systems can vary, but all use some sort of transfer component. • This component is specialized for one pair of languages and produces the target sentence from the source sentence. • The transfer component has a correspondence lexicon, which is a comprehensive list of source-language patterns and phrases mapped to the target language. • Some MT systems use systematic transfer systems, which apply software parsers to analyse the source language sentences. • With this type of transfer system, a new correspondence lexicon must be created for every pair of languages between which translation is required.
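The correspondence lexicon described on this slide can be sketched as a simple lookup table. This is a minimal illustration only, not how any particular MT system is built: the lexicon entries, the name `EN_FR_LEXICON` and the function `transfer` are all hypothetical, and real transfer components also apply structural rules.

```python
# Hypothetical correspondence lexicon for ONE language pair (English -> French).
# Keys are source-language patterns (tuples of words), values are target phrases.
EN_FR_LEXICON = {
    ("machine", "translation"): "traduction automatique",
    ("is",): "est",
    ("difficult",): "difficile",
}


def transfer(tokens, lexicon):
    """Map source tokens to target phrases using greedy longest-match lookup."""
    longest = max(len(pattern) for pattern in lexicon)
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate pattern first, then shorter ones.
        for size in range(min(longest, len(tokens) - i), 0, -1):
            pattern = tuple(tokens[i:i + size])
            if pattern in lexicon:
                output.append(lexicon[pattern])
                i += size
                break
        else:
            # No pattern matched: pass the unknown word through unchanged.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


print(transfer("machine translation is difficult".split(), EN_FR_LEXICON))
# prints "traduction automatique est difficile"
```

Because the table is tied to one source and one target language, translating in another direction or between another pair would need a separate table, which is the cost the slide points out.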
Concept of Interlingua • An alternative to the transfer component is an Interlingua, a type of intermediate language. • A translation is made from the source language into the Interlingua and then from the Interlingua into the target language. • The benefit of using an Interlingua is that only one component is required for each language, so further languages can be added easily.
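A toy sketch of the interlingua idea, assuming a word-level "interlingua" made of shared concept labels. The concept IDs, the `LEXICONS` table and the three function names are invented for illustration; a real interlingua represents whole-sentence meaning, not single words.

```python
# Hypothetical per-language lexicons: each maps its own words to shared concept IDs.
# Adding a language means adding ONE entry here, not one table per language pair.
LEXICONS = {
    "en": {"water": "WATER", "bread": "BREAD"},
    "fr": {"eau": "WATER", "pain": "BREAD"},
    "de": {"Wasser": "WATER", "Brot": "BREAD"},
}


def analyze(word, lang):
    """Source language -> interlingua concept."""
    return LEXICONS[lang][word]


def generate(concept, lang):
    """Interlingua concept -> target language word."""
    for word, word_concept in LEXICONS[lang].items():
        if word_concept == concept:
            return word
    raise KeyError(f"no {lang} word for concept {concept}")


def translate(word, source, target):
    """Translate by going through the interlingua, never directly between languages."""
    return generate(analyze(word, source), target)


print(translate("eau", "fr", "de"))    # prints "Wasser"
print(translate("bread", "en", "fr"))  # prints "pain"
```

No French-to-German table exists anywhere in the sketch; the pair works only because both languages are linked to the same concepts.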
Why is Machine Translation Difficult? • A single word can have more than one meaning • Lexical gaps: single-word concepts with no simple translation • Idioms • Different languages use different syntactic structures • Some syntactic structures are not possible in some languages
Why is Machine Translation Difficult? • We need to find the correct interpretation • Literal translation does not produce fluent text • Literal translation does not preserve semantic information • Literal translation does not preserve pragmatic information
Direct Machine Translation: word-for-word substitution with some local adjustment. • Transfer-based MT: analysis of the source into a syntactic structure representation, transfer of that representation into the target structure, and synthesis of the output from that structure. • Interlingua-based MT: analysis of the source into an abstract meaning representation (the interlingua), then generation of the target language from this interlingua.
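Direct MT, the simplest of the three approaches above, can be sketched as dictionary substitution plus one local reordering rule. Everything here is an assumption made for illustration: the tiny English-to-French dictionary, the hand-written `ADJECTIVES` and `NOUNS` sets, and the single adjective-noun swap rule.

```python
# Hypothetical bilingual dictionary and word classes for a toy English -> French system.
EN_FR = {"the": "la", "red": "rouge", "car": "voiture"}
ADJECTIVES = {"red"}
NOUNS = {"car"}


def direct_translate(sentence):
    """Word-for-word substitution with one local adjustment."""
    words = sentence.lower().split()

    # Local adjustment: English "adjective noun" becomes French "noun adjective".
    i = 0
    while i < len(words) - 1:
        if words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1

    # Word-for-word substitution; unknown words are left untranslated.
    return " ".join(EN_FR.get(word, word) for word in words)


print(direct_translate("the red car"))  # prints "la voiture rouge"
```

The sketch has no notion of word sense, idiom or sentence structure, so it fails in exactly the ways the "difficult" slides list.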
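Transfer Based Machine Translation • Transfer-based MT needs n(n-1) transfer modules for n languages, one per ordered language pair. • If the transfer modules are bidirectional, then n(n-1)/2 are enough. • For example, 10 languages need 90 one-way modules or 45 bidirectional ones.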
Interlingua Based Machine Translation • For n languages, only n language analyzer/generator pairs are needed. • Problem: Different languages "carve the world up" differently, so a single shared meaning representation is hard to design.
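The module counts for the transfer and interlingua approaches can be compared with a few lines of arithmetic. The function names are made up for this sketch; the formulas are the ones stated on the slides.

```python
def transfer_modules(n, bidirectional=False):
    """Transfer modules needed to cover every pair among n languages."""
    ordered_pairs = n * (n - 1)
    # n * (n - 1) is always even, so integer division is exact.
    return ordered_pairs // 2 if bidirectional else ordered_pairs


def interlingua_modules(n):
    """One analyzer/generator pair per language."""
    return n


for n in (2, 5, 10):
    print(n, transfer_modules(n), transfer_modules(n, bidirectional=True),
          interlingua_modules(n))
# prints:
# 2 2 1 2
# 5 20 10 5
# 10 90 45 10
```

The transfer count grows quadratically with the number of languages while the interlingua count grows linearly, which is why adding a language is cheaper in the interlingua design.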
Translation Memory • Translation memory software stores pairs of source and target language segments, previously translated by a human translator, in a database for future reuse. • Newly encountered source language segments are compared to the database content, and the resulting output (exact, fuzzy or no match) is reviewed by the translator.
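A minimal sketch of the lookup described on this slide, using Python's standard `difflib.SequenceMatcher` as the similarity measure. The class name, the in-memory dictionary standing in for the database, and the 0.75 threshold are assumptions for illustration; commercial tools use their own matching algorithms and thresholds.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores segment pairs and classifies new segments."""

    def __init__(self, fuzzy_threshold=0.75):
        self.segments = {}  # source segment -> target segment
        self.fuzzy_threshold = fuzzy_threshold  # assumed cut-off, not a standard value

    def store(self, source, target):
        self.segments[source] = target

    def lookup(self, source):
        """Return (match_type, suggestion) where match_type is exact, fuzzy or none."""
        if source in self.segments:
            return "exact", self.segments[source]

        # Find the stored source segment most similar to the new one.
        best_score, best_target = 0.0, None
        for stored_source, target in self.segments.items():
            score = SequenceMatcher(None, source, stored_source).ratio()
            if score > best_score:
                best_score, best_target = score, target

        if best_score >= self.fuzzy_threshold:
            return "fuzzy", best_target
        return "none", None


tm = TranslationMemory()
tm.store("Press the red button.", "Appuyez sur le bouton rouge.")
print(tm.lookup("Press the red button."))
print(tm.lookup("Press the green button."))
```

A fuzzy match returns the translation of the nearest stored segment, not of the new one, which is why the slide has the translator review the output.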
KANTOO • KANTOO is an interlingua-based MT system • KANTOO is designed for multilingual document production • KANTOO includes modules for source language analysis, target language generation, source terminology management, target terminology management, and knowledge source development
KANTOO: Some Features • The controlled language checker is used for vocabulary and grammar checking in each document. • The batch translator consists of the analyzer and generator, run as standalone batch servers. • The knowledge maintenance tool is a graphical user interface that allows developers to test their knowledge changes in the context of a complete working system. • The knowledge server provides network access to a version-controlled repository.
Performance of an Analyzer in KANTOO The analyzer performs: • tokenization • morphological processing • lexical lookup • syntactic parsing with a unification grammar • semantic interpretation, yielding one or more interlingua expressions for each valid sentence
Performance of a Generator in KANTOO The generator performs, for a target language: • lexical selection • structural mapping • syntactic generation • morphological realization
More features of KANTOO • The lexical maintenance tool is used by domain experts to maintain source terminology • The language translation database is used by translators to create target translations of new source terminology
CONCLUSION • The future of MT remains uncertain, but with the growth of international trade and the continuing increase in use of MT technologies on the Web, things are looking up. More MT products are expected to come to market than ever before, covering a larger number of languages.