
Session outline

An introduction to the history and concept of Machine Translation Part 1 Dr Burcin K. Mustafa Princess Nourah university 2018. Session outline. Definition of Machine Translation (MT) History and development The ‘Georgetown experiment’ MT applications



  1. An introduction to the history and concept of Machine Translation, Part 1. Dr Burcin K. Mustafa, Princess Nourah University, 2018

  2. Session outline
     • Definition of Machine Translation (MT)
     • History and development
     • The ‘Georgetown experiment’
     • MT applications
     • Key concepts and terms relating to MT:
       • Technical terms
       • Pre- and post-editing
     • MT and human translation: in competition or collusion?
     • The future of MT

  3. Locating MT in the translation technological spectrum
     Diagram labels: MT; CAT tools. Axis: from the computer to the human.
     • Fully automated high quality computer translation (FAHQCT)
     • Human-aided computer translation (CAMT)
     • Computer-aided human translation (CAHT)
     • Human translation unaided by computers

  4. Defining Machine Translation (MT)
     • The process whereby a computer has the primary responsibility for the translation of a text. A human may assist in the process through such tasks as pre- or post-editing, but it is the computer, rather than the human, that produces an actual draft translation (Bowker 2002 pg 149).
     • An MT system can be defined as a computer translation tool that works by breaking down sentences or other text segments in a given source language, analysing them in context and then attempting to recreate their meaning in a given target language, taking into account inflection, idioms and word order (European Commission).

  5. MT history and development: Main architects
     Warren Weaver
     • Awarded a PhD in 1921
     • National Defense Research Committee
     • Applied Mathematics Panel
     • Awarded the British King’s Medal
     Léon Dostert
     • Achieved the military rank of Major
     • General Eisenhower’s interpreter
     • Organized the Nuremberg trials
     • Introduced simultaneous interpretation to the UN

  6. Early assumptions underpinning MT development
     ‘[K]nowing nothing official about, but having guessed and inferred considerable about, powerful new mechanized methods in cryptography – methods which I believe succeed even when one does not know what language has been coded – one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography’ (Weaver 1949 pg 5).
     ‘We have considered this problem in some detail and it transpires that a machine of the type envisaged could perform this function without any modification in its design’ (Dr Andrew Booth cited in Weaver 1949 pg 6).

  7. Early assumptions underpinning MT development
     Bar-Hillel:
     • He was intrigued by the problem of making computers work toward the solution of non-numerical problems
     • Organized a conference on MT in 1952
     • Bar-Hillel’s early assumption was that FAHQT is not possible
     Are there fundamental differences between mathematical and linguistic problems?

  8. Early assumptions underpinning MT development
     What does the following statement mean? ‘Let him have it’
     Derek and Chris are trapped by the police. The police tell Chris to put down his gun. Derek, on hearing this, says, ‘Let him have it’. Chris fires the gun a number of times, killing one officer and wounding another. However, because Chris is a minor he is given a minor’s sentence, but Derek, on the basis of his statement, is hanged for murder in 1953.

  9. Early assumptions underpinning MT development
     ‘A more general basis for hoping that a computer could be designed which would cope with a useful part of the problem of translation is to be found in a theorem which was proved in 1943 by McCulloch and Pitts. This theorem states that a robot (or a computer) constructed with regenerative loops of a certain formal character is capable of deducing any legitimate conclusion from a finite set of premises. Now there are surely alogical elements in language (intuitive sense of style, emotional content, etc.) so that again one must be pessimistic about the problem of literary translation. But insofar as written language is an expression of logical character, this theorem assures one that the problem is at least formally solvable’ (Weaver 1949 pg 10-11).
     ‘It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the "Chinese code." If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?’ (Weaver 1949 pg 11).

  10. The first example of a ‘working’ MT system: The 1954 Georgetown Experiment
      • January 1954: first demonstration of a working MT system
      • The main engineers:
        • Peter Sheridan of IBM
        • Paul Garvin of Georgetown University
        • Léon Dostert of Georgetown University
      • Example input: Мы передаем мысли посредством речи.
      • Example output: We transmit thoughts through speech.

  11. The first example of a ‘working’ MT system: The 1954 Georgetown Experiment
      A huge electric ‘brain’ with a 250-word vocabulary translated mouth-filling Russian sentences yesterday into simple English in less than ten seconds. As lights flashed and motors whirred inside the ‘brain’ the instrument’s automatic type-writer swiftly translated statements on politics, law, science and military affairs. Once the Russian words were fed to the machine no human mind intervened.
      In translating, for instance, a word “A” which precedes a word “B” in Russian, may be reversed in some cases in English. Each of the 250 words is coded for this inversion. Sometimes words must be inserted in the English text, sometimes they must be omitted, following code instructions. When there are several possible English meanings for a Russian word, the instructions tell the machine to pick out the meaning that best fits the context.
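
The coding scheme described in the report above (a glossary whose entries carry instructions for inversion and omission) can be pictured with a minimal sketch. This is not the Georgetown program: the rule codes `plain`, `swap` and `omit`, the function name `translate`, and the glossary layout are all assumptions made for illustration. Only the Russian sentence and its English rendering come from the slides.

```python
# Toy glossary-plus-rule-codes translator, loosely inspired by the 1954 report.
# Everything here is hypothetical: the real system's codes are not reproduced.
#   "plain" - emit the English word as is
#   "swap"  - emit the English word before the previously emitted word (inversion)
#   "omit"  - emit nothing for this source word
GLOSSARY = {
    "мы": ("we", "plain"),
    "передаем": ("transmit", "plain"),
    "мысли": ("thoughts", "plain"),
    "посредством": ("through", "plain"),
    "речи": ("speech", "plain"),
}


def translate(sentence, glossary=GLOSSARY):
    """Translate word by word; raises KeyError if a word is not in the glossary."""
    words = sentence.rstrip(".").lower().split()
    output = []
    for word in words:
        english, rule = glossary[word]
        if rule == "omit":
            continue
        if rule == "swap" and output:
            # Place this word in front of the last emitted word.
            output.insert(len(output) - 1, english)
        else:
            output.append(english)
    if not output:
        return ""
    text = " ".join(output)
    return text[0].upper() + text[1:] + "."


print(translate("Мы передаем мысли посредством речи."))
```

A sentence containing a word outside the glossary simply fails, which is one way to see why a 250-word vocabulary limits what such a system can handle.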

  12. The first example of a ‘working’ MT system: The 1954 Georgetown Experiment

  13. The first example of a ‘working’ MT system: Assessing the viability of the Georgetown experiment’s claims
      A single parent from a small rural town in the south cried profusely when their only child was lost on the first day of the Iraq war, the 20th of March 2003.
      The issue of extracting meaning:
      Little Johnny never wanted to see a pen again. That is because he had to spend his time in the pen with Crazy Jessie.
      Jacques Derrida (1978) describes texts as ‘chains, the systems of traces emerging out of and constituted by differences’ (pg 65).

  14. The first example of a ‘working’ MT system: Assessing the viability of the Georgetown experiment’s claims
      Garvin (1967), who himself worked on programming the 701, notes in his summary of the characteristics of the experiment that ‘the scope of the translation program was clearly specified. Any sentence meeting its narrow specifications could be translated, provided the required entries were present in the glossary’ (pg 48).
      ‘Considering the state of the art of electronic computation at the time, it is remarkable that anything resembling automatic translation could have been achieved at all’ (Hutchins 2004 pg 11).

  15. The first example of a ‘working’ MT system: Assessing the viability of the Georgetown experiment’s claims
      Automatic Language Processing Advisory Committee (ALPAC) Report, 1966:
      • Quality of MT output: ‘Unedited machine output from scientific text is decipherable for the most part, but it is sometimes misleading and sometimes wrong (as is postedited output to a lesser extent), and it makes slow and painful reading’.
      • More economically viable options: ‘The Committee believes that in some cases it might be simpler and more economical for heavy users of Russian translations to learn to read the documents in the original language’.

  16. MT development and application post ALPAC
      • Revival of research in the 1970s:
        • In 1976 the European Commission (EC) bought the Systran system
        • The EC also funded an ambitious MT project, ‘EUROTRA’, for the translation of all EC languages
      The Eurotra programme statement of intent: ‘The technical objective of Eurotra is the creation of a prototype machine translation system capable of dealing with all Community languages. Politically, Eurotra can be seen as a research initiative aiming at the creation of a 'critical mass' of expertise on MT and computational linguistics in general in Europe’.

  17. MT development and application post ALPAC
      With the development of advanced information and retrieval systems, MT has been applied successfully in sectors with repetitive language use:
      • Product manufacturers
      • Software producers
      • Political and legal institutions such as the EU and UN
      • Financial sector
      • Military institutions
      • Online data outlets: websites, forums, social networks etc.

  18. MT development and application post ALPAC: MT at the EU
      • The European Commission’s Machine Translation Service was built around EC Systran, a specific version of the SYSTRAN system originally developed by the World Translation Center (USA) in 1976. However, SYSTRAN has been further developed and adapted by the European Commission for internal purposes.
      • In 2004, output from the Directorate-General for Translation of the European Commission was over 1.2 million pages, or over 600 million words.

  19. MT development and application post ALPAC: MT at the EU
      • The EU currently offers MT for all 24 official EU languages.
      • MT@EC is currently based on the MOSES open-source technology, a Statistical Machine Translation (SMT) system developed with co-funding from EU research and innovation programmes.
      • MT@EC was developed by the Directorate-General for Translation (DGT) under the Interoperability Solutions for European Public Administrations (ISA) programme.
      • MT@EC translation engines rely on translation memories comprising over 1 billion sentences in the 24 official EU languages, produced by the translators of the EU institutions over the past decades.
      • MT@EC can translate multiple documents into multiple languages in one go.
      Considering the amount of translation which is, in most cases, legally required, could the EU survive without MT?
      If Machine Translation were not possible, how do you think its absence would affect globalization?

  20. MT development and application post ALPAC: MT in the Military
      The Defense Advanced Research Projects Agency (DARPA): ‘The genesis of that mission and of DARPA itself dates to the launch of Sputnik in 1957, and a commitment by the United States that, from that time forward, it would be the initiator and not the victim of strategic technological surprises’ (DARPA).
      Post 9/11 and the rise of the ‘Information Awareness Office’ (IAO)
      Charles Wayne, who leads the TIDES and EARS programmes, states: ‘Exploiting human language is currently a very labor intensive process. And much of it requires foreign language skills that are in very short supply in the defense, intelligence, and law enforcement communities. It is clear that the U.S. cannot succeed simply by adding more people. To obtain timely, actionable, mission-critical information, we absolutely must have effective language exploitation technology to magnify greatly the capabilities of a necessarily limited set of analysts.’

  21. MT development and application post ALPAC: MT in the Military
      • TIDES (Translingual Information Detection, Extraction, and Summarization): the project’s objective is to create a system that can convert speech and text, either ‘stationary or streaming’, into a digital text format which can then be automatically translated through MT systems.
      • Raytheon Trans Talk: a hand-held ‘Two-way Speech-to-Speech’ translation device used by the US military
        http://www.foxnews.com/tech/2012/05/17/star-trek-universal-translator-to-join-war-on-terror/
        https://www.youtube.com/watch?v=fOIbdB7s0o4

  22. MT and HT, collision or collusion?
      • The initial ideas behind the 1954 MT ‘experiment’ have radically changed.
      • MT still requires human post-editors
      • MT is not applicable for all translation projects
      • There is no current viable replacement for interpreters
      • The demand for translation services is growing

  23. MT and HT, collision or collusion?

  24. MT and HT, collision or collusion?

  25. Key concepts and terms relating to MT
      The different types of machine translation systems:
      • Rule-based MT (RBMT): a system based on linguistic information about the source and target languages, derived from (bilingual) dictionaries and grammars covering the main semantic, morphological, and syntactic regularities of each language.
      • Statistical-based MT (SBMT) and example-based MT: generate translations by referring to existing translations stored in vast aligned bilingual corpora.
      • Hybrid MT: hybrid machine translation engines combine the strengths of rule-based and statistical machine translation by merging the predictability and language consistency of rule-based MT with the fluency and flexibility of statistical MT.
      • Neural machine translation (NMT): a newer approach to machine translation that uses a large neural network. It differs from statistical MT approaches, which use separately engineered subcomponents, in that a single network is trained end to end. NMT models use deep learning and representation learning. They require only a fraction of the memory needed by traditional statistical machine translation (SMT) models.
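
The idea of generating translations by referring to existing translations in an aligned corpus can be sketched, in a drastically simplified form, as a nearest-match lookup. The three-entry English to French corpus, the function name `translate_by_example` and the 0.6 similarity cutoff are assumptions for illustration; real example-based and statistical systems recombine fragments or estimate probabilities over far larger corpora rather than returning one stored sentence.

```python
import difflib

# Hypothetical, tiny aligned corpus (English -> French), for illustration only.
ALIGNED_CORPUS = {
    "good morning": "bonjour",
    "good evening": "bonsoir",
    "thank you": "merci",
}


def translate_by_example(source, corpus=ALIGNED_CORPUS, cutoff=0.6):
    """Return the stored translation of the most similar source segment.

    Returns None when no stored segment is similar enough (below `cutoff`).
    """
    candidates = difflib.get_close_matches(
        source.lower(), list(corpus), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    return corpus[candidates[0]]


print(translate_by_example("Good mornin"))  # a slightly misspelled query
print(translate_by_example("qqq"))          # nothing similar in the corpus
```

The lookup tolerates small variations in the input, but it can only ever return something a human has already translated, which is why the size and quality of the aligned corpus matter so much for corpus-based approaches.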

  26. Key concepts and terms relating to MT
      Due to the continued inherent limitations of MT, successful MT often requires ‘pre-editing’, which may include:
      • Avoiding unidiomatic expressions
      • Avoiding pronouns before a verb
      • Keeping sentence structures clear
      • Breaking up long complex sentences (one idea per sentence)
      • Keeping to a typical word order (when possible), e.g. subject - verb - object
      • Making sure the ST conforms to standard formal grammatical rules
      • Replacing problematic terms with more widely used ones (when possible)
      • Controlled language
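
Two of the pre-editing steps listed above (breaking up long sentences and replacing problematic terms) lend themselves to a simple automated check. The sketch below is only an illustration: the 20-word limit, the `PREFERRED_TERMS` list and the function name `pre_edit_report` are assumptions, not part of any established controlled-language standard.

```python
import re

MAX_WORDS = 20  # hypothetical sentence-length limit
PREFERRED_TERMS = {"utilise": "use", "commence": "start"}  # hypothetical term list


def pre_edit_report(text):
    """Return a list of (sentence_number, message) pairs for a source text."""
    issues = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for number, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z']+", sentence)
        if len(words) > MAX_WORDS:
            issues.append(
                (number, f"long sentence ({len(words)} words); consider splitting")
            )
        for word in words:
            replacement = PREFERRED_TERMS.get(word.lower())
            if replacement:
                issues.append((number, f"replace '{word}' with '{replacement}'"))
    return issues


for issue in pre_edit_report("We commence now. It is fine."):
    print(issue)
```

A checker like this only flags candidates; deciding whether a sentence is genuinely unclear for an MT system remains a judgement for the pre-editor.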

  27. Key concepts and terms relating to MT
      • Controlled language: ‘controlled language can be defined as a form of language usage restricted by grammar and vocabulary rules’ (Austermühl 2001)
      What are the possible wider social implications of adopting and using a controlled language?

  28. Key concepts and terms relating to MT
      • Post-editing refers to the process of correcting the output of machine translation
      • The degree of editing depends on the purpose of the translation
      • Post-editing is usually carried out by a qualified translator with experience and knowledge of both languages in the pair.
      • Would you like to work as a post-editor for a machine?
