Explore the feasibility of machine translation, classical problems, quality evaluation, and controlled languages in this comprehensive guide. Learn about the advantages, limitations, and advancements in the field of automated translations.
Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, September 2004
Translation • “substitute the text material of one language (SL) by the equivalent text material of another language (TL)” (Catford 1965: 20) • “Translation consists in producing in the target language the closest natural equivalent of the text material of the source language, in the first hand concerning meaning, in the second hand concerning style” (Nida 1975: 32) • “Translation is in theory impossible, but in practice fairly possible” (Mounin 1967)
References: Catford, J. C. (1965), A Linguistic Theory of Translation, Oxford University Press, England. Mounin, G. (1967), Les problèmes théoriques de la traduction, Paris. Nida, E. (1975), A Framework for the Analysis and Evaluation of Theories of Translation, in Brislin, R. W. (ed.) (1975), Translation: Application and Research, Gardner Press, New York.
Equivalence • form • meaning • style • effect
Formal and dynamic equivalence • Formal equivalence focuses attention on the message itself, in both form and content. It aims to allow the reader to understand as much of the SL context as possible. • Dynamic equivalence is based on the principle of equivalent effect, i.e. that the relationship between receiver and message should aim at being the same as that between the original receivers and the SL message. (Nida 1975)
Can computers translate? • Not a simple yes or no; it depends on the purpose of the translation and the required quality.
Classical problems with MT • unrealistic expectations • bad translations • difficulties in integrating MT in the work flow • the Ericsson case
Feasibility of machine translation • quality in relation to purpose • control of the source language • human-machine interaction • re-use of translations • evaluation
Quality • publishing quality • editing quality • browsing quality
Translation related tasks • translation • browsing • gisting • drafting • message dissemination • cross-language information searches • cross-language interchanges
MT as a cross-language communication tool MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001) Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, 18-22 September 2001 (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.pdf)
Control of the source language • spell checked and grammar checked SL • sublanguage • Domain • Text type • controlled language
Spell checking and grammar checking • If there are spelling errors or typos in the SL text, dictionary search will fail • If there are grammatical errors in the SL text, grammatical analysis will fail • Where and how should spell and grammar checking be accounted for: before the translation process or as part of it?
Controlled language • consistent authoring of source texts • reduction of ambiguity • full linguistic coverage • controlled vocabulary • full lexical coverage • controlled grammar • full grammatical coverage • controlled language checking • e.g. Scania Checker
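As a minimal sketch of the controlled-vocabulary side of controlled language checking, the Python fragment below flags words that fall outside an approved word list. The word list and the function name `unapproved_words` are assumptions made for illustration; this is not how the Scania Checker works, and it does not touch controlled grammar at all.

```python
import re

# Hypothetical approved vocabulary; a real controlled language defines its own.
APPROVED = {"fill", "the", "gearbox", "with", "oil", "check", "level"}

def unapproved_words(sentence, approved=APPROVED):
    """Return the words of `sentence` that are not in the approved vocabulary."""
    # Letters only: digits and punctuation are ignored by this toy check.
    words = re.findall(r"[^\W\d_]+", sentence.lower())
    return [w for w in words if w not in approved]

print(unapproved_words("Fill the gearbox with oil."))    # no violations
print(unapproved_words("Top up the gearbox with oil."))  # flags 'top' and 'up'
```

A check of this kind gives full lexical coverage only in the sense that every word that passes is guaranteed to be in the system dictionary.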
Ex. of controlled languages • Simplified English • KANT controlled English • Scania Swedish • Scania checker
Human intervention • before • language checking • during • e.g. ambiguity resolution • after • post-editing
Re-use of translations • translation memories • translation dictionaries incl. terminologies • lexicalistic translation • statistical machine translation • example-based translation
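The simplest form of re-use, a translation memory, can be sketched as a fuzzy lookup over stored sentence pairs. The example below is an illustration under stated assumptions: the memory holds two sentence pairs quoted elsewhere in this deck, similarity is the character-based ratio from Python's standard `difflib`, and the threshold of 0.8 is an arbitrary choice. Commercial translation memories use their own matching schemes.

```python
from difflib import SequenceMatcher

# Toy memory of (source, target) pairs taken from examples in this deck.
MEMORY = [
    ("Fyll på olja i växellådan.", "Fill gearbox with oil."),
    ("I oljefilterhållaren sitter en överströmningsventil.",
     "The oil filter retainer has an overflow valve."),
]

def tm_lookup(segment, memory=MEMORY, threshold=0.8):
    """Return (target, score) for the best match at or above threshold, else None."""
    best = None
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (target, score)
    return best

print(tm_lookup("Fyll på olja i växellådan."))  # exact match, score 1.0
print(tm_lookup("Hello"))                       # no match above threshold
```

Matches below 1.0 are only suggestions: the stored target still has to be edited to reflect the differences in the new source segment.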
Evaluation of MT • human • automatic • using a gold standard • coverage (recall) • quality (precision) • global similarity measures • merge of recall and precision • BLEU, NIST
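To make recall and precision against a gold standard concrete, the sketch below counts n-gram overlap between a system output and a single reference translation, and merges the two figures with the harmonic mean (F-measure). It is a simplified illustration, not BLEU or NIST: there is no brevity penalty, no combination over several n-gram orders, and tokenisation is plain whitespace splitting. The candidate is the Systran output from Ex. 2 in this deck; the reference sentence is an assumed gold standard written for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision_recall(candidate, reference, n=1):
    """Clipped n-gram precision and recall of candidate against one reference."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum((cand & ref).values())  # each n-gram counted at most as often as in both
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

candidate = "When the villages were contacted had they not even been exposed to flu."
reference = "When the villages were contacted they had not even been exposed to flu."  # assumed gold standard

print(ngram_precision_recall(candidate, reference, n=1))  # (1.0, 1.0)
print(ngram_precision_recall(candidate, reference, n=2))  # (0.75, 0.75)
```

The unigram scores are perfect because the two sentences contain the same words; only the bigram scores register the word order error, which is one reason global similarity measures look at longer n-grams.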
Why machine translation? • cheaper • faster • more consistent • when it succeeds …
What is MT proper? To be considered as MT, a system should provide • minimally correct morphology • minimal syntactic processing • minimal semantic processing • handle and produce full sentences Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories (http://nl.ijs.si/eamt00/proc/Hutchins.pdf)
Examples of MT products • Systran (http://babelfish.altavista.com/) • Comprendium (based on Metal) • ProMT (http://www.translate.ru/eng) • ESTeam See further: http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-4.pdf , http://www.foreignword.com/Technology/mt/mt.htm
Basic strategies • direct translation • rule-based translation • transfer • interlingua • example-based translation • statistical translation • hybrids
Direct translation • no complete intermediary sentence structure • translation proceeds in a number of steps, each step dedicated to a specific task • the most important component is the bilingual dictionary • typically general language • problems with • ambiguity • inflection • word order and other structural shifts
Simplistic approach • sentence splitting • tokenisation • handling capital letters • dictionary look-up and lexical substitution incl. some heuristics for handling ambiguities • copying unknown words, digits, signs of punctuation etc. • formal editing
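The steps of the simplistic approach can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the five-entry dictionary, the regular expressions used for sentence splitting and tokenisation, and the choice of one fixed equivalent per word (so the ambiguity heuristics mentioned above are left out entirely).

```python
import re

# Toy Swedish-English dictionary with one equivalent per word (illustrative only).
DICTIONARY = {"vad": "what", "kan": "can", "vi": "we", "lära": "learn", "av": "from"}

def split_sentences(text):
    """Sentence splitting: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenise(sentence):
    """Tokenisation: words and single punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def translate_sentence(sentence, dictionary=DICTIONARY):
    out = []
    for token in tokenise(sentence):
        # Capital letters: look up the lower-cased form.
        # Unknown words, digits and punctuation are copied unchanged.
        out.append(dictionary.get(token.lower(), token))
    text = " ".join(out)
    # Formal editing: no space before punctuation, capital at sentence start.
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text[:1].upper() + text[1:]

def translate(text):
    return " ".join(translate_sentence(s) for s in split_sentences(text))

print(translate("Vad kan vi lära av Arrawetestammen?"))
# What can we learn from Arrawetestammen?
```

The output looks acceptable only because the dictionary was built for this one sentence; with a general dictionary, words such as lära have several equivalents, and word-for-word substitution cannot choose between them or repair word order.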
Advanced classical approach (Tucker 1987) • Source text dictionary look-up and morphological analysis • Identification of homographs • Identification of compound nouns • Identification of noun and verb phrases • Processing of idioms
Advanced approach, cont. • processing of prepositions • subject-predicate identification • syntactic ambiguity identification • synthesis and morphological processing of target text • rearrangement of words and phrases in target text
Feasibility of the direct translation strategy Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a complete sentence structure?
Assignment 1: manual direct translation Sv. Ytterst handlar kampen för sysselsättning om att hålla samman Sverige. En. Ultimately, the fight for full employment concerns the cohesion of Swedish society. (from Statement of Government Policy 1996) • Define an algorithm and a dictionary (based on Norstedts) for simplistic translation of the example. • Present the model and the result.
Assignment 1, cont. • Improve the result stepwise in accordance with the advanced direct translation strategy. • Specify each step carefully and demonstrate its effect on the translation. • Evaluate and discuss the final result. • Translate the example using Systran (http://kwic.systran.fr/systran/svdemo) and discuss the differences in an evaluative way. • Report the assignment and upload it on the web (041001).
Current trends in direct translation • re-use of translations • translation memories of sentences and sub-sentence units such as words, phrases and larger units • lexicalistic translation • example-based translation • statistical translation Will re-use of translations overcome the problems with the direct translation approach that were discussed above? If so, how can they be handled?
Systran • System Translation • developed in the US by Peter Toma • first version 1969 (Ru-En) • EC bought the rights of Systran in 1976 • currently 18 language pairs • demo version sv-en in 2003 (http://kwic.systran.fr/systran/svdemo) • http://babelfish.altavista.com/
Systran, cont. • more than 1,600,000 dictionary units • 20 domain dictionaries • daily use by EC translators and administrators of the European institutions • originally a direct translation strategy • see H&S • today more of a transfer-based strategy
Ex. 1: fairly good translation / Systran sv-en • "Enskilda företagare som inte bildat bolag klassificeras hit." • "Individual entrepreneurs that have not formed companies are classified here." • The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right place.
Ex. 2: word order problem / Systran sv-en • "När byarna kontaktades hade de inte ens utsatts för influensa." • "When the villages were contacted had they not even been exposed to flu." • The system has not found the subject and the predicate and therefore produces the wrong word order.
Ex. 3: ambiguity problem / Systran sv-en • "Vad kan vi lära av Arrawetestammen?" • "What can we faith of the Arawete?" • The system does not find the connection between kan and lära and therefore does not see that lära is a verb.
Ex. 4: ambiguity problem / Systran sv-en • "Extrapoleringen går till så här." • "The extrapolation goes to so here." • The system does not know the particle verb gå till and therefore incorrectly translates word for word.
Systran Linguistic Resources • Dictionaries • POS Definitions • Inflection Tables • Decomposition Tables • Segmentation Dictionaries • Disambiguation Rules • Analysis Rules
Systran Processing Steps • Analysis • Lookup • Compound Decomposition • Disambiguation • Syntactic Analysis • Compound Expansion • Sentence Transfer • Initial Target Structure • Lookup • Default Transfer of Attributes • Structure Transformation
Systran Processing Steps (cont) • Sentence Synthesis • Structure Transformation • Inflection lookup • Surface Transformation
Motivations for transfer-based translation • lexical ambiguity • structural differences See further Ingo 91
Example 1 Sv. Fyll på olja i växellådan. En. Fill gearbox with oil. (from the Scania corpus) • fyll på → fill • obj → adv • adv → obj
Example 2 Sv. I oljefilterhållaren sitter en överströmningsventil. En. The oil filter retainer has an overflow valve. (from the Scania corpus) • sitter → has • adv → subj • subj → obj
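The two examples above can be read as transfer rules that replace the predicate and remap grammatical relations. The sketch below encodes that reading in Python. The flat dictionary representation, the citation forms used as keys, and the names `TRANSFER_RULES`, `LEXICON` and `transfer` are all assumptions made for illustration; this is not the formalism of Metal, Multra or any other system mentioned in this deck.

```python
# Source predicate -> (target predicate, mapping from source roles to target roles).
TRANSFER_RULES = {
    "fylla på": ("fill", {"obj": "adv", "adv": "obj"}),
    "sitta": ("have", {"adv": "subj", "subj": "obj"}),
}

# Toy transfer dictionary for the role fillers (citation forms).
LEXICON = {
    "olja": "oil",
    "växellåda": "gearbox",
    "oljefilterhållare": "oil filter retainer",
    "överströmningsventil": "overflow valve",
}

def transfer(source):
    """Map a source structure {pred, role: filler, ...} to a target structure."""
    target_pred, role_map = TRANSFER_RULES[source["pred"]]
    target = {"pred": target_pred}
    for role, filler in source.items():
        if role == "pred":
            continue
        # Roles without an explicit mapping keep their name.
        target[role_map.get(role, role)] = LEXICON.get(filler, filler)
    return target

print(transfer({"pred": "fylla på", "obj": "olja", "adv": "växellåda"}))
print(transfer({"pred": "sitta", "adv": "oljefilterhållare",
                "subj": "överströmningsventil"}))
```

Generation would still have to choose word order, articles and the preposition with, which is why transfer is followed by a separate synthesis step.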
Transfer-based translation • intermediary sentence structure • basic processes • analysis • transfer • generation (synthesis) • language modules • dictionary and grammar of SL • transfer dictionary and transfer rules • dictionary and grammar of TL
[Figure: translation strategies between SL and TL, with labels Direct translation, Transfer (Metal, Multra) and Interlingua]
Levels of intermediary structure • cf. J&M, Chapter 21 • word order
Metal • See H&S
MULTRA Multilingual Support for Translation and Writing • translation engine • transfer-based • shake-and-bake • modular • unification-based • preference machinery • trace-able
Analysis • chart parser (Lisp C) • procedural formalism • unification and other kinds of operations • sentence structure • feature structure • grammatical relations • surface order implicit via grammatical relations See further Sågvall Hein & Starbäck (99), Weijnitz (02), Dahllöf (89)
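Unification of feature structures, the core operation named above, can be illustrated with nested Python dictionaries. This is a deliberately reduced sketch: values are either atoms or nested dictionaries, there is no structure sharing (reentrancy) and no typing, and the feature names are assumptions. It is not the Multra parser's formalism.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if two atomic values clash.
    """
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val          # feature only in b: just add it
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)    # both complex: unify recursively
            if sub is None:
                return None
            result[key] = sub
        elif a_val != b_val:
            return None                  # atomic clash: unification fails
    return result

noun = {"cat": "N", "agr": {"num": "sg"}}
print(unify(noun, {"agr": {"num": "sg", "def": "indef"}}))  # compatible: features merged
print(unify(noun, {"agr": {"num": "pl"}}))                  # number clash: None
```

Because the result only ever adds information, the order in which constraints are unified does not matter, which is what makes the operation attractive for declarative grammar formalisms.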
Transfer • unification-based • declarative formalism • Multra transfer formalism (Beskow 93) • lexical and structural rules • rules are partially ordered • a more specific rule takes precedence over a less specific one • specificity in terms of number of transfer equations • all applicable rules are applied • written in Prolog
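One possible reading of the precedence principle above is sketched below: every rule whose conditions match the source structure is applied, and the results are ranked so that the rule with more conditions (standing in for transfer equations) comes first. The rule format, the feature `subj_type` and the function names are hypothetical, and sorting by a simple count turns the partial order into a total one, which is a simplification. The Multra transfer formalism itself is unification-based and written in Prolog.

```python
# Each rule: (conditions that must hold in the source structure, target lexeme).
# The feature "subj_type" is a hypothetical illustration.
RULES = [
    ({"lex": "sitta"}, "sit"),
    ({"lex": "sitta", "subj_type": "component"}, "have"),
]

def is_applicable(conditions, source):
    """A rule applies when every one of its conditions holds in the source."""
    return all(source.get(feature) == value for feature, value in conditions.items())

def ranked_translations(source, rules=RULES):
    """Apply all applicable rules; return targets with the most specific rule first."""
    matches = [(len(conds), target) for conds, target in rules
               if is_applicable(conds, source)]
    matches.sort(key=lambda m: m[0], reverse=True)  # more conditions = more specific
    return [target for _, target in matches]

print(ranked_translations({"lex": "sitta", "subj_type": "component"}))  # ['have', 'sit']
print(ranked_translations({"lex": "sitta", "subj_type": "person"}))     # ['sit']
```

Keeping the less specific result as a lower-ranked alternative, rather than discarding it, leaves a fallback when generation from the preferred result fails.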
Generation • syntactic generation • Multra syntactic generation formalism (Beskow 97a) • PATR-like style • unification • concatenation • typed features • morphological generation (Beskow 97b) • lexical insertion rules • morphological realisation and phonological finish in Prolog • written in Prolog