Machine Translation • MÖSG vt 2004 • Anna Sågvall Hein
Can computers translate? • Not a simple yes or no • depends on the text • the purpose of the translation • the required quality @Anna Sågvall Hein, MÖSG 2004
Classical problems with MT • unrealistic expectations • bad translations • difficulties in integrating MT in the work flow • the Ericsson case
What is MT proper? • To be considered as MT, a system should provide • minimally correct morphology • minimal syntactic processing • minimal semantic processing • the ability to handle and produce full sentences • Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories (http://ourworld.compuserve.com/homepages/WJHutchins/IAMTcert.htm)
Basic translation strategies • direct translation • transfer-based translation • statistical translation • combined strategies
Direct translation, 1 • no intermediary sentence structure • the most important language component is a translation dictionary • translation proceeds mostly word by word, or phrase by phrase • translation problems are handled more or less case by case by means of specific rules
Direct translation, 2 • quality • typically browsing quality • depends on • the quality of the translation dictionary • the coverage of the translation rules • editing quality may be achieved • problems with • ambiguity • inflection • word order • structural differences
Advanced classical approach (Tucker 1987) • source text dictionary lookups and morphological analysis • identification of homographs • identification of compounds • identification of noun and verb phrases • processing of idioms
Advanced approach, cont. • processing of prepositions • subject-predicate identification • syntactic ambiguity identification • synthesis and morphological processing of target text • rearrangement of words and phrases in target text
Feasibility of the direct translation strategy • Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a sentence grammar and an intermediary structure?
SYSTRAN • SYStem TRANslation • developed in the US by Peter Toma • first version 1969 (Ru-En) • EC bought the rights to Systran in 1976 • Systran SA, France, is the current owner of the rights to Systran • currently 18 language pairs, excl. Swedish • Swedish-->English is being introduced, starting in June 2004 • (http://babelfish.altavista.com/)
Systran, cont. • more than 1,600,000 dictionary units • 20 domain dictionaries • daily use by EC translators, administrators of the European institutions • originally a direct translation strategy • see H&S • today more of a transfer-based strategy
Ex. 1: fairly good translation / Systran sv-en • "Enskilda företagare som inte bildat bolag klassificeras hit." • "Individual entrepreneurs that have not formed companies are classified here." • The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right place.
Ex. 2: word order problem / Systran sv-en • "När byarna kontaktades hade de inte ens utsatts för influensa." • "When the villages were contacted had they not even been exposed to flu." • The system has not identified the subject and predicate and therefore produces the wrong word order.
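The word order error in Ex. 2 is what a purely word-by-word substitution produces, because Swedish inverts subject and verb after a fronted clause. The sketch below is only an illustration with a hypothetical dictionary (`DICTIONARY` and `translate_word_by_word` are made-up names); it does not describe how Systran is implemented.

```python
# Hypothetical word-for-word dictionary; the entries are illustrative only.
DICTIONARY = {
    "när": "when",
    "byarna": "the villages",
    "kontaktades": "were contacted",
    "hade": "had",
    "de": "they",
    "inte": "not",
    "ens": "even",
    "utsatts": "been exposed",
    "för": "to",
    "influensa": "flu",
}


def translate_word_by_word(sentence: str) -> str:
    """Substitute each source word in place; the source word order is kept."""
    words = sentence.rstrip(".").split()
    # Unknown words are copied unchanged.
    target = [DICTIONARY.get(word.lower(), word) for word in words]
    result = " ".join(target) + "."
    return result[0].upper() + result[1:]


print(translate_word_by_word(
    "När byarna kontaktades hade de inte ens utsatts för influensa."))
```

Because nothing in the lookup knows which words form the subject and the predicate, the Swedish verb-second order ("hade de") survives into the English output as "had they".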
Ex. 3: ambiguity problem / Systran sv-en • "Vad kan vi lära av Arrawetestammen?" • "What can we faith of the Arawete?" • The system does not find the connection between kan and lära and therefore does not see that lära is a verb.
Ex. 4: ambiguity problem / Systran sv-en • "Extrapoleringen går till så här." • "The extrapolation goes to so here." • The system does not know the particle verb gå till and therefore incorrectly translates word for word.
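Within a direct strategy, one way to handle a case like Ex. 4 is to look up multi-word entries before single words. The sketch below assumes hypothetical phrase entries and English renderings ("is done", "like this") chosen for illustration; `PHRASES`, `WORDS` and `translate` are made-up names.

```python
# Hypothetical multi-word entries, tried before the single-word dictionary.
PHRASES = {
    ("går", "till"): "is done",
    ("så", "här"): "like this",
}
# Hypothetical single-word entries (these reproduce the word-for-word output).
WORDS = {
    "extrapoleringen": "the extrapolation",
    "går": "goes",
    "till": "to",
    "så": "so",
    "här": "here",
}
MAX_LEN = max(len(key) for key in PHRASES)


def translate(tokens, use_phrases=True):
    """Greedy longest-match lookup: phrases first, then single words."""
    out, i = [], 0
    while i < len(tokens):
        longest = min(MAX_LEN, len(tokens) - i) if use_phrases else 1
        for n in range(longest, 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No phrase matched here: fall back to the single-word entry.
            out.append(WORDS.get(tokens[i].lower(), tokens[i]))
            i += 1
    return " ".join(out)


tokens = ["Extrapoleringen", "går", "till", "så", "här"]
print(translate(tokens, use_phrases=False))
print(translate(tokens))
```

The first call mimics the word-for-word behaviour; the second shows how a dictionary entry for the particle verb changes the result.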
Motivations for transfer-based translation • lexical ambiguity • structural differences • See further Ingo 91 (6), Wikholm (89)
Transfer-based translation,1 • intermediary sentence structure • provides a basis for the systematic handling of grammatical problems and lexical choices • basic processes • analysis • transfer • generation (synthesis)
Transfer-based translation, 2 • knowledge-intensive • language modules • dictionary and grammar of source language • transfer dictionary and transfer rules • dictionary and grammar of target language
Multra • transfer-based translation engine • high quality • focus on restricted domains • developed at Uppsala University
Multra formalisms • intermediary structure • feature structure • grammatical function & constituency • analysis grammar • procedural • transfer • unification based (Beskow 93) • synthesis • PATR-like style (Beskow 93)
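To make "feature structure" and "unification based" concrete, here is a minimal sketch of unification over nested dictionaries. It is a generic textbook-style illustration, not Multra's formalism: it has no variables or shared (re-entrant) values, and the feature names are assumptions.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Feature structures are nested dicts; atomic values are strings.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values somewhere below
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None


noun = {"cat": "N", "agr": {"num": "sg"}}
print(unify(noun, {"agr": {"num": "sg", "def": "indef"}}))
print(unify(noun, {"agr": {"num": "pl"}}))
```

The first call merges compatible information; the second fails because singular and plural clash.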
Simplistic approach • sentence splitting • tokenisation • handling capital letters • dictionary look-up and lexical substitution • copying unknown words, digits, signs of punctuation etc. • formal editing
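The steps of the simplistic approach can be sketched as a small pipeline. Everything here is an assumption made for illustration: the tiny `LEXICON`, the regular expressions used for splitting and tokenising, and the reading of "formal editing" as fixing spacing and capitalisation.

```python
import re

# Hypothetical lexicon with a handful of entries.
LEXICON = {"detta": "this", "filter": "filter", "ska": "must", "bytas": "be renewed"}


def split_sentences(text):
    """Sentence splitting: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Tokenisation: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def translate_sentence(sentence):
    # Capital letters: look words up in lower case.
    # Unknown words, digits and punctuation are copied unchanged.
    out = [LEXICON.get(tok.lower(), tok) for tok in tokenise(sentence)]
    text = " ".join(out)
    # Formal editing: no space before punctuation, capital first letter.
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text[:1].upper() + text[1:]


def translate_text(text):
    return " ".join(translate_sentence(s) for s in split_sentences(text))


print(translate_text("Detta filter ska bytas. Detta filter ska bytas 2 gånger."))
```

The unknown word "gånger" and the digit are passed through untouched, which is exactly the copying step in the list.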
Ex. 1: Multra • Sv. I oljefilterhållaren sitter en överströmningsventil. • En. The oil filter retainer has an overflow valve. • (from the Scania corpus) • sitter has • adv subj • subj obj
Ex. 2 • Sv. Fyll på olja i växellådan. • En. Fill gearbox with oil. • (from the Scania corpus) • fyll på fill • obj adv • adv obj
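The function mapping in Ex. 2 (source object becomes target adverbial and vice versa) can be sketched as a transfer step over a flat structure. The structure, the lexicon entries and the hard-coded generation pattern below are all assumptions for illustration and do not reproduce Multra's rules.

```python
# Hypothetical analysis result for "Fyll på olja i växellådan."
SOURCE = {"pred": "fylla_på", "mood": "imp", "obj": "olja", "adv": "växellåda"}

# Hypothetical transfer dictionary (lexical transfer).
TRANSFER_LEXICON = {"fylla_på": "fill", "olja": "oil", "växellåda": "gearbox"}

# Hypothetical structural transfer: source function -> target function,
# stated per source predicate.
FUNCTION_MAP = {"fylla_på": {"obj": "adv", "adv": "obj"}}


def transfer(src):
    """Map predicate and arguments to the target side, switching functions."""
    mapping = FUNCTION_MAP.get(src["pred"], {})
    tgt = {"pred": TRANSFER_LEXICON[src["pred"]], "mood": src["mood"]}
    for func in ("obj", "adv"):
        if func in src:
            tgt[mapping.get(func, func)] = TRANSFER_LEXICON[src[func]]
    return tgt


def generate(tgt):
    """Toy generation for this one frame: verb + object + 'with' + adverbial."""
    return " ".join([tgt["pred"].capitalize(), tgt["obj"], "with", tgt["adv"]]) + "."


target = transfer(SOURCE)
print(target)
print(generate(target))
```

Keeping the function switch in a table separate from the lexicon mirrors the split between transfer dictionary and transfer rules.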
Ex. 3: Multra • Detta filter ska bytas med jämna mellanrum. • This filter must be renewed at regular intervals. • Lexical choices in the context • ska – must • byta – renew • med – at • jämna – regular • mellanrum – interval
Ex. 4: Multra • Beskrivningen gäller för automatväxellådor med beteckning ZF 4/HP500, 590 och 600. • The description applies to automatic gearboxes with the designations ZF 4/5HP500, 590 and 600. • gäller – applies to • beteckning – the designations
Feasibility of machine translation • Re-use of translations • Quality in relation to purpose • Sublanguage • Spell checked and grammar checked SL • Controlled language • Human machine interaction • Evaluation data and criteria
Re-use of previous translations • translation memories • translation dictionaries • statistical machine translation
Re-use techniques,1 • sentence alignment • linking source and target sentences pairwise • success rate close to 100 % • translation memories
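Once sentences are aligned pairwise, a translation memory is essentially a lookup over the stored pairs. The sketch below assumes a character-based similarity from Python's `difflib` and an arbitrary threshold of 0.8; real translation memory tools use their own matching schemes. The two stored pairs are taken from the Scania examples in these slides.

```python
from difflib import SequenceMatcher

# Aligned sentence pairs (source, target), here from the Scania examples.
MEMORY = [
    ("Fyll på olja i växellådan.", "Fill gearbox with oil."),
    ("Detta filter ska bytas med jämna mellanrum.",
     "This filter must be renewed at regular intervals."),
]


def lookup(sentence, threshold=0.8):
    """Return (translation, score) for the best match, or (None, score)."""
    best_score, best_target = 0.0, None
    for source, target in MEMORY:
        score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    if best_score >= threshold:
        return best_target, best_score
    return None, best_score


print(lookup("Fyll på olja i växellådan."))
print(lookup("Kontrollera bromsarna."))
```

An exact match scores 1.0; a sentence with no close counterpart in the memory falls below the threshold and returns no translation.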
Re-use techniques, 2 • word alignment • linking sub-sentence segments, typically, source and target words and phrases pairwise • large-scale processing • success rate close to 80 % • translation dictionaries • statistical machine translation
A word alignment example • Jag tar mittplatsen, som jag inte tycker om. • I take the middle seat, which I dislike. • jag – I • tar – take • mittplatsen – the middle seat • som – which • jag – I • inte tycker om – dislike • (from Tiedemann 2003)
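A simple way to see how word links can be derived from aligned sentences is a co-occurrence measure. The sketch below uses the Dice coefficient on a made-up toy corpus of one-to-one word pairs; it is a stand-in for illustration, not the method behind the example above, and it cannot produce multi-word links such as "inte tycker om – dislike".

```python
from collections import Counter
from itertools import product

# Hypothetical sentence-aligned toy corpus.
CORPUS = [
    ("jag tar olja", "i take oil"),
    ("jag byter olja", "i change oil"),
    ("jag byter filter", "i change filter"),
    ("du tar filter", "you take filter"),
]


def dice_dictionary(corpus):
    """Link each source word to the target word with the highest Dice score."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        pair_count.update(product(s_words, t_words))
    best = {}
    for (s, t), together in pair_count.items():
        # Dice = 2 * joint frequency / (source frequency + target frequency)
        score = 2 * together / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


print(dice_dictionary(CORPUS))
```

The result is a raw translation dictionary of the kind that the re-use techniques aim to extract.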
Statistical machine translation • large scale word alignment • raw translation dictionary • direct translation using the dictionary • no translation rules • smoothing the translation by means of a language model • statistically based • decoding algorithm crucial • Arabic – English • Hindi – English
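The interplay of translation dictionary, language model and decoder can be sketched with a toy example built around the ambiguous word lära. All probabilities are invented, the decoder translates strictly word by word, and it searches exhaustively, which only works for tiny inputs; real decoders rely on heuristic search.

```python
import math
from itertools import product

# Hypothetical translation probabilities P(target | source).
TRANSLATION = {
    "vi": {"we": 1.0},
    "kan": {"can": 1.0},
    "lära": {"doctrine": 0.6, "learn": 0.4},
}
# Hypothetical bigram language model P(word | previous word).
BIGRAM = {
    ("<s>", "we"): 0.5,
    ("we", "can"): 0.4,
    ("can", "learn"): 0.3,
    ("can", "doctrine"): 0.001,
}
UNSEEN = 1e-6  # crude floor for bigrams missing from the toy model


def score(source, target):
    """Log-probability: translation model times bigram language model."""
    logp, prev = 0.0, "<s>"
    for s, t in zip(source, target):
        logp += math.log(TRANSLATION[s][t])
        logp += math.log(BIGRAM.get((prev, t), UNSEEN))
        prev = t
    return logp


def decode(source):
    """Exhaustive decoding: try every combination of word translations."""
    options = [list(TRANSLATION[s]) for s in source]
    return max(product(*options), key=lambda cand: score(source, cand))


print(decode(["vi", "kan", "lära"]))
```

The dictionary alone prefers "doctrine" (0.6 against 0.4), but the language model makes "can learn" far more likely than "can doctrine", so the verb reading wins.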
Quality • publishing quality • high quality translation, good enough for publishing, typically, after inspection and minor editing • browsing quality • low quality translation, comprehensible, typically, not good enough for editing and publishing, may contain grammatical errors, errors in word order, and wrong words
Translation purposes • translation • publishing quality • browsing • browsing quality • gisting • browsing quality • drafting • publishing/browsing quality? • cross-language information retrieval • browsing quality
MT as a cross-language communication tool • MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001) • Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, 18-22 September 2001 • (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.htm)
Restrictions on the input language • sublanguage • text type • domain • controlled language • spell checked • grammar checked
Typically • general language – browsing quality • restricted language – high quality
Spell checking and grammar checking • If there are spelling errors or typos in the SL, dictionary search will fail • If there are grammatical errors in the SL, grammatical analysis will fail • Where and how should spell and grammar checking be accounted for? Before or during the process?
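One option for the "during the process" alternative is to fall back on approximate matching when an exact dictionary search fails. The sketch below assumes a tiny hypothetical lexicon and uses `difflib.get_close_matches` with an arbitrary cutoff; it only illustrates the idea and says nothing about how a production system should do it.

```python
from difflib import get_close_matches

# Hypothetical source-language lexicon with translations.
LEXICON = {"olja": "oil", "filter": "filter", "växellåda": "gearbox"}


def lookup(word, cutoff=0.8):
    """Return (translation, matched entry), tolerating small typos."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key], key
    # Exact search failed: try the closest lexicon entry above the cutoff.
    candidates = get_close_matches(key, list(LEXICON), n=1, cutoff=cutoff)
    if candidates:
        return LEXICON[candidates[0]], candidates[0]
    return None, None


print(lookup("växellåda"))  # exact hit
print(lookup("växelåda"))   # typo: one "l" missing
print(lookup("xyz"))        # no plausible entry
```

Returning the matched entry as well makes it possible to show the correction to a human, which ties in with checking before translation rather than silently repairing the input.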
Controlled language • controlled vocabulary • full lexical coverage, e.g. Scania Swedish • controlled grammar • full grammatical coverage • language checker • e.g. Scania Checker
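The vocabulary side of a language checker can be sketched as a test of every word against an approved word list. The list below is hypothetical and the function covers vocabulary only, not grammar; it is not a description of Scania Checker.

```python
import re

# Hypothetical controlled vocabulary (approved word forms).
APPROVED = {"fyll", "på", "olja", "i", "växellådan"}


def check(sentence):
    """Return the words that fall outside the approved vocabulary."""
    words = re.findall(r"[^\W\d_]+", sentence.lower())  # letters only
    return [w for w in words if w not in APPROVED]


print(check("Fyll på olja i växellådan."))
print(check("Fyll på bensin i växellådan."))
```

An empty list means full lexical coverage for that sentence; any flagged word would be sent back to the writer before translation.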
Human intervention • before • language checking • during • e.g. ambiguity resolution • after • post-editing
Evaluation of MT • coverage (recall) • quality (precision)
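One possible reading of the two measures, assumed here for illustration, is that coverage is the share of source sentences for which the system produces a translation, and quality is the share of produced translations that a human judges acceptable. The figures in the example are invented.

```python
def evaluate(results):
    """Compute (coverage, quality) from per-sentence judgements.

    results: list of (produced, acceptable) booleans, one per source sentence.
    """
    total = len(results)
    produced = sum(1 for was_produced, _ in results if was_produced)
    acceptable = sum(1 for was_produced, ok in results if was_produced and ok)
    coverage = produced / total if total else 0.0      # recall-like
    quality = acceptable / produced if produced else 0.0  # precision-like
    return coverage, quality


# Invented example: 10 sentences, 8 translated, 6 of those acceptable.
judgements = [(True, True)] * 6 + [(True, False)] * 2 + [(False, False)] * 2
print(evaluate(judgements))
```

Reporting the two numbers separately shows the trade-off: a system restricted to a controlled language may refuse more input but get more of its output right.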
Current trends in direct translation • re-use of translations • translation memories of sentences and sub-sentence units such as words, phrases and larger units • example-based translation • statistical translation • Will re-use of translations overcome the problems with the direct translation approach that were discussed above? • If so, how can the problems be handled?
Why machine translation? • cheaper • faster • more consistent • when it succeeds ...
Assignment: Hable Con Ella (en-sv) • Make a general quality assessment of the translation. • Suggest a possible use of a translation of this kind. • Identify the steps that were taken in the translation. • Specify the translation errors that were made and discuss them. • Suggest improvements in the framework of the direct translation strategy. • Justify them. • Formalise them in a framework of your own choice. • Discuss their general adequacy in the translation of Swedish to English.