Machine Translation MÖSG vt 2004


  1. Machine Translation MÖSG vt 2004 • Anna Sågvall Hein

  2. Can computers translate? • Not a simple yes or no • depends on the text • the purpose of the translation • the required quality @Anna Sågvall Hein, MÖSG 2004

  3. Classical problems with MT • unrealistic expectations • bad translations • difficulties in integrating MT in the work flow • the Ericsson case

  4. What is MT proper? • To be considered as MT, a system should provide • minimally correct morphology • minimal syntactic processing • minimal semantic processing • handle and produce full sentences • Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories (http://ourworld.compuserve.com/homepages/WJHutchins/IAMTcert.htm)

  5. Basic translation strategies • direct translation • transfer-based translation • statistical translation • combined strategies

  6. Direct translation, 1 • no intermediary sentence structure • the most important language component is a translation dictionary • translation proceeds mostly word by word, or phrase by phrase • translation problems are handled more or less case by case by means of specific rules
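
As a rough illustration of dictionary-driven, word-by-word or phrase-by-phrase translation, the sketch below looks up the longest matching dictionary phrase at each position and copies unknown words. The dictionary entries and the function name `translate_direct` are assumptions made for this example, not part of any system described in the slides. Because there is no sentence structure, the Swedish word order is carried over unchanged.

```python
# Toy Swedish-English dictionary. All entries are illustrative assumptions.
# Keys are tuples of source words so that multi-word phrases can be listed.
TOY_DICTIONARY = {
    ("när",): "when",
    ("byarna",): "the villages",
    ("kontaktades",): "were contacted",
    ("hade",): "had",
    ("de",): "they",
    ("inte",): "not",
    ("ens",): "even",
    ("utsatts", "för"): "been exposed to",
    ("influensa",): "flu",
}
MAX_PHRASE_LEN = max(len(key) for key in TOY_DICTIONARY)


def translate_direct(sentence, dictionary=TOY_DICTIONARY):
    """Translate left to right, preferring the longest dictionary phrase."""
    words = sentence.lower().strip(" .?!").split()
    output = []
    i = 0
    while i < len(words):
        for length in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + length])
            if phrase in dictionary:
                output.append(dictionary[phrase])
                i += length
                break
        else:
            # Unknown word: copy it unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate_direct(
    "När byarna kontaktades hade de inte ens utsatts för influensa."))
```

The output keeps the verb before the subject ("had they"), which is the kind of word order problem that a case-by-case rule would have to repair.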

  7. Direct translation, 2 • quality: typically browsing quality; depends on the quality of the translation dictionary and the coverage of the translation rules; editing quality may be achieved • problems with: ambiguity, inflection, word order, structural differences

  8. Advanced classical approach (Tucker 1987) • source text dictionary lookups and morphological analysis • identification of homographs • identification of compounds • identification of noun and verb phrases • processing of idioms

  9. Advanced approach, cont. • processing of prepositions • subject-predicate identification • syntactic ambiguity identification • synthesis and morphological processing of target text • rearrangement of words and phrases in target text

  10. Feasibility of the direct translation strategy • Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a sentence grammar and an intermediary structure?

  11. SYSTRAN • SYStem TRANslation • developed in the US by Peter Toma • first version 1969 (Ru-En) • EC bought the rights to Systran in 1976 • Systran SA, France, is the current owner of the rights to Systran • currently 18 language pairs, excl. Swedish • Swedish-->English is being introduced, starting in June 2004 • (http://babelfish.altavista.com/)

  12. Systran, cont. • more than 1,600,000 dictionary units • 20 domain dictionaries • daily use by EC translators, administrators of the European institutions • originally a direct translation strategy • see H&S • today more of a transfer-based strategy

  13. Ex. 1: fairly good translation / Systran sv-en • "Enskilda företagare som inte bildat bolag klassificeras hit." • "Individual entrepreneurs that have not formed companies are classified here." • The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right place.

  14. Ex. 2: word order problem / Systran sv-en • "När byarna kontaktades hade de inte ens utsatts för influensa." • "When the villages were contacted had they not even been exposed to flu." • The system has not found the subject and the predicate and therefore produces the wrong word order.

  15. Ex. 3: ambiguity problem / Systran sv-en • "Vad kan vi lära av Arrawetestammen?" • "What can we faith of the Arawete?" • The system does not find the connection between kan and lära and therefore does not see that lära is a verb.

  16. Ex. 4: ambiguity problem / Systran sv-en • "Extrapoleringen går till så här." • "The extrapolation goes to so here." • The system does not know the particle verb gå till and therefore translates incorrectly, word for word.

  17. Motivations for transfer-based translation • lexical ambiguity • structural differences • See further Ingo 91 (6), Wikholm (89)

  18. Transfer-based translation, 1 • intermediary sentence structure • provides a basis for the systematic handling of grammatical problems and lexical choices • basic processes: analysis, transfer, generation (synthesis)

  19. Transfer-based translation, 2 • knowledge-intensive • language modules: dictionary and grammar of source language; transfer dictionary and transfer rules; dictionary and grammar of target language

  20. Multra • transfer-based translation engine • high quality • focus on restricted domains • developed at Uppsala University

  22. Multra formalisms • intermediary structure: feature structure; grammatical function & constituency • analysis grammar: procedural • transfer: unification based (Beskow 93) • synthesis: PATR-like style (Beskow 93)

  23. Simplistic approach • sentence splitting • tokenisation • handling capital letters • dictionary look-up and lexical substitution • copying unknown words, digits, signs of punctuation etc. • formal editing
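
The steps listed on this slide can be sketched as a small pipeline. This is only one possible reading of the slide: the lexicon entries, the regular expressions and all function names are assumptions chosen for illustration.

```python
import re

# Toy Swedish-English lexicon; the entries are illustrative assumptions.
TOY_LEXICON = {"detta": "this", "filter": "filter", "ska": "must",
               "bytas": "be renewed", "olja": "oil"}


def split_sentences(text):
    """Split after sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Separate words (including digits) from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def substitute(token, lexicon):
    """Look up the lower-cased token; copy unknown words, digits, punctuation."""
    target = lexicon.get(token.lower())
    if target is None:
        return token
    # Carry an initial capital letter over to the target side.
    return target.capitalize() if token[0].isupper() else target


def format_output(tokens):
    """Formal editing: join tokens and remove spaces before punctuation."""
    return re.sub(r"\s+([.,;:!?])", r"\1", " ".join(tokens))


def translate_simplistic(text, lexicon=TOY_LEXICON):
    return " ".join(
        format_output([substitute(t, lexicon) for t in tokenise(s)])
        for s in split_sentences(text)
    )


print(translate_simplistic("Detta filter ska bytas. Olja 15?"))
```

Nothing here inspects grammar, so every problem listed for direct translation (ambiguity, inflection, word order, structural differences) remains unsolved.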

  24. Ex. 1: Multra • Sv. I oljefilterhållaren sitter en överströmningsventil. • En. The oil filter retainer has an overflow valve. • (from the Scania corpus) • sitter → has • adv → subj • subj → obj

  25. Ex. 2 • Sv. Fyll på olja i växellådan. • En. Fill gearbox with oil. • (from the Scania corpus) • fyll på → fill • obj → adv • adv → obj
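
The function changes in this example can be pictured as a transfer step between two small feature structures. The sketch below is not Multra's formalism: the attribute names, the hand-written source analysis and the rule tables are all assumptions, and the analysis phase itself is skipped.

```python
# Hand-written source analysis of "Fyll på olja i växellådan."
# Attribute names and values are illustrative assumptions.
SOURCE_ANALYSIS = {
    "mood": "imperative",
    "pred": "fylla_på",
    "obj": {"lemma": "olja"},
    "adv": {"prep": "i", "lemma": "växellåda"},
}

# Transfer dictionary plus one structural rule keyed on the source predicate.
LEXICAL_TRANSFER = {"fylla_på": "fill", "olja": "oil", "växellåda": "gearbox"}
FUNCTION_TRANSFER = {"fylla_på": {"obj": "adv", "adv": "obj"}}
TARGET_PREP = {"fill": "with"}  # preposition governed by the target verb


def transfer(source):
    """Map the source structure to a target structure."""
    pred = LEXICAL_TRANSFER[source["pred"]]
    target = {"mood": source["mood"], "pred": pred}
    swaps = FUNCTION_TRANSFER.get(source["pred"], {})
    for function in ("obj", "adv"):
        if function in source:
            new_function = swaps.get(function, function)
            lemma = LEXICAL_TRANSFER[source[function]["lemma"]]
            target[new_function] = {"lemma": lemma}
    if "adv" in target:
        target["adv"]["prep"] = TARGET_PREP[pred]
    return target


def generate(target):
    """Linearise an imperative target structure as verb, object, adverbial."""
    words = [target["pred"], target["obj"]["lemma"]]
    if "adv" in target:
        words += [target["adv"]["prep"], target["adv"]["lemma"]]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


print(generate(transfer(SOURCE_ANALYSIS)))
```

Keeping the swap in a rule table rather than in the generator means the same generator can serve verbs that need no function change.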

  26. Ex. 3: Multra • Detta filter ska bytas med jämna mellanrum. • This filter must be renewed at regular intervals. • Lexical choices in the context • ska – must • byta – renew • med – at • jämna – regular • mellanrum – interval

  27. Ex. 4: Multra • Beskrivningen gäller för automatväxellådor med beteckning ZF 4/HP500, 590 och 600.  • The description applies to automatic gearboxes with the designations ZF 4/5HP500, 590 and 600. • gäller – applies to • beteckning – the designations

  28. Feasibility of machine translation • Re-use of translations • Quality in relation to purpose • Sublanguage • Spell checked and grammar checked SL • Controlled language • Human-machine interaction • Evaluation data and criteria

  29. Re-use of previous translations • translation memories • translation dictionaries • statistical machine translation

  30. Re-use techniques, 1 • sentence alignment: linking source and target sentences pairwise; success rate close to 100% • translation memories
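
Once sentences are aligned, a translation memory is essentially a lookup table of sentence pairs. The sketch below adds fuzzy matching with Python's standard `difflib`. The stored pairs are taken from the Scania examples in these slides, while the function name `lookup` and the 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Aligned sentence pairs (source, target) stored for re-use.
TRANSLATION_MEMORY = [
    ("Fyll på olja i växellådan.", "Fill gearbox with oil."),
    ("Detta filter ska bytas med jämna mellanrum.",
     "This filter must be renewed at regular intervals."),
]


def lookup(sentence, memory=TRANSLATION_MEMORY, threshold=0.8):
    """Return (stored source, stored target, score) for the best match.

    Returns None when no stored source sentence is similar enough.
    """
    best = None
    for source, target in memory:
        score = SequenceMatcher(None, sentence, source).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


print(lookup("Detta filter skall bytas med jämna mellanrum."))
print(lookup("Hej."))
```

A fuzzy match only proposes the stored translation; a translator still has to adapt it to the new sentence.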

  31. Re-use techniques, 2 • word alignment: linking sub-sentence segments, typically source and target words and phrases, pairwise; large-scale processing; success rate close to 80% • translation dictionaries • statistical machine translation

  32. A word alignment example • Jag tar mittplatsen, som jag inte tycker om. • I take the middle seat, which I dislike. • jag – I • tar – take • mittplatsen – the middle seat • som – which • jag – I • inte tycker om – dislike • (from Tiedemann 2003)
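
One simple way to store such links is as pairs of token index spans, from which raw dictionary entries can be read off and counted. The data layout and the function name `extract_entries` are assumptions for illustration; the links themselves are the ones listed on the slide.

```python
from collections import Counter

SOURCE = ["Jag", "tar", "mittplatsen", ",", "som", "jag",
          "inte", "tycker", "om", "."]
TARGET = ["I", "take", "the", "middle", "seat", ",", "which", "I",
          "dislike", "."]

# Each link pairs source token indices with target token indices.
LINKS = [
    ((0,), (0,)),        # jag - I
    ((1,), (1,)),        # tar - take
    ((2,), (2, 3, 4)),   # mittplatsen - the middle seat
    ((4,), (6,)),        # som - which
    ((5,), (7,)),        # jag - I
    ((6, 7, 8), (8,)),   # inte tycker om - dislike
]


def extract_entries(source, target, links):
    """Turn alignment links into (source phrase, target phrase) pairs."""
    entries = []
    for src_idx, tgt_idx in links:
        src = " ".join(source[i] for i in src_idx).lower()
        tgt = " ".join(target[j] for j in tgt_idx)
        entries.append((src, tgt))
    return entries


# Counting pairs over many aligned sentences would give a raw dictionary.
raw_dictionary = Counter(extract_entries(SOURCE, TARGET, LINKS))
print(raw_dictionary.most_common(1))
```

The example also shows why links must allow one-to-many and many-to-one spans: "mittplatsen" covers three English tokens and "inte tycker om" collapses into one.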

  33. Statistical machine translation • large scale word alignment • raw translation dictionary • direct translation using the dictionary • no translation rules • smoothing the translation by means of a language model • statistically based • decoding algorithm crucial • Arabic-English • Hindi-English
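
The interplay of a raw translation dictionary and a language model can be illustrated with a toy decoder. Everything below is an assumption made for the example: the probabilities are invented, the source phrase "vi kan lära" is a simplified variant of the lära ambiguity shown earlier, and the exhaustive search stands in for a real decoding algorithm, since its cost grows exponentially with sentence length.

```python
import math
from itertools import product

# Toy model parameters; every probability here is an invented illustration.
TRANSLATION_PROBS = {
    "vi": {"we": 1.0},
    "kan": {"can": 1.0},
    "lära": {"faith": 0.6, "learn": 0.4},  # noun reading vs. verb reading
}
BIGRAM_PROBS = {
    ("<s>", "we"): 0.5,
    ("we", "can"): 0.4,
    ("can", "learn"): 0.2,
    ("can", "faith"): 0.001,
}
UNSEEN_BIGRAM = 1e-6  # crude floor for bigrams missing from the table


def score(source_words, target_words):
    """Log probability: translation dictionary plus bigram language model."""
    logp = 0.0
    previous = "<s>"
    for s, t in zip(source_words, target_words):
        logp += math.log(TRANSLATION_PROBS[s][t])
        logp += math.log(BIGRAM_PROBS.get((previous, t), UNSEEN_BIGRAM))
        previous = t
    return logp


def decode(source_words):
    """Exhaustive word-for-word search over all dictionary candidates."""
    options = [list(TRANSLATION_PROBS[s]) for s in source_words]
    return max(product(*options),
               key=lambda candidate: score(source_words, candidate))


print(" ".join(decode(["vi", "kan", "lära"])))
```

The dictionary alone prefers "faith" for lära, but the language model makes "can learn" far more likely than "can faith", so the verb reading wins without any translation rule.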

  34. Quality • publishing quality: high-quality translation, good enough for publishing, typically after inspection and minor editing • browsing quality: low-quality translation, comprehensible, typically not good enough for editing and publishing; may contain grammatical errors, errors in word order, and wrong words

  35. Translation purposes • translation: publishing quality • browsing: browsing quality • gisting: browsing quality • drafting: publishing/browsing quality? • cross-language information retrieval: browsing quality

  36. MT as a cross-language communication tool • MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001) • Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, 18-22 September 2001 • (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.htm)

  37. Restrictions on the input language • sublanguage • text type • domain • controlled language • spell checked • grammar checked

  38. Typically • general language – browsing quality • restricted language – high quality

  39. Spell checking and grammar checking • If there are spelling errors or typos in the SL, dictionary search will fail • If there are grammatical errors in the SL, grammatical analysis will fail • Where and how should spell and grammar checking be accounted for? Before or during the process?
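
One option for handling misspellings during the process is a fallback to the closest dictionary entry when exact lookup fails. The sketch below uses `difflib.get_close_matches` from the Python standard library. The lexicon, the function name `lookup_with_correction` and the 0.8 cutoff are assumptions, and this shows only one possible answer to the question on the slide.

```python
from difflib import get_close_matches

# Toy source-language lexicon; entries are illustrative assumptions.
SOURCE_LEXICON = {"filter": "filter", "olja": "oil",
                  "växellådan": "the gearbox"}


def lookup_with_correction(word, lexicon=SOURCE_LEXICON, cutoff=0.8):
    """Return (translation, matched entry), or None if nothing is close."""
    if word in lexicon:
        return lexicon[word], word
    candidates = get_close_matches(word, list(lexicon), n=1, cutoff=cutoff)
    if candidates:
        return lexicon[candidates[0]], candidates[0]
    return None


print(lookup_with_correction("växelådan"))  # typo: one "l" is missing
print(lookup_with_correction("xyz"))
```

Silent correction can also pick the wrong entry, which is an argument for checking the source text before translation instead.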

  40. Controlled language • controlled vocabulary: full lexical coverage, e.g. Scania Swedish • controlled grammar: full grammatical coverage • language checker: e.g. Scania Checker

  41. Human intervention • before: language checking • during: e.g. ambiguity resolution • after: post-editing

  42. Evaluation of MT • coverage (recall) • quality (precision)
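
Under one possible reading of this slide, coverage is the share of source segments for which the system produces a translation at all, and quality is the share of the produced translations that a human judges acceptable. The sketch below computes both; the record layout, the sample judgements and the function names are assumptions for illustration.

```python
# One record per source segment: the system output (None if the system
# produced nothing) and a human judgement of that output.
RESULTS = [
    {"output": "Fill gearbox with oil.", "acceptable": True},
    {"output": "The extrapolation goes to so here.", "acceptable": False},
    {"output": "This filter must be renewed at regular intervals.",
     "acceptable": True},
    {"output": None, "acceptable": False},
]


def coverage(results):
    """Recall-like measure: share of segments that received a translation."""
    produced = [r for r in results if r["output"] is not None]
    return len(produced) / len(results)


def quality(results):
    """Precision-like measure: share of produced translations accepted."""
    produced = [r for r in results if r["output"] is not None]
    if not produced:
        return 0.0
    return sum(1 for r in produced if r["acceptable"]) / len(produced)


print(coverage(RESULTS), quality(RESULTS))
```

Reporting the two figures separately makes the trade-off visible: a system restricted to a controlled language may translate fewer segments but get more of them right.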

  43. Current trends in direct translation • re-use of translations • translation memories of sentences and sub-sentence units such as words, phrases and larger units • example-based translation • statistical translation • Will re-use of translations overcome the problems with the direct translation approach that were discussed above? • If so, how can the problems be handled?

  44. Why machine translation? • cheaper • faster • more consistent • when it succeeds ...

  45. Assignment: Hable Con Ella (en-sv) • Make a general quality assessment of the translation. • Suggest a possible use of a translation of this kind. • Identify the steps that were taken in the translation. • Specify the translation errors that were made and discuss them. • Suggest improvements in the framework of the direct translation strategy. • Justify them. • Formalise them in a framework of your own choice. • Discuss their general adequacy in the translation of Swedish to English.
