Translating Subtitles using Machine Translation: Practices, Problems, Methodology

Elsa Sklavounou, Ph.D., Linguist, Co-funded Projects Technical Coordinator, SYSTRAN


Presentation Transcript


  1. Translating Subtitles using Machine Translation: Practices, Problems, Methodology. Elsa Sklavounou, Ph.D., Linguist, Co-funded Projects Technical Coordinator, SYSTRAN

  2. SYSTRAN MT Customization Methodology: Overview
     A customization project involves three customization levels that provide incrementally higher translation quality:
     • Basic Terminology
     • Complex Terminology
     • Linguistic Rules

  3. SYSTRAN MT Customization Methodology: Overview
     • Basic Terminology: the first step entails the creation of a User Dictionary that covers most of the noun terminology in the corpus, along with various simple adjective and verb terms.
     • Complex Terminology: the second level concerns the coding of complex terminological entries, such as complex verbs with their complements (subject, object…) and their translations.
     • Linguistic Rules: the third level involves language-specific code modifications in the SYSTRAN linguistic modules.

  4. SYSTRAN MT Customization Methodology: Level 1 & Level 2
     Customization levels 1 and 2 focus on implementing specialized terminology from the corpus in the systems. Level 1 and 2 tasks include:
     • Extraction of simple and complex terms
     • Translation of simple and complex terms
     • Coding of simple and complex terms
     • Review of simple and complex terms

  5. SYSTRAN MT Customization Methodology: Level 1 & Level 2
     Step 1: Corpus installation and analysis
     • Prerequisite 1: a formatted corpus
     Step 2: Term extraction
     • Simple terms (nouns and noun expressions)
     • Complex terms (verb patterns)
     • DNT (Do Not Translate) integration

  6. SYSTRAN MT Customization Methodology: Level 3
     Customization level 3 focuses on the implementation of linguistic rules uniquely adapted to language-specific syntactic and semantic issues found in translations taken from the corpus. Level 3 tasks include:
     • Detailed linguistic evaluations and the development of a comprehensive customization plan
     • Implementation of customized rules
     • Regression tests
     • Correction of linguistic translation errors
     • Acceptance testing before release

  7. SYSTRAN MT Customization Methodology: Quality Levels
     Estimate of the quality levels that may be achieved at each customization level.

  8. SYSTRAN MT Customization Methodology: Software Tools
     The coding of simple and complex terms and the related dictionary maintenance are managed by the SYSTRAN Linguistics Platform, which integrates the following two tools, both required to complete customization levels 1 and 2.

  9. SYSTRAN MT Customization Methodology: Software Tools
     SYSTRAN Dictionary Manager: the SYSTRAN Dictionary Manager (SDM) enables translators to build and manage multilingual dictionaries. SDM includes preparation steps for dictionary coding tasks, an online dictionary lookup (via an HTML interface), and a compiler for runtime machine translation dictionaries. It is composed of three main components: a database, an HTML query form (dictionary lookup, reports, logs, import and export), and a Windows client (interactive coding tool).

  10. SYSTRAN Customization Methodology: Software Tools
      The SYSTRAN Review Manager (SRM) is a productivity tool used for the review, quality assessment, and maintenance of linguistic resources used in combination with a SYSTRAN system.

  11. SYSTRAN Customization Methodology: Prerequisite 1: a formatted grammatical corpus
      • Grammar Writing Rules: Using Articles; Avoiding Speech Ambiguity; Using Enumeration; Ensuring Subject-Verb Agreement; Using Prepositions; Using Infinitives at the Beginning of Sentences; Using Imperatives; Observing Punctuation Rules; Using Main Clauses; Using Subordinate Clauses; Using Relative Clauses; Avoiding Multiple Stacking; Using Compound Words; Using Capitalization; Using Spelling Variations
      • Lexical Ambiguities: Disambiguation of Product Names and Menus; Avoiding Lexical Ambiguities; Using Compounds
      • Format and Typographical Issues: Segmentation

  12. SYSTRAN Customization Methodology for MUSA
      • Corpus generated fully automatically by two processes: Speech Recognition (KU Leuven) and Automatic Sentence Compression (CNTS)
      • First priority: subtitle constraints
      • Second priority: the least ambiguous content possible
      • Lesson learned: no prerequisite

  13. SYSTRAN MT Customization Methodology: Upgraded Software Tools (Client Tools v5)

  14. SYSTRAN Translation Project Manager: Terminology Review, Not Found Words Extraction
      Reviewing Terminology and Sentences: the Terminology Review tab in the Review window lets you identify expressions such as Not Found Words or Terminology extracted by the software.

  15. SYSTRAN Translation Project Manager: Terminology Review, Not Found Words Extraction (Examples)
      • Source: these parents know measles can be dangerous, but they don't want their child to have MMR, the triple vaccine which protects them from measles, mumps and rubella.
      • Raw MT: ces parents savent la rougeole peut être dangereuse, mais ils ne veulent pas que leur enfant a MMR, le vaccin triple qui les protège contre la rougeole, les oreillons et la rubéole.
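The idea behind Not Found Words extraction can be illustrated with a minimal Python sketch. This is not SYSTRAN's implementation: the `KNOWN_WORDS` lexicon and the `find_not_found_words` helper are hypothetical names invented for this example, and the tokenizer is deliberately simple. It only shows how tokens missing from a dictionary (here `MMR`, which the raw MT above leaves untranslated) might be flagged for terminology review.

```python
import re

# Hypothetical toy lexicon standing in for the MT system's dictionary.
KNOWN_WORDS = {
    "these", "parents", "know", "measles", "can", "be", "dangerous",
    "but", "they", "don't", "want", "their", "child", "to", "have",
    "the", "triple", "vaccine", "which", "protects", "them", "from",
    "mumps", "and", "rubella",
}


def find_not_found_words(sentence, lexicon):
    """Return tokens absent from the lexicon, in order of first appearance."""
    # Words made of letters, optionally followed by an apostrophe part ("don't").
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
    seen = set()
    missing = []
    for token in tokens:
        key = token.lower()
        if key not in lexicon and key not in seen:
            seen.add(key)
            missing.append(token)
    return missing


source = ("these parents know measles can be dangerous, but they don't want "
          "their child to have MMR, the triple vaccine which protects them "
          "from measles, mumps and rubella.")
print(find_not_found_words(source, KNOWN_WORDS))  # ['MMR']
```

A reviewer would then decide whether each reported token needs a dictionary entry or should be marked Do Not Translate.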

  16. SYSTRAN Translation Project Manager: Alternative Meanings
      • Alternative Meanings shows alternative translations based on different meanings of a source word or expression.
      • The Alternative Meanings tab in the Review window shows alternative meanings for expressions in SYSTRAN or User Dictionaries.

  17. SYSTRAN Translation Project Manager: Alternative Meanings (Examples)
      • Source: they'd rather pay for single vaccines at 60 pounds a shot, even though the government insists MMR is safe.
      • Raw MT: ils payeraient plutôt les vaccins uniques à 60 livres un coup de feu, quoique le gouvernement exige que MMR est sûr.
      • Customized MT: ils payeraient plutôt les vaccins uniques à 60 livres une injection, quoique le gouvernement exige que MMR est sûr.

  18. SYSTRAN Dictionary Manager: User Dictionaries (UDs)
      User Dictionaries (UDs) let you increase the quality of source language analyses, which in turn improves the translation output for all associated target languages. UDs can be used for a number of functions, including:
      • Automatically translating words that are Not Found Words in the SYSTRAN dictionary.
      • Overriding the target-language meaning of a word or expression in the SYSTRAN dictionaries, a capability that lets you customize translation output to fit specific needs.
      • Ensuring that an expression is always treated as a unit by SYSTRAN analysis programs.

  19. SYSTRAN Dictionary Manager: User Dictionaries (UDs), Metrics
      Entries by type of dictionary and language pair (ENFR, ENEL):
      • Do Not Translate Words: 3532 entries (enxx)
      • Proper Nouns: 1495 entries (enfr); 1495 entries (enel)
      • MUSA Terminology: 1443 entries (enfr); 5228 entries (enel)

  20. SYSTRAN Dictionary Manager: User Dictionaries (UDs), Examples
      • Source: Andrew Wakefield ignited the debate over MMR by announcing the findings of research into a group with autism and bowel disease.
      • Raw MT: Andrew Wakefield a enflammé la discussion au-dessus de MMR en annonçant les résultats de la recherche dans un groupe avec la maladie d'autism et d'entrailles.
      • Customized MT: Andrew Wakefield a enflammé la discussion au-dessus de MMR en annonçant les résultats de la recherche dans un groupe avec autisme et maladie d'entrailles.
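The User Dictionary functions listed in slide 18 (overriding a default meaning, and treating a multiword expression as a unit) can be sketched roughly as a greedy longest-match lookup in which user entries take precedence over base entries for the same phrase. This is an illustrative assumption about how such a lookup could work, not a description of SYSTRAN's engine; `BASE_DICTIONARY`, `USER_DICTIONARY`, and `lookup_terms` are hypothetical names, and the sample translations are taken from the slide examples.

```python
# Hypothetical dictionaries; translations come from the slide examples.
BASE_DICTIONARY = {"shot": "coup de feu", "vaccine": "vaccin"}
USER_DICTIONARY = {
    "shot": "injection",                # overrides the base meaning
    "autism": "autisme",                # covers a word the base lacks
    "triple vaccine": "vaccin triple",  # multiword entry treated as a unit
}


def lookup_terms(tokens, user_dict, base_dict, max_len=3):
    """Greedy longest-match lookup; user entries win over base entries."""
    results = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            target = user_dict.get(phrase, base_dict.get(phrase))
            if target is not None:
                results.append((phrase, target))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return results


tokens = "the triple vaccine costs 60 pounds a shot".split()
print(lookup_terms(tokens, USER_DICTIONARY, BASE_DICTIONARY))
# [('triple vaccine', 'vaccin triple'), ('shot', 'injection')]
```

Because longer phrases are tried first, "triple vaccine" is matched as one unit rather than falling back to the single-word entry for "vaccine".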

  21. SYSTRAN Translation Project Manager: Source Analysis, Interactive Disambiguation
      The Source Analysis tab in the Review window shows how the software handled source ambiguities and allows you to override the software selections.

  22. SYSTRAN Translation Project Manager: Source Analysis, Interactive Disambiguation (Examples)
      • Source (ID 523): At first we thought it was parts of the building but it was people, literally people falling all around us.
      • Raw MT: D'abord nous avons pensé que ce faisait partie du bâtiment mais c'était les gens, peuplent littéralement la chute tout autour de nous.
      • Customized MT: D’abord nous avons pensé que c’etait des fragments du bâtiment, mais c’était des gens, littéralement des gens qui tombaient autour de nous.

  23. SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs)
      There are two types of Normalization Dictionaries (NDs): source normalization and target normalization.
      • Source normalization normalizes the source document before translation.
      • Target normalization adapts translation output to user needs in terms of terminology consistency. It can also provide a way to replace expressions chosen by the software’s translation engine with user-defined expressions.

  24. SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs), Examples
      • Source: we did n't know she had measles but we do. I mean I ca n't help...
      • Raw MT: nous avons fait le n't savons qu'il a eu la rougeole mais nous faisons. Je veux dire l'aide de n't d'I ca…
      • Customized MT via source normalization: nous n'avons pas su qu'il a eu la rougeole mais nous faisons.Je veux dire que je ne peux pas aider
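Source normalization of this kind can be approximated with a few rewrite rules applied before translation. The sketch below is an assumption-laden illustration, not the format of a SYSTRAN Normalization Dictionary: `SOURCE_RULES` and `normalize_source` are hypothetical names, and the rules cover only the split contractions ("did n't", "ca n't") seen in the example above.

```python
import re

# Hypothetical normalization rules for contractions split by the
# upstream tokenizer. Specific rules come before the generic one.
SOURCE_RULES = [
    (re.compile(r"\bca n't\b"), "cannot"),
    (re.compile(r"\bwo n't\b"), "will not"),
    (re.compile(r"\b(\w+) n't\b"), r"\1 not"),
]


def normalize_source(text, rules=SOURCE_RULES):
    """Apply each rewrite rule in order to the source text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


raw = "we did n't know she had measles but we do. I mean I ca n't help..."
print(normalize_source(raw))
# we did not know she had measles but we do. I mean I cannot help...
```

Rule order matters: the irregular forms are rewritten first so the generic rule does not turn "ca n't" into "ca not".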

  25. SYSTRAN Translation Project Manager: Sentence Review for Translation Memory Construction
      The Sentence Review tab in the Review window compares sentences in the source and target. You can then check the sentences you want to send to User Dictionaries, where you can work with them further in order to post-edit them and construct Translation Memories.

  26. SYSTRAN Dictionary Manager: Translation Memories (TMs)
      • Translation Memory (TM): a set of translated and validated sentences that can be integrated into the translation process. Translation Memories (TMs) are databases of aligned pre-translated sentences.
      • Unlike Dictionaries, TM entries can be formatted (for example, italic or bold) and are used by the translation engine to perform matches on full sentences in the source document. TMs are not usually created manually, but are built using SYSTRAN’s Translation Project Export or from TMX files.

  27. SYSTRAN Dictionary Manager: Translation Memories (TMs), Examples
      • Source (ID 370): Now people kind of started panicking and said we've got to leave no matter what.
      • Raw MT: Maintenant sorte de personnes de panique commencée et dite nous avons pour laisser n'importe ce que.
      • Customized MT: Les gens maintenant avaient l’air de paniquer disant qu’ils devaient à tout prix partir.
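Full-sentence matching against a Translation Memory, with a fallback to machine translation, might look like the following minimal sketch. The `TranslationMemory` class, the `translate` function, and the `fake_mt_engine` placeholder are hypothetical; the matching policy (case-insensitive, whitespace-normalized exact match) is an assumption made for illustration and may differ from what SYSTRAN does.

```python
def _key(sentence):
    """Normalize case and whitespace so trivially different inputs match."""
    return " ".join(sentence.lower().split())


class TranslationMemory:
    """Toy in-memory store of validated source/target sentence pairs."""

    def __init__(self):
        self._entries = {}

    def add(self, source, target):
        self._entries[_key(source)] = target

    def lookup(self, source):
        """Return the stored translation, or None when there is no match."""
        return self._entries.get(_key(source))


def translate(sentence, tm, mt_engine):
    """Prefer a full-sentence TM match; otherwise call the MT engine."""
    hit = tm.lookup(sentence)
    return hit if hit is not None else mt_engine(sentence)


def fake_mt_engine(sentence):
    # Placeholder standing in for a real MT call.
    return "[MT] " + sentence


tm = TranslationMemory()
tm.add(
    "Now people kind of started panicking and said we've got to leave no matter what.",
    "Les gens maintenant avaient l’air de paniquer disant qu’ils devaient à tout prix partir.",
)

print(translate("now people kind of started panicking and said we've got to leave no matter what.",
                tm, fake_mt_engine))
print(translate("This sentence is not in the memory.", tm, fake_mt_engine))
```

The first call returns the validated sentence from the memory; the second falls through to the placeholder engine.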

  28. SYSTRAN Dictionary Manager: Translation Memories (TMs)
      Translation Memory Import/Export: existing translation memory files in the standard TMX (Translation Memory eXchange) format can be imported and exported via SYSTRAN Dictionary Manager.
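For readers unfamiliar with TMX, the sketch below reads aligned sentence pairs from a small TMX document using Python's standard `xml.etree.ElementTree` module. It is independent of SYSTRAN Dictionary Manager: `read_tmx_pairs` and `SAMPLE_TMX` are hypothetical names, the sample reuses the sentence pair from slide 27, and language codes are matched exactly (so `en-US` would not match `en`), which is a simplification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Minimal TMX 1.4 document written for this example.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Now people kind of started panicking and said we've got to leave no matter what.</seg></tuv>
      <tuv xml:lang="fr"><seg>Les gens maintenant avaient l’air de paniquer disant qu’ils devaient à tout prix partir.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for the given language codes."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[lang] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


for src, tgt in read_tmx_pairs(SAMPLE_TMX, "en", "fr"):
    print(src, "=>", tgt)
```

Each translation unit (`tu`) holds one variant (`tuv`) per language, so the same file can feed several language pairs.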
