Rules Based Machine Translation
Fred Hollowood, Consultant
RBMT and CL
Sample Agenda
1. Introduction
2. Rules Based Machine Translation
3. Post-Editing
4. Quality Measurement
5. Controlled Language
Introduction
Technology Initiative
• The Aim
  • Bring rapid, cost-effective translation to Symantec’s product and service divisions
  • Connect Symantec’s CMS to translation technologies
  • Provide metrics on the reduction of translation costs and time to market
• The Approach
  • Structure source content so that it accommodates MT
  • Use a language checker to monitor source grammar
  • Promote terminology as a key process and deliverable
  • Be proactive rather than reactive
  • Define measures to monitor and drive productivity (GTM, METEOR, BLEU)
  • Work with post-editors to ensure a win-win
Rules Based Machine Translation
Figure: Flowchart of Rule-Based Machine Translation (RBMT). SL Text -> Analysis (SL Lexicon & Grammars) -> Transfer (SL->TL Lexical & Structural Rules) -> Synthesis (TL Lexicon & Grammars) -> TL Text. (SL = source language, TL = target language.)
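The three RBMT stages can be illustrated with a deliberately tiny English-to-French sketch. Everything in it is hypothetical: the lexicons, the single structural rule (adjective-noun reordering) and the agreement logic are toy stand-ins, not how Systran or any production engine works. It only shows which resource each stage consults: the SL lexicon during analysis, SL->TL rules during transfer, and TL grammar information during synthesis.

```python
# Toy English -> French RBMT sketch. All lexicons and rules are hypothetical.

# SL lexicon (used in analysis): English word -> part of speech
SL_LEXICON = {"the": "DET", "red": "ADJ", "green": "ADJ", "car": "NOUN", "book": "NOUN"}

# SL->TL lexical rules (used in transfer): English word -> French lemma
LEXICAL_RULES = {"the": "le", "red": "rouge", "green": "vert", "car": "voiture", "book": "livre"}

# TL lexicon (used in synthesis): French noun -> grammatical gender
TL_GENDER = {"voiture": "f", "livre": "m"}


def analyse(text):
    """Analysis: tag each source token using the SL lexicon."""
    return [(tok, SL_LEXICON[tok]) for tok in text.lower().split()]


def transfer(tagged):
    """Transfer: apply lexical rules, then the structural rule ADJ NOUN -> NOUN ADJ."""
    out = [(LEXICAL_RULES[tok], pos) for tok, pos in tagged]
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def synthesise(tagged):
    """Synthesis: make the determiner and adjectives agree with the noun's gender."""
    # Assumes a single noun phrase: the first noun decides the gender.
    gender = next((TL_GENDER[w] for w, pos in tagged if pos == "NOUN"), "m")
    words = []
    for word, pos in tagged:
        if gender == "f" and pos == "DET":
            word = "la"
        elif gender == "f" and pos == "ADJ" and not word.endswith("e"):
            word += "e"
        words.append(word)
    return " ".join(words)


def translate(text):
    return synthesise(transfer(analyse(text)))


print(translate("the green car"))  # la voiture verte
print(translate("the red book"))   # le livre rouge
```

A real engine replaces each dictionary with large, curated lexicons and grammars, which is why dictionary and terminology maintenance matters so much in an RBMT workflow.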
MT Process Overview
Figure: MT process phases: Controlled Language Authoring -> Automated Pre-processing -> Translation System (Systran Engine, with User Dictionary and Normalisation Dictionary) -> Automated Post-processing -> Human Post-Editing. Legend: Remote Human Activity, Text Processing, Systran Engine, System Control.
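The automated phases around the engine can be sketched as plain text-processing functions. This is a minimal sketch under several assumptions: the normalisation entries are invented examples, the path pattern is an illustrative way to protect non-translatable text, and `mt_engine` is a hypothetical stand-in that returns its input unchanged (it does not call Systran).

```python
import re

# Hypothetical normalisation dictionary: variant spelling -> canonical source term
NORMALISATION = {"e-mail": "email", "anti-virus": "antivirus", "log on": "log in"}

# Illustrative pattern for non-translatable text: Windows paths without spaces
PROTECT = re.compile(r"[A-Za-z]:\\\S+")


def pre_process(text):
    """Protect non-translatable spans with placeholders, then normalise terms."""
    protected = PROTECT.findall(text)
    for n, span in enumerate(protected):
        text = text.replace(span, f"__PH{n}__", 1)
    for variant, canonical in NORMALISATION.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text)
    return text, protected


def mt_engine(text):
    """Hypothetical stand-in for the MT engine; returns its input unchanged."""
    return text


def post_process(text, protected):
    """Restore protected spans and collapse repeated whitespace."""
    for n, span in enumerate(protected):
        text = text.replace(f"__PH{n}__", span)
    return re.sub(r"\s{2,}", " ", text).strip()


def run_pipeline(source):
    text, protected = pre_process(source)
    return post_process(mt_engine(text), protected)


print(run_pipeline(r"Save the e-mail log to C:\Temp\scan.log before you log on."))
# Save the email log to C:\Temp\scan.log before you log in.
```

Protection runs before normalisation so that dictionary entries cannot alter text inside a path. With a real engine, the placeholders would also need to be registered so that they pass through translation untouched.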
Post-Editing
• The vendor relationship is fundamentally the same as with a traditional translation vendor
• Higher daily throughput is expected for post-edited content (6,000-8,000 vs. 2,500 words per day)
• Style requirements have been critically reviewed in the light of PE
  • For example, stylistic inconsistencies are acceptable in post-edited content
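The throughput figures translate directly into schedule savings. The short calculation below assumes that the slide's figures are words per day and uses a hypothetical 100,000-word project.

```python
# Assumption: the throughput figures on the slide are words per day.
TRADITIONAL_RATE = 2_500
PE_RATES = (6_000, 8_000)


def days_needed(words, words_per_day):
    """Translator-days required for a given volume at a given daily rate."""
    return words / words_per_day


volume = 100_000  # hypothetical project size in words
print(f"traditional: {days_needed(volume, TRADITIONAL_RATE):.1f} days")
for rate in PE_RATES:
    speedup = rate / TRADITIONAL_RATE
    print(f"post-editing at {rate}/day: {days_needed(volume, rate):.1f} days ({speedup:.1f}x)")
```

On these assumptions, the same volume takes 40.0 translator-days traditionally and 16.7 to 12.5 days when post-edited, which is 2.4 to 3.2 times the traditional rate.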
Measurement
Metrics based on Comprehensibility
Quality by Human Inspection
GTM Scoring
Figure: GTM scoring, with panels "From the machine" and "From the post-editor"
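GTM (General Text Matcher) treats a candidate and a reference as bags of matched words and reports an F-measure of precision (matches / candidate length) and recall (matches / reference length). The sketch below is a simplified version that assumes the exponent e = 1, in which case the maximum matching reduces to clipped word overlap. Full GTM with e > 1 also rewards longer contiguous runs, which this sketch omits. Comparing raw machine output with the post-edited text gives a rough measure of how much the post-editor changed. The example sentences are invented.

```python
from collections import Counter


def gtm_e1(candidate, reference):
    """Simplified GTM F-measure with exponent e = 1 (clipped word overlap)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    # Counter intersection keeps min(count in cand, count in ref) per word
    matched = sum((Counter(cand) & Counter(ref)).values())
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


machine = "the scan is finished with success"    # hypothetical MT output
post_edited = "the scan finished successfully"   # hypothetical post-edited text
print(round(gtm_e1(machine, post_edited), 2))     # 0.6
```

Three of the six machine words survive post-editing (precision 0.5), covering three of the four post-edited words (recall 0.75), which gives an F-measure of 0.6.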
Quality Metrics by Language
Project Scores by Language
• French: 73%
• Spanish: 68%
• Italian: 59%
• German: 57%
Example Style Rules
• Avoid using a colon after a drive letter
• Avoid “he”, “she”, “he/she”, and “s/he”
• Use numerals for all measurements over 10
• Use the serial comma
• Do not use more than two adverbs or adjectives in a series
• Keep the subject and verb close to each other early in a sentence
• Avoid meaningless openers
• Avoid progressive tense when describing product use
• Do not use future tense when describing product use
• Make positive statements that tell users what to do or what they need to know
• Use sentence-style capitalization for bulleted lists
• Use a colon at the end of a sentence to introduce a bulleted list
• Punctuate imperative sentences in bulleted lists
• Use number × number
• Use a hyphen in a unit
• Repeat the unit of measure
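Some of these style rules are mechanical enough to approximate with regular expressions. The sketch below covers three of them. The patterns are hypothetical approximations written for illustration; they are not the rules of any particular language checker, and a real checker would handle far more context.

```python
import re

# Hypothetical regex approximations of three of the style rules above
STYLE_RULES = {
    "Avoid a colon after a drive letter": re.compile(r"\bdrive [A-Z]:"),
    'Avoid "he", "she", "he/she", and "s/he"': re.compile(
        r"\b(?:s/he|he/she|he|she)\b", re.IGNORECASE
    ),
    "Do not use future tense for product use": re.compile(r"\bwill\b", re.IGNORECASE),
}


def check_style(sentence):
    """Return the names of the rules the sentence appears to violate."""
    return [rule for rule, pattern in STYLE_RULES.items() if pattern.search(sentence)]


for s in ["The program will scan drive C: when he logs in.", "Click Scan to scan drive C."]:
    print(s, "->", check_style(s))
```

Word boundaries keep the pronoun pattern from firing inside words such as "the" or "shell", but regex rules like these still produce false positives (for example, "will" used as a noun), which is one reason the deck notes that rules can be context-sensitive.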
CL Rules Based on CDG
• Avoid using the passive voice
• Do not use more than 25 words in a sentence (the original recommendation was 20)
• Use relative pronouns
• Use complementizers (“that”)
• Avoid unnecessary words (such as “basic” or “just”)
• Do not use “this” or “that” when they are not followed by a noun
• Place all non-translatable text (such as programming code snippets) on its own line
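A language checker can enforce the more mechanical CL rules automatically. This minimal sketch covers the 25-word limit and the unnecessary-word rule, and uses a crude heuristic for bare “this”/“that”: it flags them only when a verb-like word from a small, hypothetical list or punctuation follows. Without a part-of-speech tagger, it cannot truly test for a following noun.

```python
import re

MAX_WORDS = 25                   # limit from the slide (originally 20)
UNNECESSARY = {"basic", "just"}  # examples from the slide; a real list would be longer
# Heuristic only: "this"/"that" directly followed by a verb-like word or punctuation
BARE_DEMONSTRATIVE = re.compile(
    r"\b(?:this|that)\b(?=\s+(?:is|was|will|can|means)\b|\s*[.,;:!?])", re.IGNORECASE
)


def check_cl(sentence):
    """Return a list of CL issues found in a single sentence."""
    issues = []
    words = re.findall(r"[A-Za-z0-9'-]+", sentence)
    if len(words) > MAX_WORDS:
        issues.append(f"sentence has {len(words)} words (max {MAX_WORDS})")
    for w in words:
        if w.lower() in UNNECESSARY:
            issues.append(f"unnecessary word: {w}")
    if BARE_DEMONSTRATIVE.search(sentence):
        issues.append("'this'/'that' not followed by a noun")
    return issues


print(check_cl("This is just a basic check."))
print(check_cl("Make sure that the service is running."))  # []
```

The lookahead deliberately ignores “that” when an article follows it, so the complementizer use recommended above (“make sure that the...”) is not flagged.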
CL Rules for MT
• Do not use slashes to list lexical items
• Do not write the full name of each operating system
• Avoid -ing words
• Use a noun at the start of a subordinate clause
• Repeat the head noun in ambiguous coordinated structures
• Use a hyphen to indicate the first part of a compound
• Use articles in specific contexts (for disambiguation)
• Keep both parts of a two-part verb together
• Use “could” with “if”
• Avoid parenthetical expressions in the middle of a sentence
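The slash rule and the -ing rule can be flagged with simple patterns. This is a sketch under stated assumptions: the exception list is a small, hypothetical sample of words that merely end in “ing” and would need to be much longer in practice.

```python
import re

SLASH_LIST = re.compile(r"\b[A-Za-z]+/[A-Za-z]+\b")
ING_WORD = re.compile(r"\b[A-Za-z]+ing\b", re.IGNORECASE)
# Words ending in "ing" that are not -ing forms (illustrative, incomplete)
ING_EXCEPTIONS = {"thing", "string", "ring", "king", "bring",
                  "nothing", "something", "anything", "everything"}


def check_mt_rules(sentence):
    """Flag slash-separated items and -ing words for the author to rewrite."""
    issues = []
    for m in SLASH_LIST.finditer(sentence):
        issues.append(f"slash-separated items: {m.group()}")
    for m in ING_WORD.finditer(sentence):
        if m.group().lower() not in ING_EXCEPTIONS:
            issues.append(f"-ing word: {m.group()}")
    return issues


print(check_mt_rules("Select the file/folder before scanning."))
# ['slash-separated items: file/folder', '-ing word: scanning']
```

The checker only flags; choosing the rewrite (for example, “file or folder” and “before you scan”) stays with the author, in line with controlled-language authoring as the first phase of the process.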
Examples of CL Violation
• Rule: Keep both parts of a two-part verb together
• Violation: This document gives directions to turn email scanning on or off.
  • German MT: Dieses Dokument gibt Richtungen zum Umdrehung E-Mail-Prüfung an oder weg. (roughly: “to the rotation email check on or away”)
  • French MT: Ce document donne des directions à l'analyse du courrier électronique de tour en fonction ou hors fonction. (roughly: “to the analysis of the email of turn on or off”)
• Rewritten: This document gives directions to turn on or turn off email scanning.
  • German MT: Dieses Dokument gibt Richtungen, E-Mail-Prüfung zu aktivieren oder zu deaktivieren.
  • French MT: Ce document donne des directions pour activer ou désactiver l'analyse du courrier électronique.
  • Both now read as “to activate or deactivate email scanning”.
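A rewrite like the one above can also be automated as a pre-processing step. The sketch below handles only “turn/switch X on or off” and “X off or on”. The rule list is hypothetical and case-sensitive, and a production rule set would need many more verbs and much better object detection than a run of word characters.

```python
import re

# Hypothetical pre-processing rewrites for split two-part verbs
TWO_PART_VERB_REWRITES = [
    (re.compile(r"\b(turn|switch) ([\w\- ]+?) on or off\b"), r"\1 on or \1 off \2"),
    (re.compile(r"\b(turn|switch) ([\w\- ]+?) off or on\b"), r"\1 off or \1 on \2"),
]


def join_two_part_verbs(sentence):
    """Move the particles next to the verb so both parts stay together."""
    for pattern, replacement in TWO_PART_VERB_REWRITES:
        sentence = pattern.sub(replacement, sentence)
    return sentence


print(join_two_part_verbs("This document gives directions to turn email scanning on or off."))
# This document gives directions to turn on or turn off email scanning.
```

Applied to the violation above, the rule reproduces the manually rewritten sentence that both engines translated correctly.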
Lessons Learned
• Implement CL strictly when there is:
  • New content
  • Little leverage
  • Time
• Rules can be context-sensitive
  • Different results depending on the client application
  • May not always flag tag problems
• Language-specific rules are probably best implemented as:
  • A pre-processing step
  • Normalization dictionaries
• CL + MT is not sufficient on its own
  • Terminology work is needed to update dictionaries
  • PE is needed when a specific quality standard is required
Fred Hollowood
fred@fredhollowoodconsulting.com