Machine Translation: The Translator's Choice. Heidi Düchting, Sylke Krämer, Johann Roturier
Outline • Background • Challenges • Solutions • Benefits • Next steps • Conclusions
Commercial Imperatives • Effective • Time-critical documents in volume • Efficient • Translation process automation • Combining translation technologies • Workflow • Translation memory (TM), machine translation (MT), and post-editing (PE) tools • Control • Loose writing guidelines vs. Controlled Language (CL) rules • Improved machine translatability
Commercial Systems • Combine technologies • TM with previously machine-translated and post-edited segments for look-up • TM systems with an MT component • Rule-based and example-based • Pre-translate phase • Towards improved post-editing efficiency? • Not available in all systems • MT systems with a TM component • 100% match look-up
Challenges • Setting a threshold for TM matches • 100% matches only • Suitable when the objective is to provide MT output for gisting (no post-editing) • Suitable when the MT system is fully customized and a CL environment is in place (no post-editing?) • Quick PE • New sentences in which only one character differs from a stored segment are still sent to the MT engine • W32.Beagle.AB is a mass-mailing worm that neither propagates via network shares nor deletes files • W32.Beagle.AC is a mass-mailing worm that neither propagates via network shares nor deletes files
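The effect of a 100%-only threshold on the Beagle example can be sketched as follows. This is a minimal illustration that uses Python's `difflib` character similarity as a stand-in match score. Trados uses its own fuzzy-match algorithm, so the scores will differ. The function names, the 0.95 alternative threshold and the placeholder translation are all hypothetical.

```python
from difflib import SequenceMatcher


def match_score(new: str, stored: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means an exact (100%) match.
    Stand-in for a TM fuzzy-match score, not the Trados algorithm."""
    return SequenceMatcher(None, new, stored).ratio()


def route_segment(new: str, memory: dict[str, str], threshold: float = 1.0):
    """Return ("tm", translation) if the best stored source segment reaches
    the threshold, otherwise ("mt", None) so the segment goes to the MT engine."""
    best_source, best_score = None, 0.0
    for source in memory:
        score = match_score(new, source)
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return "tm", memory[best_source]
    return "mt", None


memory = {
    "W32.Beagle.AB is a mass-mailing worm that neither propagates via "
    "network shares nor deletes files": "<stored German translation>",
}
new = ("W32.Beagle.AC is a mass-mailing worm that neither propagates via "
       "network shares nor deletes files")

print(route_segment(new, memory))                  # ('mt', None)
print(route_segment(new, memory, threshold=0.95))  # ('tm', '<stored German translation>')
```

With the 100% threshold, the one-character change sends the whole sentence to MT. A lower threshold would instead offer the stored translation for a quick post-edit of "AB" to "AC".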
Solutions (1) • Two-tier process • Leverage the Trados TM repository • Use an MT system (Systran Premium 5.0) to translate unknown segments • Use MT output as TM input • Determine the export threshold • Existing TM segments vs. new controlled segments • Uncontrolled: Symantec announced a patch was available • CL: Symantec announced that a patch was available
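The uncontrolled/CL pair above implies a rule of the form "use *that* after a reporting verb". A checker for that rule could be sketched as below. This is an assumption-laden illustration, not the authors' CL tool. The verb list, pattern and function name are hypothetical.

```python
import re

# Hypothetical list of reporting verbs; a real CL rule set would define its own.
REPORTING_VERBS = ("announced", "reported", "stated", "confirmed")

# A reporting verb followed by a word other than "that".
MISSING_THAT = re.compile(
    r"\b(" + "|".join(REPORTING_VERBS) + r")\s+(?!that\b)\w+",
    re.IGNORECASE,
)


def flag_missing_that(sentence: str) -> list[str]:
    """Return each reporting verb that is not immediately followed by 'that'.

    This only flags candidates for a writer to review: it cannot tell a
    direct object ("announced a patch.") from the start of a clause
    ("announced a patch was available").
    """
    return [m.group(1) for m in MISSING_THAT.finditer(sentence)]


print(flag_missing_that("Symantec announced a patch was available"))       # ['announced']
print(flag_missing_that("Symantec announced that a patch was available"))  # []
```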
Solutions (2) • TMX format • Obvious choice as the exchange format • XLIFF not supported by all MT systems • Source and target segments:
<tu usagecount="1" creationdate="20050301T122255Z" creationid="SUPER">
  <tuv lang="EN-US">
    <seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg>
  </tuv>
  <tuv lang="DE-DE">
    <seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg>
  </tuv>
</tu>
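In the `<tu>` on this slide, the DE-DE target is still a copy of the EN-US source, which marks the segment as untranslated. A minimal sketch of detecting that case with Python's `xml.etree.ElementTree` follows. It reads the plain `lang` attribute used in this export and also the `xml:lang` attribute that TMX 1.4 specifies. The function names and default language codes are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 uses xml:lang; older exports (as on this slide) use a plain lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TU = """<tu usagecount="1" creationdate="20050301T122255Z" creationid="SUPER">
  <tuv lang="EN-US">
    <seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg>
  </tuv>
  <tuv lang="DE-DE">
    <seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg>
  </tuv>
</tu>"""


def segments_by_language(tu: ET.Element) -> dict[str, str]:
    """Map each <tuv> language code (upper-cased) to the text of its <seg>."""
    segments = {}
    for tuv in tu.findall("tuv"):
        lang = tuv.get("lang") or tuv.get(XML_LANG) or ""
        segments[lang.upper()] = tuv.findtext("seg", default="")
    return segments


def needs_mt(tu: ET.Element, source: str = "EN-US", target: str = "DE-DE") -> bool:
    """True when the target segment is missing or still a copy of the source."""
    segments = segments_by_language(tu)
    src = segments.get(source.upper())
    return src is not None and segments.get(target.upper(), src) == src


print(needs_mt(ET.fromstring(TU)))  # True
```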
Processing TMX • Technical issues • The various TMX implementations can create discrepancies during the exchange process • Identical source and target segments • XML parser and TMX header • Pre- and post-processing with a single macro • Modules to remove and restore sections • Environment: VBA
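One plausible reading of the "remove and restore sections" modules is sketched below. The pre-processing step detaches the TMX `<header>`, so only the body reaches the MT system's XML parser. The post-processing step puts the header back. This is an assumption about what the VBA macro did, written in Python. The sample document and both function names are hypothetical, and the MT call itself is left as a comment.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TMX = """<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="1.0" segtype="sentence"
          o-tmf="demo" adminlang="EN-US" srclang="EN-US" datatype="plaintext"/>
  <body>
    <tu><tuv xml:lang="EN-US"><seg>Restart the computer.</seg></tuv></tu>
  </body>
</tmx>"""


def remove_header(root: ET.Element) -> Optional[ET.Element]:
    """Pre-processing: detach <header> so only <body> reaches the MT system."""
    header = root.find("header")
    if header is not None:
        root.remove(header)
    return header


def restitute_header(root: ET.Element, header: Optional[ET.Element]) -> None:
    """Post-processing: put the saved <header> back as the first child of <tmx>."""
    if header is not None:
        root.insert(0, header)


root = ET.fromstring(TMX)
header = remove_header(root)
sent_to_mt = ET.tostring(root, encoding="unicode")  # what the MT step would receive
# ... the MT system would translate the segments in sent_to_mt here ...
returned = ET.fromstring(sent_to_mt)
restitute_header(returned, header)
print([child.tag for child in returned])  # ['header', 'body']
```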
Pre-translation Workflow • Step 1: Analyze new document • Step 2: Export unmatched segments • Step 3: Pre-processing module • Step 4: Call to MT system • Step 5: Post-processing module • Step 6: Import segments into TM
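The six steps of the pre-translation workflow can be sketched as a single function. This is a simplified model under stated assumptions. The TM is a plain dictionary, the analysis counts exact matches only, and the MT engine and both processing modules are passed in as functions. `pretranslate` and `fake_mt` are hypothetical names, not Trados or Systran APIs.

```python
from typing import Callable

Step = Callable[[str], str]


def pretranslate(segments: list[str], memory: dict[str, str],
                 mt: Step, pre: Step, post: Step) -> dict[str, str]:
    # Step 1: analyze the new document against the TM (exact matches only here)
    unmatched = [s for s in segments if s not in memory]
    # Step 2: export the unmatched segments, dropping duplicates but keeping order
    exported = list(dict.fromkeys(unmatched))
    # Step 3: pre-processing module
    prepared = [pre(s) for s in exported]
    # Step 4: call to the MT system
    raw = [mt(s) for s in prepared]
    # Step 5: post-processing module
    final = [post(t) for t in raw]
    # Step 6: import the new segment pairs into the TM, keyed by original source
    memory.update(zip(exported, final))
    return memory


def fake_mt(segment: str) -> str:
    """Hypothetical stand-in for the MT engine."""
    return f"[MT] {segment}"


memory = {"Click OK.": "Klicken Sie auf OK."}
document = ["Click OK.", "Restart the computer.", "Restart the computer."]
pretranslate(document, memory, fake_mt, pre=str.strip, post=str.rstrip)
print(memory["Restart the computer."])  # [MT] Restart the computer.
```

Because the post-edited MT output lands in the TM, a later document that repeats the segment gets a 100% match rather than a second MT pass.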
Effective pre-translation • Efficiency and robustness • Refinable • Opportunity for modifications • Target segments • CL environment predictability • Frequent errors • Ideal scenario • Address problems that could not be fixed with CL rules
Towards Automated Post-Editing • Surface post-editing • No linguistic analysis: no second MT • Text processing • Frequent errors due to default MT settings • Remove drudgery from post-editing • Lexical • Capitalization (folgende vs. Folgende) • Incorrect spelling (neuzustarten vs. neu zu starten) • Missing contractions (à le vs. au) • Extra words (fichier de .bmp vs. fichier .bmp)
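The lexical fixes listed above are surface replacements with no linguistic analysis. They can be sketched as an ordered table of regular expressions applied to the raw MT output. The rule table below is hypothetical and is built only from the slide's examples. Restricting the capitalization rule to the start of a segment is an added assumption.

```python
import re

# Hypothetical rule table built from the examples on this slide; each entry
# is (pattern, replacement) and the rules are applied in order.
LEXICAL_RULES = [
    (re.compile(r"^folgende\b"), "Folgende"),              # capitalization at segment start
    (re.compile(r"\bneuzustarten\b"), "neu zu starten"),   # spelling
    (re.compile(r"\bà le\b"), "au"),                       # missing contraction
    (re.compile(r"\bfichier de (\.\w+)"), r"fichier \1"),  # extra word
]


def apply_rules(text: str, rules) -> str:
    """Apply each surface replacement to the MT output in turn."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


for mt_output in ["folgende Dateien wurden gelöscht.",
                  "Es wird empfohlen, den Computer neuzustarten.",
                  "Accédez à le dossier.",
                  "Supprimez le fichier de .bmp."]:
    print(apply_rules(mt_output, LEXICAL_RULES))
```

Rules this blunt need context. "à le" is correct French before an infinitive ("à le faire"), so a production rule would have to check what follows.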
Towards Automated Post-Editing • Syntactic • Word order: “Klicken auf Sie” vs. “Klicken Sie auf” • Wrong structures (transfer or generation issue): neither…nor (ni ne…ni ne) • Textual • Formatting: trailing spaces after symbols (backslashes) • Punctuation inconsistent with the style guide: inverted commas for German
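The two textual errors can be handled with the same surface approach. The sketch below rests on two assumptions. First, the formatting error is a space the MT engine inserts after each backslash in a path. Second, the German style guide requires „…“ quotation marks in place of straight double quotes. Both function names are hypothetical.

```python
import re


def remove_space_after_backslash(text: str) -> str:
    """Delete spaces the MT engine inserted after backslashes in paths."""
    return re.sub(r"\\ +", r"\\", text)


def german_quotes(text: str) -> str:
    """Turn straight double quotes into German „…“ quotation marks."""
    return re.sub(r'"([^"]*)"', r"„\1“", text)


print(remove_space_after_backslash(r"Öffnen Sie C:\ Programme\ Symantec."))
print(german_quotes('Klicken Sie auf "Weiter".'))
```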
Towards Automated Post-Editing • Suitability of the environment • Support for regular expressions • Regular expressions are a ‘way to describe text through pattern matching’ (Stubblebine 2003: 1) • Grouping and capturing: • Match: ([Kk]licken) (auf) (Sie) • Replace: \1 \3 \2
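The grouping-and-capturing rule above translates directly into Python's `re` module, which accepts the same `\1`-style back-references in the replacement string. The sketch reuses the slide's match and replace patterns unchanged. Only the function name is hypothetical.

```python
import re

# Match pattern from the slide: three captured groups.
WORD_ORDER = re.compile(r"([Kk]licken) (auf) (Sie)")


def fix_word_order(text: str) -> str:
    """Swap the 2nd and 3rd captured groups, as in the replace pattern \\1 \\3 \\2."""
    return WORD_ORDER.sub(r"\1 \3 \2", text)


print(fix_word_order("Klicken auf Sie OK."))  # Klicken Sie auf OK.
```

Already-correct text such as "Klicken Sie auf OK." does not match the pattern, so the rule is safe to run over every segment.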
Next steps • New environment • GMS integration • Centralized interface with content • Transport layer • MT as plug-in • XLIFF format • To machine translate unmatched segments • PE replacements • Fine-tune contextual replacements
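The planned switch to XLIFF for machine-translating unmatched segments could start with a filter like the one below. This is a minimal sketch with several assumptions. It targets XLIFF 1.2 (the slide does not name a version), treats a `<trans-unit>` with a missing or empty `<target>` as unmatched, and ignores inline markup inside `<source>`. The sample file and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

DOC = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="sample.txt" source-language="en-US" target-language="de-DE" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Click OK.</source>
        <target>Klicken Sie auf OK.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Restart the computer.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def unmatched_units(xliff_text: str) -> list[tuple[str, str]]:
    """Return (id, source text) for every trans-unit without a usable target."""
    root = ET.fromstring(xliff_text)
    result = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is None or not (target.text or "").strip():
            result.append((unit.get("id"), unit.findtext("x:source", namespaces=NS)))
    return result


print(unmatched_units(DOC))  # [('2', 'Restart the computer.')]
```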
Conclusions • Combining MT & TM is efficient • leverage • post-editing is not repeated • increased throughput • Environment for avoiding errors • facilitated when CL rules are introduced • Scope of errors is reduced • New opportunities for translators • Fine-tuning MT user dictionaries • Refine automated PE tasks
Thank You johann_roturier@symantec.com