The Link between Controlled Language and Post-Editing: An Empirical Investigation of Technical, Temporal and Cognitive Effort Sharon O’Brien, CTTS/SALIS
Overview • Research Parameters • Temporal Effort • Technical Effort • Cognitive Effort • Conclusions
Definition • Controlled Language (CL): an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style. (Huijsen, 1998: 2)
Motivation – In a Nutshell • Can the introduction of CL rules really improve MT output such that post-editing effort is reduced?
Machine “Translatability” • One of the main “goals” of CL • The notion of translatability is based on so-called "translatability indicators" where the occurrence of such an indicator in the text is considered to have a negative effect on the quality of machine translation. The fewer translatability indicators, the better suited the text is to translation using MT. (Underwood and Jongejan 2001: 363)
Machine “Translatability” • “Negative” Translatability Indicators • “NTIs” for short • Examples (for English as SL) • Long noun phrases • Passive voice • Ungrammatical constructs • Use of slang… • Use of NTI list (Bernth/Gdaniec 2001) • Use of term “minimal NTI”
Research Design • SL: English; TL: German • Text Type: User Manual (1,777 words) • Users: 12 Professional Translators • Tools: IBM WebSphere, Translog, IBM’s EasyEnglishAnalyzer, Sun Microsystems’ Sunproof • Place of Data Capture: IBM Stuttgart
Methodology • Edit SL text to create two sentence types: • S(nti) = sentences with known negative translatability indicators • S(min-nti) = sentences where all listed NTIs had been removed • 9 subjects: post-editing (P1-P9) • 3 subjects: translating (T1-T3) • First pass exercise, no QA
Temporal Effort • Post-Editing vs. Translation • median words per minute
Temporal Effort (2) • Post-Editing vs. Translation • median processing speed • Processing speed is the total number of source words in each segment divided by the total processing time for that segment • i.e. words processed per second
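The processing-speed definition on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the segment figures are hypothetical and are not data from the study, and the sketch assumes that processing time is recorded in seconds.

```python
from statistics import median


def processing_speed(source_words: int, processing_seconds: float) -> float:
    """Return words processed per second for a single segment."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return source_words / processing_seconds


# Hypothetical (source word count, processing time in seconds) pairs,
# invented for illustration only.
segments = [(8, 16.0), (12, 12.0), (5, 20.0)]

speeds = [processing_speed(words, seconds) for words, seconds in segments]
median_speed = median(speeds)  # median across segments, as reported on the slides
```

Computing the speed per segment and then taking the median keeps one very slow or very fast segment from dominating the summary figure.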
Median Processing Speed • S(nti) vs. S(min-nti)
Temporal Effort: Conclusions • The post-editing task was completed faster than the translation task. • First-pass exercise/No QA • The median processing speeds for S(min-nti) segments were significantly higher than those for S(nti) segments • So, from a temporal point of view, it seems that the introduction of CL benefits turnaround times
Technical Effort • Measured using Translog: • Keyboarding • Deletions, insertions, cuts, pastes • Dictionary Look-Up Activity
Keyboarding Median Measurements • Small difference between the two segment types, but statistically significant for insertions/deletions • Cutting and pasting: very limited even though post-editors recycled whole chunks of text
Use of the Translog Dictionary • Training and practice prior to task • All users reported being comfortable with the feature
Possible Explanations? • Subjects not as familiar with feature as they reported • Subjects felt it was unnecessary to use dictionary • Subjects used to having terms suggested on-screen with TM/Terminology tool • Subjects lost faith in the feature when they encountered problems
Conclusions on Technical Effort • S(min-nti) segments require significantly fewer deletions and insertions than S(nti) segments. • Cutting and pasting was a very rare activity for both segment types. • Dictionary searches were uncommon during this study. When they were carried out, the search facility was frequently used incorrectly.
Technical/Temporal Combined • Results on technical post-editing effort add to the evidence presented above on temporal post-editing effort and further support the claim that the elimination of NTIs from a segment can reduce post-editing effort.
Cognitive Effort • Potential Methodologies • TAP (rejected) • Pause Analysis • Choice Network Analysis • Eye tracking (unavailable at the time)
Pause Behaviour • No discernible correlations between pause behaviour and post-editing activity • Pause analysis rejected
Cognitive Effort • Choice Network Analysis
Choice Network Analysis • …Choice Network Analysis compares the renditions of a single string of translation by multiple translators in order to propose a network of choices that theoretically represents the cognitive model available to any translator for translating that string. The technique is favoured over the think-aloud method, which is acknowledged as not being able to access automaticized processes. (Campbell, 2000: 215)
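As a rough illustration of the idea in this quotation, the renditions that several translators produce for one source string can be tallied to show how many distinct choices were made. The sketch below is a much-simplified, whole-string version of that comparison, not the procedure used in the study. The function name is hypothetical and the German renditions are invented for the example.

```python
from collections import Counter


def choice_network(renditions: list[str]) -> Counter:
    """Tally how many translators produced each distinct rendition."""
    return Counter(r.strip() for r in renditions)


# Invented renditions of a single source string by four translators.
renditions = [
    "Sichern Sie das Dokument.",
    "Sichern Sie die Dokumente.",
    "Sichern Sie das Dokument.",
    "Speichern Sie das Dokument.",
]

network = choice_network(renditions)
distinct_choices = len(network)  # number of different renditions observed
```

A fuller analysis would compare the renditions below the level of the whole string, so that each point of divergence becomes its own node in the network.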
Example – Sentence with NTIs • ST: • “Save the document(s).” • Raw MT output: • „Sichern Sie das Dokument(s).“ • NTIs for this sentence: • Short segment • Use of “(s)” for plural
Example – Sentence with minimal NTIs • ST: • “The editor contains a menu and a toolbar.” • Raw MT output: • „Der Editor enthält ein Menü und eine Symbolleiste.“
NTIs and Cognitive Effort • Using CNA as a guide, NTIs were categorised as: • High impact on post-editing effort • 50% or more of the occurrences of the NTI resulted in post-editing by two or more post-editors • Moderate impact on post-editing effort • Between 31% and 49% of occurrences • Low impact on post-editing effort • 30% or fewer of occurrences
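The three impact bands can be expressed as a small classification function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, `edited_occurrences` is taken to mean occurrences post-edited by two or more post-editors in every band, and shares that fall between the stated bands (for example 49.5%) are treated as moderate, which the slide does not specify.

```python
def classify_nti_impact(occurrences: int, edited_occurrences: int) -> str:
    """Classify an NTI as high, moderate or low impact.

    occurrences: how often the NTI appears in the source text.
    edited_occurrences: how many of those occurrences were post-edited by
        two or more post-editors (assumed criterion for all three bands).
    """
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    if not 0 <= edited_occurrences <= occurrences:
        raise ValueError("edited_occurrences must lie between 0 and occurrences")

    share = 100 * edited_occurrences / occurrences
    if share >= 50:
        return "high"
    if share > 30:  # assumption: anything above 30% and below 50% is moderate
        return "moderate"
    return "low"
```

For example, an NTI with 10 occurrences of which 4 were post-edited by two or more post-editors would fall into the moderate band under these assumptions.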
Correlating Measurements • By combining data on temporal, technical and cognitive effort: High Impact NTIs • Use of the gerund • Proper nouns • Problematic punctuation • Ungrammatical constructs • Use of (s) for plural • Non-finite verbs • Incomplete syntactic unit • Long NP • Short segment
Correlating Measurements • Moderate impact NTIs: • Multiple coordinators • Passive voice • Personal pronouns • Use of a slash as a separator • Ambiguous scope in coordination • Parentheses
Correlating Measurements • Low impact NTIs: • Abbreviations • Demonstrative pronouns • Missing “in order to” • Contractions
Conclusion • Within the limited scope of this research, we now have empirical evidence to support the assertion that controlling the input to MT leads to lower post-editing effort. • The elimination of some NTIs can have a higher impact than the elimination of others • Is it worth having a relatively high number of CL rules? • Even if we remove known NTIs, MT engines are still likely to produce some errors and post-editors are still likely to post-edit.