May 26th 2014
Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy
Ekaterina Stambolieva
ekaterina.stambolieva@euroscript.lu
Outline
• Why?
• MT Adequacy?
• What?
• Evaluation
• Findings
• Conclusion & Future Work
WHY?
• Impending industry problem: how do we compare MT systems over time?
• We measure MT quality continuously
• BLEU?
• We want adequate translations
MTE, May 26th 2014
ADEQUACY
How do we define MT adequacy in business?
• accelerate time-to-delivery
• reduce translation costs
• achieve near-native fluency
ADEQUACY
Adequacy: improving the acceptance of MT output for the task of post-editing
WHAT
• We aim to evaluate our MT systems continuously and compare results over time
• We design our system’s improvements based on human end-user feedback
• We do not directly evaluate translation quality; instead, we assess the improvement of MT output over time
• No annotation effort required
Outline
• Why?
• MT Adequacy?
• What?
• Evaluation
  • Edit Distance
• Findings
• Conclusion & Future Work
THE EXAMPLE
• We compare the results of 2 English<->Danish MT systems

BLEU      System 1   System 2
EN->DA    59.22      58.84
DA->EN    64.26      63.98
CATEGORIES
• 3 objective categories to evaluate MT output:
  • Does the MT output look better than before?
  • Does the MT output look worse than before?
  • Is it difficult for you to judge whether the MT output is better or not?
EVALUATION
• We present an MT output evaluation based on the Edit Distance (ED) score
• We compute how many edits are needed to transform the MT output into the human translation of the same source segment
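To make the ED score concrete, here is a minimal sketch. It assumes a standard Levenshtein distance (insertions, deletions and substitutions each cost 1), which the slides do not confirm. The function name `edit_distance`, the whitespace tokenization and the example segments are all invented for illustration and are not data from the talk.

```python
from typing import Sequence


def edit_distance(hyp: Sequence, ref: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn hyp (MT output) into ref (human translation)."""
    # prev[j] holds the distance between the processed prefix of hyp
    # and the first j items of ref.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # turning i items into an empty reference takes i deletions
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


# Invented segments: one human translation, two MT outputs for the same source.
human = "Press the green button to start the machine".split()
old_mt = "Push green button for start machine".split()
new_mt = "Press the green button to start machine".split()

old_score = edit_distance(old_mt, human)
new_score = edit_distance(new_mt, human)
print("old system ED:", old_score)
print("new system ED:", new_score)
```

Because the function accepts any sequence, passing plain strings gives a character-level score and passing token lists gives a word-level score. The slides do not say which granularity was used. A lower score for the newer system suggests its output needs fewer post-edits to reach the human translation.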
FINDINGS
• Improved MT acceptance for the task of post-editing
• Length variance between the MT output of the new and the old system does not affect MT acceptance
FUTURE WORK
• Modify ED to take the number of UNK words into consideration
• Modify the metric so that it detects small improvements in the system, such as:
  • number isolation
  • tag protection
• Take segment character length into consideration, so as not to over-penalize shorter segments
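The slides do not say how these modifications would be implemented. As one possible reading, the sketch below normalizes a character-level edit distance by the length of the longer segment and counts unknown-word placeholders separately. The function names, the normalization choice and the literal `UNK` marker are all assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(hyp: Sequence, ref: Sequence) -> int:
    """Plain Levenshtein distance with unit costs (assumed, not confirmed by the slides)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (h != r)))
        prev = curr
    return prev[-1]


def normalized_edit_distance(hyp: str, ref: str) -> float:
    """Character-level ED divided by the longer segment's length.

    Returns 0.0 for identical segments and 1.0 when every character differs,
    so short and long segments are scored on the same scale.
    """
    longest = max(len(hyp), len(ref))
    if longest == 0:
        return 0.0  # two empty segments are identical
    return edit_distance(hyp, ref) / longest


def count_unk(tokens: Sequence[str], unk_marker: str = "UNK") -> int:
    """Count unknown-word placeholders; the marker string is hypothetical."""
    return sum(1 for token in tokens if token == unk_marker)


print(normalized_edit_distance("abc", "abd"))
print(count_unk("the UNK sat on the UNK".split()))
```

Dividing by the longer segment keeps the score between 0 and 1. Other normalizations, for example by reference length only, would serve the same goal. How the UNK count should be combined with the ED score is left open here, as it is in the slides.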
Thank you