
User-focused task-oriented MT evaluation for wikis: a case study

Federico Gaspari, Antonio Toral, and Sudip Kumar Naskar. School of Computing, Dublin City University, Dublin 9, Ireland. {fgaspari, atoral, snaskar}@computing.dcu.ie.


Presentation Transcript


  1. User-focused task-oriented MT evaluation for wikis: a case study. Federico Gaspari, Antonio Toral, and Sudip Kumar Naskar. School of Computing, Dublin City University, Dublin 9, Ireland. {fgaspari, atoral, snaskar}@computing.dcu.ie. Third Joint EM+ / CNGL Workshop, Luxembourg, 14 October 2011

  2. Outline • Introduction: the CoSyne project • Related work • Evaluation: framework, scenario, questionnaire • Results and discussion • Conclusions • Future work

  3. Introduction: CoSyne • Aim: synchronisation of multilingual wikis • Consortium: 7 partners from Germany, Italy, the Netherlands and Ireland • 3 academic partners: University of Amsterdam (UvA), Fondazione Bruno Kessler (FBK), Dublin City University (DCU) • 1 research organization: Heidelberg Institute for Theoretical Studies (HITS) • 3 end-users: Deutsche Welle (DW), Netherlands Institute for Sound and Vision (NISV), Vereniging Wikimedia Nederland (VWN)

  4. Introduction: CoSyne • Techniques used by the CoSyne system: MT, textual entailment, document structure modelling, overlap synchronisation, insertion point detection • CoSyne MT system developed by UvA (Martzoukos and Monz, 2010) • Language pairs covered in year 1: DE / IT / NL ↔ EN • Focus of this user evaluation: CoSyne MT software used to translate wiki entries DE→EN and NL→EN

  5. Related work • MT quality evaluation: fluency, adequacy • Automatic MT evaluation metrics, esp. for SMT (Toral et al., 2011): BLEU (Papineni et al., 2002), METEOR (Banerjee & Lavie, 2005), etc.; no insight into the nature and severity of errors (e.g. for post-editing); weak correlation with human judgement (Lin & Och, 2004) • Usefulness of MT output and users’ level of satisfaction • Post-editing: effort (e.g. Allen, 2003; O’Brien, 2007; Specia & Farzindar, 2010); gains vs. translating from scratch (e.g. O’Brien, 2005; Specia, 2011)

  6. Evaluation framework • User-focused task-oriented evaluation of MT in/for wikis, in close collaboration with end-users (DW, NISV) • Accompanied by diagnostic evaluation, providing useful feedback to MT developers (UvA) • Pilot study conducted just before month 18 of the 36-month project; full-scale final evaluation planned at the very end of the project

  7. Evaluation scenario • Protocol for evaluation agreed between DCU and end-users • DW and NISV staff involved: editors, translators, project managers; German-English and Dutch-English as their working languages; final users of the CoSyne system for wiki content synchronization • Evaluation conducted on typical wiki entries for end-users • Users asked to focus only on the linguistic quality and level of usefulness of the MT (disregarding other components of the CoSyne system)

  8. Evaluation scenario • Deutsche Welle (DW): KalenderBlatt / Today in History

  9. Evaluation scenario • Netherlands Institute for Sound and Vision (NISV): wiki

  10. Evaluation scenario • Netherlands Institute for Sound and Vision (NISV): wiki

  11. Evaluation scenario • Time-tracking system was implemented • Post-editing changes performed by the participants were logged • Before the evaluation: participants given a presentation and demo of the CoSyne system; preliminary experimentation with the CoSyne system for 1-3 hours

  12. Evaluation questionnaire • Written questionnaire administered on paper, available at http://www.computing.dcu.ie/~atoral/cosyne/quest.pdf • Questions grouped into 6 parts focusing on different aspects • Approximately 50 items using different formats: Likert scale, multiple choice and open questions • Part A: basic demographic information about the respondents • Part B: previous use of MT • Part C: users' evaluation of the CoSyne MT system • Part D: post-editing work • Part E: general comments and feedback • (Part F: usability and interaction design of the overall CoSyne system)

  13. Results: demographics • 10 users: 6 from DW, 4 from NISV • 6 men and 4 women across DW and NISV • Variety of roles: editors, authors, translators and project managers • Average age: 34 (youngest 20, oldest 46) • Average work experience: just over 3 years (min. 3 months, max. 10 years)

  14. Results: background • All 4 NISV staff were native speakers of Dutch • 5 DW users were native speakers of German, plus 1 native speaker of Romanian fluent in German • 80% of the participants self-rated their knowledge of English as upper-intermediate, 20% defined it as intermediate or excellent • None of the respondents considered themselves bilingual

  15. Results: previous use of MT • 80% had used MT before our experiment • 7 for personal reasons, 6 for work (commonly for both purposes) • all but one had used Google Translate, 1 had tried Babel Fish, 2 both • Language combinations used: 4 from EN into other languages; 6 into EN from a range of source languages; 5 language combinations not involving English • 75% used MT for assimilation purposes vs. 25% for dissemination • 62.5% had post-edited raw MT to obtain high-quality translations
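The percentages on this slide can be read back as head counts. The sketch below assumes that the 80% figure is taken over all 10 participants and that the remaining percentages are taken over the resulting group of prior MT users; the helper name `to_count` is hypothetical and used only for illustration.

```python
from fractions import Fraction

PARTICIPANTS = 10  # total users in the evaluation (demographics slide)


def to_count(percent, base):
    """Convert a percentage of `base` people into a whole head count."""
    count = Fraction(str(percent)) / 100 * base
    if count.denominator != 1:
        raise ValueError(f"{percent}% of {base} is not a whole number of people")
    return int(count)


mt_users = to_count(80, PARTICIPANTS)
# Assumption: the following percentages use the prior MT users as their base.
assimilation = to_count(75, mt_users)
dissemination = to_count(25, mt_users)
post_edited = to_count(62.5, mt_users)

print(mt_users, assimilation, dissemination, post_edited)
```

Under these assumptions every percentage maps to a whole number of respondents, which is a quick sanity check on which base each figure refers to.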

  16. Results: previous use of MT • Materials translated with MT by the 8 respondents • for study purposes (academic papers and uni-related texts): 3 • business correspondence, personal or professional emails: 2 • contracts and technical documents: 2 • online articles: 2 • websites: 2 (“the translations of Dutch sites to English were hilarious!”, but this was not with the CoSyne MT system) • Wikipedia content: 1

  17. Results: previous use of MT • Chart: quality of previously used MT systems on a 5-point scale (1 = very poor to 5 = very good) • Overall, the 8 respondents had a predominantly negative-to-neutral impression of MT quality before taking part in the evaluation of the CoSyne MT system, based on a 5-point Likert scale (average 2.8 / 5)

  18. Results: CoSyne MT system • Chart: quality and usefulness of the CoSyne MT system on a 5-point scale (1 = very poor to 5 = very good) • Average quality is medium (3 / 5), better than previous experience (2.8) • Usefulness slightly higher than medium (3.3 / 5)

  19. Results: CoSyne MT system • Chart: is CoSyne MT faster than translating wiki entries into English from scratch? On a 7-point scale (1 = strongly disagree to 7 = strongly agree) • Average value higher than mid-point of the scale (4.6 / 7) • In line with e.g. Plitt & Masselot (2010) and Flournoy & Rueppel (2010) • From DE almost twice as good as from NL (due to style of wiki texts?)

  20. Results: CoSyne MT system • Chart: MT quality broken down into accuracy, correctness, comprehensibility, readability and style, on a 7-point scale (1 = poor to 7 = excellent) • We did not explain to users the subtle differences involved • Only accuracy is approx. average (3.6 / 7), other criteria lower • None of the average values particularly poor (DE always better than NL)

  21. Results: post-editing CoSyne • Chart: amount of work, in terms of time and effort, to post-edit the MT output, on a 7-point scale (1 = short/small to 7 = long/large) • Chart: frequency of the need to refer to the source language while post-editing, on a 7-point scale (1 = never to 7 = always)

  22. Results: post-editing CoSyne • Chart: severity of errors over post-editing operations (insertion, deletion, substitution, reordering), on a 7-point scale (1 = irrelevant to 7 = very serious) • Chart: frequency of errors over post-editing operations (insertion, deletion, substitution, reordering), on a 7-point scale (1 = absent to 7 = frequent)

  23. Results: final comments • Positive aspects: good to have a draft translation to work upon; integration in the wiki environment; potential to speed up the translation task • Weaknesses: translation quality needs improving, due to wrong translation of pronouns, verbs frequently dropped, incorrect word order, mistranslated compounds, and limited lexical coverage (OOV items are an issue) • Good potential of the CoSyne system based on first prototype

  24. Conclusions • User-focused task-oriented questionnaire-based evaluation for MT used in wikis, supported by post-editing • Evaluation of the first Y1 prototype of the CoSyne MT system for DE→EN and NL→EN • Quality of the CoSyne MT system perceived by the users as higher than that of previously used MT systems • Post-editing effort is considered high, but users found it less time-consuming than translating from scratch • Translations from German rated better than those from Dutch; this contrasts with earlier findings (Toral et al., 2011) and calls for further investigation into the discrepancy (meta-evaluation)

  25. Future work • Extend analysis looking into the post-editing logs, considering actual post-editing time (to estimate costs) • Involve more users after pilot stage • Include a control group (translating manually or with other MT s/w) • Investigate correlation between the post-editing carried out by the users and the results provided by TER and TERp (ins, del…) • Use our linguistically-aware diagnostic evaluation tool (DELiC4MT) to monitor performance of the MT system on specific issues flagged up by the users
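To illustrate the kind of edit-operation counts (ins, del, sub) that could be compared with the logged post-editing changes, here is a minimal sketch based on plain word-level Levenshtein alignment. It is not TER or TERp: it ignores shifts (reordering), paraphrase matching and normalisation by reference length. The function name `edit_operations` and the example sentences are hypothetical.

```python
def edit_operations(mt_output, post_edited):
    """Count word-level insertions, deletions and substitutions
    needed to turn mt_output into post_edited."""
    src, tgt = mt_output.split(), post_edited.split()
    n, m = len(src), len(tgt)

    # dist[i][j] = minimum number of edits turning src[:i] into tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a word from the MT output
                dist[i][j - 1] + 1,         # insert a word from the post-edit
                dist[i - 1][j - 1] + cost,  # keep or substitute a word
            )

    # Walk back through the table to label each edit on one optimal path.
    ops = {"ins": 0, "del": 0, "sub": 0}
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = 1 if i > 0 and j > 0 and src[i - 1] != tgt[j - 1] else 0
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + mismatch:
            ops["sub"] += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops["del"] += 1
            i -= 1
        else:
            ops["ins"] += 1
            j -= 1
    return ops


# Hypothetical example: raw MT output vs. its post-edited version.
print(edit_operations("he go to school", "he goes to the school"))
```

Counts obtained this way from the post-editing logs could then be set against the per-operation frequency ratings given by the users; reordering would need a shift-aware metric such as TER itself.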

  26. Thank you for your attention! Questions?
