Evaluation of Free Online Machine Translations

This study evaluates free online machine translations for Croatian-English and English-Croatian language pairs across various domains including city descriptions, law, football, and monitors. The research focuses on user-perceived fluency and adequacy, inter-rater agreement, error analysis, and correlations between different translation tools. Results indicate variations in translation quality based on the domain and language pair.


Presentation Transcript


  1. FF Zagreb – Information Sciences • Evaluation of Free Online Machine Translations for Croatian-English and English-Croatian Language Pairs • Sanja Seljan, sseljan@ffzg.hr, University of Zagreb, Faculty of Humanities and Social Sciences, Department of Information Sciences, Croatia • Marija Brkić, mbrkic@uniri.hr, University of Rijeka, Department of Informatics, Croatia • Vlasta Kučiš, asta.kucis@siol.net, University of Maribor, Department of Translation Studies, Slovenia

  2. Aim • Evaluation of texts from four domains (city description, law, football, monitors) • Cro-En: translations by four free online translation services (Google Translate, Stars21, InterTran and Translation Guide) • En-Cro: translations by Google Translate • Measuring inter-rater agreement (Fleiss' kappa) • Influence of error types on the criteria of fluency and adequacy • Pearson's correlation

  3. Introduction • MT evaluation • Experimental study • Translation tools • Test set description • Evaluation • Error analysis • Correlations • Conclusion

  4. I INTRODUCTION • Increased use of online translation tools in recent years, even among less widely spoken languages • Desirable: moderate to good quality translations • Evaluation from the user's perspective • Tools and evaluation mainly for widely spoken languages • Possible uses: gisting translations, information retrieval, i.e. question-answering systems • 1976 - Systran, the first MT system for the Commission of the European Communities + online tool + different versions • 1997 - Babel Fish, the first online translation tool, using Systran technology • Important: realistic expectations

  5. Studies for popular languages • Considerable difference in the quality of translation depending on the language pair • 2010 - German-French (GT, ProMT, WorldLingo) • 2011 - three popular online tools • 2006 - Spanish-English (introductory textbook) • 2008 - 13 languages into English (6 tools: Babel Fish, Google Translate, ProMT, SDL free translator, Systran, WorldLingo)

  6. MT evaluation – important in research and product design • measure system performance • identify weak points and adjust parameter settings • language-independent algorithms (BLEU, NIST) • Better metric – closer to human evaluation • need for qualitative evaluation of different linguistic phenomena

  7. II EXPERIMENTAL STUDY • evaluation of free online translation services (FTS) – from the user's perspective • evaluators: undergraduate and graduate students of languages, linguistics and information sciences attending courses on language technologies at the University of Zagreb, Faculty of Humanities and Social Sciences Test set description: • texts from 4 domains (city description, law, football, monitors) • ca. 7-9 sentences per domain (17.8 words per sentence) • Cro-En, En-Cro

  8. Evaluators • Cro-En: 48 students, final year of undergraduate and graduate levels • En-Cro: 50 students, native speakers • 75% of students attended language technology course(s) Evaluation – before pilot study: • Average grades for free language resources on the Internet

  9. Croatian tools/resources • Tools/resources in general

  10. Desirable tools/resources of appropriate quality

  11. Evaluation Manual evaluation: • fluency (indicating how fluent the translation is in the target language) • adequacy (indicating how much of the information is adequately transmitted) • evaluation enriched by translation error analysis: morphological errors, untranslated words, lexical errors and word omissions, syntactic errors

  12. Tools Cro-En translations: • Google Translate (GT) - http://translate.google.com • Stars21 (S21) - http://stars21.com/translator • InterTran (IT) - http://transdict.com/translators/intertran.html • Translation Guide (TG) - http://www.translation-guide.com En-Cro translations: • obtained from Google Translate

  13. Google Translate: • translation service provided by Google Inc. • statistical MT based on a huge amount of corpora • supports 57 languages, Croatian since 2008 Stars21 service: • powered by GT • translations not always the same InterTran: • powered by NeuroTran and WordTran • sentence-by-sentence and word-by-word Translation Guide: • powered by IT • different translations

  14. Results - Cro-En • grades are either low (TG and IT) or high (S21 and GT) in comparison to the average value (3.04) • S21 (4.66) : GT (4.62) – city description, legal • GT – football, monitors • Best average result – legal domain, then monitors and football • Lowest – city description (the freest in style)

  15. Results - Cro-En • En-Cro – lower average results than in the reverse direction: football (3.75 : 4.84), law, monitors • Higher average grade in city description (shorter sentences, mostly nominative constructions, frequent terms) • Football domain – specific terms, non-nominative constructions

  16. Error analysis En-Cro: • Translations offered by GT and S21 are very similar, although not identical • TG and IT – difference in the number of untranslated words • TG does not recognize words with diacritics Cro-En: • the highest number of lexical errors, including errors in style (avg. 2.44) • Untranslated words (1.83), morphological errors (1.75), syntactic errors (1.38) • Lowest score, highest number of errors – football domain (mostly lexical errors and untranslated words) • Best score – city description domain (lexical errors) • Lowest number of errors – legal domain (evenly distributed)

  17. Morphological errors – mostly in the monitors domain, the smallest number in city description (dominant value 1) • Untranslated words – by far the most in the football domain • translation grades – mostly influenced by untranslated words Dominant values: • Morphological errors: 1 in city description and monitors, 3 in the legal and football domains • Lexical errors: 1 in city description, higher in the other domains • untranslated words: 1 in all domains • syntactic errors: 1 in all domains but football (2-3)

  18. Pearson's correlation • a smaller number of errors raises the average grade • correlation between error types and the criteria of fluency and adequacy • fluency is more affected by an increase in lexical and syntactic errors • adequacy is more affected by untranslated words
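
The correlation step on this slide can be illustrated with a minimal Python sketch. It assumes one pair of values per translated sentence (an error count and the average grade for that sentence). The function name `pearson_r` and the sample numbers are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration only (not study data): untranslated words per
# sentence and the average adequacy grade (1-5) given to that sentence.
untranslated_words = [0, 1, 1, 2, 3, 4]
adequacy_grades = [4.8, 4.5, 4.1, 3.6, 2.9, 2.2]

r = pearson_r(untranslated_words, adequacy_grades)
print(f"r = {r:.2f}")  # negative here: more errors go with lower grades
```

A coefficient close to -1 would correspond to the slide's observation that fewer errors go together with higher grades; running the same function per error type and per criterion (fluency, adequacy) gives the kind of comparison the slide summarizes.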

  19. Fleiss' kappa • for assessing the reliability of agreement among raters when giving ratings to the sentences • indicates the extent to which the observed amount of agreement among raters exceeds what would be expected if all the raters made their ratings completely randomly • Score – between 0 and 1 (perfect agreement) • 0.0-0.20 slight agreement • 0.21-0.40 fair agreement • 0.41-0.60 moderate agreement • 0.61-0.80 substantial agreement • 0.81-1.00 almost perfect agreement Notation: • N – total number of subjects • n – number of raters per subject • P_i – extent to which raters agree on the i-th subject • j – categories
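
The computation behind this slide can be sketched in Python using the standard definition of Fleiss' kappa and the slide's notation (N subjects, n raters per subject, categories j). The input layout (a table of rating counts per sentence), the function names and the sample table are assumptions made for illustration; the slides do not show how the authors computed the statistic.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i (e.g. a sentence) to category j (e.g. a grade)."""
    N = len(counts)                       # total number of subjects
    if N == 0:
        raise ValueError("no subjects")
    n = sum(counts[0])                    # number of raters per subject
    k = len(counts[0])                    # number of categories
    if n < 2 or any(len(row) != k or sum(row) != n for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    # p_j: share of all ratings that fell into category j
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # P_i: extent to which raters agree on subject i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N                    # mean observed agreement
    P_e = sum(pj * pj for pj in p)        # agreement expected by chance
    if P_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall into one category")
    return (P_bar - P_e) / (1 - P_e)


def agreement_label(kappa: float) -> str:
    """Map kappa to the agreement bands listed on the slide."""
    if kappa < 0:
        return "below chance (outside the listed bands)"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical table (not study data): 4 sentences, 5 raters, grades 1-5.
ratings = [
    [0, 0, 0, 1, 4],
    [0, 0, 2, 3, 0],
    [0, 1, 3, 1, 0],
    [5, 0, 0, 0, 0],
]
kappa = fleiss_kappa(ratings)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)} agreement)")
```

Counting ratings per category, rather than storing each rater's individual grade, reflects the fact that Fleiss' kappa does not require the same raters to judge every subject, only the same number of ratings per subject.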

  20. Relatively high level of agreement among raters per domain and per system in Cro-En translations • moderate agreement 0.41-0.60 (IT translation service) • substantial agreement 0.61-0.80 (S21 and GT) • almost perfect agreement 0.81-1.00 (TG – the worst tool) • En-Cro translations – inter-rater agreement per domain • the lowest level of agreement was detected in the domains of football and law (0.4-0.49, fair to moderate) – longer and more complex sentences • substantial agreement (0.61-0.80) – in city description • the level of inter-rater agreement is lower for En-Cro translations in all domains

  21. Conclusion • evaluation study of MT in 4 domains • Cro-En – 4 free online translation services • En-Cro translations – by Google Translate • Evaluators' profile • high interest in the use of translation resources and tools • Critical evaluation • System evaluation • almost perfect agreement in ranking TG as the worst translation service • substantial agreement for the S21 and GT services • moderate agreement for IT, which performed slightly better than TG

  22. Cro-En translations: • S21 and GT (4.63 to 4.84) – football, law and monitors • city description – Cro-En grades lower than En-Cro En-Cro direction (by GT): • lower grades than in the opposite direction (specific terms, non-nominative constructions, multi-word units) • except in the city description domain, which contains mostly nominative constructions, frequent words and no specific terms Error analysis: • translation grades are mostly influenced by untranslated words (especially the criterion of adequacy) • morphological and syntactic errors affect grades to a smaller extent (fluency)

  23. Google Translate service: • used in both translation directions • harvesting data from the Web, it seems to be well trained and suitable for the translation of frequent expressions • does not perform well where linguistic information is needed, e.g. gender agreement, multi-word (MW) expressions Further research: • better quantitative analysis per domain • more detailed analysis of specific language phenomena

