
Evaluation of Machine Translation Errors in English and Iraqi Arabic


Presentation Transcript


  1. Evaluation of Machine Translation Errors in English and Iraqi Arabic. Sherri Condon, Dan Parvaz, John Aberdeen, Christy Doran, Andrew Freeman and Marwan Awad, The MITRE Corporation. LREC 2010. Approved for Public Release: 10-101174. Distribution Unlimited.

  2. Preview
- Methods: DARPA speech translation; HTER and the annotation process; annotation categories
- Iraqi Arabic to English (I→E) errors: polarity errors, pronoun errors, copula errors
- English to Iraqi Arabic (E→I) errors: subject pronoun inflection errors, word order errors, "other" errors
- Summary and conclusions

  3. DARPA Speech Translation Systems
- 2-way communication for English and Iraqi Arabic
- Military domains and use cases: checkpoint, facility inspection, civil affairs, training, medical
- Funded 4 speech translation systems (labeled A-D)
- Evaluations conducted by NIST and MITRE: live evaluations with military users and Iraqi speakers; offline evaluations using recordings of military users and Iraqi speakers
- Error analyses use translations of text transcriptions from the offline recordings, which excludes errors from speech recognition

  4. Evaluation Data
- Samples from 2 evaluations: June 2008 and November 2008
- Translations from 4 systems
- Subset of offline inputs
- Number of translations annotated: [table not reproduced in this transcript]

  5. Error Analysis
- THIS IS REALLY HARD! Errors depend on what is correct, but there is no single correct translation
- Automated measures of translation quality like Translation Error Rate (TER) are not diagnostic
- TER scores are based on the changes needed to turn system output into a reference translation (insertion, deletion, substitution, shift)
- Human TER (HTER) requires humans to create reference translations as close as possible to the system output
- We used HTER for error annotation: it provides a maximally close correct translation, and the TER alignment facilitates our annotation
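For illustration only (this is not the evaluation's actual scoring code), a minimal Python sketch of a TER-style edit rate: word-level insertions, deletions, and substitutions divided by reference length. Full TER additionally allows block shifts, which are omitted here.

```python
def ter_like_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance (insert/delete/substitute) over reference length.
    Simplified sketch: real TER also counts block shifts as single edits."""
    hyp, ref = hypothesis.split(), reference.split()
    # Standard Levenshtein dynamic program over word sequences.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # drop a hypothesis word
                          d[i][j - 1] + 1,          # insert a reference word
                          d[i - 1][j - 1] + cost)   # match or substitute
    return d[len(hyp)][len(ref)] / max(len(ref), 1)

# Example adapted from the I→E polarity example on slide 11.
print(ter_like_edit_rate(
    "I don't have at the moment thirty soldier trained",
    "I have at the moment thirty trained soldiers"))
```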

  6. Annotation Process
- Customize reference translations: NIST post-editing tool for HTER reference translations; 4 reference translations shown to post-editors
- Align and annotate translations with TER; annotators may change alignments, keeping word classes aligned where possible
- Annotate TER errors: identify the major word classes of errors, quantify polarity and speech act errors, exclude minor errors

  7. Sample Annotation [annotated example not reproduced in this transcript] *substituted speech act (takes priority over the “word order” annotation)

  8. Annotations
- Null [synonyms, articles, some prepositions/inflections]
- Word Order [= TER ‘shift’]
- Polarity (negative to positive or positive to negative)
- Substituted Speech Act (e.g., question to statement)
- Untranslated (transliterated, “???”)
- Verb (deleted, inserted, and substituted)
- Noun (deleted, inserted, and substituted)
- Pronoun (deleted, inserted, and substituted)
- Pronoun-Verb Complex [for English contractions and Arabic verbs with subject inflection only] (deleted, inserted, and substituted)
- Verb Person Inflection [substituted Arabic subject inflection]
- Other [adjectives, prepositions, conjunctions] (deleted, inserted, and substituted)
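As an illustration only (not the authors' annotation tool), a minimal sketch of how the TER edit types and the annotation categories above might be represented; all names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class EditType(Enum):            # TER edit operations
    INSERTION = "insertion"
    DELETION = "deletion"
    SUBSTITUTION = "substitution"
    SHIFT = "shift"              # annotated as Word Order

class ErrorCategory(Enum):       # annotation categories from slide 8
    NULL = "null"                # synonyms, articles, some prepositions/inflections
    WORD_ORDER = "word order"
    POLARITY = "polarity"
    SPEECH_ACT = "substituted speech act"
    UNTRANSLATED = "untranslated"
    VERB = "verb"
    NOUN = "noun"
    PRONOUN = "pronoun"
    PRONOUN_VERB = "pronoun-verb complex"
    VERB_PERSON_INFLECTION = "verb person inflection"
    OTHER = "other"              # adjectives, prepositions, conjunctions

@dataclass
class ErrorAnnotation:
    """One annotated TER edit in an aligned system-output/reference pair."""
    edit: EditType
    category: ErrorCategory
    hyp_tokens: list[str]        # system-output tokens involved
    ref_tokens: list[str]        # reference tokens involved

# Example: the missing copula from slide 15, annotated as a deleted verb.
example = ErrorAnnotation(EditType.DELETION, ErrorCategory.VERB, [], ["is"])
```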

  9. June I→E: Proportions of TER Error Types [chart: proportions by system]

  10. June I→E: Proportions of Word Class Errors [chart: proportions by system]

  11. I→E: Polarity Errors
Transcript: عندي حالياً ثلاثين جندي متدرب و عندهم أسلحة خفيفة
MT: I don't have at the moment thirty soldier trained and they have light weapons
Ref: I have at the moment thirty trained soldiers and they have light weapons
Transcript: لا و الله ما كان عندنا وقت كنا مستعجلين و محتاجين ناس يشتغلون بالعجل فما سوينا ما تشيكنا عليهم قبل
MT: no and god we do not have time we were in a hurry and we need people to work hurry up so we did nothing we checked them before
Ref: no we did not have time we were in a hurry and we need people to work immediately so we did not check them before

  12. June I→E Pronoun Issues: Subjects
- The frequencies of pronouns (19%) and nouns (17%) are nearly equal, yet pronoun errors are 2 times higher than noun errors
- In Iraqi Arabic, pronominal subjects are expressed only as verb inflection
  MT: was bitten by a scorpion
  Ref: he was bitten by a scorpion
- But some contrasts are neutralized: إفتهمت /iftahamit/ understand+past+1st or 2nd person singular subject, “I/you understood”
  MT: you see his symptoms
  Ref: I saw his symptoms

  13. I→E Pronoun Issues: Insertions
- Subject pronouns (few)
  MT: those people they store them in this complex
  Ref: those people store them in this complex
- Resumptive pronouns (frequent)
  MT: it is about three kilometers from the point the checkpoint that he ran away from it
  Ref: it is about three kilometers from the point the checkpoint that he ran away from
  MT: the area is four streets that will probably restrict it
  Ref: the area is four streets that we will probably surround
- These are annotated as non-null only if they might cause confusion, e.g., garden paths

  14. I→E Pronoun Issues: Gender
- Iraqi Arabic does not have a neuter gender
- Many examples with it instead of he or she
  MT: are taking care of it god willing and hopefully it will get better a little bit more
  Ref: we are taking care of him god willing and hopefully he will get better soon
  MT: of course I mean it is in good condition
  Ref: of course I mean she is in good condition
- Only one example of he instead of it
  MT: he civilian house consists of three rooms
  Ref: it is a civilian house consisting of three rooms

  15. I→E Verbs: English be vs. Arabic “be”
- English be serves several functions:
  They are eating at the restaurant (progressive)
  The car was driven by a teenager (passive)
  Sam is my brother (copula: identity)
  Julia is brilliant (copula: attribution)
- The Arabic copula is not used in the present tense
  MT: no sir all the family in the house
  Ref: no sir all the family is in the house
  MT: but the problem those lazy and sleep on the at night
  Ref: but the problem is they are lazy and sleep at night
- Many errors with be occur within more complex errors

  16. Proportion of be in June I→E Verb Errors [chart: proportions by system]

  17. June E→I: Proportions of TER Error Types [chart: proportions by system]

  18. June E→I: Proportions of Word Class Errors [chart: proportions by system]

  19. E→I: Subject-Verb Agreement Inflection
- With an expressed subject, subject inflection on the verb that does not agree may cause confusion
  Source: my marines are going to search the house
  Ref: المارينز مالتي رح يفتشون البيت
  MT: المارينز مالتي رح أفتش البيت
  Ref: AlmArynz mAlty rH yft$wn Albyt
  Ref: the-Marines my will 3m-search-pl the-house
  MT: AlmArynz mAlty rH >ft$ Albyt
  MT: the-Marines my will 1s-search the-house
- Special annotation for these errors: Verb Person Inflection
- Relatively high frequency, except in the rule-based system

  20. E→I: Pronominal Subject Inflection on Verbs
- Pronoun errors occur when subject inflection does not match the source subject pronoun
  Source: I might need to tell my commander I am stopping you
  Ref: يمكن لازم أقول للمسؤول مالي أوقفك
  MT: يمكن لازم تقول للمسؤول مالي نوقف
  Ref: ymkn lAzm >qwl llms&wl mAly >wqfk
  Ref: maybe must 1st-sg-say to-the-official my 1st-sg-stop-2nd-sg
  MT: ymkn lAzm tqwl llms&wl mAly nwqfk
  MT: maybe must 2m/3f-say to-the-official my 1st-pl-stop-2nd-sg
- Number errors are usually annotated as ‘null’ (marked in green in the original slide)
- Person errors dramatically change meaning (marked in red in the original slide)

  21. E→I: Both Subject and Verb are Incorrect
- With the pronominal subject unexpressed, a single verb may incorporate more than one significant error
  Source: we will record inside who it belongs to
  Ref: إحنا رح نسجل جوة هو مال منو
  MT: إحنا رح يقعد جوة منو مال
  Ref: <HnA rH nsjl jwp hw mAl mnw
  Ref: we will we-record inside he possession whom
  MT: <HnA rH yqEd jwp mnw mAl
  MT: we will he-sits inside whom possession
- Special annotation for Pronoun-Verb Complex: should count as both a pronoun and a verb error
- Low frequency

  22. E→I: Word Order Errors: Noun-Adjective
- Slightly more word order errors in E→I vs. I→E
- In both directions, a significant proportion of these reverse noun head and modifier order
  Source: they have additional supplies
  Ref: عندهم التجهيزات إضافية
  MT: عند إضافي التجهيزات
  Ref: Endhm AltjhyzAt <DAfyp
  Ref: at+them det+supplies additional+fem
  MT: End <DAfy AltjhyzAt
  MT: with additional det+supplies

  23. E→I: Word Order Errors: Noun-Noun
- This is the Arabic noun-noun modification known as the construct or idafa
  Source: How does your source know this?
  Ref: شلون المصدر مالتك عرف بهذا الشيء
  MT: شلون مالتك مصدر أعرف هذا
  Ref: $lwn AlmSdr mAltk Erf bh*A Al$y’
  Ref: how det-source poss-2sm 3s-know in-this det-thing
  MT: $lwn mAltk mSdr >Erf h*A
  MT: how poss-2sm source 1s-know this

  24. E→I: Word Order Errors in Idafa
- 40% of November 2008 E→I word order errors are wrong idafa order
  Source: I came to see your power station
  Ref: إجيت علمود أشوف محطة الكهرباء مالتكم
  MT: إجيت علمود أشوف الكهرباء المحطة مالتكم
  Ref: <jyt Elmwd >$wf mHTp AlkhrbA’ mAltkm
  Ref: came+1s in-order-to see+1s station-of det+electricity poss-2p
  MT: <jyt Elmwd >$wf AlkhrbA’ AlmHTp mAltkm
  MT: came+1s in-order-to see+1s det+electricity det+station poss-2p

  25. E→I: “Other” Errors from Phrasal Verbs
- Phrasal verbs are frequently treated as verbs plus prepositions
  Source: we have to go through the detaining process
  Ref: لازم نسوي عملية الحجز
  MT: لازم نروح عن طريق الحجز العملية
  Ref: lAzm nswy Emlyp AlHjz
  Ref: must 1pl-do +def-process the-detention
  MT: lAzm nrwH En Tryq AlHjz AlEmlyp
  MT: must 1pl-go from road the-detention the-process
- The English source "to go through" here roughly means "to do from start to finish"
- The MT translated it as "motion through" or "to take a certain route"
- This is a type of word sense error

  26. E→I: “Other” Multiword Expression Errors
- 23% of “Other” errors involve multiword expressions in the November 2008 corpus
  Source: we can give you funds to where you can go out and buy the materials
  Ref: نقدر ننطيك الفلوس علمود تطلع وتشتري المواد
  MT: أقدر الفلوس وين تطلع وتشتري المواد
  Ref: nqdr nnTyk Alflws Elmwd tTlE wt$try AlmwAd
  Ref: can+1p 1p+give+2ms det+money in-order-to 2ms+go-up and+2ms+buy det+material
  MT: >qdr Alflws wyn tTlE wt$try AlmwAd
  MT: can+1s det+money where 2ms+go-up and+2ms+buy det+material

  27. June I→E: Error Type Proportions by Word Class

  28. November I→E: Error Type Proportions by Word Class

  29. I→E: Other Error Proportions

  30. Error Frequencies and BLEU Scores [table: I→E and E→I; *raw frequency / normalized per input; not reproduced in this transcript]
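As an illustration of the normalization noted in the table footnote (a sketch with hypothetical names, not the authors' code), raw error counts can be reported alongside counts normalized per annotated input:

```python
from collections import Counter

def error_report(annotations: list[str], num_inputs: int) -> dict[str, tuple[int, float]]:
    """Map each error category to (raw frequency, frequency normalized per input)."""
    counts = Counter(annotations)
    return {cat: (n, n / num_inputs) for cat, n in counts.items()}

# Hypothetical example: 3 annotated errors over 2 inputs.
print(error_report(["pronoun", "verb", "pronoun"], num_inputs=2))
# {'pronoun': (2, 1.0), 'verb': (1, 0.5)}
```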

  31. Conclusions
- Linguistic differences will always challenge translation systems
- Some differences are difficult even for high-frequency expressions like the copula: the need to insert lexemes not present in the source, or to remove lexemes that are present in the source; these are also characteristics of multiword expressions
- Discourse context is needed for deictic elements like pronouns: Iraqi Arabic speakers know from context whether the speaker is referring to “I” or “you”, and knowing whether to translate Arabic “he” or “she” as “it” requires knowledge of the referent of the pronoun

  32. Future Work
- Compute relative weight of error types: compare to human judgments collected by NIST
- Compute regression tests: compare July 2007 with November 2008 translations
- Additional subcategories of errors

  33. Word Sense Ambiguities
- June 2008: I→E averaged .021, E→I averaged .032
- These are low compared to Vilar et al. (2006)
- After analysis of the November E→I “Other” errors, annotators were more sensitive to a broader class of word sense errors
- November E→I is about 10%, comparable to Vilar et al. (2006)
- November I→E word sense analysis is incomplete

  34. Inter-Annotator Reliability
- English annotation performed by 3 native speakers: June 2008 annotated independently; November 2008 each annotated twice and differences resolved
- 3 Arabic annotators: 2 non-native speakers and 1 native speaker; half annotated by each non-native speaker; all annotations reviewed by the native speaker and differences resolved
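For illustration of the double-annotation workflow described above (a sketch with hypothetical names, not the authors' tooling), disagreements between two annotation passes can be located for adjudication like this:

```python
def find_disagreements(pass_a: list[str], pass_b: list[str]) -> list[int]:
    """Indices where two annotation passes over the same aligned edits differ,
    i.e., the items that need to be resolved by the annotators."""
    return [i for i, (a, b) in enumerate(zip(pass_a, pass_b)) if a != b]

# Hypothetical double annotation of four aligned edits; only index 1 disagrees.
print(find_disagreements(["verb", "null", "pronoun", "other"],
                         ["verb", "noun", "pronoun", "other"]))  # [1]
```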
