
A Universal Approach to Building QA Models


Presentation Transcript


  1. A Universal Approach to Building QA Models Leonid Glazychev, Logrus International Corporation

  2. QA model: general considerations • Reflecting perception and priorities of the target audience • Concentrating on factors producing the strongest impression • Separating global and local factors/issues • Universal applicability • Covering the whole spectrum of materials • From slightly post-edited MT to ultra-polished manual translations • Same approach for knowledge bases and marketing leaflets • Common approach • Only adjusting acceptance criteria/thresholds based on expectations • Viability • Clear, not overly complicated • Process-oriented, i.e. applicable in the real world • Flexibility • Concentrating on methodology • Particular criteria/issue classification can be taken from elsewhere, for instance: • Based on MQM or other public source • Based on legacy client-sourced criteria…

  3. Real-Life Scenario: No Reference Translations Available • Two major criteria for any translation are always • Adequacy (correctly conveys the meaning) and Fluency (readability) • NEITHER of these depends on translation origin, target audience, brand impact, etc. • No need to delve into technical details or error counts if the text is • Unreadable (incomprehensible) or • Inadequate (inaccurate) • Acceptance thresholds depend on a number of parameters • Goals • Target audience • Speed • Expected longevity and brand impact, etc. • Assessment is relatively quick • Often scanning through the text is sufficient • Especially so when quality is really low • One needs to be bilingual or have a bilingual expert ready just in case

  4. Making Real-Life LQA as Objective as Possible • NEITHER of the two major criteria is completely objective • An expert panel would produce a normal opinion curve around the average value • In real life there is no expert panel, but a single evaluator! • The grade assigned by this particular person will NOT be arbitrary, but… • It might fall anywhere within the standard ±2σ range • It depends on the individual’s taste, background, etc. • That is why both criteria can be called SEMI-OBJECTIVE or EXPERT OPINION-BASED • Both criteria are NOT too accurate by design! • Consequences • EACH of these two major criteria should be evaluated SEPARATELY • Accurate but incomprehensible texts are as useless as fluent but inadequate ones • Two independent “coordinates” that can’t be combined mechanically • EACH should be evaluated on a threshold-based PASS/FAIL basis • The acceptance range needs to accommodate the whole spectrum of potential expert opinions • Marketing text: between 8 and 10 (10-point scale) • Knowledge base: between 5 and 8 (10-point scale) • The minimal scale to be used is a 10-point one, to accommodate the normal curve properly • Smaller scales just do not provide sufficient granularity • The acceptance threshold is defined by the area, visibility of materials, time constraints, target audience, etc.
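To make the gating concrete, here is a minimal Python sketch of threshold-based PASS/FAIL acceptance. The marketing and knowledge-base thresholds are taken from the slide’s examples; the profile names and everything else are assumptions for illustration only.

```python
# Acceptance thresholds on a 10-point scale. The two values come from
# the slide's examples; any other content profile gets its own threshold.
ACCEPTANCE_THRESHOLDS = {
    "marketing": 8,       # highly visible text: 8-10 acceptable
    "knowledge_base": 5,  # utilitarian text: 5-8 acceptable
}

def passes(grade: float, profile: str) -> bool:
    """Threshold-based PASS/FAIL for ONE semi-objective criterion
    (adequacy or fluency); each criterion is gated separately."""
    return grade >= ACCEPTANCE_THRESHOLDS[profile]

# The two criteria are independent coordinates and are never averaged:
adequacy_pass = passes(7.0, "knowledge_base")  # True
fluency_pass = passes(4.5, "knowledge_base")   # False -> the text fails
```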

  5. The Technical Factor • Only content that passes on both accounts is further analyzed for technical imperfections • Terminology inconsistency or deviations • Style guides, country standards • Tags, placeholders • Formatting • Technical issues are OBJECTIVE • Grades are expected to be similar irrespective of the reviewer’s personality • A typo is still a typo • An error in country standards is still an error anyway • Issue categories can be based on • MQM or other public source • Legacy client-sourced criteria • Error weights and acceptance thresholds depend on multiple factors • Expectations, target audience, time, brand impact, etc. • Each “quality vector” contains error weights for each category and acceptance levels • A limited number of “quality vectors” cover the whole spectrum • The resulting technical (objective) quality grade is the third apex of the quality triangle
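A sketch of what such a “quality vector” could look like in code. The category names and weights below are invented (real ones would come from MQM or legacy client criteria), and the penalty-to-grade mapping is only one plausible choice, not the method prescribed by the deck.

```python
# Hypothetical quality vector: per-category error weights plus an
# acceptance level. Real values depend on audience, brand impact, etc.
WEIGHTS = {
    "terminology": 2.0,
    "country_standards": 1.5,
    "tags_placeholders": 3.0,
    "formatting": 1.0,
}
MAX_PENALTY_PER_1000_WORDS = 10.0  # acceptance level (assumed form)

def technical_grade(error_counts: dict, word_count: int) -> float:
    """Weighted, volume-normalized error penalty mapped to a 0-10 grade."""
    penalty = sum(WEIGHTS[cat] * n for cat, n in error_counts.items())
    per_1000 = penalty * 1000 / word_count
    return max(0.0, 10.0 * (1.0 - per_1000 / MAX_PENALTY_PER_1000_WORDS))

# Example: 3 terminology and 1 formatting error in a 2000-word text.
print(technical_grade({"terminology": 3, "formatting": 1}, 2000))  # 6.5
```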

  6. The Quality Triangle (or Square) [Diagram: Adequacy, Fluency, and Technical quality as the apexes, with Major Errors as the optional fourth vertex; acceptance-range filters apply to each.]

  7. Case Study: US ACA Spanish Website Review • Organized by GALA (Globalization and Localization Association, www.gala-global.org) • Logrus developed and provided the methodology • Logrus organized the review and provided analytics • Volunteer effort, crowdsourcing-based approach • Complicated special rules, strict definitions, lengthy training, etc. were out of the question • Contributors were chosen among language professionals only • Simplified “quality square” methodology applied • Major errors (10 = none, 0 = more than 2) • Readability (fluency, 0–10) • Adequacy (accuracy, 0–10) • Technical (0–10) • 18 language pros reviewed the website: www.CuidadoDeSalud.gov

  8. Case Study: US ACA Spanish Website Review (II) • Major errors: none (11), more than 2 (7), 1 grade ignored • Results • Readability / Intelligibility: mean value 6.2, std. deviation 2.1 • Adequacy / Accuracy: mean value 6.6, std. deviation 1.9 • Takeaways • Not too objective! • YOUR reviewer could contribute to ANY of the bars • Only threshold-based criteria really work
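For reference, the aggregation behind these figures is just a mean and standard deviation over individual grades. The sketch below uses invented grades; only the aggregation method, not the data, mirrors the case study.

```python
from statistics import mean, stdev

# 18 invented reviewer grades on a 0-10 scale; the actual case-study
# grades are not reproduced here, only the aggregation method.
readability = [8, 6, 4, 7, 5, 9, 6, 3, 7, 6, 8, 5, 6, 7, 4, 8, 6, 7]

m, s = mean(readability), stdev(readability)
print(f"Mean value: {m:.1f}, Std. deviation: {s:.1f}")
# Any SINGLE reviewer may land anywhere within roughly m +/- 2*s,
# which is why a lone grade is only semi-objective.
```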

  9. Case Study: US ACA Spanish Website Review (III) • Biggest opinion spread for technical errors • Technical issues: mean value 4.8, std. deviation 2.3 • Illustrates the gap between professional and crowdsourced work • No detailed criteria or training applied • Should be the most objective factor • Overall review results still quite reliable/convincing • Not a big surprise given the website’s initial quality… • “Obamacare’s poorly translated Spanish website frustrates users”, AP, January 12, 2014

  10. Why Semi-Objective and Objective Factors Should Not Be Combined • Scope and nature • Objective factors are “local”; each applies to a particular small segment (sentence) • Semi-objective factors typically apply to the text as a whole or its large chunks • Semi-objective evaluations are imprecise by definition and can’t be used in formulas • Natural variation might affect the summary score dramatically • Importance/weight • Adequacy and fluency issues are way more important than most others • Their relative weight will exceed everything else by orders of magnitude • A combined summary result is too dependent on adequacy/fluency • Almost no sensitivity to other factors • Cost, time, viability • No reason to waste time on counting/grading technical errors for an incomprehensible or incorrect text
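A toy numeric illustration of the weighting problem described above; all weights and grades are invented.

```python
# If adequacy/fluency enter one weighted formula, their weights dwarf
# everything else and the combined score barely reacts to technical
# quality. All numbers here are invented for illustration.
W_ADQ, W_FLU, W_TECH = 100, 100, 1

def combined(adq: float, flu: float, tech: float) -> float:
    total = W_ADQ + W_FLU + W_TECH
    return (W_ADQ * adq + W_FLU * flu + W_TECH * tech) / total

print(combined(8, 8, 10))  # ~8.01
print(combined(8, 8, 0))   # ~7.96 -> technical quality is nearly invisible
```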

  11. The “Quality Triangle/Square” Approach Recipe • Preparation • Select/build the appropriate issue classification for objective errors • Select/set the acceptance thresholds and error weights vector • Define show-stoppers • Process • Apply expert opinion-based (semi-objective) criteria with a PASS/FAIL result • Adequacy (accuracy) • Fluency (readability) • Apply objective criteria based on error classification/typology (acceptable docs only) • Language (spelling & grammar) • References, lack of translation, over-/under-translations • Country and other standards • Terminology, style guide and explicit client guidelines • Tags, placeholders, formatting, etc. • Ignore subjective complaints • Obtain 3 or 4 resulting ratings for each reasonably translated document • Adequacy (accuracy) • Fluency (readability) • Objective (technical) error rating • [Major problems]
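Pulling the recipe together, a sketch of the overall flow. The `threshold` default and the `error_counts`/`word_count` parameters are assumptions, and `technical_grade` is the hypothetical function from the quality-vector sketch above, not part of the deck itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LQAResult:
    major_errors: int            # show-stoppers found
    adequacy_pass: bool          # semi-objective gate 1
    fluency_pass: bool           # semi-objective gate 2
    technical: Optional[float]   # graded only for acceptable docs

def evaluate(adequacy: float, fluency: float, major_errors: int,
             error_counts: dict, word_count: int,
             threshold: float = 5.0) -> LQAResult:
    """Quality triangle/square: PASS/FAIL gates first, then objective
    error grading only for documents that pass on both accounts."""
    a_ok, f_ok = adequacy >= threshold, fluency >= threshold
    technical = None
    if major_errors == 0 and a_ok and f_ok:
        # Reuses technical_grade() from the earlier quality-vector sketch.
        technical = technical_grade(error_counts, word_count)
    return LQAResult(major_errors, a_ok, f_ok, technical)

# Example: a clean 2000-word doc with a few technical issues.
print(evaluate(7, 6, 0, {"terminology": 3, "formatting": 1}, 2000))
```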

  12. Summary • The QA approach is equally applicable to almost all real-life translations (without an existing reference) • Works for MT post-editing or even raw MT output • Complements the MQM back-end by providing the methodology for quality assurance • The only things that need to be chosen or fine-tuned are • The issue catalogue (for objective issues/errors) • The vector comprising all acceptance thresholds and error weights • Can be chosen from a limited number of preset templates (content profiles) • See concept details in tcworld, February 2012, “Of power adapters and language quality assurance”: http://www.tcworld.info/tcworld/translation-and-localization/article/of-power-adapters-and-language-quality-assurance/

  13. Separate Case: REFERENCE Translations Available • There are plenty of time/money-saving, automated methods to get a ballpark quality evaluation • The applicability area is narrowed dramatically: • Comparing different MT engines or different versions of the same MT • Evaluating test translations • Results might be quick and cheap, but • They are not directly related to the quality of the translation • Rather, they illustrate the translation’s closeness to the benchmark • Can be used for developing/improving MT engines or quickly evaluating new translators/students • Very limited usability for real-life translation scenarios
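As a toy illustration of why such reference-based scores measure closeness rather than quality, here is a minimal BLEU-style n-gram precision. It is a deliberately simplified stand-in, not any production metric from the deck.

```python
from collections import Counter

def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Toy BLEU-style n-gram precision against a single reference."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(1, sum(cand.values()))

# A fluent, adequate translation phrased differently from the reference
# still scores low: the metric rewards similarity, not quality.
print(ngram_precision("the device must be switched off first",
                      "first switch the device off"))  # ~0.17
```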
