Explore the experiences and strategies behind managing the IBM brand in today’s new era of social translation services. Discover how to optimize language assets, leverage machine translation, and make smart decisions using MT business analytics.
Experiences in Managing IBM Brand… in today’s new era of Social Translation Services Frank X. Rojas, Christophe Chenon, Jacques Levy, Helena Chapman, Saroj K. Vohra June 2012
Experiences in Managing IBM Brand … in today’s new era of Social Translation Services • Introduction • Standalone MT, Human and MT optimized Translation Services • Optimizing Language Assets to Maximize Quality Production • Using MT Business Analytics to make Smart Decisions • Conclusion / Recommendations
IBM World Wide Translation Operations: One Stop Shop for all Translation Services • 24 Centers World Wide • ~115 Translation Suppliers • Process ~2.8 B Words • Translate ~0.4 B Words • ~60 language pairs • [Diagram labels: Marketing Material; Machine Translation; Legal/Safety/Contracts; Multimedia; Overall End to End Product Integrated Publications Process Information Management; Francization; Centralized DTP; Cultural Consultancy; Web]
Historical Perspective • RTTS introduced ‘06 (speech + text), IBM Research • 2012 MT Scaling G1, G2 and G3 Languages Partners: n.Fluent, vendor MT MT 40.0 M words target • 2015 2011 MT Training Pilot: GER, BPR, JPN, CHS New MT payment factors 18.0 M words • 2013 MT Optimization Pilots …… Deployment • 2010 MT piloting Pilot: SPA, ITA, FRE, GER Integrated TM+MT process Partnership: WWTO + n.Fluent 8.6 M words • 2012 2012 MT Optimization Pilots Quality filters TryScience Other partners… • 2011 Initial n.Fluent/WWTO Spanish MT pilot Improve efficiency of translators Start MT Deployment Crowd Source Pilot • 2010 n.Fluent + WWTO translation memories Vendor MT Pilot RTTS licensed to IBM partners • 2009 Pilot SMT w/ Bus. Analytics eSupport “Translate This Page” switch to n.Fluent Have it your way Cost-Quality balance • 2008 2007 Rule Based MT - MT portal - Generic crowdsourcing - Text translation services June 2008 • 2006 eSupport (www) “Translate This Page” JPN – WPS MT Rules Model
Real-Time Translation Server (RTTS) & n.Fluent • [Chart: Quality (BLEU), axis 0 to 0.50, versus training Words: Base, 29k, 180k, 350k] • Real-Time Translation Server (RTTS) • IBM's MT engine • RTTS provides machine translation for n.Fluent and other applications • APIs allow other applications to access these translation services • Customization tools – Domains, chat-specific models, … • Commercially licensed to IBM partners • Language pairs to/from English: Français, العربية, Deutsch, 日本語, 한국어, 中文, Русский, Español, Italiano, Português • n.Fluent • IBM's MT translation application • Provides machine translation services for: • Text, web pages, and documents (Word, Excel, …) • Instant Messaging chats (via IM plug-in) • Mobile translation application (BlackBerry and others) • Enabled with LEARNING via crowdsourcing (internal: 450K IBMers) • Deployed for eSupport self-service tech support (external)
New Era of Social Translation Services • Continuous transformation of language services • Meet new language needs demanded by growth markets • Scale language services to the ever-growing variety of untranslated enterprise content • Offer the right service models that balance cost and quality/time • Protect IBM Brand (assets) while delivering flexible services in today's new social world • A multi-pronged strategy is required to address large-volume translations at lower costs with appropriate quality… it’s not just crowd sourcing • Flexible & Efficient Language Services: Social Translation; Translation Asset Optimization; Translation Automation
WW Translation Operations for Smarter Planet: New Era of Translation • Integration of linguistic skills across the entire smarter planet • Translation components of the new era: Process / Operational Efficiency; Human Linguistic Skills; Human:Machine Balancing; Machine Translation; Social Crowd Sourcing; Translation Automation; Language Assets
IBM Enterprise Translation Services • Enterprise Content Portfolio: different kinds of content fit into different translation services • Goal: optimal quality/cost balance, the right translation service for the right cost/quality • Human Services: traditional, IBM Standard Quality; TM + MT + 100% post-edit • MT Optimization Service: MT + TM + x% post-edit • Stand-alone MT: 0% post-edit • [Chart: Quality, Time versus Cost, with a region labelled Unacceptable Quality]
IBM Confidential
Have it your way.... flexible WWTO Translation Services
Translation KPIs (Cost, Quality and Time) drive model selection
Stand-alone MT • Has its strengths and weaknesses • Depends on the quality of bilingual corpuses/rules • Good for entry-point, internal consumption, real-time and non-critical content • KPIs: Quality: inconsistent, varies, minimum; Cost: lowest; Time: quickest
Human Translation • Traditional human linguists • Established and proven quality system • Flexible, yet human intensive • Brand critical content • KPIs: Quality: highest, IBM Standard; Cost: competitive, highest; Time: reliable
MT-Optimized • Multi-facet integration of machine/human/social translation • Leverages translation assets targeted to the domains of the customer • Wide space for opportunity • Reliable and sustainable model • KPIs: Quality: right quality, controllable; Cost: right cost; Time: reliable
Domain MT Models: Work Flow Drives Translation Quality • 1. Start with supply of rich IBM generic MT model – many domains | single language • 2. Create and train a domain MT model – apply IBM's rich set of domain TM assets per project • 3. Dynamically tune with the richest set of domain TM assets – global memory search • 4. Produce High Quality MT matches – per shipment | leverage the latest context • 5. Feed Quality TM & MT Business Analytics – high quality corpus | certified IBM quality | MT BI • [Diagram labels: New/Changed Content; Previous Memory (Pre-certified Quality), X% exact matches; Global Memory Search; Domain Learning Memories (Qualified TM Assets); High Quality TM matches; High Quality MT matches; Domain MT Model (real-time training, custom filters, bus. analysis); IBM Generic MT Model (per language); Human Post Edit (IBM Brand Quality); BRAND; MT Business Analytics]
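The work flow on this slide can be read as a per-segment routing rule: reuse an exact memory match when one exists, otherwise ask the domain MT model for a proposal, and store the post-edited result for later shipments. The following is only a sketch under stated assumptions: a plain dictionary stands in for the previous memory, a caller-supplied function stands in for the domain MT model, and the names `Proposal`, `route_segments`, and `feed_back` are hypothetical, not part of any IBM tool.

```python
from typing import Callable, Dict, List, NamedTuple


class Proposal(NamedTuple):
    source: str     # source-language segment
    origin: str     # "TM" for an exact memory match, "MT" for a machine proposal
    candidate: str  # proposed translation handed to the human post-editor


def route_segments(
    segments: List[str],
    previous_memory: Dict[str, str],
    mt_translate: Callable[[str], str],
) -> List[Proposal]:
    """Reuse exact TM matches; send only new/changed segments to MT."""
    proposals = []
    for seg in segments:
        if seg in previous_memory:
            proposals.append(Proposal(seg, "TM", previous_memory[seg]))
        else:
            proposals.append(Proposal(seg, "MT", mt_translate(seg)))
    return proposals


def feed_back(memory: Dict[str, str], source: str, post_edited: str) -> None:
    """Store the human post-edited translation so later shipments can reuse it."""
    memory[source] = post_edited
```

In this sketch every proposal, whether from TM or MT, is still meant to pass through human post-editing before `feed_back` is called, which mirrors the slide's point that only certified output should re-enter the memory.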
Quality and Terminology in MT • Domain-based Terminology • MT system trained on • Domain translation memories • Domain-specific terminology dictionaries (high priority) • Final Quality Assessment • Human • Regular quality control • Real-time assessment of MT output • Automated quality assessment – pre/post edit • Edit distance: MT segment vs. TM post-edit segments • Assessment of translated terminology • Automated comparison with domain-specific terminology dictionary • Edit distance: MT segment vs. final qualified post-edit translation • [Workflow step callouts on slide: 2, 5; Future: 3, 4, 5]
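The slide names edit distance between an MT segment and its post-edited version as an automated quality signal but does not say how it is computed. A minimal sketch, assuming word-level Levenshtein distance normalized by the longer segment; both the granularity and the normalization are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def normalized_edit_distance(mt_segment: str, post_edit: str) -> float:
    """0.0 means the MT output was kept as-is; 1.0 means fully rewritten."""
    mt_tokens = mt_segment.split()
    pe_tokens = post_edit.split()
    longest = max(len(mt_tokens), len(pe_tokens))
    if longest == 0:
        return 0.0
    return edit_distance(mt_tokens, pe_tokens) / longest
```

For example, `normalized_edit_distance("the file was saved", "the file has been saved")` counts one substitution and one insertion over five tokens. A low value suggests the post-editor accepted most of the MT proposal.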
Key Guidelines for Translation Automation, Asset Optimization • Translation Asset Optimization: Global memory search; MT Optimization Pilots – learning new methods/skills; End to End integration (Operational Excellence; Optimize Collaboration; Human element → Rich Supplier base); MT business analytics (Competitive payment factors); Enterprise Domain Management • Translation Automation: MT agnostic (Scale MT Services: enterprise multi-supplier operations; Internal – n.Fluent; External – vendor MT Services); Piloting MT integration in new environments • [Workflow step callouts on slide: 1-4, 5]
CS 1: Internal Crowd w/ Generic MT Work Flow • 2 pilots in 2011 via “internal crowds” • MODEL: Bringing content to a “local crowd” for the single purpose of saving translation cost cannot be sustained as a reliable service. • “Vibrant content” raises demand for external consumer crowd sourcing. • “Legacy content” will struggle to sustain and deliver reliable translation services. • Crowd sourcing remains an available tool for certain content. • Yet there is little or no definition of how to measure the efficiency of crowd services. • [Diagram labels: Content (100% new/changed); Internal Crowds; IBM Generic MT Model (per language); Quality Control; BRAND; Production / Delivery]
Key Guidelines for Social Translation • Translation Asset Optimization: Global memory search; MT Optimization Pilots – learning new methods/skills; Enterprise Domain Management • Translation Automation: MT agnostic (Scale MT Services: enterprise multi-supplier operations; Internal – n.Fluent; External – vendor MT Services); Piloting MT integration in new environments • Social Translation: Determine appropriate content and channels; Provide guidance to set up execution models; Support mechanisms for quality, terminology management
CS 5: Domain MT Models + Crowd Sourcing • Vision of MT Optimization Service applied to crowd sourcing. • Domain MT models • Continuous real-time training/tuning – leveraging batch memory assets. • TM assets drive production quality via Domain MT model. • [Diagram labels: Content (100% new/changed); External Consumer; Social Environment; Domain MT Model; Validation; Quality Control; BRAND; MT Business Analytics; Production / Delivery]
MT Business Analysis: Measure the amount of corrections being done • 2012Q1 Summary • ALL: 3.9 M events / 35.0 M words • MT: 0.7 M events / 6.8 M words • Best Case: TM – Exact English (exact English found in previous TMs): 28% of words take 5% of time in seconds (5.60 words/sec) • MT Best Match: 18% of words take 25% of time in seconds (0.72 words/sec), a 55% improvement over the worst-case base • Worst Case Base: Manual Translation (no previous TM found / MT withheld [placebo]): 13% of words take 28% of time in seconds (0.46 words/sec) • [Chart: WORDS and TIME by Payment Types]
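The throughput figures on this slide can be checked from the quoted shares alone. Dividing each category's share of words by its share of time gives its speed relative to the overall average, and those ratios happen to match the slide's words/sec values, which would hold if the overall average were about 1 word/sec. That reading is an inference from the numbers, not something the slide states, and `relative_rate` is a hypothetical helper.

```python
def relative_rate(word_share: float, time_share: float) -> float:
    """Throughput relative to the overall average: share of words / share of time."""
    if time_share <= 0:
        raise ValueError("time_share must be positive")
    return word_share / time_share


# Shares quoted on the slide for 2012Q1 (percent of words, percent of time).
tm_exact = relative_rate(28, 5)   # TM exact English matches
mt_best = relative_rate(18, 25)   # MT best match
manual = relative_rate(13, 28)    # no previous TM found, MT withheld

# Improvement of MT post-editing over the manual baseline.
improvement = mt_best / manual - 1.0

print(f"TM exact: {tm_exact:.2f}  MT best: {mt_best:.2f}  manual: {manual:.2f}")
print(f"MT improvement over manual: {improvement:.0%}")
```

Because the improvement is a ratio of two rates, it does not depend on the overall average speed, so the 55% figure follows from the shares regardless of that assumption.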
MT Business Analytics: MT Leverage Indicator for MT Post-Editing • 90% of MT matches need editing • 2012Q1 Summary (All Languages) • TM – Exact English: human leverages 83.3% of matches • TM – Approx. English: human leverages 25.9% of matches • MT – Best Match: human leverages 39.0% of matches • B. Portuguese: 82.0% • Polish: 54.3% • [Chart: % Match Accepted/Rejected for TM – Exact English, TM – Approx. English, MT – Best Match w/ MT Match Assist]
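The leverage indicator is the share of offered matches that the translator actually uses, reported per match type. A minimal sketch of that aggregation, assuming each logged event is a pair of match type and an accepted flag; the event layout and the name `leverage_by_match_type` are assumptions, and the sample events in the usage note are made up for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def leverage_by_match_type(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per match type, the fraction of offered matches that were accepted."""
    offered: Dict[str, int] = defaultdict(int)
    accepted: Dict[str, int] = defaultdict(int)
    for match_type, was_accepted in events:
        offered[match_type] += 1
        if was_accepted:
            accepted[match_type] += 1
    return {match_type: accepted[match_type] / offered[match_type]
            for match_type in offered}
```

Called with illustrative events such as `[("MT", True), ("MT", False), ("TM-exact", True)]`, it reports a leverage of 0.5 for MT and 1.0 for exact TM matches.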
MT Business Analytics: Summary • MT corpuses involve BIG data • Golden Rule: Quality In = Quality Out • Continuous measurement is critical to running an enterprise automated translation service • MT Business Analytics enables us to balance savings across components. • Human translators, translation vendors, MT service, others. • Translation business is in transformation; MT business analytics enables smart decisions. • Scale MT services across multiple suppliers / languages • Horizon/Vision for MT business analytics – delivering the richest domain language assets • “Pinpoint” feedback and real-time measurement of noise within IBM translation assets/models • Terminology genre aligned with MT BI ==> drive corpuses with higher quality • MT Business Intelligence to help decide which custom filters are needed
Pay for MT Words Translated, not MT Matches • We pay for final results (MT payable words), not MT matches • MT matches are considered “opinion” until chosen by a human • Too many opinions, and opinions from immature MT models, are less efficient. • Actual MT payable words have value beyond the specific project • Post-edited words are reused in future and unknown MT contexts • Engine has to deliver consistent MT payable words • Minimum needed to qualify an MT engine for compensation • High MT productivity [rate(MT) / rate(NP)] • High MT leverage [% of MT matches used] • Compensation to be based on MT payment factor
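The two qualification measures on this slide are simple ratios. The sketch below shows one way to gate an engine on them. The slide does not expand NP, so it is treated here as the baseline rate without an MT proposal, which is an assumption. The threshold values and the names `EngineStats` and `qualifies` are hypothetical, and the sketch does not attempt the MT payment factor, which the slide does not define.

```python
from dataclasses import dataclass


@dataclass
class EngineStats:
    rate_mt: float   # words/sec when post-editing MT matches
    rate_np: float   # words/sec for the NP baseline (assumed: no MT proposal)
    leverage: float  # fraction of MT matches used by the translator, 0.0 to 1.0

    @property
    def productivity(self) -> float:
        """MT productivity as defined on the slide: rate(MT) / rate(NP)."""
        return self.rate_mt / self.rate_np


def qualifies(stats: EngineStats,
              min_productivity: float = 1.2,  # hypothetical threshold
              min_leverage: float = 0.3) -> bool:  # hypothetical threshold
    """An engine qualifies only if both productivity and leverage are high enough."""
    return (stats.productivity >= min_productivity
            and stats.leverage >= min_leverage)
```

With the all-language figures quoted elsewhere in the deck (0.72 and 0.46 words/sec, 39.0% leverage), `qualifies(EngineStats(0.72, 0.46, 0.39))` passes these illustrative thresholds; real thresholds would have to come from the payment-factor policy.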
Variance across Languages • There is no single maturity path when modeling MT engines across many languages. • IBM Pilot: each trained MT engine is a unique asset. • Some languages require more modeling/tuning than others. • Language pairs that service “Loose -> Structured” languages are struggling • German requires more effort than Spanish • Are there limitations to statistical MT engines? • New thinking may need to be explored? • Each MT engine will have separate MT payment factors.
Key Lessons / Recommendations • MT Post Editing is the driver for quality in Translation Automation. • General MT business analytics will become the key to drive rates for each component within the range of language services. • Continuous business analytics is needed to optimize machine assets. • Enterprises will drive to deliver more advanced “MT Optimization Services” that integrate all sorts of commodity technologies (e.g. leveraging a fleet of machines). • Social (Crowd Sourcing) Lessons • Stand-alone MT is not ready for reliable delivery of translated content. • Brand requires reliable and sustainable quality. • Volunteer/community crowd sourcing is not sustainable as “production” of translations. • Volunteer/community crowd sourcing is a viable option as a ‘quality control’ tool. • MT Post Edit Lessons • Human translation memories (TM) are the best assets and deliver the highest quality. • IBM translation memories are a key asset for MT success. • All memory assets need to be protected and managed; avoid introduction of noise. • Domain MT models offer a significant advantage over generic MT models.
Thank You (English) • Merci (French) • Terima Kasih (Indonesian and Malay) • Gracias (Spanish) • Obrigado (Brazilian Portuguese) • go raibh maith agat (Gaelic) • Grazie (Italian) • Salamat (Tagalog) • Danke (German) • [Also shown in native script: Hebrew, Arabic, Sinhala, Hindi, Chinese (Traditional script), Russian, Chinese (Simplified script), Tamil, Korean, Japanese, Thai]