Microsoft Translator
William Lewis (wilewis@microsoft.com)
Kites Symposium, October 31, 2013, Helsinki, Finland
Overview
• Introduction to Microsoft Translator: tools, products, etc.
• Extent of localization: methods of applying MT
• Collaborative MT
• Assessing quality
• Application in Knowledge Base
• Building your own MT
• Collaboration with language communities
Why MT? The purpose
The crude:
• Extent of localization
• Data mining & business intelligence
• Globalized NLP
• Triage for human translation
• Research: machine learning, statistical linguistics
• Same-language translation
The good:
• Breaking down language barriers: text, speech, images & video
• Language preservation
NOT:
• Spend less money
• Take the job of human translators
• Perform miracles
Microsoft Translator: Quick Facts
• Linguistically informed statistical MT system
• 41 languages, from any language to any other language
• Runs in Microsoft datacenters
• Simple web service API: SOAP, REST, AJAX, OData, website widget
• 2 million characters/month free
• Available through the Enterprise Agreement or as a monthly subscription
• For extreme confidentiality situations, available on-premises
• Highly customizable:
  • Collaborative Translations: involve community, coworkers and customers
  • Hub: custom engine training via an easy-to-use UI
• Web scale:
  • Powers translations in Bing, Microsoft Office, Microsoft SharePoint, Internet Explorer, Yammer
  • Powers translations in Facebook, Twitter, eBay, and many other government and enterprise sites
Microsoft Translator at a Glance
• World-class statistical machine translation: built on over a decade of work at Microsoft Research
• Big-data powered: trained with billions of “parallel” sentences (from the Bing index and licensed sources)
• General-purpose system: powers Bing Translator, supports 40+ languages, any-to-any
• Unprecedented customization capability: Hub (train before translation) + CTF (edit after translation)
• Powerful cloud API: rich, secure API enabling integrations, 99.9% availability
Enabling Translation in Many Products
Fully integrated across the stack, Translator extends the value of the Microsoft platform, and of the solutions our customers build on it, including consumer-facing applications such as Bing Translator, Bing Toolbar, Bing Dictionary, and the Windows Phone app.
A few of our customers and partners… and 80,000+ more.
Powerful Tools and Customization
Our machine-learning and big-data-based translation technology brings instant translation to users, developers, webmasters, translators and businesses, breaking down language barriers. Robust, industry-leading tools such as the Hub and CTF allow unprecedented customization of the translation experience.
Translation
• Powerful API: instant translation and language services in web, desktop and mobile applications. A highly scalable and robust cloud-based machine translation service from Microsoft. Supports SOAP, REST, AJAX, OData, and the Translator website translation widget. Extensible for development on SharePoint, Office, Windows Phone, and more.
• Widget: instant translation of web pages without writing any code. Use the AJAX API to roll your own widget. Use the integrated Collaborative Translations (CTF) functionality to tap into your community.
Customization
• Hub: a custom translation portal to build, train, and deploy customized automatic translation systems. Combine your data with Bing big data to tune the translation output to best fit your content. Free with any level of Translator subscription (including the free tier).
• CTF: override, modify or vote for the translated output to best fit the content. Provide end users with alternative translations. Import the edits back into the Hub for further training.
Integrates with your TM tool
• Top translation tools support Microsoft Translator
Give these a try! (Demo)
• Bing Translator for Windows Phone: the most innovative translation app on any phone
• Lync Conversation Translator: real-time multilingual conversations with Lync
• Translator Widget for Webpages: instant, on-demand translation for any website
• Word Web App (Microsoft Office): rich web-based document translation, now available in SharePoint, Outlook.com & SkyDrive
• Contextual Thesaurus: uses the power of machine translation to translate “English to English”
Price
• Competitively priced
• Monthly subscription
• Free for up to 2 million characters per month
• Base price: $10 per million characters
• Discounted for higher volumes
• Paid by credit card or via the Microsoft Enterprise Agreement
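To make the pricing concrete, here is a back-of-the-envelope estimator in Python. It assumes, for illustration only, that the 2 million free characters are deducted first and every additional character is billed at the $10-per-million base price; volume discounts are not modeled, and because the slide describes a monthly subscription, real bills may be tiered differently. The function name `estimate_monthly_cost` is hypothetical.

```python
FREE_CHARS_PER_MONTH = 2_000_000      # free allowance quoted on the slide
BASE_PRICE_PER_MILLION_USD = 10.0     # base price quoted on the slide


def estimate_monthly_cost(chars: int) -> float:
    """Rough monthly cost in USD for `chars` characters translated.

    Assumption (not from the slide): the free allowance is deducted first and
    the remainder is billed linearly at the base price; volume discounts and
    subscription tiers are ignored.
    """
    if chars < 0:
        raise ValueError("character count must be non-negative")
    billable = max(0, chars - FREE_CHARS_PER_MONTH)
    return billable / 1_000_000 * BASE_PRICE_PER_MILLION_USD


if __name__ == "__main__":
    for volume in (1_500_000, 5_000_000, 50_000_000):
        print(f"{volume:>12,} chars -> ${estimate_monthly_cost(volume):,.2f}")
```

Under these assumptions, 5 million characters in a month would come to about $30, and anything within the free allowance costs nothing.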
Extent of localization: methods of applying MT
Post-editing
• Goal: human translation quality
• Increase human translators’ productivity
• In practice: 0% to 25% productivity increase, varying by content, style and language
Raw publishing
• Goals: good enough for the purpose, speed, cost
• Publish the output of the MT system directly to the end user
• Best with a bilingual UI
• Good results with technical audiences
Post-Publish Post-Editing (“P3”)
• Know what you are human-translating, and why
• Make use of your community: domain experts, enthusiasts, employees, professional translators
• Best of both worlds: fast, better than raw MT, always current
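A 25% productivity increase is easy to over-read as a 25% saving in time. The Python sketch below, assuming “productivity” means words processed per hour, shows the arithmetic: throughput is multiplied by (1 + gain), so time falls by gain / (1 + gain). The function names and the 500 words/hour figure in the example are hypothetical.

```python
def hours_needed(words: int, words_per_hour: float, productivity_gain: float = 0.0) -> float:
    """Hours to finish `words` when post-editing raises throughput by `productivity_gain`.

    productivity_gain is a fraction: 0.0 means no gain, 0.25 means +25%
    (the top of the range quoted on the slide).
    """
    if words_per_hour <= 0:
        raise ValueError("words_per_hour must be positive")
    if productivity_gain < 0:
        raise ValueError("productivity_gain must be non-negative")
    return words / (words_per_hour * (1 + productivity_gain))


def time_saved_fraction(productivity_gain: float) -> float:
    """Fraction of translator time saved for a given throughput gain."""
    return productivity_gain / (1 + productivity_gain)


if __name__ == "__main__":
    # Hypothetical job: 10,000 words at 500 words/hour for a human translator.
    baseline = hours_needed(10_000, 500)
    best_case = hours_needed(10_000, 500, 0.25)
    print(baseline, best_case, time_saved_fraction(0.25))
```

For that hypothetical job, the best case on the slide cuts 20 hours to 16, a 20% time saving; at the bottom of the range (0%), post-editing saves nothing.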
• Always there
• Always current
• Always retaining human translations
• Always ready to take feedback and corrections
Midori Tatsumi, Takako Aikawa, Kentaro Yamamoto, and Hitoshi Isahara. Proceedings of the Association for Machine Translation in the Americas (AMTA), November 2012.
Collaboration: MT + your community
• Collaborative TM entries are rated:
  • Rating 1 to 4: unapproved
  • Rating 5 to 10: approved
  • Rating -10 to -1: rejected
• One-to-many is possible (one source sentence can have several translations)
What makes this possible: a fully integrated, 100%-matching TM
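One way to read the rating scheme and the 100%-matching TM together is sketched below in Python. The names `TmEntry`, `rating_status`, and `translate` are hypothetical, not part of the Translator API. The sketch assumes that only an exact (100%) source match with an approved rating overrides MT, that the highest-rated approved entry wins when a sentence has several translations, and that a rating of 0 (not covered by the slide) counts as unrated.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class TmEntry:
    source: str
    target: str
    rating: int  # collaborative rating in -10..10


def rating_status(rating: int) -> str:
    """Map a rating to the statuses listed on the slide."""
    if 5 <= rating <= 10:
        return "approved"
    if 1 <= rating <= 4:
        return "unapproved"
    if -10 <= rating <= -1:
        return "rejected"
    if rating == 0:
        return "unrated"  # assumption: the slide does not say how 0 is treated
    raise ValueError(f"rating {rating} is outside -10..10")


def translate(source: str, tm: Iterable[TmEntry],
              machine_translate: Callable[[str], str]) -> str:
    """Prefer an approved 100% TM match; otherwise fall back to MT."""
    approved = [e for e in tm
                if e.source == source and rating_status(e.rating) == "approved"]
    if approved:
        # One source can have many translations: pick the highest-rated one.
        return max(approved, key=lambda e: e.rating).target
    return machine_translate(source)
```

With entries ("Hello", "Hei", 6) and ("Hello", "Moi", 9), `translate("Hello", ...)` returns "Moi"; a sentence whose only TM entries are unapproved or rejected still goes to MT.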
Making it easier for the approver – Managing authorized users
What is important? In this order:
1. Quality
2. Access
3. Coverage
Measuring Quality: Human Evaluations
Knowledge powered by people
• Absolute: 3 to 5 independent human evaluators rate the translation quality of 200 sentences on a scale of 1 to 4
  • Each output is compared to a human-translated reference sentence
  • No source-language knowledge required
• Also: relative evaluations, against a competitor or a previous version of ourselves
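As an illustration of how absolute scores might be aggregated, the Python sketch below averages each sentence’s evaluator ratings and then averages across sentences. The per-sentence-first averaging and the function name are assumptions; the slide specifies only 3 to 5 evaluators, a 1 to 4 scale, and a 200-sentence test set.

```python
from statistics import mean


def absolute_eval_score(ratings: list[list[int]]) -> float:
    """Average 1-4 quality score over a test set.

    ratings[i] holds the scores that each evaluator gave sentence i.
    """
    if not ratings:
        raise ValueError("no sentences to score")
    for i, per_sentence in enumerate(ratings):
        if not 3 <= len(per_sentence) <= 5:
            raise ValueError(f"sentence {i}: expected 3 to 5 evaluators")
        if any(not 1 <= r <= 4 for r in per_sentence):
            raise ValueError(f"sentence {i}: ratings must be on the 1-4 scale")
    # Average evaluators per sentence first, so every sentence weighs equally
    # even if the number of evaluators varies.
    return mean(mean(per_sentence) for per_sentence in ratings)
```

For two sentences rated [4, 4, 3] and [2, 3, 2, 3], the per-sentence means are 11/3 and 2.5, giving an overall score of 37/12 (about 3.08).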
Measuring Quality: BLEU*
Cheap and effective, but be aware of the limits
• A fully automated MT evaluation metric
• Modified n-gram precision, comparing a test sentence to reference sentences
• Standard in the MT community
• Immediate and simple to administer
• Correlates with human judgments
• Automatic and cheap: runs daily and for every change
• Not suitable for cross-engine or cross-language evaluations: results are always relative to the test set
* BLEU: BiLingual Evaluation Understudy
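To show what “modified n-gram precision” means, here is a compact corpus-level BLEU in Python, following the usual formulation: clipped n-gram counts for n = 1 to 4, a geometric mean of the precisions, and a brevity penalty. It is a teaching sketch that works on pre-tokenized input and applies no smoothing; it is not the scorer Microsoft used, and production scores are normally computed with a standard tool such as sacreBLEU, whose tokenization choices also change the numbers.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[list[str]],
                references: list[list[list[str]]],
                max_n: int = 4) -> float:
    """Corpus BLEU with clipped ("modified") n-gram precision and a brevity penalty."""
    clipped = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (shorter wins ties).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)  # element-wise max over references
            for gram, count in ngram_counts(hyp, n).items():
                # Clip: credit an n-gram at most as often as it occurs in one reference.
                clipped[n - 1] += min(count, max_ref[gram])
            totals[n - 1] += max(0, len(hyp) - n + 1)
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Clipping is the “modified” part: `corpus_bleu([["the"] * 4], [[["the", "cat"]]], max_n=1)` gives about 0.25 rather than 1.0, because “the” occurs only once in the reference. Scores from this function are comparable only on the same test set and tokenization, which is the limitation the slide warns about.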
Measuring Quality in Context
• Real-world data
• Instrumentation to observe users’ behavior
• A/B testing
• Polling
In-context evaluation gives you the most useful results
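The A/B testing bullet can be made concrete with a standard two-proportion z-test, sketched in Python below. The scenario is hypothetical: A and B could be two versions of a page or two MT engines, and “success” could be any binary outcome the instrumentation records (for example, a user marking an article as helpful). The slide does not say which statistical test was used, and the function name is illustrative.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two success rates.

    Returns (z, p_value), using the pooled-proportion standard error.
    """
    if min(n_a, n_b) <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both groups are all successes or all failures; test undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

With 60 of 100 successes in A and 40 of 100 in B, z is about 2.83 and the two-sided p-value is below 0.01; equal rates give z = 0 and p = 1.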
Knowledge Base Resolve Rate
[Chart: resolve rate for human-translated vs. machine-translated knowledge base articles]
Microsoft is using a customized version of Microsoft Translator.
Source: Martine Smets, Microsoft Customer Support
Collaboration: MT + your community
Remember the collaborative TM? There is more.
Collaboration: you, your community, and Microsoft
You, your community, and Microsoft working together to create the optimal MT system for your terminology and style
Community-driven MT
• Multiple community models:
  • Necessity: driven by crisis
  • Love of language: driven by strong language/cultural identification
  • Preservation: the desire to preserve a language
• Two examples: Haitian Creole and White Hmong
Haitian Creole
• One of two official languages in Haiti
• A creole that evolved from French, Spanish, and several African languages (largely French-like)
• Spoken natively by most of Haiti’s 8M people
• Recent as a written language (first literature dates to the late 18th century), with a growing literature base
• Semi-literate population, with a preference for French (until recently)
• Somewhat inconsistent orthography
• Limited (but growing) Web presence
Tranblemantè nan Pòtoprens, kapital Ayiti! (Earthquake in Port-au-Prince, capital of Haiti!)
• The earthquake of January 12, 2010 caused a significant humanitarian crisis.
• Aid agencies, foreign governments, and a variety of NGOs all responded en masse.
Pòtoprens te catastrophically afekte 12 janvye 2010 tranblemantè a. (Port-au-Prince was catastrophically affected by the January 12, 2010 earthquake.)
• The need for translated materials was critical, especially materials related to medicine and the relief effort.
• Mission 4636 text messages from the field (up to 5K/day at peak) required rapid translation.
Moun ap fouye pami debri yon bilding ki kraze nan tranblemann' tè 12 Janvye a. (People dig through the rubble of a building that collapsed in the January 12 earthquake.)
The E-mail
• At 10:30 a.m. on Tuesday, January 19, 2010, our team received an e-mail from a Microsoft employee in the field:
  • Do we have a translator for Haitian Creole?
  • If not, could we make one?
• A little soul searching:
  • No one on our team knew anything about Creole
  • No native speakers
  • No linguistic background on the language
  • No idea about grammatical structure
  • No idea about encoding or orthography
  • No knowledge about registers or the degree of literacy
  • No parallel or monolingual training data of any kind (nor readily available documents we could start with)
• In effect, we were starting at zero
• So what else could we do but say “YES!”
Mission 4636
• Emergency SMS infrastructure
• Set up immediately in the wake of the Jan. 12, 2010 quake
• Mission 4636:
  • Received SMSs
  • Translated
  • Categorized
  • Triaged
  • Routed to aid agencies
Mission 4636 Messages
• Fanmi mwen nan Kafou, 24 Cote Plage, 41A bezwen manje ak dlo → My family in Carrefour, 24 Cote Plage, 41A needs food and water
• Moun kwense nan Sakre Kè nan Pòtoprens → People trapped in Sacred Heart Church, Port-au-Prince
• Ti ekipman Lopital General genyen yo pa ka menm fè 24 è → General Hospital has less than 24 hrs. of supplies
• Fanm gen tranche pou fè yon pitit nan Delmas 31 → Woman in labor, Delmas 31
Over 80,000 messages received, up to 5,000+/day
Crisis Infrastructure: Message Pipeline
[Diagram: SMS, Tweets, Media → Message Portal; Translate (Crowd, MT); Geolocate; Triage]
Lewis et al., 2011
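The stages named on this slide and on the Mission 4636 slide (receive, translate, geolocate, categorize and triage, route) can be sketched as a simple sequential pipeline in Python. Everything here is hypothetical: the `Message` type, the `process` function, and the routing table are illustrative names, the stage order is an assumption read off the diagram labels, and the sketch assumes MT produces a draft that a crowd volunteer can accept or correct, which is one plausible way, not necessarily the way, the crowd and MT components were combined.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    text: str                        # original (e.g. Haitian Creole) text
    channel: str                     # e.g. "sms", "tweet", "media"
    translation: Optional[str] = None
    location: Optional[str] = None
    category: Optional[str] = None
    routed_to: Optional[str] = None


def process(
    msg: Message,
    machine_translate: Callable[[str], str],
    crowd_review: Callable[[str, str], Optional[str]],
    geolocate: Callable[[str], Optional[str]],
    categorize: Callable[[str], str],
    routes: dict[str, str],
) -> Message:
    """Run one message through translate -> geolocate -> categorize -> route."""
    draft = machine_translate(msg.text)
    # A crowd volunteer may correct the MT draft; None means "accept the draft".
    reviewed = crowd_review(msg.text, draft)
    msg.translation = reviewed if reviewed is not None else draft
    msg.location = geolocate(msg.translation)
    msg.category = categorize(msg.translation)
    # Unknown categories fall back to a hypothetical general triage queue.
    msg.routed_to = routes.get(msg.category, "general triage queue")
    return msg
```

Passing each stage in as a callable keeps the pipeline agnostic about whether a translation comes from an MT engine, a volunteer, or both, mirroring the diagram’s placement of Crowd and MT side by side.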
White Hmong
• White Hmong: not a crisis scenario like Creole
• But a language in crisis
• Some background:
  • The Hmong languages
  • The Hmong diaspora
  • Decline of White Hmong and its use among younger Hmong
Community Engagement
• Involves two critical groups:
  • Community of native speakers
  • Community leader(s)
• Wide spectrum of users across the Hmong community:
  • College students
  • High school students
  • School teachers
  • School administrators, deans, professors
  • Business professionals
  • Elders
Building MT: Community Contributions
• Locating and vetting data
  • Locate data
  • Review documents that contain Hmong data
  • Review parallelism of Hmong-English documents
• Actively correcting errors from the engine
  • Contributing translation “repairs” on websites that translate to Hmong
Tools Available for Haitian Creole and Hmong
• Home page (web page viewer, cut-and-paste translator)
• Haitian Creole and Hmong are among the languages available through our API (Application Programming Interface)
  • Multiple interfaces: AJAX, SOAP, HTTP
  • Can integrate translation directly into a variety of apps
• Widget
  • Integrate translation into web pages
  • Traffic kept client side
Tools Available for Haitian Creole and Hmong
• Widget/Collaborative Translation Framework (CTF)
  • Community can contribute translations
  • These can be published to web pages
  • Mixes MT with “trusted” human translations
Just visit http://hub.microsofttranslator.com to do it yourself