
Translation by Collaboration among Monolingual Users

Benjamin B. Bederson, www.cs.umd.edu/~bederson, @bederson. Computer Science Department, Human-Computer Interaction Lab, Institute for Advanced Computer Studies, iSchool, University of Maryland.


Presentation Transcript


  1. Translation by Collaboration among Monolingual Users Benjamin B. Bederson www.cs.umd.edu/~bederson @bederson Computer Science Department Human-Computer Interaction Lab Institute for Advanced Computer Studies iSchool University of Maryland

  2. Social Participant • Computational Participant • Programmer • User

  3. Human Computation • Things COMPUTERS can do • Things HUMANS can do • Translation • Photo tagging • Face recognition • Human detection • Speech recognition • Text analysis • Planning

  4. Human Computation Taxonomy • Human Computation • Crowdsourcing • Social Computing • Data Mining • Collective Intelligence

  5. The problem of translation

  6. Languages on Internet by Population (Source: Global Reach, Internet World Stats)

  7. A real-world problem

  8. International Children’s Digital Library www.childrenslibrary.org

  9. A real-world problem: ICDL Now: • ~5,000 books • 55 languages • Some translations in a few languages • 3,000 volunteer translators • 100K unique visitors/month Goal: • 10,000 books • 100 languages • Every book in every language! www.childrenslibrary.org

  10. The space of solutions

  11. Machine Translation (MT) • Large volume, cheap, fast • Unreliable quality

  12. Professional Translators • High quality, but slow and expensive • (even for common language pairs)

  13. Amateur Translators

  14. Online Labor Markets

  15. The key idea

  16. Translation with the Crowd • Translate with the Monolingual Crowd • Wikipedia: 900 translators vs. 1,200,000 contributors

  17. [Chart: Speed / Affordability plotted against Quality for four approaches: Machine Translation, Monolingual Human Participation, Amateur Bilingual Human Participation, Professional Bilingual Human Participation]

  18. Monolingual collaboration

  19. [Diagram: Source Language side and Target Language side, linked by MT and word alignment. Items passed between the sides: Original Sentence, Translation Candidate, Explanation, New candidate. Crowd tasks shown as numbered steps: Vote • Paraphrase source sentence • Identify translation errors • Explain errors • Create new translation candidates. The steps repeat.]

  20. MT

  21. MT MT

  22. MT MT MT enrichment

  23. MT MT MT enrichment MT

  24. MT MT MT enrichment MT

  25. Target Side - Vote

  26. Target Side - Identify Errors

  27. Target Side - Edit Translations

  28. Source Side – Explain Errors

  29. Source Side – Vote & Confirm
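
The target-side and source-side tasks on slides 19 and 25 to 29 suggest an iterative loop between two monolingual crowds. The following Python sketch shows one way such a loop could be organized. It is an illustration under assumptions, not the authors' implementation: the names `Candidate`, `SentenceState`, and `run_round`, the net-vote ranking, and the callback stand-ins for the MT system and the crowd workers are all hypothetical, and the word-alignment step that would map flagged spans back to the source sentence is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One target-language translation candidate (hypothetical structure)."""
    text: str
    votes: int = 0  # net votes from target-side speakers
    flagged_spans: List[str] = field(default_factory=list)  # spans marked as errors


@dataclass
class SentenceState:
    """State for one source sentence as it passes between the two crowds."""
    source: str
    candidates: List[Candidate] = field(default_factory=list)
    explanations: List[str] = field(default_factory=list)  # source-side explanations

    def best(self) -> Optional[Candidate]:
        # Assumption: highest net vote wins; ties go to the earliest candidate.
        return max(self.candidates, key=lambda c: c.votes, default=None)


def run_round(
    state: SentenceState,
    translate: Callable[[str], str],            # stand-in for an MT system
    vote: Callable[[str], int],                 # target side: +1 or -1 on a candidate
    find_errors: Callable[[str], List[str]],    # target side: spans that look wrong
    explain: Callable[[str, List[str]], str],   # source side: paraphrase or explanation
    edit: Callable[[str, str], str],            # target side: new candidate from a hint
) -> SentenceState:
    """Run one pass of the loop; a caller would repeat it until satisfied."""
    if not state.candidates:
        # Seed with the raw machine translation of the original sentence.
        state.candidates.append(Candidate(translate(state.source)))

    current = state.best()
    current.votes += vote(current.text)
    current.flagged_spans = find_errors(current.text)

    if current.flagged_spans:
        # Source-side speakers explain the flagged parts; the explanation is
        # machine-translated so that target-side speakers can read it.
        explanation = explain(state.source, current.flagged_spans)
        state.explanations.append(explanation)
        hint = translate(explanation)
        state.candidates.append(Candidate(edit(current.text, hint)))
    return state
```

Passing the MT system and each crowd task in as callables keeps the sketch runnable with simple stubs, and it makes explicit that no single participant in the loop needs to read both languages.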

  30. What we’ve accomplished so far

  31. Experiment 1 • 60 Spanish / 22 German speakers • ICDL volunteers • Worked on: 4 Spanish books => German, 1 German book => Spanish • TranslateTheWorld.org

  32. Evaluation • 2 German-Spanish bilingual evaluators • Fluency and adequacy: 5-point score • Compared Google Translate and MonoTrans2

  33. Results - Fluency

  34. Results - Fluency

  35. Results - Accuracy

  36. Results - Accuracy

  37. Punchline • Sentences for which both bilingual evaluators agree score = 5 (N=162 sentences worked on in the experiment) • Straight MT: 10% of sentences ready for prime time • MonoTrans2: 68% of sentences ready for prime time

  38. Experiment 2 • An alternative use case for crowdsourced translation… • Fanmi mwen nan Kafou, 24 Cote Plage, 41A bezwen manje ak dlo • Moun kwense nan Sakre Kè nan Pòtoprens • Ti ekipman Lopital General genyen yo paka minm fè 24 è • Fanm gen tranche pou fè yon pitit nan Delmas 31 • Source: Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on crowdsourcing and translation, University of Maryland.

  39. Experiment 2 • An alternative use case for crowdsourced translation… • My family in Carrefour, 24 Cote Plage, 41A needs food and water • People trapped in Sacred Heart Church, PauP • General Hospital has less than 24 hrs. supplies • Undergoing children delivery Delmas 31 • Source: Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on crowdsourcing and translation, University of Maryland.

  40. TranslateTheWorld.org

  41. Fluency Distribution

  42. Adequacy Distribution

  43. Punchline • Sentences for which both bilingual evaluators agree score = 5 (N=76 sentences completed) • Straight MT: 0% of sentences preserve all the meaning • MonoTrans2: 38% of sentences preserve all the meaning
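
The "punchline" figures on slides 37 and 43 count a sentence only when both bilingual evaluators give it the top score of 5. A small Python sketch of that calculation is shown here. The function name `both_top_rate` and the toy scores are hypothetical and are not data from the experiments.

```python
from typing import Sequence


def both_top_rate(scores_a: Sequence[int], scores_b: Sequence[int], top: int = 5) -> float:
    """Fraction of sentences that both evaluators rated `top`."""
    if len(scores_a) != len(scores_b):
        raise ValueError("each evaluator must score every sentence")
    if not scores_a:
        raise ValueError("no sentences to score")
    # A sentence counts only if evaluator A and evaluator B both gave the top score.
    agreed = sum(1 for a, b in zip(scores_a, scores_b) if a == top and b == top)
    return agreed / len(scores_a)


# Toy scores for five sentences (made up for illustration only).
evaluator_1 = [5, 5, 4, 5, 3]
evaluator_2 = [5, 4, 4, 5, 5]
print(f"{both_top_rate(evaluator_1, evaluator_2):.0%}")  # prints 40%
```

Requiring both evaluators to give a 5 is a strict criterion: a sentence rated 5 and 4 does not count. Applied to the reported figures, 10% and 68% of N=162 correspond to roughly 16 and 110 sentences, and 38% of N=76 to roughly 29 sentences.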

  44. Scaling Up

  45. Live for one week: • 137,000 page views • 1,900 task submissions • 19 secs per task Example

  46. Copying is the sincerest form of flattery…

  47. Toward a more general architecture Joining forces with Chris Callison-Burch, Johns Hopkins University

  48. Take-aways • By combining machine translation technology, human-computer interfaces, and crowdsourcing, it is possible to achieve accurate translation without bilingual human expertise.

  49. Philip Resnik, Professor, Linguistics, Institute of Advanced Computer Studies • Participating Students: Chang Hu (CS Ph.D. student), Alex Quinn (CS Ph.D. student), Vlad Eidelman (CS Ph.D. student), Yakov Kronrod (Linguistics Ph.D. student), Olivia Buzek (CS/Linguistics undergrad) • TranslateTheWorld.org
