Translation by Collaboration among Monolingual Users Benjamin B. Bederson www.cs.umd.edu/~bederson @bederson Computer Science Department Human-Computer Interaction Lab Institute for Advanced Computer Studies iSchool University of Maryland
Social Participant Computational Participant Programmer User
Human Computation Translation Photo tagging Face recognition Human detection Speech recognition Text analysis Planning Things COMPUTERS can do Things HUMANS can do
Human Computation Taxonomy Human Computation Crowdsourcing Social Computing Data Mining Collective Intelligence
Languages on Internet by Population Source: Global Reach, Internet World Stats
International Children’s Digital Library www.childrenslibrary.org
A real-world problem: ICDL Now: • ~5,000 books • 55 languages • Some translations in a few languages • 3,000 volunteer translators • 100K unique visitors/month Goal: • 10,000 books • 100 languages • Every book in every language! www.childrenslibrary.org
Machine Translation (MT) • Large volume, cheap, fast • Unreliable quality
Professional Translators • High quality, but slow and expensive • (even for common language pairs)
Translation with the Crowd Translate with the Monolingual Crowd • Wikipedia: 900 translators vs. 1,200,000 contributors
Machine Translation Speed / Affordability Monolingual Human Participation Amateur Bilingual Human Participation Professional Bilingual Human Participation Quality
Source Language / Target Language • Original Sentence => MT => Translation Candidate • Crowd tasks: Vote (on each side); Paraphrase source sentence; Identify translation errors; Explain errors; Create new translation candidates • MT and word alignment link the two sides, carrying each new candidate and explanation across • Repeat
(Diagram build slides: MT, MT enrichment)
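The iterative loop in the workflow diagram above can be sketched roughly as follows. This is a minimal illustration and not the MonoTrans2 implementation: the names `Candidate`, `run_rounds`, `machine_translate`, `propose_fix`, and `cast_votes` are all hypothetical, the crowd is replaced by toy lookup tables, and the source-side tasks (paraphrasing, explaining errors) are folded into the single `propose_fix` step for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    text: str       # target-language translation candidate
    votes: int = 0  # votes from target-language speakers (hypothetical scale)


def run_rounds(
    source: str,
    machine_translate: Callable[[str], str],
    propose_fix: Callable[[str], str],
    cast_votes: Callable[[str], int],
    rounds: int = 3,
) -> Candidate:
    """MT seeds one candidate; each round the crowd proposes a new one and votes."""
    candidates: List[Candidate] = [Candidate(machine_translate(source))]
    for _ in range(rounds):
        best = max(candidates, key=lambda c: c.votes)
        new_text = propose_fix(best.text)
        # Only add the proposal if it is not already a candidate.
        if all(c.text != new_text for c in candidates):
            candidates.append(Candidate(new_text))
        for c in candidates:
            c.votes = cast_votes(c.text)
    return max(candidates, key=lambda c: c.votes)


# Toy stand-ins for the MT system and the monolingual crowd (assumptions).
fixes = {
    "The cat sit on mat": "The cat sits on mat",
    "The cat sits on mat": "The cat sits on the mat",
}
votes = {
    "The cat sit on mat": 0,
    "The cat sits on mat": 1,
    "The cat sits on the mat": 2,
}

result = run_rounds(
    "Die Katze sitzt auf der Matte",
    machine_translate=lambda s: "The cat sit on mat",
    propose_fix=lambda t: fixes.get(t, t),
    cast_votes=lambda t: votes.get(t, 0),
)
print(result.text)  # The cat sits on the mat
```

In a real deployment the three callables would be backed by an MT service and by task queues for the two monolingual crowds; the loop structure is the only part this sketch tries to convey.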
Experiment 1 • 60 Spanish / 22 German speakers • ICDL volunteers • Worked on: • 4 Spanish books => German • 1 German book => Spanish TranslateTheWorld.org
Evaluation • 2 German-Spanish bilingual evaluators • Fluency and adequacy: 5-point score • Compared Google Translate and MonoTrans2
Punchline Sentences for which both bilingual evaluators agree score = 5 (N=162 sentences worked on in the experiment) Straight MT: 10% of sentences ready for prime time MonoTrans2: 68% of sentences ready for prime time
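The headline metric on this slide (the share of sentences that both bilingual evaluators scored 5) could be computed along these lines. The function name `share_both_top` and the toy scores are hypothetical; they are not the experiment's data.

```python
from typing import Sequence, Tuple


def share_both_top(scores: Sequence[Tuple[int, int]], top: int = 5) -> float:
    """Fraction of sentences that both evaluators scored `top`."""
    if not scores:
        raise ValueError("no scores given")
    agreed = sum(1 for a, b in scores if a == top and b == top)
    return agreed / len(scores)


# Hypothetical (evaluator 1, evaluator 2) scores for four sentences.
toy_scores = [(5, 5), (5, 4), (3, 3), (5, 5)]
print(f"{share_both_top(toy_scores):.0%}")  # 50%
```

Requiring both evaluators to give the top score is a strict criterion: a sentence rated 5 by one evaluator and 4 by the other does not count.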
Experiment 2 • An alternative use case for crowdsourced translation… • Fanmi mwen nan Kafou, 24 Cote Plage, 41A bezwen manje ak dlo • Moun kwense nan Sakre Kè nan Pòtoprens • Ti ekipman Lopital General genyen yo paka minm fè 24 è • Fanm gen tranche pou fè yon pitit nan Delmas 31 Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on crowdsourcing and translation, University of Maryland.
Experiment 2 • An alternative use case for crowdsourced translation… • My family in Carrefour, 24 Cote Plage, 41A needs food and water • People trapped in Sacred Heart Church, PauP • General Hospital has less than 24 hrs. supplies • Undergoing children delivery Delmas 31 Munro, Robert. 2010. Crowdsourced translation for emergency response and beyond. NSF Workshop on crowdsourcing and translation, University of Maryland.
Punchline Sentences for which both bilingual evaluators agree score = 5 (N=76 sentences completed) Straight MT: 0% of sentences preserve all the meaning MonoTrans2: 38% of sentences preserve all the meaning
Example • Live for one week: • 137,000 page views • 1,900 task submissions • 19 secs per task
Toward a more general architecture Joining forces with Chris Callison-Burch, Johns Hopkins University
Take-aways • By combining • machine translation technology • human-computer interfaces • crowdsourcing, it is possible to achieve accurate translation without bilingual human expertise.
Philip Resnik, Professor, Linguistics and Institute for Advanced Computer Studies Participating Students: • Chang Hu, CS Ph.D. student • Alex Quinn, CS Ph.D. student • Vlad Eidelman, CS Ph.D. student • Yakov Kronrod, Linguistics Ph.D. student • Olivia Buzek, CS/Linguistics undergrad TranslateTheWorld.org