Monotrans: Human-Computer Collaborative Translation. Crowdsourcing Translation with People Who Speak Only One Language. Chang Hu, Ben Bederson, Philip Resnik. Human-Computer Interaction Lab; Computational Linguistics and Information Processing Lab; University of Maryland.
Outline • Why translation by monolingual people? • How Monotrans works • Research prototype • Preliminary evaluation
Languages on Internet by Population Source: Global Reach, Internet World Stats
A real-world problem: International Children’s Digital Library www.childrenslibrary.org
Machine Translation (MT) (餐厅= restaurant, dining hall) • Large volume, cheap, fast • Unreliable quality
Professional Translators • High quality, but slow and expensive • (even for common language pairs)
Translation with the Crowd • Bottleneck: bilingual people
Translation with the Crowd => Translation with the Monolingual Crowd • Wikipedia: 800 translators vs. 75,000 contributors
[Chart: Affordability vs. Quality, comparing Machine Translation, Monolingual Human Participation, Amateur Bilingual Human Participation, and Professional Bilingual Human Participation]
Basic Idea • Participants: Source language speaker, Target language speaker • Original source sentence => MT => Inaccurate translation => Fluent translation => MT => Inaccurate back translation => Fluent, accurate source sentence => MT => Et cetera…
[Diagram sequence: sentences pass between the two sides through MT, with an added enrichment channel. Example labels: "Nous entendons", "En général", "In general", "Get along"]
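The loop outlined in the Basic Idea slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the Monotrans implementation: the function name `monotrans_round_trip`, its parameters, and the convention that the source speaker either accepts the back translation or supplies a rephrased source sentence are all assumptions made for the example. The MT systems and the two human participants are passed in as plain callables.

```python
from typing import Callable, List, Optional, Tuple

# All names below are hypothetical; they are not taken from the Monotrans system.
Translate = Callable[[str], str]                    # stands in for an MT system
TargetEdit = Callable[[str], str]                   # target speaker makes MT output fluent
SourceReview = Callable[[str, str], Optional[str]]  # None = accept, else a rephrased source


def monotrans_round_trip(
    source: str,
    mt_forward: Translate,
    mt_backward: Translate,
    target_edit: TargetEdit,
    source_review: SourceReview,
    max_rounds: int = 3,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Iterate forward MT, target-side editing, back translation, source-side review."""
    history: List[Tuple[str, str]] = []
    candidate = mt_forward(source)           # inaccurate translation
    fluent = candidate
    for _ in range(max_rounds):
        fluent = target_edit(candidate)      # fluent translation
        back = mt_backward(fluent)           # inaccurate back translation
        history.append((fluent, back))
        rephrased = source_review(source, back)
        if rephrased is None:                # source speaker is satisfied
            break
        candidate = mt_forward(rephrased)    # try again from a clearer source sentence
    return fluent, history
```

Capping the loop with `max_rounds` is a design choice of this sketch; the slides only say the exchange continues ("Et cetera…").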
[Screenshot: research prototype UI, with annotation tools labeled Web link, Image, Mark OK, Mark unclear]
Preliminary Evaluation • Older version of the UI (same protocol) • Children’s book, Russian to Chinese • 2 Russian speakers and 4 Chinese speakers formed 4 pairs* • 1 hour per pair
Results • 44 sentences (6 pages) worked on • 28 sentences finished (≈ 4 pages) • Overall translation speed: 50 words per hour • Professional translator speed: 250 words per hour
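For a quick sense of scale, the figures on the Results slide can be turned into two ratios. The sketch below only restates the slide's numbers; the variable names are made up for the example, and nothing here is an additional measurement.

```python
# Figures copied from the Results slide; variable names are illustrative.
sentences_worked_on = 44
sentences_finished = 28
monolingual_words_per_hour = 50
professional_words_per_hour = 250

# Share of attempted sentences that were finished within the sessions.
completion_rate = sentences_finished / sentences_worked_on

# Speed of the monolingual pairs relative to a professional translator.
speed_ratio = monolingual_words_per_hour / professional_words_per_hour

print(f"Finished {completion_rate:.0%} of the sentences worked on")
print(f"Speed relative to a professional translator: {speed_ratio:.0%}")
```

So about 64% of the attempted sentences were finished, at one fifth of the quoted professional speed.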
Where to from here? • Larger and more formal validation of the protocol • Richer annotations: ✓ Images ✓ Web links ✓ Marking correct spans ✓ Marking incorrect spans; Paraphrase; Word clouds; …?? • Large-scale crowd support • (CrowdFlow talk @1:20PM)
Take-Away Message • Monolingual translation can help large-scale translation • Translation with monolingual people is actually feasible
Q&A Thank You
Projected annotation [Kolak 2005] • Project information from one language to another using word alignments as a bridge • Illustration of how this has been done for natural language annotation
Projected annotation • Source: Tout le monde doit entendre l'histoire de Cendrillon • MT: Everybody has heard the business by Cinderella => Everybody has heard the story about Cinderella • Pilot experiment results: Projected annotations helped improve translation
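Projecting an annotation across a word alignment can be illustrated with the slide's own sentence pair. This is a minimal sketch: the helper `project_span` is hypothetical, and the alignment below is written by hand for the example rather than taken from the slides or from any alignment tool.

```python
from typing import Iterable, List, Set, Tuple


def project_span(source_span: Set[int], alignment: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the target token indices aligned to any annotated source token."""
    return sorted({tgt for src, tgt in alignment if src in source_span})


french = ["Tout", "le", "monde", "doit", "entendre", "l'histoire", "de", "Cendrillon"]
english = ["Everybody", "has", "heard", "the", "business", "by", "Cinderella"]

# Hand-written (source index, target index) pairs; an assumption for this example.
alignment = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3), (5, 4), (6, 5), (7, 6)]

# Suppose the source speaker annotates "l'histoire" (index 5).
projected = project_span({5}, alignment)
projected_words = [english[i] for i in projected]
print(projected_words)
```

With this alignment, an annotation on "l'histoire" lands on "the business", which is the span the target speaker needs to fix.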
One of my examples involves rmvngllthvwlsfrmthwrdsndshwngthtthrdrcnstllndrstndthsntnc.
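The vowel-free string above can be reproduced mechanically. In the sketch below, the readable sentence is an assumed reconstruction of the intended text (it is not given on the slide), and `strip_vowels_and_spaces` is a name chosen for this example.

```python
def strip_vowels_and_spaces(text: str) -> str:
    """Drop the vowels a, e, i, o, u (either case) and all whitespace."""
    return "".join(
        ch for ch in text
        if ch.lower() not in "aeiou" and not ch.isspace()
    )


# Assumed reconstruction of the sentence behind the slide's example.
readable = (
    "removing all the vowels from the words and showing "
    "that the reader can still understand the sentence"
)
compressed = strip_vowels_and_spaces(readable)
print(compressed)
```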
Three Types of Errors: I. Detectable and Correctable Error • Source: Tout le monde doit entendre l'histoire de Cendrillon. • MT: Everybody has hear story about Cinderella => Everybody has heard the story about Cinderella • Pilot experiment results: Post-editing machine translation output by monolingual people improves translation quality
Three Types of Errors: II. Detectable but not Correctable Error • Source: Tout le monde doit entendre l'histoire de Cendrillon. • MT outputs: "Everybody has hear story about Cinderella"; "Everybody has heard the business by Cinderella" • Communication needed => Everybody has heard the story about Cinderella • Pilot experiment results: Communication through enrichment channel can improve translation
Three Types of Errors: III. Undetectable Error • Source: Tout le monde doit entendre l'histoire de Cendrillon. • MT outputs: "Everybody has hear story about Cinderella"; "Everybody has heard the business by Cinderella"; "Everybody loves the story about Cinderella" • Need more redundancy => Everybody has heard the story about Cinderella • Add more redundancy, reduce it to type I or type II
Prototype Evaluation • Rating scales: (1=unintelligible, 4=very intelligible); (1=not translated, 5=full meaning) • System seems promising