Information visualization and its applications to machine translation Rebecca Hwa hwa@cs.pitt.edu
Information visualization (infovis) “Information visualization is the use of computer-supported interactive visual representations of abstract data to amplify cognition.” [Card,1999]
Functions of visualization [Collins & Carpendale] • Recording information • Tables, maps, blueprints • Processing and presenting information • Share, collaborate, revise • Through feedback and interaction • Seeing the unseen
Why might infovis help improve NLP? “Complexity brings externalization” [Collins, 2008] Slide: Courtesy of Collins’s tutorial
How might infovis help to improve NLP? • Identify activities that are natural for users and machines to collaborate. • Design applications that encourage interactivity. • Use user interactions as diagnostic information.
Outline • Introduction • Infovis • Machine Translation • Infovis for MT correction • Improving MT • Future Directions
Machine Translation (MT) • Transform a sentence from one language (source) to another (target) while preserving the (literal) meaning • Many approaches [cf. survey by Lopez, 2007] • Example-based MT • Statistical phrase-based MT • (Statistical) Syntax-driven MT • …
Sample output [from Chinese-English Google-MT] “He is being discovered almost hit an arm in the pile of books on the desktop, just like frightened horse as a Lieju Wangbangbian almost Pengfan the piano stool.”
Dealing with translation errors • We have a better language model. • “just like a frightened horse, he …” • We have common sense to help with decoding. • “Cannot find an arm in a pile of books” • He discovered an arm under the books • It was his arm that hit the books • Unknown translations complicate the matter. • “as a Lieju Wangbangbian almost Pengfan the piano stool”
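The language-model intuition above can be sketched with a toy bigram model: a fluent rewording scores higher than the raw MT fragment. This is a minimal sketch with a made-up three-sentence "corpus"; a real MT language model is trained on vastly more data.

```python
import math
from collections import Counter

# Toy corpus standing in for LM training data (hypothetical).
corpus = [
    "just like a frightened horse he ran",
    "a frightened horse is hard to calm",
    "he ran just like a horse",
]

# Collect unigram and bigram counts with sentence boundary markers.
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent.split() + ["</s>"]
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

def logprob(sentence, alpha=0.1):
    """Add-alpha smoothed bigram log-probability of a sentence."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    vocab = len(unigrams)
    lp = 0.0
    for a, b in zip(toks, toks[1:]):
        lp += math.log((bigrams[(a, b)] + alpha) /
                       (unigrams[a] + alpha * vocab))
    return lp

# The fluent correction scores higher than the disfluent MT fragment.
print(logprob("just like a frightened horse he ran"))
print(logprob("just like frightened horse as a"))
```

A decoder guided by such scores would prefer the fluent word order, which is exactly the leverage a better language model gives.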
Human Computer Collaboration Compensate each other’s weaknesses • Monolingual speaker: • Don’t understand the source language • May not know much about MT/NLP • Can be overwhelmed by too much information • MT: simple language model; no common sense • Collaboration via a graphical interface: • Visualization of multiple NLP resources • Interactive exploration of MT outputs
Research Questions • To what extent can a collaborative approach help monolingual users understand the source? • What resources do the users find informative? • How is the user’s understanding impacted by: • User’s background • Genre of the source text • Quality of the MT output • How will these answers help improve MT?
Outline • Introduction • Infovis • Machine Translation • Infovis for MT correction • The Chinese Room [with J. Albrecht and G.E. Marai] • System overview • A small user study • Discussions • Improving MT • Future Directions
Interface Design Considerations • Can monolingual users be good “translators”? • Provide resources that MT may use • Too much information may confuse users • Limit to a handful of resources • NLP tools are imperfect • Visualization should not hide conflicting analyses between different resources
System Overview • Google MT API • Word alignments • N-best phrasal re-translation • Syntactic parser • Stanford parser [Klein & Manning, 2003] • Bilingual dictionary • Example phrase search • IR over a large source corpus and a parallel corpus [Lemur]
Experimental Methodology • 8 non-Chinese speakers • 4 short passages of ~10 sentences each • Two news articles • Two passages from a novel • Latin Square • Each person corrects the four passages under alternating conditions: • Full interactive system • Document view only • Corrections are judged by 2 bilingual speakers
Translation Correction Procedure • First session: 20 minutes of system tutorial • Four sessions: one passage per sitting • Can work on sentences in arbitrary order • Can “pass” on any part of a sentence • Final commentary • Summarize the four articles • Qualitative feedback on the experience • Suggestions for improving the prototype
Translation Evaluation Procedure • Given: • Source sentence • Reference translation • Bilingual judges evaluate: • Original MT • 8 corrected translations • An alternative reference translation • Emphasis on adequacy • The judges’ scores are normalized to reduce variance [Blatz et al., 2003]
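The score normalization step can be sketched as per-judge standardization: each judge's raw adequacy scores are mapped to z-scores so that one judge's leniency or harshness does not dominate. The exact normalization used in the study follows Blatz et al. (2003) and may differ in detail; the scores below are hypothetical.

```python
import statistics

def normalize_judge_scores(scores):
    """Map one judge's raw scores to z-scores, reducing
    inter-judge variance (in the spirit of Blatz et al., 2003)."""
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores) or 1.0  # guard against zero variance
    return [(s - mu) / sd for s in scores]

judge_a = [5, 4, 5, 3]  # a lenient judge (hypothetical ratings)
judge_b = [3, 2, 3, 1]  # a harsh judge rating the same translations
print(normalize_judge_scores(judge_a))
print(normalize_judge_scores(judge_b))
```

After normalization the two judges' rankings agree exactly, even though their raw scales differ by two full points.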
Experimental Hypotheses • The visualization interface will help users recover from more translation errors • The quality of correction is positively correlated with the quality of the initial MT • Users exposed to NLP or other foreign languages may better exploit the interface • Users may develop different correction strategies, preferring different resources
Quality of Translation Correction [table of correction scores; italics indicate numbers that are not statistically significant] Overall, users made better corrections using the full visualization interface, but users still improved translations by directly editing MT outputs
Quality of Translation Correction [table of correction scores; italics indicate numbers that are not statistically significant] The relationship between the quality of the original MT and that of the correction is more nuanced than simple correlation.
Time Spent Average seconds per sentence
Impact of User Backgrounds • Prior exposure to NLP • Not significantly better at using the full system • Better at correcting errors directly from MT outputs • Prior domain knowledge • Knowledge about basketball helped user5 and user6 on the sports news article
Discussion – unrecovered errors • Errors corroborated by multiple resources • 美 is interpreted as “U.S.” rather than “beauty” by many NLP applications • Reference: He is responsive to beauty. • MT: He sensitive to the United States. • User corrected: He liked America. • Transliterated foreign names • Reference: Bandai’s spokeswoman Kasumi Nakanishi. • MT: Bandai Group, a spokeswoman for the U.S. to be SIN-West • User corrected: West, a spokeswoman for the U.S. Toy Manufacturing Group, and soon to be Vice President
Discussion • Only a small percentage of users succeeded in fixing these translation errors • Could multiple users work together as a group?
Related Work • For monolingual speakers • [Hu, Resnik, and Bederson, 2009] • For MT researchers • DerivTool [DeNeefe, Knight, and Chan, 2005] • Linear B [Callison-Burch, 2005] • For computer-aided human translators • TransType [Langlais, Foster, and Lapalme, 2000] • TransSearch [Macklovitch, Simard, and Langlais, 2000]
Take-away messages • Collaboration between MT and monolingual target-language speakers can lead to an overall improved translation. • Better language modeling and decoding supported by a standard translation model may still improve translation • Domain coherence is an important factor • Syntactic constraints may be helpful
Outline • Introduction • Infovis for MT correction • Improving MT • Language model adaptation for “difficult” phrases [with B. Mohit and F. Liberato] • Prototypes of short phrases • Future Directions
Difficult-to-Translate Phrases (DTPs) • A common strategy for users of The Chinese Room • Work on small chunks of bad translations in isolation • Apply multiple strategies to make sense of each phrase • Related previous work: automatically identifying “difficult-to-translate phrases” (DTPs) [Mohit & Hwa, 2007] • Phrases that MT is likely to get wrong • Missing crucial word/phrasal translation pairs • Complex structures in source phrases • Bad language model/decoder interactions
Research Questions • Should the DTPs be processed differently? • Hypothesis: a general-purpose language model (LM) may not be well suited for DTPs. • Approach: adapt a special LM for each DTP • Will better DTP handling lead to overall translation improvement? • What if a phrase is mis-classified as a DTP?
Adapt Language Models for DTPs • Train one LM for each DTP • Identify a subset of sentences in the (bilingual) training corpus whose source side is similar to the DTP in question. • Use the target side as training data for the LM. • When decoding, use the adapted LM for the DTP and the standard LM for the rest of the sentence • Related work: adapting the LM for each test set [Kim & Khudanpur, 2004; Tam et al., 2007; Zhang, 2008; Snover et al., 2008]
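The retrieval step above can be sketched as ranking bilingual sentence pairs by source-side word overlap with the DTP, then taking the target sides as LM training data. Word overlap is a simple stand-in similarity, and the transliterated tokens are hypothetical toy data; the actual retrieval method may differ.

```python
def similar_sentences(dtp_source, bitext, k=2):
    """Return the target sides of the k bilingual sentence pairs whose
    source side overlaps most with the DTP (toy similarity measure)."""
    dtp_words = set(dtp_source.split())
    scored = sorted(bitext,
                    key=lambda pair: len(dtp_words & set(pair[0].split())),
                    reverse=True)
    return [tgt for src, tgt in scored[:k]]

# Toy parallel corpus of (source, target) pairs (hypothetical tokens).
bitext = [
    ("qitar sari", "a fast train"),
    ("bayt kabir", "a big house"),
    ("qitar batii", "a slow train"),
]

# Target sides selected here would train the DTP-specific LM.
print(similar_sentences("qitar sari jiddan", bitext))
```

The selected target sentences are topically close to the DTP, so the adapted LM approximates the domain coherence a general-purpose LM lacks for that phrase.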
Experimental Setup • Arabic-English Phrase-based MT • Control for translation model sizes: • Smaller TM: Trained on 1M words • Larger TM: Trained on 50M words • LMs under comparison: • Adapted LM • Estimated upper-bound for adapted LM • Baseline LM: English side of the parallel corpus • Larger LM: monolingual English corpus • Evaluation metric: BLEU
When translating DTPs [charts for the smaller and larger TMs] • It’s better to use the adapted LMs than the baseline LM • Using adapted LMs is comparable to using a much larger general-purpose LM • The upper bound suggests there’s still room for improvement
When translating “easy” phrases • Adapted LM about the same as the baseline • A larger general purpose LM doesn’t help. • The estimated upper bound improvement is smaller
Overall performance • DTP classification has an accuracy of ~75%. • Adapted LM still helps the overall performance, resulting in ~+1 BLEU score.
Outline • Introduction • Infovis for MT correction • Improving MT • Language model adaptation for “difficult” phrases • Prototypes of short phrases [with F. Liberato and B. Mohit] • Future Directions
Dealing with unknown phrases • Users of The Chinese Room tried to combine individual word lookups in some sensible way • Can we augment the translation phrase table by generating phrasal translations for unknown source phrases at test time? • Working on shorter phrases in isolation allows more complex translation and decoding methods
Phrasal prototypes • A “backed-off” version of phrasal translations • e.g., a mix of surface words and part-of-speech patterns: NN al JJ ↔ NN NN • Can be scored like phrasal translations • Keep only the more likely prototypes • For a source phrase at test time that matches a source prototype but is not in the phrase table: • generate the target phrase based on the target prototype and a word-to-word dictionary
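The generation step can be sketched as follows: a seen phrase pair is backed off to a POS-level prototype, and an unseen source phrase matching that prototype is translated by filling the target slots from a word-to-word dictionary. The POS tags, dictionary entries, and patterns here are hypothetical toy data, not the study's actual prototypes.

```python
# Toy POS tagger and bilingual dictionary (hypothetical entries).
pos = {"kitab": "NN", "jadid": "JJ", "qitar": "NN", "sari": "JJ"}
dictionary = {"kitab": "book", "jadid": "new",
              "qitar": "train", "sari": "fast"}

# Prototype backed off from a seen pair such as "kitab jadid" <-> "new book":
# source pattern NN JJ maps to target pattern JJ NN (toy example).
src_pattern = ["NN", "JJ"]
tgt_pattern = ["JJ", "NN"]

def translate_unknown(phrase):
    """Translate an out-of-phrase-table source phrase via the prototype."""
    words = phrase.split()
    if [pos.get(w) for w in words] != src_pattern:
        return None  # prototype does not match this phrase
    by_tag = {pos[w]: dictionary[w] for w in words}
    # Reorder word translations according to the target-side pattern.
    return " ".join(by_tag[t] for t in tgt_pattern)

print(translate_unknown("qitar sari"))
```

Note how the prototype supplies the reordering (adjective before noun on the target side) that a bare word-for-word gloss would miss.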
Pilot experiment and results • POS prototypes • For training prototypes, we don’t need a large parallel corpus. • Many generated phrases are useless, but they don’t degrade performance
Summary • Infovis for MT – prototype design has to satisfy two groups • Help users accomplish their objectives • Define interactions that will help researchers • Users identified opportunities to improve MT • We created specialized LMs for difficult-to-translate phrases to approximate domain coherence. • We applied syntactic constraints to generate more potential translations for unknown phrases.
Information visualization and its applications to machine translation Rebecca Hwa hwa@cs.pitt.edu