Avenue Architecture
[Architecture diagram. Stages: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module, Learned Transfer Rules, Handcrafted rules, Morphology Analyzer, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module. Data flow: INPUT TEXT to OUTPUT TEXT.]
Interactive and Automatic Refinement of Translation Rules
• Problem: Improve Machine Translation quality.
• Proposed Solution: Put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar.
• Approach: Automate post-editing efforts by feeding them back into the MT system.
• Automatic refinement of the translation rules that caused an error, going beyond post-editing.
• Goal: Improve MT coverage and overall quality.
Technical Challenges
• Elicit minimal MT information from non-expert users
• Automatically refine and expand translation rules (manually written or automatically learned) minimally
• Automatic evaluation of the refinement process
Error Typology for Automatic Rule Refinement (simplified)
Interactive elicitation of error information
• Missing word
• Extra word
• Wrong word order: local vs. long distance; word vs. phrase; + word change
• Incorrect word: sense; form; selectional restrictions; idiom
• Wrong agreement: missing constraint; extra constraint
TCTool (Demo)
Interactive elicitation of error information
Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order
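The four correction actions can be sketched as operations on a token list. This is only an illustration: the `Correction` record, its field names, and `apply_correction` are hypothetical and are not taken from the actual TCTool, whose log format is not described in these slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Correction:
    """One user action (hypothetical record, not the real TCTool log format)."""
    action: str                         # "add", "delete", "modify", or "move"
    position: int                       # 0-based index before the action is applied
    word: Optional[str] = None          # new word for "add" / "modify"
    new_position: Optional[int] = None  # destination index for "move"


def apply_correction(tokens: List[str], c: Correction) -> List[str]:
    """Return a corrected copy of the sentence; the input list is not modified."""
    out = list(tokens)
    if c.action == "add":
        out.insert(c.position, c.word)
    elif c.action == "delete":
        del out[c.position]
    elif c.action == "modify":
        out[c.position] = c.word
    elif c.action == "move":
        # Changing word order = remove the word, then re-insert it elsewhere.
        out.insert(c.new_position, out.pop(c.position))
    else:
        raise ValueError(f"unknown action: {c.action}")
    return out


# Assumed sequence of actions for the Gaudí sentence used in this talk.
mt_output = "Gaudí era un artista grande".split()
log = [
    Correction("modify", 4, word="gran"),   # grande -> gran
    Correction("move", 4, new_position=3),  # place the adjective before the noun
]
corrected = mt_output
for step in log:
    corrected = apply_correction(corrected, step)
```

Keeping each action as a separate record preserves the order in which the user made the corrections.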
Automatic Rule Adaptation
Types of Refinement Operations
1. Refine a translation rule: R0 → R1 (change R0 to make it more specific or more general)
R0: a nice house → una casa bonito
R1 (adds N gender = ADJ gender): a nice house → una casa bonita
[Tree diagrams for R0 and R1: NP → DET ADJ N aligned with NP → DET N ADJ]
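A minimal sketch of the refine operation, assuming a toy gender table and two hypothetical predicate functions (`r0_accepts`, `r1_accepts`) standing in for the rule before and after the agreement constraint is added. It is not the transfer engine's own mechanism, only an illustration of how the added constraint narrows the output.

```python
# Toy gender table (hypothetical data structure; the gender values are standard Spanish).
GENDER = {"casa": "fem", "bonito": "masc", "bonita": "fem"}


def r0_accepts(noun: str, adj: str) -> bool:
    """R0: no agreement constraint, so every noun-adjective pair passes."""
    return True


def r1_accepts(noun: str, adj: str) -> bool:
    """R1: R0 plus the constraint N gender = ADJ gender."""
    return GENDER[noun] == GENDER[adj]


# Candidate target-side word choices for "a nice house".
candidates = [("casa", "bonito"), ("casa", "bonita")]

r0_out = [f"una {n} {a}" for n, a in candidates if r0_accepts(n, a)]
r1_out = [f"una {n} {a}" for n, a in candidates if r1_accepts(n, a)]
```

Under these assumptions `r0_out` still contains "una casa bonito", while `r1_out` keeps only the agreeing form.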
Automatic Rule Adaptation
Types of Refinement Operations
2. Bifurcate a translation rule: R0 → R0 (same, general rule) + R1 (add a new, more specific rule)
R0: a nice house → una casa bonita
R1 (ADJ type: pre-nominal): a great artist → un gran artista
[Tree diagrams: R0 aligns NP → DET ADJ N with NP → DET N ADJ; R1 aligns NP → DET ADJ N with NP → DET ADJ N]
Automatic Rule Adaptation
A concrete example
Error Information Elicitation
SL: Gaudí was a great artist
TL (MT system output): Gaudí era un artista grande
User correction: *Gaudí era un artista grande → Gaudí era un gran artista
Action: Change word order
[Annotation labels in the figure: error, correction, clue word]
Refinement Operation Typology
Automatic Rule Adaptation
Finding Triggering Feature(s):
(error word, corrected word) = need to postulate a new binary feature: feat1

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc))

Blame assignment (from MT system output)
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )
         (VP,3 (VB,2 (AUX,17:2 "ERA") )
               (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>
Grammar: S,1 … NP,1 … NP,8 …
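One way to picture blame assignment on the tree above is to look for the lowest rule node that dominates both the error word and the clue word. The sketch below is an assumption about how that could be done, not the Rule Refinement module's actual procedure; the nested-tuple encoding and the helper names `path_to` and `blamed_rule` are hypothetical.

```python
from typing import List, Optional

# The parse tree from the slide, encoded as nested tuples:
# (label, child, ...) for rule nodes and (label, "WORD") for lexical leaves.
TREE = ("S,1",
        ("NP,2", ("N,5:1", "GAUDI")),
        ("VP,3",
         ("VB,2", ("AUX,17:2", "ERA")),
         ("NP,8",
          ("DET,0:3", "UN"),
          ("N,4:5", "ARTISTA"),
          ("ADJ,5:4", "GRANDE"))))


def path_to(tree: tuple, word: str) -> Optional[List[str]]:
    """Labels from the root down to the leaf that carries `word`, or None."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [label] if children[0] == word else None
    for child in children:
        sub = path_to(child, word)
        if sub is not None:
            return [label] + sub
    return None


def blamed_rule(tree: tuple, error_word: str, clue_word: str) -> Optional[str]:
    """Lowest node whose subtree contains both words (compared by label)."""
    p1, p2 = path_to(tree, error_word), path_to(tree, clue_word)
    if p1 is None or p2 is None:
        return None
    common = None
    for a, b in zip(p1, p2):
        if a != b:
            break
        common = a
    return common


rule = blamed_rule(TREE, "GRANDE", "ARTISTA")
```

For this tree the two paths diverge just below NP,8, which matches the rule that the refinement step bifurcates. Comparing nodes by label is a simplification that would fail if the same rule applied twice in one sentence.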
Automatic Rule Adaptation
Refining Rules
• Bifurcate NP,8 → NP,8 (R0) + NP,8’ (R1) (flip order of ADJ-N)

{NP,8’}
NP::NP : [DET ADJ N] -> [DET ADJ N]
( (X1::Y1) (X2::Y2) (X3::Y3)
  ((x0 def) = (x1 def))
  (x0 = x3)
  ((y1 agr) = (y3 agr)) ; det-noun agreement
  ((y2 agr) = (y3 agr)) ; adj-noun agreement
  (y2 = x3)
  ((y2 feat1) =c + ))
Automatic Rule Adaptation
Refining Lexical Entries

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = -))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = +))
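The interaction between the refined lexical entries and the bifurcated NP rule can be sketched as follows. The dictionaries, the `translate_np` helper, and the way constraints are written as Python functions are all hypothetical; only the feat1 values and the two target word orders come from the slides. R0 is left unconstrained here, because the slides do not state whether it receives a matching restriction.

```python
from itertools import product

# Refined lexicon: both translations of "great" now carry the new feature feat1.
LEXICON = {
    "a": [{"form": "un"}],
    "great": [{"form": "grande", "feat1": "-"},
              {"form": "gran", "feat1": "+"}],
    "artist": [{"form": "artista"}],
}

# R0: general rule, target order DET N ADJ, no constraint on feat1.
R0 = {"target": ["DET", "N", "ADJ"], "constraint": lambda adj: True}
# R1: new specific rule, target order DET ADJ N, requires feat1 = +.
R1 = {"target": ["DET", "ADJ", "N"],
      "constraint": lambda adj: adj.get("feat1") == "+"}


def translate_np(det, adj, noun, lexicon, rules):
    """Enumerate every target NP licensed by the given rules and lexicon."""
    out = []
    for d, a, n in product(lexicon[det], lexicon[adj], lexicon[noun]):
        for rule in rules:
            if rule["constraint"](a):
                words = {"DET": d["form"], "ADJ": a["form"], "N": n["form"]}
                out.append(" ".join(words[cat] for cat in rule["target"]))
    return out


before = translate_np("a", "great", "artist", LEXICON, [R0])
after = translate_np("a", "great", "artist", LEXICON, [R0, R1])
```

In this sketch the corrected phrase "un gran artista" only appears once R1 is added, while the post-nominal outputs produced by R0 remain available.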
Automatic Rule Adaptation
Evaluating Improvement
• Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
  • Corrected Translation Sentence
  • Original Translation Sentence (labelled as incorrect by the user)
Candidate translations: un artista gran | un gran artista | un grande artista | *un artista grande
Automatic Rule Adaptation
Evaluating Improvement
• Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
  • Corrected Translation Sentence
  • Original Translation Sentence (labelled as incorrect by the user)
Candidate translations: *un artista gran | un gran artista | *un grande artista | *un artista grande
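The presence checks described above can be sketched with translation lattices simplified to plain sets of candidate strings. The function names, the example lattice contents, and the `improved` criterion are assumptions made for illustration; the slides do not specify how the module scores a refinement.

```python
from typing import Dict, Set


def evaluate_refinement(initial: Set[str], final: Set[str],
                        corrected: str, original: str) -> Dict[str, bool]:
    """Report whether the corrected and original sentences occur in each lattice."""
    return {
        "corrected_in_initial": corrected in initial,
        "corrected_in_final": corrected in final,
        "original_in_initial": original in initial,
        "original_in_final": original in final,
    }


def improved(report: Dict[str, bool]) -> bool:
    """Assumed criterion: the refinement made the corrected sentence reachable."""
    return report["corrected_in_final"] and not report["corrected_in_initial"]


# Illustrative lattices (hypothetical contents, flattened to sets of strings).
initial_lattice = {"un artista grande", "un artista gran"}
final_lattice = {"un artista grande", "un artista gran", "un gran artista"}

report = evaluate_refinement(
    initial_lattice, final_lattice,
    corrected="un gran artista",
    original="un artista grande",
)
```

Reporting the four flags separately keeps two questions apart: whether the correct translation was gained, and whether the translation marked incorrect is still being produced.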
Challenges and future work
• Credit and blame assignment from TCTool log files and the Xfer engine’s trace
• Order of corrections matters: explore rule interactions
• Explore the space between batch mode and a fully interactive system
• Online TCTool always running to collect corrections from bilingual speakers; make it into a game with rewards for the best users
Publications
• Font Llitjós, A., J.G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation". EAMT 10th Annual Conference, 30-31 May 2005, Budapest, Hungary.
• Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine translation systems for indigenous languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), 27-29 October 2005, Texas, USA.
• Font Llitjós, A., K. Probst and J.G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". AMTA, 2004, Washington, USA.
• Font Llitjós, A. and J.G. Carbonell. "The Translation Correction Tool: English-Spanish user studies". LREC, 2004, Lisbon, Portugal.
Quechua-Spanish MT
• V-Unit: funded summer project in Cusco (Peru), June-August 2005 [preparations and data collection started earlier]
• Intensive Quechua course at the Centro Bartolome de las Casas (CBC)
• Worked with two native Quechua speakers and one non-native speaker on developing infrastructure (correcting elicited translations, segmenting and translating a list of the most frequent words)
Quechua-Spanish prototype MT system
• Stem lexicon (semi-automatically generated): 753 lexical entries
• Suffix lexicon: 21 suffixes (150 Cusihuaman)
• Quechua morphology analyzer
• 25 translation rules
• Spanish morphology generation module
• User studies: 10 sentences, 3 users (2 native, 1 non-native)