Towards Interactive and Automatic Refinement of Translation Rules
Ariadna Font Llitjós
PhD Thesis Proposal
Jaime Carbonell (advisor), Alon Lavie (co-advisor), Lori Levin, Bonnie Dorr (Univ. Maryland)
5 November 2004
Outline
• Introduction
• Thesis statement and scope
• Preliminary Research
  - Interactive elicitation of error information
  - A framework for automatic rule adaptation
• Proposed Research
• Contributions and Thesis Timeline
Interactive and Automatic Rule Refinement
Machine Translation (MT)
• Source Language (SL) sentence: Gaudi was a great artist
• Correct Spanish translation: Gaudi era un gran artista
• MT system outputs: *Gaudi estaba un artista grande, *Gaudi era un artista grande
Spanish Adjectives (Automatic Rule Adaptation; Completed Work)
• General order, post-nominal ADJ: English NP -> DET ADJ N, Spanish NP -> DET N ADJ: a big house = una casa grande (grande: big in size)
• Exception, pre-nominal ADJ: English NP -> DET ADJ N, Spanish NP -> DET ADJ N: a great artist = un gran artista (gran: exceptional)
Commercial and Online Systems
Correct translation: Gaudi era un gran artista
• Systran, Babelfish (AltaVista), WorldLingo, Translated.net: *Gaudi era gran artista
• ImTranslation: *El Gaudi era un gran artista
• 1-800-Translate: *Gaudi era un fenomenal artista
Post-editing
• Current solutions:
  - Post-editing [Allen, 2003] by human linguists or editors (experts)
  - Automated post-editing module (APE) [Allen & Hogan, 2000], which alleviates the tedious task of correcting the most frequent errors over and over
• No existing solution fully automates the post-editing process
Drawbacks of Current Methods
• Manual post-editing: corrections do not generalize, so every occurrence of the same error must be corrected again:
  - Gaudi era un artista grande
  - Juan es un amigo grande (Juan is a great friend)
  - Era una oportunidad grande (It was a great opportunity)
• APE: humans need to predict all the errors ahead of time and hand-code the post-editing rules, so a new error requires a new rule
My Solution
• Automate post-editing efforts by feeding them back into the MT system.
• Possible alternatives:
  - Automatic learning of post-editing rules: (+) system independent; (-) several thousand sentences might need to be corrected for the same error
  - Automatic refinement of translation rules: (+) attacks the core of the problem; (-) only for transfer-based MT systems (there must be rules to fix!)
Related Work
• Fixing machine translation, rule adaptation, and post-editing: [Corston-Oliver & Gammon, 2003], [Imamura et al. 2003], [Menezes & Richardson, 2001], [Brill, 1993], [Gavaldà, 2000], [Callison-Burch, 2004], [Su et al. 1995], [Allen & Hogan, 2000]
• My thesis: no pre-existing training data required; no human reference translations required; feedback from non-expert users
Resource-poor Scenarios (AVENUE)
• Lack of electronic parallel data
• Lack of computational linguists
• Lack of manually written grammars
Why bother?
• Indigenous communities have limited access to crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.)
• Preservation of their language and culture
Resource-poor languages: Mapudungun, Quechua, Aymara
How is MT possible for resource-poor languages?
Bilingual speakers
AVENUE Project Overview (architecture diagram)
Stages: Elicitation, Morphology, Rule Learning, Run-Time System
Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphological Analyzer, Learning Module, Handcrafted Rules, Transfer Rules, Lexical Resources, Run-Time Transfer System, Lattice
My Thesis (the AVENUE architecture extended with a Rule Refinement stage)
Stages: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement
Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphological Analyzer, Learning Module, Handcrafted Rules, Transfer Rules, Lexical Resources, Run-Time Transfer System, Lattice, plus the new Translation Correction Tool and Rule Refinement Module
Related Work
Areas: Fixing Machine Translation, Rule Adaptation, Post-editing, Resource-poor Languages. My Thesis draws on all four.
Thesis Statement
Given a rule-based transfer MT system:
- Extract useful information from non-expert bilingual speakers to correct MT output.
- Automatically refine and expand translation rules, given corrected and aligned translation pairs and some error information, to improve coverage and overall MT quality.
Assumptions
• No parallel training data available
• No human reference translations available
• The SL sentence needs to be fully parsed by the translation grammar
• Bilingual speakers can give enough information about the MT errors
Scope
Automatically refine errors that can be handled with:
1. Just user correction information.
2. Correction and error information.
3. A reasonable amount of further user interaction, together with the available correction and error information.
This applies to both manually written and automatically learned grammars [AMTA 2004].
Technical Challenges
• Automatically refine and expand translation rules minimally, for both manually written and automatically learned rules
• Automatic evaluation of the refinement process
• Elicit minimal MT information from non-expert users
Preliminary Work
• Interactive elicitation of error information
• A framework for automatic rule adaptation
Error Typology for Automatic Rule Refinement, simplified (Completed Work: Interactive elicitation of error information)
• Missing word
• Extra word
• Wrong word order: local vs. long-distance; word vs. phrase; + word change
• Incorrect word: sense, form, selectional restrictions, idiom
• Wrong agreement: missing constraint, extra constraint
TCTool (Translation Correction Tool) Demo (Interactive elicitation of error information)
Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order
1st English-to-Spanish User Study [LREC 2004] (Completed Work: Interactive elicitation of error information)
• MT error classification: 9 linguistically motivated classes [Flanagan, 1994], [White et al. 1994]: word order, sense, agreement error (number, person, gender, tense), form, incorrect word, and no translation
Translation Rules
    {NP,8}
    NP::NP : [DET ADJ N] -> [DET N ADJ]
    ( (X1::Y1) (X2::Y3) (X3::Y2)
      ;; English parsing:
      ((x0 def) = (x1 def))   ; NP definiteness = DET definiteness
      (x0 = x3)               ; NP = N (N is the head of the NP)
      ;; Spanish generation:
      ((y1 agr) = (y2 agr))   ; DET agreement = N agreement
      ((y3 agr) = (y2 agr))   ; ADJ agreement = N agreement
      (y2 = x3)               ; pass the features of English N to Spanish N
    )

    ADJ::ADJ |: [nice] -> [bonito]
    ( (X1::Y1)
      ((x0 pos) = adj)
      ((x0 form) = nice)
      ((y0 agr num) = sg)     ; Spanish ADJ is singular in number
      ((y0 agr gen) = masc)   ; Spanish ADJ is masculine in gender
    )
Automatic Rule Refinement Framework (Completed Work: Automatic Rule Adaptation)
• Find the best rule refinement (RR) operations given:
  - a grammar (G),
  - a lexicon (L),
  - a (set of) source language sentence(s) (SL),
  - a (set of) target language sentence(s) (TL),
  - its parse tree (P), and
  - a minimal correction of TL (TL'),
  such that translation quality improves: TQ2 > TQ1 (TQ1 before refinement, TQ2 after).
• This can also be expressed as: max TQ(TL | TL', P, SL, RR(G, L))
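The search for the best refinement can be sketched as follows. This is a minimal sketch built on stated assumptions and is not the thesis implementation. A grammar is reduced to a tuple of rule names. `translate` is a hypothetical stand-in for the transfer engine. `tq` is a hypothetical translation-quality score that counts position-by-position word matches against the corrected translation TL'. A candidate refinement is kept only if TQ2 > TQ1.

```python
from typing import Callable, Iterable, Optional, Tuple

Grammar = Tuple[str, ...]  # hypothetical: a grammar reduced to a tuple of rule names
Refinement = Callable[[Grammar], Grammar]


def tq(tl: str, tl_corrected: str) -> float:
    """Hypothetical TQ: fraction of TL' words matched position by position."""
    out, ref = tl.split(), tl_corrected.split()
    matches = sum(1 for a, b in zip(out, ref) if a == b)
    return matches / max(len(ref), 1)


def best_refinement(
    grammar: Grammar,
    sl: str,
    tl_corrected: str,
    candidates: Iterable[Refinement],
    translate: Callable[[Grammar, str], str],
) -> Optional[Refinement]:
    """Return the candidate RR that maximizes TQ, or None if none improves on TQ1."""
    best: Optional[Refinement] = None
    best_score = tq(translate(grammar, sl), tl_corrected)  # TQ1, before refinement
    for rr in candidates:
        score = tq(translate(rr(grammar), sl), tl_corrected)  # TQ2, after applying RR
        if score > best_score:  # require a strict improvement
            best, best_score = rr, score
    return best


# Toy usage: a hypothetical engine that uses the pre-nominal order once NP,8' exists.
def toy_translate(grammar: Grammar, sl: str) -> str:
    if "NP,8'" in grammar:
        return "Gaudi era un gran artista"
    return "Gaudi era un artista grande"


def keep(grammar: Grammar) -> Grammar:
    return grammar


def bifurcate_np8(grammar: Grammar) -> Grammar:
    return grammar + ("NP,8'",)


chosen = best_refinement(("S,1", "NP,8"), "Gaudi was a great artist",
                         "Gaudi era un gran artista", [keep, bifurcate_np8], toy_translate)
print(chosen.__name__)  # bifurcate_np8
```

In this toy setting TQ1 is 3/5. Leaving the grammar unchanged does not beat it, while adding NP,8' reaches 5/5.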
Types of Refinement Operations (Completed Work: Automatic Rule Adaptation)
1. Refine a translation rule: R0 -> R1 (R0 modified, made either more specific or more general)
   R0: English NP -> DET ADJ N, Spanish NP -> DET N ADJ: a nice house -> *una casa bonito
   R1: same as R0, plus the constraint N gender = ADJ gender: a nice house -> una casa bonita
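The effect of refining R0 into R1 can be shown with a small generator. This sketch rests on two assumptions that do not come from the source. First, the generator can see both inflected forms of the adjective (bonito, bonita). Second, the constraint N gender = ADJ gender is modeled as a boolean flag on the rule, `agree_gender`, which is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List, Tuple

Word = Tuple[str, str]  # (form, gender)


@dataclass(frozen=True)
class NPRule:
    name: str
    agree_gender: bool = False  # hypothetical flag for the constraint N gender = ADJ gender


def generate(rule: NPRule, det: str, noun: Word, adj_forms: List[Word]) -> List[str]:
    """Return every Spanish NP [DET N ADJ] that the rule allows."""
    outputs = []
    for adj, adj_gender in adj_forms:
        if rule.agree_gender and adj_gender != noun[1]:
            continue  # constraint fails: ADJ gender must equal N gender
        outputs.append(f"{det} {noun[0]} {adj}")
    return outputs


r0 = NPRule("R0")
r1 = NPRule("R1", agree_gender=True)  # R0 made more specific
casa = ("casa", "fem")
bonito_forms = [("bonito", "masc"), ("bonita", "fem")]

print(generate(r0, "una", casa, bonito_forms))  # ['una casa bonito', 'una casa bonita']
print(generate(r1, "una", casa, bonito_forms))  # ['una casa bonita']
```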
Types of Refinement Operations (2) (Completed Work: Automatic Rule Adaptation)
2. Bifurcate a translation rule: R0 -> R0 (same, general rule) + R1 (R0 modified, specific rule)
   R0: English NP -> DET ADJ N, Spanish NP -> DET N ADJ: a nice house -> una casa bonita
   R1: English NP -> DET ADJ N, Spanish NP -> DET ADJ N: a great artist -> un gran artista
Formalizing Error Information (Completed Work: Automatic Rule Adaptation)
• Wi = error, Wi' = correction, Wc = clue word
• Example: a nice house -> *una casa bonito, with Wi = bonito, Wi' = bonita, Wc = casa
• The refined rule NP -> DET N ADJ adds N gender = ADJ gender and yields: a nice house -> una casa bonita
Triggering Feature Detection (Completed Work: Automatic Rule Adaptation)
• Comparison at the feature level to detect the triggering feature(s)
• Delta function: Δ(Wi, Wi')
• Examples:
  - Δ(bonito, bonita) = {gender}
  - Δ(comiamos, comia) = {person, number}
  - Δ(mujer, guitarra) = {}
• If the set is empty, a new binary feature needs to be postulated
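The delta function can be sketched as a comparison of feature values, returning the names of the features that differ. The feature structures below are hand-written assumptions for illustration; in particular, comia is taken as third person singular. Only grammatical features are compared, not the lemma. When the delta is empty, the sketch postulates a new binary feature named feat1, feat2, and so on, following the naming used later in the example.

```python
from typing import Dict, Set

FeatureStructure = Dict[str, str]


def delta(fs_i: FeatureStructure, fs_j: FeatureStructure) -> Set[str]:
    """Names of the features whose values differ between two words."""
    names = fs_i.keys() | fs_j.keys()
    return {name for name in names if fs_i.get(name) != fs_j.get(name)}


def triggering_features(fs_i: FeatureStructure, fs_j: FeatureStructure,
                        next_id: int = 1) -> Set[str]:
    """Return the triggering features, postulating a new binary feature if none differ."""
    diff = delta(fs_i, fs_j)
    return diff if diff else {f"feat{next_id}"}


# Hand-written feature structures (assumptions for illustration only)
bonito = {"pos": "adj", "gender": "masc", "number": "sg"}
bonita = {"pos": "adj", "gender": "fem", "number": "sg"}
comiamos = {"pos": "v", "person": "1", "number": "pl", "tense": "imperf"}
comia = {"pos": "v", "person": "3", "number": "sg", "tense": "imperf"}
mujer = {"pos": "n", "gender": "fem", "number": "sg"}
guitarra = {"pos": "n", "gender": "fem", "number": "sg"}

print(delta(bonito, bonita))                 # {'gender'}
print(sorted(delta(comiamos, comia)))        # ['number', 'person']
print(triggering_features(mujer, guitarra))  # {'feat1'}
```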
Deciding on the Refinement Operation (Completed Work: Automatic Rule Adaptation)
Given:
- the action performed by the user (add, delete, modify, change word order), and
- the error information available (clue word, word alignments, etc.),
determine the refinement action.
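Rule Refinement Operations (decision-tree diagram)
• Top-level branches by user action: Modify, Add, Delete, Change Word Order
• Other node labels recoverable from the diagram: +Wc / -Wc (clue word given or not), +al / -al (word alignment available or not), Wi ... Wc, Wi = Wi', Wi ≠ Wi', POSi = POSi', POSi ≠ POSi', +rule / -rule, RuleLearner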
Proposed Work
• Batch and interactive mode
• User studies
• Evaluation
Rule Refinement Example: Change Word Order (Automatic Rule Adaptation)
• SL: Gaudí was a great artist
• MT system output (TL): *Gaudí era un artista grande
• Goal (given by the user correction): Gaudí era un gran artista
Automatic Rule Adaptation
1. Error Information Elicitation (diagram: Refinement Operation Typology)
2. Variable Instantiation from the Log File (Automatic Rule Adaptation)
Correcting actions:
1. Word order change (artista grande -> grande artista): Wi = grande
2. Edited grande into gran: Wi' = gran
3. Identified artista as the clue word: Wc = artista
In this case, even if the user had not identified Wc, the refinement process would have been the same.
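Reading Wi, Wi' and Wc off the correction log might look like the sketch below. The source does not give the log format, so the `Action` record and its action labels (`change_order`, `modify`, `clue`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    kind: str                       # hypothetical labels: "change_order", "modify", "clue"
    word: str                       # word the user acted on
    new_word: Optional[str] = None  # edited form, for "modify" actions


@dataclass
class ErrorInfo:
    wi: Optional[str] = None        # Wi: the erroneous word
    wi_prime: Optional[str] = None  # Wi': its correction
    wc: Optional[str] = None        # Wc: clue word, if the user identified one


def instantiate(log: List[Action]) -> ErrorInfo:
    """Instantiate the error variables from a sequence of correcting actions."""
    info = ErrorInfo()
    for action in log:
        if action.kind in ("change_order", "modify") and info.wi is None:
            info.wi = action.word  # the first word acted on is taken as Wi
        if action.kind == "modify":
            info.wi_prime = action.new_word
        elif action.kind == "clue":
            info.wc = action.word
    return info


log = [
    Action("change_order", "grande"),
    Action("modify", "grande", "gran"),
    Action("clue", "artista"),
]
print(instantiate(log))  # ErrorInfo(wi='grande', wi_prime='gran', wc='artista')
```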
3. Retrieve Relevant Lexical Entries (Automatic Rule Adaptation)
• There is no lexical entry for [great -> gran]
• Duplicate the lexical entry [great -> grande] and change its TL side (morphological analyzer: grande = gran):
    ADJ::ADJ |: [great] -> [grande]
    ( (X1::Y1)
      (…)
      ((y0 agr num) = sg)
      ((y0 agr gen) = masc)
    )

    ADJ::ADJ |: [great] -> [gran]
    ( (X1::Y1)
      (…)
      ((y0 agr num) = sg)
      ((y0 agr gen) = masc)
    )
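Duplicating an entry and changing only its TL side can be sketched with an immutable record. The `LexEntry` type and the `add_tl_variant` helper are hypothetical. The constraints elided as (…) on the slide are left out rather than guessed.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class LexEntry:
    pos: str
    sl: str
    tl: str
    constraints: Tuple[Tuple[str, str], ...]  # (feature path, value) pairs


def add_tl_variant(lexicon: List[LexEntry], sl: str, old_tl: str, new_tl: str) -> List[LexEntry]:
    """Copy the sl -> old_tl entry with its TL side set to new_tl, unless one already exists.

    Raises StopIteration if there is no sl -> old_tl entry to copy.
    """
    if any(e.sl == sl and e.tl == new_tl for e in lexicon):
        return lexicon
    base = next(e for e in lexicon if e.sl == sl and e.tl == old_tl)
    return lexicon + [replace(base, tl=new_tl)]  # same constraints, new TL side


grande = LexEntry("ADJ", "great", "grande",
                  (("(y0 agr num)", "sg"), ("(y0 agr gen)", "masc")))
lexicon = add_tl_variant([grande], "great", "grande", "gran")
for entry in lexicon:
    print(entry.sl, "->", entry.tl, dict(entry.constraints))
```

The existence check makes the operation idempotent: running the same correction twice does not add a second [great -> gran] entry.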
4. Finding Triggering Feature(s) (Automatic Rule Adaptation)
• Feature function: Δ(Wi, Wi') = Δ(grande, gran) = {}, so a new binary feature needs to be postulated: feat1
5. Blame Assignment (from the MT system output)
• Tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>
• Grammar: S,1 … NP,1 … NP,8 …
• The erroneous word GRANDE is generated under rule NP,8, so NP,8 is the rule to refine
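Blame assignment can be sketched as finding the lowest grammar rule in the output tree that dominates Wi. Based on the tree notation on this slide, the sketch assumes that labels containing a colon (such as ADJ,5:4) are lexical entries and that labels without one (such as NP,8) are grammar rules. The parser is a hypothetical helper for this bracketed format.

```python
import re
from typing import List, Optional, Tuple, Union

Tree = List[Union[str, "Tree"]]

TREE = ('<((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) '
        '(NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>')


def parse(text: str) -> Tree:
    """Parse the bracketed tree into nested lists: [label, child, child, ...]."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()]+', text.strip("<>"))

    def read(pos: int) -> Tuple[Tree, int]:
        node: Tree = []
        pos += 1  # skip "("
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child, pos = read(pos)
                node.append(child)
            else:
                node.append(tokens[pos].strip('"'))
                pos += 1
        return node, pos + 1  # skip ")"

    tree, _ = read(0)
    return tree


def blame(tree: Tree, word: str, path: Tuple[str, ...] = ()) -> Optional[str]:
    """Return the lowest grammar rule (label without ':') that dominates `word`."""
    has_label = bool(tree) and isinstance(tree[0], str)
    if has_label:
        path = path + (tree[0],)
    for child in (tree[1:] if has_label else tree):
        if isinstance(child, str):
            if child == word:
                rules = [label for label in path if ":" not in label]
                return rules[-1] if rules else None
        else:
            found = blame(child, word, path)
            if found:
                return found
    return None


print(blame(parse(TREE), "GRANDE"))  # NP,8
```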
6. Variable Instantiation in the Rules (Automatic Rule Adaptation)
• Wi = grande: POSi = ADJ = Y3, y3
• Wc = artista: POSc = N = Y2, y2
    {NP,8}
    ;; TL side: Y1 = DET, Y2 = N, Y3 = ADJ
    NP::NP : [DET ADJ N] -> [DET N ADJ]
    ( (X1::Y1) (X2::Y3) (X3::Y2)
      ((x0 def) = (x1 def))
      (x0 = x3)
      ((y1 agr) = (y2 agr))   ; det-noun agreement
      ((y3 agr) = (y2 agr))   ; adj-noun agreement
      (y2 = x3)
    )
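Mapping Wi and Wc to rule variables means looking up their parts of speech on the rule's TL side and following the alignments back to the SL side. The dictionary encoding of {NP,8} below is a hypothetical representation. The lookup also assumes that each POS occurs only once per side.

```python
from typing import Dict

# Hypothetical encoding of {NP,8}: [DET ADJ N] -> [DET N ADJ], (X1::Y1) (X2::Y3) (X3::Y2)
NP8 = {
    "sl": ["DET", "ADJ", "N"],
    "tl": ["DET", "N", "ADJ"],
    "align": {1: 1, 2: 3, 3: 2},  # SL index -> TL index, 1-based
}


def instantiate_pos(rule: Dict, pos: str) -> Dict[str, str]:
    """Return the SL/TL variables bound to the (unique) constituent with this POS."""
    y = rule["tl"].index(pos) + 1  # rule variables are 1-based
    x = next(xi for xi, yi in rule["align"].items() if yi == y)
    return {"X": f"X{x}", "Y": f"Y{y}", "y": f"y{y}"}


print(instantiate_pos(NP8, "ADJ"))  # Wi = grande: {'X': 'X2', 'Y': 'Y3', 'y': 'y3'}
print(instantiate_pos(NP8, "N"))    # Wc = artista: {'X': 'X3', 'Y': 'Y2', 'y': 'y2'}
```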
7. Refining Rules (Automatic Rule Adaptation)
• Bifurcate NP,8 -> NP,8 (R0) + NP,8' (R1), flipping the order of ADJ and N in R1:
    {NP,8'}
    NP::NP : [DET ADJ N] -> [DET ADJ N]
    ( (X1::Y1) (X2::Y2) (X3::Y3)
      ((x0 def) = (x1 def))
      (x0 = x3)
      ((y1 agr) = (y3 agr))   ; det-noun agreement
      ((y2 agr) = (y3 agr))   ; adj-noun agreement
      (y3 = x3)               ; pass the features of English N to Spanish N (now Y3)
      ((y2 feat1) =c +)       ; the ADJ must be marked feat1 = +
    )
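The bifurcation step can be sketched in four parts: copy R0, swap ADJ and N on the TL side, renumber the alignments and y-variables to match, and add the =c constraint. The dictionary encoding and the `bifurcate_flip` helper are hypothetical, and the constraints are treated as plain strings.

```python
import re
from copy import deepcopy
from typing import Dict, Tuple

NP8 = {
    "name": "NP,8",
    "sl": ["DET", "ADJ", "N"],
    "tl": ["DET", "N", "ADJ"],
    "align": {1: 1, 2: 3, 3: 2},
    "constraints": [
        "((x0 def) = (x1 def))",
        "(x0 = x3)",
        "((y1 agr) = (y2 agr))",
        "((y3 agr) = (y2 agr))",
        "(y2 = x3)",
    ],
}


def bifurcate_flip(rule: Dict, new_name: str, feature: str = "feat1") -> Tuple[Dict, Dict]:
    """Return (R0, R1): R0 unchanged, R1 with ADJ and N swapped on the TL side."""
    r1 = deepcopy(rule)
    r1["name"] = new_name
    tl = r1["tl"]
    a, n = tl.index("ADJ"), tl.index("N")
    tl[a], tl[n] = tl[n], tl[a]               # flip ADJ and N
    swap = {a + 1: n + 1, n + 1: a + 1}       # old Y index -> new Y index
    r1["align"] = {x: swap.get(y, y) for x, y in r1["align"].items()}

    def renumber(match):
        old = int(match.group(1))
        return f"y{swap.get(old, old)}"

    r1["constraints"] = [re.sub(r"\by(\d+)\b", renumber, c) for c in r1["constraints"]]
    adj_y = tl.index("ADJ") + 1
    r1["constraints"].append(f"((y{adj_y} {feature}) =c +)")  # only ADJs marked + go first
    return rule, r1


r0, r1 = bifurcate_flip(NP8, "NP,8'")
print(r1["tl"], r1["align"])
for c in r1["constraints"]:
    print(c)
```

Renumbering the y-variables all at once with a single substitution pass avoids the usual bug where y2 -> y3 is applied and then immediately undone by y3 -> y2.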
8. Refining Lexical Entries (Automatic Rule Adaptation)
    ADJ::ADJ |: [great] -> [grande]
    ( (X1::Y1)
      ((x0 form) = great)
      ((y0 agr num) = sg)
      ((y0 agr gen) = masc)
      ((y0 feat1) = -)
    )

    ADJ::ADJ |: [great] -> [gran]
    ( (X1::Y1)
      ((x0 form) = great)
      ((y0 agr num) = sg)
      ((y0 agr gen) = masc)
      ((y0 feat1) = +)
    )
Done? Not yet (Automatic Rule Adaptation)
• NP,8 (R0) applies to ADJ(grande) [feat1 = -]
• NP,8' (R1) [feat1 =c +] applies to ADJ(gran) [feat1 = +]
• We need to restrict the application of the general rule (R0) to post-nominal ADJs only
• Current output: un artista grande, un artista gran, un gran artista, *un grande artista
Add Blocking Constraint (Automatic Rule Adaptation)
• NP,8 (R0) gets the blocking constraint [feat1 = -], matching ADJ(grande) [feat1 = -]
• NP,8' (R1) [feat1 =c +] applies to ADJ(gran) [feat1 = +]
• Output: un artista grande, *un artista gran, un gran artista, *un grande artista
• Can we also eliminate incorrect translations automatically?
Making the Grammar Tighter (Automatic Rule Adaptation)
• If Wc = artista:
  - Add [feat1 = +] to N(artista)
  - Add an agreement constraint between N and ADJ to NP,8 (R0): ((N feat1) = (ADJ feat1))
• Output: *un artista grande, *un artista gran, un gran artista, *un grande artista
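The three grammar states on these slides can be checked with a small simulation over the two adjectives. This sketch rests on stated assumptions. '=' is modeled as unification, which succeeds when either value is missing. '=c' is modeled as a constraint check, which requires the value to be present and equal. Rule application is reduced to these feature tests, and the function names are hypothetical.

```python
from typing import List, Optional

ADJ_FEAT1 = {"grande": "-", "gran": "+"}  # from the refined lexical entries


def unify_ok(a: Optional[str], b: Optional[str]) -> bool:
    """'=' constraint: fails only if both values are present and different."""
    return a is None or b is None or a == b


def check_ok(actual: Optional[str], required: str) -> bool:
    """'=c' constraint: the value must be present and equal to `required`."""
    return actual == required


def outputs(noun: str, noun_feat1: Optional[str], block_r0: bool, agree_r0: bool) -> List[str]:
    """Spanish NPs generated by NP,8 (R0, post-nominal) and NP,8' (R1, pre-nominal)."""
    result = []
    for adj, adj_feat1 in ADJ_FEAT1.items():
        r0_ok = True
        if block_r0:  # blocking constraint on R0: [feat1 = -]
            r0_ok = r0_ok and unify_ok(adj_feat1, "-")
        if agree_r0:  # R0 agreement constraint: ((N feat1) = (ADJ feat1))
            r0_ok = r0_ok and unify_ok(noun_feat1, adj_feat1)
        if r0_ok:
            result.append(f"un {noun} {adj}")
        if check_ok(adj_feat1, "+"):  # R1: ((y2 feat1) =c +)
            result.append(f"un {adj} {noun}")
    return sorted(result)


print(outputs("artista", None, block_r0=False, agree_r0=False))
# ['un artista gran', 'un artista grande', 'un gran artista']
print(outputs("artista", None, block_r0=True, agree_r0=False))
# ['un artista grande', 'un gran artista']
print(outputs("artista", "+", block_r0=True, agree_r0=True))
# ['un gran artista']
```

In none of the three states does R1 produce *un grande artista, because grande carries feat1 = - and fails the =c check.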
Batch Mode Implementation (Proposed Work: Automatic Rule Adaptation)
• Given a set of user corrections, apply the refinement module.
• This covers refinement operations for errors that can be refined fully automatically:
  1. just by using correction information, or
  2. by using correction and error information.
Rule Refinement Operations, case 1: correction information only (highlighted in the decision-tree diagram)
• Gaudi was a great artist -> *Gaudi era un artista grande; corrected: Gaudi era un gran artista
• It is a nice house -> *Es una casa bonito; corrected: Es una casa bonita
Rule Refinement Operations, case 2: correction and error information (highlighted in the decision-tree diagram)
• Rule: PP -> PREP NP
• I am proud of you -> *Estoy orgullosa tu; corrected: Estoy orgullosa de ti
Interactive Mode Implementation (Proposed Work: Automatic Rule Adaptation)
• Extra error information is required to determine the triggering context automatically, so the system needs to present other relevant sentences (minimal pairs) to the user at run-time.
• This covers refinement operations for errors that can be refined fully automatically but that:
  3. require a reasonable amount of further user interaction, and can then be solved with the available correction and error information.
Rule Refinement Operations, focus on case 3 (highlighted in the decision-tree diagram)
• I see them -> *Veo los; corrected: Los veo
Example Requiring a Minimal Pair (Proposed Work: Automatic Rule Adaptation)
1. Run the SL sentence through the transfer engine: I see them -> *veo los; corrected: los veo
2. Wi = los, but there is no Wi' or Wc
   • A minimal pair is needed to determine the appropriate refinement: I see cars -> veo autos
3. Triggering feature(s): Δ(los, autos) = {pos}, since PRON(los) [pos = pron] and N(autos) [pos = n]
Refining and Adding Constraints (Proposed Work)
• Bifurcate VP,3:
  VP,3: [VP NP] -> [VP NP]
  VP,3': [VP NP] -> [NP VP] + [NP pos =c pron]
• Percolate the triggering features up to the constituent level:
  NP: [PRON] -> [PRON] + [NP pos = PRON pos]
• Block the application of the general rule (VP,3):
  VP,3: [VP NP] -> [VP NP] + [NP pos = (*NOT* pron)]
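The refined VP rules can be sketched as two guarded alternatives, with the NP's pos percolated up from its head. The function names and the string-based output are hypothetical simplifications of the transfer engine.

```python
from typing import Dict, List


def np_from_head(word: str, head_pos: str) -> Dict[str, str]:
    """NP built from a single head, with [NP pos = head pos] percolated up."""
    return {"word": word, "pos": head_pos}


def translate_vp(verb: str, np: Dict[str, str]) -> List[str]:
    """Spanish VPs licensed by VP,3 and VP,3' for a verb and its object NP."""
    outputs = []
    if np["pos"] != "pron":  # VP,3: [VP NP] -> [VP NP] + [NP pos = (*NOT* pron)]
        outputs.append(f"{verb} {np['word']}")
    if np["pos"] == "pron":  # VP,3': [VP NP] -> [NP VP] + [NP pos =c pron]
        outputs.append(f"{np['word']} {verb}")
    return outputs


print(translate_vp("veo", np_from_head("los", "pron")))  # ['los veo']
print(translate_vp("veo", np_from_head("autos", "n")))   # ['veo autos']
```

Because the two guards are complementary, exactly one of the two rules applies to any object NP. This mirrors how the blocking constraint keeps the general rule from producing *veo los.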