Towards Interactive and Automatic Refinement of Translation Rules PhD Thesis Proposal Ariadna Font Llitjós 5 November 2004
Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline Interactive and Automatic Rule Refinement
How to recycle corrections of MT output back into the system by adjusting and adapting the grammar and lexical rules
The Problem
General
• MT output still requires post-editing.
• Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data.
Avenue specific
• Resource-poor scenarios: no manual grammar, or only a very small initial grammar.
• Need to validate the elicitation corpus and the automatically learned translation rules.

Motivation
General
• Refining and extending translation rule sets manually is very costly and time consuming, and requires trained computational linguists with knowledge of both languages.
Resource-poor scenarios
• Indigenous communities have limited access to crucial information that directly affects their lives (land laws, plagues, health warnings, etc.).
• Preservation of language and culture.
MT Output
SL: Mary and Anna are falling
TL: María y Ana están cayendo
TL’: María y Ana se están cayendo

SL: Gaudi was a great artist
TL: Gaudi estaba un artista grande
TL: Gaudi era un artista grande
TL’: Gaudi era un gran artista

SL: You saw the woman
TL: Viste la mujer
TL’: Viste a la mujer
TL: Vió la mujer

SL: I used my elbow to push the button
TL: Usé mi codo que apretar el botón
TL’: Usé mi codo para apretar el botón

SL: We are building new bridges in the city
TL: Nosotros estamos construyendo nuevo puentes dentro la ciudad
TL’: Nosotros estamos construyendo nuevo puentes dentro de la ciudad
Resource-poor scenarios
• No e-data available (often spoken tradition), so no SMT or EBMT
• No computational linguists to write a grammar
So how can we even start to think about MT?
• That’s what AVENUE is all about: Elicitation Corpus + Automatic Rule Learning
What do we usually have available in resource-poor scenarios?
• Bilingual users
Avenue overview
[Figure: AVENUE architecture diagram. Stages: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module, Handcrafted rules, Transfer Rules, Morphological analyzer, Lexical Resources, Run Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module.]

Avenue overview: my thesis
[Figure: the same AVENUE architecture diagram, marking the part covered by this thesis.]
Thesis Statement
- Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable.
- We can automatically refine translation rules, given corrected and aligned translation pairs and some error information, so as to improve coverage and overall MT quality.
Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline
Related Work
• Post-editing to improve MT systems
  • minimal post-editing [Allen, 2003]
  • include user feedback in the MT loop [Callison-Burch, 2004], [Allen & Hogan, 2000], [Su et al., 1995], [Menezes & Richardson, 2001] and [Imamura et al., 2003]
• MT error information and classification
  • [Flanagan, 1994], [White et al., 1994], [Allen, 2003], [Niessen et al., 2000]

Related Work (continued)
• Rule Adaptation
  • POS tagging: [Lin et al., 1994]
  • parsing: [Lehman, 1989], [Brill, 2003]
  • NLU: [Gavaldà, 2000]
  • MT:
    • [Corston-Oliver & Gammon, 2003]: decision trees (DTs) to correct binary features of the logical form (LF) to reduce noise
    • [Yamada, 1995]: structural comparison between machine translations and manual translations to adapt the MT system to a new domain
    • [Naruedomkul, 2001]: modify an HPSG-like semantic representation of the TL until it is acceptably similar to the SL
Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline
Interactive elicitation of MT errors
Assumptions:
• non-expert bilingual users can reliably detect and minimally correct MT errors, given:
  • SL sentence (I saw you)
  • TL sentence (Yo vi tú)
  • word-to-word alignments (I-yo, saw-vi, you-tú)
  • (context)
• using an online GUI: the Translation Correction Tool (TCTool)
Goal:
• simplify the MT correction task as much as possible

MT error typology for RR (simplified)
• missing word
• extra word
• word order (local vs long-distance, word vs phrase, word change)
• incorrect word (sense, form, selectional restrictions, idiom, ...)
• agreement (missing constraint, extra agreement constraint)
Outline • Motivation and Goals • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Work to Date • Proposed Research • Contributions and Open Questions
Automatic Rule Refinement Framework
• Find the best RR operations given:
  • a grammar (G),
  • a lexicon (L),
  • a (set of) source language sentence(s) (SL),
  • a (set of) target language sentence(s) (TL),
  • its parse tree (P), and
  • a minimal correction of TL (TL’)
  such that TQ2 > TQ1, i.e. translation quality (TQ) is higher after refinement than before.
• This can also be expressed as: max TQ(TL | TL’, P, SL, RR(G, L))
Types of RR operations
• Grammar:
  • bifurcate: R0 → R0 + R1 [=R0’ + constr]; Cov[R0] → Cov[R0,R1]
  • refine: R0 → R1 [=R0 + constr]; Cov[R0] → Cov[R1]
  • R0 → R1[=R0 + constr= -] + R2[=R0’ + constr=c +]; Cov[R0] → Cov[R1,R2]
• Lexicon:
  • Lex0 → Lex0 + Lex1[=Lex0 + constr]
  • Lex0 → Lex1[=Lex0 + constr]
  • Lex0 → Lex0 + Lex1[Lex0 + TLword]
  • ∅ → Lex1 (adding a lexical item)
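The two basic grammar operations can be sketched in Python. This is only an illustration under simplifying assumptions: a rule is modeled as an identifier, a target-side constituent order, and a set of constraint strings. The names `Rule`, `bifurcate`, and `refine` are hypothetical and are not taken from the AVENUE code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    tl_order: tuple          # target-side constituent order, e.g. ("DET", "N", "ADJ")
    constraints: frozenset   # constraints kept as plain strings in this sketch


def bifurcate(grammar, rule_id, new_id, new_tl_order, constraint):
    """R0 -> R0 + R1: keep R0 and add a modified, more constrained copy R1."""
    r0 = grammar[rule_id]
    r1 = Rule(new_id, tuple(new_tl_order), r0.constraints | {constraint})
    return {**grammar, new_id: r1}


def refine(grammar, rule_id, constraint):
    """R0 -> R1: replace R0 with a more constrained version of itself."""
    r0 = grammar[rule_id]
    r1 = Rule(r0.rule_id, r0.tl_order, r0.constraints | {constraint})
    return {**grammar, rule_id: r1}


# Hypothetical starting grammar with a single NP rule and no constraints.
grammar = {"NP,8": Rule("NP,8", ("DET", "N", "ADJ"), frozenset())}
grammar = bifurcate(grammar, "NP,8", "NP,1008", ("DET", "ADJ", "N"), "(y2 feat1) =c +")
grammar = refine(grammar, "NP,8", "(y3 feat1) = -")
print(sorted(grammar))
```

Applying a bifurcation followed by a refinement, as in the last two lines, corresponds to the third grammar operation: the original rule is narrowed and a specific sibling rule is added.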
Formalizing Error Information
• Wi = error
• Wi’ = correction
• Wc = clue word
Example:
SL: the red car
TL: *el auto roja
TL’: el auto rojo
Wi = roja, Wi’ = rojo, Wc = auto (Wi’ and Wc need to agree)

Finding Triggering Features
Once we have the user’s correction (Wi’), we can compare it with Wi at the feature level and find the triggering feature(s).
Delta function: δ(Wi, Wi’), the set of features on which Wi and Wi’ differ.
If this set is empty, we need to postulate a new binary feature.
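One way to read the feature-level comparison is as a set difference over feature values. The sketch below assumes that each word comes with a flat dictionary of features; the feature names and values for "roja", "rojo", "grande", and "gran" are illustrative assumptions, and `delta` and `triggering_features` are hypothetical helper names.

```python
def delta(wi_feats, wi_corr_feats):
    """Return the feature names whose values differ between Wi and Wi'."""
    names = set(wi_feats) | set(wi_corr_feats)
    return {n for n in names if wi_feats.get(n) != wi_corr_feats.get(n)}


def triggering_features(wi_feats, wi_corr_feats, new_feature="feat1"):
    """Use the differing features, or postulate a new binary one if none differ."""
    diff = delta(wi_feats, wi_corr_feats)
    return diff if diff else {new_feature}


# Assumed feature structures, for illustration only.
roja = {"pos": "ADJ", "num": "sg", "gen": "fem"}
rojo = {"pos": "ADJ", "num": "sg", "gen": "masc"}
grande = {"pos": "ADJ", "num": "sg", "gen": "common"}
gran = {"pos": "ADJ", "num": "sg", "gen": "common"}

print(triggering_features(roja, rojo))    # {'gen'}
print(triggering_features(grande, gran))  # {'feat1'}
```

In the first pair the words differ in gender, so gender is the triggering feature. In the second pair the feature structures are identical, which is the case where a new binary feature has to be postulated.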
Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline
TCTool v0.1 (Interactive elicitation of error information)
Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order

TCTool v0.1 specs (Interactive elicitation of error information)
• Shows the first five translations from the lattice produced by the transfer engine.
• Asks users to pick the correct translation or, failing that, the best incorrect translation (i.e. the one requiring the fewest corrections).
• Provides translation correction and error classification help (static tutorial + error example page).
• CGI scripts in Perl
• Correction interface in JavaScript (Kenneth Sim and Patrick Milholland)
1st Eng2Spa user study (Interactive elicitation of error information) [LREC 2004]
• Manual grammar: 12 rules + 442 lexical entries
• MT error classification (v0.0): 9 linguistically-motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect word, and no translation
• Test set: 32 sentences from the AVENUE Elicitation Corpus (4 correct / 28 incorrect)

Data Analysis (Interactive elicitation of error information)
• Interested in high precision, even at the expense of lower recall
• Users did not always fix a translation in the same way
• Most of the time, when the final translation differed from the gold standard, it was still correct or better (stylistically)
Rule Refinement Operations (Automatic Rule Adaptation)
• Organized according to the type of actions users can perform to correct a sentence with TCTool
• And according to what error information is available (Wc, alignments, …)
Rule Refinement Simulation I (Automatic Rule Adaptation)
Change word order
1. Run the SL sentence through the transfer engine: Gaudí was a great artist
2. Input the SL sentence and up to 5 alternative translations, with alignments, to the Translation Correction Tool.
3. Input the user correction log file, together with the transfer engine output, to the RR module (variable instantiation).
4. Determine the appropriate RR operations that need to apply.
5. Modify the grammar and lexicon by applying the RR operations.
6. Run the MT system again with the refined grammar and lexicon.
Automatic Rule Adaptation
[Figure: SL + best TL picked by user]
[Figure: changing “grande” into “gran”]
Input to RR module
• User correction log file
• Transfer engine output (+ parse tree):
sl: Gaudi was a great artist
tl: GAUDI ERA UN ARTISTA GRANDE
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>
Variable instantiation from log file
Correction actions:
1. Word order change (artista grande → grande artista): Wi = grande
2. Edited grande into gran: Wi’ = gran
3. Identified artist as clue word: Wc = artist
In this case, even if the user had not identified Wc, the refinement process would have been the same.
Retrieve relevant lexical entries
ADJ::ADJ |: [great] -> [grande]
(
  (X1::Y1)
  ((x0 form) = great)
  ((y0 agr num) = sg)
  ((y0 agr gen) = masc)
)
N::N |: [artist] -> [artista]
(
  (X1::Y1)
  ((x0 agr pers) = 3)
  ((x0 agr num) = sg)
  ((x0 form) = artist)
  ((x0 semtype) = human)
)
Add lexical entry for “gran”
Duplicate the lexical entry great-grande and change the TL side:
ADJ::ADJ |: [great] -> [gran]
(
  (X1::Y1)
  ((x0 form) = great)
  ((y0 agr num) = sg)
  ((y0 agr gen) = masc)
)
Even if a morphological analyzer were available, it would not distinguish the two forms:
grande: grande AQ0CS0, grande NCCS000
gran: gran AQ0CS0
Lex0 → Lex0 + Lex1[Lex0 + TLword]
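The duplication step can be sketched as a deep copy with the TL word replaced. The dictionary layout for a lexical entry and the helper name `add_tl_variant` are assumptions made for this example; only the entry contents come from the slide.

```python
import copy

# Assumed in-memory form of the great-grande entry shown above.
lex0 = {
    "pos": "ADJ",
    "sl": "great",
    "tl": "grande",
    "constraints": [
        "((x0 form) = great)",
        "((y0 agr num) = sg)",
        "((y0 agr gen) = masc)",
    ],
}


def add_tl_variant(lexicon, entry, new_tl_word):
    """Lex0 -> Lex0 + Lex1: copy an entry and change only its TL side."""
    lex1 = copy.deepcopy(entry)
    lex1["tl"] = new_tl_word
    return lexicon + [lex1]


lexicon = add_tl_variant([lex0], lex0, "gran")
for entry in lexicon:
    print(entry["sl"], "->", entry["tl"])
```

A deep copy is used so that later changes to the constraints of one entry, such as adding a value for a new feature, do not leak into the other entry.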
Finding triggering feature(s)
Feature function: δ(Wi, Wi’) = ∅, so we need to postulate a new binary feature: feat1
Blame assignment:
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>
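Blame assignment over the tree above can be illustrated with a small bracket parser that looks up the node directly above the preterminal of the corrected word. This is a sketch under the assumption that the tree string always has the shape shown on the slide; `parse_tree` and `blame` are hypothetical names, and node labels are kept as opaque strings.

```python
import re

TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

TREE = ('<((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) '
        '(NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>')


def parse_tree(text):
    """Parse the bracketed tree into nested (label, children) tuples."""
    tokens = TOKEN.findall(text.strip().strip("<>"))
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1                                   # consume "("
        label = None
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]                    # e.g. "NP,8" or "ADJ,5:4"
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos].strip('"'))   # a TL word
                pos += 1
        pos += 1                                   # consume ")"
        return (label, children)

    root = parse_node()
    while root[0] is None and len(root[1]) == 1:   # drop the unlabeled outer wrapper
        root = root[1][0]
    return root


def blame(node, word, parent_label=None):
    """Return the label of the node directly above the preterminal of `word`."""
    label, children = node
    for child in children:
        if isinstance(child, str):
            if child == word:
                return parent_label
        else:
            found = blame(child, word, label)
            if found is not None:
                return found
    return None


print(blame(parse_tree(TREE), "GRANDE"))   # NP,8
```

For Wi = grande, the lookup returns NP,8, which is the rule that the refinement steps then modify.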
Refining the rules
Wi = grande, POSi = ADJ = Y3, y3
Wc = artist, POSc = N = Y2, y2
{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
  (X1::Y1) (X2::Y3) (X3::Y2)
  ((x0 def) = (x1 def))
  (x0 = x3)
  ((y1 agr) = (y2 agr)) ; det-noun agreement
  ((y3 agr) = (y2 agr)) ; adj-noun agreement
  (y2 = x3)
)
Refining the rules
{NP,1008}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
  (X1::Y1) (X2::Y2) (X3::Y3)
  ((x0 def) = (x1 def))
  (x0 = x3)
  ((y1 agr) = (y3 agr)) ; det-noun agreement
  ((y2 agr) = (y3 agr)) ; adj-noun agreement
  (y2 = x3)
  ((y2 feat1) =c +)
)
Refining the lexical entries
ADJ::ADJ |: [great] -> [grande]
(
  (X1::Y1)
  ((x0 form) = great)
  ((y0 agr num) = sg)
  ((y0 agr gen) = masc)
  ((y0 feat1) = -)
)
ADJ::ADJ |: [great] -> [gran]
(
  (X1::Y1)
  ((x0 form) = great)
  ((y0 agr num) = sg)
  ((y0 agr gen) = masc)
  ((y0 feat1) = +)
)
Done? Not yet
• So far we have only increased ambiguity in the grammar: the translation candidate list has more than doubled in size, since both “grande” and “gran” unify with {NP,8} and “gran” now also unifies with {NP,1008}.
• We need to restrict application of the general rule to post-nominal ADJ only:
  R0 → R1[=R0 + constr= -] + R2[=R0’ + constr=c +]
  R1 = NP,8 (general rule)
  R2 = NP,1008 (specific rule)
  Cov[R0] → Cov[R1,R2]
Add blocking constraint
{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
  (X1::Y1) (X2::Y3) (X3::Y2)
  ((x0 def) = (x1 def))
  (x0 = x3)
  ((y1 agr) = (y2 agr)) ; det-noun agreement
  ((y3 agr) = (y2 agr)) ; adj-noun agreement
  (y2 = x3)
  ((y3 feat1) = -)
)
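The effect of the blocking constraint can be checked with a toy constraint test. The sketch assumes that `=` behaves like unification (it succeeds when the feature is absent or has the same value) and that `=c` requires the feature to be present with exactly that value. These semantics, the flat feature dictionaries, and the name `satisfies` are assumptions made for the example.

```python
def satisfies(entry_feats, constraint):
    """Check one (feature, operator, value) constraint against a lexical entry."""
    feat, op, value = constraint
    if op == "=c":
        # Constraining equation: the feature must be present with this value.
        return entry_feats.get(feat) == value
    # Plain "=": succeeds if the feature is absent or already has this value.
    return entry_feats.get(feat, value) == value


# feat1 constraints placed on the adjective slot of each rule.
RULES = {
    "NP,8": ("feat1", "=", "-"),      # general rule with the blocking constraint
    "NP,1008": ("feat1", "=c", "+"),  # specific rule
}
LEXICON = {"grande": {"feat1": "-"}, "gran": {"feat1": "+"}}

matches = {
    word: sorted(rule for rule, c in RULES.items() if satisfies(feats, c))
    for word, feats in LEXICON.items()
}
print(matches)  # {'grande': ['NP,8'], 'gran': ['NP,1008']}
```

Under these assumptions each adjective now matches exactly one of the two rules, and an adjective with no value for feat1 still goes through the general rule only.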
Refined MT output
sl: Gaudi was a great artist

tl: GAUDI ERA UN ARTISTA GRANDE
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

tl: GAUDI ERA UNA ARTISTA GRANDE
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,2:3 "UNA") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

tl: GAUDI ERA UN GRAN ARTISTA
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,0:3 "UN") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )>

tl: GAUDI ERA UNA GRAN ARTISTA
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,2:3 "UNA") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )>

… [same for estaba]