PhD Thesis: Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation
Arianna Bisazza
Advisor: Marcello Federico
Fondazione Bruno Kessler / Università di Trento
PSMT (phrase-based statistical machine translation) decoding overview
Input: E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali
2 Arianna Bisazza – PhD Thesis – 19 April 2013
PSMT decoding overview (cont.)
Input: E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali
Output, built left to right: Freedom of movement must be encouraged while ensuring that career paths …
[Figure: each hypothesis extension is scored by the reordering model (ReoM), the translation model (TM) and the language model (LM)]
Reordering Models
Many solutions have been proposed, with different reordering classes, features, training modes, etc.: Tillmann 04, Zens & Ney 06, Al-Onaizan & Papineni 06, Galley & Manning 08, Green & al. 10, Feng & al. 10, …
No matter which reordering model is used, the permutation search space must be limited!
The power of any reordering model is bounded by the reordering constraints in use.
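The reordering classes mentioned above differ from model to model. As a minimal sketch, the basic monotone/swap/discontinuous (MSD) orientation classes used by lexicalized phrase reordering models in the style of [Tillmann 04; Koehn & al 05] can be computed from the source spans of two consecutively translated phrases. Spans are assumed to be 1-based and inclusive, and the name `orientation` is illustrative; hierarchical variants [Galley & Manning 08] define the previous block differently.

```python
from typing import Tuple

# (first, last) source word positions of a phrase, 1-based, inclusive
Span = Tuple[int, int]


def orientation(prev: Span, curr: Span) -> str:
    """Classify the jump from the previously translated phrase to the current one.

    'M' (monotone):      the current phrase starts right after the previous one
    'S' (swap):          the current phrase ends right before the previous one
    'D' (discontinuous): any other jump
    """
    prev_first, prev_last = prev
    curr_first, curr_last = curr
    if curr_first == prev_last + 1:
        return "M"
    if curr_last == prev_first - 1:
        return "S"
    return "D"


print(orientation((1, 2), (3, 4)))  # M
print(orientation((3, 4), (1, 2)))  # S
print(orientation((1, 2), (5, 6)))  # D
```

A reordering model then scores these classes, usually conditioned on the phrase pair; the distortion limit decides which jumps can be scored at all.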
Reordering Constraints
Input: E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali (|w| = 11 words)
• Without constraints: #perm = |w|! = 11! ≈ 40,000,000
• Source-to-source distortion between consecutively translated words w_x and w_y: D(w_x, w_y) = |y - x - 1|
• DL (distortion limit): only jumps with D ≤ DL are allowed; with DL = 3, #perm ≈ 7,000
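To make the effect of DL concrete, the permutations allowed by a distortion limit can be counted with a small dynamic program over (covered words, last translated word). This is a sketch under simplifying assumptions: word-by-word translation, only the jump constraint D ≤ DL is checked, the walk starts at a virtual position 0, and the final jump is free. The thesis figures (≈7,000 for DL = 3, ≈7,000,000 for DL = 7) may rest on further decoder constraints, so the sketch is not claimed to reproduce them; the function names are hypothetical.

```python
from functools import lru_cache


def distortion(x: int, y: int) -> int:
    """Source-to-source distortion D(w_x, w_y) = |y - x - 1|; x = 0 is the sentence start."""
    return abs(y - x - 1)


def count_permutations(n: int, dl: int) -> int:
    """Count the orders of words 1..n in which every jump has distortion <= dl."""
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(covered: int, last: int) -> int:
        # covered: bitmask of translated words; last: position translated last
        if covered == full:
            return 1
        total = 0
        for y in range(1, n + 1):
            bit = 1 << (y - 1)
            if not (covered & bit) and distortion(last, y) <= dl:
                total += count(covered | bit, y)
        return total

    return count(0, 0)


print(count_permutations(11, 100))  # 39916800 = 11!, the limit never binds
print(count_permutations(3, 2))     # 4
```

The memoized state space has at most 2^n · (n + 1) entries, so the count stays cheap for sentence lengths like the 11-word example even though the number of permutations is huge.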
The problem with DL…
[Figure: Arabic-English example (AR source, EN target)]
The problem with DL…
[Figure: German-English example (DE source, EN target)]
Current solution: increasing the distortion limit!
• Without constraints: #perm = |w|! ≈ 40,000,000
• DL = 3: #perm ≈ 7,000
• DL = 7: #perm ≈ 7,000,000
• Coarse definition of the reordering space:
  • slower decoding
  • worse translations
(D(w_x, w_y) = |y - x - 1|: source-to-source distortion)
Observations
• Word reordering is difficult!
• Existing word reordering models are not perfect, yet they are expected to guide search over huge search spaces.
• One way to go: design a perfect model. Problem: many have already tried and failed.
• Our way: simplify the task for the existing reordering models.
Working hypotheses
• A better definition of the reordering search space (i.e. of the reordering constraints) can simplify the task of the reordering model.
• (Shallow) linguistic knowledge can help refine the reordering search space for a given language pair.
Outline
• The problem
• The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
• Comparative evaluation & conclusions
Outline (part 1: verb reordering lattices)
• Bisazza and Federico, Chunk-based Verb Reordering in VSO Sentences for Arabic-English, WMT 2010
• Bisazza, Pighin, Federico, Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation, MT Journal 2012
Idea: keep a low distortion limit and modify the input to allow only specific long reorderings
• Without constraints: #perm ≈ 40,000,000
• DL = 3: #perm ≈ 7,000; DL = 7: #perm ≈ 7,000,000
(D(w_x, w_y) = |y - x - 1|: source-to-source distortion)
Reordering patterns in Arabic-English
• Example of VSO (verb-subject-object) sentences: the Arabic verb comes earlier than in the English order
• Typical PSMT outputs:
  • *The Moroccan monarch King Mohamed VI __ his support to… (verb missing)
  • *He renewed the Moroccan monarch King Mohamed VI his support to… (verb left in its Arabic position)
Working hypothesis
• Uneven distribution of long- and short-range word movements:
  • few long ones: verb-subject-object sentences → we try to model them explicitly!
  • many short ones: adjective-noun, head-initial genitive constructions (idafa) → we assume they are well handled in standard PSMT
Chunk-based fuzzy reordering rules
• Shallow syntax chunking:
  • cheaper and easier than deep parsing
  • constrains reorderings in a softer way
• Fuzzy (non-deterministic) reordering rules:
  • generate N permutations for each matching sequence
  • the final reordering decision is taken during translation, guided by all SMT models (ReoM, LM, ...)
• Few rules per language pair, capturing only long reorderings
Chunk-based fuzzy reordering rules (cont.)
Pattern: … CH(*) CH(V) CH(*) CH(*) CH(*) …, where CH(V) is a verb chunk and CH(*) any chunk
• Rule 1: move the verb chunk ahead by 1 to N chunks
• Rule 2: move the verb chunk and the following chunk ahead by 1 to N chunks
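The two rules can be sketched as a permutation generator over chunk labels. Assumptions, all ours: chunks are labels such as `VC2` whose tag is the label minus its index, a moved block never includes or crosses a punctuation chunk (`Pct`), and `max_shift` plays the role of N; `verb_reorderings` and `chunk_tag` are hypothetical names, not the thesis implementation.

```python
from typing import List

Chunks = List[str]


def chunk_tag(chunk: str) -> str:
    """Strip the position index from a chunk label, e.g. 'VC2' -> 'VC'."""
    return chunk.rstrip("0123456789")


def verb_reorderings(chunks: Chunks, max_shift: int, verb_tag: str = "VC",
                     barrier_tags: frozenset = frozenset({"Pct"})) -> List[Chunks]:
    """Apply both fuzzy rules to every verb chunk.

    Rule 1: move the verb chunk ahead by 1..max_shift chunks.
    Rule 2: move the verb chunk and its following chunk ahead by 1..max_shift chunks.
    Assumption: a moved block never contains or jumps over a barrier chunk.
    """
    results: List[Chunks] = []
    for i, chunk in enumerate(chunks):
        if chunk_tag(chunk) != verb_tag:
            continue
        for block_len in (1, 2):  # rule 1, rule 2
            end = i + block_len
            if end > len(chunks):
                continue
            block = chunks[i:end]
            if any(chunk_tag(c) in barrier_tags for c in block):
                continue
            for shift in range(1, max_shift + 1):
                stop = end + shift
                if stop > len(chunks):
                    break
                jumped = chunks[end:stop]  # chunks the block moves past
                if any(chunk_tag(c) in barrier_tags for c in jumped):
                    break
                results.append(chunks[:i] + jumped + block + chunks[stop:])
    return results


sentence = ["CC1", "VC2", "PC3", "NC4", "PC5", "Pct6"]
for perm in verb_reorderings(sentence, max_shift=5):
    print(" ".join(perm))
```

On the chunk sequence CC1 VC2 PC3 NC4 PC5 Pct6 of the Arabic-English example, this prints CC1 PC3 VC2 NC4 PC5 Pct6, CC1 PC3 NC4 VC2 PC5 Pct6, CC1 PC3 NC4 PC5 VC2 Pct6, CC1 NC4 VC2 PC3 PC5 Pct6 and CC1 NC4 PC5 VC2 PC3 Pct6. The rules only propose candidates; none of them is committed to before decoding.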
Chunk-based verb reordering in parallel data
Among the rule-generated permutations of a training sentence, the optimal reordering is the one that minimizes total distortion with respect to the target side.
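Choosing the optimal reordering of a training sentence can be sketched as an argmin over candidate source orders, assuming a word alignment is available. Further assumptions, all ours: candidates are given at word level as lists of original source positions, a target word aligned to several source words uses the leftmost one, unaligned target words are skipped, and the walk starts at a virtual position -1. The thesis works at chunk level and may treat alignment ambiguities differently; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (source position, target position), both 0-based
Alignment = List[Tuple[int, int]]


def total_distortion(order: Sequence[int], alignment: Alignment, n_target: int) -> int:
    """Sum of |y - x - 1| over the source positions visited in target order,
    after rearranging the source sentence as `order` (original positions, new order)."""
    new_pos = {orig: new for new, orig in enumerate(order)}
    aligned: Dict[int, List[int]] = {}
    for src, tgt in alignment:
        aligned.setdefault(tgt, []).append(new_pos[src])
    total, last = 0, -1  # virtual sentence start
    for tgt in range(n_target):
        if tgt not in aligned:
            continue  # unaligned target word
        y = min(aligned[tgt])  # leftmost aligned source word
        total += abs(y - last - 1)
        last = y
    return total


def best_reordering(candidates: List[List[int]], alignment: Alignment,
                    n_target: int) -> List[int]:
    """Pick the candidate source order with minimal total distortion."""
    return min(candidates, key=lambda order: total_distortion(order, alignment, n_target))


# Toy VSO -> SVO pair: source V S O (positions 0, 1, 2), target S V O.
alignment = [(1, 0), (0, 1), (2, 2)]
print(best_reordering([[0, 1, 2], [1, 0, 2]], alignment, n_target=3))  # [1, 0, 2]
```

Here the original order costs 4 (jumps of 1, 2 and 1), while moving the verb after the subject makes the alignment monotonic and costs 0.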
Chunk-based verb reordering in test data
[Figure: chunk permutations generated by the rules "move verb chunk" and "move verb chunk and following chunk", encoded as a lattice over the test sentence; legend: verb chunk, other chunks]
Experiments
• Task: NIST-MT09 (news translation)
• Systems based on Moses, including lexicalized phrase reordering models [Tillmann 04; Koehn & al 05]
• Non-monotonic lattice decoding [Dyer & al 08]
• Evaluation by:
  - BLEU [Papineni & al 01] for lexical match & local order
  - KRS [Birch & al 10] for global order
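KRS captures global order through Kendall's tau distance between the hypothesis and reference word orders. The sketch below computes only the normalized Kendall tau similarity, assuming the hypothesis-to-reference alignment has already been turned into a permutation; the published KRS is defined on top of this distance with additional terms that the sketch does not attempt to reproduce, and `kendall_tau_similarity` is a hypothetical name.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_similarity(perm: Sequence[int]) -> float:
    """1 minus the normalized Kendall tau distance between `perm` and the identity.

    perm[i] is the reference position of the i-th hypothesis word
    (assumed to be a permutation of 0..n-1).
    """
    n = len(perm)
    if n < 2:
        return 1.0
    # a pair (i, j) with i < j is discordant if the reference puts the two words the other way round
    discordant = sum(1 for i, j in combinations(range(n), 2) if perm[i] > perm[j])
    return 1.0 - discordant / (n * (n - 1) / 2)


print(kendall_tau_similarity([0, 1, 2, 3]))  # 1.0
print(kendall_tau_similarity([3, 2, 1, 0]))  # 0.0
```

Unlike n-gram precision, this quantity is sensitive to a verb being placed far from its reference position even when all local n-grams match.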
Arabic-English: translation quality
[Chart: translation quality; highlighted gains: +0.5 BLEU, +0.4 KRS]
• Test set: eval09-nw
• Lattices are always used with pre-ordered training data
• Oracle: test set pre-ordered by looking at the reference
(More details on lattice pruning in the thesis.)
Arabic-English: translation quality and translation time
[Chart: translation quality and translation time under decoding and pruning settings; highlighted differences: -0.1 BLEU, -0.3 KRS]
• Test set: eval09-nw; lattices always used with pre-ordered training; oracle: test set pre-ordered by looking at the reference
(More details on lattice pruning in the thesis.)
Lessons learned
• Limit long reordering to a few chunks only
• Using a lattice to represent the extra reorderings slows down decoding
Can we do better?
Observation: the lattice topology basically distorts word-to-word distances, i.e. during decoding some distant positions become closer.
Can we achieve the same effect more directly?
Outline (part 2: modified distortion matrices)
• Bisazza and Federico, Modified Distortion Matrices for Phrase-Based Statistical Machine Translation, ACL 2012
Idea: modify the distortion matrix for each test sentence!
• Without constraints: #perm = |w|! ≈ 40,000,000
• DL = 3: #perm ≈ 7,000; DL = 7: #perm ≈ 7,000,000
• DL = 3 & modified D: #perm ≈ 20,000 → refined reordering search space
(D(w_x, w_y) = |y - x - 1|: source-to-source distortion)
Chunk-based fuzzy reordering rules: Arabic-English
“Move verb chunk (and following chunk) to the right by 1 to N chunks”
Source: w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .
Gloss: and | took part | in the march | dozens of militants | from the Brigades | .
Chunks: CC1 VC2 PC3 NC4 PC5 Pct6
Generated permutations:
• CC1 PC3 VC2 NC4 PC5 Pct6
• CC1 PC3 NC4 VC2 PC5 Pct6
• CC1 PC3 NC4 PC5 VC2 Pct6
• CC1 NC4 VC2 PC3 PC5 Pct6
• CC1 NC4 PC5 VC2 PC3 Pct6
Chunk-based fuzzy reordering rules: reordering selection
Source: w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b . (CC1 VC2 PC3 NC4 PC5 Pct6)
[Figure: each generated permutation is scored by a reordered-source LM; scores shown: 0.7, 0.1, 0.1, 0.4, 0.9]
Reorderings to include in the distortion matrix:
• CC1 PC3 VC2 NC4 PC5 Pct6
• CC1 NC4 PC5 VC2 PC3 Pct6
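The selection step can be sketched with a toy model. Everything here is a hypothetical stand-in: the reordered-source LM is replaced by an add-one smoothed bigram model over chunk tags, trained on a made-up pre-ordered tag sequence, and scores are log-probabilities rather than the normalized values on the slide. The actual LM (order, units, smoothing) is not specified here. Only the k best permutations are kept (the experiments below use 3-best reorderings per rule-matching sequence).

```python
import math
from collections import Counter
from typing import List, Sequence


def chunk_tag(chunk: str) -> str:
    """'VC2' -> 'VC': the toy model works on chunk tags only."""
    return chunk.rstrip("0123456789")


class BigramLM:
    """Add-one smoothed bigram model over tag sequences (toy reordered-source LM)."""

    def __init__(self, training: List[Sequence[str]]):
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        vocab = {"</s>"}
        for seq in training:
            tokens = ["<s>"] + list(seq) + ["</s>"]
            vocab.update(seq)
            for prev, curr in zip(tokens, tokens[1:]):
                self.bigrams[(prev, curr)] += 1
                self.history[prev] += 1
        self.vocab_size = len(vocab)

    def logprob(self, seq: Sequence[str]) -> float:
        tokens = ["<s>"] + list(seq) + ["</s>"]
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.history[p] + self.vocab_size))
            for p, c in zip(tokens, tokens[1:])
        )


def select_k_best(candidates: List[List[str]], lm: BigramLM, k: int) -> List[List[str]]:
    """Rank chunk permutations by the LM score of their tags and keep the k best."""
    ranked = sorted(candidates, key=lambda c: lm.logprob([chunk_tag(x) for x in c]),
                    reverse=True)
    return ranked[:k]


# Hypothetical training data: an SVO-like, pre-ordered tag sequence.
lm = BigramLM([["CC", "NC", "PC", "VC", "PC", "Pct"]] * 10)
candidates = [
    ["CC1", "VC2", "PC3", "NC4", "PC5", "Pct6"],
    ["CC1", "PC3", "VC2", "NC4", "PC5", "Pct6"],
    ["CC1", "NC4", "PC5", "VC2", "PC3", "Pct6"],
]
print(" ".join(select_k_best(candidates, lm, k=1)[0]))  # CC1 NC4 PC5 VC2 PC3 Pct6
```

The design point is that the LM acts as a filter, not a decision: the surviving reorderings only enlarge the search space, and the decoder still picks among them with all its models.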
Modifying the distortion matrix
Reorderings to include in the distortion matrix:
• CC1 PC3 VC2 NC4 PC5 Pct6
• CC1 NC4 PC5 VC2 PC3 Pct6
[Figure: step-by-step modification of the distortion matrix]
Modifying the distortion matrix (cont.)
Decoder input: “w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .”
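A minimal sketch of the matrix modification, under an assumption we state plainly: every jump between two chunks that are adjacent in a selected reordering gets distortion 0, so it fits under a low distortion limit. With DL = 3, for instance, the jump from AlktA}b (position 8) back to $Ark (position 2) has D = 7 and would otherwise be forbidden. The exact update used in the thesis may differ, and `distortion_matrix` and `modify_matrix` are hypothetical names; the decoder would then check the modified entry against DL instead of |y - x - 1|.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def distortion_matrix(n: int) -> Matrix:
    """D[x][y] = |y - x - 1| for positions 0..n, where 0 is the sentence start."""
    return [[abs(y - x - 1) for y in range(n + 1)] for x in range(n + 1)]


def modify_matrix(D: Matrix, spans: Dict[str, Tuple[int, int]],
                  reorderings: List[List[str]]) -> Matrix:
    """Return a copy of D in which the jumps required by each selected chunk order cost 0.

    spans maps a chunk label to its (first, last) word positions, 1-based, inclusive.
    Assumption: for chunks adjacent in a selected order, the jump from the last word
    of the first chunk to the first word of the second is set to 0.
    """
    M = [row[:] for row in D]  # leave the original matrix untouched
    for order in reorderings:
        prev_last = 0  # sentence start
        for chunk in order:
            first, last = spans[chunk]
            M[prev_last][first] = 0
            prev_last = last
    return M


# w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .  (9 words)
spans = {"CC1": (1, 1), "VC2": (2, 2), "PC3": (3, 4),
         "NC4": (5, 6), "PC5": (7, 8), "Pct6": (9, 9)}
D = distortion_matrix(9)
M = modify_matrix(D, spans, [["CC1", "NC4", "PC5", "VC2", "PC3", "Pct6"]])
print(D[8][2], M[8][2])  # 7 0
```

Entries not involved in a selected reordering keep their usual cost, so short-range reorderings are still handled by the ordinary distortion limit.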
Experiments
• Tasks: NIST-MT09 for Ar-En, WMT10 for De-En
• Systems based on Moses, including state-of-the-art hierarchical lexicalized reordering models [Tillmann 04; Koehn & al 05; Galley & Manning 08]
• Baseline distortion limits: 5 in Ar-En, 10 in De-En
• Evaluation by:
  - BLEU for lexical match & local order
  - KRS for global order
Arabic-English: translation quality and translation time
[Chart: highlighted gains: +0.9 BLEU, +0.6 KRS]
• Test set: eval09-nw
• Distortion matrix modified with the 3-best reorderings per rule-matching sequence
German-English: translation quality and translation time
[Chart: highlighted gains: +0.5 BLEU, +0.7 KRS]
• Test set: newstest10
• Distortion matrix modified with the 3-best reorderings per rule-matching sequence