Enriched translation model using morphology in MT

Luong Minh Thang, WING group meeting, 07 July 2009

Overview
• Brief recap on SMT & morphological analysis
• Motivation
• Enriched translation model
• Twin phrase-table construction
• Merging phrase tables
• Experiments
• Conclusion

SMT overview – alignment
• Parallel data
  Ensinnäkin kohtaamiemme taloudellisten ja sosiaalisten vaikeuksien vuoksi on havaittavissa huolestumista , vaikka kasvu on kestävällä pohjalla ja tulosta vuosien ponnisteluista , kaikkien kansalaistemme taholta .
  These are , first and foremost , messages of concern at the economic and social problems that we are experiencing , in spite of a period of sustained growth stemming from years of efforts by all our fellow citizens .
• Alignment: one-to-many (1-M)
  [Figure: word alignment between source and target]

SMT overview – translation model
• Intersect alignment: 1-M + M-1 → M-M
• Extracting phrases from the M-M alignment → translation model (phrase table)

Each entry lists an English phrase e, a foreign phrase f, and five scores: translation probabilities, lexical probabilities, and a phrase penalty (2.718 here).

problems ||| ongelmat ||| 0.372611 0.597858 0.114146 0.13882 2.718
problems ||| ongelmasta ||| 0.352941 0.423077 0.000836237 0.0012435 2.718
…
problems ||| vaikeuksista ||| 0.0696946 0.105991 0.0124042 0.0130002 2.718
problems ||| vaikeuksien ||| 0.0410959 0.062069 0.000836237 0.0010174 2.718
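To make the phrase-table layout concrete, here is a minimal Python sketch that splits one entry into its three `|||`-separated fields. The names `PhraseEntry` and `parse_phrase_table_line` are illustrative assumptions, not part of the talk or of any toolkit.

```python
from typing import NamedTuple, Tuple


class PhraseEntry(NamedTuple):
    source: str                # left-hand phrase, e.g. "problems"
    target: str                # right-hand phrase, e.g. "ongelmat"
    scores: Tuple[float, ...]  # the numeric scores, in file order


def parse_phrase_table_line(line: str) -> PhraseEntry:
    """Split 'src ||| tgt ||| s1 s2 ...' into phrases and float scores."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 '|||'-separated fields: {line!r}")
    scores = tuple(float(s) for s in fields[2].split())
    return PhraseEntry(fields[0], fields[1], scores)


entry = parse_phrase_table_line(
    "problems ||| ongelmat ||| 0.372611 0.597858 0.114146 0.13882 2.718"
)
print(entry.source, entry.target, entry.scores)
```

The scores are kept as an ordered tuple because the sketch makes no assumption about which position holds which probability.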

Recap – Morphological analysis
• Morpheme: minimal meaning-bearing unit
  English: machine + s, present + ed, etc.
  Finnish: oppositio + kansa + n + edusta + ja = opposition member of parliament
• Morfessor (Creutz & Lagus, 2007): segments words in an unsupervised manner
  un/PRE + fortunate/STM + ly/SUF
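The tagged segmentation shown on the slide can be read into (morpheme, tag) pairs with a few lines of Python. This is a sketch that assumes the exact text layout used above (`morpheme/TAG` pieces joined by `+`); it does not call Morfessor, and the function names are hypothetical.

```python
from typing import List, Tuple


def parse_segmentation(segmented: str) -> List[Tuple[str, str]]:
    """Parse 'un/PRE + fortunate/STM + ly/SUF' into (morpheme, tag) pairs."""
    pairs = []
    for piece in segmented.split("+"):
        morpheme, sep, tag = piece.strip().rpartition("/")
        if not sep:
            raise ValueError(f"missing '/TAG' in piece: {piece!r}")
        pairs.append((morpheme, tag))
    return pairs


def surface_form(pairs: List[Tuple[str, str]]) -> str:
    """Concatenate the morphemes back into the original word."""
    return "".join(morpheme for morpheme, _ in pairs)


pairs = parse_segmentation("un/PRE + fortunate/STM + ly/SUF")
print(pairs, surface_form(pairs))
```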

Motivation
• Problem:
  • Morphologically complex languages have many word forms, e.g. ongelmat, ongelmasta, etc.
  • Rare words occur often and are hard to align → incorrect entries in the normal (word-aligned) phrase table.
• Solution:
  • Construct a morpheme-aligned phrase table (PT) to aggregate better statistics for rare words.
  • Combine the word- and morpheme-aligned PTs in a principled way to produce an even better translation model.

Twin phrase-table (PT) construction
• Word-level path: Word → GIZA++ → Word alignment → Phrase extraction → PTw, e.g. problems ||| vaikeuksista
• PTw → Morphological segmentation → PTwm, e.g. problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF
• Morpheme-level path: Morpheme → GIZA++ → Morpheme alignment → Phrase extraction → PTm, e.g. problem/STM+ s/SUF ||| ongelma/STM+ t/SUF
• PTwm and PTm → PT merging → Decoding
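The step from PTw to PTwm (segmenting the phrases of a word-level table) can be sketched as below. The `SEGMENTATIONS` lookup is a hypothetical stand-in for the segmenter's output, and the function names are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical word -> segmentation lookup; in the talk this role is played
# by the morphological segmenter, which this sketch does not invoke.
SEGMENTATIONS: Dict[str, str] = {
    "problems": "problem/STM+ s/SUF",
    "vaikeuksista": "vaikeu/STM+ ksi/SUF+ sta/SUF",
}


def segment_phrase(phrase: str, segmentations: Dict[str, str]) -> str:
    """Replace each word by its morpheme sequence; unknown words stay whole."""
    return " ".join(segmentations.get(word, word) for word in phrase.split())


def word_pt_to_morpheme_pt(
    word_pairs: List[Tuple[str, str]], segmentations: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Segment both sides of every phrase pair of a word-level table."""
    return [
        (segment_phrase(src, segmentations), segment_phrase(tgt, segmentations))
        for src, tgt in word_pairs
    ]


pt_w = [("problems", "vaikeuksista")]
pt_wm = word_pt_to_morpheme_pt(pt_w, SEGMENTATIONS)
print(pt_wm)
```

After this step both tables use morpheme tokens, which is what allows their entries to be compared and merged.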

Existing PT-merging methods
• Add-feature (Nakov, 2008; Chen et al., 2009) → heuristic-driven
  F1 = 1 if from 1st PT, 0.5 otherwise
  F2 = 1 if from 2nd PT, 0.5 otherwise
  F3 = 1 if from both PTs, 0.5 otherwise
• Interpolation (Wu & Wang, 2007) → does not consider what the scores mean
  tran(f|e) = α * tran1(f|e) + (1 - α) * tran2(f|e)
  lex(f|e) = β * lex1(f|e) + (1 - β) * lex2(f|e)
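As a small illustration of the add-feature idea, the three indicator features can be computed from table membership alone. This is only a sketch of the F1 to F3 definitions listed above; the function name and boolean arguments are assumptions, and how the cited systems compute or use these features is not shown on the slide.

```python
from typing import Tuple


def origin_features(in_first: bool, in_second: bool) -> Tuple[float, float, float]:
    """Return (F1, F2, F3): 1 when the condition holds, 0.5 otherwise."""
    f1 = 1.0 if in_first else 0.5                   # pair occurs in the 1st PT
    f2 = 1.0 if in_second else 0.5                  # pair occurs in the 2nd PT
    f3 = 1.0 if (in_first and in_second) else 0.5   # pair occurs in both PTs
    return f1, f2, f3


print(origin_features(True, False))
print(origin_features(True, True))
```

The features record only where a phrase pair came from; the probabilities of the two tables are left untouched.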

Our merging method – normalizing translation probabilities
Phrase pairs extracted for "problems" in the two tables being merged (PTwm and PTm):
• Table 1 (count1): problem + s ||| vaikeu + ksi + sta; problem + s ||| ongelma + sta
• Table 2 (count2): problem + s ||| ongelma + t (×3); problem + s ||| vaikeu + ksi + sta
MLE:
  tran1(e|f) = count1(e, f) / ∑e count1(e, f)
  tran2(e|f) = count2(e, f) / ∑e count2(e, f)

Our merging method – normalizing translation probabilities
MLE within each table:
• Table 1: tran1(vaikeuksista | problems) = 1/2 = 0.5; tran1(ongelmasta | problems) = 1/2 = 0.5
• Table 2: tran2(ongelmat | problems) = 3/4 = 0.75; tran2(vaikeuksista | problems) = 1/4 = 0.25
Interpolation (ratio = 0.5):
  tran(vaikeuksista | problems) = (0.5 + 0.25)/2 = 0.375
  tran(ongelmat | problems) = (0 + 0.75)/2 = 0.375
  tran(ongelmasta | problems) = (0.5 + 0)/2 = 0.25
→ Undesired translation! vaikeuksista ties with ongelmat, although ongelmat accounts for three of the six extracted phrase pairs.
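The interpolation arithmetic on this slide can be reproduced with a short Python sketch. The dictionaries hold the per-table MLE values for the source phrase "problems"; the function name `interpolate_tables` is an assumption, and a target missing from one table is treated as probability 0, as in the slide's calculation.

```python
from typing import Dict


def interpolate_tables(
    table1: Dict[str, float], table2: Dict[str, float], weight: float = 0.5
) -> Dict[str, float]:
    """Linear interpolation: weight * p1 + (1 - weight) * p2 for every target."""
    targets = set(table1) | set(table2)
    return {
        t: weight * table1.get(t, 0.0) + (1.0 - weight) * table2.get(t, 0.0)
        for t in targets
    }


# MLE estimates from the two tables, for the source phrase "problems".
tran1 = {"vaikeuksista": 0.5, "ongelmasta": 0.5}
tran2 = {"ongelmat": 0.75, "vaikeuksista": 0.25}

interpolated = interpolate_tables(tran1, tran2)
for target in sorted(interpolated):
    print(target, interpolated[target])
```

Because each table is normalized before mixing, the two-entry table carries as much weight as the four-entry one, which is how vaikeuksista ends up level with ongelmat.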

Our merging method – normalizing translation probabilities
Normalization: pool the counts of both tables before normalizing, instead of averaging the two MLE estimates:
  tran(e|f) = [count1(e, f) + count2(e, f)] / [∑e count1(e, f) + ∑e count2(e, f)]

Our merging method – normalizing translation probabilities
Normalization, using the same counts (2 phrase pairs in the first table, 4 in the second):
  tran(vaikeuksista | problems) = (1 + 1)/(2 + 4) = 0.33
  tran(ongelmat | problems) = (0 + 3)/(2 + 4) = 0.5
  tran(ongelmasta | problems) = (1 + 0)/(2 + 4) = 0.17
→ Desired translation! ongelmat is now ranked first.
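The pooled-count normalization can be sketched in Python as follows, for a single source phrase ("problems"). The function name `merge_by_counts` and the dictionary layout are assumptions for illustration; the counts are the ones used in the worked example.

```python
from collections import Counter
from typing import Dict


def merge_by_counts(counts1: Dict[str, int], counts2: Dict[str, int]) -> Dict[str, float]:
    """Add the raw counts of both tables, then normalize once."""
    pooled = Counter(counts1) + Counter(counts2)
    total = sum(pooled.values())
    return {target: count / total for target, count in pooled.items()}


# Phrase-pair counts for the source phrase "problems" in each table.
counts1 = {"vaikeuksista": 1, "ongelmasta": 1}
counts2 = {"ongelmat": 3, "vaikeuksista": 1}

normalized = merge_by_counts(counts1, counts2)
for target, prob in sorted(normalized.items(), key=lambda item: -item[1]):
    print(f"{target}: {prob:.2f}")
```

Normalizing once over the pooled counts lets a table with more evidence contribute proportionally more, unlike a fixed interpolation weight.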

Experiments – dataset • 2005 ACL shared task (Koehn & Monz, 2005)

Experiments – baselines
• w-system: uses PTw to translate at the word level
• m-system: uses PTm to translate at the morpheme level
• m-BLEU: BLEU in which each token is a morpheme
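To illustrate why the token unit matters, the sketch below computes clipped n-gram precision, one ingredient of BLEU, on word tokens and on morpheme tokens. It is not a full BLEU implementation (no brevity penalty, no averaging over n-gram orders), and the one-word hypothesis and reference are made up for illustration using word forms from the earlier slides.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of the given order in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis: List[str], reference: List[str], n: int = 1) -> float:
    """Fraction of hypothesis n-grams found in the reference, with clipped counts."""
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


# Word-level tokens: the two word forms do not match at all.
word_precision = clipped_precision(["ongelmasta"], ["ongelmat"])
# Morpheme-level tokens: the shared stem is credited.
morph_precision = clipped_precision(
    "ongelma/STM+ sta/SUF".split(), "ongelma/STM+ t/SUF".split()
)
print(word_precision, morph_precision)
```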

Experiments – our system
• Improvements over the m-system and w-system are statistically significant according to the sign test of Collins et al. (2005).
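For readers unfamiliar with sign tests, here is a generic two-sided exact sign test in Python. It is a textbook formulation under the assumption that each test item is scored as a win or a loss for one system (ties dropped); the exact procedure of Collins et al. (2005) is not given on the slide and is not reproduced here, and the counts in the usage line are invented purely to exercise the function.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided exact sign test under H0: win and loss are equally likely."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of an outcome at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented counts, for illustration only (not results from the talk).
print(sign_test_p_value(wins=10, losses=0))
print(sign_test_p_value(wins=5, losses=5))
```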

Conclusion
Our contributions:
• Enrich the translation model without using additional data.
• Propose a principled way to merge phrase tables generated at different granularities.

Q & A • Thank you !!!
