Look-up Based Greedy Decoding for Machine Translation
Tony Zhang, Steven Bills
Dictionary-based look-up
• IBM Models 1 and 2 generate translation probabilities statistically from parallel text
• We instead generate probabilities by looking up translations in freely available online lexical resources: Wikipedia and Wiktionary (a minimal sketch follows this list)
• These probabilities are often significantly more accurate than those generated by the IBM models
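A minimal sketch of how a scraped lexicon might be turned into translation probabilities, assuming the Wikipedia/Wiktionary entries have already been collected into a simple mapping. The names LEXICON and translation_probs and the rank-decay weighting are illustrative assumptions, not the authors' code.

```python
# Hypothetical lexicon scraped from Wikipedia/Wiktionary:
# {english_word: [french candidates, roughly ordered by salience]}
LEXICON = {
    "the": ["le", "la", "les"],
    "last": ["dernier", "durer"],
    "election": ["élection", "scrutin"],
}

def translation_probs(english_word):
    """Turn a dictionary entry into a {french: probability} table.

    Earlier-listed candidates are assumed more likely, so weights
    decay with rank and are normalized to sum to 1.
    """
    candidates = LEXICON.get(english_word, [])
    if not candidates:
        return {}
    weights = [1.0 / (rank + 1) for rank in range(len(candidates))]
    total = sum(weights)
    return {fr: w / total for fr, w in zip(candidates, weights)}

print(translation_probs("last"))  # {'dernier': 0.667, 'durer': 0.333}
```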
The translation process
• Our system translates from English to French
• We apply chunking, reordering, and stemming in a pre-processing phase
• We generate an initial gloss of the pre-processed English sentence by using the most likely translation of each word
• We greedily apply mutations to the French translation until no superior mutations remain (see the sketch after this list)
• We apply post-processing to the resulting translation to remove duplicate words and perform contractions
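The pipeline above reduces to a hill-climbing loop. A hedged sketch, where the scoring function and mutation generator are placeholder parameters; the slides do not specify how candidates are scored, so `score` here is an assumption.

```python
def greedy_decode(english_tokens, best_translation, mutations, score):
    """Gloss word-by-word, then greedily apply the best mutation
    until no superior mutation is left."""
    # Initial gloss: most likely translation of each English token.
    current = [best_translation(tok) for tok in english_tokens]
    current_score = score(current)
    while True:
        # Consider every single-step mutation of the current sentence.
        best = max(mutations(current), key=score, default=None)
        if best is None or score(best) <= current_score:
            return current  # no superior mutations are left
        current, current_score = best, score(best)
```

Pre-processing (chunking, reordering, stemming) would run before this loop, and post-processing (duplicate removal, contractions) after it.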
Example mutations (the three mutation types are sketched after these examples)
• "the last election"
  • Initial gloss: "le élection durer" ("the election to last")
  • Retranslation mutation: "le élection dernier" (correct)
• "parliament is …"
  • Initial gloss: "parlement est …" (needs an article)
  • Insertion mutation: "le parlement est" (correct article)
• "the speaker wants to try …"
  • Initial gloss: "le président veut à essayer …"
  • Deletion mutation: "le président veut essayer …"
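The three mutation types above can be enumerated mechanically. An illustrative generator, where ARTICLES and the `alternatives` mapping are assumptions for the sketch, not the authors' exact mutation inventory:

```python
ARTICLES = ["le", "la", "les"]

def mutations(sentence, alternatives):
    """Yield every single-step mutation of a tokenized French sentence.

    `alternatives` maps a French word to the other dictionary
    translations of its English source word.
    """
    for i, word in enumerate(sentence):
        # Retranslation: swap in an alternative translation
        # ("durer" -> "dernier").
        for alt in alternatives.get(word, []):
            yield sentence[:i] + [alt] + sentence[i + 1:]
        # Deletion: drop a spurious word ("veut à essayer" -> "veut essayer").
        yield sentence[:i] + sentence[i + 1:]
        # Insertion: add a missing article ("parlement est" -> "le parlement est").
        for article in ARTICLES:
            yield sentence[:i] + [article] + sentence[i:]

gloss = ["le", "élection", "durer"]
print(next(mutations(gloss, {"durer": ["dernier"]})))
# ['élection', 'durer']  (the first candidate is a deletion of "le")
```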
Processing
• Chunking
  • Translate expressions as a whole rather than word-by-word when they have entries in our dictionary
  • "someone else" → "quelqu'un d'autre"
• Reordering
  • Swap nouns and adjectives (sketched below)
  • "useful organization" → "organisation utile"
• Conjugation
  • Conjugate infinitives to agree with the subject
  • "the men walk" → "les hommes marcher" (infinitive)
  • "marcher" → "marchent" (3rd person plural conjugation)
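A minimal sketch of the noun/adjective reordering step, assuming part-of-speech tags for the English chunk are already available; tagging itself is outside the slides' scope, so the tags here are supplied by hand.

```python
def reorder(tagged_tokens):
    """Swap adjacent (adjective, noun) pairs into French (noun, adjective) order."""
    out = list(tagged_tokens)
    i = 0
    while i < len(out) - 1:
        (_, tag1), (_, tag2) = out[i], out[i + 1]
        if tag1 == "ADJ" and tag2 == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out

print(reorder([("useful", "ADJ"), ("organization", "NOUN")]))
# [('organization', 'NOUN'), ('useful', 'ADJ')] -> "organisation utile"
```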
Results
• Demonstrated the feasibility of building translation probabilities from online lexical resources
• Mutations and pre- and post-processing fixed many of the problems associated with word-by-word replacement
• Intelligibility and semantic "closeness" of translations drastically improved over IBM Models 1 and 2