Look-up Based Greedy Decoding for Machine Translation

Tony Zhang, Steven Bills


  1. Look-up Based Greedy Decoding for Machine Translation. Tony Zhang, Steven Bills

  2. Dictionary-based look-up
     • IBM Models 1 and 2 generate translation probabilities statistically using parallel text
     • We generate probabilities by looking up translations in freely available online lexical resources: Wikipedia and Wiktionary
     • These probabilities are often significantly more accurate than those generated by the IBM models
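The slides do not say how look-up results are turned into probabilities. As a minimal sketch, assume each resource returns a list of candidate translations for a word and that every listing counts as one vote; the function name, the input layout, and the sample data are all hypothetical.

```python
from collections import Counter


def translation_probs(candidates_by_source):
    """Turn candidate translations from several resources into probabilities.

    candidates_by_source maps a resource name to the translations it lists
    for one English word. Treating each listing as one vote is an assumption,
    not something the slides state.
    """
    votes = Counter()
    for translations in candidates_by_source.values():
        votes.update(translations)
    total = sum(votes.values())
    if total == 0:
        return {}
    # Normalise the vote counts so they sum to 1.
    return {word: count / total for word, count in votes.items()}


# Hypothetical look-up results for the English word "last".
lookups = {
    "wiktionary": ["dernier", "durer"],
    "wikipedia": ["dernier"],
}
probs = translation_probs(lookups)
```

Under this voting assumption, a translation listed by more resources receives a higher probability, and a word with no listings yields an empty table.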

  3. The translation process
     • Our system translates from English to French
     • We apply chunking, reordering and stemming in a pre-processing phase
     • We generate an initial gloss of the pre-processed English sentence by using the most likely translation of each word
     • We greedily apply mutations to the French translation until no superior mutations are left
     • We apply post-processing to the resulting translation to remove duplicate words and perform contractions
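The greedy step above can be sketched as plain hill climbing: score every mutation of the current translation, take the best one if it beats the current score, and stop when none does. The slides do not describe the scoring function, so `toy_score` (a count of known-good bigrams) and `toy_deletions` (a neighbourhood that only deletes one word) are hypothetical stand-ins.

```python
def greedy_decode(initial, neighbours, score):
    """Hill-climb from `initial`, taking the best-scoring mutation each round."""
    current = list(initial)
    current_score = score(current)
    while True:
        best, best_score = None, current_score
        for candidate in neighbours(current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:
            # No mutation is strictly better: a local optimum has been reached.
            return current
        current, current_score = best, best_score


# Hypothetical stand-ins for the real scorer and mutation set.
GOOD_BIGRAMS = {("le", "président"), ("président", "veut"), ("veut", "essayer")}


def toy_score(words):
    # Count adjacent word pairs that appear in the known-good set.
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(words, words[1:]))


def toy_deletions(words):
    # Every translation obtainable by deleting exactly one word.
    return [words[:i] + words[i + 1:] for i in range(len(words))]


gloss = ["le", "président", "veut", "à", "essayer"]
result = greedy_decode(gloss, toy_deletions, toy_score)
```

Because a mutation is accepted only when it strictly improves the score, the loop terminates once the score can no longer rise; like any greedy search, it may stop at a local optimum.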

  4. Example mutations
     • “the last election”
        • Initial gloss: “le élection durer” (“the election to last”)
        • Retranslation mutation: “le élection dernier” (correct)
     • “parliament is …”
        • Initial gloss: “parlement est …” (needs an article)
        • Insertion mutation: “le parlement est” (correct article)
     • “the speaker wants to try …”
        • Initial gloss: “le président veut à essayer …”
        • Deletion mutation: “le président veut essayer …”
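The three mutation types in these examples can be sketched as candidate generators. This is an illustration only: pairing each French word with the English word it came from, the `dictionary` layout, and the `function_words` list are assumptions, not details given in the slides.

```python
def retranslations(tokens, dictionary):
    """Swap one French word for another listed translation of its source word."""
    for i, (src, tgt) in enumerate(tokens):
        for alt in dictionary.get(src, ()):
            if alt != tgt:
                yield tokens[:i] + [(src, alt)] + tokens[i + 1:]


def insertions(tokens, function_words):
    """Insert one function word (e.g. an article) at any position."""
    for i in range(len(tokens) + 1):
        for word in function_words:
            # Inserted words have no English source, so the source is None.
            yield tokens[:i] + [(None, word)] + tokens[i:]


def deletions(tokens):
    """Delete any single word."""
    for i in range(len(tokens)):
        yield tokens[:i] + tokens[i + 1:]


def mutations(tokens, dictionary, function_words):
    yield from retranslations(tokens, dictionary)
    yield from insertions(tokens, function_words)
    yield from deletions(tokens)


def surface(tokens):
    """Join the French side of the (source, target) pairs into a string."""
    return " ".join(tgt for _, tgt in tokens)


# Hypothetical data for the gloss "le élection durer".
tokens = [("the", "le"), ("election", "élection"), ("last", "durer")]
dictionary = {"last": ["durer", "dernier"]}
candidates = list(mutations(tokens, dictionary, function_words=["le"]))
```

A scoring function would then rank these candidates; here the retranslation candidate has the surface form "le élection dernier", matching the first example above.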

  5. Processing
     • Chunking
        • Translate expressions as a whole rather than word-by-word when they have entries in our dictionary
        • “someone else” → “quelqu’un d’autre”
     • Reordering
        • Swap nouns and adjectives
        • “useful organization” → “organisation utile”
     • Conjugation
        • Conjugate infinitives to agree with the subject
        • “the men walk” → “les hommes marcher” (infinitive)
        • “marcher” → “marchent” (3rd person plural conjugation)
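Chunking and reordering can be sketched as two small passes over the English sentence. The phrase table, the part-of-speech tags, and the three-word phrase limit are hypothetical; the slides do not say how phrases are matched or how words are tagged.

```python
# Hypothetical phrase table: multi-word English expressions with dictionary entries.
PHRASES = {("someone", "else"): "quelqu'un d'autre"}


def chunk(words, phrases, max_len=3):
    """Merge the longest known multi-word expression starting at each position."""
    out, i = [], 0
    while i < len(words):
        # Try the longest span first, down to two words.
        for n in range(min(max_len, len(words) - i), 1, -1):
            key = tuple(words[i:i + n])
            if key in phrases:
                out.append(" ".join(key))
                i += n
                break
        else:
            # No phrase starts here: keep the single word.
            out.append(words[i])
            i += 1
    return out


def reorder(tagged):
    """Swap each adjective-noun pair so the noun comes first, as in French."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


chunks = chunk(["someone", "else", "spoke"], PHRASES)
swapped = reorder([("useful", "ADJ"), ("organization", "NOUN")])
```

After chunking, "someone else" is a single unit that can be looked up as a whole, and after reordering the noun precedes the adjective, so a word-by-word gloss already follows French order.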

  6. Results
     • Demonstrated the feasibility of building translation probabilities from online lexical resources
     • Mutations and pre- and post-processing fixed many of the problems associated with word-by-word replacement
     • Intelligibility and semantic “closeness” of translations drastically improved over IBM Models 1 and 2
