“Applying Morphology Generation Models to Machine Translation” By Kristina Toutanova, Hisami Suzuki, Achim Ruopp (Microsoft Research). UW Machine Translation Reading Group, 19th May 2008
Meta-Motivation
• Machine Translation is a collection of sub-problems: alignment (corpus, sentence, word/phrase), reordering, phrase extraction, language modeling, transliteration, capitalization, etc.
• It's hard to work on just one sub-problem in Machine Translation and have those gains translate (har!) into overall system performance.
• A side goal is to work on independent, portable modules in the MT system.
Motivation
• Many languages use morphological inflection to express agreement, gender, case, etc. English… not so much.
• Inflection shows up in the surface form of a word: prefix + stem + suffix (more or less; let's not talk about infixes and circumfixes).
• Standard difficulty: data sparseness (you see fewer examples of each token).
Morphology in MT
• It's problematic when morphological information in one half of a language pair is not present in the other half.
• Depending on the translation direction, you either have “extra” information that you need to learn to ignore (easy), or you need to generate this extra information somehow (hard).
Too much morphology
• Easy hack: PRE-PROCESS!
• Strip out gender, split compounds, segment clitics; use as much Perl as it takes. (A toy sketch of this kind of pre-processing follows below.)
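This is not from the paper; it is a purely illustrative sketch of the kind of pre-processing the slide alludes to, with made-up clitic and compound rules.

```python
# Toy pre-processing sketch (not from the paper): clitic segmentation and
# greedy compound splitting, with invented clitics and vocabulary.
PROCLITICS = ("wa", "fa", "bi", "li")   # hypothetical transliterated Arabic proclitics

def segment_clitics(token):
    """Split a leading proclitic off a token: 'wakitab' -> 'wa+ kitab'."""
    for clitic in PROCLITICS:
        if token.startswith(clitic) and len(token) > len(clitic) + 2:
            return clitic + "+ " + token[len(clitic):]
    return token

def split_compound(token, vocab):
    """Greedy compound split: 'haustuer' -> 'haus tuer' if both halves are known words."""
    for i in range(3, len(token) - 2):
        if token[:i] in vocab and token[i:] in vocab:
            return token[:i] + " " + token[i:]
    return token

print(segment_clitics("wakitab"))                    # wa+ kitab
print(split_compound("haustuer", {"haus", "tuer"}))  # haus tuer
```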
Not enough morphology
• Current approaches mostly use a rich language model on the target side.
• Downside: this just rescores MT system output; it doesn't actually affect the options.
• Factored translation: fold the morphological generation into the translation model and do it all during decoding.
• Downside: computationally expensive, so the search space has to be pruned heavily. Too much?
Was gibt es Neues? (What's new?)
• The approach in this paper treats morphological inflection as a standalone (post-)process: first, decode the input; then, for the sequence of word stems in the output, generate the most likely sequence of inflections given the original input. (A pipeline sketch follows below.)
• Experiments: English→Russian (1.6M sentence pairs), English→Arabic (460k sentence pairs).
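A minimal sketch of this decode-then-inflect pipeline, using trivial stand-in functions for the decoder, stemmer, and inflection model; all of these are hypothetical, not the components used in the paper.

```python
# Sketch of the two-stage pipeline: decode first, then re-inflect the stemmed
# output conditioned on the original source. All components are toy stand-ins.
def translate_then_inflect(source, decode, stem, inflect):
    hypothesis = decode(source)                      # stage 1: ordinary MT decoding
    stems = [stem(w) for w in hypothesis.split()]    # reduce the hypothesis to word stems
    return " ".join(inflect(stems, source))          # stage 2: pick inflections given the source

# Toy stand-ins, purely for illustration:
decode  = lambda src: "kniga interesnyj"             # pretend decoder output (already stems)
stem    = lambda w: w
inflect = lambda stems, src: [s + "+INFL" for s in stems]

print(translate_then_inflect("the interesting book", decode, stem, inflect))
```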
Inflection prediction
• A lexicon determines three operations (sketched in code below):
• Stemming: produce the set of possible stems S_w = {s^1, …, s^x} for a word w.
• Inflection: produce the set of surface word forms I_w = {i^1, …, i^y} for the set of stems S_w.
• Morphological analysis: produce the set of morphological analyses A_w = {a^1, …, a^z} for a word w; each a is a vector of categorical values (POS, gender, etc.).
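A sketch of the three lexicon operations above, backed by a toy in-memory lexicon; the entries and feature names are invented for illustration, not taken from the paper's Russian or Arabic resources.

```python
from collections import defaultdict

class Lexicon:
    """Toy lexicon supporting stemming, inflection, and morphological analysis."""
    def __init__(self, entries):
        self.entries = entries                       # (surface form, stem, analysis) triples
        self.forms_by_stem = defaultdict(set)
        for form, stem, _ in entries:
            self.forms_by_stem[stem].add(form)

    def stems(self, word):
        """Stemming: the set of possible stems S_w for a surface word w."""
        return {stem for form, stem, _ in self.entries if form == word}

    def inflections(self, word):
        """Inflection: all surface forms I_w reachable from any stem of w."""
        return {f for s in self.stems(word) for f in self.forms_by_stem[s]}

    def analyses(self, word):
        """Morphological analysis: the set of feature vectors A_w for w."""
        return [a for form, _, a in self.entries if form == word]

lex = Lexicon([
    ("books", "book", {"POS": "NOUN", "Number": "Pl"}),
    ("book",  "book", {"POS": "NOUN", "Number": "Sg"}),
    ("book",  "book", {"POS": "VERB"}),
])
print(lex.stems("books"), lex.inflections("books"), lex.analyses("book"))
```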
Morphological analysis
• Morphological features: 7 for Russian (including capitalization!), 12 for Arabic.
• Each word can be factored into a stem plus a subset of the morphological features.
• On average, 14 inflections per stem in Russian and 24 per stem in Arabic (!).
How do you get them?
• Arabic: the Buckwalter analyzer.
• Russian: an off-the-shelf lexicon.
• Neither is exact, and neither is domain-specific; there could be errors here.
• (Curse of MT: error propagation.)
Models
• The probability of an inflection sequence is the product of local probabilities for each word, conditioned on a context window of prior predictions: roughly, p(y | x) = ∏_t p(y_t | x_t, y_{t-1}, …, y_{t-n+1}). (See the toy sketch below.)
• Markov order: 5-gram for Russian, 3-gram for Arabic.
• Unlike the morphological analyzer, which is purely word-based, the inflection model can use arbitrary features/dependencies (such as projected treelet syntactic information).
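A toy illustration of that decomposition: the log-probability of an inflection sequence is a sum of per-word local log-probabilities, each conditioned on the previous n-1 predictions. The local model here is a uniform stand-in, not the paper's maximum-entropy model.

```python
import math

def sequence_log_prob(inflections, stems, local_prob, order=3):
    """Sum of log p(y_t | x_t, y_{t-1}, ..., y_{t-order+1})."""
    total = 0.0
    for t, y_t in enumerate(inflections):
        history = tuple(inflections[max(0, t - order + 1):t])   # prior predictions in the window
        total += math.log(local_prob(y_t, stems[t], history))
    return total

# Toy local model: a fixed probability of 0.5 for every choice (purely illustrative).
uniform = lambda y, x, history: 0.5
print(sequence_log_prob(["book+Sg", "interesting"], ["book", "interesting"], uniform))
```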
Inflection Prediction Features
• Binary features.
• Each one pairs up the context (x, y_{t-1}, y_{t-2}, …) with the target label y_t.
• Features can be anything! (A sketch of feature extraction follows below.)
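A sketch of binary (indicator) feature extraction of the kind this slide describes: each feature pairs part of the context with the candidate label y_t. The feature templates are invented for illustration, not taken from the paper.

```python
def extract_features(y_t, x_t, history):
    """Return indicator features as strings; a feature 'fires' if its string is present."""
    feats = {
        "label=" + y_t,                                      # label alone
        "stem=" + x_t + "&label=" + y_t,                     # stem paired with label
    }
    if len(history) >= 1:
        feats.add("prev=" + history[-1] + "&label=" + y_t)   # previous prediction + label
    if len(history) >= 2:
        feats.add("prev2=" + history[-2] + "&label=" + y_t)  # prediction two back + label
    return feats

print(extract_features("Case=Gen", "kniga", ["Case=Nom"]))
```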
Baseline experiments
• Stem the reference translations, then try to predict the inflections.
• Done on 5k Russian sentences and 1k Arabic sentences (why the difference?).
• Very good accuracy (91%+).
• Better than a trigram LM (but how about a 5-gram for Russian?).
MT systems used
• 1. The Microsoft treelet translation system.
• 2. A Pharaoh reimplementation.
• Both trained on the MS corpus of technical manuals.
Experiments
• Translations are selected by the translation model, language model, and inflection model as follows (sketched below):
• For each hypothesis in the n-best MT system output, select the best inflection.
• Then, for each input sentence, select the best inflected hypothesis.
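A sketch of that two-stage selection. The slide does not spell out how the scores are combined; the linear interpolation of MT score and inflection-model score below is my assumption.

```python
def select_translation(nbest, best_inflection, mt_weight=1.0, infl_weight=1.0):
    """nbest: list of (stems, mt_score); best_inflection: stems -> (inflected string, score)."""
    best, best_score = None, float("-inf")
    for stems, mt_score in nbest:
        inflected, infl_score = best_inflection(stems)      # step 1: best inflection per hypothesis
        combined = mt_weight * mt_score + infl_weight * infl_score
        if combined > best_score:                           # step 2: best inflected hypothesis overall
            best, best_score = inflected, combined
    return best

# Toy call with fake hypotheses and scores:
nbest = [(["book", "interesting"], -1.2), (["book", "boring"], -1.5)]
print(select_translation(nbest, lambda stems: (" ".join(stems), -0.3)))
```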
N-best lists
• Only tested up to n = 100; n and the interpolation weights were then optimized via grid search. (A toy grid search is sketched below.)
• Optimum size of the n-best list: Russian: 32, Arabic: 2.
• (!!)
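A toy grid search over the n-best size and the interpolation weight, in the spirit of the tuning described above; the scoring function is a fake stand-in for a dev-set metric such as BLEU.

```python
import itertools

def tune(nbest_sizes, weights, dev_score):
    """Return the (n, weight) pair that maximizes dev_score(n, weight)."""
    return max(itertools.product(nbest_sizes, weights),
               key=lambda nw: dev_score(*nw))

# Illustrative call with a made-up scoring function that happens to peak at n=32:
best_n, best_w = tune([1, 2, 4, 8, 16, 32, 64, 100],
                      [0.0, 0.25, 0.5, 0.75, 1.0],
                      dev_score=lambda n, w: -abs(n - 32) - abs(w - 0.5))
print(best_n, best_w)   # -> 32 0.5 under this fake scorer
```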
Experiments
• 1. Train a regular MT system; stem the output and re-inflect it.
• 2. Train an MT system, but stem the target language after alignment. The system output is now just word stems, so inflect them.
• 3. Stem the parallel corpus first, then train an MT system.