Machine Translation A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau
Overview • Language Alignment System • Datasets • Sentence-aligned sets for training (e.g. the Hansards Corpus, the European Parliament Proceedings Parallel Corpus) • A word-aligned set for testing and evaluation to measure accuracy and precision • Decoding
Language Alignment • Goal: Produce a word-aligned set from a sentence-aligned dataset • First step on the road toward Statistical Machine Translation • Example Problem: • The motion to adjourn the House is now deemed to have been adopted. • La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.
IBM Models 1 and 2 (Kevin Knight, A Statistical MT Tutorial Workbook, 1999) • Each model can produce a word-aligned dataset on its own. • EM Algorithm • Model 1 produces T-values based on normalized fractional counting of corresponding words. • Model 2 additionally uses A-values, “reverse distortion probabilities” – probabilities based on the positions of the words
Training Data • European Parliament Proceedings Parallel Corpus 1996-2003 • Aligned Languages: • English - French • English - Dutch • English - Italian • English - Finnish • English - Portuguese • English - Spanish • English - Greek
Training Data cont. • Eliminated • Misaligned sentences • Sentences with 50 or more words • XML tags • Symbols and numerical characters other than commas and periods
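To make the cleanup concrete, here is a minimal preprocessing sketch in Python. It assumes the corpus has already been split into aligned (English, foreign) sentence-pair tuples; the function names, the regular expressions, and the MAX_WORDS constant are illustrative placeholders, not the code we actually ran.

```python
import re

MAX_WORDS = 50  # drop sentences with 50 or more words, per the cutoff above

def clean(sentence):
    """Strip XML tags, then drop symbols and digits other than commas and periods."""
    sentence = re.sub(r"<[^>]+>", " ", sentence)           # remove XML tags
    sentence = re.sub(r"[^A-Za-zÀ-ÿ,. ]", " ", sentence)   # keep letters, commas, periods
    return sentence.split()

def filter_pairs(pairs):
    """Yield cleaned (english, foreign) token lists, skipping bad pairs."""
    for eng, frn in pairs:
        e, f = clean(eng), clean(frn)
        if not e or not f:                                  # likely misaligned or empty
            continue
        if len(e) >= MAX_WORDS or len(f) >= MAX_WORDS:      # too long to align cheaply
            continue
        yield e, f
```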
Ideally… http://www.cs.berkeley.edu/~klein/cs294-5
Bypassing Interlingua: Models I-III • Variables contributing to the probability of a sentence: • Correlation between words in the source/target languages • Fertility of a word • Correlation between order of words in source sentence and order of words in target
Building the Translation Matrix: Starting from alignments • Find the sentence alignment • If a word in the source aligns with a word in the target, then increment the translation matrix. • Normalize the translation matrix
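As a rough illustration, a sketch of that counting-and-normalizing step, assuming word-aligned pairs are available as (source_words, target_words, links) tuples; that data layout is an assumption for the example, not the format of our training files.

```python
from collections import defaultdict

def build_t_matrix(aligned_pairs):
    """aligned_pairs: iterable of (source_words, target_words, links), where links
    is a list of (i, j) pairs meaning source word i aligns to target word j."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1.0          # increment the translation matrix
    # normalize each source word's row into a probability distribution
    t = {}
    for s, row in counts.items():
        total = sum(row.values())
        t[s] = {w: c / total for w, c in row.items()}
    return t
```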
Can’t find alignments • Sentences in the Hansards corpus commonly run around 60 words, and many exceed 100. • A 100-word sentence pair has on the order of 100^100 possible alignments
Counting • Rob is a boy. Rob es niño. • Rob is tall. Rob es alto. • Eric is tall. Eric es alto. … … Base counts on co-occurrence, weighted by sentence length (see the sketch below).
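A minimal sketch of that weighted co-occurrence count, using the toy sentences above; the 1/len(target) weight is one reasonable choice of length weighting, not necessarily the exact one we used.

```python
from collections import defaultdict

def cooccurrence_counts(sentence_pairs):
    """Each pair contributes 1/len(target) per (source, target) word pair,
    so long sentences do not dominate the counts."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt in sentence_pairs:
        weight = 1.0 / len(tgt)
        for s in src:
            for t in tgt:
                counts[s][t] += weight
    return counts

pairs = [("Rob is a boy".split(), "Rob es niño".split()),
         ("Rob is tall".split(), "Rob es alto".split()),
         ("Eric is tall".split(), "Eric es alto".split())]
counts = cooccurrence_counts(pairs)   # repeated pairs like ("is", "es") accumulate the most mass
```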
Iterative Convergence • Use the Expectation Maximization (EM) algorithm • Creates the translation matrix
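A compact sketch of the Model 1 EM loop, following the fractional-counting scheme described above; the variable names and the fixed iteration count are illustrative.

```python
from collections import defaultdict

def model1_em(pairs, iterations=10):
    """pairs: list of (e_words, f_words); returns t[f][e], roughly P(f | e)."""
    # uniform initialization over every co-occurring word pair
    t = defaultdict(lambda: defaultdict(float))
    for e_sent, f_sent in pairs:
        for f in f_sent:
            for e in e_sent:
                t[f][e] = 1.0
    for f in t:
        for e in t[f]:
            t[f][e] = 1.0 / len(t[f])

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for e_sent, f_sent in pairs:
            for f in f_sent:
                # E-step: split a fractional count of 1 among the possible source words
                z = sum(t[f][e] for e in e_sent)
                for e in e_sent:
                    c = t[f][e] / z
                    count[f][e] += c
                    total[e] += c
        # M-step: renormalize so the probabilities conditioned on each e sum to 1
        for f in count:
            for e in count[f]:
                t[f][e] = count[f][e] / total[e]
    return t
```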
Distorting the Sentence • Word order changes between languages • How is a sentence with 2 words distorted? • How is a sentence with 3 words distorted? • How is a sentence with … To keep track of this information we use…
A tesseract! • (A quadruply nested default dictionary) • This could be a problem if there are more than 100 words in a sentence. • 100x100x100x100 = too big for RAM and takes too much time
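One way to build such a structure in Python is a small recursive helper; this is only a sketch of the data structure, with illustrative index names.

```python
from collections import defaultdict

def nested_default(depth, leaf=float):
    """Build a defaultdict nested `depth` levels deep with `leaf` values at the bottom."""
    if depth == 1:
        return defaultdict(leaf)
    return defaultdict(lambda: nested_default(depth - 1, leaf))

# a[i][j][l][m]: probability that target position j aligns to source position i
# in a sentence pair with source length l and target length m
a = nested_default(4)
a[2][1][6][7] = 0.05
```

Because a defaultdict only materializes the entries actually seen in training, the table stays sparse in practice, but long sentence pairs still touch a very large number of cells.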
Broad Look at MT • “The translation process can be described simply as: • Decoding the meaning of the source text, and • Re-encoding this meaning in the target language.” - “Translation Process”, Wikipedia, May 2006
Decoding • How to go from the T-matrix and A-matrix to a word alignment? • There are several approaches…
Viterbi • When used only for alignment, memory and time requirements are much smaller. • Returns the optimal path. • T-Matrix probabilities function as the “emission” matrix • A-Matrix probabilities handle the positioning of words
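Under Model 2 each target word's alignment choice is independent of the others, so the highest-probability (“Viterbi”) alignment can be read off position by position. A minimal sketch, assuming a t table indexed as t[target_word][source_word] (as in the EM sketch) and an a table indexed as a[i][j][l][m]:

```python
def viterbi_alignment(src, tgt, t, a):
    """For each target position j, pick the source position i maximizing
    t[f][e] * a[i][j][l][m]; unseen events default to probability 0."""
    l, m = len(src), len(tgt)
    links = []
    for j, f in enumerate(tgt):
        best_i = max(range(l),
                     key=lambda i: t.get(f, {}).get(src[i], 0.0) * a[i][j][l][m])
        links.append((best_i, j))
    return links
```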
Decoding as a Translator Without a translated sentence supplied, the program can act as a stand-alone translator rather than a word aligner. However, while the Viterbi algorithm runs quickly with pruning when decoding alignments, the run time skyrockets when translating.
Greedy Hill Climbing (Knight & Koehn, What’s New in Statistical Machine Translation, 2003) • Best-first search • 2-step look-ahead to avoid getting stuck at the most probable local maxima
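A bare-bones hill-climbing sketch; the 2-step look-ahead mentioned above would additionally examine neighbors of neighbors before giving up, which this minimal version omits. The neighbors and score callables are placeholders for the hypothesis-rewriting operations and the model score.

```python
def greedy_hill_climb(initial, neighbors, score):
    """Repeatedly move to the best-scoring neighboring hypothesis until none improves
    the current one. (A 2-step look-ahead would also expand neighbors of neighbors.)"""
    current = initial
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best
```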
Beam Search (Knight & Koehn, What’s New in Statistical Machine Translation, 2003) • Optimization of Best-First Search with heuristics and a “beam” of choices • Exponential tradeoff when increasing the “beam” width
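A generic beam-search skeleton, to show where the width tradeoff comes from: every hypothesis on the beam is expanded and only the beam_width best survivors are kept, so widening the beam multiplies the work per step. The expand and score callables are placeholders, not part of our system.

```python
import heapq

def beam_search(start, expand, score, beam_width=5, steps=10):
    """expand(hyp) yields successor hypotheses; score(hyp) returns a log-probability.
    Only the beam_width best partial hypotheses survive each step."""
    beam = [start]
    for _ in range(steps):
        candidates = [h for hyp in beam for h in expand(hyp)]
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)
```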
Other Decoding Methods (Knight & Koehn, What’s New in Statistical Machine Translation, 2003) • Finite State Transducer • Mapping between languages based on a finite automaton • Parsing • String to Tree Model
Problem: One to Many • It is necessary to take all alignments above a certain probability in order to capture the “probability that e has fertility at least a given value” (Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999)
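One hedged sketch of that thresholding idea, reusing the illustrative t and a table shapes from the earlier sketches; the 0.1 cutoff is an arbitrary placeholder.

```python
def threshold_alignment(src, tgt, t, a, threshold=0.1):
    """Keep every (i, j) link whose Model 2 score clears the threshold, so a single
    source word may link to several target words (fertility greater than one)."""
    l, m = len(src), len(tgt)
    links = []
    for j, f in enumerate(tgt):
        for i, e in enumerate(src):
            if t.get(f, {}).get(e, 0.0) * a[i][j][l][m] >= threshold:
                links.append((i, j))
    return links
```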
Results • Study done in 2003 on word alignment error rates in Hansards corpus: • Model 2 – • 29.3% on 8K training sentence pairs • 19.5% on 1.47M training sentence pairs • Optimized Model 6 – • 20.3% on 8K training sentence pairs • 8.7% on 1.47M training sentence pairs Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003
Expected Accuracy 70% overall • Language performance: • Dutch • French • Italian, Spanish, Portuguese • Greek • Finnish
Possible Future Work • Given more time, we would have implemented IBM Model 3 • It additionally uses n, p, and d parameters for weighted alignments: • N, the fertility: the number of words produced by one source word • D, distortion • P, a parameter for words not produced directly by any source word • Invokes Model 2 for scoring
Another Possible Translation Scheme • Example-Based Machine Translation • Translation-by-Analogy • Can sometimes achieve better results than the “gist” translations produced by other models
The End Are there any questions/comments?