
Machine Translation



Presentation Transcript


  1. Machine Translation A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau

  2. Overview • Language Alignment System • Datasets • Sentence-aligned sets for training (ex. The Hansards Corpus, European Parliamentary Proceedings Parallel Corpus) • A word-aligned set for testing and evaluation to measure accuracy and precision • Decoding

  3. Language Alignment • Goal: Produce a word-aligned set from a sentence-aligned dataset • First step on the road toward Statistical Machine Translation • Example Problem: • The motion to adjourn the House is now deemed to have been adopted. • La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.
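
One way to picture such a word alignment (our illustration, not the presenters' code) is as a list of (source index, target index) pairs:

    # Hypothetical partial alignment for the example pair above;
    # each tuple is (English word index, French word index).
    english = "the motion to adjourn the house is now deemed to have been adopted".split()
    french = "la motion portant que la chambre s'ajourne maintenant est réputée adoptée".split()
    alignment = [(0, 0), (1, 1), (3, 6), (5, 5), (6, 8), (7, 7), (8, 9), (12, 10)]
    for i, j in alignment:
        print(english[i], "->", french[j])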

  4. IBM Models 1 and 2 (Kevin Knight, A Statistical MT Tutorial Workbook, 1999) • Each can be used on its own to produce a word-aligned dataset. • EM Algorithm • Model 1 produces T-values from normalized fractional counts of corresponding words. • Model 2 additionally uses A-values, "reverse distortion probabilities" based on the positions of the words
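
For reference, Model 1 defines the translation probability from T-values alone (standard notation from the tutorial literature, not transcribed from the slides):

    P(f \mid e) = \frac{\epsilon}{(l_e + 1)^{l_f}} \prod_{j=1}^{l_f} \sum_{i=0}^{l_e} t(f_j \mid e_i)

where l_e and l_f are the source and target lengths and e_0 is the NULL word. Model 2 replaces the uniform 1/(l_e + 1) alignment term with the A-values a(i | j, l_e, l_f).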

  5. Training Data • European Parliament Proceedings Parallel Corpus 1996-2003 • Aligned Languages: • English - French • English - Dutch • English - Italian • English - Finnish • English - Portuguese • English - Spanish • English - Greek

  6. Training Data cont. • Eliminated: • Misaligned sentences • Sentences with 50 or more words • XML tags • Symbols and numerical characters other than commas and periods
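
A preprocessing pass along those lines might look like this (a minimal sketch under our own assumptions; the presenters' actual filtering code is not shown):

    import re

    TAG = re.compile(r"<[^>]+>")                 # XML tags
    NONTEXT = re.compile(r"[^a-zA-ZÀ-ÿ,. ]")     # keep letters, commas, periods

    def clean(sentence):
        s = TAG.sub(" ", sentence)
        s = NONTEXT.sub(" ", s)                  # drop digits and other symbols
        return re.sub(r"\s+", " ", s).strip().lower()

    def keep_pair(src, tgt, max_len=50):
        """Return a cleaned (src, tgt) pair, or None if either side is
        empty or has 50 or more words."""
        src, tgt = clean(src), clean(tgt)
        if not src or not tgt:
            return None
        if len(src.split()) >= max_len or len(tgt.split()) >= max_len:
            return None
        return src, tgt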

  7. Ideally… (diagram from http://www.cs.berkeley.edu/~klein/cs294-5)

  8. Bypassing Interlingua: Models I-III • Variables contributing to the probability of a sentence: • Correlation between words in the source/target languages • Fertility of a word • Correlation between order of words in source sentence and order of words in target

  9. A Translation Matrix

  10. Building the Translation Matrix: Starting from alignments • Find the sentence alignment • If a word in the source aligns with a word in the target, then increment the translation matrix. • Normalize the translation matrix
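
If a word-aligned set were already available, that counting step would be straightforward (a hypothetical sketch):

    from collections import defaultdict

    def t_matrix_from_alignments(aligned_pairs):
        """aligned_pairs: iterable of (src_words, tgt_words, links) triples,
        where links is a list of (i, j) word-index pairs."""
        counts = defaultdict(lambda: defaultdict(float))
        for src, tgt, links in aligned_pairs:
            for i, j in links:
                counts[src[i]][tgt[j]] += 1.0    # increment per aligned pair
        # normalize each source word's counts into a distribution
        t = {}
        for e, row in counts.items():
            total = sum(row.values())
            t[e] = {f: c / total for f, c in row.items()}
        return t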

  11. Can’t find alignments • Most sentences in the Hansards corpus are around 60 words long, and many run over 100. • A 100-word sentence pair has on the order of 100^100 possible alignments

  12. Counting • Rob is a boy. Rob es niño. • Rob is tall. Rob es alto. • Eric is tall. Eric es alto. • … • Base counts on co-occurrence, weighting based on sentence length.
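
On this toy data, a co-occurrence count already favors the right pairs, e.g. "tall" with "alto" (a minimal sketch; the 1/length weighting is one plausible reading of "weighting based on sentence length"):

    from collections import defaultdict

    pairs = [("rob is a boy", "rob es niño"),
             ("rob is tall", "rob es alto"),
             ("eric is tall", "eric es alto")]

    cooc = defaultdict(float)
    for en, es in pairs:
        es_words = es.split()
        for e in en.split():
            for f in es_words:
                cooc[(e, f)] += 1.0 / len(es_words)   # fractional count

    print(sorted(cooc.items(), key=lambda kv: -kv[1])[:4])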

  13. Iterative Convergence • Use the Expectation Maximization (EM) algorithm • Creates the translation matrix
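
One EM iteration for Model 1 could be sketched as follows (structure and names are ours, following the standard tutorial recipe; start from uniform t-values, e.g. t = defaultdict(lambda: defaultdict(lambda: 0.25)), and repeat until the values stop changing):

    from collections import defaultdict

    def em_iteration(pairs, t):
        """pairs: list of (src_words, tgt_words); t[e][f] = current P(f | e).
        Returns the re-estimated translation matrix."""
        counts = defaultdict(lambda: defaultdict(float))
        for src, tgt in pairs:
            for f in tgt:
                # E-step: split one fractional count for f across all source
                # words, in proportion to the current t-values
                z = sum(t[e][f] for e in src)
                for e in src:
                    counts[e][f] += t[e][f] / z
        # M-step: renormalize the counts into probabilities
        new_t = defaultdict(dict)
        for e, row in counts.items():
            total = sum(row.values())
            for f, c in row.items():
                new_t[e][f] = c / total
        return new_t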

  14. Distorting the Sentence • Word order changes between languages • How is a sentence with 2 words distorted? • How is a sentence with 3 words distorted? • How is a sentence with … To keep track of this information we use…

  15. A tesseract! • (A quadruply nested default dictionary) • This could be a problem if there are more than 100 words in a sentence. • 100x100x100x100 = too big for RAM and takes too much time
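
In Python that can be a quadruply nested defaultdict, where a[i][j][l_e][l_f] holds the probability that target position j aligns to source position i in a sentence pair of lengths l_e and l_f (the key order here is our assumption):

    from collections import defaultdict

    # One nesting level per index; entries appear only when touched,
    # which is exactly why long sentences blow up memory and time.
    a = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float))))

    a[2][3][10][12] = 0.05   # example entry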

  16. Broad Look at MT • “The translation process can be described simply as: • Decoding the meaning of the source text, and • Re-encoding this meaning in the target language.” - “Translation Process”, Wikipedia, May 2006

  17. Decoding • How to go from the T-matrix and A-matrix to a word alignment? • There are several approaches…

  18. Viterbi • If only alignment is needed, memory and time requirements are much smaller. • Returns the optimal path. • T-Matrix probabilities serve as the “emission” matrix • A-Matrix probabilities govern the positioning of words
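
Because Model 2's alignment probability factors over target positions, the best alignment reduces to an independent argmax at each position (a sketch, reusing the t- and a-table layouts assumed above):

    def best_alignment(src, tgt, t, a):
        """For each target position j, pick the source position i that
        maximizes t(f_j | e_i) * a(i | j, l_e, l_f)."""
        l_e, l_f = len(src), len(tgt)
        return [max(range(l_e), key=lambda i: t[src[i]][tgt[j]] * a[i][j][l_e][l_f])
                for j in range(l_f)]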

  19. Decoding as a Translator If no translated sentence is supplied, the program can act as a stand-alone translator instead of a word aligner. However, while the Viterbi algorithm runs quickly with pruning when aligning, the run time skyrockets when translating.

  20. Greedy Hill Climbing (Knight & Koehn, What’s New in Statistical Machine Translation, 2003) • Best-first search • Two-step look-ahead to avoid getting stuck in the most probable local maxima

  21. Beam Search (Knight & Koehn, What’s New in Statistical Machine Translation, 2003) • Optimization of best-first search with heuristics and a “beam” of choices • Exponential trade-off when increasing the “beam” width
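
A generic beam-search skeleton of that shape (hypothetical; the expand and score functions are placeholders, not Knight & Koehn's implementation):

    import heapq

    def beam_search(start, expand, score, beam_width=10, max_steps=50):
        """Keep only the best `beam_width` partial hypotheses per step.
        expand(h) yields successors; score(h) is higher-is-better."""
        beam = [start]
        for _ in range(max_steps):
            candidates = [h2 for h in beam for h2 in expand(h)]
            if not candidates:
                break
            beam = heapq.nlargest(beam_width, candidates, key=score)  # prune
        return max(beam, key=score)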

  22. Other Decoding Methods (Knight & Koehn, What’s New in Statistical Machine Translation, 2003) • Finite State Transducer • Mapping between languages based on a finite automaton • Parsing • String-to-Tree Model

  23. Problem: One to Many It is necessary to take all alignments above a certain probability in order to capture the “probability that e has fertility at least a given value” (Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999)
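
A sketch of that thresholding step (our illustration; the cutoff value is arbitrary):

    def one_to_many_links(src, tgt, t, threshold=0.2):
        """Keep every (i, j) link whose t-value clears the threshold, so a
        single source word may link to several target words (fertility > 1)."""
        return [(i, j)
                for i, e in enumerate(src)
                for j, f in enumerate(tgt)
                if t.get(e, {}).get(f, 0.0) > threshold]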

  24. Results • Study done in 2003 on word alignment error rates in Hansards corpus: • Model 2 – • 29.3% on 8K training sentence pairs • 19.5% on 1.47M training sentence pairs • Optimized Model 6 – • 20.3% on 8K training sentence pairs • 8.7% on 1.47M training sentence pairs Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003

  25. Expected Accuracy 70% overall • Language performance: • Dutch • French • Italian, Spanish, Portuguese • Greek • Finnish

  26. Possible Future Work • Given more time, we would have implemented IBM Model 3 • Model 3 adds n, d, and p parameters for weighted alignments: • N, the number of target words produced by one source word (fertility) • D, distortion over word positions • P, the probability of inserting target words not produced directly by any source word • Invokes Model 2 for scoring

  27. Another Possible Translation Scheme • Example-Based Machine Translation • Translation by analogy • Can sometimes do better than the “gist”-quality translations other models produce

  28. Why Is Improving Machine Translation Necessary?

  29. A Chinese to English Translation

  30. The End Are there any questions/comments?
