Statistical Machine Translation Phrase Based Model Advanced Signal Processing 05/06 Reinisch Bernhard
Overview • The quality of MT systems has improved with the use of phrase translation • Phrases from word-based alignments • Syntactic phrases • Phrases from phrase alignments • IBM word-based statistical MT systems enhanced with phrase translation • How are phrase translation pairs best extracted? • Evaluation framework / outcome
Word-based approaches • Try to model word-to-word correspondences • Models are often restricted • source word -> exactly one target word • Similar in concept to hidden Markov models in speech recognition • Extended to a “one-to-many” alignment model • Solves lexical problems like • “Zahnarzttermin” -> “dentist’s appointment” • The word order may change between source and target
Statistical machine translation (1) • argmax … search/decoding problem (generation of the output sentence) • Pr(e_1^I) … language model • Pr(f_1^J | e_1^I) … translation model
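The three annotated terms belong to the usual noisy-channel decision rule. The slide's own formula is not preserved here, so the following is a reconstruction in the notation used later in the deck (f_1^J for the source sentence, e_1^I for the target sentence), not a copy of the original:

```latex
\hat{e}_1^I
  = \operatorname*{argmax}_{e_1^I} \Pr(e_1^I \mid f_1^J)
  = \operatorname*{argmax}_{e_1^I} \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I)
```

The second equality follows from Bayes' rule; the denominator Pr(f_1^J) does not depend on e_1^I and can be dropped inside the argmax.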
Statistical machine translation (2) Taken from [2]
Learning translation lexica • The following slides describe methods for learning single-word and phrase-based translation lexica • Statistical alignment models • Used for learning word alignments • Symmetrization • Bilingual phrases • Alignment templates
Statistical alignment models (1) • The alignment model introduces a “hidden” variable a • a describes the mapping from source position j to target position a_j • a can be represented as a matrix with binary values • entry 1 … words are aligned • entry 0 … words are not aligned • A source word with no target word is aligned to the empty word e_0
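A minimal sketch of the two representations mentioned above: the mapping j -> a_j and the equivalent binary matrix. The sentence pair and its links are a hypothetical toy example, and the function name `alignment_matrix` is an assumption made for illustration.

```python
# Hypothetical toy sentence pair; the links below are illustrative only.
source = ["ich", "habe", "einen", "Zahnarzttermin"]
# Target position 0 is reserved for the empty word e_0.
target = ["NULL", "i", "have", "a", "dentist's", "appointment"]

# a[j] is the target position that source position j is aligned to.
a = [1, 2, 3, 5]


def alignment_matrix(a, num_target):
    """Return a J x I binary matrix: 1 where source j is aligned to target i."""
    return [[1 if a_j == i else 0 for i in range(num_target)] for a_j in a]


A = alignment_matrix(a, len(target))
for word, row in zip(source, A):
    print(f"{word:15s} {row}")
```

Because each source position has exactly one a_j, every row contains exactly one 1. This is the restriction that prevents “Zahnarzttermin” from covering both “dentist's” and “appointment”.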
Statistical alignment models (2) • In general the model depends on a set of unknown parameters θ • Several specific statistical alignment models exist • First compute word alignments, e.g. with Model 4 • Train the unknown parameters θ • The alignment with the highest probability • is called the Viterbi alignment
Symmetrization (1) • The baseline alignment model (e.g. Model 4) does not allow a source word to be aligned with multiple target words • “Zahnarzttermin” -> “dentist’s appointment” • The desired outcome is an alignment matrix such as the one shown (taken from [2])
Symmetrization (2) • To solve this problem • Train in both directions • For each sentence pair -> two Viterbi alignments • Both alignment tables A1 and A2 then have to be combined (symmetrized) • Simple union of both tables (more refined methods exist) • The result is then used to train single-word-based translation lexica
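The combination step can be sketched with plain set operations, assuming each Viterbi alignment is stored as a set of (source position, target position) links. The function name `symmetrize` and the toy links are hypothetical; the refined methods mentioned on the slide are not covered.

```python
def symmetrize(a1, a2, method="union"):
    """Combine two directional alignments given as sets of (j, i) links."""
    if method == "union":
        return a1 | a2
    if method == "intersection":
        return a1 & a2
    raise ValueError(f"unknown method: {method}")


# Toy case: source word 0 = "Zahnarzttermin",
# target words 0 = "dentist's", 1 = "appointment".
a1 = {(0, 1)}          # source -> target: one target word per source word
a2 = {(0, 0), (0, 1)}  # target -> source: one source word per target word

union = symmetrize(a1, a2, "union")
intersection = symmetrize(a1, a2, "intersection")
print(sorted(union), sorted(intersection))
```

The union keeps the one-to-many link that neither direction could express alone in the source-to-target model; the intersection keeps only links both directions agree on.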
Symmetrization (3) • By computing relative frequencies: p(e|f) = N(e,f) / N(f) • N(e,f) … how many times e and f are aligned • N(f) … how many times the word f occurs
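A small sketch of this relative-frequency estimate, assuming it has the form p(e|f) = (number of times e and f are aligned) / (number of occurrences of f). The corpus, its links, and the function name `lexicon_from_alignments` are hypothetical.

```python
from collections import Counter


def lexicon_from_alignments(aligned_corpus):
    """Estimate p(e|f) from (source_words, target_words, links) triples.

    links is a set of (j, i) pairs: source position j aligned to target i.
    """
    pair_counts = Counter()    # how often e and f are aligned
    source_counts = Counter()  # how often the word f occurs
    for src, tgt, links in aligned_corpus:
        source_counts.update(src)
        for j, i in links:
            pair_counts[(tgt[i], src[j])] += 1
    return {(e, f): n / source_counts[f] for (e, f), n in pair_counts.items()}


# Hypothetical symmetrized alignments for two tiny sentence pairs.
corpus = [
    (["Zahnarzttermin"], ["dentist's", "appointment"], {(0, 0), (0, 1)}),
    (["Termin"], ["appointment"], {(0, 0)}),
]
lexicon = lexicon_from_alignments(corpus)
print(lexicon)
```

With one-to-many links, the values for a single source word f need not sum to 1, since one occurrence of f can be aligned to several target words.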
Bilingual phrases • Now we need an algorithm that finds relationships between whole phrases of the source sentence (m) and the target sentence (n) • The “phrase-extract” algorithm takes the alignment matrix A as input (taken from [2])
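The following is a simplified sketch of phrase-pair extraction from an alignment, not the exact “phrase-extract” procedure from [2]. It assumes the common consistency criterion: a source span and a target span form a phrase pair if no word inside either span is linked to a word outside the other. It uses the minimal target span only and does not extend over unaligned boundary words. The names `extract_phrases` and `max_len` are hypothetical.

```python
def extract_phrases(src, tgt, links, max_len=3):
    """Return the set of (source phrase, target phrase) pairs consistent with links.

    links is a set of (j, i) pairs: source position j aligned to target i.
    """
    pairs = set()
    for j1 in range(len(src)):
        for j2 in range(j1, min(j1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            tgt_pos = [i for (j, i) in links if j1 <= j <= j2]
            if not tgt_pos:
                continue
            i1, i2 = min(tgt_pos), max(tgt_pos)
            if i2 - i1 + 1 > max_len:
                continue
            # Reject if a target word in the span is linked outside the source span.
            if any(i1 <= i <= i2 and not (j1 <= j <= j2) for (j, i) in links):
                continue
            pairs.add((" ".join(src[j1:j2 + 1]), " ".join(tgt[i1:i2 + 1])))
    return pairs


src = ["einen", "Zahnarzttermin"]
tgt = ["a", "dentist's", "appointment"]
links = {(0, 0), (1, 1), (1, 2)}
pairs = extract_phrases(src, tgt, links)
for pair in sorted(pairs):
    print(pair)
```

Note that “Zahnarzttermin” is only paired with the full “dentist's appointment”, never with one of the two target words alone, because both links must fall inside the target span.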
Alignment templates (1) • A more systematic approach • Considers whole phrases • A whole group of adjacent words in the source • maps to a whole group of words in the target • The context of words has greater influence • Changes of word order can be learned • The idea is to model two different alignment levels • Word-level alignments • Phrase-level alignments
Alignment templates (2) • An alignment template z consists of • “F” … source class sequence • “E” … target class sequence • “A” … the alignment between source and target • “F” and “E” consist of word classes • The advantage is better generalization
Alignment templates (3) Taken from [2]
Alignment templates (4) • For training we need the probability of applying an alignment template • The “phrase-extract” algorithm has to be modified • The probability can be estimated by relative frequencies • This completes the “Learning translation lexica” task
Translation model (1) • For notation we decompose the sentences • f_1^J … source sentence • e_1^I … target sentence • into a sequence of phrases (k = 1, …, K) • Further consideration: only one segmentation
Translation model (2) • The model has to allow reordering of the phrases
Translation model (3) Taken from [2]
Translation model (4) Taken from [2]
Alignment template approach results • Evaluation of the approach on a translation task (the “Verbmobil” task) • Additional preprocessing • word joining • word splitting • Results taken from [2]
Alignment template approach conclusions • Overall we see better performance • It is therefore important to model word groups in the source and target language • By using two abstraction levels • Phrase-level alignments • Word-level alignments • -> the context has greater influence and changes of word order can be learned explicitly
Syntactic phrases (1) • A collection of all phrase pairs also includes non-intuitive phrases • “Okay, the”, “house the”, etc. • Intuitively, such phrases do not help • Restrict to syntactically motivated phrases • The idea: syntactic trees, with phrases as subtrees
Syntactic phrases (2) • The input sentence is preprocessed by a syntactic parser • Different operations will be performed on each node • reordering child nodes • inserting extra words at each node • translating leaf words
Syntactic phrases (3) Taken from [4]
Syntactic phrases (4) Taken from [6]
Syntactic phrases (5) • Reordering • Every child sequence has a probability of being reordered (N child nodes -> N! possible reorderings) • The reordering probability is given by the model (as a table) • Inserting • An extra word can be inserted to the left or right of a node • A separate table holds the insertion probabilities • Translating • The operation is applied to every leaf • Assumes that this operation depends only on the word itself
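To make the N! count concrete, here is a small sketch that enumerates the possible orders of a node's children and stores one probability per order. The child labels are a hypothetical example, and the uniform probabilities are placeholders, not values from [4].

```python
from itertools import permutations
from math import factorial

# Hypothetical child sequence of one parse-tree node.
children = ("PRP", "VB1", "VB2")

# Every permutation of the children is a possible reordering.
orders = list(permutations(children))
assert len(orders) == factorial(len(children))

# Reordering table for this child sequence: one probability per order.
# A uniform distribution is used here purely as a placeholder.
r_table = {order: 1.0 / len(orders) for order in orders}

for order, prob in r_table.items():
    print(" ".join(order), round(prob, 3))
```

In a trained model the table entries would be estimated from data, so that frequent reorderings for a language pair receive most of the probability mass.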
Experiments • Now we have three models • [1] built a system to compare them and measure performance under different aspects • Weighting syntactic phrases • Maximum phrase length • Setup • Freely available Europarl corpus • German to English • Performance measured using the BLEU score
Comparison of core methods • AP … template alignment • M4 … IBM Model 4 for word-based translation • Syn … syntactic phrases • Axis label: training corpus size [sentences] • Figures taken from [1]
Weighting syntactic phrases (1) • The restriction to syntactic phrases is harmful, because too many phrases are eliminated • Intuitively, this is unexpected • Possible improvements: in data collection, during translation, or by penalizing • Results suggest • Collecting only syntactic phrases • Does not improve performance • But gives smaller table sizes
Weighting syntactic phrases (2) • Example: • “es gibt” literally translates to “it gives” but really means “there is” • Not a syntactic relationship • Also “with regard to” and “note that” are syntactically complex but easy to translate
Maximum phrase length • How long do phrases have to be to achieve high performance? • All experiments use the “Phrases from word-based alignments” approach • Figures taken from [1]
Simpler underlying word-based models (1) • The core of this framework is IBM Model 4, used for collecting phrase pairs • Model 4 is computationally expensive and has parameter estimation problems (approximations) • What about IBM Models 1-3? • Faster and easier to implement • Models 1 and 2 compute word alignments efficiently
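To illustrate why the simplest model is easy to implement, here is a minimal sketch of IBM Model 1 training with EM, followed by the Viterbi alignment under that model. It is a textbook-style illustration on a hypothetical three-sentence corpus, not the setup used in [1]; the function names `train_model1` and `viterbi_alignment` are assumptions.

```python
from collections import defaultdict


def train_model1(corpus, iterations=10):
    """EM training of IBM Model 1. Returns t with t[(f, e)] approximating p(f | e)."""
    src_vocab = {f for src, _ in corpus for f in src}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (f, e) links
        total = defaultdict(float)  # expected counts per target word e
        for src, tgt in corpus:
            tgt_null = ["NULL"] + tgt  # position 0 is the empty word
            for f in src:
                norm = sum(t[(f, e)] for e in tgt_null)
                for e in tgt_null:
                    frac = t[(f, e)] / norm  # E-step: posterior of the link
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts per target word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def viterbi_alignment(src, tgt, t):
    """Most probable target position (0 = empty word) for each source word."""
    tgt_null = ["NULL"] + tgt
    return [
        max(range(len(tgt_null)), key=lambda i: t[(f, tgt_null[i])])
        for f in src
    ]


corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_model1(corpus)
print(viterbi_alignment(["das", "Haus"], ["the", "house"], t))
```

In Model 1 the alignment positions are independent of each other, so the Viterbi alignment reduces to one argmax per source word. Model 4 has no such closed form, which is one reason it is more expensive.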
Simpler underlying word-based models (2) • How much is performance affected if the word alignment is based on these simpler methods? • M1 gives the worst performance • But M2 and M3 perform similarly to M4 • Taken from [1]
Conclusions • Intuitively, phrase-based approaches give better performance than word-based approaches • Experiments also show that • “straightforward” syntax-based models have disadvantages • The “best” outcome is achieved with short phrases • Phrase extraction and the alignment heuristic have a great influence
References • [1] Philipp Koehn, Franz Josef Och, Daniel Marcu; Statistical Phrase-Based Translation • [2] Franz Josef Och, Hermann Ney; The Alignment Template Approach to Statistical Machine Translation • [3] Franz Josef Och, Christoph Tillmann, Hermann Ney; Improved Alignment Models for Statistical Machine Translation • [4] Kenji Yamada, Kevin Knight; A Syntax-based Translation Model • [5] Daniel Marcu, William Wong; A Phrase-Based, Joint Probability Model for Statistical Machine Translation • [6] Amitabha Mukerjee, Ankit Soni and Achla M. Raina; Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora • [7] www.sbox.tugraz.at/home/b/brein/061120_TranslationModelPhraseBased.zip