Advanced Signal Processing 05/06 Reinisch Bernhard

Statistical Machine Translation: Phrase-Based Model

Overview • The quality of MT systems has improved with the use of phrase translation • Phrases from word-based alignments • Syntactic phrases • Phrases from phrase alignments • IBM word-based statistical MT systems enhanced with phrase translation • What is the best way to extract phrase translation pairs? • Evaluation framework / outcome

Word-based approaches • Model word-to-word correspondences • Models are often restricted: • each source word -> exactly one target word • e.g. hidden Markov models, as known from speech recognition • Enhanced to a “one-to-many” alignment model • Solves lexical problems such as • “Zahnarzttermin” -> “dentist’s appointment” • The order of words can change between languages

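Statistical machine translation (1) • $\hat{e}_1^I = \arg\max_{e_1^I} \{ \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \}$ • argmax … search/decoding problem (generation of the output sentence) • $\Pr(e_1^I)$ … language model • $\Pr(f_1^J \mid e_1^I)$ … translation model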
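The decision rule can be illustrated on a toy scale. This is a minimal sketch that assumes a fixed list of three candidate translations. Their language-model and translation-model probabilities are made up for illustration and do not come from [2]. The "decoder" simply picks the candidate that maximizes Pr(e) · Pr(f|e). A real decoder searches a huge hypothesis space instead of a fixed list.

```python
import math

# Hypothetical candidates e for one source sentence f ("es gibt ein Haus"),
# with made-up Pr(e) (language model) and Pr(f|e) (translation model).
candidates = {
    "there is a house": {"lm": 0.020, "tm": 0.30},
    "it gives a house": {"lm": 0.002, "tm": 0.60},
    "a house there is": {"lm": 0.0005, "tm": 0.30},
}

def decode(cands):
    """Return argmax_e Pr(e) * Pr(f|e), scored in log space to avoid underflow."""
    return max(cands, key=lambda e: math.log(cands[e]["lm"]) + math.log(cands[e]["tm"]))

best = decode(candidates)
print(best)  # there is a house
```

The literal translation has the higher translation-model score. The language model still makes the fluent candidate win (0.006 vs. 0.0012).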

Statistical machine translation (2) Taken from [2]

Learning translation lexica • The following describes methods for learning single-word and phrase-based translation lexica • Statistical alignment models • used for learning word alignments • Symmetrization • Bilingual phrases • Alignment templates

Statistical alignment models (1) • The alignment model introduces a “hidden” alignment $a_1^J$: $\Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_1^I)$ • $a_1^J$ describes the mapping from source position j to target position $i = a_j$ • The alignment can be represented as a matrix with binary entries • 1 … words are aligned • 0 … words are not aligned • A source word without a target counterpart is aligned to the empty word $e_0$ ($a_j = 0$)
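A small sketch shows how an alignment vector $a_1^J$ maps to the binary alignment matrix. The example sentence pair is hypothetical, and the matrix layout (one row per source word, column 0 for the empty word) is an assumption made here for illustration. Because $a_j$ is a function of j, every row contains exactly one 1.

```python
from typing import List

def alignment_matrix(a: List[int], I: int) -> List[List[int]]:
    """Binary J x (I+1) alignment matrix built from an alignment vector.

    a[j] is the target position aligned to source word j (target positions
    are 1-based; 0 stands for the empty word e_0, stored in column 0).
    """
    M = [[0] * (I + 1) for _ in range(len(a))]
    for j, i in enumerate(a):
        M[j][i] = 1  # 1 = words aligned, 0 = not aligned
    return M

f = ["es", "ist", "ja", "klein"]  # source sentence
e = ["it", "is", "small"]         # target sentence (I = 3)
a = [1, 2, 0, 3]                  # "ja" has no counterpart -> empty word
M = alignment_matrix(a, len(e))
for word, row in zip(f, M):
    print(f"{word:6} {row}")
```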

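Statistical alignment models (2) • In general, the model $\Pr_\theta(f_1^J, a_1^J \mid e_1^I)$ depends on a set of unknown parameters θ • Several specific statistical alignment models exist • First, word alignments are computed, e.g. with IBM Model 4 • The unknown parameters θ are trained on a bilingual training corpus • The alignment with the highest probability, $\hat{a}_1^J = \arg\max_{a_1^J} \Pr_\theta(f_1^J, a_1^J \mid e_1^I)$, • is called the Viterbi alignment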
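Computing the Viterbi alignment for Model 4 requires approximate search. This sketch therefore swaps in the much simpler IBM Model 1. Under Model 1, Pr(f, a | e) is proportional to the product of lexicon probabilities t(f_j | e_{a_j}), so each $a_j$ can be maximized independently. The lexicon values below are hypothetical.

```python
from typing import Dict, List, Tuple

def viterbi_model1(f: List[str], e: List[str], t: Dict[Tuple[str, str], float]) -> List[int]:
    """Viterbi alignment under IBM Model 1 (not Model 4).

    Pr(f, a | e) is proportional to prod_j t(f_j | e_{a_j}), so each a_j is
    chosen independently. e[0] must be the empty word. t maps
    (f_word, e_word) -> t(f_word | e_word); missing entries count as 0.
    """
    return [max(range(len(e)), key=lambda i: t.get((fj, e[i]), 0.0)) for fj in f]

e = ["NULL", "it", "is", "small"]  # position 0 = empty word e_0
f = ["es", "ist", "ja", "klein"]
t = {  # hypothetical lexicon probabilities t(f | e)
    ("es", "it"): 0.7, ("ist", "is"): 0.8, ("klein", "small"): 0.9,
    ("ja", "NULL"): 0.4, ("ja", "is"): 0.1, ("es", "NULL"): 0.05,
}
print(viterbi_model1(f, e, t))  # [1, 2, 0, 3]
```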

Symmetrization (1) • The baseline alignment model (e.g. Model 4) does not allow a source word to be aligned to multiple target words • e.g. “Zahnarzttermin” -> “dentist’s appointment” • The desired outcome is an alignment matrix like the following Taken from [2]

Symmetrization (2) • To solve this problem: • train the alignment models in both translation directions • for each sentence pair -> two Viterbi alignments A1 and A2 • both alignments have to be combined (symmetrized) • e.g. the simple union of both alignments (more refined methods exist) • The result is then used to train single-word-based translation lexica
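Alignments can be stored as sets of (j, i) points, with source position j and target position i, 1-based. Combining the two directional alignments is then a set operation. This sketch shows only union and intersection; the refined heuristic of [2] is not reproduced. The two directional alignments below are hypothetical.

```python
from typing import Set, Tuple

Point = Tuple[int, int]  # (source position j, target position i), 1-based

def symmetrize(A1: Set[Point], A2: Set[Point], method: str = "union") -> Set[Point]:
    """Combine two directional Viterbi alignments."""
    if method == "union":
        return A1 | A2
    if method == "intersection":
        return A1 & A2
    raise ValueError(f"unknown method: {method}")

# f = ich habe einen Zahnarzttermin ; e = I have a dentist's appointment
A1 = {(1, 1), (2, 2), (3, 3), (4, 5)}          # source -> target: one target word per source word
A2 = {(1, 1), (2, 2), (3, 3), (4, 4), (4, 5)}  # target -> source: both target words pick j = 4
union = symmetrize(A1, A2)
print(sorted(union))
```

Only the union recovers the one-to-many link “Zahnarzttermin” -> “dentist’s appointment”. The intersection keeps just the links both directions agree on.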

Symmetrization (3) • The lexicon probabilities are estimated as relative frequencies: $p(e \mid f) = \frac{N(e, f)}{N(f)}$ • N(e, f) … how many times e and f are aligned • N(f) … how many times the word f occurs
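The relative-frequency estimate can be sketched directly from a word-aligned corpus. The three-sentence corpus and its alignments are hypothetical. The code follows the definition above, with N(f) counting occurrences of f. With union alignments, the values for a given f therefore need not sum exactly to 1.

```python
from collections import Counter
from typing import List, Set, Tuple

def lexicon(corpus: List[Tuple[List[str], List[str], Set[Tuple[int, int]]]]):
    """Estimate p(e|f) = N(e, f) / N(f) from a word-aligned corpus.

    Each entry is (f_words, e_words, A), A a set of 0-based (j, i) points.
    N(e, f) counts alignment links between e and f; N(f) counts occurrences of f.
    """
    n_ef, n_f = Counter(), Counter()
    for f_words, e_words, A in corpus:
        n_f.update(f_words)
        for j, i in A:
            n_ef[(e_words[i], f_words[j])] += 1
    return {(e, f): c / n_f[f] for (e, f), c in n_ef.items()}

corpus = [
    (["das", "Haus"], ["the", "house"], {(0, 0), (1, 1)}),
    (["das", "Buch"], ["the", "book"], {(0, 0), (1, 1)}),
    (["das", "ist", "gut"], ["that", "is", "good"], {(0, 0), (1, 1), (2, 2)}),
]
p = lexicon(corpus)
print(p[("the", "das")], p[("that", "das")])
```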

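Bilingual phrases • Next, we need an algorithm that extracts relationships between whole phrases of the source and the target sentence • A phrase pair $(f_j^{j+m}, e_i^{i+n})$ is a bilingual phrase if every alignment point $(i', j') \in A$ satisfies $j \le j' \le j+m \Leftrightarrow i \le i' \le i+n$ • The “phrase-extract” algorithm takes the alignment matrix A as input Taken from [2]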
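This is a minimal sketch of phrase extraction that implements the consistency condition: no alignment point may link a word inside the phrase pair to a word outside it. As in many implementations, it also requires at least one alignment point per pair. That requirement, the `max_len` limit and the choice not to extend spans over unaligned words are assumptions of this sketch, not details taken from [2].

```python
from typing import List, Set, Tuple

def extract_phrases(f: List[str], e: List[str], A: Set[Tuple[int, int]], max_len: int = 4):
    """Return phrase pairs consistent with alignment A (0-based (j, i) points).

    Simplification: target spans are not extended over unaligned words.
    """
    pairs = []
    for j1 in range(len(f)):
        for j2 in range(j1, min(len(f), j1 + max_len)):
            # target positions linked to the source span [j1, j2]
            linked = [i for (j, i) in A if j1 <= j <= j2]
            if not linked:
                continue  # require at least one alignment point
            i1, i2 = min(linked), max(linked)
            if i2 - i1 + 1 > max_len:
                continue
            # consistency: target words in [i1, i2] must not align outside [j1, j2]
            if any(i1 <= i <= i2 and not (j1 <= j <= j2) for (j, i) in A):
                continue
            pairs.append((" ".join(f[j1:j2 + 1]), " ".join(e[i1:i2 + 1])))
    return pairs

f = ["ich", "habe", "einen", "Zahnarzttermin"]
e = ["I", "have", "a", "dentist's", "appointment"]
A = {(0, 0), (1, 1), (2, 2), (3, 3), (3, 4)}  # symmetrized, one-to-many link
pairs = extract_phrases(f, e, A)
for pair in pairs:
    print(pair)
```

With `max_len=4` the full sentence pair is skipped because its target side has five words. The other nine spans are all consistent.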

Alignment templates (1) • A more systematic approach • Considers whole phrases: • a whole group of adjacent words in the source • maps to a whole group of words in the target • The context of words has a greater influence • Changes in word order can be learned • The idea is to model two different alignment levels: • word-level alignments • phrase-level alignments

Alignment templates (2) • An alignment template is a triple $z = (F_1^{J'}, E_1^{I'}, \tilde{A})$ • $F_1^{J'}$ … source class sequence • $E_1^{I'}$ … target class sequence • $\tilde{A}$ … the alignment between source and target positions • F and E are sequences of word classes, not words • The advantage is better generalization
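The generalization gained from word classes can be sketched in a few lines. [2] uses automatically trained word classes. The hand-written class tables here are hypothetical, as is the `applicable` check. A template learned from "am Montag" / "on Monday" also matches the unseen phrase "am Dienstag", because both phrases have the same class sequence. Choosing the actual target words (e.g. "Tuesday") would still need word-level translation probabilities, which are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical word classes (hand-written here; [2] learns them automatically).
SRC_CLASS = {"am": "C_PREP", "Montag": "C_DAY", "Dienstag": "C_DAY"}
TGT_CLASS = {"on": "D_PREP", "Monday": "D_DAY", "Tuesday": "D_DAY"}

def classes(words: List[str], table: Dict[str, str]) -> Tuple[str, ...]:
    """Map a word sequence to its class sequence."""
    return tuple(table[w] for w in words)

# Template z = (F, E, A~) built from the training pair "am Montag" / "on Monday".
F = classes(["am", "Montag"], SRC_CLASS)
E = classes(["on", "Monday"], TGT_CLASS)
A_tilde = {(0, 0), (1, 1)}  # (source class position, target class position)
template = (F, E, A_tilde)

def applicable(z, src_phrase: List[str]) -> bool:
    """A template applies to a source phrase whose class sequence equals F."""
    return classes(src_phrase, SRC_CLASS) == z[0]

print(applicable(template, ["am", "Dienstag"]))  # True: unseen phrase, same classes
```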

Alignment templates (3) Taken from [2]

Alignment templates (4) • For training, we need the probability of applying an alignment template • The “phrase-extract” algorithm has to be modified accordingly • The probability can be estimated by relative frequencies • This completes the “learning translation lexica” task

Translation model (1) • For notation, we decompose the sentences into phrases: • $f_1^J$ … source sentence • $e_1^I$ … target sentence • each as a sequence of phrases $\tilde{f}_k$, $\tilde{e}_k$ (k = 1, …, K) • For further considerations, only one segmentation is considered

Translation model (2) • The model has to allow reordering of the phrases
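This sketch shows how phrase reordering can enter a score. It swaps in the simple distance-based distortion model of [1] rather than the alignment template model of [2]. One fixed segmentation is scored as the product of the phrase translation probabilities φ times a penalty α^|start_k − end_(k−1) − 1| for every jump in the source. The segmentation, the φ values and α = 0.5 are hypothetical.

```python
from typing import List, Tuple

def phrase_score(segments: List[Tuple[int, int, float]], alpha: float = 0.5) -> float:
    """Score one segmented, reordered translation.

    segments lists phrase pairs in target order as (start, end, phi): the
    1-based source span covered by the k-th target phrase and its phrase
    translation probability. Jumps in the source are penalized by
    alpha ** |start_k - end_{k-1} - 1|.
    """
    score, prev_end = 1.0, 0
    for start, end, phi in segments:
        score *= phi * alpha ** abs(start - prev_end - 1)
        prev_end = end
    return score

# f = morgen(1) fliege(2) ich(3) nach(4) Kanada(5)
# e = tomorrow | I | will fly | to Canada
segments = [(1, 1, 0.8), (3, 3, 0.9), (2, 2, 0.5), (4, 5, 0.7)]
print(round(phrase_score(segments), 5))
```

A monotone segmentation would pay no distortion penalty. Here the swap of "fliege ich" costs α⁴ in total.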

Translation model (3) Taken from [2]

Translation model (4) Taken from [2]

Alignment template approach results • The approach is evaluated on a translation task (the “Verbmobil task”) • Additional preprocessing: • word joining • word splitting Figures taken from [2]

Alignment template approach conclusions • Overall, the approach performs better • It is therefore important to model word groups in the source and target language • by using two abstraction levels: • phrase-level alignments • word-level alignments • -> the context of words has a greater influence, and changes in word order can be learned explicitly

Syntactic phrases (1) • A collection of all phrase pairs also includes non-intuitive phrases • e.g. “Okay, the”, “house the” • Intuitively, such phrases should not help • Idea: restrict the collection to syntactically motivated phrases • based on syntactic trees, with phrases as subtrees
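Restricting phrases to subtrees amounts to keeping only word spans that some node of the parse tree covers. This is a minimal sketch with a hand-written, hypothetical parse tree (no real parser is called). The tree uses nested tuples of the form (label, child, ...) with words as leaves.

```python
from typing import Set, Tuple, Union

Tree = Union[str, Tuple]  # a leaf word, or (label, child, child, ...)

def constituent_spans(tree: Tree) -> Set[Tuple[int, int]]:
    """Return the (start, end) word spans (end exclusive) covered by subtrees."""
    spans: Set[Tuple[int, int]] = set()

    def walk(node: Tree, start: int) -> int:
        if isinstance(node, str):  # leaf: covers one word
            spans.add((start, start + 1))
            return start + 1
        pos = start
        for child in node[1:]:
            pos = walk(child, pos)
        spans.add((start, pos))  # span of the whole subtree
        return pos

    walk(tree, 0)
    return spans

tree = ("S", ("NP", "the", "house"), ("VP", "is", ("ADJP", "small")))
words = ["the", "house", "is", "small"]
spans = constituent_spans(tree)
for s, t in [(0, 2), (1, 3), (2, 4)]:
    print(" ".join(words[s:t]), (s, t) in spans)
```

"the house" and "is small" are kept as syntactic phrases. "house is" is discarded, just like the “house the” example above.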

Syntactic phrases (2) • The input sentence is preprocessed by a syntactic parser • The following operations are performed on each node: • reordering child nodes • inserting extra words at each node • translating leaf words

Syntactic phrases (3) Taken from [4]

Syntactic phrases (4) Taken from [6]

Syntactic phrases (5) • Reordering • Every child sequence can be reordered, and each reordering has its own probability (N child nodes -> N! possible reorderings) • The reordering probabilities are given by the model (a table) • Inserting • An extra word can be inserted to the left or right of a node • A separate table gives the insertion probabilities • Translating • Applied to every leaf • Assumption: the translation depends only on the word itself
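Under these assumptions, the probability of one tree transformation is a product of the probabilities of the chosen reorder, insert and translate operations. This sketch is in the spirit of [4]. All table entries are hypothetical: reorderings missing from the toy r-table simply have probability 0. The `orderings` helper just enumerates the N! candidates.

```python
from itertools import permutations
from math import prod

# Hypothetical model tables (values made up for illustration).
R_TABLE = {("PRP", "VB1", "VB2"): {("PRP", "VB2", "VB1"): 0.7,
                                   ("PRP", "VB1", "VB2"): 0.3}}
N_TABLE = {"PRP": {"none": 0.9, "right:wa": 0.1}}  # insertion at a node
T_TABLE = {"he": {"kare": 0.9}}                    # leaf translation, depends only on the word

def orderings(children):
    """All N! candidate reorderings of a child sequence."""
    return list(permutations(children))

def derivation_prob(steps):
    """Multiply the probabilities of the chosen operations.

    steps is a list of (table, key, choice) triples, e.g. one reordering,
    one insertion and one leaf translation.
    """
    return prod(table[key][choice] for table, key, choice in steps)

children = ("PRP", "VB1", "VB2")
print(len(orderings(children)))  # 6
p = derivation_prob([
    (R_TABLE, children, ("PRP", "VB2", "VB1")),
    (N_TABLE, "PRP", "right:wa"),
    (T_TABLE, "he", "kare"),
])
print(round(p, 3))
```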

Experiments • We now have three models • [1] builds a system to compare them and to measure performance under different aspects: • weighting syntactic phrases • maximum phrase length • Setup: • freely available Europarl corpus • German to English • performance measured with the BLEU score
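For reference, this is a sketch of the usual BLEU definition for one candidate and one reference. BLEU is the geometric mean of the clipped n-gram precisions for n = 1..4, multiplied by a brevity penalty. The example sentences are hypothetical. Evaluations like those in [1] compute BLEU over a whole test corpus, not sentence by sentence, and this unsmoothed version returns 0 as soon as one precision is 0.

```python
import math
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1))

def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Single-reference, sentence-level BLEU without smoothing."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(count, ref[g]) for g, count in cand.items())  # modified precision
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_prec += math.log(clipped / total) / max_n
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(log_prec)

ref = "there is a small house in the garden".split()
hyp = "there is a house in the garden".split()
print(round(bleu(hyp, ref), 4))
```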

Comparison of core methods • AP … all phrase pairs consistent with the word alignment (as in the alignment template approach) • M4 … IBM Model 4 for word-based translation • Syn … syntactic phrases only • x-axis: training corpus size [sentences] Figures taken from [1]

Weighting syntactic phrases (1) • Restricting to syntactic phrases is harmful, because too many phrases are eliminated • Intuitively, this is unexpected • Attempted improvements: during data collection, during translation, penalizing non-syntactic phrases • Results suggest: • collecting only syntactic phrases • does not improve performance • but yields smaller phrase tables

Weighting syntactic phrases (2) • Examples: • “es gibt” literally translates as “it gives” but actually means “there is” • there is no syntactic relationship between the two • Likewise, “with regard to” and “note that” are syntactically complex but easy to translate

Maximum phrase length • How long do phrases have to be to achieve high performance? • All experiments use the “phrases from word-based alignments” approach Figures taken from [1]

Simpler underlying word-based models (1) • The framework relies on IBM Model 4 for collecting phrase pairs • Model 4 is computationally expensive, and its parameters can only be estimated approximately • What about IBM Models 1-3? • faster and easier to implement • Models 1 and 2 compute word alignments efficiently
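To show why the simpler models are attractive, here is a minimal EM training loop for IBM Model 1. For each source word, the E-step distributes one unit of count over all target words of the sentence, which makes it exact and cheap. The toy corpus is hypothetical, and a "NULL" empty word is prepended to every target sentence. This is a plain sketch, not the toolkit used in [1].

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def train_model1(corpus: List[Tuple[List[str], List[str]]],
                 iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """EM training of IBM Model 1 lexicon probabilities t(f | e)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            es = ["NULL"] + es
            for f in fs:
                norm = sum(t[(f, e)] for e in es)  # E-step: posterior over positions
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize expected counts per target word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

corpus = [(["das", "Haus"], ["the", "house"]),
          (["das", "Buch"], ["the", "book"]),
          (["ein", "Buch"], ["a", "book"])]
t = train_model1(corpus)
print(round(t[("das", "the")], 3), round(t[("Buch", "book")], 3))
```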

Simpler underlying word-based models (2) • How much is performance affected if the word alignment is based on these simpler models? • M1 gives the worst performance • M2 and M3 perform similarly to M4 Taken from [1]

Conclusions • Phrase-based approaches perform better than word-based approaches • The experiments also show that • straightforward syntax-based models have disadvantages • the best results are obtained with short phrases • phrase extraction and the alignment heuristic have a great influence

References • [1] Philipp Koehn, Franz Josef Och, Daniel Marcu; Statistical Phrase-Based Translation • [2] Franz Josef Och, Hermann Ney; The Alignment Template Approach to Statistical Machine Translation • [3] Franz Josef Och, Christoph Tillmann, Hermann Ney; Improved Alignment Models for Statistical Machine Translation • [4] Kenji Yamada, Kevin Knight; A Syntax-based Translation Model • [5] Daniel Marcu, William Wong; A Phrase-Based, Joint Probability Model for Statistical Machine Translation • [6] Amitabha Mukerjee, Ankit Soni, Achla M. Raina; Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora • [7] www.sbox.tugraz.at/home/b/brein/061120_TranslationModelPhraseBased.zip
