Statistical Machine Translation: Word Alignment
Stephan Vogel, MT Class, Spring Semester 2011
Overview
• Word alignment – some observations
• Models IBM2 and IBM1: 0th-order position model
• HMM alignment model: 1st-order position model
• IBM3: fertility
• IBM4: plus relative distortion
Alignment Example
Observations:
• Mostly 1-to-1
• Some 1-to-many
• Some 1-to-nothing
• Often monotone
• Not always clear-cut
  • English 'eight' is a time; German has 'acht Uhr'
  • Could also leave 'Uhr' unaligned
Evaluating Alignment
• Given some manually aligned data (ref) and automatically aligned data (hyp), links can be
  • Correct, i.e. link in hyp matches link in ref: true positive (tp)
  • Wrong, i.e. link in hyp but not in ref: false positive (fp)
  • Missing, i.e. link in ref but not in hyp: false negative (fn)
• Evaluation measures
  • Precision: P = tp / (tp + fp) = correct / links_in_hyp
  • Recall: R = tp / (tp + fn) = correct / links_in_ref
  • Alignment Error Rate: AER = 1 – F = 1 – 2tp / (2tp + fp + fn)
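As a concrete illustration (not part of the original slides), a minimal Python sketch of these measures, assuming alignments are represented as non-empty sets of (source_position, target_position) pairs:

    def alignment_scores(hyp_links, ref_links):
        # hyp_links, ref_links: sets of (source_position, target_position) pairs
        tp = len(hyp_links & ref_links)   # correct links
        fp = len(hyp_links - ref_links)   # wrong links (in hyp only)
        fn = len(ref_links - hyp_links)   # missing links (in ref only)
        precision = tp / len(hyp_links)
        recall = tp / len(ref_links)
        aer = 1.0 - 2.0 * tp / (2 * tp + fp + fn)
        return precision, recall, aer

    # Example: one wrong and one missing link
    hyp = {(1, 1), (2, 2), (3, 4)}
    ref = {(1, 1), (2, 2), (3, 3)}
    print(alignment_scores(hyp, ref))   # roughly (0.67, 0.67, 0.33)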
Sure and Possible Links
• Sometimes it is difficult for human annotators to decide
• Differentiate between sure and possible links
  • En: Det Noun – Ch: Noun; don't align Det, or align it to NULL?
  • En: Det Noun – Ar: DetNoun; should Det be aligned to DetNoun?
• Alignment Error Rate with sure and possible links (Och 2000)
  • A = generated links
  • S = sure links (not finding a sure link is an error)
  • P = possible links (generating a link which is not possible is an error)
  • AER(S, P; A) = 1 – (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
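A small Python sketch of this measure under the usual convention that S ⊆ P (the formula is the one given above; the example data are made up):

    def aer_sure_possible(a_links, s_links, p_links):
        # a_links: generated links, s_links: sure links, p_links: possible links (S subset of P)
        return 1.0 - (len(a_links & s_links) + len(a_links & p_links)) / (len(a_links) + len(s_links))

    # Example: all sure links found, plus one extra link that is only 'possible'
    a = {(1, 1), (2, 2), (3, 3)}
    s = {(1, 1), (2, 2)}
    p = {(1, 1), (2, 2), (3, 3), (3, 4)}
    print(aer_sure_possible(a, s, p))   # 0.0: no error, the extra link is possible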
Word Alignment Models
• IBM1 – lexical probabilities only
• IBM2 – lexicon plus absolute position
• IBM3 – plus fertilities
• IBM4 – inverted relative position alignment
• IBM5 – non-deficient version of model 4
• HMM – lexicon plus relative position
• BiBr – Bilingual Bracketing, lexical probabilities plus reordering via parallel segmentation
• Syntactic alignment models
[Brown et al. 1993, Vogel et al. 1996, Och et al. 2000, Wu 1997, Yamada et al. 2003, and many others]
GIZA++ Alignment Toolkit
• All standard alignment models (IBM1 … IBM5, HMM) are implemented in GIZA++
• The toolkit was started (as GIZA) at the Johns Hopkins University workshop in 1998
• Extended and improved by Franz Josef Och
• Now used by many groups
• Known problems:
  • Memory when training on large corpora
  • Writes many large files (depends on your parameter settings)
• Extensions for large corpora (Qin Gao):
  • Distributed GIZA: runs on many machines, I/O bound
  • Multithreaded GIZA: runs on one machine, multiple cores
Notation
• Source language
  • f: source (French) word
  • J: length of source sentence
  • j: position in source sentence (target position)
  • f_1^J = f_1 … f_J: source sentence
• Target language
  • e: target (English) word
  • I: length of target sentence
  • i: position in target sentence (source position)
  • e_1^I = e_1 … e_I: target sentence
• Alignment: relation mapping source to target positions
  • i = a_j: position i of e_i which is aligned to source position j
  • a_1^J: whole alignment
SMT - Principle
• Translate a 'French' string f_1^J into an 'English' string e_1^I
• Bayes' decision rule for translation (see the derivation below):
  ê_1^I = argmax_{e_1^I} p(e_1^I | f_1^J) = argmax_{e_1^I} p(e_1^I) · p(f_1^J | e_1^I)
• Why this inversion of the translation direction?
  • Decomposition of dependencies: makes modeling easier
  • Cooperation of two knowledge sources for the final decision
• Note: the IBM paper and GIZA call e the source and f the target
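For reference, the derivation behind the decision rule written out in LaTeX (standard noisy-channel reasoning, not spelled out on the slide): the denominator p(f_1^J) can be dropped because it does not depend on e_1^I.

    \hat{e}_1^{I}
      = \operatorname*{argmax}_{e_1^I} \, p(e_1^I \mid f_1^J)
      = \operatorname*{argmax}_{e_1^I} \, \frac{p(e_1^I)\, p(f_1^J \mid e_1^I)}{p(f_1^J)}
      = \operatorname*{argmax}_{e_1^I} \, p(e_1^I)\, p(f_1^J \mid e_1^I)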
Alignment as Hidden Variable
• 'Hidden alignments' to capture word-to-word correspondences
• Mapping: A ⊆ [1, …, J] x [1, …, I]
• Number of connections: J · I (each source word with each target word)
• Number of alignments: 2^(J·I) (each connection yes/no)
• Summation over all alignments: p(f_1^J | e_1^I) = Σ_A p(f_1^J, A | e_1^I)
• Too many alignments, summation not feasible
Restricted Alignment
• Each source word has exactly one connection
• Alignment mapping becomes a function: j -> i = a_j
• Number of alignments is now: I^J
• Sum over all alignments: p(f_1^J | e_1^I) = Σ_{a_1^J} p(f_1^J, a_1^J | e_1^I)
  • Not possible to enumerate
  • In some situations full summation is possible through Dynamic Programming
  • In other situations: take only the best alignment and perhaps some alignments close to the best one
Empty Position (Null Word)
• Sometimes a word has no correspondence
• The alignment function aligns each source word to one target word, i.e. it cannot skip a source word
• Solution:
  • Introduce an empty position 0 with the null word e_0
  • 'Skip' source word f_j by aligning it to e_0
  • Target sentence is extended to: e_0^I = e_0, e_1, …, e_I
  • Alignment is extended to: a_j ∈ {0, 1, …, I}
Translation Model
• Sum over all alignments:
  p(f_1^J | e_1^I) = Σ_{a_1^J} p(f_1^J, a_1^J | e_1^I)
• 3 probability distributions:
  • Length: p(J | e_1^I)
  • Alignment: p(a_j | f_1^{j-1}, a_1^{j-1}, J, e_1^I)
  • Lexicon: p(f_j | f_1^{j-1}, a_1^j, J, e_1^I)
Model Assumptions
Decompose the interaction into pairwise dependencies:
• Length: source length only dependent on target length (very weak): p(J | e_1^I) ≈ p(J | I)
• Alignment:
  • Zero-order model: target position only dependent on source position: p(a_j | j, I, J)
  • First-order model: target position only dependent on previous target position: p(a_j | a_{j-1}, I)
• Lexicon: source word only dependent on the aligned word: p(f_j | e_{a_j})
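For convenience (not written out on the slide), plugging the zero-order assumptions into the translation model gives the IBM2 form below, in LaTeX; IBM1 is obtained by additionally setting the alignment distribution to the uniform value 1/(I+1).

    p(f_1^J \mid e_1^I)
      = p(J \mid I) \prod_{j=1}^{J} \sum_{i=0}^{I} p(i \mid j, I, J)\, p(f_j \mid e_i)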
Mixture Model
• Interpretation as a mixture model by direct decomposition: each source word f_j is generated by a mixture over the target words e_0 … e_I, with mixture weights p(i | j, I, J) (the per-position factor in the formula above)
• Again, simplifying model assumptions applied
Training IBM2
• Expectation-Maximization (EM) Algorithm
• Define the posterior weight (i.e. sum over a column = 1):
  p_{ij} = p(i | j, I, J) · p(f_j | e_i) / Σ_{i'} p(i' | j, I, J) · p(f_j | e_{i'})
• Lexicon probabilities: count how often word pairs are aligned, weighted by p_{ij}
• Alignment probabilities: count how often position pairs (i, j) are aligned, weighted by p_{ij}
• Turn counts into probabilities by normalization
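To make the update concrete, here is a minimal Python sketch of one EM iteration for IBM2 along these lines (illustrative only; the table layout, the NULL convention, and the requirement that both tables are pre-initialized are assumptions, not something the slides or GIZA++ prescribe):

    from collections import defaultdict

    def ibm2_em_iteration(corpus, t, a):
        # corpus: list of (f_words, e_words); e_words is assumed to include NULL at index 0
        # t[(f, e)] = p(f|e); a[(i, j, I, J)] = p(i|j,I,J); both assumed initialized for all needed keys
        lex_counts, lex_totals = defaultdict(float), defaultdict(float)
        align_counts, align_totals = defaultdict(float), defaultdict(float)
        for f_words, e_words in corpus:
            J, I = len(f_words), len(e_words)
            for j, f in enumerate(f_words, start=1):
                # E-step: posterior weight over target positions i for source position j (sums to 1)
                denom = sum(a[(i, j, I, J)] * t[(f, e_words[i])] for i in range(I))
                for i, e in enumerate(e_words):
                    w = a[(i, j, I, J)] * t[(f, e)] / denom
                    lex_counts[(f, e)] += w
                    lex_totals[e] += w
                    align_counts[(i, j, I, J)] += w
                    align_totals[(j, I, J)] += w
        # M-step: turn counts into probabilities
        t_new = {fe: c / lex_totals[fe[1]] for fe, c in lex_counts.items()}
        a_new = {key: c / align_totals[key[1:]] for key, c in align_counts.items()}
        return t_new, a_new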
IBM1 Model
• Assume a uniform probability for the position alignment: p(i | j, I, J) = 1 / (I + 1)
• Alignment probability becomes:
  p(f_1^J | e_1^I) = p(J | I) / (I + 1)^J · Π_{j=1}^{J} Σ_{i=0}^{I} p(f_j | e_i)
• In training: only collect counts for word pairs
Training for IBM1 Model – Pseudo Code

# Accumulation (over corpus)
For each sentence pair
  For each source position j
    Sum = 0.0
    For each target position i
      Sum += p(fj|ei)
    For each target position i
      Count(fj,ei) += p(fj|ei)/Sum

# Re-estimate probabilities (over count table)
For each target word e
  Sum = 0.0
  For each source word f
    Sum += Count(f,e)
  For each source word f
    p(f|e) = Count(f,e)/Sum

# Repeat for several iterations
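A direct, runnable Python transcription of this pseudocode (a sketch; the corpus format, the NULL token, and the uniform initialization are assumptions on my part):

    from collections import defaultdict

    def train_ibm1(corpus, iterations=5):
        # corpus: list of (source_words, target_words) pairs; target_words should include a NULL token
        target_vocab = {e for _, e_words in corpus for e in e_words}
        p = defaultdict(lambda: 1.0 / len(target_vocab))   # uniform initialization of p(f|e)
        for _ in range(iterations):
            counts, totals = defaultdict(float), defaultdict(float)
            # Accumulation (over corpus)
            for f_words, e_words in corpus:
                for f in f_words:
                    norm = sum(p[(f, e)] for e in e_words)
                    for e in e_words:
                        c = p[(f, e)] / norm
                        counts[(f, e)] += c
                        totals[e] += c
            # Re-estimate probabilities (over count table)
            p = defaultdict(float, {(f, e): c / totals[e] for (f, e), c in counts.items()})
        return p

    # Tiny usage example
    corpus = [(["das", "haus"], ["NULL", "the", "house"]),
              (["das", "buch"], ["NULL", "the", "book"])]
    lexicon = train_ibm1(corpus)
    print(round(lexicon[("das", "the")], 2))   # grows toward 1.0 with more iterations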
HMM Alignment Model
• Idea: relative position model
• Entire word groups (phrases) are moved with respect to the source position
[Figure: alignment lattice, target positions vs. source positions]
HMM Alignment
• First-order model: target position dependent on previous target position (captures movement of entire phrases): p(a_j | a_{j-1}, I)
• Alignment probability:
  p(f_1^J | e_1^I) = Σ_{a_1^J} Π_{j=1}^{J} p(a_j | a_{j-1}, I) · p(f_j | e_{a_j})
• Maximum approximation:
  p(f_1^J | e_1^I) ≈ max_{a_1^J} Π_{j=1}^{J} p(a_j | a_{j-1}, I) · p(f_j | e_{a_j})
Viterbi Training on HMM Model

# Accumulation (over corpus): find Viterbi path
For each sentence pair
  For each source position j
    For each target position i
      Pbest = 0; t = p(fj|ei)
      For each target position i'
        Pprev = P(j-1,i')
        a = p(i|i',I,J)
        Pnew = Pprev*t*a
        if (Pnew > Pbest)
          Pbest = Pnew
          BackPointer(j,i) = i'
      P(j,i) = Pbest

# Update counts along the best path
i = argmax_i { P(J,i) }
For each j from J downto 1
  Count(fj,ei)++
  Count(i, BackPointer(j,i), I, J)++
  i = BackPointer(j,i)

# Renormalize counts into probabilities
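A Python sketch of the Viterbi path computation from this pseudocode (log-space; the uniform start distribution and strictly positive probability tables are assumptions of the sketch):

    import math

    def viterbi_alignment(f_words, e_words, lex, align):
        # lex[(f, e)] = p(f|e) > 0; align[(i, i_prev)] = p(i|i_prev, I) > 0; both assumed given
        J, I = len(f_words), len(e_words)
        delta = [[-math.inf] * I for _ in range(J)]   # best log-prob of a path ending in (j, i)
        back = [[0] * I for _ in range(J)]
        for i in range(I):
            delta[0][i] = math.log(lex[(f_words[0], e_words[i])] / I)   # uniform start distribution
        for j in range(1, J):
            for i in range(I):
                t = math.log(lex[(f_words[j], e_words[i])])
                best_prev = max(range(I), key=lambda ip: delta[j-1][ip] + math.log(align[(i, ip)]))
                delta[j][i] = delta[j-1][best_prev] + math.log(align[(i, best_prev)]) + t
                back[j][i] = best_prev
        # Trace back from the best final state
        a = [0] * J
        a[-1] = max(range(I), key=lambda i: delta[-1][i])
        for j in range(J - 1, 0, -1):
            a[j-1] = back[j][a[j]]
        return a   # a[j] = target position aligned to source position j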
HMM Forward-Backward Training
• Gamma γ_j(i): probability of emitting f_j when in state i (in sentence s), i.e. the posterior p(a_j = i | f_1^J, e_1^I)
• Sum over all paths through (j, i)
HMM Forward-Backward Training
• Epsilon ε_j(i', i): probability of transiting from state i' into state i while emitting f_j, i.e. the posterior p(a_{j-1} = i', a_j = i | f_1^J, e_1^I)
• Sum over all paths through (j-1, i') and (j, i), emitting f_j
Forward Probabilities
• Defined as: α_j(i) = p(f_1^j, a_j = i | e_1^I)
• Recursion: α_j(i) = [ Σ_{i'} α_{j-1}(i') · p(i | i', I) ] · p(f_j | e_i)
• Initial condition: α_1(i) = p(i | I) · p(f_1 | e_i), e.g. with a uniform start distribution p(i | I) = 1/I
Backward Probabilities
• Defined as: β_j(i) = p(f_{j+1}^J | a_j = i, e_1^I)
• Recursion: β_j(i) = Σ_{i'} p(i' | i, I) · p(f_{j+1} | e_{i'}) · β_{j+1}(i')
• Initial condition: β_J(i) = 1
Forward-Backward
• Calculate Gamma and Epsilon with Alpha and Beta:
  • Gammas: γ_j(i) = α_j(i) · β_j(i) / Σ_{i'} α_j(i') · β_j(i')
  • Epsilons: ε_j(i', i) = α_{j-1}(i') · p(i | i', I) · p(f_j | e_i) · β_j(i) / Σ_{i''} α_J(i'')
Parameter Re-Estimation
• Lexicon probabilities: accumulate c(f, e) = Σ_s Σ_{j,i: f_j = f, e_i = e} γ_j(i), then p(f | e) = c(f, e) / Σ_{f'} c(f', e)
• Alignment probabilities: accumulate c(i, i') = Σ_s Σ_j ε_j(i', i), then p(i | i', I) = c(i, i') / Σ_{i''} c(i'', i')
Forward-Backward Training – Pseudo Code

# Accumulation
For each sentence pair {
  Forward pass (calculate Alphas)
  Backward pass (calculate Betas)
  Calculate Gammas and Epsilons
  For each source word {
    Increase LexiconCount(f_j|e_i) by Gamma(j,i)
    Increase AlignCount(i|i') by Epsilon(j,i,i')
  }
}

# Update
Normalize LexiconCount to get P(f_j|e_i)
Normalize AlignCount to get P(i|i')
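A Python sketch of the accumulation step for one sentence pair, following this pseudocode (the uniform start distribution and the dictionary-based tables are assumptions of the sketch):

    def forward_backward_counts(f_words, e_words, lex, align, lex_counts, align_counts):
        # lex[(f, e)] = p(f|e); align[(i, i_prev)] = p(i|i_prev, I); count dicts are updated in place
        J, I = len(f_words), len(e_words)
        # Forward pass (Alphas), uniform start distribution assumed
        alpha = [[0.0] * I for _ in range(J)]
        for i in range(I):
            alpha[0][i] = (1.0 / I) * lex[(f_words[0], e_words[i])]
        for j in range(1, J):
            for i in range(I):
                alpha[j][i] = lex[(f_words[j], e_words[i])] * sum(
                    alpha[j-1][ip] * align[(i, ip)] for ip in range(I))
        # Backward pass (Betas)
        beta = [[1.0 if j == J - 1 else 0.0 for _ in range(I)] for j in range(J)]
        for j in range(J - 2, -1, -1):
            for i in range(I):
                beta[j][i] = sum(align[(ip, i)] * lex[(f_words[j+1], e_words[ip])] * beta[j+1][ip]
                                 for ip in range(I))
        sent_prob = sum(alpha[J-1][i] for i in range(I))
        # Gammas and Epsilons turned directly into counts
        for j in range(J):
            for i in range(I):
                gamma = alpha[j][i] * beta[j][i] / sent_prob
                key = (f_words[j], e_words[i])
                lex_counts[key] = lex_counts.get(key, 0.0) + gamma
                if j > 0:
                    for ip in range(I):
                        eps = (alpha[j-1][ip] * align[(i, ip)] *
                               lex[(f_words[j], e_words[i])] * beta[j][i]) / sent_prob
                        align_counts[(i, ip)] = align_counts.get((i, ip), 0.0) + eps

The caller then normalizes lex_counts over f for each e and align_counts over i for each previous position i', which is the Update step above.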
Example HMM Training