Bayesian Word Alignment for Statistical Machine Translation Authors: Coskun Mermer, Murat Saraclar Presented by Jun Lang 2011-10-13 I2R SMT-Reading Group
Paper info • Bayesian Word Alignment for Statistical Machine Translation • ACL 2011 short paper • With source code in Perl (379 lines) • Authors • Coskun Mermer • Murat Saraclar
Core Idea • Proposes a Gibbs sampler for fully Bayesian inference in IBM Model 1 • Results • Outperforms classical EM in BLEU by up to 2.99 points • Effectively addresses the rare word problem • Produces a much smaller phrase table than EM
Mathematics • (E, F): parallel corpus • e_i (f_j): the i-th (j-th) word of a source (target) sentence e (f), which contains I (J) words; E (F) denotes the source (target) side of the corpus • e_0: the "null" word added to every source sentence • V_E (V_F): size of the source (target) vocabulary • a (A): alignment of a sentence (of the corpus) • a_j: target word f_j is aligned to source word e_{a_j} • T: parameter table of size V_E × V_F • t_{e,f} = P(f|e): word translation probability
IBM Model 1 • Treat T as a random variable
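For reference, the IBM Model 1 likelihood in the notation of the Mathematics slide (a standard form written here for completeness, omitting the constant sentence-length factor) is

P(f, a \mid e, T) = \prod_{j=1}^{J} \frac{1}{I+1}\, t_{e_{a_j}, f_j}

Treating T as a random variable means placing a prior P(T) on it, so inference targets P(A \mid E, F) \propto \int P(F, A \mid E, T)\, P(T)\, dT rather than a single point estimate of T.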
Dirichlet Distribution • T = {t_{e,f}} defines an exponential-family distribution • Specifically, a multinomial distribution over target words for each source word • We choose its conjugate prior, the Dirichlet distribution, for computational convenience
Dirichlet Distribution • Each source word type e has a translation distribution t_e over the target vocabulary, and each t_e is given a Dirichlet prior • This avoids rare words acting as "garbage collectors"
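Concretely, the prior described above can be written as follows (a sketch in the slide's notation; \theta_e denotes the Dirichlet hyperparameter vector for source type e):

t_e \sim \mathrm{Dirichlet}(\theta_e) \quad \text{for each source type } e, \qquad f_j \mid a_j, e, T \sim \mathrm{Categorical}(t_{e_{a_j}})

Small hyperparameters (\theta < 1) favor sparse translation distributions, which is what keeps rare source words from soaking up probability mass as "garbage collectors".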
Dirichlet Distribution • Sample the unknowns A and T in turn, each conditioned on the rest • ¬j denotes the exclusion of the current value of a_j
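Because the Dirichlet prior is conjugate to the multinomial, T can also be integrated out, giving the conditional used to resample each a_j (a reconstruction consistent with the conjugacy above; N^{\neg j}_{e,f} is the number of times f is aligned to e elsewhere in the corpus under the current A):

P(a_j = i \mid E, F, A^{\neg j}) \;\propto\; \frac{N^{\neg j}_{e_i, f_j} + \theta_{e_i, f_j}}{\sum_{f=1}^{V_F} \left( N^{\neg j}_{e_i, f} + \theta_{e_i, f} \right)}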
Algorithm • The initial alignment A can be arbitrary, but initializing from the standard EM output works better (see the sketch below)
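To make the sampler concrete, here is a minimal Python sketch of a collapsed Gibbs sampler for IBM Model 1 under a symmetric Dirichlet prior. It is not the authors' bayesalign.pl; the function name, the symmetric hyperparameter theta, and the random initialization are illustrative assumptions.

```python
# Minimal sketch of a collapsed Gibbs sampler for IBM Model 1 word alignment.
# corpus: list of (source_tokens, target_tokens); source sentences are assumed
# to already include the "null" word at index 0. theta is a symmetric Dirichlet
# hyperparameter (an assumption for this sketch).
import random
from collections import defaultdict

def gibbs_align(corpus, theta=0.1, iterations=100):
    # Target vocabulary size V_F, needed in the denominator.
    v_f = len({f for _, fs in corpus for f in fs})

    # n[e][f]: target tokens f currently aligned to source type e.
    # n_e[e]: total target tokens currently aligned to source type e.
    n = defaultdict(lambda: defaultdict(int))
    n_e = defaultdict(int)

    # Initialize alignments (here uniformly at random; EM output works better).
    alignments = []
    for es, fs in corpus:
        a = [random.randrange(len(es)) for _ in fs]
        for j, f in enumerate(fs):
            n[es[a[j]]][f] += 1
            n_e[es[a[j]]] += 1
        alignments.append(a)

    for _ in range(iterations):
        for (es, fs), a in zip(corpus, alignments):
            for j, f in enumerate(fs):
                # Remove the current link a_j from the counts (the "¬j" statistics).
                e_old = es[a[j]]
                n[e_old][f] -= 1
                n_e[e_old] -= 1

                # Resample: P(a_j = i | rest) ∝ (n[e_i][f] + theta) / (n_e[e_i] + V_F * theta)
                weights = [(n[es[i]][f] + theta) / (n_e[es[i]] + v_f * theta)
                           for i in range(len(es))]
                a[j] = random.choices(range(len(es)), weights=weights)[0]

                # Add the new link back into the counts.
                e_new = es[a[j]]
                n[e_new][f] += 1
                n_e[e_new] += 1

    return alignments
```

In practice the alignments would be initialized from EM output (as the slide notes) and the final alignment read off after burn-in, for example by keeping the last sample or the most frequent link per target position.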
Code View • bayesalign.pl
Conclusions • Outperforms classical EM in BLEU by up to 2.99 points • Effectively addresses the rare word problem • Produces a much smaller phrase table than EM • Shortcomings • Too slow: 100 sentence pairs take 18 minutes • Could perhaps be sped up by parallel computing