Explore extraction set models for machine translation, which optimize an extraction-based loss function tied directly to generating translations and perform better than baseline aligners; the slides also place them within the progression toward tree-to-string translation.
Discriminative Modeling of Extraction Sets for Machine Translation Authors: John DeNero and Dan Klein, UC Berkeley Presenter: Justin Chiu
Contribution • Extraction sets • Nested collections of all the overlapping phrase pairs consistent with an underlying word alignment (see the extraction sketch below) • Advantages over word-factored alignment models • Can incorporate features on phrase pairs, not just on word links • Optimizes an extraction-based loss function that is directly tied to generating translations • Performs better than both supervised and unsupervised baselines
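To make the extraction-set idea concrete, here is a minimal sketch, assuming the standard consistency criterion, of how the phrase pairs licensed by a word alignment can be enumerated: a bispan is kept when it contains at least one link and no link crosses its boundary. This is a generic phrase-extraction recipe, not the authors' implementation, and the function name, span limit, and toy alignment are hypothetical.

```python
# Minimal sketch (not the paper's code): enumerate bispans consistent with a
# word alignment, i.e. spans that contain at least one link and whose linked
# words never reach outside the bispan. Names and the max_len limit are assumptions.

def extract_phrase_pairs(alignment, src_len, tgt_len, max_len=7):
    """Return bispans (i, j, k, l) covering src[i:j] and tgt[k:l] that are
    consistent with the given set of (source_index, target_index) links."""
    phrase_pairs = []
    for i in range(src_len):
        for j in range(i + 1, min(src_len, i + max_len) + 1):
            # Target positions linked to any source word in [i, j).
            tgt_points = [t for (s, t) in alignment if i <= s < j]
            if not tgt_points:
                continue
            k, l = min(tgt_points), max(tgt_points) + 1
            if l - k > max_len:
                continue
            # Consistency: no target word in [k, l) may link outside [i, j).
            if all(i <= s < j for (s, t) in alignment if k <= t < l):
                phrase_pairs.append((i, j, k, l))
    return phrase_pairs

# Toy example with a hypothetical 3x3 alignment; the overlapping, nested bispans
# that come out are exactly the kind of collection an extraction set represents.
print(extract_phrase_pairs({(0, 0), (1, 2), (2, 1)}, src_len=3, tgt_len=3))
```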
Progress of Statistical MT • Generate translated sentences word by word • Use whole fragments of training examples to build translation rules • Align at the word level • Extract fragment-level rules from word-aligned sentence pairs • Tree-to-string translation • Extraction Set Models • Set of all overlapping phrasal translation rules + alignment
Outline • Extraction Set Models • Model Estimation • Model Inference • Experiments
Extraction Set Models • Input • Unaligned sentence pair • Output • Extraction set of phrasal translation rules • Word alignment
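As a rough illustration of what "discriminative" means here, the following toy sketch (an assumption, not the paper's actual model or feature set) scores a hypothesized word alignment together with the bispans it licenses, using a single weight vector over features of both word links and phrase pairs.

```python
# Toy sketch of discriminative scoring over an alignment plus its extraction set.
# The weights and feature functions are placeholders, not the paper's.

def score(alignment, bispans, weights, link_features, bispan_features):
    """Dot product of the weight vector with features fired on every word link
    and on every bispan in the extraction set."""
    total = 0.0
    for link in alignment:
        for name, value in link_features(link).items():
            total += weights.get(name, 0.0) * value
    for bispan in bispans:
        for name, value in bispan_features(bispan).items():
            total += weights.get(name, 0.0) * value
    return total

# Made-up features: a constant per link and the combined width of each bispan.
link_feats = lambda link: {"link_bias": 1.0}
bispan_feats = lambda b: {"bispan_width": (b[1] - b[0]) + (b[3] - b[2])}
weights = {"link_bias": 0.5, "bispan_width": -0.1}
print(score({(0, 0), (1, 1)}, [(0, 2, 0, 2)], weights, link_feats, bispan_feats))
```

The only point of the sketch is that phrase-pair features sit alongside word-link features in one objective, which is the advantage claimed over word-factored alignment models.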
Possible and Null Alignment Links • Possible links have two types • Function words that are unique to their language • Short phrases that have no lexical equivalent • Null alignment • Expresses content that is absent from the translation
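The null-alignment case can be read directly off the links. Below is a small assumed helper, not taken from the paper, that marks a word as null-aligned when no link touches it, which is how content absent from the translation shows up.

```python
# Assumed helper: words untouched by any link are treated as null-aligned.

def null_aligned(alignment, src_len, tgt_len):
    """Return the source and target positions that carry no alignment link."""
    src_linked = {s for (s, _) in alignment}
    tgt_linked = {t for (_, t) in alignment}
    src_nulls = [i for i in range(src_len) if i not in src_linked]
    tgt_nulls = [j for j in range(tgt_len) if j not in tgt_linked]
    return src_nulls, tgt_nulls

# Example: source word 2 and target word 0 carry no links, so both are null-aligned.
print(null_aligned({(0, 1), (1, 2)}, src_len=3, tgt_len=3))  # ([2], [0])
```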
Five systems for comparison • Unsupervised baselines • GIZA++ • Joint HMM • Supervised baselines • Block ITG • Extraction Set Coarse Pass • Does not score bispans that cross brackets of the ITG derivation (see the pruning sketch below) • Full Extraction Set Model
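The coarse pass can be pictured as a pruning filter. The sketch below is a simplified assumption of that idea: a bispan is dropped when its source or target projection crosses a bracket, meaning the two spans overlap but neither contains the other; the real system works over ITG derivations rather than flat bracket lists.

```python
# Simplified sketch of coarse-pass pruning: discard bispans whose spans cross
# a bracket (overlap without nesting). The bracket lists here stand in for the
# brackets of an ITG derivation; all names are assumptions.

def crosses(span, bracket):
    (a, b), (c, d) = span, bracket  # half-open intervals [a, b) and [c, d)
    overlap = a < d and c < b
    nested = (c <= a and b <= d) or (a <= c and d <= b)
    return overlap and not nested

def prune_bispans(bispans, src_brackets, tgt_brackets):
    """Keep only bispans whose source and target spans respect every bracket."""
    kept = []
    for (i, j, k, l) in bispans:
        if any(crosses((i, j), br) for br in src_brackets):
            continue
        if any(crosses((k, l), br) for br in tgt_brackets):
            continue
        kept.append((i, j, k, l))
    return kept

# The bispan (1, 3, 0, 2) crosses the source bracket (0, 2) and is pruned.
print(prune_bispans([(1, 3, 0, 2), (0, 2, 0, 2)], src_brackets=[(0, 2)], tgt_brackets=[]))
```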
Data • Discriminative training and alignment evaluation • Trained the baseline HMM on 11.3 million words of FBIS newswire data • Hand-aligned portion of the NIST MT02 test set • 150 training and 191 test sentences • End-to-end translation experiments • Trained on a 22.1 million word parallel corpus of newswire data from the GALE program, consisting of sentences of up to 40 words • NIST MT04/MT05 test sets
Discussion • Syntax labels vs. words • Word alignments to rules vs. rules to word alignments • Information from two directions • 65% of type 1 errors