Learn about semi-supervised boosting for word alignment in Natural Language Processing (NLP), which combines labeled and unlabeled data to improve alignment accuracy and translation quality. Evaluation metrics and results are discussed.
Semi-Supervised Boosting for Statistical Word Alignment Wu Hua 2006/10/18
Outline • Introduction to semi-supervised learning • Introduction to boosting • Semi-supervised boosting for word alignment • Evaluation results • Conclusion
Machine Learning Methods • Supervised learning • Labeled data • Unsupervised learning • Unlabeled data • Semi-supervised learning • Combines both labeled data and unlabeled data
Semi-Supervised Learning in NLP • Word sense disambiguation • (Yarowsky, 1995; Pham et al., 2005) • Classification • (Blum and Mitchell, 1998; Joachims, 1999) • Clustering • (Basu et al., 2004) • Named entity classification • (Collins and Singer, 1999) • Parsing • (Sarkar, 2001)
Boosting – Supervised Learning [Flowchart: Initialization → Call Learner → Calculate Error Rate (against the Reference Set) → Re-weight Training Data → loop until End → Build Ensemble]
Boosting in NLP • Tagging and PP attachment • (Abney et al., 1999) • Word sense disambiguation • (Escudero et al., 2000) • Parser construction • (Haruno et al., 1999; Henderson and Brill, 2000) • Sentence generation • (Walker et al., 2001)
Semi-Supervised Boosting • Three main problems • Semi-supervised learner • Combine labeled data and unlabeled data • Reference set • Automatically construct a reference set for unlabeled data • Error rate calculation • How to calculate the error rate with both labeled data and unlabeled data
Semi-Supervised Boosting Applied to Word Alignment [Flowchart: Labeled Data → Supervised Training; Unlabeled Data → Unsupervised Training; both feed Model Interpolation → Error Rate Calculation (against the Real Reference Set, with a Pseudo Reference Set for the unlabeled data) → Re-weight Training Data → loop until End → Build Ensemble]
Semi-Supervised Boosting Applied to Word Alignment • Five main components • Word alignment model interpolation • Pseudo reference set construction for unlabeled data • Error rate calculation • Weight update • Final Ensemble
Word Alignment Model • Supervised alignment model • Calculate the probabilities for IBM Model 4 based on the labeled data • Unsupervised alignment model • Use GIZA++ to train IBM Model 4 • Perform model interpolation
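The slide does not give the interpolation formula; a common form, consistent with the description, is a linear combination of the two models' translation probabilities. A minimal Python sketch, where the coefficient alpha and the toy dictionaries are illustrative assumptions rather than the authors' values:

```python
# Minimal sketch of linear model interpolation for word-translation
# probabilities. In the method, p_supervised would come from the model
# trained on labeled data and p_unsupervised from the GIZA++ model;
# alpha would be tuned on held-out data.

def interpolate(p_supervised, p_unsupervised, alpha=0.5):
    """Return p(t|s) = alpha * p_sup(t|s) + (1 - alpha) * p_unsup(t|s)."""
    keys = set(p_supervised) | set(p_unsupervised)
    return {k: alpha * p_supervised.get(k, 0.0)
               + (1 - alpha) * p_unsupervised.get(k, 0.0)
            for k in keys}

# Toy example: Chinese translation probabilities for English "bank".
p_sup   = {("bank", "银行"): 0.9, ("bank", "河岸"): 0.1}
p_unsup = {("bank", "银行"): 0.6, ("bank", "河岸"): 0.4}
print(interpolate(p_sup, p_unsup, alpha=0.7))
```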
Pseudo Reference Set Construction • Obtain bi-directional word alignment sets S1 and S2 on the training data • Obtain the intersection set $S_I = S_1 \cap S_2$ • Filter the union set $S_1 \cup S_2$ to obtain $S_F$ • Build the pseudo reference set $R_p = S_I \cup S_F$
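A minimal sketch of this construction. The filter below (admit a union link only if it aligns a word left uncovered by the intersection, in the spirit of the standard grow/final symmetrization heuristics) is an assumed stand-in, since the slide does not specify the actual filtering rule:

```python
# Sketch of pseudo reference construction: keep the intersection of the
# two directional alignments, then add filtered links from the union.
# The filter here is an illustrative stand-in for the paper's heuristic.

def pseudo_reference(s1, s2):
    inter = s1 & s2
    aligned_src = {i for (i, j) in inter}
    aligned_tgt = {j for (i, j) in inter}
    # keep a union link only if it covers a so-far-unaligned word
    filtered = {(i, j) for (i, j) in (s1 | s2) - inter
                if i not in aligned_src or j not in aligned_tgt}
    return inter | filtered

# Alignment links as (source_pos, target_pos) pairs, one set per direction.
s1 = {(0, 0), (1, 1), (2, 3)}
s2 = {(0, 0), (1, 1), (2, 2)}
print(sorted(pseudo_reference(s1, s2)))  # -> [(0, 0), (1, 1), (2, 2), (2, 3)]
```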
Error Rate Calculation • For each sentence pair, calculate the error rate of an aligner • Based on the labeled data instead of the whole data: $\epsilon_l = \sum_i \widetilde{w}_l(i)\,\mathrm{ER}(i)$, where $\widetilde{w}_l(i)$ is the normalized weight of the i-th sentence pair at the l-th round and $\mathrm{ER}(i)$ is the aligner's error on the i-th labeled pair
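A minimal sketch of this step, assuming the weighted sum over labeled pairs reconstructed above; the per-pair errors and the labeled mask are toy values:

```python
# Weighted error rate of the current aligner, computed on labeled pairs
# only ("based on the labeled data instead of the whole data").
# errors[i] is the per-sentence alignment error against the real reference.

def weighted_error(weights, errors, labeled_mask):
    num = sum(w * e for w, e, lab in zip(weights, errors, labeled_mask) if lab)
    den = sum(w for w, lab in zip(weights, labeled_mask) if lab)
    return num / den  # normalize so the labeled weights sum to 1

weights      = [0.25, 0.25, 0.25, 0.25]
errors       = [0.10, 0.30, 0.20, 0.50]    # per-pair AER-style errors
labeled_mask = [True, True, False, False]  # only the first two are labeled
print(weighted_error(weights, errors, labeled_mask))  # ~0.2
```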
Re-Weight the Training Data • Reweight each sentence pair in the training set • For each sentence pair, there may exist correct links and incorrect links as compared with the pseudo reference set • Calculate the weight of each sentence pair according to its link error fraction $e_i = K_i / n_i$, where $K_i$ is the number of error links and $n_i$ is the total number of links in the reference for the i-th pair
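A minimal sketch of the update. The error fraction $K_i/n_i$ comes from the slide; the multiplicative AdaBoost.M1-style rule with beta = eps / (1 - eps) is an assumed form, since the slide omits the exact update:

```python
# Re-weighting sketch: pairs the aligner got mostly right are shrunk,
# so pairs with more error links carry more weight in the next round.
# The beta-based rule is an assumption modeled on AdaBoost.M1.

def reweight(weights, k_errors, n_links, eps):
    beta = eps / (1.0 - eps)
    new = [w * beta ** (1.0 - k / n)      # low-error pairs shrink more
           for w, k, n in zip(weights, k_errors, n_links)]
    total = sum(new)
    return [w / total for w in new]       # renormalize to sum to 1

weights  = [0.5, 0.5]
k_errors = [0, 2]   # error links per sentence pair
n_links  = [4, 4]   # reference links per sentence pair
print(reweight(weights, k_errors, n_links, eps=0.25))
```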
Final Ensemble • Obtain the final ensemble from the word aligners trained in each round: $h_f(s,t) = \sum_{l=1}^{L} \alpha_l\, h_l(s,t)$, where $h_f$ is the final ensemble for word alignment, $h_l(s,t)$ is the weight of the alignment pair (s, t) produced by the l-th word aligner, and $\alpha_l$ is the weight of the l-th aligner
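A minimal sketch of the weighted link voting. The acceptance threshold of half the total aligner weight is an assumption; the slide states only that aligner weights and link weights are combined:

```python
# Ensemble sketch: each round's aligner casts a weighted vote for every
# link it produces; links whose total vote passes the threshold are kept.

def ensemble_alignment(alignments, aligner_weights, link_weight=None):
    votes = {}
    for links, a in zip(alignments, aligner_weights):
        for link in links:
            w = link_weight(link) if link_weight else 1.0  # per-link weight
            votes[link] = votes.get(link, 0.0) + a * w
    threshold = 0.5 * sum(aligner_weights)                 # assumed cutoff
    return {link for link, v in votes.items() if v >= threshold}

# Three rounds' aligners with weights alpha_l; links are (src, tgt) pairs.
alignments = [{(0, 0), (1, 1)}, {(0, 0), (1, 2)}, {(0, 0), (1, 1)}]
alpha      = [1.0, 0.6, 0.8]
print(sorted(ensemble_alignment(alignments, alpha)))  # -> [(0, 0), (1, 1)]
```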
Evaluation • Training set • Unlabeled data: 320,000 English-Chinese sentence pairs • Labeled data: 30,000 English-Chinese sentence pairs • Held-out set • 1,500 sentence pairs • Testing set • 1,000 English-Chinese sentence pairs • 8,651 alignment links in total
Evaluation Metric • Word alignment • Precision and Recall • Alignment Error Rate (AER) • Phrase-based machine translation • System: Pharaoh • Metrics: NIST and BLEU
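For reference, AER is the standard alignment metric of Och and Ney (2003), computed from the hypothesis alignment A and the reference's sure links S and possible links P:

```latex
% Standard AER (Och and Ney, 2003): A = hypothesis links,
% S = sure reference links, P = possible reference links (S \subseteq P).
% With sure links only (P = S) it reduces to 1 - 2|A \cap S| / (|A| + |S|),
% i.e. one minus the F-measure of A against S.
\mathrm{AER}(A; S, P) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}
```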
Weights in Ensembles • Two kinds of weights • Weights for the individual aligners • Weights for the individual alignment links • Baseline: uses only the first kind of weights • Our method: uses both kinds of weights

Method      Precision  Recall   AER
Baseline    0.7946     0.7775   0.2140
Our method  0.8175     0.7858   0.1987
Conclusion • Features of our semi-supervised boosting method • Perform model interpolation • Automatically build the pseudo reference set • Calculate the error rate of the training set with the labeled data • Use two kinds of weights in the ensemble • One for aligners • The other for alignment links • Boosting does improve word alignment and translation quality • Semi-supervised boosting performs the best