Word Sense Disambiguation for Machine Translation Han-Bin Chen 2010.11.24
Reference Papers • Cabezas and Resnik. 2005. Using WSD Techniques for Lexical Selection. (Technical report) • Carpuat and Wu. 2005. Word Sense Disambiguation vs. Statistical Machine Translation. (ACL 2005) • Carpuat and Wu. 2007. Improving Statistical Machine Translation using Word Sense Disambiguation. (EMNLP 2007) • Chan et al. 2007. Word Sense Disambiguation Improves Statistical Machine Translation. (ACL 2007) • Apidianaki. 2009. Data-driven semantic analysis for multilingual WSD. (EACL 2009)
SMT Workflow (diagram) • Bilingual corpus • Monolingual corpus • Translation model • Reordering model • Language model • Decoder • Input: source language • Output: target language
MT Research Areas (diagram) • Bilingual corpus • Monolingual corpus • Word alignment • Translation model • Reordering model • Language model • Decoder • Input: source language • Output: target language • Evaluation metric
Translation Model (TM) • Research in TM • Phrase extraction • Phrase filtering • Phrase augmentation • Word Sense Disambiguation (WSD)
Traditional WSD • Target word is a single content word • Noun, verb, adjective • Classification task with predefined senses • WordNet, HowNet • Modern WSD system • Not limited to local context • Linguistic information • Position-sensitive • Syntactic • Collocation • An intuitive application of WSD is SMT
WSD in MT • Wrong translations from Google Translate • what is today's special? • 什麼是今天的特色? • I would like to reserve a table for three • 我想保留一表三 • the plane will briefly stop over in the airport • 這架飛機將簡要地停留在機場
WSD in MT: Early Stage • Can a WSD model help SMT? • An energetically debated question in past years • Implicit WSD in SMT • Local context: phrase table & language model • Dedicated WSD system • Wider variety of context features • Position, sentence-level, document-level features • WSD should play a role in MT • Publicly available SMT system • Pharaoh by Philipp Koehn (2003~2004)
Small Scale Experiment (1) • Marine Carpuat and Dekai Wu, 2005 • Chinese-to-English translation task • Chinese lexical sample task includes 20 target words • Trained a state-of-the-art WSD system • 37 training instances per target word (manual annotation)
Small Scale Experiment (2) • Hard decision • Force the decoder to choose translations from glosses • Decided by language model • Surprising and frustrating result: WSD did not improve translation quality • Problems: small data, out-of-domain material, hard decision • Language model effect
Translation Disambiguation (1) • Clara Cabezas and Philip Resnik, 2005 • Addresses three problems of the previous work • Use the aligned target word directly as the "sense" • 4 senses for "briefly": {短暫地, 短時間地, 簡潔地, 簡要地} • Trained with state-of-the-art WSD • Handles the "small data" and "out-of-domain" problems • Soft decision • Pharaoh XML markup • Decoder weighs the specified translations together with the translation model • Handles the "hard decision" problem
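The idea of using the aligned target word as the "sense" label can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `collect_sense_instances`, the context window, and the two toy sentence pairs with their alignment links are all made up for this example.

```python
from collections import defaultdict

def collect_sense_instances(sentence_pairs, alignments, window=2):
    """Group WSD training instances by source word.

    The "sense" of each source token is simply the target word it is
    aligned to, so no manual sense annotation is needed.
    """
    instances = defaultdict(list)
    for (src, tgt), links in zip(sentence_pairs, alignments):
        for i, j in links:  # (source index, target index)
            left = src[max(0, i - window):i]
            right = src[i + 1:i + 1 + window]
            instances[src[i]].append({"context": left + right, "sense": tgt[j]})
    return instances

# Toy, hand-made data (hypothetical): only the link for "briefly" is listed.
pairs = [
    (["the", "plane", "will", "briefly", "stop"], ["飛機", "將", "短暫地", "停留"]),
    (["he", "spoke", "briefly", "and", "clearly"], ["他", "說得", "簡要地", "而", "清楚"]),
]
links = [[(3, 2)], [(2, 2)]]

data = collect_sense_instances(pairs, links)
for inst in data["briefly"]:
    print(inst["sense"], inst["context"])
```

Because labels come from the parallel corpus itself, the amount of training data grows with the corpus and matches its domain, which is how this setup sidesteps the "small data" and "out-of-domain" problems.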
Translation Disambiguation (2) • Pharaoh XML markup • Experiment & Result • Spanish-to-English test set from Europarl • WSD: 0.2382, Baseline: 0.2356 • Not statistically significant • But at least it is not a decrease
Toward Better Integration into SMT • How to better integrate WSD into SMT? • Phrase-based sense disambiguation (PSD) • Key points • Phrase, not word • Integration into log-linear model: weight tuning
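Integration into the log-linear model means the WSD score becomes one more weighted feature next to the translation model and language model scores. The sketch below assumes the standard formulation, score = Σ λ_k · h_k; the feature names, the candidate scores, and the weights are invented for illustration and are not taken from any of the cited systems.

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of feature values (log-probabilities here)."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Return the candidate translation with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(candidates[c], weights))

# Hypothetical feature values for two translations of "briefly".
candidates = {
    "簡要地": {"tm": -0.5, "lm": -1.0, "wsd": math.log(0.1)},
    "短暫地": {"tm": -0.9, "lm": -1.1, "wsd": math.log(0.8)},
}

baseline_weights = {"tm": 1.0, "lm": 1.0, "wsd": 0.0}  # WSD feature switched off
wsd_weights = {"tm": 1.0, "lm": 1.0, "wsd": 1.0}       # WSD feature switched on

print(best_candidate(candidates, baseline_weights))
print(best_candidate(candidates, wsd_weights))
```

In a real system the weights would be tuned on held-out data rather than set by hand, so the decoder itself learns how much to trust the WSD feature.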
Successful Integration (1) • Chan et al., 2007 • Chinese-to-English translation • Sense disambiguation on Chinese phrases • 1 or 2 consecutive Chinese words • Extract training examples from word-aligned corpus • Add WSD features • Contextual probability of WSD • Reward probability of WSD
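Extracting training examples for source phrases of one or two consecutive words might look roughly like this. It is a simplified sketch: the helper name `extract_phrase_examples`, the shortened sentence, and its alignment are assumptions, and the usual phrase-extraction consistency checks are omitted.

```python
def extract_phrase_examples(src, tgt, links, max_len=2):
    """Pair each source phrase of 1..max_len words with its aligned target words."""
    examples = []
    for start in range(len(src)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(src):
                break
            # Target positions aligned to any word inside the source span.
            tgt_idx = sorted({j for i, j in links if start <= i < end})
            if not tgt_idx:
                continue  # unaligned span: no training label available
            examples.append((" ".join(src[start:end]),
                             " ".join(tgt[j] for j in tgt_idx)))
    return examples

# Toy fragment with a hand-written (hypothetical) alignment.
src = ["無法", "取得", "援助"]
tgt = ["unable", "to", "obtain", "aid"]
links = [(0, 0), (0, 1), (1, 2), (2, 3)]

examples = extract_phrase_examples(src, tgt, links)
for source_phrase, translation in examples:
    print(source_phrase, "->", translation)
```

Each (source phrase, translation) pair, together with the surrounding sentence as context, would then serve as one labelled instance for the phrase's classifier.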
Successful Integration (2) • Statistically significant improvement • 將 無法 取得 更 多 援助 或 其他 讓步 • Hiero: will be more aid and other concessions • Hiero+WSD: will be unable to obtain more aid and other concessions
PSD System (1) • Marine Carpuat and Dekai Wu, 2007 • WSD model for every phrase • Extract training data from phrase extraction • WSD probability as new feature • Comments • Not every phrase needs WSD • Technical problem (Pharaoh)
PSD System (2) • Result: better translation on all test sets • IWSLT 2006 dataset • NIST 2004 test set
Recent Issue • Different translations may have the same sense • 2 senses for "briefly", rather than 4 • Sense 1: {短暫地, 短時間地} • Sense 2: {簡潔地, 簡要地} • Automatic sense clustering
Sense Clustering (1) • Marianna Apidianaki, 2009 • Two translations are semantically related • If they occur in similar contexts • Translation unit (TU) as context • Bilingual sentence pair • Source word "briefly" • Translations • {短暫地, 短時間地, 簡潔地, 簡要地} • {t1, t2, t3, t4}
Sense Clustering (2) • "briefly-t1" occurs in context {TU1, TU4, TU25, TU88…} • "briefly-t2" occurs in context {TU5, TU18, TU92, TU126…} • Clustering based on pairwise context similarity • Apidianaki, 2008
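Clustering translations by pairwise context similarity can be illustrated with a deliberately simple stand-in: Jaccard overlap of context words plus threshold-based merging. This is not Apidianaki's actual similarity measure or clustering algorithm; the context words and the threshold are invented for the example.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two context sets, between 0.0 and 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_translations(contexts, threshold=0.3):
    """Merge translations whose context sets are similar enough."""
    clusters = [{t} for t in contexts]
    for t1, t2 in combinations(contexts, 2):
        if jaccard(contexts[t1], contexts[t2]) >= threshold:
            c1 = next(c for c in clusters if t1 in c)
            c2 = next(c for c in clusters if t2 in c)
            if c1 is not c2:
                clusters.remove(c2)
                c1 |= c2  # merge c2 into c1 in place
    return clusters

# Hypothetical source-side context words seen with each translation of "briefly".
contexts = {
    "短暫地": {"stop", "stay", "visit", "plane"},
    "短時間地": {"stop", "stay", "visit", "wait"},
    "簡潔地": {"explain", "summarize", "describe", "state"},
    "簡要地": {"explain", "summarize", "describe", "report"},
}

clusters = cluster_translations(contexts)
print(clusters)
```

On this toy input the four translations collapse into two sense clusters, mirroring the two-sense view of "briefly".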
Sense Clustering (3) • Experiment • English-Greek translation • 150 ambiguous English nouns • Evaluation of lexical selection • Strict precision (Exact match with answer word) • Enriched precision (Match with the cluster of answer word) • Result
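The two evaluation measures differ only in what counts as a hit. The sketch below assumes every test instance receives exactly one prediction, so precision is hits divided by instances; the predictions, answers, and clusters are toy values, not the paper's data.

```python
def strict_precision(predictions, gold):
    """A hit requires an exact match with the answer word."""
    hits = sum(1 for p, g in zip(predictions, gold) if p == g)
    return hits / len(gold)

def enriched_precision(predictions, gold, clusters):
    """A hit requires the prediction to fall in the answer word's cluster."""
    def cluster_of(word):
        # Words that belong to no cluster form a singleton cluster.
        return next((c for c in clusters if word in c), {word})
    hits = sum(1 for p, g in zip(predictions, gold) if p in cluster_of(g))
    return hits / len(gold)

# Toy sense clusters and outputs (hypothetical).
sense_clusters = [{"短暫地", "短時間地"}, {"簡潔地", "簡要地"}]
predictions = ["短暫地", "簡要地", "短時間地", "簡潔地"]
gold = ["短暫地", "簡潔地", "短時間地", "短暫地"]

print(strict_precision(predictions, gold))                      # 0.5
print(enriched_precision(predictions, gold, sense_clusters))    # 0.75
```

Enriched precision can never be lower than strict precision, since an exact match always lies in the answer word's own cluster.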
Conclusion • From WSD to PSD • However, semantics is also important • Future work • Semantic PSD