A Novel Phrase-based System Combination Framework for MT

Presentation Transcript


  1. A Novel Phrase-based System Combination Framework for MT. Necip Fazil Ayan (1), Bing Xiang (2), Bonnie J. Dorr (1), Richard Schwartz (2), Spyros Matsoukas (2), Antti-Veikko I. Rosti (2). (1) University of Maryland, College Park; (2) BBN Technologies, Inc.

  2. Motivation and Goal • Motivation: • Different systems have different strengths and weaknesses • Can we create better hypotheses by choosing the best phrases from different systems? • Goal: • Combine the outputs of various machine translation (MT) systems by examining the source-to-target phrases used by the individual MT systems • Choose the “best” phrases from each system and merge them in a way that yields translations closest to the reference translations

  3. Phrase-based Combination Framework • Pipeline (from the slide diagram): N-best lists from the input systems S1, S2, …, SN → Phrase Collection → Confidence Estimation → Decoding → Output, with Optimization of the combination weights

  4. Phrase Collection • Reduce the search space for combination decoding • Collect source-to-target phrases for each sentence • Same source interval • Same target surface string • How to collect them? • Can be provided by the systems • Can be generated automatically using an automated word-alignment tool • Phrases can overlap (and subsume each other) • Hierarchical phrases must be flattened out
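A minimal sketch of the collection step, for illustration only: the Phrase tuple and the hypothesis structure below are hypothetical stand-ins rather than the paper's implementation, but they show how phrases can be grouped by source interval and target surface string.

    # Hypothetical data structures, for illustration only.
    from collections import defaultdict
    from typing import NamedTuple

    class Phrase(NamedTuple):
        src_start: int    # index of the first source word covered
        src_end: int      # index one past the last source word covered
        target: str       # target surface string
        system: str       # id of the system that produced the phrase
        distortion: int   # original distortion of the phrase in its hypothesis

    def collect_phrases(hypotheses):
        """Group phrases from all n-best hypotheses by (source interval, target string).

        Phrases covering the same source interval with the same target surface
        string are treated as a single translation option, whichever hypothesis
        or system they came from.
        """
        table = defaultdict(list)
        for hyp in hypotheses:            # one entry per n-best hypothesis
            for ph in hyp["phrases"]:     # the hypothesis's phrase-to-phrase alignment
                table[(ph.src_start, ph.src_end, ph.target)].append(ph)
        return table

Overlapping or subsuming phrases simply end up under different keys in the same table.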

  5. Enriching Initial Phrase Sets • Goal: Increase the number of common phrases among systems • Different systems might use different phrases to translate the same source interval to the same target string • Original phrase set for each system is extended by • Phrase Concatenation: Combine adjacent phrases (both at the source and target level) • Phrase Splitting: Generate one-word phrases by estimating word correspondences using word translation probabilities
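A rough sketch of the concatenation step, reusing the hypothetical Phrase tuple from the sketch above. Only source-side adjacency is checked here, since the illustrative tuple carries no target positions; the slide requires adjacency on both the source and target sides, and phrase splitting via word translation probabilities is not shown.

    def concatenate_adjacent(phrases):
        """Extend a phrase set with concatenations of source-adjacent phrases."""
        merged = []
        for a in phrases:
            for b in phrases:
                if a is not b and a.src_end == b.src_start:   # b continues a on the source side
                    merged.append(Phrase(a.src_start, b.src_end,
                                         a.target + " " + b.target,
                                         a.system, a.distortion))   # distortion carried over, illustrative only
        return phrases + merged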

  6. Phrase Confidence Estimation • Weighted combination of phrase posteriors • Posteriors of the hypotheses that contain this phrase • System weights • Similarity of phrases to other phrases • Four similarity levels • Same source interval, same target words, and same original distortion • Same source interval, same target words, but different original distortion • Overlapping source intervals with the same target words • Overlapping target words • Each phrase in one hypothesis is matched to another hypothesis at only one similarity level
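One way to make the four levels concrete is a small comparison function. Field names, including the distortion attribute, are the hypothetical ones from the earlier sketch, and any tie-breaking in the actual paper may differ.

    def similarity_level(p, q):
        """Return the similarity level (1 = strongest) of phrases p and q,
        or None if they are not similar at any level."""
        same_span = (p.src_start, p.src_end) == (q.src_start, q.src_end)
        overlapping = p.src_start < q.src_end and q.src_start < p.src_end
        if same_span and p.target == q.target:
            return 1 if p.distortion == q.distortion else 2
        if overlapping and p.target == q.target:
            return 3
        if set(p.target.split()) & set(q.target.split()):
            return 4
        return None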

  7. Example: Phrase Posteriors • 2 systems, 2 hypotheses for each system, 3 phrases • p1 ~ p2 at similarity level 2, p2 ~ p3 at similarity level 3 • [Slide table: posteriors related to p1, p2, and p3, broken down by system (Sys 1, Sys 2) and similarity level (Sim 1, Sim 2, Sim 3); the numeric values are not preserved in this transcript]

  8. Example: Combining Phrase Posteriors for One System • Similarity weights: SimW1 = 0.7, SimW2 = 0.2, SimW3 = 0.1 • [Slide table: posteriors related to p1, p2, and p3 for Sys 1 and Sys 2, together with the combined values obtained with these weights; the numeric values are not preserved in this transcript]

  9. Combining Phrase Posteriors Among Different Systems • sysWi: Confidence weight for system i • Post(m, i): Posterior of phrase m in system i • Conf(m): Overall confidence of phrase m
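The slide's actual formula is not preserved in this transcript. Given the definitions above, and the constraints on slide 11 that system weights and interpolation weights each sum to 1, a plausible reconstruction (a sketch, not the paper's exact equation) is a weighted sum over systems of similarity-interpolated hypothesis posteriors:

    def phrase_confidence(m, systems, sys_weights, sim_weights, similarity_level):
        """Conf(m) as a weighted sum over systems of Post(m, i).

        systems[i]  : list of (hypothesis, posterior) pairs from system i's n-best list
        sys_weights : sysW_i, one per system, summing to 1
        sim_weights : SimW_l, one per similarity level, summing to 1
        """
        conf = 0.0
        for sys_w, hyps in zip(sys_weights, systems):
            post_m = 0.0
            for hyp, posterior in hyps:
                # Each hypothesis contributes at a single similarity level:
                # here, the strongest level at which any of its phrases matches m.
                levels = [similarity_level(m, p) for p in hyp["phrases"]]
                levels = [l for l in levels if l is not None]
                if levels:
                    post_m += sim_weights[min(levels) - 1] * posterior
            conf += sys_w * post_m
        return conf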

  10. Decoding • Phrasal decoding based on standard beam search (Koehn, 2004) • Features • Language Model • Phrase penalty • Word penalty • Distortion penalty • Original distortion penalty (computed over the set of phrases generated by each system) • Combined phrase confidence • The total score for a hypothesis is a log-linear combination of these features
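The log-linear scoring itself is simple to write down; the sketch below uses hypothetical feature names, values, and weights purely to show how the six features listed above would be combined into one hypothesis score.

    def hypothesis_score(features, weights):
        """Log-linear combination: a weighted sum of log-domain feature values."""
        return sum(weights[name] * value for name, value in features.items())

    # Hypothetical feature values and weights, for illustration only.
    features = {"lm": -42.3, "phrase_penalty": -5.0, "word_penalty": -12.0,
                "distortion": -3.0, "original_distortion": -2.0,
                "phrase_confidence": -0.22}
    weights = {"lm": 1.0, "phrase_penalty": 0.5, "word_penalty": 0.3,
               "distortion": 0.6, "original_distortion": 0.4,
               "phrase_confidence": 1.2}
    print(hypothesis_score(features, weights))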

  11. System Optimization • Optimization by BBN’s generic optimizer • Based on Powell’s method • Operates on n-best lists with various feature scores • Can optimize for an arbitrary scoring function • Number of weights to be optimized: N+M+6 • N: Number of systems • M: Number of similarity levels • One feature weight for each of the first 5 features • (N-1) + M + (3-1) = N+M+1 weights for phrase confidence • System weights sum up to 1 • Interpolation weights sum up to 1
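As a worked example of the weight count: with N = 6 input systems (slide 12) and M = 4 similarity levels (slide 6), there are 5 weights for the first five features plus (6-1) + 4 + (3-1) = 11 weights for the phrase confidence, i.e. 6 + 4 + 6 = 16 weights in total.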

  12. Experimental Settings (1) • 6 input systems • Three phrase-based • Two hierarchical phrase-based • One syntax-based • Arabic-to-English training data: ~140M words • Chinese-to-English training data: ~225M words • All systems optimized on NIST MTEval'02 test set • 2 systems were tuned to minimize TER (Snover et al., 2006) • 4 systems were tuned to maximize BLEU (Papineni et al., 2002)

  13. Experimental Settings (2) • Input: 100-best lists generated by each system with phrase-to-phrase alignments • Combination weights optimized on NIST MTEval'03 test sets • Optimization Functions: • TER for Arabic-English • BLEU for Chinese-English • Tested on NIST MTEval'04 and NIST MTEval'05 test sets • Evaluated using mixed case TER and BLEU scores

  14. Results (Arabic-English)

  15. Results (Chinese-English)

  16. Conclusions • Presented a novel phrase-based system combination framework • Build a new translation option table for each sentence separately and re-decode for a consensus translation • Arabic-English: Up to 1.7 BLEU point improvement over the best system • Chinese-English: The improvements on the tuning sets are not reflected on the test sets. Why? • The success of the combination method depends on setting several weights appropriately • Hard to optimize (too many weights!) • Distortion features override the effects of other features

  17. Proposal for System Combination Evaluation • A lot of effort on system combination within GALE • Hard to evaluate different system combination approaches against each other • Different input systems • Different development and evaluation sets • Software not publicly available • Proposal: Create a common corpus to evaluate system combination methods • A standard set of system outputs on standard MTEval test sets • A common format that satisfies the requirements of each approach • Periodic updates on system outputs
