A Novel Phrase-based System Combination Framework for MT

Presentation Transcript


  1. A Novel Phrase-based System Combination Framework for MT. Necip Fazil Ayan (1), Bing Xiang (2), Bonnie J. Dorr (1), Richard Schwartz (2), Spyros Matsoukas (2), Antti-Veikko I. Rosti (2). (1) University of Maryland, College Park; (2) BBN Technologies, Inc.

  2. Motivation and Goal • Motivation: • Different systems have different strengths and weaknesses • Can we create better hypotheses by choosing the best phrases from different systems? • Goal: • Combine the outputs of various machine translation (MT) systems by examining the source-to-target phrases used by the individual MT systems • Choose the “best” phrases from each system and merge them in a way that yields translations closest to the reference translations

  3. Phrase-based Combination Framework • Pipeline (from the slide diagram): N-best lists from the input systems S1, S2, …, SN → Phrase Collection → Confidence Estimation → Decoding → Output, with Optimization of the combination weights

  4. Phrase Collection • Reduce the search space for combination decoding • Collect source-to-target phrases for each sentence • Same source interval • Same target surface string • How to collect them? • Can be provided by the systems • Can be generated automatically using an automated word-alignment tool • Phrases can overlap (and subsume each other) • Hierarchical phrases must be flattened out
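A minimal sketch of the collection step, for illustration only: the Phrase tuple and the hypothesis structure below are hypothetical stand-ins rather than the paper's implementation, but they show how phrases can be grouped by source interval and target surface string.

    # Hypothetical data structures, for illustration only.
    from collections import defaultdict
    from typing import NamedTuple

    class Phrase(NamedTuple):
        src_start: int    # index of the first source word covered
        src_end: int      # index one past the last source word covered
        target: str       # target surface string
        system: str       # id of the system that produced the phrase
        distortion: int   # original distortion of the phrase in its hypothesis

    def collect_phrases(hypotheses):
        """Group phrases from all n-best hypotheses by (source interval, target string).

        Phrases covering the same source interval with the same target surface
        string are treated as a single translation option, whichever hypothesis
        or system they came from.
        """
        table = defaultdict(list)
        for hyp in hypotheses:            # one entry per n-best hypothesis
            for ph in hyp["phrases"]:     # the hypothesis's phrase-to-phrase alignment
                table[(ph.src_start, ph.src_end, ph.target)].append(ph)
        return table

Overlapping or subsuming phrases simply end up under different keys in the same table.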

  5. Enriching Initial Phrase Sets • Goal: Increase the number of common phrases among systems • Different systems might use different phrases to translate the same source interval to the same target string • Original phrase set for each system is extended by • Phrase Concatenation: Combine adjacent phrases (both at the source and target level) • Phrase Splitting: Generate one-word phrases by estimating word correspondences using word translation probabilities
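A rough sketch of the concatenation step, reusing the hypothetical Phrase tuple from the sketch above. Only source-side adjacency is checked here, since the illustrative tuple carries no target positions; the slide requires adjacency on both the source and target sides, and phrase splitting via word translation probabilities is not shown.

    def concatenate_adjacent(phrases):
        """Extend a phrase set with concatenations of source-adjacent phrases."""
        merged = []
        for a in phrases:
            for b in phrases:
                if a is not b and a.src_end == b.src_start:   # b continues a on the source side
                    merged.append(Phrase(a.src_start, b.src_end,
                                         a.target + " " + b.target,
                                         a.system, a.distortion))   # distortion carried over, illustrative only
        return phrases + merged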

  6. Phrase Confidence Estimation • Weighted combination of phrase posteriors • Posteriors of the hypotheses that contain this phrase • System weights • Similarity of phrases to other phrases • Four similarity levels • Same source interval, same target words, and same original distortion • Same source interval, same target words, but different original distortion • Overlapping source intervals with the same target words • Overlapping target words • Each phrase in one hypothesis is matched to another hypothesis at only one similarity level
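One way to make the four levels concrete is a small comparison function. Field names, including the distortion attribute, are the hypothetical ones from the earlier sketch, and any tie-breaking in the actual paper may differ.

    def similarity_level(p, q):
        """Return the similarity level (1 = strongest) of phrases p and q,
        or None if they are not similar at any level."""
        same_span = (p.src_start, p.src_end) == (q.src_start, q.src_end)
        overlapping = p.src_start < q.src_end and q.src_start < p.src_end
        if same_span and p.target == q.target:
            return 1 if p.distortion == q.distortion else 2
        if overlapping and p.target == q.target:
            return 3
        if set(p.target.split()) & set(q.target.split()):
            return 4
        return None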

  7. Example: Phrase Posteriors • 2 systems, 2 hypotheses for each system, 3 phrases • p1 ~ p2 at similarity level 2, p2 ~ p3 at similarity level 3 • [Slide table: posteriors related to p1, p2, and p3, broken down by system (Sys 1, Sys 2) and similarity level (Sim 1, Sim 2, Sim 3); the numeric values are not preserved in this transcript]

  8. Example: Combining Phrase Posteriors for One System • Similarity weights: SimW1 = 0.7, SimW2 = 0.2, SimW3 = 0.1 • [Slide table: posteriors related to p1, p2, and p3 for Sys 1 and Sys 2, together with the combined values obtained with these weights; the numeric values are not preserved in this transcript]

  9. Combining Phrase Posteriors Among Different Systems • sysWi: Confidence weight for system i • Post(m, i): Posterior of phrase m in system i • Conf(m): Overall confidence of phrase m
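The slide's actual formula is not preserved in this transcript. Given the definitions above, and the constraints on slide 11 that system weights and interpolation weights each sum to 1, a plausible reconstruction (a sketch, not the paper's exact equation) is a weighted sum over systems of similarity-interpolated hypothesis posteriors:

    def phrase_confidence(m, systems, sys_weights, sim_weights, similarity_level):
        """Conf(m) as a weighted sum over systems of Post(m, i).

        systems[i]  : list of (hypothesis, posterior) pairs from system i's n-best list
        sys_weights : sysW_i, one per system, summing to 1
        sim_weights : SimW_l, one per similarity level, summing to 1
        """
        conf = 0.0
        for sys_w, hyps in zip(sys_weights, systems):
            post_m = 0.0
            for hyp, posterior in hyps:
                # Each hypothesis contributes at a single similarity level:
                # here, the strongest level at which any of its phrases matches m.
                levels = [similarity_level(m, p) for p in hyp["phrases"]]
                levels = [l for l in levels if l is not None]
                if levels:
                    post_m += sim_weights[min(levels) - 1] * posterior
            conf += sys_w * post_m
        return conf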

  10. Decoding • Phrasal decoding based on standard beam search (Koehn, 2004) • Features • Language Model • Phrase penalty • Word penalty • Distortion penalty • Original distortion penalty (computed over the set of phrases generated by each system) • Combined phrase confidence • The total score for a hypothesis is a log-linear combination of these features
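The log-linear scoring itself is simple to write down; the sketch below uses hypothetical feature names, values, and weights purely to show how the six features listed above would be combined into one hypothesis score.

    def hypothesis_score(features, weights):
        """Log-linear combination: a weighted sum of log-domain feature values."""
        return sum(weights[name] * value for name, value in features.items())

    # Hypothetical feature values and weights, for illustration only.
    features = {"lm": -42.3, "phrase_penalty": -5.0, "word_penalty": -12.0,
                "distortion": -3.0, "original_distortion": -2.0,
                "phrase_confidence": -0.22}
    weights = {"lm": 1.0, "phrase_penalty": 0.5, "word_penalty": 0.3,
               "distortion": 0.6, "original_distortion": 0.4,
               "phrase_confidence": 1.2}
    print(hypothesis_score(features, weights))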

  11. System Optimization • Optimization by BBN’s generic optimizer • Based on Powell’s method • Operates on n-best lists with various feature scores • Can optimize for an arbitrary scoring function • Number of weights to be optimized: N+M+6 • N: Number of systems • M: Number of similarity levels • One feature weight for each of the first 5 features • (N-1) + M + (3-1) = N+M+1 weights for phrase confidence • System weights sum up to 1 • Interpolation weights sum up to 1
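As a worked example of the weight count: with N = 6 input systems (slide 12) and M = 4 similarity levels (slide 6), there are 5 weights for the first five features plus (6-1) + 4 + (3-1) = 11 weights for the phrase confidence, i.e. 6 + 4 + 6 = 16 weights in total.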

  12. Experimental Settings (1) • 6 input systems • Three phrase-based • Two hierarchical phrase-based • One syntax-based • Arabic-to-English training data: ~140M words • Chinese-to-English training data: ~225M words • All systems optimized on NIST MTEval'02 test set • 2 systems were tuned to minimize TER (Snover et al., 2006) • 4 systems were tuned to maximize BLEU (Papineni et al., 2002)

  13. Experimental Settings (2) • Input: 100-best lists generated by each system with phrase-to-phrase alignments • Combination weights optimized on NIST MTEval'03 test sets • Optimization Functions: • TER for Arabic-English • BLEU for Chinese-English • Tested on NIST MTEval'04 and NIST MTEval'05 test sets • Evaluated using mixed case TER and BLEU scores

  14. Results (Arabic-English)

  15. Results (Chinese-English)

  16. Conclusions • Presented a novel phrase-based system combination framework • Build a new translation option table for each sentence separately and re-decode for a consensus translation • Arabic-English: Up to 1.7 BLEU point improvement over the best system • Chinese-English: The improvements on the tuning sets are not reflected on the test sets. Why? • The success of the combination method depends on setting several weights appropriately • Hard to optimize (too many weights!) • Distortion features override the effects of other features

  17. Proposal for System Combination Evaluation • A lot of effort on system combination within GALE • Hard to evaluate different system combination approaches against each other • Different input systems • Different development and evaluation sets • Software not publicly available • Proposal: Create a common corpus to evaluate system combination methods • A standard set of system outputs on standard MTEval test sets • A common format that satisfies the requirements of each approach • Periodic updates on system outputs
