Bagging-based System Combination for Domain Adaptation. Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu. Institute of Computing Technology, Chinese Academy of Sciences
An Example • Initial MT system
An Example • Initial MT system • Tuned MT system that fits domain A • Development set: A 90%, B 10% • The translation styles of A and B are quite different
An Example • Initial MT system • Tuned MT system that fits domain A • Development set: A 90%, B 10% • Test set: A 10%, B 90%
An Example • Initial MT system • Tuned MT system that fits domain A • Development set: A 90%, B 10% • Test set: A 10%, B 90% • The translation style fits A, but we mainly want to translate B
Traditional Methods Monolingual data with domain annotation
Traditional Methods • Monolingual data with domain annotation → Domain recognizer
Traditional Methods Bilingual training data
Traditional Methods • Bilingual training data → Domain recognizer → training data (domain A) + training data (domain B)
Traditional Methods • training data (domain A) → MT system (domain A); training data (domain B) → MT system (domain B)
Traditional Methods Test set
Traditional Methods • Test set → Domain recognizer → test set (domain A) + test set (domain B)
Traditional Methods • test set (domain A) → MT system (domain A) → translation result (domain A); test set (domain B) → MT system (domain B) → translation result (domain B) → merged translation result
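As a rough illustration (not the authors' code), the traditional pipeline can be sketched in a few lines of Python; the domain recognizer and the two MT systems below are toy stand-ins.

```python
# Minimal sketch of the traditional pipeline: a domain recognizer routes
# each test sentence to the MT system trained for its predicted domain.
# Both components are toy stand-ins, purely for illustration.

def domain_recognizer(sentence):
    """Toy stand-in for a domain classifier; may misclassify (the CE problem)."""
    return "A" if "patent" in sentence else "B"

# One domain-specific MT system per domain (stubs for illustration).
mt_systems = {
    "A": lambda s: f"[domain-A translation of: {s}]",
    "B": lambda s: f"[domain-B translation of: {s}]",
}

def translate_test_set(test_set):
    """Split the test set by predicted domain and translate each part."""
    return [mt_systems[domain_recognizer(s)](s) for s in test_set]

print(translate_test_set(["a patent claim sentence", "an everyday sentence"]))
```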
The merits • Simple and effective • Fits human intuition
The drawbacks • Classification Error (CE) • Especially severe for unsupervised methods • Supervised methods can keep CE low, yet the need for annotated data limits their usage
Our motivation • Move away from performing adaptation directly • Statistical methods (such as Bagging) can help
Preliminary The general framework of Bagging
General framework of Bagging Training set D
General framework of Bagging • Training set D → bootstrapped training sets D1, D2, D3, … → classifiers C1, C2, C3, …
General framework of Bagging • Test sample → classifiers C1, C2, C3, …
General framework of Bagging • Test sample → C1, C2, C3, … → result of C1, result of C2, result of C3, … → voting result
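A minimal Python sketch of this general framework, using a toy majority-label classifier purely for illustration: bootstrap D into replicas, train one classifier per replica, and vote at test time.

```python
import random
from collections import Counter

# General Bagging framework: bootstrap D into D1, D2, ..., train one
# classifier Ck per resample, then combine their predictions by voting.

def bootstrap(data, rng):
    """One bootstrap replica: sample |D| items from D with replacement."""
    return [rng.choice(data) for _ in data]

def train(sample):
    """Toy classifier Ck: always predicts the majority label of its sample."""
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda x: majority

def bagging_predict(classifiers, x):
    """Voting: each Ck casts one vote, the most frequent prediction wins."""
    return Counter(c(x) for c in classifiers).most_common(1)[0][0]

rng = random.Random(0)
D = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B")]   # (sample, label)
classifiers = [train(bootstrap(D, rng)) for _ in range(10)]
print(bagging_predict(classifiers, x=3))
```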
Training • Suppose there is a development set: A, A, A, B, B • For simplicity, there are only 5 sentences: 3 belong to domain A, 2 belong to domain B
Training • We bootstrap N new development sets: A,B,B,B,B | A,A,B,B,B | A,A,A,B,B | A,A,B,B,B | A,A,A,B,B | A,A,A,A,B | …
Training • For each bootstrapped set, a subsystem is tuned: A,B,B,B,B → MT system-1 | A,A,B,B,B → MT system-2 | A,A,A,B,B → MT system-3 | A,A,B,B,B → MT system-4 | A,A,A,B,B → MT system-5 | …
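A minimal sketch of the bootstrap step on the 5-sentence example development set; the actual tuning of each subsystem (e.g. with MERT) is outside the sketch, so only the resampled sets are produced.

```python
import random

# Bootstrap N new development sets from the example dev set
# (A1..A3 from domain A, B1..B2 from domain B); each replica would
# then be used to tune one MT subsystem.

dev_set = ["A1", "A2", "A3", "B1", "B2"]
N = 6                                    # number of subsystems to tune
rng = random.Random(42)

bootstrapped_sets = [[rng.choice(dev_set) for _ in dev_set] for _ in range(N)]

for i, replica in enumerate(bootstrapped_sets, start=1):
    print(f"dev set {i}: {sorted(replica)}  ->  tune MT subsystem-{i}")
```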
Decoding • For simplicity, suppose only 2 subsystems have been tuned: Subsystem-1 W = <-0.8, 0.2> and Subsystem-2 W = <-0.6, 0.4>
Decoding • Now a sentence "A B" needs a translation • Subsystem-1 W = <-0.8, 0.2>, Subsystem-2 W = <-0.6, 0.4>
Decoding • After translation, each subsystem generates its N-best candidates • Subsystem-1 W = <-0.8, 0.2>: a b; <0.2, 0.2> | a c; <0.2, 0.3> • Subsystem-2 W = <-0.6, 0.4>: a b; <0.2, 0.2> | a b; <0.1, 0.3> | a d; <0.3, 0.4>
Decoding • Fuse these N-best lists and eliminate duplicates • Subsystem-1: a b; <0.2, 0.2> | a c; <0.2, 0.3> • Subsystem-2: a b; <0.2, 0.2> | a b; <0.1, 0.3> | a d; <0.3, 0.4> • Fused list: a b; <0.2, 0.2> | a b; <0.1, 0.3> | a c; <0.2, 0.3> | a d; <0.3, 0.4>
Decoding • Candidates are identical only if their target strings and feature values are entirely equal • Fused list: a b; <0.2, 0.2> | a b; <0.1, 0.3> | a c; <0.2, 0.3> | a d; <0.3, 0.4>
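A minimal sketch of the fusion step, reusing the example values: pool both N-best lists and drop a candidate only when both its target string and its feature vector already appear.

```python
# Fuse the subsystems' N-best lists and remove duplicates; two candidates
# count as identical only if target string AND feature vector are equal.

nbest_sub1 = [("a b", (0.2, 0.2)), ("a c", (0.2, 0.3))]
nbest_sub2 = [("a b", (0.2, 0.2)), ("a b", (0.1, 0.3)), ("a d", (0.3, 0.4))]

def fuse(nbest_lists):
    seen, fused = set(), []
    for nbest in nbest_lists:
        for target, features in nbest:
            key = (target, features)          # target string + feature values
            if key not in seen:
                seen.add(key)
                fused.append((target, features))
    return fused

print(fuse([nbest_sub1, nbest_sub2]))
# [('a b', (0.2, 0.2)), ('a c', (0.2, 0.3)), ('a b', (0.1, 0.3)), ('a d', (0.3, 0.4))]
```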
Decoding • Calculate the voting score (S represents the number of subsystems) • Subsystem-1 W = <-0.8, 0.2>, Subsystem-2 W = <-0.6, 0.4> • Fused list with voting scores: a b; <0.2, 0.2>; -0.16 | a b; <0.1, 0.3>; +0.04 | a c; <0.2, 0.3>; -0.1 | a d; <0.3, 0.4>; -0.18
Decoding • The one with the highest score wins: a b; <0.1, 0.3>; +0.04 • Scored fused list: a b; <0.2, 0.2>; -0.16 | a b; <0.1, 0.3>; +0.04 | a c; <0.2, 0.3>; -0.1 | a d; <0.3, 0.4>; -0.18
Decoding • The one with the highest score wins: a b; <0.1, 0.3>; +0.04 • Since the subsystems are copies of the same model and share the same training data, score calibration across subsystems is unnecessary
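A worked check of the voting scores above. The slide's values match the sum over all subsystems of the dot product between the subsystem's weight vector and the candidate's feature vector; normalizing by S, the number of subsystems, would not change the ranking, so only the sum is shown here.

```python
# Voting-score computation, reproducing the example's numbers.

weights = [(-0.8, 0.2), (-0.6, 0.4)]      # Subsystem-1 and Subsystem-2

fused = [("a b", (0.2, 0.2)),
         ("a b", (0.1, 0.3)),
         ("a c", (0.2, 0.3)),
         ("a d", (0.3, 0.4))]

def voting_score(features, weights):
    """Sum over subsystems of (weight vector) . (candidate feature vector)."""
    return sum(sum(w_i * f_i for w_i, f_i in zip(w, features)) for w in weights)

for target, features in fused:
    print(target, features, round(voting_score(features, weights), 2))
# a b (0.2, 0.2) -0.16
# a b (0.1, 0.3) 0.04
# a c (0.2, 0.3) -0.1
# a d (0.3, 0.4) -0.18

winner = max(fused, key=lambda cand: voting_score(cand[1], weights))
print("winner:", winner)                  # ('a b', (0.1, 0.3))
```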
Basic Setups • Data: NTCIR-9 Chinese-English patent corpus • 1k sentence pairs as the development set • Another 1k pairs as the test set • The remainder is used for training • System: hierarchical phrase-based model • Alignment: GIZA++, grow-diag-final
Effectiveness: Show and Prove • Tune 30 subsystems using Bagging • Tune 30 subsystems with random initial weights • Evaluate the fused results of the first N (N = 5, 10, 15, 20, 30) subsystems of both and compare
Results: 1-best [figure: improvement of +0.82; x-axis: number of subsystems]
Results: 1-best [figure: improvement of +0.70; x-axis: number of subsystems]
Results: Oracle [figure: improvement of +6.22; x-axis: number of subsystems]
Results: Oracle [figure: improvement of +3.71; x-axis: number of subsystems]
Compare with traditional methods • Evaluate a supervised method • To tackle data sparsity, it operates only on the development set and test set • Evaluate an unsupervised method • Similar to Yamada (2007) • To avoid data sparsity, only the language model is domain-specific
Conclusions • We propose a bagging-based method to address the multi-domain translation problem • Experiments show that: • Bagging is effective for the domain adaptation problem • Our method clearly surpasses the baseline, and is even better than some traditional methods
Thank you for listening. Any questions?