Statistical Phrase Alignment Model Using Dependency Relation Probability
Toshiaki Nakazawa and Sadao Kurohashi, Kyoto University
Outline
• Background
• Tree-based Statistical Phrase Alignment Model
• Model Training
• Experiments
• Conclusions
Conventional Word Sequence Alignment
[Figure: word-by-word alignment between the Japanese sentence 受光素子にはフォトゲートを用いた and the English sentence "A photogate is used for the photodetector"]
Conventional Word Sequence Alignment vs. Proposed Model
[Figure: the same sentence pair shown twice, as a flat word sequence alignment (left) and as aligned dependency trees under the proposed model (right)]
Proposed Model
[Figure: dependency trees of 受光素子にはフォトゲートを用いた and "A photogate is used for the photodetector", aligned at the phrase level]
• Dependency trees
• Phrase alignment
• Bi-directional agreement
[Figure: example alignments produced by grow-diag-final-and (left) and by the proposed model (right)]
Related Work
• Using tree structures: [Cherry and Lin, 2003], [Quirk et al., 2005], [Galley et al., 2006], ITG, …
• Considering phrase alignment: [Zhang and Vogel, 2005], [Ion et al., 2006], …
• Using two directed models simultaneously: [Liang et al., 2006], [Graca et al., 2008], …
Tree-based Statistical Phrase Alignment Model
Dependency Analysis of Sentences
• Source (Japanese): 受光素子にはフォトゲートを用いた
• Target (English): A photogate is used for the photodetector
[Figure: both sentences parsed into dependency trees, with the word order preserved and the head node of each sentence marked (用いた and "used")]
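Before the model itself, it may help to fix a concrete representation for these trees. The following is a minimal sketch, not from the paper: the Node class, its field names, and the particular parse of the English example are all illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One word of a dependency-parsed sentence (illustrative structure)."""
    word: str
    parent: "Node | None" = None                   # the head this word depends on
    children: list = field(default_factory=list)   # words depending on this one

def add_dependency(child: Node, parent: Node) -> None:
    """Attach child under parent in the dependency tree."""
    child.parent = parent
    parent.children.append(child)

# One plausible tree for the target sentence; "used" is the head node
used = Node("used")
photogate = Node("photogate")
add_dependency(Node("A"), photogate)
add_dependency(photogate, used)
add_dependency(Node("is"), used)
for_ = Node("for")
photodetector = Node("photodetector")
add_dependency(Node("the"), photodetector)
add_dependency(photodetector, for_)
add_dependency(for_, used)

print([c.word for c in used.children])   # ['photogate', 'is', 'for']
```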
Overview of the Proposed Model (in comparison to the IBM models)
• IBM models find the best alignment by Â = argmax_A P(F, A | E), where F is the source sentence, E is the target sentence, and A is the alignment
• In the IBM models, P(F, A | E) decomposes into a lexical probability (word translation) and an alignment probability (word reordering)
• Proposed model: the lexical probability is replaced by a phrase translation probability, and the alignment probability by a dependency relation probability
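As a reading of this decomposition, here is a minimal sketch of how a single alignment could be scored; the function name, the dictionary-based probability tables, and the toy values are assumptions, not the paper's implementation.

```python
import math

def alignment_log_score(phrase_pairs, relations, p_trans, p_rel):
    """Log-probability of one alignment under the slide's decomposition:
    a phrase translation term plus a dependency relation term.
    phrase_pairs: list of (source_phrase, target_phrase) strings.
    relations: list of relation labels such as 'c', 'p', 'c;c', 'NULL_p'.
    p_trans, p_rel: probability tables (dicts); all names are illustrative."""
    score = sum(math.log(p_trans[(f, e)]) for f, e in phrase_pairs)
    score += sum(math.log(p_rel[r]) for r in relations)
    return score

# Toy tables (values echo the Model Training slide, not real estimates)
p_trans = {("コロラド", "Colorado"): 0.7, ("大学", "university"): 0.6}
p_rel = {"c": 0.4, "c;c": 0.3, "p": 0.2}
print(alignment_log_score([("コロラド", "Colorado")], ["c"], p_trans, p_rel))
```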
Phrase Translation Probability
• Note that the sentences are not segmented into phrases in advance
[Figure: in contrast to IBM-model word alignment, source words f1–f5 are grouped into phrases F1–F3 by the segmentation function s(j) (s(1)=1, s(2)=2, s(3)=2, s(4)=3, s(5)=1), and target words e1–e4 into phrases E1–E3; the phrase alignment A maps source phrases to target phrases (A1=2, A2=3, A3=0, with 0 denoting NULL)]
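The figure's segmentation function s(j) and phrase alignment A can be written down directly as mappings; this sketch just replays the values from the slide (the dict representation is an assumption). Note that s(1) = s(5) = 1: because phrases here are tree fragments, a phrase need not be contiguous in the word order.

```python
# s(j): the source phrase each source word f_j belongs to (values from the figure)
s = {1: 1, 2: 2, 3: 2, 4: 3, 5: 1}

# A_i: the target phrase each source phrase F_i is aligned to (0 = NULL)
A = {1: 2, 2: 3, 3: 0}

# Recover the phrases as sets of word indices
phrases = {}
for word, phrase in s.items():
    phrases.setdefault(phrase, set()).add(word)

print(phrases)   # {1: {1, 5}, 2: {2, 3}, 3: {4}}
print(A[s[2]])   # f2 lies in phrase F2, which is aligned to target phrase E3 -> 3
```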
Dependency Relations
[Figure: for a source parent-child pair (fp, fc), rel(fc, fp) is the target-side relation between their aligned phrases E_{A_{s(c)}} and E_{A_{s(p)}}: parent-child gives rel(fc, fp) = c, inverted parent-child gives rel(fc, fp) = p, grandparent-child gives rel(fc, fp) = c;c, and a NULL-aligned child gives rel(fc, fp) = NULL_p]
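A sketch of how rel(fc, fp) could be computed under these definitions: look up the target phrases aligned to the child and the parent, then classify their tree relation. The function and its fallback label are assumptions, and the paper's full label set is richer than this.

```python
def rel(fc_target, fp_target, parent_of):
    """Target-side relation between the phrases aligned to a source child fc
    and its parent fp, using the labels on this slide (names illustrative):
    'c' parent-child, 'p' inverted parent-child,
    'c;c' grandparent-child, 'NULL_p' child aligned to NULL."""
    if fc_target is None:
        return "NULL_p"
    if parent_of.get(fc_target) == fp_target:
        return "c"
    if parent_of.get(parent_of.get(fc_target)) == fp_target:
        return "c;c"
    if parent_of.get(fp_target) == fc_target:
        return "p"
    return "other"   # further labels exist; collapsed here for brevity

# Toy target tree: E3 is a child of E2, which is a child of E1
parent_of = {"E3": "E2", "E2": "E1"}
print(rel("E2", "E1", parent_of))   # 'c'
print(rel("E3", "E1", parent_of))   # 'c;c'
print(rel("E1", "E2", parent_of))   # 'p'
```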
Dependency Relation Probability
• The target-side dependency relation probability is the product of p(rel(fc, fp)) over all parent-child pairs (fc, fp) in Ds-pc
• Ds-pc is the set of parent-child word pairs in the source sentence
• The source-side dependency relation probability is defined in the same manner, over the parent-child pairs of the target sentence
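Putting the bullets together, the slide's formula can plausibly be reconstructed as the following product; this is a reconstruction from context, not the verbatim slide:

```latex
P_{rel}(A \mid F, E) \;=\; \prod_{(f_c,\, f_p) \,\in\, D_{s\text{-}pc}} p\bigl(\mathit{rel}(f_c, f_p)\bigr)
```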
Model Training
• Step 1 (word base): estimate word translation probabilities with IBM Model 1, and initialize the dependency relation probabilities
  – e.g. p(コロラド|Colorado) = 0.7, p(大学|university) = 0.6, …; p(c) = 0.4, p(c;c) = 0.3, p(p) = 0.2, …
• Step 2 (tree base): estimate phrase translation probabilities and dependency relation probabilities
  – E-step: create an initial alignment, modify it by hill-climbing, and generate possible phrases
  – M-step: parameter estimation
  – e.g. p(コロラド|Colorado) = 0.7, p(大学|university) = 0.6, p(コロラド 大学|university of Colorado) = 0.9, …
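The two steps above can be sketched as an outer training loop. Each sub-step is passed in as a function so the skeleton itself is runnable; all helper names (init_tables, hill_climb, and so on) are stand-ins, not the paper's actual routines.

```python
from collections import Counter

def train(corpus, init_tables, initial_alignment, hill_climb,
          grow_phrases, collect_counts, estimate, iterations=5):
    """Two-step training as on this slide (all sub-steps are stand-ins)."""
    # Step 1 (word base): IBM Model 1 word translation probabilities plus
    # an initial dependency relation table.
    p_trans, p_rel = init_tables(corpus)
    # Step 2 (tree base): EM over phrase alignments.
    for _ in range(iterations):
        counts = Counter()
        for src_tree, tgt_tree in corpus:                 # E-step
            a = initial_alignment(src_tree, tgt_tree, p_trans, p_rel)
            a = hill_climb(a, p_trans, p_rel)             # swap/reject/add/extend
            grow_phrases(a)                               # merge NULL-aligned nodes
            counts.update(collect_counts(a))
        p_trans, p_rel = estimate(counts)                 # M-step
    return p_trans, p_rel
```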
Step 2 (E-step): Example of Hill-climbing
• The initial alignment is created greedily
• The alignment is then modified with four operations: Swap, Reject, Add, Extend
[Figure: five snapshots of the running example, showing the initial alignment and the effect of each operation]
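The loop structure of such a search is generic enough to sketch. Here `neighbors` stands in for whatever routine enumerates the Swap/Reject/Add/Extend candidates and `score` for the model probability; both are assumptions.

```python
def hill_climb(alignment, neighbors, score):
    """Greedy hill-climbing as described on this slide: repeatedly move to the
    best-scoring candidate produced by the four operations until no candidate
    improves the score."""
    best, best_score = alignment, score(alignment)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(best):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
                improved = True
    return best

# Runs with trivial stand-ins: no candidates means the input is returned
print(hill_climb({"a": 1}, lambda a: [], lambda a: 0.0))   # {'a': 1}
```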
Generate Possible Phrases
• Generate new possible phrases by merging NULL-aligned nodes into their parent or child non-NULL-aligned nodes
• The new possible phrases are taken into consideration from the next iteration
[Figure: the running example, with NULL-aligned nodes merged into adjacent aligned nodes]
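A sketch of this merging step, under the assumption that alignments and trees are the simple sets/dicts used below; all names are illustrative.

```python
def possible_phrases(nodes, aligned, parent):
    """Sketch of this slide's phrase generation: each NULL-aligned node is
    merged with an adjacent (parent or child) aligned node to propose a new
    phrase for the next EM iteration. `aligned` is the set of nodes with a
    non-NULL alignment; `parent` maps node -> head."""
    proposals = []
    for n in nodes:
        if n in aligned:
            continue                       # only NULL-aligned nodes are merged
        p = parent.get(n)
        if p in aligned:
            proposals.append((p, n))       # merge upward into the parent
        proposals += [(n, c) for c in nodes
                      if parent.get(c) == n and c in aligned]  # or downward
    return proposals

# Toy run: the NULL-aligned particle は hangs off the aligned verb 用いた
print(possible_phrases(["は", "用いた"], aligned={"用いた"}, parent={"は": "用いた"}))
# -> [('用いた', 'は')]
```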
Alignment Experiments
• Training: JST Ja-En paper abstract corpus (1M sentences; Ja: 36.4M words, En: 83.6M words)
• Test: 475 sentences with gold-standard alignments annotated by hand
• Parsers: KNP for Japanese, MSTParser for English
• Evaluation criteria: Precision, Recall, F1
• For the proposed model, we ran 5 iterations in each step
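For reference, the evaluation criteria computed over alignment links in the standard way; this sketch ignores the sure/possible link distinction, which the slide does not mention.

```python
def precision_recall_f1(system, gold):
    """Precision, recall, and F1 over alignment links, where each link is a
    (source_word, target_word) index pair."""
    system, gold = set(system), set(gold)
    tp = len(system & gold)                             # correctly proposed links
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(precision_recall_f1({(1, 1), (2, 3)}, {(1, 1), (2, 2)}))
# -> (0.5, 0.5, 0.5)
```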
Experimental Results
[Table: precision, recall, and F1 for the conventional word alignment models and the proposed model; the proposed model improves F1 by +1.7 points]
Effectiveness of Phrase and Tree
• Ablation: positional relations instead of dependency relations (the parent-child relation c corresponds to the relative position −1, and the inverted relation p to +1)
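One way to read this ablation: the tree label is replaced by the signed distance between the aligned target positions, so a typical parent-child configuration yields −1 and the inverted one +1. The exact bucketing is an assumption; the slide only shows "c ~ −1, p ~ +1".

```python
def positional_relation(target_pos_child, target_pos_parent):
    """Positional stand-in for rel(fc, fp): the signed offset between the
    target positions aligned to the child and the parent."""
    return target_pos_child - target_pos_parent

print(positional_relation(3, 4))   # -1, where the parent-child label 'c' often holds
print(positional_relation(5, 4))   # +1, the inverted configuration 'p'
```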
Discussions
• Parsing errors
  – Parsing accuracy is basically good, but parsers still sometimes produce incorrect results
  – Remedy: incorporate parsing probability into the model
• Search errors
  – Hill-climbing sometimes falls into local optima
  – Remedy: random restarts
• Function words
  – They behave quite differently across languages (e.g. case markers in Japanese, articles in English)
  – Remedy: post-processing
Post-processing for Function Words
• Reject correspondences between Japanese particles and English "be" or "have"
• Reject correspondences involving English articles
• Japanese "する" and "れる", and English "be" and "have", are merged into their parent verb or adjective if they are NULL-aligned
[Table: effect of post-processing; gains of +6.2 and +0.3 points]
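The first two rules are easy to sketch over alignment links; the particle and verb-form lists below are illustrative assumptions, and the third rule is only noted in a comment since it needs the dependency trees.

```python
JA_PARTICLES = {"は", "が", "を", "に", "で"}          # illustrative particle list
EN_ARTICLES = {"a", "an", "the"}
EN_BE_HAVE = {"be", "is", "are", "was", "were", "have", "has", "had"}

def post_process(links):
    """The function-word rules on this slide, sketched over alignment links
    given as (ja_word, en_word) pairs."""
    kept = []
    for ja, en in links:
        if ja in JA_PARTICLES and en in EN_BE_HAVE:
            continue   # rule 1: drop particle <-> be/have correspondences
        if en in EN_ARTICLES:
            continue   # rule 2: drop correspondences involving English articles
        kept.append((ja, en))
    # rule 3 (not shown): NULL-aligned する/れる and be/have are merged into
    # their parent verb or adjective, which requires the dependency trees
    return kept

print(post_process([("は", "is"), ("フォトゲート", "photogate"), ("その", "the")]))
# -> [('フォトゲート', 'photogate')]
```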
Conclusion and Future Work
• Linguistically motivated phrase alignment
  – Dependency trees, phrase alignment, bi-directional agreement
• Significantly better results than conventional word alignment models
• Future work
  – Apply the proposed model to other language pairs (Japanese-Chinese and so on)
  – Incorporate parsing probability into the model
  – Investigate how our alignment results contribute to translation quality