
Linguistically-motivated Tree-based Probabilistic Phrase Alignment

An overview of the Tree-based Probabilistic Phrase Alignment Model by Nakazawa and Kurohashi of Kyoto University, which addresses the limitations of word-based alignment for structurally different language pairs. The presentation covers the generative model and its features, model training with the EM algorithm, an n-best symmetrization algorithm, and alignment and translation experiments comparing the model with traditional word-based alignment.


Presentation Transcript


  1. Linguistically-motivated Tree-based Probabilistic Phrase Alignment Toshiaki Nakazawa, Sadao Kurohashi (Kyoto University)

  2. Outline • Background • Tree-based Probabilistic Phrase Alignment Model • Model Training • Symmetrization Algorithm • Experiments • Conclusions

  3. Background • Many state-of-the-art SMT systems are based on "word-based" alignment results • Phrase-based SMT [Koehn et al., 2003] • Hierarchical Phrase-based SMT [Chiang, 2005] • and so on • Some of them incorporate syntactic information "after" word-based alignment • [Quirk et al., 2005], [Galley et al., 2006], and so on • Is that enough? • Can it achieve "practical" translation quality?

  4. Background (cont.) • The word-based alignment model works well for structurally similar language pairs • It is not effective for language pairs with great differences in linguistic structure, such as Japanese and English • SOV versus SVO • For such language pairs, syntactic information is necessary even during the alignment process

  5. Related Work • Syntactic tree-based models • [Yamada and Knight, 2001], [Gildea, 2003], ITG by Wu • These incorporate operations that manipulate sub-trees (re-order, insert, delete, clone) to reproduce the opposite tree structure • Our model does not require any such operations • Our model utilizes dependency trees • Dependency tree-based model • [Cherry and Lin, 2003] • Word-to-word, one-to-one alignment • Our model makes phrase-to-phrase alignments and can make many-to-many links

  6. Features of the Proposed Tree-based Probabilistic Phrase Alignment Model • A generative model similar to the IBM models • Uses phrase dependency structures • "phrase" means a linguistic phrase (cf. the phrases of phrase-based SMT) • Phrase-to-phrase alignment model • Each phrase (node) basically consists of 1 content word and 0 or more function words • Source-side content words can be aligned only to target-side content words (and likewise for function words) • Generation starts from the root node and ends at one of the leaf nodes (cf. the IBM models generate from the first word to the last word)
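To make the notion of a phrase node concrete, here is a minimal sketch of such a node in Python; the class and field names are illustrative assumptions, not the original system's data structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhraseNode:
    """One node of a phrase dependency tree: basically one content word
    plus zero or more function words, with its dependents as children."""
    content_word: str
    function_words: List[str] = field(default_factory=list)
    children: List["PhraseNode"] = field(default_factory=list)

# e.g. the Japanese phrase 「濃度 を」: content word 濃度, function word を
node = PhraseNode(content_word="濃度", function_words=["を"])
```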

  7. Outline • Background • Tree-based Probabilistic Phrase Alignment Model • Model Training • Symmetrization Algorithm • Experiments • Conclusions

  8. Dependency Analysis of Sentences • Source: プロピレングリコールは血中グルコースインスリンを上昇させ、血中NEFA濃度を減少させる • Target: Propylene glycol increases in blood glucose and insulin and decreases in NEFA concentration in the blood • [Figure: phrase dependency trees for both sentences, with word order, root nodes, and head nodes marked]

  9. IBM Model vs. Tree-based Model • IBM Model [Brown et al., 93]: the probability of the source sentence given the target sentence is a sum over hidden word alignments, governed by the model parameters • Tree-based Model: the probability of the source dependency tree given the target dependency tree is a sum over hidden phrase alignments
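A plausible reconstruction of the two formulations (the symbols on the slide did not survive transcription), assuming standard notation: f for the source sentence, e for the target sentence, a for the word alignment, and T_f, T_e, A for the source tree, target tree, and phrase alignment.

```latex
% IBM word-based model [Brown et al., 93]
P(f \mid e) = \sum_{a} P(f, a \mid e)

% Tree-based model: generate the source tree from the target tree
P(T_f \mid T_e) = \sum_{A} P(T_f, A \mid T_e)
```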

  10. Model Decomposition: Lexicon Probability • Suppose the source tree consists of m nodes and the target tree consists of n nodes • The lexicon probability is calculated as a product of phrase translation probabilities, one per aligned node pair • Ex) 濃度 を ↔ in concentration, 上昇 させ ↔ increase (phrase translation probability)
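A hedged sketch of the lexicon term, assuming it is simply a product of per-pair phrase translation probabilities: the source tree has nodes f_1, ..., f_m, the alignment A links node f_i to target node e_{A(i)} (possibly NULL), and t(· | ·) is the phrase translation probability.

```latex
P_{lex}(T_f, A \mid T_e) = \prod_{i=1}^{m} t\bigl(f_i \mid e_{A(i)}\bigr)
```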

  11. Model Decomposition: Alignment Probability • Define the parent node of a source node as its head in the dependency tree • The alignment probability is decomposed as a product of target-side dependency relation probabilities, each conditioned on the corresponding source-side relation • If the parent node has been aligned to NULL, the relation instead refers to the grandparent, and this continues up the tree until an ancestor aligned to something other than NULL is found • The dependency relation probability models tree-based reordering
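A hedged sketch of the alignment term under the same notation, assuming pa(i) denotes the nearest ancestor of f_i whose alignment is not NULL, rel(x, y) the dependency relation (path) between two nodes, and d(· | ·) the dependency relation probability.

```latex
P_{align}(A \mid T_f, T_e)
  = \prod_{i=1}^{m} d\Bigl( rel\bigl(e_{A(i)},\, e_{A(pa(i))}\bigr) \,\Big|\, rel\bigl(f_i,\, f_{pa(i)}\bigr) \Bigr)
```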

  12. Outline • Background • Tree-based Probabilistic Phrase Alignment Model • Model Training • Symmetrization Algorithm • Experiments • Conclusions

  13. Model Training • The proposed model is trained with the EM algorithm • First, the phrase translation probability is learned (Model 1) • Model 1 can be learned efficiently without approximation (cf. IBM Models 1 and 2) • Next, the dependency relation probability is learned (Model 2), with the probabilities learned in Model 1 as initial parameters • Model 2 needs some approximation (cf. IBM Model 3 and above); we use a beam-search algorithm
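A minimal sketch of what the Model 1 EM loop could look like, written in the spirit of IBM Model 1 but over phrase nodes; the corpus format, NULL handling, and all names are illustrative assumptions rather than the original implementation.

```python
from collections import defaultdict

def train_model1(corpus, iterations=5):
    """EM training of phrase translation probabilities t(f | e).
    `corpus` is a list of (src_nodes, tgt_nodes) pairs, where each node is a
    string such as "濃度 を" or "in concentration" (an assumed input format)."""
    t = defaultdict(float)
    # uniform initialization over each sentence's candidate target nodes + NULL
    for src, tgt in corpus:
        for f in src:
            for e in tgt + ["NULL"]:
                t[(f, e)] = 1.0 / (len(tgt) + 1)
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # normalizer per target node e
        for src, tgt in corpus:      # E-step
            tgt_ext = tgt + ["NULL"]
            for f in src:
                z = sum(t[(f, e)] for e in tgt_ext)
                for e in tgt_ext:
                    p = t[(f, e)] / z          # posterior of aligning f to e
                    count[(f, e)] += p
                    total[e] += p
        for (f, e), c in count.items():        # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t
```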

  14. Model 1 • Each phrase on the source side can correspond to an arbitrary phrase on the target side or to the NULL phrase • The probability of one possible alignment is a product of phrase translation probabilities • The tree translation probability is then the sum of this product over all possible alignments • It can be calculated efficiently by exchanging the sum and the product
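A hedged reconstruction of the three missing expressions, mirroring the IBM Model 1 derivation that the slide alludes to (m source nodes, target nodes e_0 = NULL, e_1, ..., e_n, phrase translation probability t).

```latex
% probability of one particular alignment A
P(T_f, A \mid T_e) \;\propto\; \prod_{i=1}^{m} t\bigl(f_i \mid e_{A(i)}\bigr)

% tree translation probability: sum over all possible alignments
P(T_f \mid T_e) \;\propto\; \sum_{A} \prod_{i=1}^{m} t\bigl(f_i \mid e_{A(i)}\bigr)

% computed efficiently by exchanging the sum and the product
P(T_f \mid T_e) \;\propto\; \prod_{i=1}^{m} \sum_{j=0}^{n} t\bigl(f_i \mid e_j\bigr)
```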

  15. Model 2 (imaginary ROOT node) • The root node of a sentence is supposed to depend on an imaginary ROOT node, which plays the role of the Start-Of-Sentence (SOS) symbol in word-based models • The ROOT node of the source tree always corresponds to that of the target tree • [Figure: aligned Japanese-English dependency trees (事例 を 通して / through the case, 援助 の / of the assist, 視点 に / in the viewpoint, 必要な / necessary, ポイント を / the point, 確認 した / was confirmed) with the two ROOT nodes linked]

  16. Model 2 (beam-search algorithm) • It is impossible to enumerate all possible alignments • Consider only a subset of promising ("good-looking") alignments using a beam-search algorithm • Ex) beam-width = 4 • [Figure: the Japanese-English tree pair with a NULL node, showing candidate alignments kept in the beam]
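A minimal sketch of the beam idea, assuming source nodes are processed in a fixed order and each extension is scored by a caller-supplied function (the hypothetical `score_fn` below); the actual Model 2 scoring combines phrase translation and dependency relation probabilities and works on the tree structure.

```python
def beam_search_alignments(src_nodes, tgt_nodes, score_fn, beam_width=4):
    """Keep only the `beam_width` best partial alignments while assigning
    each source node to a target node or to NULL."""
    beam = [((), 1.0)]                            # (partial alignment, score)
    for i, f in enumerate(src_nodes):
        candidates = []
        for alignment, score in beam:
            for e in tgt_nodes + ["NULL"]:        # NULL = unaligned source node
                new_score = score * score_fn(alignment, i, f, e)
                candidates.append((alignment + ((i, e),), new_score))
        # prune to the `beam_width` highest-scoring hypotheses
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam
```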

  17. Model 2 (beam-search algorithm) • [Figure: four beam hypotheses (beam-width = 4), each shown as the same Japanese-English tree pair with a different candidate alignment and a NULL node]

  18. Model 2 (parameter notations) • The dependency relation between two phrases is defined as the path from the first phrase to the second, described with the following notations: • "c-" if the second is a pre-child of the first • "c+" if the second is a post-child of the first • "p-" if the first is a post-child of the second • "p+" if the first is a pre-child of the second • "INCL" if the two phrases are the same • "ROOT" if the phrase is the imaginary ROOT node • "NULL" if the corresponding node is aligned to NULL • [Figure: a tree illustrating the c-, c+, p-, p+, and ROOT relations]

  19. Model 2 (parameter notations, cont.) • Where the two phrases are two or more nodes apart, the relation is described by concatenating the notations along the path • Ex) "c-;c+", "p-;c+;c-" • [Figure: a tree showing the individual steps (p-, c-, c-, c+, c+) along each example path]
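A rough sketch of how such a relation string could be computed, assuming a hypothetical node representation with `parent` and `is_pre_child` fields; the exact direction conventions of the original model may differ, so this only illustrates the path-labelling idea.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None    # None for the imaginary ROOT
    is_pre_child: bool = True          # True if this phrase precedes its head

def ancestors(n: Node):
    """Return [n, n.parent, ..., root]."""
    chain = []
    while n is not None:
        chain.append(n)
        n = n.parent
    return chain

def relation_path(x: Node, y: Node) -> str:
    """Label the path from x to y as a ';'-joined string (the ROOT and NULL
    cases are handled separately in the model and omitted here)."""
    if x is y:
        return "INCL"
    ax, ay = ancestors(x), ancestors(y)
    ay_ids = {id(n) for n in ay}
    common = next(n for n in ax if id(n) in ay_ids)   # lowest common ancestor
    labels = []
    n = x
    while n is not common:                 # climb from x: "p" steps
        labels.append("p-" if not n.is_pre_child else "p+")
        n = n.parent
    descent = []
    n = y
    while n is not common:                 # record the chain from y up to the LCA
        descent.append(n)
        n = n.parent
    for m in reversed(descent):            # then walk it downwards: "c" steps
        labels.append("c-" if m.is_pre_child else "c+")
    return ";".join(labels)
```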

  20. Dependency Relation Probability Examples • [Figure: four alignment hypotheses over the Japanese-English tree pair, each annotated with example dependency relation probabilities]

  21. Example • [Figure: the resulting phrase alignment for the Japanese-English tree pair, including the ROOT-to-ROOT link]

  22. Outline • Background • Tree-based Probabilistic Phrase Alignment Model • Model Training • Symmetrization Algorithm • Experiments • Conclusions

  23. Symmetrization Algorithm • Since our model is directed, we run it bi-directionally and symmetrize the two alignment results heuristically • The symmetrization algorithm is similar to [Koehn et al., 2003], which uses the 1-best GIZA++ word alignment of each direction • Our algorithm exploits the n-best alignment results of each direction • Three steps (a code sketch follows the step descriptions below): • Superimposition • Growing • Handling isolations

  24. Symmetrization Algorithm 1. Superimposition • [Figure: the source-to-target 5-best and target-to-source 5-best alignments superimposed]

  25. Symmetrization Algorithm 1. Superimposition (cont.) • Definitive alignment points are adopted • These are the points that have no equally or higher scored point in the same row or column • Conflicting points are discarded • These are the points that share a row or column with an adopted point and are not contiguous to it on the tree

  26. Symmetrization Algorithm 2. Growing • Adopt points contiguous to already-adopted points in both the source and target tree • In descending order of score • From top to bottom • From left to right • Discard conflicting points • These are the points that have an adopted point in both the same row and the same column

  27. Symmetrization Algorithm 3. Handling Isolations • Adopt points whose source and target phrases are not yet aligned to any phrase in either language
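A rough sketch of the three steps, assuming each direction's n-best output is a list of (links, score) pairs where `links` is a set of (i, j) phrase-index pairs, and tree contiguity is abstracted into a caller-supplied predicate (the hypothetical `contiguous`); this illustrates the control flow rather than the exact heuristics of the original algorithm.

```python
from collections import defaultdict

def symmetrize(src2tgt_nbest, tgt2src_nbest, contiguous):
    """src2tgt_nbest / tgt2src_nbest: lists of (set_of_links, score).
    contiguous(p, q): True if links p and q are neighbours on both trees."""
    # 1. Superimposition: accumulate each link's score over all n-best lists
    score = defaultdict(float)
    for links, s in src2tgt_nbest + tgt2src_nbest:
        for link in links:
            score[link] += s
    adopted, discarded = set(), set()
    def strict_max_in_row_col(p):
        i, j = p
        rivals = [q for q in score if q != p and (q[0] == i or q[1] == j)]
        return all(score[p] > score[q] for q in rivals)
    for p in sorted(score, key=score.get, reverse=True):
        if p in discarded or not strict_max_in_row_col(p):
            continue
        adopted.add(p)                               # definitive point
        for q in score:                              # discard conflicting points
            if q not in adopted and (q[0] == p[0] or q[1] == p[1]) \
                    and not contiguous(p, q):
                discarded.add(q)
    # 2. Growing: adopt remaining points contiguous to an adopted point
    for p in sorted(score, key=score.get, reverse=True):
        if p in adopted or p in discarded:
            continue
        if any(contiguous(p, q) for q in adopted):
            same_row = any(q[0] == p[0] for q in adopted)
            same_col = any(q[1] == p[1] for q in adopted)
            if same_row and same_col:                # conflict in both directions
                discarded.add(p)
            else:
                adopted.add(p)
    # 3. Handling isolations: adopt points whose phrases are still unaligned
    for p in sorted(score, key=score.get, reverse=True):
        if p not in adopted and p not in discarded \
                and all(q[0] != p[0] and q[1] != p[1] for q in adopted):
            adopted.add(p)
    return adopted
```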

  28. Alignment Experiment • Training corpus • A Japanese-English paper abstract corpus provided by JST, consisting of about 1M parallel sentences • Gold-standard alignment • 100 sentence pairs from the training corpus, manually annotated • Sure (S) alignments only [Och and Ney, 2003] • Evaluation unit • Morpheme-based for Japanese • Word-based for English • Iterations • 5 iterations for Model 1 and 5 iterations for Model 2

  29. Alignment Experiment (cont.) • Comparative experiment (word-based alignment) • GIZA++ and various symmetrization heuristics [Koehn et al., 2007] • Default settings for GIZA++ • Original (base) forms of words used for both Japanese and English

  30. Results

  31. Example of Alignment Improvement • [Figure: alignment matrices comparing the proposed model with word-based alignment]

  32. Translation Experiments • Training corpus • Same as in the alignment experiments • Test corpus • 500 paper abstract sentences • Decoder • Moses [Koehn et al., 2007] • Default options except for the phrase table limit (20 → 10) and the distortion limit (6 → -1) • No minimum error rate training • Evaluation • BLEU • Punctuation removed, case-insensitive

  33. Results • The definition of function words is imperfect • Articles? Auxiliary verbs? ... • A tree-based decoder is necessary • BLEU is essentially insensitive to syntactic structure • Translation quality is potentially improved

  34. Potentially Improved Example • Input: これ は LB 膜 の 厚み が アビジン を 吸着 する こと で 増加 した こと に よる 。 • Proposed (30.13): this is due to the increase in the thickness of the lb film avidin adsorb • GIZA++ (33.78): the thickness of the lb film avidin to adsorption increased by it • Reference: this was due to increased thickness of the lb film by adsorbing avidin

  35. Conclusion • A tree-based probabilistic phrase alignment model using dependency tree structures • Phrase translation probability • Dependency relation probability • N-best symmetrization algorithm • Achieves higher alignment accuracy than word-based models • Syntactic information is useful during the alignment process • BUT: unable to improve the BLEU scores of translation

  36. Future Work • A more flexible model • Content words sometimes correspond to function words and vice versa • Integrate parsing probabilities into the model • Parsing errors easily lead to alignment errors • By integrating parsing probabilities, parsing results and alignments can be revised in a complementary fashion • More syntactic information • Incorporate POS tags or phrase categories into the model

  37. Thank You!
