
Getting the structure right for word alignment: LEAF


Presentation Transcript


  1. Getting the structure right for word alignment: LEAF. Alexander Fraser and Daniel Marcu. Presenter: Qin Gao

  2. Quick summary
      Problem: IBM Models make a 1-N assumption
      Result: significant BLEU improvement (Arabic-English)
      Solutions: a sophisticated generative story; generative estimation of its parameters
      Additional solutions: decompose the model components; semi-supervised training
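As a rough illustration of the 1-N restriction (not from the slides; the data structures below are invented stand-ins), IBM-style models store an alignment as a function from target positions to source positions, so one source word can link to many target words but never the reverse, while LEAF's head-word structure permits M-N correspondences:

```python
# Invented example: IBM Models 1-5 represent an alignment as a function
# a[j] = i from target positions to source positions, so each target word
# links to at most one source word; a source word may thus generate many
# target words (1-N) but never the reverse.
ibm_alignment = {1: 1, 2: 1, 3: 2}      # target pos -> source pos

# LEAF links a source head word to a target head word and attaches
# non-head words to their heads on both sides, so a single minimal
# translational correspondence can be M-N.
leaf_cept = {
    "source_head": 1, "source_nonheads": [2],    # M source words ...
    "target_head": 1, "target_nonheads": [2, 3], # ... linked to N targets
}
print(ibm_alignment, leaf_cept)
```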

  3. The generative story

  4. Minimal translational correspondence (e.g., English "not" corresponding to French "ne ... pas" as one indivisible 2-to-1 unit)

  5. The generative story (running example: source words A B C, target words X Y Z)

  6. 1a. Condition on the source word

  7. 1b. Determine the source word class

  8. 2a. Condition on the source word classes

  9. 2b. Determine the links between the head word and the non-head words

  10. 3a. Condition on the source head word

  11. 3b. Determine the target head word

  12. 4a. Condition on the source head word and its cept size

  13. 4b. Determine the target cept size

  14. 5a. Condition on the existing sentence length

  15. 5b. Determine the number of spurious target words

  16. 6a. Condition on the target word

  17. 6b. Determine the spurious words

  18. 7a. Condition on the target head word's class and the source word

  19. 7b. Determine the non-head word it links to

  20. 8a. Condition on the classes of the source and target head words

  21. 8b. Determine the position of the target head word

  22. 8c. Condition on the target word class

  23. 8d. Determine the positions of the non-head words

  24. 9. Fill the vacant positions uniformly

  25. 10. The resulting real alignment
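Taken together, steps 1-10 can be traced in a minimal, hypothetical Python sketch (all names, data structures, and distributions below are invented stand-ins, not the paper's notation); it only shows which choice conditions on which:

```python
import random

def generate(source):                        # e.g. source = ["A", "B", "C"]
    # Steps 1a/1b: conditioned on each source word, pick its word class
    # (illustrative only; unused below).
    cls = {w: "C(%s)" % w for w in source}
    # Steps 2a/2b: conditioned on the classes, attach non-head words to a
    # head word; here we form a single source cept headed by the first word.
    cept = {"src_head": source[0], "src_nonheads": source[1:]}
    # Steps 3a/3b: conditioned on the source head word, pick the target head.
    cept["tgt_head"] = random.choice(["X", "Y", "Z"])
    # Steps 4a/4b: conditioned on the source head and its cept size, pick
    # the target cept size.
    cept["tgt_size"] = random.randint(1, 3)
    # Steps 5a/5b: conditioned on the length generated so far, pick the
    # number of spurious target words; steps 6a/6b pick their identities.
    spurious = [random.choice(["X", "Y", "Z"])
                for _ in range(random.randint(0, 2))]
    # Steps 7a/7b: conditioned on the target head's class and the source
    # word, pick the target non-head words linked to the head.
    nonheads = [random.choice(["X", "Y", "Z"])
                for _ in range(cept["tgt_size"] - 1)]
    # Steps 8a-8d: conditioned on head-word classes, place the head; then,
    # conditioned on target word classes, place the non-heads. Step 9 drops
    # the spurious words uniformly into the vacant positions (here: shuffle).
    target = [cept["tgt_head"]] + nonheads + spurious
    random.shuffle(target)
    # Step 10: the resulting target sentence and (implicit) real alignment.
    return target

print(generate(["A", "B", "C"]))
```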

  26. Unsupervised parameter estimation
      • Bootstrap from HMM alignments run in both directions
      • Use the intersection to determine head words
      • Use the 1-N alignment to determine target cepts
      • Use the M-1 alignment to determine source cepts
      • Could be infeasible
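A minimal sketch of my reading of this bootstrap (the link sets below are invented): intersection links seed the head-word pairs, while the two directional alignments define the target and source cepts:

```python
# Hypothetical bidirectional HMM alignments, given as sets of
# (source_pos, target_pos) links.
s2t = {(1, 1), (1, 2), (2, 3)}   # source->target (1-N) alignment
t2s = {(1, 1), (2, 3), (3, 3)}   # target->source (M-1) alignment

# Links in the intersection are taken as head-word pairs.
heads = s2t & t2s                # {(1, 1), (2, 3)}

# The 1-N alignment defines each source head's target cept ...
tgt_cept = {i: sorted(j for (s, j) in s2t if s == i) for (i, _) in heads}
# ... and the M-1 alignment defines each target head's source cept.
src_cept = {j: sorted(s for (s, t) in t2s if t == j) for (_, j) in heads}

print(heads, tgt_cept, src_cept)
```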

  27. Training: similar to Models 3/4/5
      • Starting from some alignment (it is not clear how they obtain it), apply one of seven operators to get new alignments (sketched below):
      • Move a French non-head word to a new head
      • Move an English non-head word to a new head
      • Swap the heads of two French non-head words
      • Swap the heads of two English non-head words
      • Swap the English head-word links of two French head words
      • Link an English word to a French word, making new head words
      • Unlink an English and a French head word
      • All alignments that can be generated by one of these operators are called neighbors of the alignment
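A hypothetical skeleton of the neighborhood enumeration (the operator and function names here are invented, not the paper's):

```python
# Each operator takes an alignment and yields all modified copies that one
# application of that operator can produce.
OPERATORS = [
    "move_french_nonhead",         # move French non-head word to a new head
    "move_english_nonhead",        # move English non-head word to a new head
    "swap_french_nonhead_heads",   # swap heads of two French non-head words
    "swap_english_nonhead_heads",  # swap heads of two English non-head words
    "swap_english_head_links",     # swap English head links of two French heads
    "link_new_head_pair",          # link an English/French pair as new heads
    "unlink_head_pair",            # unlink an English/French head pair
]

def neighbors(alignment, apply_operator):
    """All alignments reachable by a single operator application."""
    result = []
    for op in OPERATORS:
        result.extend(apply_operator(alignment, op))
    return result
```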

  28. Training
      • If there is a better alignment in the neighborhood, update the current alignment
      • Continue until no better alignment can be found
      • Collect counts from the last neighborhood
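This is plain hill climbing; a minimal sketch, assuming a `score` function standing in for the model probability and the `neighbors` skeleton above:

```python
def hill_climb(alignment, score, apply_operator):
    # Greedily move to the best-scoring neighbor until none improves; counts
    # for the parameter updates are then collected over the final
    # neighborhood rather than from the single best alignment alone.
    while True:
        nbrs = neighbors(alignment, apply_operator)
        best = max(nbrs, key=score, default=alignment)
        if score(best) <= score(alignment):
            return alignment, nbrs    # local optimum and its neighborhood
        alignment = best
```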

  29. Semi-supervised training
      • Decompose the components of the large generative formula and treat them as features in a log-linear model
      • Add other features as well
      • Train with the EMD (EM + discriminative) algorithm
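A minimal sketch of the log-linear combination being described (feature names and weights are invented for illustration): each decomposed sub-model contributes one feature, combined with additional features under tuned weights:

```python
import math

# Invented stand-ins: each feature maps an alignment to a (log) value.
features = {
    "log_p_source_cepts": lambda a: -1.2,
    "log_p_target_heads": lambda a: -0.7,
    "log_p_distortion":   lambda a: -0.4,
    "extra_heuristic":    lambda a: 0.3,   # a non-generative feature
}
weights = {k: 1.0 for k in features}       # tuned by the discriminative step

def score(alignment):
    # Log-linear model: score(a) = sum_k lambda_k * h_k(a).
    return sum(weights[k] * h(alignment) for k, h in features.items())

print(math.exp(score(None)))               # unnormalized model score
```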

  30. Experiments. First, a rather strange operation: they fully link the alignments from ALL systems and only then compare the performance

  31. Training/Test Set

  32. Experiments
      French/English: phrase-based system
      Arabic/English: hierarchical system (Chiang 2005)
      Baseline: GIZA++ Model 4, union
      Baseline discriminative: only Model 4 components as features

  33. Conclusions (mine)
      The new structural features are useful in discriminative training
      There is no evidence that the generative model is superior to Model 4

  34. Unclear points
      Are the F-scores "biased"?
      No BLEU score is given for unsupervised LEAF
      They used features in addition to the LEAF features, so where does the contribution come from?
