
Competitive Grouping in Integrated Segmentation and Alignment Model


Presentation Transcript


  1. Competitive Grouping in Integrated Segmentation and Alignment Model. Ying Zhang, Stephan Vogel. Language Technologies Institute, School of Computer Science, Carnegie Mellon University

  2. Integrated Segmentation and Alignment Model • Phrase alignment models (Och et al., 1999; Marcu and Wong, 2002; Koehn et al., 2003) • Many of these models rely on a pre-calculated word alignment • They use different heuristics to extract phrase pairs from the Viterbi word alignment path • Integrated Segmentation and Alignment (ISA) model (Zhang et al., 2003) • No such word alignments are needed • Segments source and target sentences into phrases and aligns them simultaneously • Uses χ²(f, e) instead of the conditional probability P(f|e) for word pair associations • Greedy search for phrase pairs • Key idea: the competitive grouping algorithm • Inspired by the competitive linking algorithm (Melamed, 1997) for word alignment

  3. Competitive Linking Algorithm • A greedy word alignment algorithm • The word pair with the highest likelihood L(f, e) “wins” the competition • One-to-one assumption: once the pair {f, e} is “linked”, neither f nor e can be aligned with any other word
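In place of the worked example from the slide, here is a minimal Python sketch of the competitive linking loop; the position-based interface and the likelihood table L are illustrative assumptions, not Melamed's original implementation:

    def competitive_linking(I, J, L):
        """Greedy one-to-one word alignment. I and J are the source and
        target sentence lengths; L[i][j] is the likelihood of aligning
        source word i with target word j."""
        # Consider all word pairs, best-scoring first.
        candidates = sorted(
            ((L[i][j], i, j) for i in range(I) for j in range(J)),
            reverse=True)
        linked_i, linked_j, links = set(), set(), []
        for score, i, j in candidates:
            # One-to-one assumption: a word that is already linked is
            # out of the competition on both sides.
            if i not in linked_i and j not in linked_j:
                links.append((i, j))
                linked_i.add(i)
                linked_j.add(j)
        return links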

  4. Competitive Grouping Algorithm • Discards the one-to-one assumption of competitive linking, making it less greedy • When a pair {e, f} wins the competition, it invites the neighboring pairs to join the “winner’s club” • Introduces the locality assumption: a source phrase of adjacent words can only be aligned to a target phrase of adjacent words • Words inside aligned phrase pairs cannot be aligned to other words

  5. Expanding the Aligned Phrase Pair • Two criteria have to be satisfied to expand the seed word pair into a phrase pair • Criterion 1: if a new source word f is to be grouped, the best e that f is associated with must not be “blocked” by this expansion; the same holds for grouping a new target word • Criterion 2: the highest word pair likelihood value in the expanded area needs to be “similar” to the seed value • According to the locality assumption, words in aligned phrase pairs cannot be aligned with other words again
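A minimal sketch of the two expansion checks, assuming a likelihood matrix L (rows are source words, columns are target words) and a candidate block [i0..i1] × [j0..j1] grown from a seed cell (si, sj); the fixed ratio threshold and the helper names are illustrative assumptions, not the exact test from the paper:

    def best_col(L, i):
        # Index of the best target partner for source word i.
        return max(range(len(L[0])), key=lambda j: L[i][j])

    def best_row(L, j):
        # Index of the best source partner for target word j.
        return max(range(len(L)), key=lambda i: L[i][j])

    def may_expand(L, i0, i1, j0, j1, si, sj, ratio=0.5):
        """True if the block [i0..i1] x [j0..j1] around the seed
        (si, sj) satisfies both expansion criteria."""
        # Criterion 1: every grouped word must keep its best partner
        # inside the block; otherwise the expansion would "block" it.
        for i in range(i0, i1 + 1):
            if not j0 <= best_col(L, i) <= j1:
                return False
        for j in range(j0, j1 + 1):
            if not i0 <= best_row(L, j) <= i1:
                return False
        # Criterion 2: the highest likelihood in the expanded area must
        # stay "similar" to the seed value (here: within a fixed ratio).
        others = [L[i][j] for i in range(i0, i1 + 1)
                  for j in range(j0, j1 + 1) if (i, j) != (si, sj)]
        return not others or max(others) >= ratio * L[si][sj]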

  6. Exploring All Possible Phrase Pairs • Criterion 2 is used to control the granularity of the aligned phrase pairs • Two short phrase pairs • Or one long phrase pair • Short phrases give better coverage of unseen test data • Long phrases encapsulate more context, e.g. local reordering and word senses • It is hard to decide on the optimal granularity without knowing the test data • Solution: for each grouping, try all possible granularities
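An illustrative enumeration of the possible granularities around one seed, reusing may_expand from the previous sketch; the max_len bound on the extension in each direction is an assumption made for the example, not a limit from the model:

    def all_granularities(L, si, sj, max_len=4):
        """Collect every block around the seed (si, sj) that passes
        both expansion criteria, so that short and long phrase pairs
        can coexist."""
        I, J = len(L), len(L[0])
        pairs = []
        for i0 in range(max(0, si - max_len + 1), si + 1):
            for i1 in range(si, min(I, si + max_len)):
                for j0 in range(max(0, sj - max_len + 1), sj + 1):
                    for j1 in range(sj, min(J, sj + max_len)):
                        if may_expand(L, i0, i1, j0, j1, si, sj):
                            pairs.append((i0, i1, j0, j1))
        return pairs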

  7. Exploring All Possible Phrase Pairs • French: Je déclare reprise la session • English: I declare resumed the session

  8. The Likelihood of Word Associations • The chi-square statistic is used to measure the likelihood of the word association for a pair {e, f} • For each word pair {e, f}, the null hypothesis is: e and f are independent of each other • χ² is calculated to measure how well this hypothesis holds • Construct the contingency table using the counts from the corpus given the current alignment, e.g. a uniform alignment • O11: number of times e and f are aligned • O12: number of times e is aligned with some other f • O21: number of times f is aligned with some other e • O22: number of times some other f is aligned with some other e
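A minimal sketch of the χ² computation from the 2×2 contingency table above; the function name and the plain-count interface are assumptions for illustration:

    def chi_square(o11, o12, o21, o22):
        """Pearson chi-square statistic for the 2x2 table
        [[O11, O12], [O21, O22]]; a larger value makes the independence
        hypothesis for {e, f} less plausible (stronger association)."""
        n = o11 + o12 + o21 + o22
        rows = (o11 + o12, o21 + o22)
        cols = (o11 + o21, o12 + o22)
        observed = ((o11, o12), (o21, o22))
        chi2 = 0.0
        for r in range(2):
            for c in range(2):
                # Expected count under independence:
                # row total * column total / corpus total.
                expected = rows[r] * cols[c] / n
                chi2 += (observed[r][c] - expected) ** 2 / expected
        return chi2

For a 2×2 table this is equivalent to the closed form N(O11·O22 − O12·O21)² / ((O11+O12)(O21+O22)(O11+O21)(O12+O22)).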

  9. In WPT-05 • Submitted results for all four languages • Training data as provided • Language model as provided • Decoder (Pharaoh) as provided

  10. Conclusion • The competitive grouping algorithm is at the core of the ISA model • A simple and efficient model • Results comparable to other phrase alignment models

  11. The Evolution of ISA

  12. Matrix of the Likelihood

  13. Expanding the Phrase Pairs
