Machine Translation Distortion Model

Machine Translation: Distortion Model. Stephan Vogel, Spring Semester 2011. Recap of distortion models (DM) in word alignment models: the HMM alignment jump model, which can be conditioned on word classes; the balance between data and model parameters (larger corpora allow richer models); and the distance model.

Presentation Transcript


  1. Machine Translation: Distortion Model. Stephan Vogel, Spring Semester 2011. Stephan Vogel - Machine Translation

  2. Recap: DM in Word Alignment Models
     • HMM alignment: jump model
     • Can be conditioned on word classes
     • Balance between data and parameters in model
     • Larger corpora -> richer models
     [Figure: alignment grid with axes F and E, jumps labelled 3, 0, -1, 2]
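The jump model can be illustrated with a relative-frequency estimate of the jump widths a_j - a_{j-1}. This is a minimal sketch, not the lecture's implementation: the function name `estimateJumpDistribution` and the representation of an alignment as a vector of 0-based positions are assumptions, and unaligned words and word-class conditioning are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: relative-frequency estimate of p(a_j - a_{j-1})
// from a single alignment vector. alignment[j] is the position a_j that
// word j is aligned to (0-based); every word is assumed to be aligned.
std::map<int, double> estimateJumpDistribution(const std::vector<int>& alignment) {
    std::map<int, double> dist;
    if (alignment.size() < 2) return dist;  // no jumps to count

    for (std::size_t j = 1; j < alignment.size(); ++j) {
        assert(alignment[j] >= 0 && alignment[j - 1] >= 0);
        dist[alignment[j] - alignment[j - 1]] += 1.0;  // count this jump width
    }

    // Turn counts into relative frequencies.
    const double total = static_cast<double>(alignment.size() - 1);
    for (auto& entry : dist) entry.second /= total;
    return dist;
}
```

For the alignment positions 0, 3, 3, 2, 4 the jumps are 3, 0, -1, 2, so each jump width receives probability 0.25. With a larger corpus the same counts could be kept per word class, which is the data-versus-parameters trade-off the slide mentions.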

  3. Distance Model
     • The decoder typically generates the target sequence sequentially, while jumping back and forth on the source sentence
     • Simplest reordering model
     • Cost of a reordering depends only on the distance of the reordering
     • Distribution can be estimated from alignment
     • Or just a Gaussian with mean 1
     • Or log p( a_j | a_{j-1}, I ) ∝ -| a_j - a_{j-1} |, i.e. reordering cost proportional to distance
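A distance-proportional cost can be sketched in a few lines. The version below assumes the common phrase-based convention that a monotone continuation costs nothing, so the distance is measured as |curStart - prevEnd - 1|; this offset, the function names, and the `weight` parameter are assumptions for illustration and are not taken from the slides.

```cpp
#include <cassert>
#include <cstdlib>

// Distance-based reordering cost over 0-based source positions.
// prevEnd:  last source position covered by the previous phrase
// curStart: first source position covered by the current phrase
// A monotone continuation (curStart == prevEnd + 1) has distance 0.
int distortionDistance(int prevEnd, int curStart) {
    return std::abs(curStart - prevEnd - 1);
}

// Log-score proportional to the negative distance.
// `weight` is a hypothetical non-negative scaling factor.
double distortionLogScore(int prevEnd, int curStart, double weight) {
    assert(weight >= 0.0);
    return -weight * distortionDistance(prevEnd, curStart);
}
```

Under these assumptions, continuing at position 3 after a phrase ending at 2 has distance 0, jumping forward to position 6 has distance 3, and jumping back from 5 to 2 has distance 4.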

  4. Lexicalized Reordering Models
     • Instead of conditioning on classes, condition on actual words
     • Different possibilities:
        • Condition on source words vs target words
        • Condition on words at start of jump (out-bound) vs words at landing point (in-bound)
     [Figures: two alignment grids, axes F and E]

  5. Block Distortion Model
     • Given current block, look at links at the corners
     • Top: how did I come from previous phrase?
     • Bottom: how do I continue to next phrase?
     [Figure: axes F and E; current block with corners Left Top, Right Top, Left Bottom, Right Bottom; previous block above, next block below]

  6. Block Distortion Model
     • Top-Left: prev-to-current = monotone
     [Figure: axes F and E; previous block linked at the Left Top corner of the current block]

  7. Block Distortion Model
     • Top-Right: prev-to-current = swap
     [Figure: axes F and E; previous block linked at the Right Top corner of the current block]

  8. Block Distortion Model
     • Neither top-left nor top-right: prev-to-current = disjoint
     [Figure: axes F and E; previous block not adjacent to the current block]

  9. Block Distortion Model
     • Bottom-Right: current-to-next = monotone
     [Figure: axes F and E; current block and next block]

  10. Block Distortion Model
     • Bottom-Left: current-to-next = swap
     [Figure: axes F and E; current block and next block]

  11. Block Distortion Model
     • Neither bottom-left nor bottom-right: current-to-next = disjoint
     [Figure: axes F and E; current block and next block]

  12. Moses Code
      // orientation to previous E
      bool connectedLeftTop  = isAligned( sentence, startF-1, startE-1 );
      bool connectedRightTop = isAligned( sentence, endF+1,   startE-1 );
      if      ( connectedLeftTop && !connectedRightTop) extractFileOrientation << "mono";
      else if (!connectedLeftTop &&  connectedRightTop) extractFileOrientation << "swap";
      else                                              extractFileOrientation << "other";

      // orientation to following E
      bool connectedLeftBottom  = isAligned( sentence, startF-1, endE+1 );
      bool connectedRightBottom = isAligned( sentence, endF+1,   endE+1 );
      if      ( connectedLeftBottom && !connectedRightBottom) extractFileOrientation << " swap";
      else if (!connectedLeftBottom &&  connectedRightBottom) extractFileOrientation << " mono";
      else                                                    extractFileOrientation << " other";
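The snippet above depends on `isAligned` and `sentence`, which the slide does not show. The following is a self-contained sketch of the same corner test, assuming the alignment is stored as a set of 0-based (f, e) links. The `Alignment` type, the `Orientation` struct, and the `classify` function are hypothetical stand-ins, and this simplified `isAligned` gives no special treatment to sentence boundaries.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the sentence pair: a set of (f, e) links.
using Alignment = std::set<std::pair<int, int>>;

// True if source position f is linked to target position e.
// Out-of-range coordinates (such as -1) simply have no link here.
bool isAligned(const Alignment& links, int f, int e) {
    return links.count(std::make_pair(f, e)) > 0;
}

struct Orientation {
    std::string toPrevious;  // orientation to previous E
    std::string toNext;      // orientation to following E
};

// Classify a block spanning source [startF, endF] and target [startE, endE]
// by looking at the alignment links at its four corners.
Orientation classify(const Alignment& links,
                     int startF, int endF, int startE, int endE) {
    assert(startF <= endF && startE <= endE);
    Orientation result;

    const bool leftTop  = isAligned(links, startF - 1, startE - 1);
    const bool rightTop = isAligned(links, endF + 1,   startE - 1);
    if (leftTop && !rightTop)      result.toPrevious = "mono";
    else if (!leftTop && rightTop) result.toPrevious = "swap";
    else                           result.toPrevious = "other";

    const bool leftBottom  = isAligned(links, startF - 1, endE + 1);
    const bool rightBottom = isAligned(links, endF + 1,   endE + 1);
    if (leftBottom && !rightBottom)      result.toNext = "swap";
    else if (!leftBottom && rightBottom) result.toNext = "mono";
    else                                 result.toNext = "other";

    return result;
}
```

With the diagonal alignment {(0,0), (1,1), (2,2)}, the middle block (startF = endF = startE = endE = 1) is monotone in both directions. With the crossed alignment {(1,0), (0,1)}, the block at F = 0, E = 1 is a swap relative to the previous target word and "other" toward the next one, because no link exists beyond it.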

  13. Block Distortion Model
     • For each phrase pair 6 counts: 2 groups of 3
        • From previous: monotone, swap, other
        • To next: monotone, swap, other
     • Normalize for each group
     • We do not model p( orientation | phrase_pair_1, phrase_pair_2 )
        • Many overlapping and embedded blocks
        • Would be too sparse
     • We model p( orientation | phrase_pair, entering ) and p( orientation | phrase_pair, leaving )
     • I.e. not really looking at the previous block, but only at the alignment link
     • For each entry in the phrase table we have an entry in the distortion model
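The per-group normalization can be sketched as follows. The layout of the six counts (indices 0-2 for "from previous", 3-5 for "to next", each ordered monotone, swap, other) and the names `Counts` and `normalizeGroups` are assumptions made for this illustration; no smoothing is applied here.

```cpp
#include <array>
#include <cassert>

// Six orientation counts for one phrase pair (assumed layout):
//   indices 0-2: from previous (monotone, swap, other)
//   indices 3-5: to next       (monotone, swap, other)
using Counts = std::array<double, 6>;

// Normalize each group of three separately so that each group sums to 1.
Counts normalizeGroups(const Counts& counts) {
    Counts probs{};
    for (int group = 0; group < 2; ++group) {
        double total = 0.0;
        for (int k = 0; k < 3; ++k) total += counts[3 * group + k];
        assert(total > 0.0);  // each group needs at least one observation
        for (int k = 0; k < 3; ++k) {
            probs[3 * group + k] = counts[3 * group + k] / total;
        }
    }
    return probs;
}
```

For counts 6, 1, 1 (from previous) and 2, 2, 4 (to next), this yields 0.75, 0.125, 0.125 and 0.25, 0.25, 0.5. The two groups are normalized independently because they correspond to the two separate distributions, entering and leaving.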

  14. Distortion Model Table
      Format: source phrase ||| target phrase ||| from previous (monotone swap other), to next (monotone swap other)

      acuerdo con el lugar de ||| according to the place of ||| 0.14286 0.14286 0.71429 0.71429 0.14286 0.14286
      acuerdo con nuestra información ||| according to our information ||| 0.14286 0.14286 0.71429 0.71429 0.14286 0.14286
      acuerdo de pesca con Marruecos ||| fisheries agreement with Morocco ||| 0.92982 0.01754 0.05263 0.78947 0.01754 0.19298
      acuerdo entre Israel y ||| agreement ||| 0.20000 0.20000 0.60000 0.20000 0.20000 0.60000
      acuerdo no porque sea bueno , ||| agreement not because it is good , ||| 0.60000 0.20000 0.20000 0.60000 0.20000 0.20000
      acuerdo sobre este punto ||| agreed on ||| 0.20000 0.20000 0.60000 0.20000 0.20000 0.60000
      acuerdos a largo plazo se iniciaron en ||| long-term arrangements began in ||| 0.60000 0.20000 0.20000 0.60000 0.20000 0.20000
      acuerdos globales , especialmente ||| global agreements - primarily ||| 0.20000 0.20000 0.60000 0.60000 0.20000 0.20000

     • Many entries 0.6 0.2 …
        • Phrase pair seen only once
        • Simple smoothing
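The slide does not say which smoothing is used. The values in the table are, however, consistent with adding 0.5 to each of the three counts in a group before normalizing, so the sketch below assumes that scheme. The function name and the `alpha` parameter are hypothetical.

```cpp
#include <array>
#include <cassert>

// Additive smoothing for one group of three orientation counts
// (monotone, swap, other). alpha = 0.5 is an assumption inferred from
// the table values, not something the slides state.
std::array<double, 3> smoothedOrientationProbs(const std::array<int, 3>& counts,
                                               double alpha = 0.5) {
    assert(alpha > 0.0);
    const double total = counts[0] + counts[1] + counts[2] + 3.0 * alpha;
    std::array<double, 3> probs = {
        (counts[0] + alpha) / total,
        (counts[1] + alpha) / total,
        (counts[2] + alpha) / total
    };
    return probs;
}
```

Under this assumption, a phrase pair seen once with monotone orientation (counts 1, 0, 0) gives 1.5/2.5 = 0.6 and 0.5/2.5 = 0.2 twice, which is the frequent 0.6 0.2 0.2 pattern. Counts 26, 0, 1 give roughly 0.92982, 0.01754, 0.05263, matching the first group of the "acuerdo de pesca con Marruecos" row.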

  15. Distance-based ITG Reordering Model
     • Simple ITG model had very weak reordering model
     • Condition it on size of blocks (subtrees)
     • Condition on distance (e.g. taken from HMM alignment)
     [Figure: alignment grid, axes F and E]

  16. Summary
     • Distortion models in word alignment models
     • Decoders work on phrases -> distortion models for phrases
     • In Moses: block reordering (also called lexicalized)
        • Conditioned on phrase pair
        • Monotone, swap, disjoint
     • Alternatives
        • Based on words at the boundaries
        • Inbound/Outbound
     • Easy to have lexicalized distortion model for ITG
