200 likes | 233 Views
Oasis A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models MAHCINE INTELLIGENCE AND TRANSLATION LAB HARBIN INSTITUTE OF TECHNOLOGY. Oasis. A user manual for Oasis The files needed by Oasis Some explanations for the parameters A summary for the parameters
E N D
OasisA Beam Search Decoder for Phrase-Based Statistical Machine Translation Models MAHCINE INTELLIGENCE AND TRANSLATION LAB HARBIN INSTITUTE OF TECHNOLOGY
Oasis • A user manual for Oasis • The files needed by Oasis • Some explanations for the parameters • A summary for the parameters • The details for the decoding process • Oasis for Phrase-based SMT • Core algorithm
A User Manual for Oasis • The Designers • Li Jun, Liang Huashen, Jiang Hongfei, Sun Jiadong • CPU: P4 2.0GHz or higher • Ram: 512M or larger • Windows —— Visual C++ 6.0 or Linux —— gcc 3.4.2
The Files Needed by Oasis • Phrase-Table File • Language-Model File --------ARPA • The Configuration File for Oasis (The User Manual for Oasis)
Some Explanations for The Parameters(1) • [ttable-limit] ---top n Translation, Language Model, English Phrase Length • [stack] & [threshold] -s:maximum size of the beam (default 100) -b:minimum threshold of the beam (default 0.00001)
Some Explanations for The Parameters(2) • [distortion] d = abs(last word position of previously translated phrase + 1- first word position of newly translated phrase) -dl:Maximum distance between two input phrase that are translated to two neighboring output phrase -d:Minimum of the distortion score
Some explanations for the parameters(3) • [lm-limit] -m:minimum score of the language model of the phrase • [nbest] -l: generate n-best lattice
A summary for the parameters • -f Specify the Configuration file • -in Specify the input data • -out Specify the output data • -s Specify the maximum size of the beam, 100 • -b Specify the beam threshold, default 0.00001 • -l Specify the N-best output, default 1 • -d Specify the min distortion score,-2.30259 • -dl Specify the distortion distance, -9 • -m Specify
The details for the decoding process • Translation options • Future cost • Hypothesis element
An example for the details • creating hypothesis 1 from 0 • base score 0 • translation cost -1.28215 • distortion cost 0 • language model cost for 'same' -2.57302 • language model cost for 'the' -1.91582 • word penalty 2 • score -3.77099 + futureCost -15.8278 = -19.5988 • new best estimate for this stack • merged hypothesis on stack 1, now size 1
Oasis for Phrase-based SMT • Translation Option [ 同样 ] • the same, -1.28215, -4.05183 [ 的 ] in, -1.8011, -2.94581 right, -2.71656, -4.88024 of a, -2.82411, -4.96468 flight, -2.62181, -4.90627 's, -2.5144, -4.40283 of the, -2.21918, -3.63043 [ 东西 ] anything, -2.03706, -4.64327 came, -2.2911, -5.35805
Future cost (1) • future costs from 0 to 0 is -4.05183 • future costs from 0 to 1 is -4.95449 • …… • future costs from 0 to 7 is -22.5568 • future costs from 0 to 8 is -19.8797 • future costs from 1 to 1 is -2.29265 • future costs from 1 to 2 is -4.18843 • …… • future costs from 6 to 7 is -5.82631 • future costs from 6 to 8 is -1.82436 • future costs from 7 to 7 is -3.53366 • future costs from 7 to 8 is -0.85651 • future costs from 8 to 8 is -0.68513
Future cost (2) • The calculation of future cost translation option cost language model cost
Core algorithm • The generation of a phrase table Future cost • The hypothesis and state expansion • Beam search • The generation of the English sentence
The structure of the hypothesis data • The present cost • The translation cost for the new translation phrase • The distortion cost • The penalty for the new translation option • Language cost • The future cost • The new phrase positions in the foreign sentence • The new phrase • The last two english words generated • ID
Recombining hypotheses • The foreign words covered so far • The last two English words generated • The last word of the last foreign phrase covered
Beam search • The fixed and relative threshold • The stack