640 likes | 808 Views
Forest-to-String Statistical Translation Rules. Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese Academy of Sciences. Outline. Introduction Forest-to-String Translation Rules Training Decoding Experiments Conclusion.
E N D
Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese Academy of Sciences
Outline • Introduction • Forest-to-String Translation Rules • Training • Decoding • Experiments • Conclusion
Syntactic and Non-syntactic Bilingual Phrases NP NP VP NR NN VV NN syntactic non-syntactic BUSH PRESIDENT MADE SPEECH President Bush made a speech
Importance of Non-syntactic Bilingual Phrases • About 28% of bilingual phrases are non-syntactic on a English-Chinese corpus (Marcu et al., 2006). • Requiring bilingual phrases to be syntactically motivated will lose a good amount of valuable knowledge (Koehn et al., 2003). • Keeping the strengths of phrases while incorporating syntax into statistical translation results in significant improvements (Chiang, 2005) .
Previous Work Galley et al., 2004 NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech
Previous Work Marcu et al., 2006 *NPB_*NN DT JJ NPB the mutual DT JJ NN THE MUTUAL the mutual understanding NPB THE MUTUAL UNDERSTANDING *NPB_*NN NN
Previous Work Liu et al., 2006 NP NP VP NP NR NN VV NN NR NN BUSH PRESIDENT MADE SPEECH BUSH PRESIDENT President Bush President Bush
Our Work • We augment the tree-to-string translation model with • forest-to-string rules that capture non-syntactic phrase pairs • auxiliary rules that help integrate forest-to-string rules into the tree-to-string model
Outline • Introduction • Forest-to-String Translation Rules • Training • Decoding • Experiments • Conclusion
Tree-to-String Rules VP IP SB VP NN WAS NP VV NP VP PU GUNMAN NN KILLED the gunman was killed by
IP NP VP PU NN SB VP . GUNMAN WAS NP VV NN KILLED POLICE killed the gunman was by police . Derivation • A derivation is a left-most composition of translation rules that explains how a source parse tree, a target sentence, and the word alignment between them are synchronously generated.
A Derivation Composed of TRs IP IP NP VP PU NP VP PU
A Derivation Composed of TRs IP NP VP PU NP NN NN GUNMAN GUNMAN the gunman the gunman
A Derivation Composed of TRs IP VP NP VP PU SB VP NN SB VP WAS NP VV GUNMAN WAS NP VV NN KILLED NN KILLED was killed by killed the gunman was by
A Derivation Composed of TRs IP NP VP PU NN NN SB VP POLICE GUNMAN WAS NP VV NN KILLED POLICE police killed the gunman was by police
IP NP VP PU NN SB VP . GUNMAN WAS NP VV NN KILLED POLICE killed the gunman was by police . A Derivation Composed of TRs PU . .
Forest-to-String and Auxiliary Rules IP NP NP VP PU NN SB SB VP GUNMAN WAS the gunman was forest = tree sequence ! care about only root sequence while incorporating forest rules
A Derivation Composed of TRs, FRs, and ARs IP IP NP VP PU NP VP PU SB VP SB VP
A Derivation Composed of TRs, FRs, and ARs IP NP NP VP PU NN SB NN SB VP GUNMAN WAS GUNMAN WAS the gunman was the gunman was
A Derivation Composed of TRs, FRs, and ARs IP VP NP VP PU NP VV PU NN SB VP . . KILLED GUNMAN WAS NP VV KILLED . killed by killed the gunman was by .
A Derivation Composed of TRs, FRs, and ARs IP NP NP VP PU NN SB VP . NN GUNMAN WAS NP VV POLICE NN KILLED POLICE police killed the gunman was by police .
Outline • Introduction • Forest-to-String Translation Rules • Training • Decoding • Experiments • Conclusion
Training • Extract both tree-to-string and forest-to-string rules from word-aligned, source-side parsed bilingual corpus • Bottom-up strategy • Auxiliary rules are NOT learnt from real-world data
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example NR BUSH Bush
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example NN PRESIDENT President
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example VV MADE made
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example NN SPEECH speech
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example NP NP NR NN NR NN PRESIDENT President NP NP NR NN NR NN BUSH BUSH PRESIDENT Bush President Bush
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example VP VP VV NN VV NN MADE a made a VP VP VV NN VV NN SPEECH MADE SPEECH a speech made a speech
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example NP VV NP NR NN VV 10 FRs NP NR NN VV BUSH PRESIDENT MADE President Bush made
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH President Bush made a speech An Example NP NP VP max_height = 2
Why We Don’t Extract Auxiliary Rules ? IP NP VP-B NP-B NP-B VV NR NR NN CC NN NN STEP SHANGHAI PUDONG DEVE WITH LEGAL ESTAB The development of Shanghai ‘s Pudong is in step with the establishment of its legal system
Outline • Introduction • Forest-to-String Translation Rules • Training • Decoding • Experiments • Conclusion
Decoding • Input: a source parse tree Output: a target sentence • Bottom-up strategy • Build auxiliary rules while decoding • Compute subcell divisions for building auxiliary rules
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule NR BUSH Bush Derivation ( NR BUSH ) ||| Bush ||| 1:1 Translation Bush
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule NN PRESIDENT President Derivation ( NN PRESIDENT ) ||| President ||| 1:1 Translation President
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule VV MADE made Derivation ( VV MADE ) ||| made ||| 1:1 Translation made
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule NN SPEECH speech Derivation ( NN SPEECH ) ||| speech ||| 1:1 Translation speech
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule NP NR NN Derivation ( NP ( NR ) ( NN ) ) ||| X1 X2 ||| 1:2 2:1 ( NR BUSH ) ||| Bush ||| 1:1 ( NN PRESIDENT ) ||| President ||| 1:1 Translation President Bush
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule Derivation Translation
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example VP Rule VV NN MADE made a Derivation ( VP ( VV MADE ) ( NN ) ) ||| made a X ||| 1:1 2:3 ( NN SPEECH ) ||| speech ||| 1:1 Translation made a speech
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule NP NR NN VV PRESIDENT MADE President made a Derivation ( NP ( NN ) ( NN PRESIDENT ) ) ( VV MADE ) ||| President X made a ||| 1:2 2:1 3:3 ( NR BUSH ) ||| Bush ||| 1:1 Translation President Bush made a
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule Derivation Translation
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH An Example Rule NP NP VP VV NN Derivation ( NP ( NP ) ( VP ( VV ) ( NN ) ) ) ||| X1 X2 || 1:1 2:1 3:2 ( NP ( NN ) ( NN PRESIDENT ) ) ( VV MADE ) ||| President X made a ||| 1:2 2:1 3:3 ( NR BUSH ) ||| Bush ||| 1:1 ( NN SPEECH ) ||| speech ||| 1:1 Translation President Bush made a speech
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH Subcell Division 1:1 2:2 3:3 4:4
NP NP VP NR NN VV NN BUSH PRESIDENT MADE SPEECH Subcell Division 1:3 4:4