Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words
Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
{songlinfeng,xiejun,wangxing,lvyajuan,liuqun}@ict.ac.cn
Motivation
• Spoken language translation suffers from a serious problem: missing content words.
• Example: no, you need 10 minutes to go to the main street, (the bus) comes every 10 minutes
Motivation
• Further investigation shows that this is caused by the use of incorrect MT rules.
source: 我 想 买 茶叶 送给 家人 做 礼物 。
rule: #X1# 茶叶 #X2# -> #X1# #X2#
#X1#: 我 想 买 -> I would like to buy
#X2#: 送给 家人 做 礼物 。 -> souvenir for my family .
result: I would like to buy souvenir for my family .
• The rule leaves the content word 茶叶 ("tea") untranslated.
Motivation
• The classic SMT framework has no specific feature for distinguishing bad rules from good ones.
• An obvious way to tackle the problem is to give the system a way to tell bad MT rules apart from good ones.
Two rules
• R1 (a good rule): 推荐 的 茶 -> tea recommended
• R2 (a bad rule that misses the translation of the content word “推荐”): 推荐 的 茶 -> tea
Two rules
• R1: 推荐 的 茶 -> tea recommended
• R2: 推荐 的 茶 -> tea
• R2 may be favored by a classic MT system because it generates a shorter translation.
Our Model
• R1: 推荐 的 茶 -> tea recommended
• R2: 推荐 的 茶 -> tea
Training
• bilingual corpus with word alignment information:
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
Training
• bilingual corpus with word alignment information:
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
• extracted phrase pairs:
推荐 -> recommended
茶 -> tea
日本 -> japanese
日本 茶 -> japanese tea
Training
• each phrase is labeled as a content phrase or not a content phrase
• stoplist: 么 吗 的 …
• bilingual corpus with word alignment information (content words are labeled in bold face):
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
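The stoplist filter on this slide can be sketched roughly as follows. This is only an illustration under assumptions: the function name `content_words` is hypothetical, and the stoplist contains just the three entries visible on the slide, whereas the real list is longer (the slide ends it with "…").

```python
# Stoplist entries visible on the slide; the full list is elided there ("…").
STOPLIST = {"么", "吗", "的"}


def content_words(tokens, stoplist=STOPLIST):
    """Return the tokens that are not in the stoplist (assumed definition
    of a content word for this sketch)."""
    return [tok for tok in tokens if tok not in stoplist]


source = "这里 有 推荐 的 日本 茶 吗".split()
print(content_words(source))
```

With only these three stoplist entries, 这里 and 有 also pass the filter; a fuller stoplist would presumably remove more function words.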
Training
• bilingual corpus with word alignment information:
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
• extracted phrase pairs:
推荐 -> recommended
茶 -> tea
日本 -> japanese
日本 茶 -> japanese tea
…
• Co-relation table:
茶 | tea | 13.76
茶 | Japanese tea | 4.89
…
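As a rough illustration of how a table of this shape could be built from extracted phrase pairs, the sketch below scores each pair with pointwise mutual information (PMI). The slides do not say which measure produces scores such as 13.76, so PMI is a swapped-in stand-in, and the function name `corelation_table` and the toy pair list are hypothetical.

```python
import math
from collections import Counter


def corelation_table(extracted_pairs):
    """Score each (source phrase, target phrase) pair with PMI.

    extracted_pairs holds one entry per extraction from the aligned corpus.
    PMI is an assumption of this sketch, not the measure used in the slides.
    """
    pair_counts = Counter(extracted_pairs)
    src_counts = Counter(src for src, _ in extracted_pairs)
    tgt_counts = Counter(tgt for _, tgt in extracted_pairs)
    total = len(extracted_pairs)
    return {
        (src, tgt): math.log(count * total / (src_counts[src] * tgt_counts[tgt]))
        for (src, tgt), count in pair_counts.items()
    }


# Toy extraction list (hypothetical counts, for illustration only).
pairs = [
    ("茶", "tea"),
    ("茶", "tea"),
    ("茶", "japanese tea"),
    ("推荐", "recommended"),
]
for key, score in corelation_table(pairs).items():
    print(key, round(score, 3))
```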
Two penalties
• Source Unaligned Penalty
  • the number of unaligned source content words in a rule
• Target Unaligned Penalty
  • the number of unaligned target content words in a rule
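The two counts can be sketched for rules R1 and R2 as below. Several things here are assumptions rather than details from the slides: a content word is treated as aligned when the co-relation table pairs it with some content word on the other side of the rule, the target stoplist `TGT_STOP` is invented, and the score for 推荐/recommended is a placeholder (only the 茶/tea score appears on the slides).

```python
SRC_STOP = {"么", "吗", "的"}   # entries shown on the stoplist slide
TGT_STOP = {"the", "a", "of"}   # hypothetical target-side stoplist

# Toy co-relation table. Only the 茶/tea score comes from the slides.
CORELATION = {
    ("茶", "tea"): 13.76,
    ("推荐", "recommended"): 1.0,  # placeholder score, not from the slides
}


def unaligned_penalties(src, tgt, table=CORELATION):
    """Return (source unaligned penalty, target unaligned penalty) for a rule.

    Assumed alignment test: a content word is aligned if the table contains
    a pair linking it to a content word on the other side of the rule.
    """
    src_content = [w for w in src if w not in SRC_STOP]
    tgt_content = [w for w in tgt if w not in TGT_STOP]
    src_penalty = sum(
        1 for s in src_content if not any((s, t) in table for t in tgt_content)
    )
    tgt_penalty = sum(
        1 for t in tgt_content if not any((s, t) in table for s in src_content)
    )
    return src_penalty, tgt_penalty


r1 = unaligned_penalties("推荐 的 茶".split(), "tea recommended".split())
r2 = unaligned_penalties("推荐 的 茶".split(), "tea".split())
print(r1)  # (0, 0)
print(r2)  # (1, 0)
```

Under these assumptions R2 receives a source unaligned penalty of 1 for the untranslated 推荐, which gives the decoder a feature that separates it from R1.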
Experiment
• Data Sets
  • training: 280K CH-EN spoken language sentences
  • tuning: DEVSET2 of IWSLT 2010
  • test: DEVSET3 ~ DEVSET6 of IWSLT 2010
  • the training set is also used to train our model
Thanks Q & A