Named Entities (NEs) are crucial but often overlooked in machine translation systems, leading to errors or omissions. This study explores the impact of incorporating NEs in training data and introducing additional features to improve translation accuracy. Results show significant enhancements in translation quality and human readability.
Using Named Entity Recognition to Improve Machine Translation
Neeraj Agrawal, Ankush Singla
Named Entities (NEs)
• An important part of a sentence
• Get no special treatment in current MT systems
• As a result, NEs are dropped or mistranslated
Chinese: 27日 中午 , 他们 已 被 安全 转移 到 普吉岛
Ref: they were safely moved to phuket island at noon on the 27 .
Translation: 27 noon , they have been shifted to safe places to 3pm
Adding NEs to training data
• Helps the aligner
• Increases the probability of the corresponding translations
• Added about 0.7M bilingual NEs to the training data after pre-processing
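A minimal sketch of the augmentation step, assuming each bilingual NE is appended to the parallel corpus as its own short sentence pair. The slides do not say what the pre-processing involved; the whitespace trimming, lowercasing of the English side, and de-duplication below are assumptions, and `augment_parallel_corpus` is a hypothetical helper name.

```python
def augment_parallel_corpus(sentence_pairs, ne_pairs):
    """Append bilingual NE pairs to a parallel corpus as extra sentence pairs.

    Assumed pre-processing (not specified in the slides): trim whitespace,
    lowercase the target side, skip empty entries and exact duplicates.
    """
    seen = set()
    extra = []
    for src, tgt in ne_pairs:
        src, tgt = src.strip(), tgt.strip().lower()
        if not src or not tgt or (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        extra.append((src, tgt))
    return list(sentence_pairs) + extra


corpus = [("他们 已 被 安全 转移 到 普吉岛",
           "they were safely moved to phuket island")]
ne_list = [("普吉岛", "Phuket Island"),
           ("普吉岛", "phuket island "),   # duplicate after normalisation
           ("", "unpaired entry")]         # dropped: empty source side

augmented = augment_parallel_corpus(corpus, ne_list)
for src, tgt in augmented:
    print(src, "|||", tgt)
```

Because each NE pair is a one-to-one "sentence", the aligner sees the entity and its translation with no competing words, which is one plausible reading of why the added pairs help alignment.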
Improvement
Ref: they were safely moved to phuket island at noon on the 27 .
Translation: at noon on the 27th , and they have been shifted to safe places to phuket .
Limitation
Ref: bush has also admitted that the training results of iraqi security . . . .
Translation: interim national security seventy - buhi also admitted that . . .
Adding an extra feature
• Source and target should contain the same number of NEs
• Feature value = exp ( | # proper nouns in source - # proper nouns in target | )
• The number of proper nouns is used as a proxy, in the absence of a good NER
• MERT gives the additional feature a negative weight (-0.03), so a larger mismatch lowers the score
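A small sketch of how this feature could be computed from POS-tagged text. The tag set is an assumption (NR from the Chinese Treebank, NNP/NNPS from the Penn Treebank); the slides do not name a tagger. The tagged tokens are illustrative, hand-assigned tags for the 强生 example, and multiplying the feature value by the weight assumes the usual log-linear combination of features.

```python
import math

# Assumed proper-noun tags: NR (Chinese Treebank), NNP/NNPS (Penn Treebank).
PROPER_NOUN_TAGS = {"NR", "NNP", "NNPS"}


def count_proper_nouns(tagged_tokens):
    """Count tokens whose POS tag marks them as proper nouns."""
    return sum(1 for _, tag in tagged_tokens if tag in PROPER_NOUN_TAGS)


def ne_mismatch_feature(src_tagged, tgt_tagged):
    """exp(|#proper nouns in source - #proper nouns in target|)."""
    diff = count_proper_nouns(src_tagged) - count_proper_nouns(tgt_tagged)
    return math.exp(abs(diff))


# Hand-tagged, illustrative tokens (not the output of a real tagger).
source = [("强生", "NR"), ("说", "VV")]
base_case = [("johnson", "NNP"), ("and", "CC"),
             ("johnson", "NNP"), ("said", "VBD")]
cur_model = [("johnson", "NNP"), ("said", "VBD")]

WEIGHT = -0.03  # weight reported in the slides

for name, hypothesis in [("base case", base_case), ("cur model", cur_model)]:
    value = ne_mismatch_feature(source, hypothesis)
    print(f"{name}: feature={value:.3f} contribution={WEIGHT * value:.4f}")
```

With equal counts the feature is exp(0) = 1, its minimum; every extra or missing proper noun multiplies it by e, so under a negative weight the hypothesis with the mismatch is penalised more.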
Chinese: 强生 说 : " 她 已经 沈沦 到底 了 。 "
Reference: " she 's gone off the deep end , " johnson says .
Base case: johnson and johnson said : " she has been to the end . "
Cur model: johnson said : " she has been to the end . "

Chinese: . . . 中国 总理 . . .
Reference: chinese premier
Base case: chinese prime minister
Cur model: chinese premier
Class-Based Language Model
• NEs are given a low probability if they do not appear in the training data, so two otherwise identical sentences can score very differently:
  "David is going for a walk"
  "Ankit is going for a walk"
• Replace person and organization names with special tokens
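The replacement step can be sketched as follows, assuming entity spans are already available from some NER or name list. The class tokens `<PERSON>` and `<ORG>`, the span format, and the helper name `replace_entities` are all hypothetical; the slides only say that person and organization names are replaced by special tokens.

```python
# Hypothetical class tokens for the two entity types named in the slides.
CLASS_TOKENS = {"PERSON": "<PERSON>", "ORG": "<ORG>"}


def replace_entities(tokens, entity_spans):
    """Replace each person/organization span with a single class token.

    entity_spans: list of (start, end, label), end exclusive. Spans with
    other labels, or with end <= start, are left untouched.
    """
    spans = {start: (end, label) for start, end, label in entity_spans}
    out = []
    i = 0
    while i < len(tokens):
        end, label = spans.get(i, (i, None))
        if label in CLASS_TOKENS and end > i:
            out.append(CLASS_TOKENS[label])
            i = end  # skip the whole entity span
        else:
            out.append(tokens[i])
            i += 1
    return out


seen_name = replace_entities("David is going for a walk".split(),
                             [(0, 1, "PERSON")])
unseen_name = replace_entities("Ankit is going for a walk".split(),
                               [(0, 1, "PERSON")])
print(" ".join(seen_name))
print(" ".join(unseen_name))
```

Both sentences map to the same token sequence, so a language model trained on the class-token text assigns them the same probability regardless of whether the name itself appeared in the training data.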
Improvement:
Reference: however , powell still firmly holds . . .
Base case: but the ball still insisted that the . . .
Cur model: however , powell still insisted that the . . .
Limitation:
Base case: . . . with the support of about 30 people , shook hands , . . .
Reference: " the president shook hands with around 30 supporters before . .
Cur model: . . . about 30 people support president jiang and deputy president jiang shook hands with . . . .
Conclusion
• NEs should be treated separately
• Separate treatment reduced dropping and mistranslation of NEs
• It also helped improve human readability