Using Named Entity Recognition to improve Machine Translation


Presentation Transcript


  1. Using Named Entity Recognition to improve Machine Translation
     Neeraj Agrawal, Ankush Singla

  2. Named Entities (NEs)
     • Important part of a sentence
     • No special treatment in current MT systems
     • Results in dropped or mistranslated NEs
     Chinese: 27日 中午 , 他们 已 被 安全 转移 到 普吉岛
     Ref: they were safely moved to phuket island at noon on the 27 .
     Translation: 27 noon , they have been shifted to safe places to 3pm

  3. Adding NEs to training data
     • Helps the aligner
     • Increases the probability of the corresponding NE translations
     • Added about 0.7M bilingual NE pairs to the training data after pre-processing
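One way to picture slide 3: each bilingual NE pair is appended to the parallel corpus as a very short "sentence pair" before word alignment, so the aligner sees the NE and its translation co-occur in isolation. The sketch below is a minimal illustration under assumptions: the slides do not describe the pre-processing, so `normalize` (lowercasing and whitespace cleanup) is hypothetical, as are the function names and the example dictionary entries.

```python
from typing import Iterable, List, Tuple

# A (source, target) sentence pair.
Pair = Tuple[str, str]


def normalize(text: str) -> str:
    # Hypothetical pre-processing: lowercase and collapse whitespace.
    # The slides only say "after pre-processing"; the real steps are unknown.
    return " ".join(text.lower().split())


def augment_with_nes(corpus: Iterable[Pair], ne_dict: Iterable[Pair]) -> List[Pair]:
    """Append bilingual NE pairs to a parallel corpus as extra one-line pairs."""
    augmented = [(normalize(s), normalize(t)) for s, t in corpus]
    seen = set(augmented)  # used only to avoid adding duplicate NE pairs
    for src, tgt in ne_dict:
        pair = (normalize(src), normalize(tgt))
        # Skip empty entries and pairs that are already present.
        if pair[0] and pair[1] and pair not in seen:
            augmented.append(pair)
            seen.add(pair)
    return augmented


if __name__ == "__main__":
    corpus = [("27日 中午 , 他们 已 被 安全 转移 到 普吉岛",
               "they were safely moved to phuket island at noon on the 27 .")]
    # Hypothetical NE dictionary entries; the second 普吉岛 entry is a duplicate.
    ne_dict = [("普吉岛", "Phuket Island"), ("强生", "Johnson"), ("普吉岛", "phuket island")]
    for src, tgt in augment_with_nes(corpus, ne_dict):
        print(src, "|||", tgt)
```

The augmented pairs would then be written out as the usual source and target files that the word aligner reads.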

  4. Improvement
     Ref: they were safely moved to phuket island at noon on the 27 .
     Translation: at noon on the 27th , and they have been shifted to safe places to phuket .
     Limitation
     Ref: bush has also admitted that the training results of iraqi security . . .
     Translation: interim national security seventy - buhi also admitted that . . .

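  5. Adding an extra feature
     • Source and target should have the same number of NEs
     • Feature value = exp(|N_src - N_tgt|), where N_src and N_tgt are the numbers of proper nouns in the source and target
     • Used the number of proper nouns because no good NER was available
     • MERT gives the additional feature a negative weight (-0.03), so a hypothesis is penalized more as its proper-noun count drifts further from the source's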
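The feature on slide 5 can be sketched as below, assuming the sentences arrive already POS-tagged. The tag set is an assumption (the slides do not name the tagger): NNP/NNPS are the Penn Treebank proper-noun tags for English and NR is the Penn Chinese Treebank proper-noun tag. The function names are hypothetical, and the score is treated as a plain weight-times-value term of a log-linear model.

```python
import math
from typing import List, Tuple

# A tagged sentence is a list of (token, POS tag) pairs.
Tagged = List[Tuple[str, str]]

# Assumed proper-noun tags: NNP/NNPS (Penn Treebank, English) and
# NR (Penn Chinese Treebank). The slides do not specify the tag set.
PROPER_NOUN_TAGS = {"NNP", "NNPS", "NR"}


def count_proper_nouns(tagged: Tagged) -> int:
    """Count tokens tagged as proper nouns (the proxy for the NE count)."""
    return sum(1 for _, tag in tagged if tag in PROPER_NOUN_TAGS)


def ne_count_feature(src: Tagged, tgt: Tagged) -> float:
    """Feature value exp(|N_src - N_tgt|); equals 1.0 when the counts match."""
    return math.exp(abs(count_proper_nouns(src) - count_proper_nouns(tgt)))


def weighted_score(feature_value: float, weight: float = -0.03) -> float:
    """Contribution to the log-linear score; -0.03 is the MERT weight on slide 5."""
    return weight * feature_value


if __name__ == "__main__":
    # Hypothetical tagging of the slide 6 example.
    src = [("强生", "NR"), ("说", "VV")]
    base = [("johnson", "NNP"), ("and", "CC"), ("johnson", "NNP"), ("said", "VBD")]
    cur = [("johnson", "NNP"), ("said", "VBD")]
    print(weighted_score(ne_count_feature(src, base)))
    print(weighted_score(ne_count_feature(src, cur)))
```

Under this assumed tagging, the base-case output "johnson and johnson said" has two proper nouns against one in the source, so it receives a lower (more negative) feature contribution than "johnson said".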

  6. Chinese: 强生 说 : " 她 已经 沈沦 到底 了 。 "
     Reference: " she 's gone off the deep end , " johnson says .
     Base case: johnson and johnson said : " she has been to the end . "
     Cur model: johnson said : " she has been to the end . "
     Chinese: . . . 中国 总理 . . .
     Reference: chinese premier
     Base case: chinese prime minister
     Cur model: chinese premier

  7. Class-Based Language Model
     • NEs that do not appear in the LM training data get low probability: if the training data contains "David is going for a walk", then "Ankit is going for a walk" scores lower
     • Replace person and organization names by special tokens
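The token-replacement idea on slide 7 can be illustrated with a toy model. In this sketch a tiny add-one smoothed bigram LM stands in for a real LM toolkit, and `NE_CLASSES` is a hypothetical lexicon standing in for an NER tagger's output; the class token names `<PERSON>` and `<ORG>` are also assumptions. It shows that once names are mapped to a class token, an unseen name scores the same as a seen one.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical NE lexicon in place of a real NER tagger.
NE_CLASSES: Dict[str, str] = {
    "david": "<PERSON>",
    "ankit": "<PERSON>",
    "powell": "<PERSON>",
    "xinhua": "<ORG>",
}


def to_classes(tokens: List[str], ne_classes: Dict[str, str] = NE_CLASSES) -> List[str]:
    """Lowercase tokens and replace known person/organization names by class tokens."""
    return [ne_classes.get(t.lower(), t.lower()) for t in tokens]


class BigramLM:
    """Add-one smoothed bigram LM (a toy stand-in for a real LM toolkit)."""

    def __init__(self, sentences: List[List[str]]):
        self.contexts = Counter()  # counts of each token used as a bigram history
        self.bigrams = Counter()
        vocab = set()
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            vocab.update(toks)
            self.contexts.update(toks[:-1])
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen words

    def logprob(self, sent: List[str]) -> float:
        toks = ["<s>"] + sent + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + self.vocab_size))
            for a, b in zip(toks[:-1], toks[1:])
        )


if __name__ == "__main__":
    train = ["David is going for a walk".split()]
    david = "David is going for a walk".split()
    ankit = "Ankit is going for a walk".split()

    word_lm = BigramLM([[t.lower() for t in s] for s in train])
    class_lm = BigramLM([to_classes(s) for s in train])

    # Word-level LM: the unseen name "ankit" lowers the score.
    print(word_lm.logprob([t.lower() for t in david]), word_lm.logprob([t.lower() for t in ankit]))
    # Class-based LM: both sentences become "<PERSON> is going for a walk".
    print(class_lm.logprob(to_classes(david)), class_lm.logprob(to_classes(ankit)))
```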

  8. Improvement:
     Reference: however , powell still firmly holds . . .
     Base case: but the ball still insisted that the . . .
     Cur model: however , powell still insisted that the . . .
     Limitation:
     Base case: . . . with the support of about 30 people , shook hands , . . .
     Reference: " the president shook hands with around 30 supporters before . .
     Cur model: . . . about 30 people support president jiang and deputy president jiang shook hands with . . . .

  9. Conclusion
     • NEs should be treated separately
     • Reduced dropping and mistranslation of NEs
     • Helped improve human readability
