
Enriching Word Alignment with Linguistic Tags Linguistic Data Consortium, IBM



Presentation Transcript


  1. Enriching Word Alignment with Linguistic Tags • Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie M. Strassel, Kazuaki Maeda • Linguistic Data Consortium, IBM • {xuansong, sgrimes, strassel, maeda}@ldc.upenn.edu, niyuge@us.ibm.com

  2. Outline • Motivations • Approaches and methodologies • Linguistic tags • Inter-annotator agreement • Conclusions

  3. Motivations • To improve the quality of automatic word alignment • To reduce the amount of data needed for statistical models • Supervised models outperform traditional models • Part of the DARPA GALE program: manually aligned and tagged data for Chinese-English word alignment (WA)

  4. Unified Annotation Scheme • Alignment framework: minimum match approach (minimum translation units), attachment approach • Tagging framework: linguistic tags

  5. Minimum Match Approach • Minimum translation units: atomic • One-to-one: 我 买 鲜 花 。 / I buy fresh flowers . • Many-to-many and many-to-one: 春 节 快 乐 / Happy Chinese New Year
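The link types named on this slide can be illustrated with a minimal Python sketch. The `Link` representation, the `link_type` helper, and the position-by-position pairing of the example sentence are assumptions made for illustration only; they are not the LDC annotation format or toolkit.

```python
from typing import List, Tuple

# Hypothetical representation (not the LDC file format): a link pairs a tuple
# of source-token indices with a tuple of target-token indices.
Link = Tuple[Tuple[int, ...], Tuple[int, ...]]


def link_type(link: Link) -> str:
    """Classify a link by how many tokens it covers on each side."""
    src, tgt = link
    if not src or not tgt:
        raise ValueError("an aligned link needs at least one token on each side")
    src_side = "one" if len(src) == 1 else "many"
    tgt_side = "one" if len(tgt) == 1 else "many"
    return f"{src_side}-to-{tgt_side}"


# The slide's one-to-one example. Pairing tokens by position is an assumption
# that happens to fit this sentence; real links come from annotators.
source = ["我", "买", "鲜", "花", "。"]
target = ["I", "buy", "fresh", "flowers", "."]
links: List[Link] = [((i,), (i,)) for i in range(len(source))]

for link in links:
    src_idx, tgt_idx = link
    print(source[src_idx[0]], "<->", target[tgt_idx[0]], link_type(link))
```

A link such as `((0, 1), (0,))` would be classified as `many-to-one` by the same helper.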

  6. Attachment Approach (for unaligned words) • Attach phrase-level unaligned words: 他 带 了书 / He brought the books (unaligned, attached) • Do not attach sentence-level/discourse-level unaligned words: 我们也 没有想去伤害他 / We didn’t want to hurt him (unaligned, unattached)

  7. Tagging Framework • Goal: tackle insertion/deletion problems • Methodology: linguistic tags • Tags for aligned links: context-free links (2), context-dependent links (3), specific-feature links for Chinese DE 的 (3) • Tags for unaligned words: attached words (12 types), unattached words (2 types)
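As a rough summary of the tag inventory on this slide, the sketch below stores the per-group counts in plain Python dictionaries and totals them. The dictionary names and keys are illustrative assumptions; the individual tag names are not listed on the slide, so only the counts are recorded.

```python
# Counts taken from the slide; names of the dictionaries and keys are
# illustrative assumptions, not identifiers from the annotation guidelines.
LINK_TAG_COUNTS = {
    "context-free": 2,
    "context-dependent": 3,
    "specific-feature (DE)": 3,
}
UNALIGNED_WORD_TAG_COUNTS = {
    "attached": 12,
    "unattached": 2,
}

total_link_tags = sum(LINK_TAG_COUNTS.values())
total_word_tags = sum(UNALIGNED_WORD_TAG_COUNTS.values())

print("link tags:", total_link_tags)
print("unaligned-word tags:", total_word_tags)
```

Under these counts the scheme has 8 tags for aligned links and 14 for unaligned words.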

  8. Context-free Links • Link types: semantic links, function links • 学 校 / at school • …屹 立 于 太 行 山 / …standing tall on Taihang Mountain

  9. Context-dependent Links • Link types: grammatically inferred link, contextually inferred link • 把这项成果变成… / turn this success into… • 欢 迎 收 看CCTV / Welcome to CCTV

  10. Specific Links: 的 (DE) • Link types: DE-clause, DE-modifier, DE-possessive • 经 历 过 战 争 的人 / those who have experienced wars • 新 技 术 的实 质 / the essence of the new technology • 将 军 的高 度 警 惕 / great attention from the general

  11. Word Tags: Aligned & Unaligned

  12. Examples: Word Tags

  13. Inter-Annotator Agreement (1): Chinese-English Alignment

  14. Inter-Annotator Agreement (2): Chinese-English Tagging

  15. Conclusion • Unified annotation scheme • Manually aligned and tagged corpora at LDC • Annotation guidelines available at: http://projects.ldc.upenn.edu/gale/task_specifications/ • Annotation toolkit available soon • Ongoing project: more data in the pipeline • Acknowledgements: the DARPA GALE program

  16. Thank You!

  17. Chinese-English Aligned and Tagged Corpora at LDC

  18. Annotation Rate • First pass alignment: 10,000 words per 10 hours • Second pass alignment: 10,000 words per 6 hours • First pass tagging: 10,000 words per 7 hours • Second pass tagging: 10,000 words per 5 hours • Rates assume average skill, speed, and difficulty level
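The rates above can be turned into words-per-hour figures and a rough effort estimate. The sketch below is only arithmetic on the slide's numbers; the function names are hypothetical, and summing the four passes assumes each pass is applied in sequence to the same text, which the slide does not state.

```python
# Hours needed per 10,000 words, as listed on the slide.
HOURS_PER_10K_WORDS = {
    "first pass alignment": 10,
    "second pass alignment": 6,
    "first pass tagging": 7,
    "second pass tagging": 5,
}


def words_per_hour(hours_per_10k: float) -> float:
    """Convert 'hours per 10,000 words' into words per hour."""
    return 10_000 / hours_per_10k


def estimate_hours(word_count: int) -> float:
    """Total annotation hours for a text of word_count words.

    Assumption: all four passes run one after another on the same text.
    """
    return sum(word_count * hours / 10_000 for hours in HOURS_PER_10K_WORDS.values())


for task, hours in HOURS_PER_10K_WORDS.items():
    print(f"{task}: {words_per_hour(hours):.0f} words/hour")

print("hours for 10,000 words, all passes:", estimate_hours(10_000))
```

Under that sequencing assumption, 10,000 words take 10 + 6 + 7 + 5 = 28 annotator hours.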
