
Constructing Bilingual Resources for Digital Libraries

Rim, Hae-Chang, Korea University, 2000.8.10


  1. Constructing Bilingual Resources for Digital Libraries • Rim, Hae-Chang • Korea University • 2000.8.10

  2. Contents • Introduction • Bilingual resources (bilingual dictionary, bilingual corpus, bilingual thesaurus) • Our experience (bilingual dictionary, bilingual corpus, bilingual thesaurus) • Summary

  3. Introduction • What is the problem? The language barrier in multilingual digital libraries. • How can it be solved? With machine translation (MT) and cross-language information retrieval (CLIR). • Why bilingual resources? MT and CLIR are based on bilingual resources. • What shall we do? Construct a Korean-English bilingual dictionary, a Korean-English bilingual corpus, and a Korean-English bilingual thesaurus.

  4. Overview [figure: MT and CLIR, both built on bilingual resources, bridge the language barrier between digital libraries (DL)]

  5. Bilingual Resources • Bilingual dictionary • Bilingual corpus • Bilingual thesaurus

  6. Bilingual Dictionary • Definition: a dictionary that lists words together with their translations. • Application fields: CLIR ([Oard 98], [Fujii et al. 99], [Myaeng et al. 99]) and MT. • Utilization [figure]: looking up the Korean word “대기” in the bilingual dictionary yields two entries, “대기1” – “atmosphere” and “대기2” – “waiting”; these translated words are then used by CLIR and MT.
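The dictionary lookup on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the authors' system. The entries in `BILINGUAL_DICT` cover only the slide's example. Marking senses with a trailing digit (as in “대기1”) is an assumption, and so are the helper names `lookup` and `translate_query`.

```python
# Hypothetical toy Korean-English bilingual dictionary modelled on the slide:
# sense-numbered headwords map to their English translations.
BILINGUAL_DICT: dict[str, list[str]] = {
    "대기1": ["atmosphere"],
    "대기2": ["waiting"],
    "오염": ["pollution"],
}


def lookup(word: str) -> list[str]:
    """Return every translation of `word`, across all of its numbered senses."""
    translations: list[str] = []
    for headword, targets in BILINGUAL_DICT.items():
        # Strip a trailing sense number such as the "1" in "대기1".
        if headword.rstrip("0123456789") == word:
            translations.extend(targets)
    return translations


def translate_query(words: list[str]) -> list[str]:
    """Dictionary-based query translation: replace each word by all its translations.

    Words that are missing from the dictionary are passed through unchanged.
    """
    translated: list[str] = []
    for word in words:
        translated.extend(lookup(word) or [word])
    return translated


print(lookup("대기"))                    # ['atmosphere', 'waiting']
print(translate_query(["대기", "오염"]))  # ['atmosphere', 'waiting', 'pollution']
```

`translate_query` keeps every sense. That is the simplest form of dictionary-based query translation, but it also keeps the wrong sense (“waiting” in an air-pollution query). The bilingual corpus on the following slides is used to resolve this kind of ambiguity.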

  7. Bilingual Corpus (1) • Definition: a comparable corpus is a collection of similar texts in different languages; a parallel corpus is a collection of texts together with their translations into one or more other languages (e.g., the Canadian Hansard corpus). • Application fields: CLIR ([Yang et al. 98]) and MT, in particular example-based machine translation ([Brown 96], [Murata et al. 99], [Shirai et al. 97], [Turcato et al. 99]).

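  8. Bilingual Corpus (2) • Utilization [figure]: the bilingual dictionary gives “대기” – “atmosphere” or “waiting” and “오염” – “pollution”, so a word-by-word translation of “대기 오염” is ambiguous between “atmosphere pollution” and “waiting pollution”. The bilingual corpus contains the aligned sentence pair “the sources of atmosphere pollution may have a global, regional and local character.” / “대기 오염의 원인은 전세계적, 국부적, 그리고 지역적인 특징을 가진다.”, which shows that “대기 오염” translates as “atmosphere pollution”. This translated phrase is then used by MT and CLIR.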
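One simple way to let the corpus choose between “atmosphere pollution” and “waiting pollution” is to count how many aligned sentence pairs support each candidate. The sketch below assumes sentence-aligned text and plain substring matching. The function name `choose_translation` is hypothetical, and the frequency count is an illustrative scoring rule, not necessarily the method behind the slide. The corpus contains only the sentence pair shown on the slide.

```python
from itertools import product

# The aligned Korean-English sentence pair shown on the slide.
PARALLEL_CORPUS: list[tuple[str, str]] = [
    ("대기 오염의 원인은 전세계적, 국부적, 그리고 지역적인 특징을 가진다.",
     "the sources of atmosphere pollution may have a global, regional and local character."),
]

# Word-level translations from the bilingual dictionary, as on the slide.
CANDIDATES: dict[str, list[str]] = {
    "대기": ["atmosphere", "waiting"],
    "오염": ["pollution"],
}


def choose_translation(words, candidates, corpus):
    """Return (best phrase translation, number of supporting sentence pairs).

    Each combination of word translations is scored by the number of aligned
    pairs that contain the Korean phrase on one side and the candidate
    English phrase on the other (plain substring matching).
    """
    source = " ".join(words)
    best, best_count = None, -1
    for combo in product(*(candidates[w] for w in words)):
        target = " ".join(combo)
        count = sum(1 for ko, en in corpus if source in ko and target in en)
        if count > best_count:  # strict '>' keeps the first candidate on ties
            best, best_count = target, count
    return best, best_count


print(choose_translation(["대기", "오염"], CANDIDATES, PARALLEL_CORPUS))
# ('atmosphere pollution', 1)
```

With a realistic corpus, raw counts would need smoothing or normalisation. Substring matching would also give way to tokenised, morphologically analysed text, because Korean attaches particles such as “의” directly to nouns.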

  9. Bilingual Thesaurus (1) • Definition: a collection of words in two languages, grouped together according to the connections between their meanings (e.g., EuroWordNet). • Application field: CLIR, specifically concept-based CLIR ([Gonzalo et al. 98], [Gilarranz et al. 97]).

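  10. Bilingual Thesaurus (2) • Utilization [figure]: in the bilingual thesaurus, the word “대기” belongs to two concepts: {atmosphere, 대기}, shown with {region, part} and {air}, and {wait, waiting, 대기}, shown with {inactivity} and {pause}. CLIR maps the word to these concepts (labelled “region” and “inactivity” in the figure).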
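A concept-based lookup of this kind can be sketched as a mapping from concepts to their members in both languages. The thesaurus fragment is read off the figure and is hypothetical. Several details are simplifying assumptions for illustration, not features of EuroWordNet or WordNet: the concept ids, the single broader concept per entry, and the use of `str.isascii()` to separate English members from Korean ones.

```python
# Hypothetical fragment of a Korean-English bilingual thesaurus, read off the
# slide's figure: concept id -> (broader concept, member words in both languages).
THESAURUS: dict[str, tuple[str, frozenset[str]]] = {
    "atmosphere": ("region, part", frozenset({"atmosphere", "대기"})),
    "waiting": ("inactivity", frozenset({"wait", "waiting", "대기"})),
    "pause": ("inactivity", frozenset({"pause"})),
}


def concepts_of(word: str) -> list[str]:
    """Concepts that contain `word`, whichever language it is in."""
    return [cid for cid, (_, members) in THESAURUS.items() if word in members]


def broader_concepts(word: str) -> list[str]:
    """Broader concept of every concept that contains `word`."""
    return [THESAURUS[cid][0] for cid in concepts_of(word)]


def english_terms(word: str) -> set[str]:
    """English members of the word's concepts (ASCII is a crude English test)."""
    terms: set[str] = set()
    for cid in concepts_of(word):
        terms.update(m for m in THESAURUS[cid][1] if m.isascii())
    return terms


print(concepts_of("대기"))       # ['atmosphere', 'waiting']
print(broader_concepts("대기"))  # ['region, part', 'inactivity']
```

Matching queries and documents on concept ids rather than on individual words is what makes the retrieval concept-based. Both a Korean query and an English document that mention “atmosphere” end up at the same concept.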

  11. Our Experience • Bilingual dictionary • Bilingual corpus • Bilingual thesaurus

  12. Bilingual Dictionary • Korean-English bilingual dictionary • size: 2 million entries • application: a bilingual biographical dictionary, which maps a person's name such as “링컨” to its translated name “Lincoln” for use in CLIR and MT

  13. Bilingual Corpus • Korean-English bilingual corpus: a parallel corpus of 250,000 words, encoded according to CES (Corpus Encoding Standard) • Corpus construction tools: corpus refining tools, corpus annotating tools, and a bilingual concordancer
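The bilingual concordancer listed among the construction tools can be sketched as a keyword-in-context (KWIC) search over aligned sentence pairs. This is a hypothetical minimal version, not the authors' tool. It assumes the corpus is already sentence-aligned and loaded as (source, target) string pairs instead of being read from CES markup. The names `concordance` and `width` are illustrative.

```python
from collections.abc import Iterable, Iterator

# The aligned sentence pair from slide 8, stored as (English, Korean).
PAIRS: list[tuple[str, str]] = [
    ("the sources of atmosphere pollution may have a global, regional and local character.",
     "대기 오염의 원인은 전세계적, 국부적, 그리고 지역적인 특징을 가진다."),
]


def concordance(keyword: str, pairs: Iterable[tuple[str, str]], width: int = 20) -> Iterator[str]:
    """Yield one KWIC line per hit of `keyword` on the source side.

    Each line shows up to `width` characters of context on either side of the
    keyword, followed by the aligned target-language sentence.
    """
    if not keyword:  # str.find("") always matches, which would loop forever
        return
    for source, target in pairs:
        start = source.find(keyword)
        while start != -1:
            end = start + len(keyword)
            left = source[max(0, start - width):start]
            right = source[end:end + width]
            yield f"{left:>{width}}[{keyword}]{right:<{width}} || {target}"
            start = source.find(keyword, end)  # continue after this hit


for line in concordance("atmosphere", PAIRS):
    print(line)

# To search the Korean side, swap each pair so that Korean becomes the source.
for line in concordance("오염", [(ko, en) for en, ko in PAIRS]):
    print(line)
```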

  14. Bilingual Thesaurus (1) • Goal: constructing a Korean-English bilingual thesaurus • Approach: assigning Korean words to the corresponding English words in WordNet • [figure: the Korean word “대기” is attached to the WordNet synset {atmosphere}, shown with {region, part} and {air}, giving the bilingual synset {atmosphere, 대기} in the Korean-English bilingual thesaurus]
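The assignment step can be sketched in three parts. First, translate the Korean word with the bilingual dictionary. Next, propose every synset that contains one of its translations. Finally, add the Korean word to the synset that is confirmed. The synset inventory here is a hypothetical stand-in for WordNet; real synsets could be obtained, for example, through NLTK's `wordnet.synsets`. The names `candidate_synsets` and `assign` are illustrative. The slides do not say whether confirmation is manual or automatic.

```python
# Hypothetical miniature synset inventory standing in for WordNet
# (synset ids and memberships are illustrative, not real WordNet data).
SYNSETS: dict[str, set[str]] = {
    "atmosphere.n.gas": {"atmosphere"},
    "air.n.gas": {"air"},
    "atmosphere.n.mood": {"atmosphere", "ambiance"},
    "waiting.n.1": {"wait", "waiting"},
}

# Hypothetical Korean-English dictionary entries used to find candidates.
KO_EN: dict[str, list[str]] = {"대기": ["atmosphere", "waiting"]}


def candidate_synsets(korean: str) -> list[str]:
    """Synsets containing at least one English translation of `korean`."""
    translations = set(KO_EN.get(korean, []))
    return sorted(sid for sid, members in SYNSETS.items() if members & translations)


def assign(korean: str, synset_id: str) -> None:
    """Add `korean` to a confirmed candidate synset (modifies SYNSETS in place)."""
    if synset_id not in candidate_synsets(korean):
        raise ValueError(f"{synset_id} is not a candidate for {korean}")
    SYNSETS[synset_id].add(korean)


print(candidate_synsets("대기"))  # ['atmosphere.n.gas', 'atmosphere.n.mood', 'waiting.n.1']
assign("대기", "atmosphere.n.gas")
print(sorted(SYNSETS["atmosphere.n.gas"]))
```

For “대기”, the sketch proposes both “atmosphere” synsets as well as the “waiting” synset. Translation alone therefore does not identify the right sense, and further evidence or human judgement is needed before `assign` is called.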

  15. Bilingual Thesaurus (2) • Current status of the task • under construction

  16. Summary • Surmounting the language barrier by using bilingual resources • Korean-English bilingual resources (bilingual dictionary, bilingual corpus, bilingual thesaurus) • Our experience (Korean-English bilingual dictionary, bilingual corpus, bilingual thesaurus)

  17. reference(1) • [Oard 98] Douglas W. Oard, "A Comparative Study of Query and Document Translation for Cross-Language Information Retrieval", Third Conference of the Association for Machine Translation in the Americas (AMTA), Philadelphia, PA, October 1998. • [Fujii et al. 99] Atsushi Fujii and Tetsuya Ishikawa, "Cross-Language Information Retrieval for Technical Documents", Proceedings of the Joint ACL SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 29-37, 1999. • [Myaeng et al. 99] Sung Hyon Myaeng and Myung-gil Jang, "Complementing Dictionary-Based Query Translations with Corpus Statistics for Cross-Language IR", Machine Translation Summit VII, 1999.

  18. reference(2) • [Yang et al. 98] Yiming Yang, Jaime G. Carbonell, Ralf D. Brown, and Robert E. Frederking, "Translingual Information Retrieval: Learning from Bilingual Corpora", Artificial Intelligence, Special Issue: Best of IJCAI-97, Vol. 103, pp. 323-345, 1998. • [Brown 96] Ralf D. Brown, "Example-Based Machine Translation in the Pangloss System", Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pp. 169-174, Copenhagen, Denmark, August 5-9, 1996. • [Murata et al. 99] M. Murata, Q. Ma, K. Uchimoto, and H. Isahara, "An Example-Based Approach to Japanese-to-English Translation of Tense, Aspect, and Modality", TMI-99, Chester, UK, August 23, 1999.

  19. reference(3) • [Shirai et al. 97] S. Shirai, F. Bond, and Y. Takahashi, "A Hybrid Rule and Example-Based Method for Machine Translation", Natural Language Processing Pacific Rim Symposium '97 (NLPRS-97), 1997. • [Turcato et al. 99] Davide Turcato, Paul McFetridge, Fred Popowich, and Janine Toole, "A Unified Example-Based and Lexicalist Approach to Machine Translation", 8th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99), 1999. • [Gonzalo et al. 98] Julio Gonzalo, Felisa Verdejo, Carol Peters, and Nicoletta Calzolari, "Applying EuroWordNet to Cross-Language Text Retrieval", Computers and the Humanities, Vol. 32, Nos. 2-3, pp. 73-89, 1998.

  20. reference(4) • [Gilarranz et al. 97] Julio Gilarranz, Julio Gonzalo and Felisa Verdejo, "An Approach to Conceptual Text Retrieval Using the EuroWordNet Multilingual Semantic Database", AAAI 97.
