
Web-based Multilingual Active Reading System for Information Exchange

Developed at Kasetsart University, Thailand, this system supports active reading for information exchange, helping readers overcome language barriers when accessing valuable but scattered information. It combines machine translation, multilingual dictionaries, and active reading techniques to improve information dissemination and access. Future work includes further refining the system for seamless language exchange.


Presentation Transcript


  1. Web-based Multilingual Active Reading System for Information Exchange • Asanee Kawtrakul and Team • NAiST Laboratory, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Thailand

  2. Acknowledgement • JIRCAS • KURDI

  3. Outline • Introduction • First Step of Machine Translation • System Overview • Active Reading System • Future Works • Conclusions

  4. Introduction • Related to this Project • Valuable information scattered throughout the organization • Beyond Machine Active Reading • About us

  5. Valuable information scattered throughout the organization • Sources of information and knowledge are documents anywhere on the WWW, plus information announcements

  6. Information Exchange • Information Dissemination • Information Access

  7. Two nontrivial problems • Identification and accessibility • Language barrier to accessing the information

  8. Document from USDA: Prices Received by Farmers • All Crops: The July index was 93, down 7.0 percent from June and 13 percent below July 1998. From June, sharp price decreases for feed grains and hay, food grains, vegetables, and oilseeds much more than offset price increases for potatoes and dry beans. • Food Grains: The July index, at 75, was down 14 percent from the previous month and 16 percent below July 1998. The July all wheat price, at $2.15 per bushel, was down 35 cents from June and 41 cents below July 1998. This is the lowest price since August 1977. ……….. ……….. ………..

  9. Beyond Machine Active Reading: Machine Translation • Machine Translation System (diagram): Source Language → Analysis → Understanding → Generation → Target Language

  10. Requirements for Developing MT • Multilingual Dictionary • Multilingual Grammar • Parsing Techniques for Understanding the content • Language Generation Mechanisms • Other Domain Specific Knowledge

  11. Two Steps of Multilingual Information Exchange • Machine Active Reading • Full Text Machine Translation

  12. About us • SRU-NAiST Lab: Speciality Research Unit in NAtural Language Processing and intelligent Information System Technology (30 members) • MT: 5-year project (2000-2005) • STREDEO: 5-year project (2001-2006) • I-Know: 3-year project (2003-2005)

  13. Motivation • Valuable data scattered throughout the organization in multiple languages • Good information collected by many individuals in unstructured formats • Digested information enables quicker decision-making

  14. Motivations • To access information in a user-friendly and efficient way: STREDEO and I-Know projects • To reduce the language barrier: MT project

  15. STREDEO PROJECT

  16. STREDEO PROJECT • Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization • To provide an efficient and accurate method for storing, retrieving, and delivering electronic documents • Focus on storing multimedia multilingual documents and on a multimedia query processing system

  17. STREDEO Overview

  18. Motivations • Quality cataloguing is labor-intensive and expensive • IR experiment • "Swine" via Google: we got 1276 issues • Using FAOBIB: using the synonyms swine vs. pig

  19. First step of Machine Translation from Thai to English • Thai agricultural information is mostly presented only in Thai • The information is incomprehensible to foreigners • Reading obstacles • Language understanding • Dictionary consultation • Unit conversion

  20. System Overview • Components (diagram): End-Users, Internet, Web, Active Reading System, Full-Text Machine Translation, Dictionary, Storage

  21. Active Reading System • Reading assistant for information exchange • Focusing on OAE’s website

  22. Corpus Characteristics • Cell types (figure labels): pure text, text with measurement unit, numeric

  23. Corpus Characteristics (2) • Distinguishing between pure text and text with a measurement unit • Parentheses following the text • Numeric cells lacking a measurement unit • Non-numeric cells are short noun phrases
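
The three cell types above can be told apart with simple pattern matching. The following is a minimal sketch, not the system's actual code: it assumes that a unit, when present, is whatever sits in the trailing parentheses of a text cell, and the function and category names are hypothetical.

```python
import re

# Hypothetical category names for the three cell types on the slides.
NUMERIC = "numeric"
TEXT_WITH_UNIT = "text_with_unit"
PURE_TEXT = "pure_text"

# A number with optional sign, thousands separators and decimal part.
_NUMBER = re.compile(r"^[+-]?\d[\d,]*(\.\d+)?$")
# Text followed by one parenthesised group at the very end of the cell.
_TRAILING_PARENS = re.compile(r"^(?P<text>.+?)\s*\((?P<unit>[^()]+)\)$")


def classify_cell(cell):
    """Return (category, text, unit) for one table cell."""
    cell = cell.strip()
    if _NUMBER.match(cell):
        return NUMERIC, cell, None
    m = _TRAILING_PARENS.match(cell)
    if m:
        # Assumption: the trailing parenthesis holds the measurement unit.
        return TEXT_WITH_UNIT, m.group("text"), m.group("unit").strip()
    return PURE_TEXT, cell, None
```

In practice a trailing parenthesis may hold something other than a unit, so a real classifier would probably check the candidate against a list of known units.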

  24. System Overview • Pipeline (diagram): Pages → Table Analysis → Chunk-level Translation → Unit Conversion → Output Generation → Output • Resources: Dictionary, Conversion Table

  25. Table Analysis • Extracting tables from a page • Resolving the corresponding measurement unit of each numeric cell

  26. Table Analysis (cont) • Resolution heuristics • Attempt on top-left cell • Attempt on column head • Attempt on row head • (Figure: example table marking the cell to resolve, with labels 1, 2, 3 and "Mt")
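
The resolution heuristics can be sketched as a short fallback chain. This is an illustration under stated assumptions, not the authors' implementation: it assumes the first row holds column heads, the first column holds row heads, units appear in trailing parentheses, and the heuristics are tried in the order the slide lists them. The names `unit_of` and `resolve_unit` and the sample table are hypothetical.

```python
import re

# A unit is assumed to be the last parenthesised group in a header cell.
_UNIT_IN_PARENS = re.compile(r"\(([^()]+)\)\s*$")


def unit_of(cell):
    """Return the unit written in trailing parentheses, or None."""
    m = _UNIT_IN_PARENS.search(cell)
    return m.group(1).strip() if m else None


def resolve_unit(table, row, col):
    """Try the top-left cell, then the column head, then the row head."""
    candidates = [table[0][0], table[0][col], table[row][0]]
    for cell in candidates:
        unit = unit_of(cell)
        if unit is not None:
            return unit
    return None  # no heuristic found a unit


# Illustrative table: the units "rai" and "kg" are examples only.
table = [
    ["Crop", "Area (rai)", "Yield (kg)"],
    ["Rice", "1200", "540000"],
]
```

With this table, `resolve_unit(table, 1, 1)` finds the unit in the column head, because the top-left cell carries none.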

  27. Table Analysis (cont)

  28. Chunk Translation • Translating non-numeric cells • Parsing and translating by cascaded chunk analysis (Abney, 1996)

  29. Chunk Translation (cont) • Finite-state cascades • Level 1: np → d? adj* n+ ; vp → aux? v • Level 2: pp → p np • Level 3: s → pp* np pp* vp pp* • Example: "The big dog in a large house will sleep", tagged det adj n p d adj n aux v • After level 1: np p np vp • After level 2: np pp vp • After level 3: s
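
The cascade on this slide can be reproduced with regular expressions over tag sequences. This is a minimal sketch, not the project's parser: it treats "det" in the example as the same tag as "d", tries the rules of a level in the listed order with greedy matching (a simplification of longest-match selection), and uses the hypothetical names `run_level` and `cascade`.

```python
import re

# One list of (category, pattern) pairs per level. Patterns run over a string
# in which every tag is followed by a single space, so "n " cannot match "np ".
LEVELS = [
    [("np", r"(?:d )?(?:adj )*(?:n )+"), ("vp", r"(?:aux )?v ")],
    [("pp", r"p np ")],
    [("s", r"(?:pp )*np (?:pp )*vp (?:pp )*")],
]


def run_level(tags, rules):
    """Scan left to right, replacing each rule match by its category."""
    text = "".join(t + " " for t in tags)
    out, pos = [], 0
    while pos < len(text):
        for label, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos:
                out.append(label)
                pos = m.end()
                break
        else:
            # No rule applies here: copy the tag through unchanged.
            end = text.index(" ", pos)
            out.append(text[pos:end])
            pos = end + 1
    return out


def cascade(tags):
    """Apply every level in turn; return the sequence after each level."""
    seq = ["d" if t == "det" else t for t in tags]  # assumption: det == d
    history = []
    for rules in LEVELS:
        seq = run_level(seq, rules)
        history.append(seq)
    return history
```

Calling `cascade("det adj n p d adj n aux v".split())` walks the example sentence through the three levels, ending with a single `s`.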

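  30. Chunk Translation (cont) • Source: ปลา ใหญ่ ฝูง นี้ (n mod cl d), chunked as nns mods cld • Chunk reordering: nns mods cld → cld mods nns, giving ฝูง นี้ ใหญ่ ปลา (cl d mod n) • Reordering inside the cld chunk and inserting "of": นี้ ฝูง of ใหญ่ ปลา • Output: This school of big fish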
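
The reordering in this example can be sketched in a few lines. The sketch reads the chunk labels as nns (noun), mods (modifier) and cld (classifier plus determiner), which is an interpretation of the slide rather than something it states, and the four-word lexicon is a toy stand-in for the project's dictionary.

```python
# Toy lexicon covering only the slide's example (an assumption).
LEXICON = {"ปลา": "fish", "ใหญ่": "big", "ฝูง": "school", "นี้": "this"}

# Target-side chunk order taken from the slide: cld mods nns.
TARGET_ORDER = ["cld", "mods", "nns"]


def translate_cld(words):
    """Classifier + determiner becomes determiner + classifier + 'of'."""
    classifier, determiner = words
    return [LEXICON[determiner], LEXICON[classifier], "of"]


def translate_np(chunks):
    """chunks maps a chunk label to its list of Thai words."""
    out = []
    for label in TARGET_ORDER:
        words = chunks.get(label, [])
        if not words:
            continue
        if label == "cld":
            out.extend(translate_cld(words))
        else:
            out.extend(LEXICON[w] for w in words)
    return " ".join(out)


phrase = translate_np({"nns": ["ปลา"], "mods": ["ใหญ่"], "cld": ["ฝูง", "นี้"]})
```

Here `phrase` holds the lowercase form of the slide's output; capitalisation would be a separate output step.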

  31. Chunk Translation (cont)

  32. Unit Conversion • Converting a numeric cell to a measurement unit familiar to the reader
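
A conversion table lookup is enough to illustrate this step. The slides do not list which units the system handles, so the two entries below (rai to hectares, kilograms to pounds) are assumptions chosen for illustration, and `CONVERSIONS` and `convert` are hypothetical names.

```python
# Illustrative conversion table: source unit -> (target unit, factor).
# 1 rai = 1,600 square metres = 0.16 ha; 1 kg is approximately 2.20462 lb.
CONVERSIONS = {
    "rai": ("ha", 0.16),
    "kg": ("lb", 2.20462),
}


def convert(value, unit):
    """Return (converted_value, target_unit); unknown units pass through."""
    if unit not in CONVERSIONS:
        return value, unit
    target, factor = CONVERSIONS[unit]
    return value * factor, target
```

Passing unknown units through unchanged keeps the original figure visible to the reader instead of dropping it.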

  33. Output Generation • Generating an HTML output from the translation
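
Output generation can be as simple as writing the translated and converted cells back into an HTML table. The following sketch assumes the first row is the header row and that cell values are plain strings or numbers; `render_table` is a hypothetical name, and the markup is deliberately minimal.

```python
from html import escape


def render_table(rows):
    """Render rows (first row = header) as an HTML table string."""
    lines = ["<table>"]
    for i, row in enumerate(rows):
        tag = "th" if i == 0 else "td"
        # Escape every cell so characters such as & and < stay valid HTML.
        cells = "".join(f"<{tag}>{escape(str(c))}</{tag}>" for c in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


page = render_table([["Crop", "Area (ha)"], ["Rice & maize", 192.0]])
```

Escaping matters here because translated text may contain characters that would otherwise break the page markup.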

  34. Conclusion • The first step of information exchange is Machine Active Reading • Collaborative work on multilingual dictionary development in specific domains is strongly needed, also making use of AGROVOC from FAO • Challenging future work awaits: integration with the Grid project (Computational, VR & Access, and Data Grid) • The I-Know project could contribute to our working group

  35. Development of Information and Knowledge Extraction from Unstructured Thai Documents: I-Know

  36. Information Extraction • All the information about swine in Africa that was published in 2000 and presented at conferences

  37. Requirements • Automated Querying • Information Extraction • Multi-viewpoints Knowledge Tracking • User Profiling • Document Taxonomies • Knowledge Summarization

  38. Ontological access • Ordered by categories • Described with keywords • Links to authors • Links to journals • Links to conferences

  39. Domain-specific ontology (diagram) • Root • Generic relations (e.g. hasPublication) • Concepts: organization, event, person, publication, ... • IS-A relation to sub-concepts: workshop, conference, journal, paper • Instance labels: Aurawan, ACM, Introduction

  40. The End Thank you for your attention Any comments are warmly welcome
