Developed at Kasetsart University, Thailand, this system facilitates active reading for information exchange, overcoming language barriers in accessing valuable scattered information. The system involves machine translation, multilingual dictionaries, and active reading techniques to enhance information dissemination and access. Future works include further refining the system for seamless language exchange.
Web-based Multilingual Active Reading System for Information Exchange • Asanee Kawtrakul and TEAM • NAiST Laboratory, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Thailand
Acknowledgement • JIRCAS • KURDI
Outline • Introduction • First Step of Machine Translation • System Overview • Active Reading System • Future Works • Conclusions
Introduction • Related to this Project • Valuable information scattered throughout the organization • Beyond Machine Active Reading • About us
Valuable information scattered throughout the organization • Sources of information and knowledge are documents anywhere on the WWW, plus information announcements
Information Exchange • Information Dissemination • Information Access
Two nontrivial problems • Identification and accessibility • Language barrier to accessing the information
Document from USDA • Prices Received by Farmers • All Crops: The July index was 93, down 7.0 percent from June and 13 percent below July 1998. From June, sharp price decreases for feed grains and hay, food grains, vegetables, and oilseeds much more than offset price increases for potatoes and dry beans. • Food Grains: The July index, at 75, was down 14 percent from the previous month and 16 percent below July 1998. The July all wheat price, at $2.15 per bushel, was down 35 cents from June and 41 cents below July 1998. This is the lowest price since August 1977. ………
Beyond Machine Active Reading: Machine Translation • Machine Translation System: Source Language → Analysis → Understanding → Generation → Target Language
Requirements for Developing MT • Multilingual Dictionary • Multilingual Grammar • Parsing Techniques for Understanding the content • Language Generation Mechanisms • Other Domain Specific Knowledge
Two Steps of Multilingual Information Exchange • Machine Active Reading • Full Text Machine Translation
About us • SRU-NAiST Lab: Speciality Research Unit in NAtural Language Processing and intelligent Information System Technology (30 members) • MT: 5-year project (2000-2005) • STREDEO: 5-year project (2001-2006) • I-Know: 3-year project (2003-2005)
Motivation • Valuable data scattered throughout the organization in multiple languages • Good information collected by many individuals in unstructured formats • Digested information enables quicker decision-making
Motivations • To access information in a friendly and efficient way: STREDEO and I-Know projects • To reduce the language barrier: MT project
STREDEO PROJECT • Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization • To provide an efficient and accurate method for storing, retrieving and delivering electronic documents • Focus on storing multimedia multilingual documents and on a multimedia query processing system
Motivations • Quality cataloguing is labor-intensive and expensive • IR experiment • Swine via Google: we got 1276 issues • Use FAOBIB: using synonym swine vs pig
First step of Machine Translation from Thai to English • Thai agricultural information is mostly presented only in Thai • Information is ungraspable for foreigners • Reading obstacles • Language understanding • Dictionary consultation • Unit conversion
System Overview [Diagram: End-Users, Internet, Active Reading System, Full-Text Machine Translation, Web Storage, Dictionary]
Active Reading System • Reading assistant for information exchange • Focusing on OAE’s website
Corpus Characteristics [Figure: sample table with cells labeled as text with measurement unit, numeric, and pure text]
Corpus Char. (2) • Distinguishing between pure text and text with a measurement unit • Parentheses following the text • Numeric cells lacking a measurement unit • Non-numeric cells are short noun phrases
System Overview • Pages → Table Analysis → Chunk-level Translation → Unit Conversion → Output Generation → Output • Resources: Dictionary, Conversion Table
Table Analysis • Extracting tables from a page • Resolving the corresponding measurement unit of each numeric cell
Table Analysis (cont) • Resolution heuristics • Attempt on left-top cell • Attempt on column head • Attempt on row head [Figure: a table marking the numeric cell to resolve and the candidate cells numbered 1, 2 and 3; example unit: Mt]
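The resolution heuristics above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual code: it assumes the unit is written in parentheses at the end of a header cell, and it tries the three cells in the order the bullets list them. The names `unit_of` and `resolve_unit` and the sample table are hypothetical.

```python
import re

# Assumption: a unit appears in parentheses at the end of a header cell,
# e.g. "Production (Mt)".
UNIT_IN_PARENS = re.compile(r"\(([^()]+)\)\s*$")

def unit_of(text):
    """Return the parenthesized unit at the end of a cell, or None."""
    match = UNIT_IN_PARENS.search(text or "")
    return match.group(1).strip() if match else None

def resolve_unit(table, row, col):
    """Try the left-top cell, then the column head, then the row head."""
    candidates = (table[0][0], table[0][col], table[row][0])
    for header in candidates:
        unit = unit_of(header)
        if unit is not None:
            return unit
    return None  # no unit could be resolved for this cell

# Made-up sample data, for illustration only.
by_column = [
    ["Crop", "Area (rai)", "Production (Mt)"],
    ["Rice", "1200", "450"],
]
by_corner = [
    ["Yield (kg)", "2003", "2004"],
    ["Rice", "10", "12"],
]
```

With these tables, the cell at row 1, column 2 of `by_column` resolves through its column head, while every numeric cell of `by_corner` resolves through the left-top cell.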
Chunk Translation • Translating non-numeric cells • Parsing and translating by cascaded chunk analysis (Abney, 1996)
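Chunk Translation (cont) • Finite-state cascades • Level 1: np → d? adj* n+ ; vp → aux? v • Level 2: pp → p np • Level 3: s → pp* np pp* vp pp* • Example: The big dog in a large house will sleep (tags: det adj n p d adj n aux v) [Figure: parse built in three levels, with np and vp at level 1, pp at level 2 and s at level 3]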
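The three-level cascade on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's parser: each level is written as a regular expression over space-terminated category labels, the scan goes left to right taking the longest greedy match at each position, and the first word is tagged `d` so that it matches the `np` rule. The names `LEVELS`, `apply_level`, `parse` and `bracket` are hypothetical.

```python
import re

# One dict per level: phrase label -> regex over space-terminated labels.
LEVELS = [
    {"np": r"(?:d )?(?:adj )*(?:n )+", "vp": r"(?:aux )?v "},
    {"pp": r"p np "},
    {"s": r"(?:pp )*np (?:pp )*vp (?:pp )*"},
]

def apply_level(nodes, rules):
    """Scan left to right; at each position keep the longest rule match."""
    compiled = [(label, re.compile(pattern)) for label, pattern in rules.items()]
    output, i = [], 0
    while i < len(nodes):
        labels = "".join(label + " " for label, _ in nodes[i:])
        best_label, best_len = None, 0
        for label, pattern in compiled:
            match = pattern.match(labels)
            if match:
                length = match.group(0).count(" ")  # number of nodes covered
                if length > best_len:
                    best_label, best_len = label, length
        if best_label is None:
            output.append(nodes[i])  # no rule applies: pass the node through
            i += 1
        else:
            output.append((best_label, nodes[i:i + best_len]))
            i += best_len
    return output

def parse(tagged_words):
    """Run every level of the cascade over (word, tag) pairs."""
    nodes = [(tag, word) for word, tag in tagged_words]
    for rules in LEVELS:
        nodes = apply_level(nodes, rules)
    return nodes

def bracket(node):
    """Render a node as a labeled bracketing."""
    label, body = node
    if isinstance(body, str):
        return body
    return "[" + label + " " + " ".join(bracket(child) for child in body) + "]"

words = "The big dog in a large house will sleep".split()
tags = "d adj n p d adj n aux v".split()
tree = parse(list(zip(words, tags)))
print(bracket(tree[0]))
# [s [np The big dog] [pp in [np a large house]] [vp will sleep]]
```

Because each level only sees the output of the level below, a node that no rule covers is simply passed upward unchanged, which is what lets the cascade produce partial analyses for table cells that are not full sentences.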
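Chunk Translation (cont) • Source: ปลา ใหญ่ ฝูง นี้ (tags: n mod cl d; chunks: nns mods cld) • Chunk reordering: ฝูง นี้ ใหญ่ ปลา (tags: cl d mod n; chunks: cld mods nns) • Reordering inside the chunk and insertion of "of": นี้ ฝูง of ใหญ่ ปลา (chunks: cld mods nns) • Output: This school of big fish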
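The reordering shown in this example can be sketched as a small transfer step. This is a toy illustration only: the lexicon holds just the four words of the slide's example, and the single rule (Thai order n, mod, cl, d becomes English order d, cl, "of", mod, n) is an assumption read off the example, not the system's rule set. The names `LEXICON`, `TARGET_ORDER` and `translate_chunk` are hypothetical.

```python
# Toy lexicon limited to the slide's example; a real system would consult
# the multilingual dictionary instead.
LEXICON = {"ปลา": "fish", "ใหญ่": "big", "ฝูง": "school", "นี้": "this"}

# Assumed transfer rule: Thai (n, mod, cl, d) -> English (d, cl, "of", mod, n).
TARGET_ORDER = ["d", "cl", "of", "mod", "n"]

def translate_chunk(tagged_words):
    """Reorder one tagged Thai noun chunk and gloss each word."""
    by_tag = {tag: word for word, tag in tagged_words}
    output = []
    for slot in TARGET_ORDER:
        if slot == "of":
            # "of" is only inserted when a classifier is present.
            if "cl" in by_tag:
                output.append("of")
        elif slot in by_tag:
            word = by_tag[slot]
            output.append(LEXICON.get(word, word))  # unknown words pass through
    phrase = " ".join(output)
    return phrase[:1].upper() + phrase[1:]

chunk = [("ปลา", "n"), ("ใหญ่", "mod"), ("ฝูง", "cl"), ("นี้", "d")]
print(translate_chunk(chunk))  # This school of big fish
```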
Unit Conversion • Converting a numeric cell to a native-friendly measurement unit
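A conversion table lookup of this kind might look like the sketch below. It is illustrative only: the table holds two Thai area units with their standard metric equivalents (1 rai = 1,600 m² = 0.16 ha; 1 ngan = 400 m² = 0.04 ha), and the choice of hectares as the target unit, the rounding to two decimals and the names `CONVERSIONS` and `convert` are assumptions.

```python
# Source unit -> (target unit, multiplication factor).
CONVERSIONS = {
    "rai": ("ha", 0.16),
    "ngan": ("ha", 0.04),
}

def convert(value_text, unit):
    """Convert a numeric cell; leave the value as-is if the unit is unknown."""
    value = float(value_text.replace(",", ""))  # tolerate thousands separators
    if unit not in CONVERSIONS:
        return value, unit
    target, factor = CONVERSIONS[unit]
    return round(value * factor, 2), target

print(convert("1,250", "rai"))  # (200.0, 'ha')
```

Passing unknown units through unchanged keeps the output table complete even when the conversion table has no entry for a cell.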
Output Generation • Generating an HTML output from the translation
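As a rough sketch of this last step, the translated and converted cells can be written back out as an HTML table. The function name `render_table`, the decision to treat the first row as the header and the sample rows are assumptions made for the example.

```python
from html import escape

def render_table(rows):
    """Render rows of cells as an HTML table; the first row is the header."""
    lines = ["<table>"]
    for index, row in enumerate(rows):
        tag = "th" if index == 0 else "td"
        # Escape cell text so translated content cannot break the markup.
        cells = "".join(f"<{tag}>{escape(str(cell))}</{tag}>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

# Made-up sample rows, for illustration only.
page = render_table([["Crop", "Area (ha)"], ["Rice & maize", 200.0]])
print(page)
```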
Conclusion • The first step of information exchange is Machine Active Reading • Collaborative work on multilingual dictionary development in specific domains is strongly needed, and should also make use of AGROVOC from FAO • Challenging future work: integration with the Grid project (Computational, VR & Access, and Data Grid) • The I-Know project could contribute to our working group
A Development of Information and Knowledge Extraction from Unstructured Thai Document: I-Know
Information Extraction • All the information about swine in Africa that was published in 2000 and presented at conferences
Requirements • Automated Querying • Information Extraction • Multi-viewpoints Knowledge Tracking • User Profiling • Document Taxonomies • Knowledge Summarization
Ontological access • Ordered by categories • Described with Keywords • Link to the authors • Link to Journals • Links to conferences
Domain-specific ontology [Figure: Root; generic relations (e.g. hasPublication) among the concepts organization, event, person and publication; IS-A relations to workshop, conference, journal and paper; instances such as Aurawan, ACM and Introduction]
The End • Thank you for your attention • Any comments are warmly welcome