Support for Multilingual Information Access


Presentation Transcript


  1. Support for Multilingual Information Access
  Douglas W. Oard
  College of Information Studies and Institute for Advanced Computer Studies
  University of Maryland, College Park, MD, USA
  Szechenyi National Library

  2. Multilingual Information Access
  Help people find information that is expressed in any language

  3. Outline • User needs • System design • User studies • Next steps

  4. Global Languages (chart). Source: http://www.g11n.com/faq.html

  5. Global Internet User Population, 2000 vs. 2005 (chart; largest labeled segments: English in 2000, English and Chinese in 2005). Source: Global Reach

  6. Global Internet Hosts (chart). Source: Network Wizards, January 1999 Internet Domain Survey

  7. European Web Size Projection (chart). Source: extrapolated from Grefenstette and Nioche, RIAO 2000

  8. Global Internet Audio: over 2,500 Internet-accessible radio and television stations. Source: www.real.com, March 2001

  9. Who needs Cross-Language Search?
  • Searchers who can read several languages
    • Eliminate multiple queries
    • Query in most fluent language
  • Monolingual searchers
    • If translations can be provided
    • If it suffices to know that a document exists
    • If text captions are used to search for images

  10. Outline • User needs • System design • User studies • Next steps

  11. Multilingual Information Access (diagram): Cross-Language Search (Query); Cross-Language Browsing (Select, Examine); Translation (Document Delivery)

  12. The Search Process (diagram): the Author chooses document-language terms to produce a Document. A Monolingual Searcher infers concepts and chooses document-language terms to form a Query. A Cross-Language Searcher infers concepts, chooses query-language terms, and then selects document-language terms to form a Query. Query-Document Matching links queries to documents.

  13. Interactive Search (diagram): Query Formulation → Query → Query Translation → Translated Query → Search → Ranked List → Selection → Document Examination → Document → Use, with a Query Reformulation loop back to Query Formulation
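
To make the Query Translation step in this diagram concrete, here is a minimal sketch of dictionary-based query translation, a common technique in interactive cross-language retrieval systems of this era. The dictionary entries and function name are illustrative assumptions, not the system described in these slides.

```python
# Minimal sketch of dictionary-based query translation for cross-language
# search. The dictionary contents and names are invented for illustration.

# Toy English -> German bilingual term list (hypothetical entries).
BILINGUAL_DICT = {
    "hunger": ["hunger"],
    "strike": ["streik", "schlag"],  # ambiguous: labor strike vs. a blow
    "treasure": ["schatz"],
    "hunting": ["jagd", "suche"],
}

def translate_query(query: str) -> list[str]:
    """Replace each query term with all known translations.

    Unknown terms are kept untranslated, mirroring the 'untranslated
    words' problem users reported in the subjective evaluation.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(BILINGUAL_DICT.get(term, [term]))
    return translated

if __name__ == "__main__":
    print(translate_query("hunger strike"))
    # ['hunger', 'streik', 'schlag'] -- a semi-automatic interface could
    # now ask the searcher to keep or discard each candidate translation.
```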

  14. Synonym Selection

  15. KeyWord In Context (KWIC)
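
A KWIC display shows each occurrence of a term centered in a fixed window of surrounding text, which helps a searcher judge which sense of a translated word is intended. A minimal sketch follows; the window size and formatting are arbitrary choices, not the interface shown on the slide.

```python
# Minimal KeyWord In Context (KWIC) sketch: show each hit with a fixed
# window of surrounding text so a user can judge word sense at a glance.

def kwic(text: str, keyword: str, window: int = 30) -> list[str]:
    """Return one context line per occurrence of keyword."""
    lines = []
    lower = text.lower()
    key = keyword.lower()
    start = lower.find(key)
    while start != -1:
        left = text[max(0, start - window):start]
        right = text[start + len(key):start + len(key) + window]
        lines.append(f"{left:>{window}}[{text[start:start + len(key)]}]{right}")
        start = lower.find(key, start + 1)
    return lines

if __name__ == "__main__":
    doc = ("The workers began a strike on Monday. "
           "A lightning strike damaged the transmitter.")
    for line in kwic(doc, "strike"):
        print(line)
```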

  16. Outline • User needs • System design • User studies • Next steps

  17. Cross-Language Evaluation Forum
  • Annual European-language retrieval evaluation
  • Documents: 8 languages
    • Dutch, English, Finnish, French, German, Italian, Spanish, Swedish
  • Topics: 8 languages, plus Chinese and Japanese
  • Batch retrieval since 2000
  • Interactive track (iCLEF) started in 2001
    • 2001 focus: document selection
    • 2002 focus: query formulation

  18. iCLEF 2001 Experiment Design
  144 trials, in blocks of 16, at 3 sites

  Participant | Task order
  1           | Topic 11, Topic 17, Topic 13, Topic 29
  2           | Topic 11, Topic 17, Topic 13, Topic 29
  3           | Topic 17, Topic 11, Topic 29, Topic 13
  4           | Topic 17, Topic 11, Topic 29, Topic 13

  Topic key: Narrow = 11, 13; Broad = 17, 29
  System key: System A, System B (assignment counterbalanced across participants)
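
As a sketch of how such a counterbalanced within-subjects design can be generated: every participant in a block of four sees both systems and all four topics, with topic order reversed for half of the block. The slide does not fully specify the system rotation, so the ("A", "B") assignment below is an assumption, not the official iCLEF design.

```python
# Illustrative counterbalanced assignment for a block of four participants.
# Topic orders follow the slide; the system rotation is assumed.

def assign(participant: int) -> list[tuple[str, int]]:
    """Return the (system, topic) order for one participant in a block of 4."""
    p = (participant - 1) % 4 + 1
    # Topic orders taken from the slide's participant table.
    topics = [11, 17, 13, 29] if p in (1, 2) else [17, 11, 29, 13]
    # Assumed system rotation, counterbalanced within the block.
    first, second = ("A", "B") if p in (1, 3) else ("B", "A")
    return list(zip([first, first, second, second], topics))

if __name__ == "__main__":
    for p in range(1, 5):
        print(p, assign(p))
```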

  19. An Experiment Session
  • Task and system familiarization
  • 4 searches (20 minutes each)
    • Read topic description
    • Examine document translations
    • Judge as many documents as possible
      • Relevant, Somewhat relevant, Not relevant, Unsure, Not judged
    • Instructed to seek high precision
  • 8 questionnaires
    • Initial, each topic (4), each system (2), final

  20. Measure of Effectiveness
  • Unbalanced F-Measure: F = 1 / (α/P + (1 - α)/R)
    • P = precision
    • R = recall
    • α = 0.8
  • Favors precision over recall
  • This models an application in which:
    • Fluent translation is expensive
    • Missing some relevant documents would be okay
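
A short worked example of this measure (van Rijsbergen's weighted harmonic mean of precision and recall); the sample precision and recall values are made up for illustration:

```python
# Unbalanced F-measure from the slide: F = 1 / (alpha/P + (1 - alpha)/R).
# With alpha = 0.8 the score is dominated by precision.

def f_measure(precision: float, recall: float, alpha: float = 0.8) -> float:
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)

if __name__ == "__main__":
    # High precision, low recall scores well under alpha = 0.8 ...
    print(round(f_measure(0.9, 0.3), 3))   # 0.643
    # ... while the reverse scores poorly, as the slide intends.
    print(round(f_measure(0.3, 0.9), 3))   # 0.346
```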

  21. CLEF AUTO French Results Overview (chart)

  22. CLEF AUTO English Results Overview (chart)

  23. Commercial vs. Gloss Translation (chart: results shown separately for broad and narrow topics)
  • Commercial Machine Translation (MT) is almost always better
    • Significant with one-tail t-test (p < 0.05) over 16 trials
  • Gloss translation usually beats random selection
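
Gloss translation here means fast word-by-word dictionary lookup, the cheap alternative to full commercial MT compared on this slide. A minimal sketch under that interpretation; the glossary entries are invented, and real gloss translators also handle morphology and multiword terms:

```python
# Minimal sketch of gloss (word-by-word) translation.

GLOSSARY = {  # toy German -> English entries (hypothetical)
    "der": "the",
    "streik": "strike",
    "beginnt": "begins",
    "heute": "today",
}

def gloss(sentence: str) -> str:
    """Translate word by word, leaving unknown words marked but untranslated."""
    words = sentence.lower().split()
    return " ".join(GLOSSARY.get(w, f"[{w}]") for w in words)

if __name__ == "__main__":
    print(gloss("Der Streik beginnt heute morgen"))
    # -> "the strike begins today [morgen]" -- rough, but often enough
    # for a searcher to judge whether a document is relevant.
```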

  24. iCLEF 2002 Experiment Design (diagram): Topic Description → Query Formulation → Automatic Retrieval → Standard Ranked List → Interactive Selection. Measures: Mean Average Precision (for the ranked list) and F with α = 0.8 (for the interactive selection)

  25. Maryland Experiments
  • 48 trials (12 participants)
    • Half with automatic query translation
    • Half with semi-automatic query translation
  • 4 subjects searched Der Spiegel and SDA
    • 20-60 relevant documents for 4 topics
  • 8 subjects searched Der Spiegel
    • 8-20 relevant documents for 3 topics
    • 0 relevant documents for 1 topic!

  26. Some Preliminary Results
  • Average of 8 query iterations per search
  • Relatively insensitive to topic
    • Topic 4 (Hunger Strikes): 6 iterations
    • Topic 2 (Treasure Hunting): 16 iterations
  • Sometimes sensitive to system
    • Topics 1 and 2: system effect was small
    • Topics 3 and 4: fewer iterations with semi-automatic translation
    • Topic 3: European Campaigns against Racism

  27. Subjective Evaluation
  • Semi-automatic system:
    • Ability to select translations: good
  • Automatic system:
    • Simpler, less user involvement needed: good
    • Few functions, easier to learn and use: good
    • No control over translations: bad
  • Both systems:
    • Highlighting keywords helps: good
    • Untranslated or poorly-translated words: bad
    • No Boolean or proximity operators: bad

  28. Outline • User needs • System design • User studies • Next steps

  29. Next Steps
  • Quantitative analysis from 2002 (MAP, F)
    • Iterative improvement of query quality
    • Utility of MAP as a measure of query quality?
    • Utility of semi-automatic translation
    • Accuracy of relevance judgments
  • Search strategies
    • Dependence on system
    • Dependence on topic
    • Dependence on density of relevant documents
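
Since the planned quantitative analysis relies on Mean Average Precision, here is a minimal sketch of how MAP is computed from ranked lists and relevance judgments; the sample data is invented for illustration:

```python
# Mean Average Precision (MAP): for each query, average the precision at
# each rank where a relevant document appears, then average across queries.

def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Average precision for one ranked list of document ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

if __name__ == "__main__":
    runs = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
        (["d5", "d6", "d7"], {"d7"}),              # AP = 1/3
    ]
    print(round(mean_average_precision(runs), 3))  # 0.583
```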

  30. An Invitation
  • Join CLEF
    • A first step: Hungarian topics
    • http://clef.iei.pi.cnr.it
  • Join iCLEF
    • Help us focus on true user needs!
    • http://terral.lsi.uned.es/iCLEF
