Language Technologies for Scalable Digital Libraries
Douglas W. Oard
College of Information Studies and Institute for Advanced Computer Studies
University of Maryland, College Park, Maryland, USA
http://www.glue.umd.edu/~oard
ICDL 2004
Global Internet Users
[Chart: Internet users by native language; Global Reach projection for 2004 (as of Sept. 2003)]
Global Internet Users and Web Pages
[Charts: Global Internet Users; Web Pages. Native speakers; Global Reach projection for 2004 (as of Sept. 2003)]
Supporting Information Access
[Diagram] Source Selection → Query Formulation → Query → Search (Search System) → Ranked List → Selection → Document → Examination → Document → Delivery
Feedback loops: Query Reformulation and Relevance Feedback; Source Reselection
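The "Query Reformulation and Relevance Feedback" loop can be made concrete with Rocchio's classic feedback formula, used here only as a stand-in because the slide names no specific method. This is a minimal sketch, assuming bag-of-words term-weight dictionaries and cosine ranking; the function names, the toy collection, and the α, β, γ defaults (1.0, 0.75, 0.15) are illustrative assumptions, not values from the talk.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, collection):
    """Search step: return document ids ranked by cosine similarity to the query."""
    return sorted(collection, key=lambda d: cosine(query, collection[d]), reverse=True)


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reformulate a query from documents the searcher judged during examination.

    q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant);
    terms whose weight ends up non-positive are dropped.
    The default weights are hypothetical and would need tuning.
    """
    new_q = Counter()
    for term, w in query.items():
        new_q[term] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                new_q[term] += coef * w / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}


# Hypothetical toy collection: both documents mention "oil".
collection = {
    "d1": {"oil": 1.0, "drilling": 1.0},
    "d2": {"oil": 1.0, "painting": 1.0},
}
query = {"oil": 1.0}
print(search(query, collection))  # the two documents tie on the original query
feedback = rocchio(query, [collection["d1"]], [collection["d2"]])
print(feedback)
print(search(feedback, collection))
```

After the searcher marks d1 relevant and d2 not relevant, the reformulated query gains "drilling", loses nothing it had, and never acquires "painting", which breaks the tie in favor of d1.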
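[Figure: query translation examples (source-language text lost in extraction). Candidate English translations shown: "oil", "petroleum"; "probe", "survey", "take samples"; "cymbidium goeringii"; "oil", "petroleum", "restrain". Problems annotated: No translation! Which translation? Wrong segmentation.]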
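All three problems fall out of the simplest form of dictionary-based query translation: segment the query, then look each segment up in a bilingual lexicon. This is a minimal sketch, assuming greedy forward maximum matching for segmentation (a common baseline, not necessarily what the slide's system used); the lexicon entries and function names are hypothetical toy data.

```python
def max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    lexicon entry (up to max_len characters); fall back to a single character."""
    segments = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon or length == 1:
                segments.append(piece)
                i += length
                break
    return segments


def translate_query(text, lexicon):
    """Look up each segment and flag which translation problem, if any, it shows."""
    result = []
    for seg in max_match(text, lexicon):
        translations = lexicon.get(seg, [])
        if not translations:
            status = "no translation"
        elif len(translations) > 1:
            status = "which translation?"
        else:
            status = "ok"
        result.append((seg, translations, status))
    return result


# Hypothetical toy lexicon: source term -> candidate English translations.
LEXICON = {
    "石油": ["oil", "petroleum"],
    "勘探": ["exploration", "prospecting"],
}

for seg, translations, status in translate_query("石油勘探X", LEXICON):
    print(seg, translations, status)
```

Greedy matching also produces the third problem. With a hypothetical lexicon containing "ab", "abc", and "cd", `max_match("abcd", ...)` returns `["abc", "d"]` even if "ab" + "cd" was the intended reading, and "d" then has no translation.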
Learning to Translate
• Lexicons
  • Phrase books, bilingual dictionaries, …
• Large text collections
  • Translations (“parallel”)
  • Similar topics (“comparable”)
• Similarity
  • Similar pronunciation, similar users
• People
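As a rough illustration of the "Similarity" source, the sketch below proposes translations for an unknown term by spelling similarity after stripping accents. It is a crude stand-in for pronunciation-based matching: it can catch cognates and names shared by Latin-script languages, but matching across scripts would need a transliteration model. The English vocabulary, the 0.8 cutoff, and the function names are assumptions for illustration.

```python
import difflib
import unicodedata


def strip_accents(word):
    """Lowercase and remove combining diacritics, e.g. 'administración' -> 'administracion'."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def similar_spelling_candidates(term, target_vocabulary, n=3, cutoff=0.8):
    """Rank target-language words by difflib's similarity ratio to the source term."""
    normalized = {strip_accents(w): w for w in target_vocabulary}
    matches = difflib.get_close_matches(
        strip_accents(term), list(normalized), n=n, cutoff=cutoff
    )
    return [normalized[m] for m in matches]


# Hypothetical English vocabulary; the Spanish terms come from the example sentence below.
english_vocabulary = ["administration", "ensure", "parliament", "president"]
for spanish in ["Presidenta", "administración", "garantizase"]:
    print(spanish, similar_spelling_candidates(spanish, english_vocabulary))
```

Cognates such as "Presidenta" and "administración" find their English counterparts, while "garantizase" gets no candidate, which is exactly the case where parallel text or a lexicon has to take over.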
[Figure: the Rosetta Stone, with the same text in Hieroglyphic, Demotic, and Greek]
Statistical Machine Translation
Señora Presidenta , había pedido a la administración del Parlamento que garantizase
Madam President , I had asked the administration to ensure that
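One standard way to learn word translation probabilities from sentence pairs like the one above is IBM Model 1, trained with expectation maximization (EM). It is shown here as a representative example of statistical MT training, not as a description of the specific system behind the slide; the three-sentence corpus and the function names are hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(e | f) for target word e and source word f with EM.

    pairs: list of (source_tokens, target_tokens). A NULL source token lets
    a target word align to nothing.
    """
    target_vocab = {e for _, tgt in pairs for e in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from uniform probabilities
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for src, tgt in pairs:
            src = ["NULL"] + src
            for e in tgt:
                # E-step: share each target word among all source words
                # in proportion to the current t(e | f).
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    fraction = t[(e, f)] / norm
                    count[(e, f)] += fraction
                    total[f] += fraction
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f):
    """Most probable target word for source word f."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)


corpus = [
    ("la casa".split(), "the house".split()),
    ("la flor".split(), "the flower".split()),
    ("una casa".split(), "a house".split()),
]
t = train_ibm_model1(corpus)
for word in ["la", "casa", "flor", "una"]:
    print(word, "->", best_translation(t, word))
```

No dictionary is supplied: "casa" comes to prefer "house" only because the two co-occur across sentence pairs while "the" is better explained by "la". This is the sense in which parallel text lets a system learn to translate.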
Translation for Assessment
Indonesian, machine-translated into English:
City of Bali in October last year in the bomb blast in the case of imam accused India of the sea on Monday began to be averted. The attack on getting and its plan to make the charges and decide if it were found guilty, he death sentence of May. Indonesia of the police said that the imam sea bomb blasts in his hand claim to be accepted. A night Club and time in the bomb blast in more than 200 people were killed and several injured were in which most foreign nationals. …
The Searcher’s View
[Diagram]
Monolingual Searcher: Choose Document-Language Terms → Query
Cross-Language Searcher: Choose Query-Language Terms → Infer Concepts → Select Document-Language Terms → Query
Author: Choose Document-Language Terms → Document
Query-Document Matching: Query ↔ Document
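On the cross-language path, the system has to match a query whose terms may each have several document-language translations. One common approach treats all translations of one query term as synonyms and pools their counts, so a document that matches several translations of a single concept does not automatically keep pace with one that covers more concepts. The sketch below assumes a simple log-dampened term-frequency score and hypothetical translations; the flat scorer is included only for contrast.

```python
import math
from collections import Counter


def flat_score(concepts, doc_tokens):
    """Baseline: every candidate translation counts as an independent query term."""
    counts = Counter(doc_tokens)
    return sum(
        math.log1p(counts[term]) for alternatives in concepts for term in alternatives
    )


def structured_score(concepts, doc_tokens):
    """Treat the translations of each query concept as synonyms: pool their counts first."""
    counts = Counter(doc_tokens)
    return sum(
        math.log1p(sum(counts[term] for term in alternatives)) for alternatives in concepts
    )


# A translated two-concept query (hypothetical translations).
query = [{"oil", "petroleum"}, {"exploration", "prospecting"}]
docs = {
    "both concepts": "oil exploration report".split(),
    "one concept, two translations": "oil petroleum report".split(),
}
for name, tokens in docs.items():
    print(f"{name}: flat={flat_score(query, tokens):.3f} "
          f"structured={structured_score(query, tokens):.3f}")
```

With these toy documents the flat scorer ties the two (both 2·ln 2), while the structured scorer ranks the document covering both concepts higher (2·ln 2 versus ln 3).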
User-Assisted Query Translation
[Chart: iCLEF 2002 results; 20-minute sessions, each bar averages two subjects]
Informing Practice
• Parallel text enables new translation options
• Already as good as the best hand-built systems
• Automatic evaluation yields rapid improvement
• Limiting factor is translation readability
• Searchability is mostly a solved problem
• Leveraging human translation has potential
• Translation routing, volunteer translators
Doug Oard
oard@umd.edu
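The "automatic evaluation" point refers to scoring MT output against human reference translations. BLEU is the best-known metric of this kind. The sentence-level sketch below (single reference, n-grams up to 4, no smoothing) is a simplified illustration rather than the official scorer, which pools n-gram counts over a whole test corpus; the function names and the example hypothesis are assumptions.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions, times a brevity penalty."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if matched == 0 or total == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero (or undefined)
        log_precision_sum += math.log(matched / total)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "Madam President , I had asked the administration to ensure that".split()
hypothesis = "Madam President , I asked the administration to ensure that".split()
print(sentence_bleu(reference, reference))
print(sentence_bleu(hypothesis, reference))
```

Because a score like this can be recomputed in seconds after every change to a system, developers can try many variants quickly, which is the mechanism behind "rapid improvement".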