III Jornadas de Bibliotecas Digitales, El Escorial, 2002
Information access through phrase browsing (Acceso a la información mediante exploración de sintagmas)
Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
Dpto. Lenguajes y Sistemas Informáticos, UNED
Overview
• Motivation: problems in query formulation
• Hand-crafted approaches
  • Controlled vocabularies
• Automatic approaches
  • Pure string processing
  • Automatic terminology extraction
• Website Term Browser
• Conclusions
Precise information needs
[Diagram: Information need, Formulation, Query, Refinement, Search engine, Document ranking, Docs.]
Help users to express and refine their information needs
• Vague need
  • The user doesn’t know exactly what they are looking for
• Broad need
  • Compile or summarize pieces of information around a topic
Users develop strategies without system assistance
Language barriers
[Diagram: Information need, Formulation, Query, Refinement, Search engine, Document ranking, Docs.]
Help users to overcome language barriers
• Specific domain terminology
  • Find appropriate wording
• Translinguality
  • Information available only in a foreign language
• Natural Language characteristics
  • Lexical ambiguity
  • Terminology variation
General approaches
[Diagram: Information Retrieval; Terminology; Controlled vocabularies indexing & browsing]
Controlled vocabularies
Problems
• Construction & management (high cost)
• Indexing
  • Manual keyword assignment
  • Errors in automatic keyword assignment
• Domain specific
  • A new domain needs a new thesaurus
• Specialist oriented (users must know the preferred descriptors)
  • Less specialized audiences get poorer results
General approaches
[Diagram: Information Retrieval; Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing]
Free text searching
• Does it help users to express and refine their information needs?
• Does it help users to overcome language barriers?
General approaches
[Diagram: Information Retrieval; Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing; Keyphrase navigation (Phrasier); Phrase indexing & browsing (Phind)]
“Keyphrase” navigation (Jones 1999)
• Automatic extraction and assignment of 10 “keyphrases” to each document (KEA, Frank 1999)
• Navigation between documents that share “keyphrases”
Problems
• No translinguality
• No terminology variation
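The navigation idea on this slide can be illustrated with a small sketch. It assumes each document already has its keyphrases (here typed in by hand as toy data; KEA itself is not reproduced), and the function names `build_keyphrase_index` and `related_documents` are hypothetical, not taken from Phrasier.

```python
from collections import defaultdict

def build_keyphrase_index(doc_keyphrases):
    """Map each keyphrase to the set of documents it was assigned to."""
    index = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc_id)
    return index

def related_documents(doc_id, doc_keyphrases, index):
    """Return {other_doc: shared keyphrases} for documents linked to doc_id."""
    links = defaultdict(set)
    for phrase in doc_keyphrases[doc_id]:
        key = phrase.lower()
        for other in index[key]:
            if other != doc_id:
                links[other].add(key)
    return dict(links)

# Hypothetical toy data: keyphrases assumed to come from an extractor.
DOCS = {
    "d1": ["digital library", "query formulation"],
    "d2": ["digital library", "thesaurus"],
    "d3": ["term extraction"],
}
INDEX = build_keyphrase_index(DOCS)
```

Because the links depend on exact string matches, a document indexed under a variant of the same term, or under its translation, is never reached, which is the limitation listed under Problems.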
Objectives
• Develop a model
  • to help users to express and refine their information needs
  • to help users to overcome language barriers
• Bring the collection terminology to users
  • Morpho-syntactic, semantic & translingual variations
  • Without the need for thesaurus construction
• Establish an appropriate evaluation framework
Website Term Browser
Proposed approach
[Diagram: Information Retrieval; Natural Language Processing; Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing; Keyphrase navigation (Phrasier); Phrase indexing & browsing (Phind); Automatic Terminology Extraction; Disambiguation; Conceptual indexing; Terminology Retrieval & Term browsing (WTB)]
Terminology Retrieval
From Automatic Terminology Extraction...
Obtain lists of terms relevant for a specific domain
• Term Extraction
• Term Weighting
• Term Selection
... to Terminology Retrieval
Retrieve terms relevant for an information need
• The user query points to the relevant terms
• No truncation of terminology lists
• Favor recall by relaxing term extraction patterns
... & Browsing
• Navigate through relevant terminology
• Access information from retrieved terms
• Bridge the gap between query and collection vocabularies
• Cross-Language
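The shift from term extraction to terminology retrieval can be sketched as follows. This is a minimal illustration under stated assumptions, not the WTB implementation: candidate terms are plain word n-grams that neither start nor end with a stopword (a deliberately relaxed stand-in for linguistic extraction patterns), terms are ranked by how many query words they contain, and the full list is returned without truncation. The stopword list, function names and toy documents are all hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list stands in for part-of-speech patterns.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def candidate_terms(text, max_len=3):
    """Relaxed extraction: every n-gram not starting or ending in a stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    terms = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            terms.add(" ".join(gram))
    return terms

def build_term_index(docs):
    """Map each candidate term to the documents where it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in candidate_terms(text):
            index[term].add(doc_id)
    return index

def retrieve_terms(query, index):
    """Return all terms sharing a word with the query, best matches first."""
    query_words = set(re.findall(r"[a-z]+", query.lower())) - STOPWORDS
    hits = []
    for term, doc_ids in index.items():
        overlap = len(query_words & set(term.split()))
        if overlap:
            hits.append((term, overlap, sorted(doc_ids)))
    # Rank by query-word overlap, then document frequency; no list truncation.
    hits.sort(key=lambda h: (-h[1], -len(h[2]), h[0]))
    return hits

# Hypothetical toy collection.
DOCS = {
    "d1": "Automatic term extraction for digital libraries",
    "d2": "Term extraction and term weighting in the library",
}
TERM_INDEX = build_term_index(DOCS)
HITS = retrieve_terms("term extraction", TERM_INDEX)
```

Each hit keeps its document list, so a retrieved term such as "term weighting" can serve as an entry point to documents the original query wording would not have ranked highly.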
[Screenshot: Query in Spanish; Hierarchy of terms; Ranking of documents; English, Spanish, Catalan]
[Screenshot: Semantic variations; Translingual variation; Morpho-syntactic variations (permutation, insertion)]
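The two syntactic variation types named above, permutation and insertion, can be illustrated with a small sketch. It is an assumption-laden simplification: it compares content words only, does no morphological normalization (so "library" and "libraries" would not match), and the function name `classify_variation` and the `max_inserted` limit are hypothetical rather than part of WTB.

```python
# Assumption: function words are ignored when comparing phrases.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the"}

def content_words(phrase):
    """Lowercase the phrase and drop stopwords, keeping word order."""
    return [w for w in phrase.lower().split() if w not in STOPWORDS]

def classify_variation(term, candidate, max_inserted=2):
    """Label candidate as 'identical', 'insertion' or 'permutation' of term.

    Returns None when the candidate is not a variant under these rules.
    """
    base = content_words(term)
    cand = content_words(candidate)
    if not set(base) <= set(cand):
        return None  # some word of the term is missing
    extra = len(cand) - len(base)
    if extra > max_inserted:
        return None  # too many inserted words to count as a variant
    kept = [w for w in cand if w in base]
    if kept == base:
        return "identical" if extra == 0 else "insertion"
    if sorted(kept) == sorted(base):
        return "permutation"  # same words, different order
    return None
```

For example, `classify_variation("information retrieval", "retrieval of information")` is labeled as a permutation, while "information and document retrieval" is labeled as an insertion.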
Usefulness of Term Browsing
• 2000 session logs in UNED.es comparing:
  - Use of the term area from WTB
  - Use of the document area from Google
Conclusions
Browsing of phrases and terminology
• User-oriented approach
• Interaction over terminological information
• Intermediate way between free searching and thesaurus-guided searching
• Without the need for thesaurus construction
Website Term Browser
• Brings the collection terminology to users
• Morpho-syntactic & semantic variations
• Translinguality
Evaluation
• Users appreciate Term Browsing
• WTB phrasal information can substantially complement the document ranking provided by search engines