Explore the use of thesauri at Elsevier for indexing, browsing, and improving search results. Learn about new requirements like integrating databases and supporting text mining for enhanced navigation and results. Experiment with RDF-based ontology, text mining technologies, and visualization techniques. Dive into the world of 'meta-research' insights and innovative applications for managing information overload.
Semantic (web) activity at Elsevier
Marc Krellenstein
VP, Search and Discovery, Elsevier
October 27, 2004
m.krellenstein@elsevier.com
Thesaurus use at Elsevier
• Elsevier traditionally uses proprietary and standard thesauri for:
  • Indexing (tagging) articles, books and other materials
  • Browsing thesaurus-indexed content
  • Expanding searches against specialized content
    • Overall, a net benefit, but not huge
  • Limiting a search by category
  • Clustering documents by category
    • Better than limiting search up front…data-driven
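As an illustration of the "expanding searches" use above, here is a minimal sketch of thesaurus-based query expansion. The mini-thesaurus, its terms, and the function names `expand_query` and `to_boolean_query` are all hypothetical and are not taken from any Elsevier system; they only show the general idea of adding synonyms and narrower terms to a user's query.

```python
# Hypothetical mini-thesaurus: preferred term -> synonyms and narrower terms.
# The entries are illustrative only and do not come from a real thesaurus.
THESAURUS = {
    "neoplasm": {"synonyms": ["tumor", "tumour"], "narrower": ["leukemia", "carcinoma"]},
    "leukemia": {"synonyms": ["leukaemia"], "narrower": []},
}


def expand_query(term, thesaurus, include_narrower=True):
    """Return the term plus its synonyms and, optionally, its narrower terms."""
    key = term.lower()
    expanded = [key]
    entry = thesaurus.get(key)
    if entry is None:
        return expanded  # unknown term: search exactly as typed
    expanded.extend(entry["synonyms"])
    if include_narrower:
        expanded.extend(entry["narrower"])
    return expanded


def to_boolean_query(terms):
    """Join the expanded terms into a simple OR query string."""
    return " OR ".join(f'"{t}"' for t in terms)


print(to_boolean_query(expand_query("Neoplasm", THESAURUS)))
```

Expansion of this kind mainly raises recall, which may be one reason the slide describes the benefit as real but modest.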
Thesaurus use at Elsevier
• Elsevier does not currently use thesauri for concept searching
  • Lack of demonstrated superiority to date over current best practice full text search
Thesaurus use at Elsevier
• New thesaurus requirements and uses:
  • Integrated search of proprietary, public and/or local user content using multiple thesauri
  • Integrating chemical structure info with text documents
  • Integrating databases with diverse schemas
  • Supporting text mining
  • Other uses requested by our customers (e.g., extensibility for local content)
  • Improved thesaurus navigation
  • Improved search results
Approaches for new thesaurus uses
• Creating RDF-based intermediary ontology to map diverse thesauri
  • Support multiple relationships
  • Extensible by customers
  • Improved performance, scalability
• Experimenting with search options
  • Improving precision as well as recall
• Experimenting with visualization techniques (e.g., DOPE browser)
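The intermediary-ontology idea can be sketched with plain subject-predicate-object triples. This is only an assumption about how such a mapping might look: the prefixes `thesA:`, `thesB:`, `local:` and `hub:`, the predicates `mapsTo` and `broader`, and every term identifier are hypothetical. The slides do not describe the actual schema, and a real system would use an RDF store rather than a Python list.

```python
# Triples as (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("thesA:Leukaemia", "mapsTo", "hub:Leukemia"),
    ("thesB:C042", "mapsTo", "hub:Leukemia"),
    ("thesA:BloodCancer", "mapsTo", "hub:HematologicNeoplasm"),
    ("hub:Leukemia", "broader", "hub:HematologicNeoplasm"),
    # A customer-added term for local content, mapped to the same hub concept.
    ("local:LeukemiaCaseNotes", "mapsTo", "hub:Leukemia"),
]


def hub_concept(term, triples):
    """Return the intermediary (hub) concept a thesaurus term maps to, if any."""
    for subject, predicate, obj in triples:
        if subject == term and predicate == "mapsTo":
            return obj
    return None


def equivalents(term, triples):
    """Return terms from other vocabularies that map to the same hub concept."""
    hub = hub_concept(term, triples)
    if hub is None:
        return []
    return sorted(
        s for s, p, o in triples if p == "mapsTo" and o == hub and s != term
    )


def broader_concepts(hub, triples):
    """Return the hub concepts one step broader than the given hub concept."""
    return [o for s, p, o in triples if s == hub and p == "broader"]


print(equivalents("thesA:Leukaemia", TRIPLES))
print(broader_concepts("hub:Leukemia", TRIPLES))
```

Routing every thesaurus through one hub means each new vocabulary, including a customer's local one, needs a single mapping to the hub rather than a pairwise mapping to every other thesaurus.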
Text mining at Elsevier
• Consider text mining a now capable technology that will be essential for managing information overload and providing new insights
• Actively investigating uses and developing applications
• Can provide both substantive and ‘meta-research’ insights
  • Trends over time, distribution by author or institution, etc.
• View RDF as the eventual storage medium for extracted facts
  • Performance, maintainability, inferencing
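The 'meta-research' insights mentioned above (trends over time, distribution by author or institution) come down to counting over extracted metadata. The sketch below assumes the extraction step has already produced records; the record layout, the placeholder names, and the functions `trend_by_year` and `distribution_by` are all hypothetical.

```python
from collections import Counter

# Hypothetical records produced by an upstream extraction step.
RECORDS = [
    {"year": 2001, "authors": ["A. Author", "B. Author"], "institution": "Institute X"},
    {"year": 2002, "authors": ["A. Author"], "institution": "Institute Y"},
    {"year": 2002, "authors": ["C. Author", "B. Author"], "institution": "Institute X"},
]


def trend_by_year(records):
    """Count publications per year, ordered by year."""
    counts = Counter(record["year"] for record in records)
    return dict(sorted(counts.items()))


def distribution_by(records, field):
    """Count records per value of a field; list-valued fields count each item."""
    counts = Counter()
    for record in records:
        value = record[field]
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


print(trend_by_year(RECORDS))
print(distribution_by(RECORDS, "institution").most_common())
```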
Author teams in HIV research?
Indirect links from leukemia to Alzheimer’s via enzymes
Red – Product Pink – Reactant Green – Reagent Brown – Solvent …