Indexing Mathematical Abstracts by Metadata and Ontology IMA Workshop, April 26-27, 2004

Su-Shing Chen, University of Florida, suchen@cise.ufl.edu

Presentation Transcript


  1. Indexing Mathematical Abstracts by Metadata and Ontology, IMA Workshop, April 26-27, 2004. Su-Shing Chen, University of Florida, suchen@cise.ufl.edu

  2. Abstract • OAI extensions to federated search and other services for MathML-based metadata indexing and subject classification of mathematical abstracts. • Construction of an ontology, or conceptual maps, of mathematics, in which mathematical formulas are treated as elements of the ontology. • Ontology indexing by clustering mathematical abstracts or full papers into an information visualization interface, so that users can select documents by ontology as well as by metadata.
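The abstract treats formulas as ontology elements and mentions MathML-based indexing. One possible first step, sketched here under the assumption that formulas arrive as MathML presentation markup, is to extract the token elements (`mi` identifiers, `mn` numbers, `mo` operators) as index terms. The function name `mathml_index_terms` is hypothetical, and this is not presented as the method used in the talk.

```python
import xml.etree.ElementTree as ET

# MathML namespace as it appears in ElementTree's "{uri}tag" form.
MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"
# Presentation-MathML token elements treated as index terms (an assumption).
TOKEN_TAGS = {"mi": "identifier", "mn": "number", "mo": "operator"}


def mathml_index_terms(mathml):
    """Return (kind, text) pairs for token elements, in document order."""
    root = ET.fromstring(mathml)
    terms = []
    for el in root.iter():
        tag = el.tag.replace(MATHML_NS, "")
        kind = TOKEN_TAGS.get(tag)
        if kind and el.text and el.text.strip():
            terms.append((kind, el.text.strip()))
    return terms


formula = ('<math xmlns="http://www.w3.org/1998/Math/MathML">'
           '<msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mi>y</mi></math>')
print(mathml_index_terms(formula))
```

Layout elements such as `msup` are skipped here; a fuller indexer would likely also record structure (for example, that `2` is an exponent of `x`).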

  3. A DL Server with OAI Extensions: Managing the Metadata Complexity [Diagram: a DL Server with Harvester, Data Provider, and Service Provider components; other labels include Harvest API, OAI_DC, OAI_XXX, Federated Search, and Data Mining.]

  4. A DL Server with OAI Extensions: Managing the Metadata Complexity. Built-in capabilities: • Harvester – harvests metadata from various OAI-compliant data providers • Data provider – exposes harvested and existing metadata sets • Service provider – offers federated search and data mining over metadata sets

  5. Harvest API • Harvester interface: • URL to harvest • Selective harvesting parameters [Diagram: the DL Server's Harvester sends harvest requests with parameters to data providers and receives harvested metadata.]
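The harvester interface takes a URL to harvest plus selective-harvesting parameters. A minimal sketch, assuming an OAI-PMH 2.0 endpoint: the `ListRecords` verb with `metadataPrefix`, `from`, `until`, and `set` implements selective harvesting, and a non-empty `resumptionToken` in the response means more records remain. The base URL `http://example.org/oai` and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                           until_date=None, set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request with selective-harvesting arguments."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument in OAI-PMH.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token
```

A harvest loop would fetch the first URL, parse it, and keep requesting with the returned token until `parse_list_records` yields `None`. Network access is left out so the sketch stays self-contained.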

  6. Harvester Interface

  7. Harvester Interface

  8. Data Provider • Exposes single or combined harvested metadata sets to other harvesters • Reformats metadata from different data providers so that other service providers can harvest it (e.g., metadata originally in Dublin Core is converted to MARC before being exposed)
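The Dublin Core to MARC example can be pictured as a field-level crosswalk. This is a minimal sketch: the mapping table is a simplified, illustrative subset modeled on the Library of Congress crosswalk for unqualified Dublin Core, the output is a list of `(tag, subfield, value)` tuples rather than a real MARC record, and `dc_to_marc_fields` is a hypothetical name.

```python
# Simplified, illustrative subset of a Dublin Core -> MARC crosswalk
# (an assumption for this sketch; consult the LC crosswalk for full details).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def dc_to_marc_fields(dc_record):
    """Map {dc_element: [values]} to a tag-ordered list of (tag, subfield, value)."""
    fields = []
    for element, values in dc_record.items():
        mapping = DC_TO_MARC.get(element)
        if mapping is None:
            continue  # elements without a mapping are dropped in this sketch
        tag, code = mapping
        for value in values:
            fields.append((tag, code, value))
    fields.sort(key=lambda f: f[0])  # stable sort: order by MARC tag
    return fields


record = {"title": ["On Ontologies"], "creator": ["Chen, S."],
          "subject": ["Metadata"], "format": ["text/html"]}
print(dc_to_marc_fields(record))
```

Dropping unmapped elements keeps the example short; a production data provider would need a policy for them and would serialize the result as MARCXML or MARC 21.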

  9. Service Provider: Federated Search • Emulates a federated search service over existing and combined harvested metadata sets • Can potentially extend federated search across other search protocols
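Federated search over several metadata sets usually means fanning one query out to each set and merging the ranked results. A minimal sketch, assuming every source is a callable that returns `(record_id, score)` pairs and that scores are comparable across sources (often not true in practice); duplicates keep their best score. The names `federated_search` and the toy sources are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, sources, limit=10):
    """sources: {name: callable(query) -> [(record_id, score), ...]}.

    Returns up to `limit` (record_id, score, source_name) tuples, best first.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Query every source concurrently.
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        for name, fut in futures.items():
            for record_id, score in fut.result():
                best = merged.get(record_id)
                # De-duplicate across sources, keeping the highest score.
                if best is None or score > best[0]:
                    merged[record_id] = (score, name)
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1][0], kv[0]))
    return [(rid, score, name) for rid, (score, name) in ranked[:limit]]


sources = {
    "set_a": lambda q: [("r1", 0.9), ("r2", 0.4)],
    "set_b": lambda q: [("r2", 0.7), ("r3", 0.5)],
}
print(federated_search("self-organizing map", sources))
```

Wrapping other search protocols would mean writing another callable per protocol; the merge step stays the same.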

  10. Federated Search

  11. Federated Search

  12. Federated Search

  13. Service Provider: Data Mining • Knowledge discovery on harvested metadata sets • Metadata classification using the Self-Organizing Map (SOM) algorithm • Improving retrieval effectiveness by providing concept browsing and search services

  14. Self-Organizing Map Algorithm • Unsupervised, competitive learning algorithm • Artificial neural network for visualizing and interpreting complex data sets • Maps a high-dimensional input space onto a two-dimensional output space
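The SOM properties listed above (competitive, unsupervised, high-dimensional input mapped to a 2-D grid) can be seen in a small training loop. This is a minimal pure-Python sketch; the Gaussian neighborhood and the linearly decaying learning rate and radius are common choices assumed here, not details from the slides, and `train_som` and `best_matching_unit` are hypothetical names.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM on equal-length numeric vectors in `data`."""
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2
    # Random initial weight vector for every node on the 2-D output grid.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
            br, bc = best_matching_unit(weights, x)  # competitive step: winner node
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood based on grid distance to the winner.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, each document vector is assigned to its best-matching node; because neighbors are updated together, similar documents land on nearby nodes, which is what makes the map browsable.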

  15. Data Mining Service Provider System Architecture [Diagram: browsers send concept browsing and concept search requests and receive responses; components include the Concept Harvester, SOM Categorizer, Input Vector Generator, Noun Phraser, and Metadata Database, with flows labeled "Fetch metadata" and "Save SOM".]
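In this architecture the Input Vector Generator sits between the Noun Phraser and the SOM Categorizer. A minimal sketch of that step, assuming noun phrases have already been extracted per record and that binary vectors over the most frequent phrases are an acceptable encoding (the slides do not specify one); `build_input_vectors` and the sample phrases are hypothetical.

```python
from collections import Counter


def build_input_vectors(doc_phrases, vocab_size=100):
    """doc_phrases: {doc_id: [noun phrases]} -> (vocab, {doc_id: binary vector})."""
    # Document frequency: in how many records each phrase appears.
    df = Counter()
    for phrases in doc_phrases.values():
        df.update(set(phrases))
    # Keep the most frequent phrases; ties broken alphabetically.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [p for p, _ in ranked[:vocab_size]]
    index = {p: i for i, p in enumerate(vocab)}
    vectors = {}
    for doc_id, phrases in doc_phrases.items():
        v = [0] * len(vocab)
        for p in set(phrases):
            if p in index:
                v[index[p]] = 1
        vectors[doc_id] = v
    return vocab, vectors


docs = {"d1": ["metadata", "ontology"], "d2": ["metadata", "som"], "d3": ["metadata"]}
print(build_input_vectors(docs, vocab_size=2))
```

These vectors would then be the training data for the SOM Categorizer; weighting schemes such as TF-IDF are a common alternative to binary features.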

  16. Concept Harvester • Screenshot of the SOM Categorizer

  17. Construction of a Two-Level Concept Hierarchy • A SOM is constructed for each harvested metadata set • The lower-layer SOMs are then added to the upper-layer SOM [Figure label: VTETD]
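The resulting two-level hierarchy can be represented as a small tree for browsing: top-level concepts link down to the per-collection concept regions of the lower-layer SOMs. The slide does not say how the upper-layer SOM groups the lower-layer maps, so in this sketch that grouping is simply passed in as input. `ConceptNode`, `build_two_level_hierarchy`, and the set names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    label: str
    record_ids: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_two_level_hierarchy(upper_assignments, lower_maps):
    """upper_assignments: {upper concept: [metadata-set names]}
    lower_maps: {metadata-set name: {lower concept: [record ids]}}"""
    roots = []
    for upper_label, set_names in upper_assignments.items():
        top = ConceptNode(upper_label)
        for name in set_names:
            for concept, ids in lower_maps[name].items():
                # Bottom level: one node per concept region of a lower-layer SOM.
                top.children.append(ConceptNode(f"{name}: {concept}", list(ids)))
                # Top level aggregates all records beneath it.
                top.record_ids.extend(ids)
        roots.append(top)
    return roots


lower = {"set_a": {"neural networks": ["e1", "e2"], "databases": ["e3"]},
         "set_b": {"topology": ["m1"]}}
upper = {"computing": ["set_a"], "mathematics": ["set_b"]}
for node in build_two_level_hierarchy(upper, lower):
    print(node.label, [c.label for c in node.children])
```

Top-level browsing lists the root labels; selecting one exposes its children, matching the top-level and bottom-level browsing views that follow.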

  18. Top-level Concept Browsing

  19. Bottom-level Concept Browsing

  20. MEDLINE Database • Developed by the National Library of Medicine (NLM) • Bibliographic citations and abstracts from more than 4,600 biomedical journals published in the United States and 70 other countries. • Covering the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences. • Over 12 million citations • Searchable via PubMed or the NLM Gateway

  21. MeSH (Medical Subject Headings) • MEDLINE uses MeSH as its controlled vocabulary for indexing database articles • Indexers scan an entire article and assign MeSH headings (or MeSH descriptors) to each article • MeSH descriptors are arranged in both an alphabetic list and a hierarchical structure. • Updated annually to reflect the changes in medicine and medical terminology
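The hierarchical arrangement of MeSH descriptors is encoded in dot-separated tree numbers, where each prefix names an ancestor. A minimal sketch of using that to roll citation counts up the hierarchy, for example to summarize categories at a coarser level; the tree numbers in the example are illustrative, and `tree_ancestors` and `roll_up` are hypothetical names.

```python
from collections import Counter


def tree_ancestors(tree_number):
    """Ancestors of a MeSH-style tree number, e.g. 'A.B.C' -> ['A', 'A.B']."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def roll_up(counts_by_tree_number):
    """Add each node's count to itself and to every ancestor."""
    totals = Counter()
    for tn, n in counts_by_tree_number.items():
        totals[tn] += n
        for anc in tree_ancestors(tn):
            totals[anc] += n
    return dict(totals)


print(tree_ancestors("C04.588.274"))
print(roll_up({"C04.588.274": 3, "C04.588": 1, "C04.557": 2}))
```

A descriptor can carry several tree numbers, so a real roll-up must decide whether to count a citation once per branch or once overall.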

  22. Our Experimentation • Problems • Searching by descriptors is known to improve search precision substantially. • However, naïve users find it very difficult to know and use the exact MeSH descriptors when searching. • In addition, as the MEDLINE database grows, information overload makes it harder for users to find information relevant to their interests. • Proposed Approach • Categorization according to MeSH terms, MeSH major topics, and the co-occurrence of MeSH descriptors • Clustering of the MeSH term categorization results through the Knowledge Grid • Visualization of categories and hierarchical clusters
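Two of the proposed categorizations, by MeSH major topic and by descriptor co-occurrence, reduce to simple grouping and pair counting. A minimal sketch, assuming each citation is given as a list of `(descriptor, is_major_topic)` pairs; the citation IDs and headings below are sample data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def categorize_by_major_topic(citations):
    """citations: {id: [(descriptor, is_major_topic)]} -> {descriptor: [ids]}."""
    categories = defaultdict(list)
    for cid, headings in citations.items():
        for descriptor, is_major in headings:
            if is_major:
                categories[descriptor].append(cid)
    return dict(categories)


def descriptor_cooccurrence(citations):
    """Count how often each unordered pair of descriptors appears together."""
    counts = Counter()
    for headings in citations.values():
        descriptors = sorted({d for d, _ in headings})
        for pair in combinations(descriptors, 2):
            counts[pair] += 1
    return counts


citations = {
    "p1": [("Neoplasms", True), ("Humans", False)],
    "p2": [("Neoplasms", True), ("Humans", False), ("Risk Factors", False)],
    "p3": [("Risk Factors", True), ("Humans", False)],
}
print(categorize_by_major_topic(citations))
print(descriptor_cooccurrence(citations).most_common(2))
```

The co-occurrence counts could feed the clustering step as similarity scores; the slides do not say which similarity measure was used.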

  23. Data Access Services [Screenshots: MeSH Major Topic Tree View; SOM Tree View]

  24. Knowledge Grid • Knowledge Grid architecture, courtesy of Cannataro and Talia (Knowledge Grid: An Architecture for Distributed Knowledge Discovery)

  25. Future Directions • Develop a federated search service for OAI-compliant mathematical abstracts. • Develop an ontology or conceptual maps for mathematics. • Develop an ontology search service for mathematical abstracts and full papers. • Develop an interoperable architecture with other services, such as OCR of mathematical formulas.

  26. Acknowledgement • Many thanks to the NSF NSDL Program. • Collaborators – Joe Futrelle (NCSA), Ed Fox (Virginia Tech) • Student Team – Hyunki Kim, Chee Yoong Choo, Xiaoou Fu, Yu Chen
