
Thesauri and Ontologies for Digital Libraries

Pavel Smrž, Anna Sinopalnikova, Martin Povolny {smrz, anna, xpovolny}@fi.muni.cz Faculty of Informatics, Masaryk University in Brno, Czech Republic.


Presentation Transcript


  1. Thesauri and Ontologies for Digital Libraries Pavel Smrž, Anna Sinopalnikova, Martin Povolny {smrz, anna, xpovolny}@fi.muni.cz Faculty of Informatics, Masaryk University in Brno, Czech Republic

  2. Outline • Motivation • Role of Thesauri and Ontologies in Present DLs, Relations Covered • Word-Association Thesaurus, CLIR • XML Document Management System • XML Family Standards, XSLT Processor Extension • Conclusions and Future Directions

  3. Motivation • size and complexity of DLs grow rapidly • future DLs will need algorithms to process and understand the data they contain • intelligent procedures must be implemented to transform natural-language knowledge into a more appropriate representation • description of concepts and relations between them becomes crucial

  4. Motivation • common understanding of application domains is provided by ontologies • creation of broad-coverage ontologies from scratch is extremely labour-intensive • efforts to reuse (clean-up, refine, merge) existing resources = wordnet-like semantic networks, lexical databases, thesauri, ...

  5. Thesauri and Ontologies in Present Digital Libraries • structuring and classification of digital data (bibliographic classification supplemented/replaced by automatic conceptual document indexing) • contradictory results in the area of information retrieval (IR) • standard IR measures (precision/recall) vs. navigation through documents, user interface aspects
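
For reference, the standard IR measures named on this slide can be computed as in the following minimal sketch. The function name and the document identifiers are illustrative assumptions and do not come from the presentation.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document ids: 4 retrieved, 3 relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Neither number says anything about how easily a user can navigate the result set, which is the contrast the slide draws.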

  6. Relations Covered • Synonymy – query expansion (validated by the user) • true synonyms • style, register, regional variants • orthographic variants (proper names) • Hierarchical relations (hyponymy, meronymy) – query expansion, named entity recognition, ... • “see-also”, “related-to” relations – definition of topics
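
Query expansion validated by the user, as listed above, might look like the following sketch. The toy thesaurus, the relation names used as keys, and both function names are assumptions made for illustration; the presentation does not specify a data model.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy thesaurus: relation name -> term -> related terms.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "synonym": {"car": ["automobile", "motorcar"]},
    "hyponym": {"car": ["taxi", "convertible"]},
}


def expansion_candidates(term: str, relations: Iterable[str] = ("synonym",)) -> List[str]:
    """Collect related terms for `term` over the chosen relations, without duplicates."""
    seen = {term}
    candidates: List[str] = []
    for relation in relations:
        for candidate in THESAURUS.get(relation, {}).get(term, []):
            if candidate not in seen:
                seen.add(candidate)
                candidates.append(candidate)
    return candidates


def expand_query(terms: Iterable[str], accepted: Set[str],
                 relations: Iterable[str] = ("synonym",)) -> List[str]:
    """Expand a query, keeping only the candidates the user has accepted."""
    relations = tuple(relations)
    expanded: List[str] = []
    for term in terms:
        if term not in expanded:
            expanded.append(term)
        for candidate in expansion_candidates(term, relations):
            if candidate in accepted and candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

In an interactive system the `accepted` set would come from the user confirming or rejecting the proposed candidates; here it is simply passed in.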

  7. Word-Association Thesauri • Large-scale psycholinguistic experiments (free association test) • Large numbers of stimulus-reaction pairs (170 000), many subjects (1 500) of different age, sex, profession, ... • Availability for English, German, Russian, Czech • Concept search rather than context search
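
One possible way to turn stimulus-reaction pairs from a free association test into a lookup structure is sketched below. The sample pairs and function names are invented for illustration and are not taken from the thesauri mentioned on the slide.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def build_association_thesaurus(pairs: Iterable[Tuple[str, str]]) -> DefaultDict[str, Counter]:
    """Count how often each reaction was given for each stimulus (case-insensitive)."""
    table: DefaultDict[str, Counter] = defaultdict(Counter)
    for stimulus, reaction in pairs:
        table[stimulus.lower()][reaction.lower()] += 1
    return table


def top_associations(table: DefaultDict[str, Counter], stimulus: str,
                     n: int = 3) -> List[Tuple[str, float]]:
    """Return the n most frequent reactions with their relative frequencies."""
    counts = table.get(stimulus.lower())
    if not counts:
        return []
    total = sum(counts.values())
    return [(reaction, count / total) for reaction, count in counts.most_common(n)]


# Invented sample data: four subjects responding to the stimulus "book".
sample_pairs = [("book", "read"), ("book", "library"), ("book", "read"), ("Book", "page")]
associations = build_association_thesaurus(sample_pairs)
```

The relative frequencies give a simple ranking of related concepts that a concept search could draw on.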

  8. Cross-lingual information retrieval and extraction • CLIR = finding documents in a language different from the one used in the query • Multilingual resources (wordnets) for many languages (EuroWordNet, BalkaNet) linked by the Inter-Lingual Index (ILI) • CLIE = translation of answers back to the language of the user query • Visualisation of terms referring to hierarchically organized concepts
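
The idea of linking wordnets through ILI can be illustrated with a small sketch: a query term is mapped to its ILI records, and every word in the target language that shares a record becomes a translation candidate. The ILI identifiers, the two-language table, and the function names are hypothetical; real wordnets use their own record formats.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping: language -> word -> ILI record ids.
WORD_TO_ILI: Dict[str, Dict[str, List[str]]] = {
    "en": {"library": ["ili-001"], "book": ["ili-002"]},
    "cs": {"knihovna": ["ili-001"], "kniha": ["ili-002"]},
}


def translate_term(term: str, source: str, target: str) -> List[str]:
    """Return target-language words that share at least one ILI record with `term`."""
    ili_ids = set(WORD_TO_ILI.get(source, {}).get(term, []))
    matches = [
        word
        for word, ids in WORD_TO_ILI.get(target, {}).items()
        if ili_ids.intersection(ids)
    ]
    return sorted(matches)


def translate_query(terms: Iterable[str], source: str, target: str) -> Dict[str, List[str]]:
    """Translate each query term independently; unknown terms map to an empty list."""
    return {term: translate_term(term, source, target) for term in terms}
```

The same lookup run in the opposite direction would serve the CLIE step of translating answers back into the language of the query.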

  9. XML Document Management System Integrating Ontologies • Several systems allow storing data and metadata together • BUT no support for efficient integration of thesauri and ontologies • DEB – open-source client/server system for efficient storage and retrieval of arbitrary XML collections • XML-family standards employed in the data format, customization of UI, query language, visualisation, ...

  10. DEB Architecture

  11. XML-Family Standards in DEB • DEB clients use XSLT for transforming XML data into HTML (presented with the help of an HTML widget) • User-defined data views by means of XSLT • Client-side caching of parsed DOM objects • XPath for accessing information • OWL for storing ontologies transformed automatically from
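
As an illustration of XPath-style access to a parsed XML entry, the following sketch uses the limited XPath subset of Python's standard `xml.etree.ElementTree` module. The element and attribute names (`thesaurus`, `entry`, `term`, `relation`, `type`) are assumptions; the slides do not show the DEB data format, and this is not DEB client code.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical entry format, invented for this example.
ENTRY_XML = """
<thesaurus>
  <entry id="e1">
    <term>car</term>
    <relation type="synonym">automobile</relation>
    <relation type="hyponym">taxi</relation>
  </entry>
  <entry id="e2">
    <term>taxi</term>
    <relation type="hypernym">car</relation>
  </entry>
</thesaurus>
"""


def synonyms_of(root: ET.Element, term: str) -> List[str]:
    """Return the synonyms recorded for `term` using path expressions."""
    result: List[str] = []
    for entry in root.findall("./entry"):
        if entry.findtext("term") == term:
            # Attribute predicates are part of ElementTree's XPath subset.
            for relation in entry.findall("./relation[@type='synonym']"):
                result.append(relation.text or "")
    return result


# Parse once and reuse the tree, in the spirit of caching parsed DOM objects.
root = ET.fromstring(ENTRY_XML.strip())
```

Keeping the parsed `root` around and querying it repeatedly avoids re-parsing the XML for every lookup.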

  12. Extension of the Standard XSLT Processor • nested queries for efficient processing • XSLT sheets can request data from the DEB server based on the information processed • Special URI scheme (deb://) creates a virtual space of XML documents that are results of the queries • Server data is accessed from the XSLT processor the same way as any other external resource
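
The resolver idea behind deb:// can be sketched as follows: a URI with the special scheme is answered with an XML document built from a query result, while any other URI is passed to the ordinary resolver. The URI layout (`deb://<collection>/query?term=...`), the in-memory collection, and the function name are assumptions made for this example; the actual DEB syntax is not given in the slides.

```python
from typing import Callable, Dict, Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory stand-in for the server: collection -> term -> XML fragment.
COLLECTIONS: Dict[str, Dict[str, str]] = {
    "dict": {"car": "<entry><term>car</term></entry>"},
}


def resolve(uri: str, fallback: Optional[Callable[[str], str]] = None) -> str:
    """Return the XML document a URI refers to.

    deb:// URIs are answered with a virtual document holding the query
    results; every other URI is delegated to `fallback`.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "deb":
        if fallback is None:
            raise ValueError(f"no resolver for {uri!r}")
        return fallback(uri)
    collection = COLLECTIONS.get(parsed.netloc, {})
    terms = parse_qs(parsed.query).get("term", [])
    hits = [collection[term] for term in terms if term in collection]
    # The result is itself a well-formed XML document, so a stylesheet
    # could process it like any other external resource.
    return "<result>" + "".join(hits) + "</result>"
```

Because the result of one request is an XML document, a stylesheet could use values from it to build a further deb:// URI, which is one way to read the "nested queries" bullet.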

  13. Extension of the Standard XSLT Processor

  14. Conclusions and Future Directions • Our research on the role of thesauri and ontologies in DL influenced the development of the Czech part of the multilingual lexical resource developed under the current BalkaNet project and the last extensions to the RussNet project. • DEB is currently used as the core DL engine at NLP Lab, FI MU, Brno, Czech Republic. It manipulates standard document collections as well as dictionaries, lexical semantic databases, e-learning materials, ...

  15. Future Directions • Open research problems related to the conceptual design of lexical resources (integration of generative concepts into the structure of knowledge bases) • DEB development – specialized modules for new W3C standards, three-level architecture (thin clients), simplification of UI customization by means of automatically generated XSLT, reimplementation in Ruby, ...
