Evaluating Ontology Search: Towards Benchmarking in Ontology Search
Paul Buitelaar, Thomas Eigner
Competence Center Semantic Web & Language Technology Lab, DFKI GmbH, Saarbrücken, Germany
Overview
• Ontology Search
  • Knowledge reuse (integration with Ontology Learning)
• OntoSelect
  • Browse (ontologies, labels, classes, properties)
  • Search by topic
• Evaluating Ontology Search
  • Benchmark (evaluation) data set
  • Experiment (compare SWOOGLE, OntoSelect)
• Conclusions
Ontology Search
• More and more ontologies are being published on the (Semantic) Web
• Available as RDFS or OWL files (some still in DAML)
• This opens up possibilities for knowledge reuse
• Access is via ontology search engines and/or (manual or automatic) organization in ontology libraries
• But it is increasingly hard to find the right ontology for your application
• Growing research on ontology search/selection (Alani et al., Buitelaar et al., Ding et al., Sabou et al.): SWOOGLE, OntoSelect, Watson
OntoSelect
• Ontology library and search engine: http://olp.dfki.de/OntoSelect
• Monitors the web for ontologies, with automatic harvesting and indexing
• Browse and search
  • Over ontologies, classes, properties, and (multilingual) labels
  • Ontology search integrates relevance feedback over Wikipedia for the search term
• Ontology publishing
  • Submitted ontologies are automatically integrated
• Statistics
  • On formats, languages, labels used, and ontology publishing

Paul Buitelaar, Thomas Eigner, Thierry Declerck: OntoSelect: A Dynamic Ontology Library with Support for Ontology Selection. In: Proc. of the Demo Session at the International Semantic Web Conference, Hiroshima, Japan, Nov. 2004.
Keyword Expansion (Extraction): Relevance Feedback from Wikipedia
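The slide does not detail the expansion step; below is a minimal sketch in Python, assuming it works by fetching the Wikipedia article for the search term and taking its most frequent content words as expansion keywords. The MediaWiki query API call is real; the frequency-based extraction heuristic and the stopword list are assumptions.

import re
from collections import Counter

import requests

STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "to", "is",
             "for", "on", "as", "by", "with", "that", "are", "from"}

def expand_keywords(topic, top_n=10):
    # Fetch the plain-text extract of the Wikipedia article for `topic`.
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "prop": "extracts", "explaintext": 1,
                "titles": topic, "format": "json"},
        timeout=10,
    )
    pages = resp.json()["query"]["pages"]
    text = next(iter(pages.values())).get("extract", "")
    # Rank content words by frequency (assumed extraction heuristic).
    tokens = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

print(expand_keywords("Tourism"))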
Search Criteria
• Relevance criteria address ontology content, structure, and status (a scoring sketch combining them follows below):
• Coverage - Term Matching
  • How many of the terms in a text collection are covered by labels for classes and properties?
• Structure - Properties Relative to Classes
  • How detailed is the knowledge structure that the ontology represents?
• Connectedness - Number of Included Ontologies
  • Is the ontology connected to other ontologies, and how well established are these?
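The slide names the three criteria but not how they are combined; the following is a minimal sketch of one plausible weighted combination. The weights, normalisations, and cap constants are assumptions, not values from the paper.

from dataclasses import dataclass

@dataclass
class Ontology:
    labels: set[str]     # class and property labels, lowercased
    n_classes: int
    n_properties: int
    n_imports: int       # number of included (imported) ontologies

def relevance(onto, query_terms, w_cov=0.6, w_struct=0.3, w_conn=0.1):
    # Coverage: fraction of query terms matched by class/property labels.
    coverage = len(query_terms & onto.labels) / max(len(query_terms), 1)
    # Structure: properties relative to classes, capped at 1.0.
    structure = min(onto.n_properties / max(onto.n_classes, 1), 1.0)
    # Connectedness: included ontologies, capped at 1.0 (cap of 5 is assumed).
    connectedness = min(onto.n_imports / 5, 1.0)
    return w_cov * coverage + w_struct * structure + w_conn * connectedness

# Example with a hypothetical ontology:
onto = Ontology(labels={"city", "location", "country"},
                n_classes=20, n_properties=12, n_imports=1)
print(relevance(onto, {"city", "tourism"}))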
Evaluation – Benchmark
• Benchmark: 15 Wikipedia topics with 57 manually assigned ontologies, out of the 1056 cached by OntoSelect
• The 15 Wikipedia topics were selected from the set of all 37284 class/property labels in OntoSelect by (a sketch of this pipeline follows below):
  • Filtering out labels that did not correspond to a Wikipedia page > 5658 labels/topics
  • Using the 5658 labels as search terms in SWOOGLE and filtering out labels that returned fewer than 10 ontologies (out of the 1056 in OntoSelect) > 3084 labels/topics
  • Manually selecting useful topics out of the 3084 labels, e.g. leaving out very short labels ('v') and very abstract ones ('thing') > 50 topics
  • Randomly selecting 15 topics, for which we manually checked the ontologies retrieved from OntoSelect and SWOOGLE > 15 topics with 57 assigned ontologies
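The four filtering steps map directly onto a small pipeline; here is a hedged sketch in which has_wikipedia_page, swoogle_hits, and manually_useful are hypothetical stand-ins for the actual Wikipedia lookup, the SWOOGLE query, and the manual judgement.

import random

def build_benchmark(all_labels, has_wikipedia_page, swoogle_hits,
                    manually_useful, sample_size=15):
    # Step 1: keep labels that correspond to a Wikipedia page (37284 -> 5658).
    topics = [l for l in all_labels if has_wikipedia_page(l)]
    # Step 2: keep labels for which SWOOGLE returns at least 10 of the
    # 1056 OntoSelect ontologies (5658 -> 3084).
    topics = [l for l in topics if swoogle_hits(l) >= 10]
    # Step 3: manual selection, dropping very short ('v') or very
    # abstract ('thing') labels (3084 -> 50).
    topics = [l for l in topics if manually_useful(l)]
    # Step 4: random sample of topics for manual relevance assessment.
    return random.sample(topics, sample_size)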
Evaluation – Benchmark by Topic
• 15 (Wikipedia) topics with the number of assigned ontologies:
• Atmosphere (2)
• Biology (11)
• City (3)
  • http://www.mindswap.org/2003/owl/geo/geoFeatures.owl
  • http://www.glue.umd.edu/~katyn/CMSC828y/location.daml
  • http://www.daml.org/2001/02/geofile/geofile-ont
• Communication (10)
• Economy (1)
• Infrastructure (2)
• Institution (1)
• Math (3)
• Military (5)
• Newspaper (2)
• Oil (0)
• Production (1)
• Publication (6)
• Railroad (1)
• Tourism (9)
Evaluation – Experiment
• Comparison of (average) results between SWOOGLE and OntoSelect (an evaluation sketch follows below)
• Uses the OntoSelect benchmark
  • 15 topics (queries)
  • 57 assigned ontologies (relevance assessments)
  • 1056 ontologies (data set)
• Uses different configurations of OntoSelect
  • With/without keyword expansion/extraction
  • With/without class names (in addition to labels)
  • With/without property labels
  • Weighting of relevance criteria
  • …
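The slide does not state the evaluation measure; a minimal sketch follows, assuming average precision and recall at a fixed cutoff over the 15 benchmark topics, with the 57 manual assignments as relevance judgements. The cutoff k=10 is an assumption.

def avg_precision_recall(results, relevant, k=10):
    """results: topic -> ranked list of ontology URIs returned by an engine;
    relevant: topic -> set of manually assigned ontology URIs."""
    precisions, recalls = [], []
    for topic, ranked in results.items():
        gold = relevant.get(topic, set())
        if not gold:
            continue  # topics with no assigned ontologies (e.g. Oil) are skipped
        hits = len(set(ranked[:k]) & gold)
        precisions.append(hits / k)
        recalls.append(hits / len(gold))
    if not precisions:
        return 0.0, 0.0
    return (sum(precisions) / len(precisions),
            sum(recalls) / len(recalls))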
Conclusions
• It is too early to draw firm conclusions from the evaluation
  • Many more configurations (weights) to compare
  • Extend the benchmark
  • Compare with other ontology search engines
• Main contribution of the presented work
  • First comprehensive benchmark for topic-driven evaluation of ontology search
• The (extended) benchmark will be made publicly available: http://olp.dfki.de/OntoSelect