Evaluating Ontology-Mapping Tools: Requirements and Experience
Natalya F. Noy, Mark A. Musen
Stanford Medical Informatics, Stanford University
Types Of Ontology Tools
There is not just ONE class of ontology tools:
• Development Tools: Protégé-2000, OntoEdit, OilEd, WebODE, Ontolingua
• Mapping Tools: PROMPT, ONION, OBSERVER, Chimaera, FCA-Merge, GLUE
Evaluation Parameters for Ontology-Development Tools • Interoperability with other tools • Ability to import ontologies from other languages • Ability to export ontologies to other languages • Expressiveness of the knowledge model • Scalability • Extensibility • Availability and capabilities of inference services • Usability of tools
Evaluation Parameters For Ontology-Mapping Tools
• We can try to reuse the evaluation parameters for development tools, but:
  – Development tools have similar tasks, inputs, and outputs
  – Mapping tools have different tasks, inputs, and outputs
Development Tools
• Input: domain knowledge, ontologies to reuse, requirements
• Task: create an ontology
• Output: a domain ontology
Mapping Tools: Tasks
• Merging two ontologies A and B into a single ontology C = Merge(A, B): iPROMPT, Chimaera, FCA-Merge
• Creating an articulation ontology between A and B: ONION
• Finding a mapping Map(A, B) between A and B: Anchor-PROMPT, GLUE
Mapping Tools: Inputs
• Chimaera: uses classes, slots and facets
• iPROMPT: uses classes, slots and facets
• GLUE: uses classes; requires instance data
• FCA-Merge: uses classes; requires shared instances
• OBSERVER: uses classes; requires DL definitions
Mapping Tools: Outputs and User Interaction
• GUI for interactive merging: iPROMPT, Chimaera
• Lists of pairs of related terms: Anchor-PROMPT, GLUE, FCA-Merge
• List of articulation rules: ONION
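To make the contrast between these output types concrete, here is a minimal Python sketch of the three kinds of results as data structures; the class and field names are illustrative assumptions, not the tools' actual formats:

```python
from dataclasses import dataclass, field

# NOTE: the class and field names below are illustrative assumptions,
# not the actual data formats of any of the tools.

@dataclass
class TermPair:
    """A pair of related terms, the kind of result that Anchor-PROMPT, GLUE,
    or FCA-Merge report, optionally with a confidence score."""
    term_a: str
    term_b: str
    confidence: float = 1.0

@dataclass
class ArticulationRule:
    """An articulation rule in the spirit of ONION's output: a bridge
    between expressions over the two source ontologies."""
    expression_a: str
    expression_b: str
    relation: str = "equivalent"  # e.g. "equivalent", "subclass-of"

@dataclass
class MergedOntology:
    """A single merged ontology, the end result of interactive merging
    with tools such as iPROMPT or Chimaera."""
    frames: dict = field(default_factory=dict)      # frame name -> definition
    provenance: dict = field(default_factory=dict)  # frame name -> source ontology
```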
Can We Compare Mapping Tools? • Yes, we can! • We can compare tools in the same group • How do we define a group?
Architectural Comparison Criteria
• Input requirements
  – Ontology elements: used for analysis vs. required for analysis
  – Modeling paradigm: frame-based or Description Logic
• Level of user interaction: batch mode or interactive
• User feedback: required? used?
Architectural Criteria (cont'd)
• Type of output: set of rules, ontology of mappings, list of suggestions, set of pairs of related terms
• Content of output: matching classes, matching instances, matching slots
One way to encode these criteria as tool profiles is sketched below.
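To see how the architectural criteria could carve the space of tools into comparable groups, here is a minimal Python sketch; the Profile fields and the per-tool values are illustrative assumptions, not classifications from the paper:

```python
from dataclasses import dataclass

# One illustrative way to encode the architectural criteria as a tool profile.
# Field names and the example values below are our own assumptions.
@dataclass(frozen=True)
class Profile:
    paradigm: str             # "frame-based" or "description-logic"
    requires_instances: bool  # does analysis require instance data?
    interaction: str          # "batch" or "interactive"
    output: str               # "merged ontology", "term pairs", "rules", ...

tools = {
    "iPROMPT":   Profile("frame-based", False, "interactive", "merged ontology"),
    "Chimaera":  Profile("frame-based", False, "interactive", "merged ontology"),
    "GLUE":      Profile("frame-based", True,  "batch",       "term pairs"),
    "FCA-Merge": Profile("frame-based", True,  "batch",       "term pairs"),
    "ONION":     Profile("frame-based", False, "batch",       "rules"),
}

def group_tools(tools: dict) -> list:
    """Tools with identical profiles land in the same group and can be
    compared head-to-head on performance."""
    groups = {}
    for name, profile in tools.items():
        groups.setdefault(profile, []).append(name)
    return list(groups.values())

print(group_tools(tools))
# -> [['iPROMPT', 'Chimaera'], ['GLUE', 'FCA-Merge'], ['ONION']] for these profiles
```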
From Large Pool To Small Groups
• Architectural criteria partition the space of mapping tools into small groups
• A performance criterion then compares tools within a single group
Resources Required For Comparison Experiments • Source ontologies • Pairs of ontologies covering similar domains • Ontologies of different size, complexity, level of overlap • “Gold standard” results • Human-generated correspondences between terms • Pairs of terms, rules, explicit mappings
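A "gold standard" of this kind can be as simple as a file of term pairs. A minimal loading sketch, where the file name, CSV layout, and example term names are our own assumptions:

```python
import csv

# Load a hypothetical gold-standard file of human-generated correspondences
# between terms of the two source ontologies. The file name, the term names,
# and the CSV layout are illustrative assumptions, not a prescribed format.
#
# gold_standard.csv might contain lines such as:
#   cmu:Professor,umd:Faculty
#   cmu:TechnicalReport,umd:Publication
def load_gold_standard(path="gold_standard.csv"):
    """Return a set of (term_a, term_b) pairs to score a tool's suggestions against."""
    with open(path, newline="") as f:
        return {(row[0].strip(), row[1].strip())
                for row in csv.reader(f) if len(row) >= 2}
```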
Resources Required (cont'd)
• Metrics for comparing performance
  – Precision: how many of the tool's suggestions are correct (suggestions that the user followed / suggestions that the tool produced)
  – Recall: how many of the correct matches the tool found (suggestions that the user followed / operations that the user performed)
  – Distance between ontologies
• Use of inference techniques
  – Analysis of taxonomic relationships (à la OntoClean)
• Experiment controls
  – Design
  – Protocol
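Given the counts collected during an experiment, the two metrics reduce to simple ratios. A minimal sketch; the function name and the example counts are ours, chosen only to land near the figures reported on the results slide:

```python
def precision_recall(followed: int, produced: int, performed: int):
    """Precision and recall of a tool's suggestions, per the definitions above.

    followed  -- number of suggestions that the user followed
    produced  -- number of suggestions that the tool produced
    performed -- number of merging operations that the user performed
    """
    precision = followed / produced if produced else 0.0
    recall = followed / performed if performed else 0.0
    return precision, recall

# Made-up counts: 31 of 35 suggestions followed, 32 operations performed.
print(precision_recall(31, 35, 32))  # ~ (0.886, 0.969)
```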
Where Will The Resources Come From? • Ideally, from researchers who do not belong to any of the evaluated projects • Realistically, as a by-product of stand-alone evaluation experiments
Evaluation Experiment: iPROMPT • iPROMPT is • A plug-in to Protégé-2000 • An interactive ontology-merging tool • iPROMPT uses for analysis • Class hierarchy • Slots and facet values • iPROMPT matches • Classes • Slots • Instances
Evaluation Experiment • 4 users merged the same 2 source ontologies • We measured • Acceptability of iPROMPT's suggestions • Differences in the resulting ontologies
Sources • Input: two ontologies from the DAML ontology library • CMU ontology: • Employees of an academic organization • Publications • Relationships among research groups • UMD ontology: • Individuals • CS departments • Activities
Experimental Design • Users' expertise: • Familiar with Protégé-2000 • Not familiar with PROMPT • Experiment materials: • The iPROMPT software • A detailed tutorial • A tutorial example • Evaluation files • Users performed the experiment on their own, with no questions or interaction with the developers.
Experiment Results • Quality of iPROMPT suggestions: • Recall: 96.9% • Precision: 88.6% • Resulting ontologies • Difference measure: fraction of frames that have different name and type • Ontologies differ by ~30%
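The difference measure lends itself to a simple computation. A sketch under one plausible reading (frames are matched by name and count as different when their types disagree or when a frame is missing from one result); all names and the toy data are ours, not the paper's exact algorithm:

```python
def ontology_difference(frames_a: dict, frames_b: dict) -> float:
    """Fraction of frames lacking a same-name, same-type counterpart.

    frames_a, frames_b -- mappings from frame name to frame type
    (e.g., "class", "slot", "instance") for the two resulting ontologies.
    One plausible reading of the difference measure described above.
    """
    all_names = set(frames_a) | set(frames_b)
    differing = sum(1 for name in all_names
                    if frames_a.get(name) != frames_b.get(name))
    return differing / len(all_names) if all_names else 0.0

# Toy example: 1 of 4 frames differs -> 0.25
a = {"Professor": "class", "Student": "class", "advisor": "slot", "Dept": "class"}
b = {"Professor": "class", "Student": "class", "advisor": "slot", "Dept": "instance"}
print(ontology_difference(a, b))  # 0.25
```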
Limitations In The Experiment • Only 4 participants • Variability in Protégé expertise • Recall and precision figures without comparison to other tools are not very meaningful • Need better distance metrics
Research Questions • Which pragmatic criteria are most helpful in finding the best tool for a task? • How do we develop a "gold standard" merged ontology? Does such an ontology exist? • How do we define a good distance metric to compare results to the gold standard? • Can we reuse tools and metrics developed for evaluating ontologies themselves?