An Environment for Merging and Testing Large Ontologies
Deborah McGuinness, Richard Fikes, James Rice*, Steve Wilder
Associate Director and Senior Research Scientist
Knowledge Systems Laboratory
Stanford University
Stanford, CA 94305
650-723-9770
dlm@ksl.stanford.edu
*CommerceOne, Mountain View, CA
Motivation: Ontology Integration Trends
• Integrated in most search applications (Yahoo, Lycos, Xift, …)
• Core component of E-Commerce applications (Amazon, eBay, Virtual Vineyards, REI, VerticalNet, CommerceOne, etc.)
• Integrated in configuration applications (Dell, PROSE, etc.)
Motivation: Ontology Evolution
• Controlled vocabularies abound (SIC codes, UN/SPSC, RosettaNet, OpenDirectory, …)
• Distributed ownership/maintenance
• Larger scale (Open Directory: >23.5K editors, ~250K categories, 1.65M sites)
• Becoming more complicated
  - Moving to classes and slots (and value restrictions, enumerated sets, cardinality)
Chimaera – A Merging and Diagnostic Ontology Environment
Web-based tool utilizing the KSL Ontolingua platform that supports:
• merging multiple ontologies found in distributed environments
• analysis of single or multiple ontologies
• attention focus in problematic areas
• simple browsing and mixed initiative editing
The Need For KB Merging
• Large-scale knowledge repositories will contain KBs produced by multiple authors in multiple settings
• KBs for applications will be built by assembling and extending multiple modular KBs from repositories
• KBs developed by multiple authors will frequently
  - Express overlapping knowledge in a common domain
  - Use differing representations and vocabularies
• For such KBs to be used together as building blocks
  - Their representational differences must be reconciled
The KB Merging Task
• Combine KBs that:
  - Were developed independently (by multiple authors)
  - Express overlapping knowledge in a common domain
  - Use differing representations and vocabularies
• Produce a merged KB with non-redundant, coherent, and unified
  - Vocabulary
  - Content
  - Representation
How KB Merging Tools Can Help
• Combine input KBs with name clashes
  - Treat each input KB as a separate name space
• Support merging of classes and relations
  - Replace all occurrences by the merged class or relation
• Test for logical consistency of merge (e.g., instances/subclasses of multiple disjoint classes)
  - Actively look for inconsistent extensions
• Match vocabulary
  - Find name clashes, subsumed names, synonyms, ...
• Focus attention
  - Portions of KB where new relationships are likely to be needed
  - E.g., sibling subclasses from multiple input KBs
• Derive relationships among classes and relations
  - Disjointness, equivalence, subsumption, inconsistency, ...
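Two of the steps on this slide, replacing all occurrences of a merged class and then testing the merge for subclasses of disjoint classes, can be sketched in a few lines of Python. This is a minimal illustration, not Chimaera's implementation: the toy taxonomy, the `(kb_name, class_name)` name-space convention, and the function names `merge_classes`, `ancestors`, and `find_disjointness_violations` are all assumptions made for the example.

```python
# Hypothetical toy data. Each input KB is its own name space, so a class is
# identified by a (kb_name, class_name) pair and equal names cannot collide.
subclass_of = {
    ("kb1", "Vehicle"): set(),
    ("kb1", "Car"): {("kb1", "Vehicle")},
    ("kb2", "Machine"): set(),
    ("kb2", "Toy"): set(),
    ("kb2", "Automobile"): {("kb2", "Machine")},
    ("kb2", "ModelCar"): {("kb2", "Automobile"), ("kb2", "Toy")},
}
# Pairs of classes declared disjoint (no shared instances or subclasses).
disjoint = {frozenset({("kb1", "Vehicle"), ("kb2", "Toy")})}


def merge_classes(subclass_of, disjoint, old, new):
    """Return copies of both structures with every occurrence of `old` replaced by `new`."""
    def rename(cls):
        return new if cls == old else cls

    merged = {}
    for cls, parents in subclass_of.items():
        merged.setdefault(rename(cls), set()).update(rename(p) for p in parents)
    merged.setdefault(new, set()).discard(new)  # avoid a self-loop after renaming
    merged_disjoint = {frozenset(rename(c) for c in pair) for pair in disjoint}
    return merged, merged_disjoint


def ancestors(subclass_of, cls):
    """All classes reachable through subclass links, including `cls` itself."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def find_disjointness_violations(subclass_of, disjoint):
    """List classes that fall under both members of a disjoint pair."""
    violations = []
    for cls in subclass_of:
        ups = ancestors(subclass_of, cls)
        for pair in disjoint:
            if len(pair) == 2 and pair <= ups:
                violations.append((cls, tuple(sorted(pair))))
    return sorted(violations)


merged, merged_disjoint = merge_classes(
    subclass_of, disjoint, old=("kb2", "Automobile"), new=("kb1", "Car")
)
for cls, pair in find_disjointness_violations(merged, merged_disjoint):
    print(f"{cls} is a subclass of disjoint classes {pair[0]} and {pair[1]}")
```

In this toy example the input KBs are consistent on their own; the violation only appears after the two "car" classes are merged, which is why the consistency test is run on the result of the merge.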
Merging Tools
• Merging can be arbitrarily difficult
  - KBs can differ in basic representational design
  - May require extensive negotiation among authors
• Tools can significantly accelerate major steps
• KB merging using conventional editing tools is
  - Difficult
  - Labor intensive
  - Error prone
• Hypothesis: tools specifically designed to support KB merging can significantly
  - Speed up the merging process
  - Make broader user set productive
  - Improve the quality of the resulting KB
Our KB Analysis Task
• Review KBs that:
  - Were developed using differing standards
  - May be syntactically but not semantically validated
  - May use differing modeling representations
  - May have different purposes
• Produce KB logs (in interactive environments) that:
  - Identify provable problems
  - Suggest possible problems in style and/or modeling
  - Are extensible by being user programmable
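One way to picture a user-programmable analysis log is a registry of small check functions, each yielding findings that are collected into a single report. The sketch below is an assumption-laden illustration, not Chimaera's diagnostic suite: the `KB` representation, the `diagnostic` decorator, and both example checks are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Set, Tuple

KB = Dict[str, Set[str]]        # class name -> names of its direct superclasses
Finding = Tuple[str, str, str]  # (severity, check name, message)

CHECKS: List[Callable[[KB], Iterator[Finding]]] = []


def diagnostic(check):
    """Register a user-written check so that analyze() runs it."""
    CHECKS.append(check)
    return check


@diagnostic
def subclass_cycles(kb):
    """Provable problem: a class that is transitively its own superclass."""
    for start in sorted(kb):
        seen, stack = set(), list(kb[start])
        while stack:
            current = stack.pop()
            if current == start:
                yield ("error", "subclass_cycles", f"{start} is in a subclass cycle")
                break
            if current not in seen:
                seen.add(current)
                stack.extend(kb.get(current, ()))


@diagnostic
def undefined_superclasses(kb):
    """Possible problem: a superclass is referenced but never defined."""
    for cls in sorted(kb):
        for parent in sorted(kb[cls]):
            if parent not in kb:
                yield ("warning", "undefined_superclasses",
                       f"{cls} refers to undefined class {parent}")


def analyze(kb):
    """Run every registered check and return the combined log."""
    return [finding for check in CHECKS for finding in check(kb)]


example_kb = {"A": {"B"}, "B": {"A"}, "C": {"Thing"}}
for severity, check_name, message in analyze(example_kb):
    print(f"[{severity}] {check_name}: {message}")
```

Adding a new test to the log is then a matter of writing one more function and decorating it; the analysis loop itself does not change.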
Chimaera Usage
• HPKB program: analyze diverse KBs, support KR novices as well as experts
• Cleaning semi-automatically generated KBs
• Browsing and merging multiple controlled vocabularies (e.g., internal vocabularies and UN/SPSC (standard products and services codes))
• Reviewing internal vocabularies
Discussion/Conclusion
• Ontologies are becoming more central to applications; they are larger, more distributed, and longer-lived
• Environmental support (in particular merging and diagnostic support) is more critical for the broader user base
• Chimaera provides merging and diagnostic support for ontologies in many formats
• It improves performance over existing tools
• It has been used by people of various training backgrounds in government and commercial applications and is available for use
• http://www.ksl.Stanford.EDU/software/chimaera/ - movie, tutorial, papers, link to live system, etc.
The Need For KB Analysis
• Large-scale knowledge repositories will contain KBs produced by multiple authors in multiple settings
• KBs for applications will be built by assembling and extending multiple modular KBs from repositories that may not be consistent
• KBs developed by multiple authors will frequently
  - Express overlapping knowledge in different, possibly contradictory ways
  - Use differing assumptions and styles
  - Have different purposes
• KBs must be reviewed for appropriateness and “correctness”
What is an Ontology?
[Figure: spectrum of ontology types, from simplest to most expressive: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Disjointness, Inverse, part-of…; General Logical constraints]
Ontologies and importance to E-Commerce
Simple ontologies provide:
• Controlled shared vocabulary (search engines, authors, users, databases, programs all speak same language)
• Organization (and navigation support)
• Expectation setting (left side of many web pages)
• Browsing support (tagged structures such as Yahoo!)
• Search support (query expansion approaches such as FindUR, e-Cyc)
• Sense disambiguation
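The search-support bullet mentions query expansion. As a generic illustration of the idea (not a description of how FindUR or e-Cyc work), a query term can be expanded with the narrower terms a simple ontology records for it. The `narrower` table and the `expand_query` function below are hypothetical.

```python
# Hypothetical "narrower term" relation from a small product vocabulary.
narrower = {
    "wine": ["red wine", "white wine"],
    "red wine": ["merlot", "zinfandel"],
    "white wine": ["chardonnay"],
}


def expand_query(term, narrower, max_depth=1):
    """Return the term followed by its narrower terms, down to `max_depth` levels."""
    expanded = [term]
    frontier = [term]
    for _ in range(max_depth):
        next_frontier = []
        for current in frontier:
            for child in narrower.get(current, []):
                if child not in expanded:  # keep each term once
                    expanded.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return expanded


print(expand_query("wine", narrower))
print(expand_query("wine", narrower, max_depth=2))
```

Limiting the depth is a design choice for the sketch: expanding too far down a large vocabulary can broaden a query well beyond what the user asked for.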
Ontologies and importance to E-Commerce II
• Foundation for expansion and leverage
• Conflict detection
• Completion
• Regression testing/validation/verification support foundation
• Configuration support
• Structured, comparative search
• Generalization/Specialization
• …
E-Commerce Search (starting point: Forrester, modified by McGuinness)
• Ask Queries
  - multiple search interfaces (surgical shoppers, advice seekers, window shoppers)
  - set user expectations (interactive query refinement)
  - anticipate anomalies
• Get Answers
  - basic information (multiple sorts, filtering, structuring)
  - modify results (user defined parameters for refining, user profile info, narrow query, broaden query, disambiguate query)
  - suggest alternatives (suggest other comparable products, even from competitors’ sites)
• Make Decisions
  - manipulate results (enable side by side comparison)
  - dive deeper (provide additional info, multimedia, other views)
  - take action (buy)
A Few Observations about Ontologies
• Simple ontologies can be built by non-experts
  - Consider Verity’s Topic Editor, Collaborative Topic Builder, GFP interface, Chimaera, etc.
• Ontologies can be semi-automatically generated
  - from crawls of sites such as yahoo!, amazon, excite, etc.
  - Semi-structured sites can provide starting points
• Ontologies are exploding (business pull instead of technology push)
  - most e-commerce sites are using them: MySimon, Affinia, Amazon, Yahoo! Shopping, etc.
• Controlled vocabularies (for the web) abound: SIC codes, UMLS, UN/SPSC, Open Directory, Rosetta Net, …
• Business ontologies are including roles
• DTDs are making more ontology information available
• Businesses have ontology directors
• “Real” ontologies are becoming more central to applications
Implications and Needs
• Ontology Language Syntax and Semantics
• Environments for Creation and Maintenance of Ontologies
• Training (Conceptual Modeling, reasoning implications, …)
• Issues:
  - Collaboration among distributed teams
  - Diverse training levels
  - Interconnectivity with many systems/standards
  - Analysis and Diagnosis
  - Scale