Explore the concept of ontologies in AI systems, ontology sharing, and establishing ontologies using real data sources and manual processing. Create maintainable ontologies for accurate system understanding and communication.
Scalable Knowledge Composition November 1997 Jan Jannink, Danladi Verheijen, Gio Wiederhold Stanford University "An abstract concept is like a valise with a false bottom. You may put in what you please, and take them out again, without being observed." Alexis de Tocqueville, Democracy in America, 1838. Gio Wiederhold SKC 1
SKC Progress Report • Goal: Reliable answers using heterogeneous data sources • General sources: factbook ‘96, UN • Topical sources: EIA, OECD, OPEC • Approach: Bottom up from data • Python scripts implement rule-based operations on source data to answer challenge problems • Theory: Rule-based algebra • Mapping primitives & Intersection operation
Web as Source of Ontology • We extract portions of ontology implicit in sites • factbook ‘96: www.odci.gov/cia • UN: www.un.org & www.globalpolicy.org • EIA: www.eia.doe.gov • OECD: www.oecd.org • OPEC: www.opec.org
Example Query • Question: What is the most recent year an OPEC member nation was on the UN Security Council? (Related to CP #72) • Sources: factbook ‘96 (nation), OPEC (members, dates), UN (SC members, years) • Correct answer: 1996 (Indonesia) • Problems: different country names (Gambia => The Gambia); historical country names (Yugoslavia); factbook has out-of-date OPEC & UN SC lists (Gabon left OPEC 1994); UN lists future Security Council members (Gabon 1998); intent of original question; temporal variants
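The example query can be sketched in a few lines of Python. This is a minimal illustration, not the SKC scripts themselves: the tables below hold only the facts stated on the slide, and the names `ALIASES`, `OPEC_LEFT`, `UN_SC_TERMS` and the `as_of` cutoff are assumptions introduced for the example.

```python
from typing import Optional, Tuple

# Name variants mapped to one canonical form (the slide's "Gambia => The Gambia").
ALIASES = {"Gambia": "The Gambia"}

# Illustrative records restricted to facts stated on the slide.
# Value = year the country left OPEC, or None if no departure is recorded.
OPEC_LEFT = {"Indonesia": None, "Gabon": 1994}
# (country, year on the UN Security Council); Gabon 1998 is a future listing.
UN_SC_TERMS = [("Indonesia", 1996), ("Gabon", 1998)]


def canonical(name: str) -> str:
    """Resolve differing country names to one spelling."""
    return ALIASES.get(name, name)


def was_opec_member(country: str, year: int) -> bool:
    """True if the country is listed and had not yet left OPEC in `year`."""
    if country not in OPEC_LEFT:
        return False
    left = OPEC_LEFT[country]
    return left is None or year < left


def most_recent_opec_year_on_sc(as_of: int) -> Optional[Tuple[int, str]]:
    """Latest (year, country) with council and OPEC membership overlapping, ignoring years after `as_of`."""
    hits = [
        (year, canonical(country))
        for country, year in UN_SC_TERMS
        if year <= as_of and was_opec_member(canonical(country), year)
    ]
    return max(hits) if hits else None


print(most_recent_opec_year_on_sc(as_of=1997))  # (1996, 'Indonesia')
```

The `as_of` cutoff handles the "future members" problem, and the departure year handles the out-of-date membership problem; both are resolved by rules rather than by editing the sources.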
Partial Query Data • Source: OPEC Pages, UN Pages • * Problems handled using SKC articulation rules
Current Directions • Experience w/ real data confirming validity of our approach • Expert sources are better maintained than general sources • We generate successive approximations with increasing levels of confidence • Manual processing of sources is our first step in providing an algebra that truly accounts for the complexity of real data sources
Ontology? • Ontologies list the terms and their relationships that allow communication among partners in enterprises (in machine-readable form) • Relationships determine meaning: parent, school, company • Databases use ontologies (implicitly) during design in their E-R diagrams, and represent the leaf nodes in their schemas • Knowledge bases use ontologies (often implicitly) and add class definitions (to hold instances), constraints, and operations among the terms
Functions of Ontologies • Define Terms used in System Construction to enable Correctness in Understanding: system = designers, implementors, users, maintainers; designers = implementors = users = maintainers • Define Higher-level Abstractions needed to communicate in larger contexts: managers, decision-makers, systems in own, other domains • Share the Cost of Knowledge Acquisition & Maintenance: reuse encoded knowledge, remain up-to-date as domains change
Ancestors of Ontologies • Lexicons: collect terms used in information systems • Taxonomies: categorize, abstract, classify terms • Schemas of databases: attributes, ranges filed • Data dictionaries: integration of files, attributes • Object libraries: grouped attributes, methods • Symbol tables: collect terms used in a program • Domain object models: re-engineering terms • . . . More Knowledge
Establishing Ontologies • Top-down: commonly acceptable UPPER layers • Domain-specific: sharing tools; object based • Bottom-up: pragmatic, TASK-specific collections; database schemas and models
Ontologies in Use • Implicit ontologies are a prerequisite for communication among humans and organizations. • Knowledge is explicitly represented in AI systems; sometimes the ontology is explicit as well. • Database schemas are partial explicit ontologies: relational schemas contain only terms & 1:1 dependencies; E-R designs contain 1:n, m:n cardinalities; structural schemas contain semantic dependency types. • Conceptual graphs define terms of discourse and a modest number of relationship types. • Variables in software represent ontologies poorly.
Ontology Sharing: Three Alternatives • 1. Create a committee to define everybody’s terms: takes many years, until people are worn out; ignored when changes make deviation necessary • 2. Get all terms and put them into a large model [Cyc, UMLS, Federated Schemas, . . .]: can be rapid; provides broad integration; ignores conflicts; hard to maintain (requires committee) • 3. Keep all terms distinct, except where sharing: requires initial effort; complex system view; empowers participants; scalable with many participants
SKC Objective: Provide for Maintainable Ontologies • devolve maintenance onto many domain-specific experts / authorities • provide an algebra to compute composed ontologies that are limited to their articulation terms • enable interpretation within the source contexts
SKC Working Definitions • Ontology: a set of terms and their relationships • Term: a reference to real-world and abstract objects • Relationship: a named and typed set of links between objects • Reference: a label that names objects • Real-world object: an entity instance with a physical manifestation • Abstract object: a concept which refers to other objects
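The working definitions map naturally onto a small data structure. The sketch below is one possible reading, not the SKC representation: the class and field names (`Term`, `Relationship`, `Ontology`, `rel_type`, `abstract`) are assumptions, and the sample link reuses the "Nail (toe, foot)" label from a later slide purely for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Term:
    """A reference (label) to a real-world or abstract object."""
    label: str
    abstract: bool = False  # True for a concept that refers to other objects


@dataclass(frozen=True)
class Relationship:
    """A named and typed set of links between objects."""
    name: str
    rel_type: str  # e.g. "IS-A", "Part-of"
    links: FrozenSet[Tuple[Term, Term]]


@dataclass
class Ontology:
    """A set of terms and their relationships."""
    terms: Set[Term] = field(default_factory=set)
    relationships: Set[Relationship] = field(default_factory=set)

    def add(self, rel: Relationship) -> None:
        """Register a relationship and every term it links."""
        for source, target in rel.links:
            self.terms.update((source, target))
        self.relationships.add(rel)


# Illustrative content only.
nail, foot = Term("Nail"), Term("foot")
anatomy = Ontology()
anatomy.add(Relationship("nail_of_foot", "Part-of", frozenset({(nail, foot)})))
print(sorted(t.label for t in anatomy.terms))  # ['Nail', 'foot']
```

Frozen dataclasses are hashable, so terms and relationships can be stored in sets, which matches the slide's "set of terms and their relationships".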
We Consider as Ontologies: • Object-oriented class hierarchies (snapshots of executing programs capture object instances) • Database schemas (via their E-R or structural models) • Semi-structured databases (OEM <OID, label, type, value>) • Definitional thesauri (UMLS: see http://www.lexical.com) • Knowledge bases (CYC, Ontolingua) • SKC specifically does not restrict its applicability to a purely extensional (object) or intensional (schema) definition of ontology, since its purpose is to support useful processing of extensions using intensional knowledge for all parties. To that end it is important that the intensional specifications include predicates or methods that permit the collection of extensional access to real-world objects. We do not require ontologies to be complete specifications of a domain, but rather that usage of an ontology provide results complete with respect to the ontology.
Aspects that Focus SKC • The mapping of terms to objects differs between autonomous domains. • The collections of real-world objects provide a grounding for the definitions, and an opportunity for validation of the meaning of the terms being employed. • Relationships have semantic, and derived from that, structural significance. Multiple relationship types may share structural characteristics, such as IS-A, Ownership, Part-of, Reference. • We will keep the number of primitive relationships limited. • The mapping of relationship types differs between autonomous domains.
Domains and Consistency • a domain will contain many objects • the object configuration is consistent • within a domain all terms are consistent & relationships among objects are consistent • context is implicit • No committee is needed to forge compromises * within a domain • * Compromises hide valuable details • (Figure: Domain Ontology)
Domain Heterogeneity • If interoperation involves distinct domains, mismatch ensues: autonomy conflicts with consistency; local needs have priority; outside uses are a byproduct • Heterogeneity must be addressed: Platform and Operating Systems ✓✓; Representation and Access Conventions ✓; Naming and Ontology
An Ontology Algebra: A knowledge-based algebra for ontologies • Intersection: create a subset ontology (keep sharable entries) • Union: create a joint ontology (merge entries) • Difference: create a distinct ontology (remove shared entries) • The Articulation Ontology (AO) consists of matching rules that link domain ontologies
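The three operations can be sketched over plain sets of term labels. This is a simplified illustration under stated assumptions, not the SKC algebra: an ontology is reduced to a set of labels, the articulation ontology is reduced to a set of `(term_in_a, term_in_b)` pairs, and the single rule shown is hypothetical.

```python
def intersection(a, b, rules):
    """Subset ontology: keep only entries that a matching rule marks as sharable."""
    return {(x, y) for (x, y) in rules if x in a and y in b}


def difference(a, b, rules):
    """Distinct ontology: entries of `a` that no matching rule shares with `b`."""
    shared = {x for (x, _) in intersection(a, b, rules)}
    return a - shared


def union(a, b, rules):
    """Joint ontology: matched entries are merged, all others are kept per source."""
    matched = intersection(a, b, rules)
    only_a = {(x, None) for x in difference(a, b, rules)}
    only_b = {(None, y) for y in b - {y for (_, y) in matched}}
    return matched | only_a | only_b


# Term labels taken from the Shoe Store / Shoe Factory slide.
store = {"Shoes", "Customers", "Employees"}
factory = {"Material inventory", "Employees", "Machinery", "Processes", "Shoes"}
# Hypothetical articulation ontology: only "Shoes" is declared sharable.
rules = {("Shoes", "Shoes")}

print(intersection(store, factory, rules))        # {('Shoes', 'Shoes')}
print(sorted(difference(store, factory, rules)))  # ['Customers', 'Employees']
print(len(union(store, factory, rules)))          # 7
```

Note that "Employees" appears in both sources but is not merged: terms are shared only where an articulation rule says so, never by name coincidence alone.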
Features of an Algebra • Operations can be composed • Operations can be rearranged • Alternate arrangements can be evaluated • Optimization is enabled • The record of past operations can be kept and reused
INTERSECTION Operation • Source Domain 1: owned and maintained by Store • Source Domain 2: owned and maintained by Factory • Result contains shared terms: terms useful for purchasing
INTERSECTION Support • Store Ontology • Factory Ontology • Articulation ontology: matching rules that use terms from the 2 source domains • Result: terms useful for purchasing
Sample Intersections • Shoe Factory: Material inventory {...}, Employees { . . . }, Machinery { . . . }, Processes { . . . }, Shoes { . . . } • Shoe Store: Shoes { . . . }, Customers { . . . }, Employees { . . . } • Articulation ontology matching rules: size = size; color = table(colcode); style = style • Other ontologies in the figure: Anatomy {. . . }, Hardware, Department Store, with foot = foot; Employees / Employees; Nail (toe, foot) / Nail (fastener); . . .
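The three matching rules on this slide can be read as functions that translate a record from one source into the shared vocabulary. The sketch below assumes that the factory side stores `colcode` and the store side uses color names; the slide does not say which side is which. `COLOR_TABLE`, its entries, and the sample record are hypothetical.

```python
# Hypothetical lookup table standing in for the slide's table(colcode).
COLOR_TABLE = {1: "black", 2: "brown"}

# One rule per articulation term; each rule reads a factory record.
RULES = {
    "size": lambda rec: rec["size"],                   # size = size
    "color": lambda rec: COLOR_TABLE[rec["colcode"]],  # color = table(colcode)
    "style": lambda rec: rec["style"],                 # style = style
}


def articulate(factory_record):
    """Project a factory record onto the articulation terms only."""
    return {term: rule(factory_record) for term, rule in RULES.items()}


# Illustrative record; machine_id stays local to the factory domain.
factory_shoe = {"size": 42, "colcode": 2, "style": "oxford", "machine_id": "M7"}
print(articulate(factory_shoe))  # {'size': 42, 'color': 'brown', 'style': 'oxford'}
```

Attributes without a rule, such as `machine_id` here, never reach the result, which keeps the articulation limited to what the two domains have agreed to share.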
Other Basic Operations • DIFFERENCE: material fully under local control • UNION: merging entire ontologies • Articulation ontology: typically prior intersections
Knowledge Composition • Knowledge resources: A, B, C, D, E • Articulation knowledge for (A ∩ B), (B ∩ C), (C ∩ D), (C ∩ E) • Composed knowledge for applications using A, B, C, E: (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E) • Legend: ∪ : union; ∩ : intersection
Exploiting the Result • Result has links to source • Processing and evaluation are best performed within Source Domains
Innovation in SKC • No need to harmonize full ontologies • Focus on what is critical for interoperation • Rules specific for articulation • Potentially many sets of articulation rules • Maintenance is distributed: to n sources; to m articulation agents • Is m < n², depending on architecture density? A research question
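For a sense of scale behind the m < n² question: if every articulation links exactly two sources (an assumption; the slide does not fix this), the densest possible architecture needs one articulation per pair of sources, which is n(n-1)/2 and always below n². The function name below is introduced for the example.

```python
from math import comb


def max_pairwise_articulations(n: int) -> int:
    """Upper bound on m when each articulation links exactly two of n sources."""
    return comb(n, 2)  # n * (n - 1) / 2


for n in (2, 5, 10):
    print(n, max_pairwise_articulations(n), n * n)
```

Sparser architectures, where only neighboring domains are articulated, need far fewer than this bound.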
Empowerment: Domain Specialization • Knowledge Acquisition (20% effort) & Knowledge Maintenance (80% effort *) • Performed by: domain specialists; professional organizations; modest-sized field teams • Autonomously maintainable • * based on software maintenance experience
Summary • Algebra enables Interoperation by: dealing explicitly with differences by knowledge; identifying maintenance domains; keeping sources autonomous • Assumes domain has a common ontology: composing domain ontologies requires the algebra to manage the linkages where articulation occurs; processes are best executed within the domains • Articulation knowledge is distributed: allows specialists to work independently; supports multiple intersections and views • Maintenance is structured and partitioned