
Ontology Building across Heterogeneous Databases

Michael N. Huhns, Center for Information Technology, University of South Carolina


Presentation Transcript


  1. Ontology Building across Heterogeneous Databases
     Michael N. Huhns
     Center for Information Technology, University of South Carolina

  2. The Fundamental Problem
     • We would like to arrange for effective and efficient interactions among large numbers of heterogeneous information components: databases, applications, and interfaces
     • The difficulties are:
        • Components are incomprehensible, inconsistent, and often unknown in advance
        • We need to enable updates as well as retrievals
        • The information environment is open
        • We need to consider process and policy, as well as structure

  3. Needs and Applications
     • Heterogeneous database access and management
     • Information search, retrieval, and fusion
     • Workflow automation
     • Agent communication
     • Information management: consistency
     • Distributed collaboration
     • Distance education

  4. Emerging Solution: A Cooperative Information System
     [Figure labels: Agents; Applications; E-Mail System; Workflow System; Database System; Web System]

  5. Another View of CIS
     [Figure labels: User Agents; Resource Agents; Middleware: Mediators, Brokers, Facilitators, Ontologies, and Registries]

  6. (de facto) Standard Agent Types and Architectures
     • Systems named: MCC InfoSleuth, CMU RETSINA, SRI OAA, USC-ISI SIMS & TeamCore, Global InfoTek Grid
     [Figure labels, components: Application Program; User Interface Agent; Ontology Agent; Broker Agent; Mediator Agent; Registry Agent; 11179 Registry; Database Resource Agents]
     [Figure labels, messages: Reg/Unreg (KQML); Query or Update (SQL); Reply; Ontology (OKBC); Mediated Query (SQL); Schemas (CLIPS); SQL (JDBC)]

  7. Implementing the Agent Architecture
     • How to build an agent
     • How to construct an ontology

  8. Models for Database #1
     [ER diagram labels: Person; coAuthors; Document; attributes SSN, Name, Phone, Title, Per_cent; cardinalities (1,N) and (1,N)]
     Relational Model:
        Person (SSN, Name, Phone)
        CoAuthors (SSN, Title, Per_cent)
        Document (Title)

  9. Models for Database #2
     [ER diagram labels: Employee; fillsOut; ComplianceForm; attributes EID, Name, SSN, Phone, Title; cardinalities (1,1) and (1,N)]
     Relational Model:
        Employee (EID, Name, SSN)
        ComplianceForm (Title, EID)

  10. Domain Ontology
     [Figure labels: Thing; Class of All Entity Attributes; Class of All Relations; Person; Person Attributes (Person Name, Person SSN); Employee; Employee Attributes (Employee ID); Full-Time Employee; Part-Time Employee; Document; Document Attributes (Document Title); ComplianceForm; Document Relations (Person-Document Coauthors, Employee-Document FillsOut)]

  11. Semantic Mappings
     • Mappings are sentences in some logical language, e.g., KIF, Loom, CLIPS
     [Figure labels: Common Ontology (Entity, Document, Person, Boat, Homemaker, Employee, Minor); Application 1; Interface 1; Articulation Axioms 1-4; DB1: Person (SSN, Name); DB2: Employee (EID, Name)]
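The mapping idea on slide 11 can be sketched in Python. This is only an illustration: the slide says the mappings are sentences in a logical language such as KIF, Loom, or CLIPS, whereas the functions below are plain procedural stand-ins. The table layouts come from slides 8 and 9; the function name `common_view`, the dictionary layout, and the sample rows are made-up assumptions.

```python
# Hypothetical stand-in for articulation axioms: each loop maps one local
# schema onto classes of a common ontology (Person, Employee).

def common_view(db1_person_rows, db2_employee_rows):
    """Merge DB1 Person rows and DB2 Employee rows into one view keyed by SSN."""
    view = {}
    # Assumed axiom: every DB1 Person row denotes an ontology Person.
    for row in db1_person_rows:
        view[row["SSN"]] = {"name": row["Name"], "classes": {"Person"}}
    # Assumed axiom: every DB2 Employee row denotes an ontology Employee,
    # and an Employee is also a Person.
    for row in db2_employee_rows:
        entry = view.setdefault(
            row["SSN"], {"name": row["Name"], "classes": {"Person"}}
        )
        entry["classes"].add("Employee")
    return view


# Made-up sample rows, for illustration only.
db1_rows = [{"SSN": "111-11-1111", "Name": "Ann", "Phone": "555-0100"}]
db2_rows = [
    {"EID": 7, "Name": "Ann", "SSN": "111-11-1111"},
    {"EID": 8, "Name": "Bob", "SSN": "222-22-2222"},
]
view = common_view(db1_rows, db2_rows)
```

Keying the view on SSN works here only because both relational models happen to carry that attribute; other schema pairs would need a different join condition.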

  12. Ontologies and DBs
     • An ontology specifies the intended meaning of concepts in a database:
        DB Schema:
           Table: PartsPrice
              *stockNo: integer
              cost: float
        Ontology:
           price(x,y) => ∃(x′,y′) [automobile_part(x′) & stock_no(x′) = x & retail_price(x′,y′) & magnitude(y′,US_dollars) = y]
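One way to read the ontology sentence on slide 12 is as a recipe for turning each PartsPrice row into a set of ontology-level facts. The sketch below assumes that reading. The predicate names are taken from the slide; the function `row_to_facts`, the generated identifiers standing in for the two quantified variables, and the sample row are hypothetical.

```python
def row_to_facts(stock_no, cost):
    """Expand one PartsPrice row (stockNo, cost) into ontology-level facts."""
    part = f"part_{stock_no}"      # hypothetical name for the quantified part
    amount = f"amount_{stock_no}"  # hypothetical name for the quantified price
    return [
        ("automobile_part", part),
        ("stock_no", part, stock_no),
        ("retail_price", part, amount),
        ("magnitude", amount, "US_dollars", cost),
    ]


# Made-up row, for illustration only.
facts = row_to_facts(4711, 19.95)
```

The facts make explicit what the bare schema leaves implicit: the number is a retail price, it is in US dollars, and it belongs to an automobile part.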

  13. Semantic Translation
     [Figure labels: User; Application 1 ... Application n; Agent for Application; Common Enterprise-Wide View; Agent for Resource; DB1; Semantic Translation by Mappings]

  14. Workflow Automation of Telecommunication Service Provisioning
     [Figure labels: User + Application; User Interface Agent; Transaction Scheduling Agent; Schedule Repairing Agent; Schedule Processing Agent; ESS ... ESS; Switch DB; LFACS DB; TIRKS DB]

  15. Example Workflow in Telecommunications
     [Figure labels, steps: Service Request; Span in Place?; Service Order; Create Bill]
     [Figure labels, systems: LFACS; TIRKS; FEPS; Switch; NSDB; WFA]

  16. Semantic Model for Interface Agent
     [Figure labels: entities Service Order, Customer, Circuit; relationships Ordered by, Orders; attributes id*, date, name*, phone, quantity, type, aLocation, zLocation]

  17. Dimensions of Heterogeneity: Structure
     • Schemas and views, e.g., securities are stocks
     • Specializations and generalizations of domain concepts, e.g., stocks are a kind of liquid asset
     • Value maps, e.g., S&P A+ rating corresponds to Moody’s A rating
     • Semantic data properties, sufficient to characterize the value maps, e.g., prices on the Madrid Exchange are daily averages rather than closing prices
     • Cardinality constraints
     • Integrity constraints, e.g., each stock must have a unique SEC identifier
     • Data value ranges, e.g., Price > 0
     • Allow or disallow “maybe values” for data
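Two of the structural dimensions on slide 17, value maps and data value ranges, are easy to illustrate. The sketch below is a minimal Python example; the names `RATING_MAP`, `to_moodys`, and `valid_price` are hypothetical, and the map holds only the single correspondence stated on the slide.

```python
# Value map from S&P ratings to Moody's ratings. Only the pair given on
# the slide is included; a real map would need the full rating scales.
RATING_MAP = {"A+": "A"}


def to_moodys(sp_rating):
    """Translate an S&P rating; None means no correspondence is recorded."""
    return RATING_MAP.get(sp_rating)


def valid_price(price):
    """Data value range from the slide: Price > 0."""
    return price > 0
```

Returning `None` for an unmapped rating, rather than guessing, keeps the translation honest about what the value map actually covers.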

  18. Dimensions of Heterogeneity: Process
     • Procedures, i.e., how to process information (e.g., how to decide what stock to recommend)
     • Preferences for accesses and updates in case of data replication (based on recency or accuracy of data)
     • Preferences to capture view update semantics
     • Contingency strategies, e.g., whether to ignore, redo, or compensate
     • Contingency procedures, i.e., how to compensate transactions
     • Flow, e.g., where to forward requests or results
     • Temporal constraints, e.g., report tax each quarter

  19. Dimensions of Heterogeneity: Policy
     • Security, i.e., who has rights to access or update what information? (e.g., customers can access all of their accounts, except blind trusts)
     • Authentication, i.e., a sufficient test to establish identity (e.g., passwords, retinal scans, or smart cards)
     • Bookkeeping (e.g., logging all accesses)

  20. Definition
     • Ontology: a representation of knowledge specific to some universe(s) of discourse
     • Ontology: an agreement about a shared conceptualization, which includes conceptual frameworks for modeling domain knowledge and agreements about the representation of particular domain theories

  21. Key Words
     • Each document is characterized by a set of key words
     • The union of the sets is the domain of discourse for the documents
     • Advantages:
        • simple
        • domain-independent methods exist (can be automated)
        • good for organizing heterogeneous text
     • Disadvantages:
        • not appropriate for data
        • “this is about X” vs. “this is not about X”
        • key words are not organized
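The key-word approach on slide 21 amounts to a few set operations, sketched here in Python. The document identifiers and key words are made up for illustration, and `docs_about` is a hypothetical helper.

```python
# Each document is characterized by a set of key words (made-up examples).
docs = {
    "doc1": {"ontology", "database"},
    "doc2": {"agent", "database"},
}

# The union of the key-word sets is the domain of discourse.
domain = set().union(*docs.values())


def docs_about(key_word):
    """Return the ids of documents whose key-word set contains key_word."""
    return {doc_id for doc_id, words in docs.items() if key_word in words}
```

The flat sets also show the last disadvantage on the slide: nothing in this structure relates "ontology" to "database", so the key words remain unorganized.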

  22. Alta-Vista “Way-Cool Topic Graph”

  23. Thesaurus
     • Organizes key words based on synonyms and antonyms
     • WordNet (http://www.cogsci.princeton.edu/~wn/) groups words into synonym sets, and relates the sets via hypernymy/hyponymy, antonymy, entailment, and meronymy/holonymy

  24. Taxonomies
     • A hierarchical organization of concepts, based on set-subset relationships. Biologists organize the plant and animal kingdoms using taxonomies

  25. Ontologies
     • A semantic net (a generalization of a taxonomy, allowing other relationships than subset) consisting of types of entities, attributes and properties, relations and functions, and constraints
     [Figure labels: Car; Wheel; hasPart; (= #wheels 4); Convertible; subclass]
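The small semantic net on slide 25 can be sketched as a set of labeled edges. The reading of the flattened figure is an assumption: Convertible is taken to be a subclass of Car, Car to have Wheel as a part, and `(= #wheels 4)` to be a constraint attached to Car. The function names are hypothetical.

```python
# Edges of the semantic net as (subject, relation, object) triples.
triples = {
    ("Convertible", "subclass", "Car"),  # assumed direction of the link
    ("Car", "hasPart", "Wheel"),
}
# Assumed reading of "(= #wheels 4)" as a constraint on Car.
constraints = {"Car": {"#wheels": 4}}


def lineage(cls):
    """Return cls followed by every class reachable via subclass links."""
    seen, frontier = [], [cls]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.append(current)
        frontier.extend(
            o for s, p, o in triples if s == current and p == "subclass"
        )
    return seen


def parts(cls):
    """Parts of cls, including those inherited from its superclasses."""
    ancestors = lineage(cls)
    return {o for s, p, o in triples if s in ancestors and p == "hasPart"}


def constraint(cls, name):
    """Look up a constraint on cls or, failing that, on a superclass."""
    for current in lineage(cls):
        if name in constraints.get(current, {}):
            return constraints[current][name]
    return None
```

Unlike a pure taxonomy, the net mixes subset links with other relations such as hasPart, and the subclass links let Convertible inherit both the part and the constraint.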

  26. Ontology Development
     • Bottom-Up from Schemas and Key Words
        • identify databases
        • identify names for all tables, fields, and enumerated values (e.g., if value is limited to a primary color “red”, “green”, or “blue”)
        • form groups of common concepts and assign name to covering concept for each group
        • iterate
     • or Extensional View: form classes from instances
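The first bottom-up steps on slide 26 (collect the names, then form groups of common concepts) can be sketched in Python using the relational models from slides 8 and 9. Grouping by lowercased field name is a deliberately naive assumption made for this example; the function name `group_field_names` is hypothetical.

```python
from collections import defaultdict

# Table and field names taken from the relational models on slides 8 and 9.
schemas = {
    "DB1": {
        "Person": ["SSN", "Name", "Phone"],
        "CoAuthors": ["SSN", "Title", "Per_cent"],
        "Document": ["Title"],
    },
    "DB2": {
        "Employee": ["EID", "Name", "SSN"],
        "ComplianceForm": ["Title", "EID"],
    },
}


def group_field_names(schemas):
    """Group every (database, table, field) use under its lowercased name."""
    groups = defaultdict(set)
    for db, tables in schemas.items():
        for table, fields in tables.items():
            for field in fields:
                groups[field.lower()].add((db, table, field))
    return dict(groups)


groups = group_field_names(schemas)
# Candidate common concepts: names that occur in more than one database.
shared = {
    name
    for name, uses in groups.items()
    if len({db for db, _, _ in uses}) > 1
}
```

A shared name is only a candidate. For instance, "title" groups Document titles with ComplianceForm titles, and a person still has to decide whether the two belong under one covering concept, which is where the "iterate" step comes in.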

  27. Ontology Development
     • Top-Down from First Principles (intensional view): a class is defined by a set of membership conditions or properties
     • Restrictions on Class Formation:
        • a class must have instances
        • a class must contain all properties common to the instances in its extension
        • classification should obey cognitive economy: instances of a class must share some, but not all, properties
        • classification should enable inference of properties based on class membership

  28. Ontology Development (cont.)
     • Restrictions on Class Structures:
        • Completeness: every property must be used in the definition of at least one class
        • Nonredundancy: a subclass must be defined by at least one property not in any of its superclasses (the result is that a subclass is always a specialization of any of its superclasses, i.e., it has more properties or restrictions, and has fewer instances)
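The two restrictions on slide 28 can be checked mechanically. The sketch below is a minimal Python example under stated assumptions: each class has at most one parent (the slide allows several superclasses), and the class names, property names, and function names are made up for illustration.

```python
# Hypothetical class structure: each class lists its parent and the
# properties introduced in its own definition.
classes = {
    "Person": {"parent": None, "props": {"name", "ssn"}},
    "Employee": {"parent": "Person", "props": {"eid"}},
    "Clone": {"parent": "Person", "props": {"name"}},  # adds nothing new
}
all_properties = {"name", "ssn", "eid", "phone"}


def inherited(cls):
    """Collect the properties defined by all superclasses of cls."""
    props, parent = set(), classes[cls]["parent"]
    while parent is not None:
        props |= classes[parent]["props"]
        parent = classes[parent]["parent"]
    return props


def unused_properties():
    """Completeness: list properties used in no class definition."""
    used = set().union(*(c["props"] for c in classes.values()))
    return all_properties - used


def is_nonredundant(cls):
    """Nonredundancy: a subclass needs a property absent from its superclasses."""
    if classes[cls]["parent"] is None:
        return True
    return bool(classes[cls]["props"] - inherited(cls))
```

In this made-up structure the completeness check reports "phone" as unused, and "Clone" fails the nonredundancy check because it defines nothing beyond what Person already has.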

  29. Tools for Developing Ontologies
     • Ontolingua and Chimaera (Stanford)
     • SHOE: Simple HTML Ontology Extension language (U. of Maryland)
     • JOE: Java Ontology Editor (U. of South Carolina)
     • IMTS (MCC)
     • Cyc Unit Editor
     • UML, ER, and Conceptual Modeling Tools

  30. Classification Is Difficult!
     From the ancient Chinese encyclopedia Celestial Emporium of Benevolent Knowledge, “It is written that animals are divided into
     • belonging to the emperor
     • embalmed
     • tame
     • sucking pigs
     • sirens
     • fabulous
     • stray dogs
     • included in the present classification
     • frenzied
     • innumerable
     • drawn with a very fine camel-hair brush
     • et cetera
     • having just broken the water pitcher, and
     • that from a long way off look like flies.”

  31. JOE (Editor Mode)

  32. JOE (Query Mode)
     [Figure label: Partial Query]

  33. Future Applications
     • Information gathering, presentation, and management in large, heterogeneous, open environments: Internet and intranets
     • Energy distribution and management
     • Electronic commerce
     • Smart vehicles and smart highways
     • Inventory management and logistics
     • Smart houses and buildings
     • Active, distributed, and intelligent data dictionaries containing
        • constraints, and constraint enforcement
        • business rules, and rule processing
        • business processes, and process enactment
        • business semantics, and semantics resolution
     • Cooperative mobile sensing
     • Software engineering: Interaction-Oriented Programming
     • Distance learning

  34. Logistics Domain Ontology
     [Figure labels, concepts: Army; Brigade; Military-Unit; Forward-Support-Battalion; Main-Support-Battalion; Direct-Support-Unit; War-Reserves; Stock; Stock-Item; Storage; Mobile-Storage; Geographic-Area]
     [Figure labels, relations: isa; is-part-of; supports; maintains; consist-of; is-authorized-to; has-as-part; stored-in; located-in]
     [Figure labels, attributes: name; quantity; ress-code; class; type; fsc-code; niin]

  35. X3L8 Taxonomy

  36. Topic Trees, Ontologies, and Database Schemas
     [Figure labels: MiG29; Weapon; price; designer; Number; Person; People; Terms; Air; Sea; expertIn; Mikoyan; r73; mig29; sirena; Fighter; Bomber; speed; weight; ivan; artem; mikoyan; Person (DOB, Specialty); Fighter (Speed, Weight, Price)]

  37. Cyc
     [Figure labels: THING; COLLECTION; INDIVIDUAL OBJECT; SITUATION; STUFF TYPE; TANGIBLE; INTANGIBLE; OBJECT TYPE; TEMPORAL OBJECT; STATICSITUATION; GROUP; TEMPORALSTUFFTYPE; TIME INTERVAL; TEMPORALOBJECTYPE; EXISTINGSTUFFTYPE; EXISTINGOBJECTYPE; EVENT; SOMETHINGEXISTING; CONFIGURATION; FOODGROUPTYPE; BIRTHEVENT; HOLIDAY; TEXTUALMATERIAL]
