Ontology Storage, Reasoning and Query ---- Methods, Systems and Applications 马 力 MALLI@cn.ibm.com IBM China Research Lab
Outline • Introduction • Ontology Management • Ontology based Data Management • Ontology Storage, Reasoning and Query • Triple Store • Reasoning • SPARQL Query Language • Faceted Search • Systems and Applications • Sesame, Jena, OWLIM, SOR, • Master Data Management
Objectives • Understand issues in ontology management • Understand the use of ontology for Data Management • Learn core technical pillars for Semantic Web. • Learn systems and methods for building Semantic Web applications.
What Is an Ontology? Recall what we learned before. An ontology • defines the terms and concepts used to describe and represent an area of knowledge • is a specification of a conceptualization: a formal way of writing down what we think about a domain.
Examples of Ontology • In the Extended Lehigh University Benchmark Ontology (UOBM, 1587 lines): . . . . . . <owl:Class rdf:about="http://uob.iodt.ibm.com/univ-bench-dl.owl#AssistantProfessor"> <rdfs:subClassOf> <owl:Class rdf:about="http://uob.iodt.ibm.com/univ-bench-dl.owl#Professor"/> </rdfs:subClassOf> <rdfs:label>assistant professor</rdfs:label> </owl:Class> . . . . . .
Ontology: one possible formal definition • An ontology is a 3-tuple $O := (C, R, A)$ • $C$ is the set of concepts, $C = \{c_1, c_2, \ldots, c_n\}$ • $R$ is the set of relations, $R = \{h, r_1, r_2, \ldots, r_n\}$ • $R \subseteq 2^{C \times C}$ • $h$: the concept hierarchy • $A$ is the set of axioms, $A = \{a_1, a_2, \ldots, a_n\}$ • each axiom is expressed in some logical language • e.g. $\mathit{father}(a,b) \wedge \mathit{father}(a,c) \Rightarrow b = c$ • Optionally, with a symbol mapping function
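The 3-tuple definition can be made concrete with a small data structure. The sketch below is illustrative only: the class name `Ontology`, the helper `functional`, and the individuals `ann`, `bob` and `carl` are assumptions that do not appear in the slides, and relations are stored as sets of pairs so that the example axiom on `father` can be checked directly.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set                                # C
    hierarchy: set                               # h, as (sub, super) pairs
    relations: dict                              # relation name -> set of (a, b) pairs
    axioms: list = field(default_factory=list)   # A, as predicates over the ontology


def functional(name):
    """Build the axiom r(a, b) and r(a, c) imply b = c for relation `name`."""
    def check(onto):
        seen = {}
        for a, b in onto.relations[name]:
            # setdefault returns the value already stored for a, if any
            if seen.setdefault(a, b) != b:
                return False
        return True
    return check


# Hypothetical content, for illustration only.
onto = Ontology(
    concepts={"Person"},
    hierarchy=set(),
    relations={"father": {("ann", "bob")}},
    axioms=[functional("father")],
)

consistent = all(axiom(onto) for axiom in onto.axioms)

# A second, different value for the same first argument violates the axiom.
onto.relations["father"].add(("ann", "carl"))
violated = not all(axiom(onto) for axiom in onto.axioms)
```

Representing axioms as callables keeps the sketch short; a real system would express them in a logical language, as the slide says.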
Ontology Reuse • Ontologies • UMLS: Unified Medical Language System • 178,904 frames, 7 relations, 1,729,817 assertions • WordNet: an online lexical reference system • 217,623 frames, 17 relations, 385,771 assertions • IFW/FS-BOM: Financial Service Business Object Model • 387 classes, 5,878 relations • Caper: an ontology used in case-based AI planning • 116,297 frames, 112 relations, 1,768,832 assertions • GO (Gene Ontology): an ontology for gene terms • 18,059 classes, 1 relation • NCI Thesaurus: an ontology by the National Cancer Institute • 72,603 classes, 70 relations • Cyc: a formalized representation of fundamental human knowledge • 2,843 classes, 1,242 relations • UNSPSC: Universal Standard Products and Services Classification Code • 9,795 classes, 1 relation
Why Ontology Management Ontologies evolve/change over time because of: • changes in the real-world (or changes in the domain) • adaptations to different tasks (or changes in conceptualization) , or • alignments to other ontologies (or changes in specification) Solution • A change management methodology is needed that involves • advanced versioning methods • configuration management An ontology management system will facilitate ontology re-use by: • open storage, identification and versioning. • providing smooth access to existing ontologies and advanced support in adapting ontologies to certain domain and task-specific circumstances. • fully employing the power of standardization.
Questions in Ontology Management • how to align different domain descriptions • how to handle changes over time • how to maintain versions of ontologies • how to store ontologies • how to identify and retrieve ontologies • ???… etc.
Issues in Ontology Management • Storage • accessibility (client/server, peer-to-peer, etc.) • classification (classifying ontologies in order to reorganize and reuse them) • module structure (facilitates re-use, mapping and integration) • Identification • unique identifier • Versioning • Versioning is critical for ensuring consistency among different versions of an ontology. • Search and Query • keyword-based searching or other advanced searching • browsing • Editing • Remote and cooperative editing • Reasoning (deriving consequences from an ontology) • ontology evaluation and verification • any query-answering behavior • Alignment • Ontologies can be integrated or kept separate. In both cases, they need to be aligned.
Alignment and Mapping • Why: • Because different departments and individual employees create domain-specific ontologies capturing specific aspects of their knowledge. • How: • Special mapping ontologies must be created to link different terminologies and modeling styles used in these domain specific ontologies, creating bridges between separated pieces of knowledge. • These bridges along with domain ontologies are then used to perform cross-ontology information search and retrieval. Three types of mapping: • Inter-model mapping: mapping the ontology language constructs for ontology translation • Inter-schema mapping: defining the relation between ontology elements for data translation • Model-to-schema mapping: combining the above two
Outline • Introduction • Ontology Management • Ontology for Data Management • Ontology Storage, Reasoning and Query • Triple Store • Reasoning on Large-Scale Data • SPARQL Query Language • Faceted Search • Systems and Applications • Sesame, Jena, OWLIM, SOR, • Master Data Management
Semantic Data Management • Mapping existing data to Ontology - Adapting SQL (or XML) Databases
Semantic Data Management (different levels of conceptual granularity and term variants) [Figure: two data tables, "Company info." and "Company business"] Queries: 1. Find all direct and indirect shareholders of company EDOX that are from Europe and are IT companies (FOO should be returned). 2. Find a software company that has wireless telecom products and is held by a Canadian company (BAR should be returned).
Semantic Data Management (different levels of conceptual granularity and term variants) [Figure: the "Company info." and "Company business" tables linked to an ontology. Concepts: Shareholder, Company, Business, Region. Relations: Hold, Located in, Conduct. Business hierarchy labels: Finance, IT, Banking, Telecom, PC, Optical, Wireless, Hardware, Solution, Software, Wireless Software, Main board, Memory. Region hierarchy labels: Asia, Euro., Amer., East Asia, France, North Amer., Paris, China, USA, Canada, BeiJing, NY, Vancouver.] Ontology based semantic query: 1. Find all direct and indirect shareholders of company EDOX that are from Europe and are IT companies. FOO is retrieved using transitive closure and subsumption inference. 2. Find a software company that has wireless telecom products and is held by a Canadian company. BAR is retrieved using classification and subsumption inference.
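The first query combines two inferences: a transitive closure over the shareholding relation, and subsumption over the region and business hierarchies. A minimal sketch of that combination follows. All data is hypothetical: only the names EDOX and FOO and a few hierarchy labels are taken from the slide, while companies `A` and `B` and every edge are invented for illustration.

```python
def closure(start, edges):
    """Return every node reachable from start by following edges one or more times."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical facts: company -> its direct shareholders.
held_by = {"EDOX": ["A"], "A": ["FOO", "B"]}

# Hypothetical slices of the region and business hierarchies (child -> parents).
parent = {
    "Paris": ["France"], "France": ["Europe"],
    "Vancouver": ["Canada"], "Canada": ["North America"],
    "Software": ["IT"], "Banking": ["Finance"],
}
located_in = {"A": "Vancouver", "FOO": "Paris", "B": "Paris"}
business = {"A": "Software", "FOO": "Software", "B": "Banking"}


def is_a(term, ancestor):
    """Subsumption test: term equals ancestor or lies below it in the hierarchy."""
    return term == ancestor or ancestor in closure(term, parent)


# Direct and indirect shareholders of EDOX that are European IT companies.
result = sorted(
    company for company in closure("EDOX", held_by)
    if is_a(located_in[company], "Europe") and is_a(business[company], "IT")
)
```

With this toy data, `A` is excluded by region and `B` by business, so only the indirect shareholder FOO remains.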
Semantic Data Management (hidden linkages) Query: Which genes may be affected by the drug OLANZAPINE? (The G1 and MAFD1 genes should be returned.)
Semantic Data Management (hidden linkages) [Figure: ontology with concepts Drug, Disease, Symptom and Gene, linked by the relations Treat, Reduce, HasSym and Associated with.] Ontology based semantic query: Which genes may be affected by the drug OLANZAPINE? The G1 and MAFD1 genes are retrieved by following the Drug-Symptom-Disease-Gene path.
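Following the Drug-Symptom-Disease-Gene path amounts to chaining three lookups. The sketch below assumes the direction of each relation and uses placeholder symptom and disease names (`symptom_a`, `disease_x`, and so on) plus an extra gene `G2`; only OLANZAPINE, G1 and MAFD1 come from the slide, and none of the links is a biomedical claim.

```python
# Placeholder facts, for illustration only.
reduces = {"OLANZAPINE": {"symptom_a"}}                 # drug -> symptoms it reduces
has_symptom = {                                          # disease -> its symptoms
    "disease_x": {"symptom_a"},
    "disease_y": {"symptom_b"},
}
associated_with = {                                      # gene -> associated diseases
    "G1": {"disease_x"},
    "MAFD1": {"disease_x"},
    "G2": {"disease_y"},
}


def genes_affected_by(drug):
    """Follow drug -> symptom -> disease -> gene and return the genes reached."""
    symptoms = reduces.get(drug, set())
    diseases = {d for d, syms in has_symptom.items() if syms & symptoms}
    return sorted(g for g, ds in associated_with.items() if ds & diseases)


genes = genes_affected_by("OLANZAPINE")
```

None of the individual tables mentions a drug-gene link; the link only appears once the three relations are joined, which is the "hidden linkage" the slide refers to.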
Semantic Data Management • Key Motivations • Reduce complexity and simplify integration for IT systems and applications through the effective use of ontologies (metadata), and improve understanding through a shared business vocabulary (one language). • Implicit linkages are made explicit and help users discover hidden relationships • Ontologies (domain metadata) are managed separately from the data, which makes them easier to share and reuse and speeds up development of semantic-rich applications • Capabilities • Management of Data Concepts and Schemas (Metadata) • Storing and querying ontologies (shared vocabularies of business concepts, metadata) • Mapping database schemas to the ontology to formally capture the semantics of corporate data • Semantic Data Validation • Using a description logic reasoner to validate the enterprise ontology • Using inference rules to validate the integrity of the data against a set of restrictions. The inference rules automatically identify inconsistencies when querying for information. • Semantic Query • Semantic relationships and taxonomies for text / content analysis and search • Ontology based query on the existing data
RDF and OWL ontology repositories • Problem definition • The continued rapid growth of ontologies in various domains requires efficient methods and tools for their storage and inference. • Develop high-performance ontology repositories applicable in real business. • How to solve • Build the storage model on a well-optimized RDBMS • Provide powerful inference support (a subset of OWL-DL) • Support an expressive query language, SPARQL • Rich full-text search capability, such as Faceted Search • Results • RDF and OWL ontology inference method • Ontology storage tool • RDF ontology repository • OWL ontology repository
Outline • Introduction • Ontology Management • Ontology for Data Management • Ontology Storage, Reasoning and Query • Triple Store • Reasoning on Large-Scale Data • SPARQL Query Language • Faceted Search • Systems and Applications • Sesame, Jena, OWLIM, DLDB-OWL, SOR, • Master Data Management
Triple Store: RDF Data Model • Data model for expressing knowledge • basic building block: statement <person001> <name> "Jeen" . • groups of statements form graphs [Figure: example RDF graph with nodes person001, project001, "Jeen", "j.broekstra@tue.nl" and "SOR", and edges labelled name, email, worksIn and projectMemberEmail]
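A statement is a (subject, predicate, object) triple, and a graph is a set of such triples. The sketch below stores the example as plain tuples. Which edge connects which nodes is a guess from the figure labels, and the helper names `build_graph` and `objects` are illustrative.

```python
from collections import defaultdict

# Edge assignment is assumed from the figure labels, not confirmed by the slide.
triples = [
    ("person001", "name", "Jeen"),
    ("person001", "email", "j.broekstra@tue.nl"),
    ("person001", "worksIn", "project001"),
    ("project001", "name", "SOR"),
]


def build_graph(statements):
    """Group statements by subject so that outgoing edges are easy to follow."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


graph = build_graph(triples)
project = objects(graph, "person001", "worksIn")[0]
project_name = objects(graph, project, "name")
```

Following `worksIn` and then `name` walks two edges of the graph, which is the basic operation that query languages over RDF build on.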
Triple Store: RDFS • RDF Schema is a Vocabulary Description Language • it allows specification of a domain vocabulary and a way to structure it • Class, Property, subClassOf, subPropertyOf, domain, range • Formal semantics add simple reasoning capabilities: • class and property subsumption • domain and range inference [Figure: example graph in which Person and Researcher are of rdf:type rdfs:Class, name is of rdf:type rdf:Property with rdfs:domain Person, Researcher is rdfs:subClassOf Person, and person001 is of rdf:type Researcher]
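The two reasoning capabilities named above, subsumption and domain inference, can be sketched as a naive fixpoint over three RDFS entailment rules (rdfs2, rdfs9 and rdfs11). This is a teaching sketch rather than a complete RDFS reasoner, and `person002` with its name is an invented individual added to exercise the domain rule.

```python
TYPE, SUBCLASS, DOMAIN = "rdf:type", "rdfs:subClassOf", "rdfs:domain"


def rdfs_closure(triples):
    """Apply rdfs2, rdfs9 and rdfs11 repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # rdfs9: x type C, C subClassOf D  =>  x type D
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # rdfs11: C subClassOf D, D subClassOf E  =>  C subClassOf E
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs2: x p y, p domain C  =>  x type C
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


data = {
    ("Researcher", SUBCLASS, "Person"),
    ("name", DOMAIN, "Person"),
    ("person001", TYPE, "Researcher"),
    ("person002", "name", "Ann"),   # hypothetical individual with no declared type
}
closed = rdfs_closure(data)
```

The nested loop is quadratic per round and only suitable for toy data; stores that materialize inferences do the same job with indexes or database joins.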
Triple Store: OWL
<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xml:base="http://www.ibm.com/crl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="">
    <rdfs:comment>An example to show differences between owl-lite, DLP and owl-DL</rdfs:comment>
  </owl:Ontology>
  <owl:Class rdf:about="#Faculty">
    <owl:unionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Professor" />
      <owl:Class rdf:about="#Post-Doc" />
      <owl:Class rdf:about="#Lecturer" />
    </owl:unionOf>
  </owl:Class>
  <owl:Class rdf:ID="PhDStudent">
    <rdfs:subClassOf>
      <owl:Class>
        <owl:intersectionOf rdf:parseType="Collection">
          <owl:Class rdf:about="#Person" />
          <owl:Restriction>
            <owl:onProperty rdf:resource="#take" />
            <owl:someValuesFrom>
              <owl:Class rdf:about="#PhDCourse" />
            </owl:someValuesFrom>
          </owl:Restriction>
        </owl:intersectionOf>
      </owl:Class>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class>
    <owl:complementOf>
      <owl:Class rdf:about="#PhDStudent" />
    </owl:complementOf>
  </owl:Class>
  <owl:Class rdf:about="#Major">
    <owl:oneOf rdf:parseType="Collection">
      <owl:Thing rdf:about="#CS" />
      <owl:Thing rdf:about="#EE" />
      <owl:Thing rdf:about="#Physics" />
    </owl:oneOf>
  </owl:Class>
  <owl:Class rdf:ID="FrenchCitizens">
    <owl:equivalentClass>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasNationality" />
        <owl:hasValue rdf:resource="#France" />
      </owl:Restriction>
    </owl:equivalentClass>
  </owl:Class>
  <owl:Class rdf:ID="MultipleNationalityCitizens">
    <owl:equivalentClass>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasNationality" />
        <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">2</owl:minCardinality>
      </owl:Restriction>
    </owl:equivalentClass>
  </owl:Class>
</rdf:RDF>
[Figure: the same definitions drawn as a graph. Faculty is the owl:unionOf Professor, Post-Doc and Lecturer. Ph.D Student is an rdfs:subClassOf a blank node that is the owl:intersectionOf Person and a restriction. An anonymous class is the owl:complementOf Ph.D Student. Major is the owl:oneOf CS, EE, ... FrenchStudent is linked to France through hasNationality.]
Triple Store: Problems • Does not leverage patterns in the data • Cannot leverage locality (spatial/temporal) • Excessive load time (cannot use the DB loader) • The database optimizer is of little use: there are no per-predicate statistics, so it cannot tell a selective pattern such as (?var, ex:empId, 123) from an unselective one such as (?var, ex:gender, "M") • Alternatives: native RDF store, object-relational store, property tables
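To see where these problems come from, consider a generic triple store kept in a single three-column table. The sketch below uses Python's built-in `sqlite3` module; the table name `triples`, the column names and the employee rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("emp1", "ex:empId", "123"),
        ("emp1", "ex:gender", "M"),
        ("emp2", "ex:empId", "456"),
        ("emp2", "ex:gender", "M"),
    ],
)

# Each extra triple pattern on the same subject costs one more self-join.
rows = conn.execute(
    """
    SELECT a.s
    FROM triples AS a
    JOIN triples AS b ON a.s = b.s
    WHERE a.p = 'ex:empId'  AND a.o = '123'
      AND b.p = 'ex:gender' AND b.o = 'M'
    """
).fetchall()
conn.close()
```

Both patterns hit the same columns of the same table, so column-level statistics give the optimizer no way to know that the `ex:empId` pattern matches one row while the `ex:gender` pattern matches half the table.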
Triple Store: Binary Storage Model Too many tables?
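In a binary storage model each property gets its own two-column (subject, object) table. A minimal in-memory sketch follows, assuming hypothetical data and an illustrative function name `partition_by_property`.

```python
from collections import defaultdict


def partition_by_property(triples):
    """Split a triple list into one (subject, object) table per property."""
    tables = defaultdict(list)
    for subject, predicate, obj in triples:
        tables[predicate].append((subject, obj))
    return dict(tables)


# Hypothetical data.
triples = [
    ("emp1", "ex:empId", "123"),
    ("emp1", "ex:gender", "M"),
    ("emp2", "ex:empId", "456"),
    ("emp2", "ex:gender", "M"),
    ("emp2", "ex:worksIn", "project001"),
]
tables = partition_by_property(triples)
table_count = len(tables)
```

Per-property tables restore per-predicate statistics, but the number of tables grows with the number of distinct properties in the data, which is the concern behind the slide's question.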
Triple Store: Native Storage Model • OWLIM: Persistence based on N-Triple files • HStar: Persistence based on XML storage model
Summary and Questions • Categories of Triple Stores: • Generic store with very simple schema • Binary store leveraging OR/OO DB’s features • Native store without complicated transaction and access control • We will introduce optimizations of triple store in next class • Jena’s property table • Index mechanisms • Optimization on triple table • Binary store vs Generic Store
Ontologies in Semantic Web • Shallow ontologies include a relatively simple TBox and are mainly used to organize a very large number of instances. • Deep ontologies consist of complex concepts and relations and are often used to classify complex sets of properties as certain sorts of object. (Nigel Shadbolt, Tim Berners-Lee and Wendy Hall, The Semantic Web Revisited, IEEE Intelligent Systems 21(3) pp. 96-101, 2006) [Figure: distribution of surveyed ontologies by expressivity, with the categories OWL DL, OWL Lite, Description Horn Logic and RDFS(DL); the values shown are 22%, 32%, 46% and "Not measured in paper".] Statistics based on a WWW 2006 paper by T.D. Wang. Survey of 1,211 ontologies.
Ontology Reasoning • Existing ontology persistence systems can be roughly divided into two classes by how they reason: • DL-based systems: the database serves mainly for scalable storage and convenient retrieval, and classic DL tableaux algorithms are used for reasoning. Query answering is reduced to checking the satisfiability of the KB. • Instance Store: role-free system, focused on instance classification. • IBM SHER Engine: uses summarization and filtering techniques. • Rule-based systems: translate DL constructs into rules. Constructs that do not fit (e.g. existential restrictions) are either partially forbidden (as in DLP) or assigned new meanings (as in OWL Flight). Unlike DL tableaux algorithms, queries are evaluated by forward chaining or backward chaining. • KAON2: reduces OWL to disjunctive Datalog programs, extended with DL-safe rules. • OWLIM: materializes the inferred closure of OWL KBs. • Sesame RDF database: materializes inferred results in the database. • Jena2 with RDBMS support: uses an external reasoning engine in main memory. • Oracle RDF database: supports user-defined rules and materializes a rule index.
Ontology Reasoning • Description Logic Program • An intersection of Description Logics and Logic Programming, and can be implemented by a set of rules. • Some DL constructs (e.g. existential restrictions) are partially forbidden in subsumption axioms • These systems are in essence knowledge bases using databases as a persistent store, and focus on ontology reasoning problem. Discrepancies between ontologies and databases are paid less attention.
Discrepancies Between Ontologies and Databases • Ontologies and Description Logic (OWL DL) • Open World Assumption (allows incomplete information in the ABox) • Restrictions are used for reasoning • e.g. from takeCourse rdfs:domain People and John takeCourse English001, it is inferred that John is an instance of People • Monotonic negation • Reasoning in OWL DL is NExpTime-complete • TBox reasoning can be done well • ABox reasoning is not scalable • Databases • Closed World Assumption (information is understood as complete) • Constraints are used for checking • Non-monotonic negation • Industry-strength tools
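The takeCourse example shows the difference between a restriction used for reasoning and a constraint used for checking. The sketch below reads the same domain statement both ways; the function names `infer_domain` and `check_domain` and the dictionary-based type store are illustrative assumptions.

```python
def infer_domain(facts, types, prop, cls):
    """Ontology reading: the domain statement is a rule that adds type facts."""
    inferred = {subject: set(classes) for subject, classes in types.items()}
    for subject, predicate, _ in facts:
        if predicate == prop:
            inferred.setdefault(subject, set()).add(cls)
    return inferred


def check_domain(facts, types, prop, cls):
    """Database reading: the same statement is a constraint; return violations."""
    return [
        (s, p, o) for s, p, o in facts
        if p == prop and cls not in types.get(s, set())
    ]


facts = [("John", "takeCourse", "English001")]
types = {}   # nothing states that John is a People

open_world = infer_domain(facts, types, "takeCourse", "People")
closed_world = check_domain(facts, types, "takeCourse", "People")
```

Under the open-world reading the missing type is simply derived; under the closed-world reading the same row is reported as an integrity violation.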
Ontology Reasoning • We will introduce more details in next class • Forward chaining • Backward chaining • Ways to scale up classic DL reasoner • Ways to bridge discrepancies between ontologies and databases
Semantic Web Query and Search • Keywords search (SWOOGLE) • SPARQL query • Faceted browsing • Visualization
Query Semantic Web • Data Access: SPARQL • Information organisation: OWL, RDFS • Information format: RDF • Identification: URIs • Serialization: XML
SPARQL SPARQL = Query Language + Protocol + XML Results Format • Access and query RDF graphs • HTTP and SOAP • Results: fixed XML form for further transformation • Product of the RDF Data Access Working Group • Status: W3C Candidate Recommendation
SPARQL Query • On a papers database: "Find other papers by the authors of a given paper."
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title2
WHERE {
  ?doc dc:title "SPARQL at speed" .
  ?doc dc:creator ?c .
  ?docOther dc:creator ?c .
  ?docOther dc:title ?title2
}
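The WHERE clause is a basic graph pattern: a list of triple patterns that must all match under one consistent variable binding. A naive matcher for such patterns can be sketched as below. The documents and author names are hypothetical, and this is a teaching sketch, not how a SPARQL engine is implemented.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None if impossible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, data):
    """Evaluate a basic graph pattern by extending bindings one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Hypothetical papers database.
data = [
    ("doc1", "dc:title", "SPARQL at speed"),
    ("doc1", "dc:creator", "alice"),
    ("doc2", "dc:creator", "alice"),
    ("doc2", "dc:title", "Other paper"),
    ("doc3", "dc:creator", "bob"),
    ("doc3", "dc:title", "Unrelated"),
]
query = [
    ("?doc", "dc:title", "SPARQL at speed"),
    ("?doc", "dc:creator", "?c"),
    ("?docOther", "dc:creator", "?c"),
    ("?docOther", "dc:title", "?title2"),
]
titles = sorted({b["?title2"] for b in evaluate(query, data)})
```

Note that the given paper itself also satisfies the pattern, because nothing in the query requires `?docOther` to differ from `?doc`; a FILTER would be needed to exclude it.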
SPARQL Query • "Find books with 'SPARQL' in the title. Get the authors' names and the price (if available)." • Multiple vocabularies
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX shop: <http://example/shop#>
SELECT ?title ?name ?price
WHERE {
  ?doc dc:title ?title .
  FILTER regex(?title, "SPARQL") .
  ?doc dc:creator ?c .
  ?c foaf:name ?name .
  OPTIONAL { ?doc shop:price ?price }
}
SPARQL Query and Inference An RDF graph may be backed by inference • OWL, RDFS, application, rules
Data:
:x rdf:type :C .
:C rdfs:subClassOf :D .
Query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type WHERE { ?x rdf:type ?type . }
Result:
--------
| type |
========
| :C   |
| :D   |
--------
SPARQL : Data Virtualization • SPARQL as integrator • Data remains where it is • Existing applications untouched • data appears as RDF, remap query to native form • SPARQL to SQL • Direct mapping of tables • Semi-automatic generation of mapping • Modelled: D2RQ • High-quality mapping, manually developed
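The "direct mapping of tables" option can be pictured as turning each row into a subject and each column into a property. The sketch below follows that idea only loosely: the function `rows_to_triples`, the `http://example.org/` base URI and the URI layout are assumptions made for illustration, not the mapping used by D2RQ or defined by any standard.

```python
def rows_to_triples(table, key, rows, base="http://example.org/"):
    """Map each row to one subject and each non-null column to one triple."""
    triples = []
    for row in rows:
        subject = f"{base}{table}/{row[key]}"
        # The table name plays the role of the class.
        triples.append((subject, "rdf:type", f"{base}{table}"))
        for column, value in row.items():
            if column != key and value is not None:
                triples.append((subject, f"{base}{table}#{column}", value))
    return triples


# Hypothetical table content.
rows = [{"id": 1, "name": "EDOX", "region": None}]
triples = rows_to_triples("company", "id", rows)
```

A virtualizing integrator would not materialize these triples; it would use the same correspondence in reverse to rewrite a SPARQL pattern into SQL over the original table.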
Federated Query: Single Point of Access • Inputs: • Service Description • Information Directory • Request • Outputs: • Unified results [Figure: several SPARQL queries enter a Query Broker (SPARQL => SPARQL), which forwards them to the sources labelled RDF Doc, DB and Corp LDAP]
SPARQL Query Processing • Find those who worked on a product bought by a specific customer, and return their name and their age if they are younger than 35; also find those whose department sold a product to a specific customer and whose age is over 50.
PREFIX md: <http://crl.ibm.com/MDM#>
SELECT ?person ?age
WHERE {
  { <customer> md:buyProducts ?product .
    ?person md:workedOn ?product .
    OPTIONAL { ?person md:age ?age . FILTER (?age < 35) } }
  UNION
  { <customer> md:buyProducts ?product .
    ?product md:producedBy ?department .
    ?person md:belongTo ?department .
    OPTIONAL { ?person md:age ?age . FILTER (?age > 50) } }
}
[Figure: Query Pattern Tree. The root is an ORPattern (vars: ?person, ?age) with two andPattern children: P1 (vars: ?person, ?product) and P2 (vars: ?person, ?product, ?department). Each andPattern has an OPPattern child, P3 and P4 (vars: ?person, ?age), which carries the md:age triple and its FILTER.]
Evaluation of Different Access Methods • Compared to zero-effort interfaces (keyword search) • Gave simple tasks (e.g. find all blue-eyed terrorists) • Results: • higher solution rate, preferred interface • complex queries difficult for people • ranking not intuitive