OWL-based Semantic Conflicts Detection and Resolution for Data Interoperability Changqing Li, Tok Wang Ling Department of Computer Science School of Computing National University of Singapore
Outline • Introduction • Preliminary and motivation • OWL-based Semantic Conflicts Detection and Resolution • Conclusion • Q & A
Introduction • Data interoperability and integration are long-standing challenges for the database research community. • Ontologies provide shared knowledge among different data sources. • They clarify the semantics of information. • They provide a way to solve the interoperability problem in database integration.
Introduction (Cont.) • OWL is being promoted as a standard web ontology language. • In the future, a considerable number of ontologies will be created based on OWL. • Therefore, automatically detecting semantic conflicts based on OWL will greatly expedite progress toward semantic interoperability and greatly reduce the manual work needed to detect semantic conflicts.
Ontology Definition • An ontology defines the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions to the vocabulary [1]. 1. Robert Neches, Richard Fikes, Timothy W. Finin, Thomas R. Gruber, Ramesh Patil, Ted E. Senator, William R. Swartout: Enabling Technology for Knowledge Sharing. AI Magazine 12(3): pp36-56 (1991)
Ontology Language • SHOE • RDF • RDFS • DAML+OIL • OWL
SHOE • The Simple HTML Ontological Extensions (SHOE) [2] extend HTML with machine-readable knowledge annotations. 2. Sean Luke and Jeff Heflin: SHOE Specification 1.01. http://www.cs.umd.edu/projects/plus/SHOE/spec.html
RDF • The Resource Description Framework (RDF) [3] is a W3C recommendation for the Semantic Web [4]. • It defines a simple model to describe relationships among resources in terms of properties and values. • SVO form (Subject-Verb-Object) • Resource-Property-Value 3. Ora Lassila and Ralph R. Swick: Resource description framework (RDF). http://www.w3c.org/TR/WD-rdf-syntax 4. The SemanticWeb Homepage. http://www.semanticweb.org
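The Resource-Property-Value model above can be pictured as a list of three-part statements. The following is only a minimal sketch in plain Python, not an RDF library; the resource names (`#Singapore`, `#Product42`) and the helper `values_of` are hypothetical and chosen purely for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property (the "verb" in SVO form)
    obj: str        # the value: another resource or a literal


# Hypothetical statements; the names are illustrative only.
triples = [
    Triple("#Singapore", "rdf:type", "#Country"),
    Triple("#Product42", "price", "100"),
]


def values_of(subject: str, predicate: str) -> list[str]:
    """Return every value recorded for a (resource, property) pair."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(values_of("#Product42", "price"))
```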
RDFS • RDF Schema (RDFS) [5] is the primitive description language of RDF. • It provides some basic primitives • subClassOf • subPropertyOf • … 5. Dan Brickley and R.V. Guha. Resource Description Framework (RDF) Schema Specification 1.0, W3C Candidate Recommendation 27 March 2000. http://www.w3.org/TR/rdf-schema/
DAML+OIL • DARPA Agent Markup Language (DAML) [6] • Aims to make semantic concepts and relationships understandable by machines • Ontology Inference Layer (OIL) [7] • Extends RDFS with additional language primitives not yet present in RDFS. • DAML+OIL [8] is the successor of RDFS • Combination of DAML and OIL • More semantically rich primitives are defined 6. The DARPA Agent Markup Language Homepage. http://daml.semanticweb.org/ 7. The Ontology Inference Layer OIL Homepage. http://www.ontoknowledge.org/oil/TR/oil.long.html 8. DAML+OIL Definition. http://www.daml.org/2001/03/daml+oil
OWL • DAML+OIL is evolving into OWL (Web Ontology Language) [9]. • OWL is almost the same as DAML+OIL. • Some primitives of DAML+OIL are renamed in OWL for easier understanding. • e.g., “sameClassAs” is changed to “equivalentClass” • … 9. Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider and Lynn Andrea Stein. OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-ref/
Primitives of OWL • “owl” before “:” is the namespace prefix • owl:equivalentClass • owl:equivalentProperty • owl:sameIndividualAs • owl:disjointWith • owl:differentFrom • …
Our Extension of OWL (EOWL) • We extend OWL with the following primitives • eowl:orderingProperty • eowl:overlap • eowl:properSubClassOf • eowl:properSubPropertyOf • …
OWL-based Semantic Conflicts Cases A. Name conflicts B. Order sensitive conflicts C. Scaling conflicts D. Whole and part conflicts E. Partial similarity conflicts F. Swap conflicts
A. Name conflicts • Example A. Consider two distributed data warehouses: • one is used to analyze the United States market • country, state, city and district • and the other is used to analyze the China market • country, province, city and county • Based on the context, “Province” is defined as equivalent to “State” using the OWL primitive “owl:equivalentClass”. • To resolve this conflict, one name needs to be changed to the referenced name.
A. Name conflicts (Cont.) • “owl:equivalentClass” is the indicator to detect synonym conflicts. • “Province” is changed to “State”, which is the name referenced in the ontology definition. <owl:Class rdf:ID="Province"> <rdfs:label>Province</rdfs:label> <owl:equivalentClass rdf:resource="#State"/> </owl:Class> Fig. A. Detection of synonym conflicts
A. Name conflicts (Cont.) • Case A. Synonyms. The OWL primitives “owl:equivalentClass”, “owl:equivalentProperty” and “owl:sameIndividualAs” are indicators to detect this case. • Conflict Resolution Rule A. If synonym conflicts are detected, different attribute names with the same semantics need to be translated to the same name (the referenced name) for smooth data interoperability.
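Case A and Rule A can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustrative sketch, not the authors' implementation: the `rdf:RDF` wrapper with namespace declarations is added here so that the fragment from Fig. A parses on its own, and the function names `find_synonyms` and `rename_attributes` are hypothetical. Only “owl:equivalentClass” is handled, and attribute names are compared case-insensitively as a simplifying assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Fig. A wrapped in an rdf:RDF root so the namespace prefixes resolve.
ONTOLOGY = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Province">
    <rdfs:label>Province</rdfs:label>
    <owl:equivalentClass rdf:resource="#State"/>
  </owl:Class>
</rdf:RDF>
"""


def find_synonyms(owl_xml: str) -> dict[str, str]:
    """Map each class name to the referenced name it is declared equivalent to."""
    root = ET.fromstring(owl_xml)
    synonyms = {}
    for cls in root.iter(f"{{{OWL}}}Class"):
        name = cls.get(f"{{{RDF}}}ID")
        for eq in cls.findall(f"{{{OWL}}}equivalentClass"):
            target = eq.get(f"{{{RDF}}}resource", "")
            if name and target:
                synonyms[name] = target.lstrip("#")
    return synonyms


def rename_attributes(schema: list[str], synonyms: dict[str, str]) -> list[str]:
    """Rule A: replace every synonym with its referenced name."""
    lookup = {k.lower(): v.lower() for k, v in synonyms.items()}
    return [lookup.get(attr.lower(), attr) for attr in schema]


synonyms = find_synonyms(ONTOLOGY)
china_schema = ["country", "province", "city", "county"]
print(rename_attributes(china_schema, synonyms))
# ['country', 'state', 'city', 'county']
```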
B. Order sensitive conflicts • Example B. Consider the highest three scores of a course. • The highest three scores of course A are listed as “90, 95, 100” in ascending order. • The highest three scores of course B are listed as “98, 95, 93” in descending order. • “highestThreeScores” is defined as an “eowl:orderingProperty” in the ontology. • The sequences of the highest three scores for courses A and B should both be adjusted to ascending order or both to descending order. • By default, adjust to the sequence of the first one, e.g. the sequence of course A.
B. Order sensitive conflicts (Cont.) • We can further specify ascending or descending order for more precise semantics. <eowl:orderingProperty rdf:ID="highestThreeScores"> <rdfs:label>highest three scores of a course</rdfs:label> <rdfs:domain rdf:resource="#Course"/> <rdfs:range rdf:resource="xsd#integer"/> </eowl:orderingProperty> Fig. B. Detection of order sensitive conflicts
B. Order sensitive conflicts (Cont.) • Case B. Order sensitive. The EOWL primitive “eowl:orderingProperty” and the RDF primitive “rdf:Seq” are indicators to detect this case. • Conflict Resolution Rule B. If order sensitive conflicts are detected, we need to adjust the member sequences according to the same criterion for smooth data interoperability; by default, the sequence of the first one is used.
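Rule B can be illustrated with the scores from Example B. The sketch below assumes that the ordering criterion is plain numeric order and that the direction of the first sequence can be inferred from its values; the function names are hypothetical. In practice it would only be applied to properties declared as “eowl:orderingProperty”.

```python
def is_ascending(values: list[int]) -> bool:
    """True if the values never decrease from one element to the next."""
    return all(a <= b for a, b in zip(values, values[1:]))


def align_order(reference: list[int], other: list[int]) -> list[int]:
    """Reorder `other` to follow the direction of `reference` (the default in Rule B)."""
    return sorted(other, reverse=not is_ascending(reference))


course_a = [90, 95, 100]   # ascending
course_b = [98, 95, 93]    # descending
print(align_order(course_a, course_b))  # [93, 95, 98]
```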
C. Scaling conflicts • Example C. Consider two database schemas • Product(ID, Price) • Product(ID, Price) • One price may refer to US dollars, while the other may refer to Singapore dollars. Fig. C shows some concepts of a currency ontology in which “price” is defined. • Translate the prices to refer to the same currency unit, by default the unit of the first one.
C. Scaling conflicts (Cont.) <owl:DatatypeProperty rdf:ID="price"> <rdfs:domain rdf:resource="#Product"/> <rdfs:range rdf:parseType="Resource"> <rdf:value/> <currency:CurrencyUnit/> </rdfs:range> </owl:DatatypeProperty> Fig. C. Detection of scaling conflicts
C. Scaling conflicts (Cont.) • Case C. Semantic conflicts may exist if the value of a data type property comprises both value and unit (Scaling). RDF primitive “rdf:parseType="Resource"” and OWL primitive “owl:DatatypeProperty” are indicators for this case. • Conflict Resolution Rule C. If scaling conflicts are detected, the value should be translated to refer to the same unit for smooth data interoperability. The first unit by default.
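Rule C amounts to carrying the unit together with the value and converting before comparison. The following sketch assumes a fixed conversion table; the rates in `RATES_TO_USD` are hypothetical placeholders, not real exchange rates, and the `Price` type and `convert` function are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to USD, for illustration only.
RATES_TO_USD = {"USD": 1.0, "SGD": 0.5}


@dataclass(frozen=True)
class Price:
    value: float
    unit: str  # currency unit, mirroring the value/unit pair in Fig. C


def convert(price: Price, target_unit: str) -> Price:
    """Express `price` in `target_unit` using the conversion table."""
    in_usd = price.value * RATES_TO_USD[price.unit]
    return Price(in_usd / RATES_TO_USD[target_unit], target_unit)


first = Price(100.0, "USD")
second = Price(100.0, "SGD")
# Rule C default: translate to the unit of the first source.
print(convert(second, first.unit))  # Price(value=50.0, unit='USD')
```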
D. Whole and part conflicts • Example D. Consider schemas • Person(ID, name) • Person(ID, surname, givenName) • “surname” and “givenName” are both defined as proper sub-properties of “name” using “eowl:properSubPropertyOf”. • Likewise, “eowl:properSubClassOf” has clearer semantics than “rdfs:subClassOf”, because “rdfs:subClassOf” is ambiguous between two meanings: “eowl:properSubClassOf” and “owl:equivalentClass”. • Divide the whole attribute “name” into the part attributes “surname” and “givenName” • Or combine the part attributes “surname” and “givenName” in the correct sequence to form the whole attribute “name”.
D. Whole and part conflicts (Cont.) <rdf:Property rdf:ID="surname"> <eowl:properSubPropertyOf rdf:resource="#name"/> </rdf:Property> Fig. D1. Detection of whole and part conflicts <rdf:Property rdf:ID="givenName"> <eowl:properSubPropertyOf rdf:resource="#name"/> </rdf:Property> Fig. D2. Detection of whole and part conflicts
D. Whole and part conflicts (Cont.) • Case D. Semantic conflicts may exist if one concept is completely contained in another concept (Whole and part). The EOWL primitives “eowl:properSubClassOf” and “eowl:properSubPropertyOf” are indicators to detect this case. • Conflict Resolution Rule D. If whole and part conflicts are detected, the whole attributes should be divided into part attributes, or the part attributes should be combined into whole attributes, for smooth data interoperability.
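Both directions of Rule D can be sketched for the “name” example. Dividing a whole name is inherently heuristic: the sketch below assumes a single-word surname and a known ordering convention (`surname_first`), both of which are assumptions made here and not part of the original slides. The function names and the sample name are hypothetical.

```python
def combine(surname: str, given_name: str, surname_first: bool = True) -> str:
    """Combine the part attributes into the whole attribute `name`."""
    parts = [surname, given_name] if surname_first else [given_name, surname]
    return " ".join(parts)


def divide(name: str, surname_first: bool = True) -> tuple[str, str]:
    """Divide `name` into (surname, givenName), assuming a one-word surname."""
    if surname_first:
        surname, _, given_name = name.partition(" ")
    else:
        given_name, _, surname = name.rpartition(" ")
    return surname, given_name


print(divide("Tan Mei Ling"))        # ('Tan', 'Mei Ling')
print(combine("Tan", "Mei Ling"))    # Tan Mei Ling
```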
E. Partial similarity conflicts • Example E. Integrating ResearchAssistant and GraduateStudent • The relationship between research assistant and graduate student is an overlap because some research assistants are also graduate students, • but not all research assistants are graduate students, • and not all graduate students are research assistants. • After integration, there should be three schemas: • Research Assistant but not Graduate Student (RNotG) • Graduate Student but not Research Assistant (GNotR) • both Research Assistant and Graduate Student (RAndG)
E. Partial similarity conflicts (Cont.) <owl:Class rdf:ID="ResearchAssistant"> <eowl:overlap rdf:resource="#GraduateStudent"/> </owl:Class> Fig. E. Detection of partial similarity conflicts
E. Partial similarity conflicts (Cont.) • Case E. Semantic conflicts may exist if two concepts overlap (Partial similarity). The EOWL primitive “eowl:overlap” is the indicator to detect this case. • Conflict Resolution Rule E. If partial similarity conflicts are detected, the overlapping part should be separated before integration.
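The three schemas from Example E correspond to two set differences and one intersection. A minimal sketch, assuming each source can be reduced to a set of person identifiers (the identifiers below are hypothetical):

```python
def split_overlap(research_assistants: set[str],
                  graduate_students: set[str]) -> dict[str, set[str]]:
    """Rule E: separate the overlapping part before integration."""
    return {
        "RNotG": research_assistants - graduate_students,
        "GNotR": graduate_students - research_assistants,
        "RAndG": research_assistants & graduate_students,
    }


# Hypothetical identifiers, for illustration only.
ra = {"p1", "p2", "p3"}
gs = {"p3", "p4"}
print(split_overlap(ra, gs))
```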
F. Swap conflicts • Example F. Continued from Example A • In China, county is contained in city (city has the larger area). • In the US, city is contained in county (county has the larger area). • The domain (“County”) of property “region:containedIn” in the China ontology is just the range of the same property “region:containedIn” in the US ontology. • The range (“City”) of property “region:containedIn” in the China ontology is just the domain of the same property “region:containedIn” in the US ontology. • We can add “China.” or “US.” before “City” and “County” for smooth data interoperability.
F. Swap conflicts (Cont.) <owl:Class rdf:ID="County"> <region:containedIn rdf:resource="#City"/> </owl:Class> Fig. F1. Detection of swap conflicts (the relationship between city and county in the China ontology) <owl:Class rdf:ID="City"> <region:containedIn rdf:resource="#County"/> </owl:Class> Fig. F2. Detection of swap conflicts (the relationship between city and county in the US ontology)
F. Swap conflicts (Cont.) • Case F. Semantic conflicts may exist if the domain of a property in the first ontology is the range of the same property in the second ontology, and the range of the property in the first ontology is the domain of the same property in the second ontology (Swap). • Conflict Resolution Rule F. If swap conflicts are detected, context restrictions (see Example F) should be added to the schema for smooth data interoperability.
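Case F can be checked by comparing the (domain, range) pair of each shared property across the two ontologies. The sketch below assumes each ontology has already been reduced to a mapping from property name to its (domain, range) pair, as read off Fig. F1 and Fig. F2; the dictionary layout and the function names are hypothetical.

```python
# (domain, range) of each property, as read from Fig. F1 and Fig. F2.
china = {"region:containedIn": ("County", "City")}
us = {"region:containedIn": ("City", "County")}


def swapped_properties(first: dict[str, tuple[str, str]],
                       second: dict[str, tuple[str, str]]) -> list[str]:
    """Return properties whose domain and range are exchanged between the ontologies."""
    return [
        prop
        for prop, (domain, range_) in first.items()
        if domain != range_ and second.get(prop) == (range_, domain)
    ]


def qualify(pair: tuple[str, str], context: str) -> tuple[str, str]:
    """Rule F: add a context restriction such as "China." or "US." to each name."""
    return tuple(f"{context}.{name}" for name in pair)


for prop in swapped_properties(china, us):
    print(prop, qualify(china[prop], "China"), qualify(us[prop], "US"))
```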
Conclusion • We extend OWL with several primitives that have clearer semantics. • We summarize several OWL-based cases in which semantic conflicts are likely to be encountered. • The conflict resolution rules for each case are presented. • In the future, OWL will be frequently used to build ontologies, and this paper provides a computer-aided approach to detect and resolve semantic conflicts for smooth data interoperability.
References 1. Robert Neches, Richard Fikes, Timothy W. Finin, Thomas R. Gruber, Ramesh Patil, Ted E. Senator, William R. Swartout: Enabling Technology for Knowledge Sharing. AI Magazine 12(3): pp36-56 (1991) 2. Sean Luke and Jeff Heflin: SHOE Specification 1.01. http://www.cs.umd.edu/projects/plus/SHOE/spec.html 3. Ora Lassila and Ralph R. Swick: Resource description framework (RDF). http://www.w3c.org/TR/WD-rdf-syntax 4. The SemanticWeb Homepage. http://www.semanticweb.org 5. Dan Brickley and R.V. Guha. Resource Description Framework (RDF) Schema Specification 1.0, W3C Candidate Recommendation 27 March 2000. http://www.w3.org/TR/rdf-schema/ 6. The DARPA Agent Markup Language Homepage. http://daml.semanticweb.org/ 7. The Ontology Inference Layer OIL Homepage. http://www.ontoknowledge.org/oil/TR/oil.long.html 8. DAML+OIL Definition. http://www.daml.org/2001/03/daml+oil 9. Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider and Lynn Andrea Stein. OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-ref/