XML, RDF, and OWL The Derivation of Web Ontology Language
Acknowledgments • This presentation uses several researchers’ previous examples. • Special thanks to Roger L. Costello and David B. Jacobs of the MITRE Corporation, Hamish Cunningham and Kalina Bontcheva of the University of Sheffield, David De Roure of the GGF Semantic Grid Research Group, and one anonymous researcher who provided an excellent explanation of RDF syntax.
The Holy Grail Hamish Cunningham and Kalina Bontcheva, Ontology-Aware Information Extraction, 2002
XML: document = labeled tree
<course date="...">
  <title>...</title>
  <teacher>...</teacher>
  <name>...</name>
  <http>...</http>
  <students>...</students>
</course>
[Figure: the same document drawn as a tree with the node labels course, title, teacher, students, name, http]
• XML Schema: grammars for describing legal trees and datatypes • Why not use XML to represent semantics?
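The "labeled tree" view can be sketched with Python's standard library. This is only an illustration: the element names follow the slide, while the date and text values are invented placeholders. The parser recovers the labels and the nesting, but nothing about what a label such as title means.

```python
import xml.etree.ElementTree as ET

# Hypothetical course document: element names follow the slide,
# the date and text values are invented placeholders.
COURSE_XML = """<course date="2004-02-10">
  <title>Semantic Web</title>
  <teacher>A. Teacher</teacher>
  <name>RDF basics</name>
  <http>http://example.org/course</http>
  <students>42</students>
</course>"""


def labeled_tree(element, depth=0):
    """Return one indented label per node, walking the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(labeled_tree(child, depth + 1))
    return lines


tree_lines = labeled_tree(ET.fromstring(COURSE_XML))
print("\n".join(tree_lines))
```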
Syntax and Semantics • Syntax: structure of the data • Semantics: meaning of the data • Two conditions necessary for interoperability: • Adopt a common syntax: this enables applications to parse the data. • Adopt a means for understanding the semantics: this enables applications to use the data.
Can XML represent semantics? <title> … </title> • title: a heading that names a statute or legislative bill. • title: the name of a work of art or literary composition etc. • title: a general or descriptive heading for a section of a written work. • title: the status of being a champion. • title: a legal document signed and sealed and delivered to effect a transfer of property and to show the legal right to possess it • … (from WordNet)
XML: limitations for semantic markup • XML makes no commitment on: (1) domain-specific ontological vocabulary, (2) ontological modeling primitives • Requires pre-arranged agreement on (1) & (2) • Only feasible for closed collaboration • agents in a small & stable community • pages on a small & stable intranet • Not suited for sharing Web resources
What is the purpose of RDF? • The purpose of RDF (Resource Description Framework) is to give a standard way of specifying data "about" something. • Here's an example of an XML document that specifies data about China's Yangtze river: <?xml version="1.0"?> <River id="Yangtze" xmlns="http://www.geodesy.org/river"> <length>6300 kilometers</length> <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation> <endingLocation>East China Sea</endingLocation> </River> "Here is data about the Yangtze River. It has a length of 6300 kilometers. Its startingLocation is western China's Qinghai-Tibet Plateau. Its endingLocation is the East China Sea."
From XML to RDF
XML (Yangtze.xml):
<?xml version="1.0"?>
<River id="Yangtze" xmlns="http://www.geodesy.org/river">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
"convert to" RDF (Yangtze.rdf):
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
Internationalized Resource Identifier (IRI)
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
1. RDF provides an ID attribute for identifying the resource being described. 2. The ID attribute is in the RDF namespace. 3. Add the "fragment identifier symbol" (#) to the namespace.
Namespaces • Newest version: W3C Recommendation of February 4, 2004 (Namespaces in XML 1.1) • A simple method for qualifying element and attribute names used in XML documents • Identified by IRI references
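A small sketch of why the namespace ends in "#": a namespace-aware parser pairs each name with its namespace, and concatenating the two gives the full identifier of the class, property, or attribute. The `expand` helper is a hypothetical name introduced here; the `{namespace}local` notation is how Python's `xml.etree.ElementTree` reports qualified names.

```python
import xml.etree.ElementTree as ET

RIVER_RDF = """<River rdf:ID="Yangtze"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
</River>"""


def expand(qualified):
    """Convert ElementTree's '{namespace}local' notation to one IRI string."""
    namespace, local = qualified[1:].split("}", 1)
    return namespace + local


root = ET.fromstring(RIVER_RDF)
class_iri = expand(root.tag)                      # the type (class)
property_iri = expand(root[0].tag)                # the first property
attribute_iri = expand(next(iter(root.attrib)))   # the rdf:ID attribute

print(class_iri)
print(property_iri)
print(attribute_iri)
```

Without the trailing "#" the class would expand to http://www.geodesy.org/riverRiver, which is why the namespace declaration changes when the document is converted to RDF.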
RDF Namespace • http://www.w3.org/1999/02/22-rdf-syntax-ns# • Names in this namespace include: ID, about, type, resource, Description
RDF Framework Model • [Figure: diagram with the labels RDF Description, Resource, IRI, Property, Property Type, Value]
The RDF Format
<?xml version="1.0"?>
<Class rdf:ID="Resource"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="uri">
  <property>value</property>
  <property>value</property>
  ...
</Class>
More Interpretation
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
1. River identifies the type (class) of the resource being described. 2. rdf:ID="Yangtze" identifies the resource being described. This resource is an instance of River. 3. length, startingLocation, and endingLocation are properties, or attributes, of the type (class). 4. The element contents are the values of the properties.
Uniquely Identify the Resource • RDF is very concerned about uniquely identifying the type (class) and the properties. RDF is also very concerned about uniquely identifying the resource, e.g., This is the resource being described. We want to uniquely identify this resource. <?xml version="1.0"?> <River rdf:ID="Yangtze" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://www.geodesy.org/river#"> <length>6300 kilometers</length> <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation> <endingLocation>East China Sea</endingLocation> </River>
rdf:ID • The value of rdf:ID is a "relative URI". • The "complete URI" is obtained by concatenating the URL of the XML document with "#" and then the value of rdf:ID, e.g., <?xml version="1.0"?> <River rdf:ID="Yangtze" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://www.geodesy.org/river#"> <length>6300 kilometers</length> <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation> <endingLocation>East China Sea</endingLocation> </River> Yangtze.rdf Suppose that this RDF/XML document is located at this URL: http://www.china.org/geography/rivers. Thus, the complete URI for this resource is: http://www.china.org/geography/rivers#Yangtze
xml:base • By default, the URL of the document provides the base URI. • Relying on the location of the document is brittle: the identifiers break if the document is moved or copied to another location. • A more robust solution is to specify the base URI in the document, e.g.,
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#"
       xml:base="http://www.china.org/geography/rivers">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
Resource URI = concatenation(xml:base, '#', rdf:ID)
             = concatenation(http://www.china.org/geography/rivers, '#', "Yangtze")
             = http://www.china.org/geography/rivers#Yangtze
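The concatenation rule can be sketched as follows. This is a simplified illustration, not a full RDF/XML base-resolution algorithm: `resource_uri` and the mirror URL are hypothetical, and the function only looks for xml:base on the element itself before falling back to the document's URL.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix

TEMPLATE = """<River rdf:ID="Yangtze" {base}
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.geodesy.org/river#"/>"""
WITH_BASE = TEMPLATE.format(base='xml:base="http://www.china.org/geography/rivers"')
WITHOUT_BASE = TEMPLATE.format(base="")


def resource_uri(xml_text, document_url):
    """Concatenate the base URI, '#', and the rdf:ID value."""
    element = ET.fromstring(xml_text)
    # Use xml:base when present; otherwise fall back to the document's URL.
    base = element.get("{%s}base" % XML_NS, document_url)
    return base + "#" + element.get("{%s}ID" % RDF_NS)


MIRROR = "http://mirror.example.org/copy"  # hypothetical location of a moved copy
moved_without_base = resource_uri(WITHOUT_BASE, MIRROR)
moved_with_base = resource_uri(WITH_BASE, MIRROR)
print(moved_without_base)
print(moved_with_base)
```

The copy without xml:base silently gets a new identifier at its new location, while the copy with xml:base keeps http://www.china.org/geography/rivers#Yangtze.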
rdf:about • Instead of identifying a resource with a relative URI (which then requires a base URI to be prepended), we can give the complete identity of a resource. However, we use rdf:about, rather than rdf:ID, e.g., <?xml version="1.0"?> <River rdf:about="http://www.china.org/geography/rivers#Yangtze" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://www.geodesy.org/river#"> <length>6300 kilometers</length> <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation> <endingLocation>East China Sea</endingLocation> </River>
rdf:Description + rdf:type • There is another way of representing the XML. This way makes it very clear that you are describing something, and it makes it very clear what the type (class) is of the thing you are describing: <?xml version="1.0"?> <rdf:Description rdf:about="http://www.china.org/geography/rivers#Yangtze" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://www.geodesy.org/river#"> <rdf:type rdf:resource="http://www.geodesy.org/river#River"/> <length>6300 kilometers</length> <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation> <endingLocation>East China Sea</endingLocation> </rdf:Description>
RDF Triple Model • RDF “statements” consist of resources (= nodes), which have properties, which have values (= nodes, strings) • resource = subject, property = predicate, value = object • “http://www.china.org/geography/rivers#Yangtze has a http://www.geodesy.org/river#length of 6300 kilometers” • subject: http://www.china.org/geography/rivers#Yangtze • predicate: http://www.geodesy.org/river#length • object: “6300 kilometers”
RDF Graph Model • [Figure: graph with the node http://www.china.org/geography/rivers#Yangtze and three outgoing arcs] • http://www.geodesy.org/river#length → “6300 kilometers” • http://www.geodesy.org/river#startingLocation → “western China's Qinghai-Tibet Plateau” • http://www.geodesy.org/river#endingLocation → “East China Sea”
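The triple and graph views can be reproduced from the RDF/XML with a few lines of Python. This is a minimal sketch that only handles the simple pattern used in these slides (one typed node with rdf:about and literal-valued child properties); `iri` and `to_triples` are hypothetical helper names, and a real RDF parser covers many more cases.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RIVER_RDF = """<River rdf:about="http://www.china.org/geography/rivers#Yangtze"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>"""


def iri(tag):
    """'{namespace}local' -> namespace + local."""
    return tag[1:].replace("}", "", 1)


def to_triples(xml_text):
    """Return (subject, predicate, object) tuples for one typed node."""
    node = ET.fromstring(xml_text)
    subject = node.get("{%s}about" % RDF_NS)
    # The element name of the node becomes an rdf:type statement.
    triples = [(subject, RDF_NS + "type", iri(node.tag))]
    # Each child element is a property whose text is a literal value.
    triples += [(subject, iri(prop.tag), prop.text) for prop in node]
    return triples


graph = to_triples(RIVER_RDF)
for triple in graph:
    print(triple)
```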
Naming Convention • The convention is to start a type (class) name with a capital letter and a property name with a lowercase letter. • This helps the eye quickly discern the striping pattern. • In the example, River (the type) is uppercase; length, startingLocation, and endingLocation (the properties) are lowercase.
<?xml version="1.0"?>
<River rdf:about="http://www.china.org/geography/rivers#Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
Complex Values • RDF/XML can also represent graphs that include nodes without IRI references, i.e., blank nodes. • Syntactically, values can be embedded (i.e., lexically in-line) or referenced (linked). • [Figure: http://www.china.org/geography/rivers#Yangtze has a …:location arc to a blank node; the blank node has a …:starting arc to “western China's Qinghai-Tibet Plateau” and an …:ending arc to “East China Sea”]
Complex Values (RDF code)
<?xml version="1.0"?>
<!-- rdf:RDF wraps the descriptions so the document has a single root element
     and the namespace declarations are in scope for all of them. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.geodesy.org/river#">
  <rdf:Description rdf:about="http://www.china.org/geography/rivers#Yangtze">
    <rdf:type rdf:resource="http://www.geodesy.org/river#River"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.china.org/geography/rivers#Yangtze">
    <location rdf:nodeID="abc"/>
  </rdf:Description>
  <!-- The blank node: it has a document-local label but no IRI. -->
  <rdf:Description rdf:nodeID="abc">
    <starting>western China's Qinghai-Tibet Plateau</starting>
    <ending>East China Sea</ending>
  </rdf:Description>
</rdf:RDF>
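As a rough illustration of what the blank node buys, the graph above can be written as triples and read back as a nested value. The triples are typed in by hand rather than parsed, writing the blank node as the string "_:abc" is an assumption borrowed from the N-Triples notation, and `describe` is a hypothetical helper.

```python
YANGTZE = "http://www.china.org/geography/rivers#Yangtze"
RIVER_NS = "http://www.geodesy.org/river#"

# The blank node has a local label ("_:abc") but no IRI of its own.
triples = [
    (YANGTZE, RIVER_NS + "location", "_:abc"),
    ("_:abc", RIVER_NS + "starting", "western China's Qinghai-Tibet Plateau"),
    ("_:abc", RIVER_NS + "ending", "East China Sea"),
]


def describe(subject, graph):
    """Collect a subject's properties, following blank nodes into nested dicts."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description[p] = describe(o, graph) if o.startswith("_:") else o
    return description


yangtze = describe(YANGTZE, triples)
print(yangtze)
```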
rdf:ID versus rdf:about • When should rdf:ID be used? When should rdf:about be used? • When you want to introduce a resource, and provide an initial set of information about a resource use rdf:ID • When you want to extend the information about a resource use rdf:about • The RDF philosophy is akin to the Web philosophy. That is, anyone, anywhere, anytime can provide information about a resource.
RDF Description • [Figure: graph of Resource 1, Resource 2, and Resource 3 connected by PropertyType1 through PropertyType4, with two “Atomic Value” leaf nodes]
RDF Parser • The W3C Web site offers an RDF validation Web service that tells you whether your XML is in the proper RDF format: http://www.w3.org/RDF/Validator/
Notes on using the RDF Format • Constrained: the RDF format constrains how you design your XML (i.e., you can't design your XML in any arbitrary fashion). • RDF uses namespaces to uniquely identify types (classes), properties, and resources. Thus, you must have a solid understanding of namespaces. • Another XML vocabulary to learn: to use the RDF format you must learn the RDF vocabulary.
Two Main Areas of RDF • RDF Syntax • RDF Schema • [Figure: diagram with the labels RDF Schema, RDF Syntax, RDF, XML]
RDF Schema (RDFS) • Defines a small vocabulary for RDF: • Class, subClassOf, type • Property, subPropertyOf • domain, range • The vocabulary can be used to define other vocabularies for your application domain • The benefit of an RDF Schema is that it facilitates inferences on your data and enhanced searching. • [Figure: example graph with the classes Person, Student, and Researcher (linked by subClassOf), the property hasSupervisor (with domain and range arcs), and the instances Frank and Jeen (linked by type and hasSupervisor arcs)]
[Figure: an Inference Engine combines a class hierarchy with Yangtze.rdf. Classes shown: NaturallyOccurringWaterSource, BodyOfWater, Stream, Brook, Rivulet, Tributary, River, Ocean, Lake, Sea] • Properties: length: Literal; emptiesInto: BodyOfWater
Yangtze.rdf:
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/water/naturally-occurring#">
  <length>6300 kilometers</length>
  <emptiesInto rdf:resource="http://www.china.org/geography#EastChinaSea"/>
</River>
Inferences: • Yangtze is a Stream • Yangtze is a NaturallyOccurringWaterSource • http://www.china.org/geography#EastChinaSea is a BodyOfWater
[Figure: a Search Engine combines the same class hierarchy with Yangtze.rdf. Classes shown: NaturallyOccurringWaterSource, BodyOfWater, Stream, Brook, Rivulet, Tributary, River, Ocean, Lake, Sea] • Properties: length: Literal; emptiesInto: BodyOfWater • Query: "Show me all documents that contain info about Streams"
Yangtze.rdf:
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/water/naturally-occurring#">
  <length>6300 kilometers</length>
  <emptiesInto rdf:resource="http://www.china.org/geography#EastChinaSea"/>
</River>
Results: • Yangtze is a Stream, so this document is relevant to the query.
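The inference and search steps on the two slides above can be sketched in a few lines. This is a toy illustration, not an RDFS reasoner: class and property names are shortened to local names, each class is assumed to have at most one superclass, and only the subClassOf and range statements that the slides spell out are encoded.

```python
# Schema statements stated on the slides (local names only, for brevity).
SUBCLASS_OF = {"River": "Stream", "Stream": "NaturallyOccurringWaterSource"}
RANGE = {"emptiesInto": "BodyOfWater"}


def superclasses(cls):
    """Follow subClassOf upward and return every class reached."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def infer(subject, declared_class, property_values):
    """Return {resource: [classes]} entailed by the schema for one description."""
    types = {subject: [declared_class] + superclasses(declared_class)}
    for prop, value in property_values.items():
        if prop in RANGE:  # the value must be an instance of the range class
            types.setdefault(value, []).append(RANGE[prop])
    return types


types = infer("Yangtze", "River", {"emptiesInto": "EastChinaSea"})
print(types)

# "Show me everything about Streams": match on inferred types, not on the tag name.
stream_hits = [resource for resource, classes in types.items() if "Stream" in classes]
print(stream_hits)
```

A keyword search for "Stream" would miss Yangtze.rdf because the word never appears in it; the match only exists after the subClassOf statement has been applied.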
RDF Schema is all about defining taxonomies (class hierarchies)
NaturallyOccurringWaterSource.rdfs (snippet):
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://www.geodesy.org/water/naturally-occurring">
  <rdfs:Class rdf:ID="River">
    <rdfs:subClassOf rdf:resource="#Stream"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Stream">
    <rdfs:subClassOf rdf:resource="#NaturallyOccurringWaterSource"/>
  </rdfs:Class>
  ...
</rdf:RDF>
1. All classes and properties are defined within rdf:RDF. 2. xml:base assigns a namespace to the taxonomy! 3. Defines the River class. 4. Defines the Stream class. 5. Since the Stream class is defined in the same document, we can reference it using a fragment identifier.
This is read as: "I hereby define a River Class. River is a subClassOf Stream." "I hereby define a Stream Class. Stream is a subClassOf NaturallyOccurringWaterSource." ...
rdfs:Class • This type is used to define a class. • The rdf:ID provides a name for the class. • The contents are used to indicate the members of the class. • The contents are ANDed together.
<rdfs:Class rdf:ID="River">
  <rdfs:subClassOf rdf:resource="#Stream"/>
</rdfs:Class>
Here rdf:ID="River" is the name of the class, and the statements inside the element are ANDed.
rdfs:subClassOf • [Figure: two sets labeled Stream and River] • Stream: this represents the set of Streams, i.e., the set of instances of type Stream. • River: this represents the set of Rivers, i.e., the set of instances of type River.
Multiple rdfs:subClassOf Properties
<rdfs:Class rdf:ID="River">
  <rdfs:subClassOf rdf:resource="#Stream"/>
  <rdfs:subClassOf rdf:resource="http://www.containers.org#SedimentContainer"/>
</rdfs:Class>
[Figure: the sets SedimentContainer and Stream, with River inside their overlap] • A River is both a Stream and a SedimentContainer. • The conjunction (AND) of two subClassOf statements says that the class is a subset of the intersection of the two classes.
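Read as set theory, the two subClassOf statements say that River ⊆ Stream ∩ SedimentContainer. A tiny sketch with Python sets makes the check concrete; every instance name other than Yangtze is invented for the example, and `satisfies_subclass_axioms` is a hypothetical helper.

```python
# Hypothetical class extensions (sets of instances); only Yangtze is from the slides.
streams = {"Yangtze", "Nile", "MillBrook"}
sediment_containers = {"Yangtze", "Nile", "SettlingTank"}


def satisfies_subclass_axioms(subclass, *superclasses):
    """True if the subclass lies inside the intersection of all its superclasses."""
    return subclass <= set.intersection(*superclasses)


rivers = {"Yangtze"}
ok = satisfies_subclass_axioms(rivers, streams, sediment_containers)

# MillBrook is a Stream but not a SedimentContainer, so it cannot be a River.
bad_rivers = {"Yangtze", "MillBrook"}
violated = not satisfies_subclass_axioms(bad_rivers, streams, sediment_containers)
print(ok, violated)
```

Note that the statements only bound River from above: Nile is in the intersection but is not thereby forced to be a River.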
rdf:Property • This type is used to define a property. • The rdf:ID provides a name for the property. • The contents are used to indicate the usage of the property. • The contents are ANDed together.
<rdf:Property rdf:ID="emptiesInto">
  <rdfs:domain rdf:resource="#River"/>
  <rdfs:range rdf:resource="#BodyOfWater"/>
</rdf:Property>
Here rdf:ID="emptiesInto" is the name of the property, and the statements inside the element are ANDed.
Example of multiple rdfs:range
<rdf:Property rdf:ID="emptiesInto">
  <rdfs:domain rdf:resource="#River"/>
  <rdfs:range rdf:resource="#BodyOfWater"/>
  <rdfs:range rdf:resource="http://www.geodesy.org/coast#CoastalWater"/>
</rdf:Property>
[Figure: the sets CoastalWater and BodyOfWater, with the range marked] • The value of emptiesInto is a BodyOfWater and a CoastalWater.
Example of multiple rdfs:domain
<rdf:Property rdf:ID="emptiesInto">
  <rdfs:domain rdf:resource="#River"/>
  <rdfs:domain rdf:resource="http://www.containers.org#Vessel"/>
  <rdfs:range rdf:resource="#BodyOfWater"/>
</rdf:Property>
[Figure: the sets River and Vessel, with the domain marked] • emptiesInto is to be used in instances that are of type River and Vessel.
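A short sketch of what "ANDed" means for domain and range: a single use of the property entails every listed domain class for the subject and every listed range class for the value. The slides show multiple ranges and multiple domains in two separate examples; combining them in one declaration here is an assumption made for brevity, and `entailed_types` is a hypothetical helper using local names only.

```python
# Assumed combined declaration: both domains and both ranges from the two slides.
DOMAINS = {"emptiesInto": {"River", "Vessel"}}
RANGES = {"emptiesInto": {"BodyOfWater", "CoastalWater"}}


def entailed_types(subject, prop, value):
    """Classes entailed for the subject and the value by one use of the property."""
    types = {}
    # Every domain class applies to the subject (the statements are ANDed).
    types.setdefault(subject, set()).update(DOMAINS.get(prop, set()))
    # Every range class applies to the value.
    types.setdefault(value, set()).update(RANGES.get(prop, set()))
    return types


types = entailed_types("Yangtze", "emptiesInto", "EastChinaSea")
print(types)
```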
Class and Property: different namespaces • Class is in the rdfs namespace. • Property is in the rdf namespace.
Properties are defined separately from classes • RDF Schema approach is to define a class, and then separately define properties and state that they are to be used with the class. • The advantage of this approach is that anyone, anywhere, anytime can create a property and state that it is usable with the class!
Problems with RDF Schema • Cannot express equivalent classes • Cannot express cardinality constraints • More … • No precisely described meaning • No inference model
Beyond RDF: OIL & DAML • OIL (Ontology Inference Layer) extends RDF Schema to a fully fledged knowledge representation language. • logical expressions • data-typing • cardinality • quantifiers • http://www.ontoknowledge.org • DAML (DARPA Agent Markup Language) = US sister of OIL • Merged as DAML+OIL in 2001 • Became OWL, a W3C Recommendation, on February 10, 2004
DARPA’s DAML / W3C’s OWL Language • [Figure: lineage diagram. Web Languages: XML, RDF/S. Formal Foundations: Description Logics (FACT, CLASSIC, DLP, …), Frame Systems. Ontology languages: DAML-ONT, OIL, DAML+OIL (OWL)]
OWL: Web Ontology Language
OWL cannot be a simple semantic extension of RDF/S • Relationship between layers • Syntactically: no restriction • Semantically: preserve meanings • Russell’s paradox • A very large collection of built-in sets • These built-in sets include the set consisting of those sets that do not contain themselves • Is this set a member of itself? • Yes? Then it contains itself, so no • No? Then it does not contain itself, so yes • This violates the very principle of set theory: set membership should be a well-defined relationship
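The paradox can be made tangible with a small sketch. It assumes, purely for illustration, that a set is modeled by its membership test (a function that answers "is x a member?"); under that model the question "is Russell's set a member of itself?" never settles on an answer.

```python
def russell(s):
    """Membership test for 'the set of all sets that do not contain themselves'."""
    return not s(s)


try:
    # Is the set a member of itself? Answering requires the answer itself,
    # so the evaluation never bottoms out.
    answer = russell(russell)
except RecursionError:
    answer = None  # neither True nor False: membership is not well defined

print(answer)
```

Python gives up with a RecursionError instead of returning True or False, which mirrors the slide's point that membership in such a set is not a well-defined relationship.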
OWL cannot be a simple semantic extension of RDF/S • If OWL is layered on top of RDF/S as a same-syntax extension • There has to be a large collection of built-in classes in any model • This is required to make the logical foundations of classes in the extension work correctly • This collection includes the class defined as those resources that do not belong to themselves • Russell’s paradox • RDF/S does not fall into this paradox because it does not need a large collection of built-in classes • The RDFS theory of classes and properties is very weak • It is not possible to give a class a defining formula or to determine which resources belong to it • OWL is designed to allow for defined classes and more relationships between classes • This richer theory clashes with the underlying principle of RDF/S
OWL Extends RDF • RDF Schema: Class, subclass; Property, subproperty • + Restrictions: Range, domain; Local, global; Existential; Cardinality • + Combinators: Union, Intersection; Complement; Symmetric, transitive • + Mapping: Equivalence; Inverse