640 likes | 651 Views
This introduction provides an overview of XML, RDF, OWL, and rules in the context of the Semantic Web. It covers topics like XML syntax, DTDs, XML Schema, and namespaces.
E N D
Introduction to Semantic WebDr. Bhavani Thuraisingham The University of Texas at DallasOctober 2010
Outline • XML • RDF • OWL • RULES • Reference: G. Antoniou and F. vanHarmelen, A Semantic Web Primer, MIT Press, 2004 (second edition, 2008)
XML The XML Language An XML document consists of • a prolog • a number of elements • an optional epilog The prolog consists of • an XML declaration and • an optional reference to external structuring documents <?xml version="1.0" encoding="UTF-16"?> <!DOCTYPE book SYSTEM "book.dtd">
XML Elements • Content may be text, or other elements, or nothing <lecturer> <name>David Billington</name> <phone> +61 − 7 − 3875 507 </phone> </lecturer> • If there is no content, then the element is called empty; it is abbreviated as follows: <lecturer/> for <lecturer></lecturer>
XML Attributes • An empty element is not necessarily meaningless • It may have some properties in terms of attributes • An attribute is a name-value pair inside the opening tag of an element <lecturer name="David Billington" phone="+61 − 7 − 3875 507"/>
Well-Formed XML Documents • Syntactically correct documents • Some syntactic rules: • Only one outermost element (called root element) • Each element contains an opening and a corresponding closing tag • Tags may not overlap • <author><name>Lee Hong</author></name> • Attributes within an element have unique names • Element and tag names must be permissible • An XML document is valid if • it is well-formed • respects the structuring information it uses • There are two ways of defining the structure of XML documents: • DTDs (the older and more restricted way) • XML Schema (offers extended possibilities)
The Tree Model of XML Documents: An Example <email> <head> <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/> <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/> <subject>Where is your draft?</subject> </head> <body> Grigoris, where is the draft of the paper you promised me last week? </body> </email>
DTD: Element Type Definition <lecturer> <name>David Billington</name> <phone> +61 − 7 − 3875 507 </phone> </lecturer> DTD for above element (and all lecturer elements): <!ELEMENT lecturer (name,phone)> <!ELEMENT name (#PCDATA)> <!ELEMENT phone (#PCDATA)> • The element types lecturer, name, and phone may be used in the document • A lecturer element contains a name element and a phone element, in that order (sequence) • A name element and a phone element may have any content • In DTDs, #PCDATA is the only atomic type for elements
XML Schema • Significantly richer language for defining the structure of XML documents • Tts syntax is based on XML itself • not necessary to write separate tools • Reuse and refinement of schemas • Expand or delete already existent schemas • Sophisticated set of data types, compared to DTDs(which only supports strings) • An XML schema is an element with an opening tag like <schema "http://www.w3.org/2000/10/XMLSchema" version="1.0"> • Structure of schema elements • Element and attribute typesusing data types
Data Types • There is a variety of built-in data types • Numerical data types: integer, Short etc. • String types: string, ID, IDREF, CDATA etc. • Date and time data types: time, Month etc. • There are also user-defined data types • simple data types, which cannot use elements or attributes • complex data types, which can use these • Complex data types are defined from already existing data types by defining some attributes (if any) and using: • sequence, a sequence of existing data type elements (order is important) • all, a collection of elements that must appear (order is not important) • choice, a collection of elements, of which one will be chosen
A Data Type Example <complexType name="lecturerType"> <sequence> <element name="firstname" type="string" minOccurs="0“ maxOccurs="unbounded"/> <element name="lastname" type="string"/> </sequence> <attribute name="title" type="string" use="optional"/> </complexType>
XML Schema: The Email Example <element name="email" type="emailType"/> <complexType name="emailType"> <sequence> <element name="head" type="headType"/> <element name="body" type="bodyType"/> </sequence> </complexType>
Namespaces • An XML document may use more than one DTD or schema • Since each structuring document was developed independently, name clashes may appear • The solution is to use a different prefix for each DTD or schema • prefix:name
An Example <vu:instructors xmlns:vu="http://www.vu.com/empDTD" xmlns:gu="http://www.gu.au/empDTD" xmlns:uky="http://www.uky.edu/empDTD"> <uky:faculty uky:title="assistant professor" uky:name="John Smith" uky:department="Computer Science"/> <gu:academicStaff gu:title="lecturer" gu:name="Mate Jones" gu:school="Information Technology"/> </vu:instructors>
Addressing and Querying XML Documents: XPATH • In relational databases, parts of a database can be selected and retrieved using SQL • Same necessary for XML documents • Query languages: XQuery, XQL, XML-QL • The central concept of XML query languages is a path expression • Specifies how a node or a set of nodes, in the tree representation of the XML document can be reached • XPath is core for XML query languages • Language for addressing parts of an XML document. • It operates on the tree data model of XML • It has a non-XML syntax
XSL Transformations (XSLT) • XSLT specifies rules with which an input XML document is transformed to • another XML document, an HTML document , plain text • The output document may use the same DTD or schema, or a completely different vocabulary • XSLT can be used independently of the formatting language <author> <name>Grigoris Antoniou</name> <affiliation>University of Bremen</affiliation> <email>ga@tzi.de</email> </author> may be displayed in different ways: • Grigoris Antoniou Grigoris Antoniou • University of Bremen University of Bremen • ga@tzi.dega@tzi.de
Summary • XML is a metalanguage that allows users to define markup • XML separates content and structure from formatting • XML is the de facto standard for the representation and exchange of structured information on the Web • XML is supported by query languages
RDF Drawbacks of XML • XML is a universal metalanguage for defining markup • It provides a uniform frramework for interchange of data and metadata between applications • However, XML does not provide any means of talking about the semantics (meaning) of data • E.g., there is no intended meaning associated with the nesting of tags • It is up to each application to interpret the nesting.
Basic Ideas of RDF • Basic building block: object-attribute-value triple • It is called a statement • Sentence about Billingtonis such a statement • RDF has been given a syntax in XML • This syntax inherits the benefits of XML • Other syntactic representations of RDF possible • The fundamental concepts of RDF are: • resources • properties • statements
Resources • We can think of a resource as an object, a “thing” we want to talk about • E.g. authors, books, publishers, places, people, hotels • Every resource has a URI, a Universal Resource Identifier • A URI can be • a URL (Web address) or • some other kind of unique identifier
Properties • Properties are a special kind of resources • They describe relations between resources • E.g. “written by”, “age”, “title”, etc. • Properties are also identified by URIs • Advantages of using URIs: • Α global, worldwide, unique naming scheme • Reduces the homonym problem of distributed data representation
Statements • Statements assert the properties of resources • A statement is an object-attribute-value triple • It consists of a resource, a property, and a value • Values can be resources or literals • Literals are atomic values (strings) • Three views of a Statement • A triple • A piece of a graph • A piece of XML code Thus an RDF document can be viewed as: • A set of triples • A graph (semantic net) • An XML document
Statements as Triples (http://www.cit.gu.edu.au/~db, http://www.mydomain.org/site-owner, #David Billington) • The triple (x,P,y) can be considered as a logical formula P(x,y) • Binary predicate P relates object x to object y • RDF offers only binary predicates (properties) A Set of Triples as a Semantic Net
RDF Statements in XML • An RDF document is represented by an XML element with the tag rdf:RDF • The content of this element is a number of descriptions, which use rdf:Description tags. • Every description makes a statement about a resource, identified in 3 ways: • an about attribute, referencing an existing resource • an ID attribute, creating a new resource • without a name, creating an anonymous resource <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:mydomain="http://www.mydomain.org/my-rdf-ns"> <rdf:Description rdf:about="http://www.cit.gu.edu.au/~db"> <mydomain:site-owner rdf:resource=“#David Billington“/> </rdf:Description> </rdf:RDF>
Reification • In RDF it is possible to make statements about statements • Grigoris believes that David Billington is the creator of http://www.cit.gu.edu.au/~db • Such statements can be used to describe belief or trust in other statements • The solution is to assign a unique identifier to each statement • It can be used to refer to the statement • Introduce an auxiliary object (e.g. belief1) • relate it to each of the 3 parts of the original statement through the properties subject, predicate and object • In the preceding example • subject of belief1 is David Billington • predicate of belief1 is creator • object of belief1 is http://www.cit.gu.edu.au/~db
The rdf:resource Attribute <rdf:Description rdf:about="CIT1111"> <uni:courseName>Discrete Mathematics</uni:courseName> <uni:isTaughtBy rdf:resource="949318"/> </rdf:Description> <rdf:Description rdf:about="949318"> <uni:name>David Billington</uni:name> <uni:title>Associate Professor</uni:title> </rdf:Description>
Container Elements • Collect a number of resources or attributes about which we want to make statements as a whole • E.g., we may wish to talk about the courses given by a particular lecturer • The content of container elements are named rdf:_1, rdf:_2, etc. • Alternatively rdf:li • Three Types of Container Elements • rdf:Bag an unordered container, allowing multiple occurrences • E.g. members of the faculty board, documents in a folder • rdf:Seq an ordered container, which may contain multiple occurrences • E.g. modules of a course, items on an agenda, an alphabetized list of staff members (order is imposed) • rdf:Alt a set of alternatives • E.g. the document home and mirrors, translations of a document in various languages
A Semantic Web Primer\ Example for a Bag <uni:lecturer rdf:ID="949352" uni:name="Grigoris Antoniou" uni:title="Professor"> <uni:coursesTaught> <rdf:Bag> <rdf:_1 rdf:resource="#CIT1112"/> <rdf:_2 rdf:resource="#CIT3116"/> </rdf:Bag> </uni:coursesTaught> </uni:lecturer>
A Semantic Web Primer Example for Alternative <uni:course rdf:ID="CIT1111" uni:courseName="Discrete Mathematics"> <uni:lecturer> <rdf:Alt> <rdf:li rdf:resource="#949352"/> <rdf:li rdf:resource="#949318"/> </rdf:Alt> </uni:lecturer> </uni:course>
RDF Collections • A limitation of these containers is that there is no way to close them • “these are all the members of the container” • RDF provides support for describing groups containing only the specified members, in the form of RDF collections • list structure in the RDF graph • constructed using a predefined collection vocabulary: rdf:List, rdf:first, rdf:rest and rdf:nil • Shorthand syntax: • "Collection" value for the rdf:parseType attribute: <rdf:Description rdf:about="#CIT2112"> <uni:isTaughtBy rdf:parseType="Collection"> <rdf:Description rdf:about="#949111"/> <rdf:Description rdf:about="#949352"/> <rdf:Description rdf:about="#949318"/> </uni:isTaughtBy> </rdf:Description>
Basic Ideas of RDF Schema • RDF is a universal language that lets users describe resources in their own vocabularies • RDF does not assume, nor does it define semantics of any particular application domain • The user can do so in RDF Schema using: • Classes and Properties • Class Hierarchies and Inheritance • Property Hierarchies
A Semantic Web Primer Classes and their Instances • We must distinguish between • Concrete “things” (individual objects) in the domain: Discrete Maths, David Billington etc. • Sets of individuals sharing properties called classes: lecturers, students, courses etc. • Individual objects that belong to a class are referred to as instances of that class • The relationship between instances and classes in RDF is through rdf:type
A Semantic Web Primer Inheritance in Class Hierarchies • Range restriction: Courses must be taught by academic staff members only • Michael Maher is a professor • He inheritsthe ability to teach from the class of academic staff members • This is done in RDF Schema by fixing the semantics of “is a subclass of” • It is not up to an application (RDF processing software) to interpret “is a subclass of
A Semantic Web Primer Property Hierarchies • Hierarchical relationships for properties • E.g., “is taught by” is a subproperty of “involves” • If a course C is taught by an academic staff member A, then C also involves Α • The converse is not necessarily true • E.g., A may be the teacher of the course C, or • a tutor who marks student homework but does not teach C • P is a subproperty of Q, if Q(x,y) is true whenever P(x,y) is true
A Semantic Web Primer RDF Schema in RDF • The modeling primitives of RDF Schema are defined using resources and properties (RDF itself is used!) • To declare that “lecturer” is a subclass of “academic staff member” • Define resources lecturer, academicStaffMember, and subClassOf • define property subClassOf • Write triple (lecturer,subClassOf,academicStaffMember) • We use the XML-based syntax of RDF
Core Classes • rdfs:Resource, the class of all resources • rdfs:Class, the class of all classes • rdfs:Literal, the class of all literals (strings) • rdf:Property, the class of all properties. • rdf:Statement, the class of all reified statements • Example <rdfs:Classrdf:about="#lecturer"> <rdfs:subClassOfrdf:resource="#staffMember"/> </rdfs:Class>
Core Properties • rdf:type, which relates a resource to its class • The resource is declared to be an instance of that class • rdfs:subClassOf, which relates a class to one of its superclasses • All instances of a class are instances of its superclass • rdfs:subPropertyOf, relates a property to one of its superproperties • rdfs:domain, which specifies the domain of a property P • The class of those resources that may appear as subjects in a triple with predicate P • If the domain is not specified, then any resource can be the subject • rdfs:range, which specifies the range of a property P • The class of those resources that may appear as values in a triple with predicate P <rdf:Propertyrdf:ID="phone"> <rdfs:domainrdf:resource="#staffMember"/> <rdfs:rangerdf:resource="http://www.w3.org/ 2000/01/rdf-schema#Literal"/> </rdf:Property>
Semantics based on Inference Rules • Semantics in terms of RDF triples instead of restating RDF in terms of first-order logic with a sound and complete inference systems • This inference system consists of inference rules of the form: IF E contains certain triples THEN add to E certain additional triples where E is an arbitrary set of RDF triples • Examples IF E contains the triple (?x,?p,?y) THEN E also contains (?p,rdf:type,rdf:property) IF E contains the triples (?u,rdfs:subClassOf,?v) and (?v,rdfs:subclassOf,?w) THEN E also contains the triple (?u,rdfs:subClassOf,?w) IF E contains the triples (?x,rdf:type,?u) and (?u,rdfs:subClassOf,?v) THEN E also contains the triple (?x,rdf:type,?v) • Any resource ?y which appears as the value of a property ?p can be inferred to be a member of the range of ?p • This shows that range definitions in RDF Schema are not used to restrict the range of a property, but rather to infer the membership of the range IF E contains the triples (?x,?p,?y) and (?p,rdfs:range,?u) THEN E also contains the triple (?y,rdf:type,?u)
SPARQL RDF Query Language • SPARQL is based on matching graph patterns • The simplest graph pattern is the triple pattern : • like an RDF triple, but with the possibility of a variable instead of an RDF term in the subject, predicate, or object positions • Combining triple patterns gives a basic graph pattern, where an exact match to a graph is needed to fulfill a pattern
Summary • RDF provides a foundation for representing and processing metadata • RDF has a graph-based data model • RDF has an XML-based syntax to support syntactic interoperability • XML and RDF complement each other because RDF supports semantic interoperability • RDF has a decentralized philosophy and allows incremental building of knowledge, and its sharing and reuse • RDF is domain-independent - RDF Schema provides a mechanism for describing specific domains • RDF Schema is a primitive ontology language • It offers certain modelling primitives with fixed meaning • Key concepts of RDF Schema are class, subclass relations, property, subproperty relations, and domain and range restrictions • There exist query languages for RDF and RDFS, including SPARQL
OWL Requirements for Ontology Languages • Ontology languages allow users to write explicit, formal conceptualizations of domain models • The main requirements are: • a well-defined syntax • efficient reasoning support • a formal semantics • sufficient expressive power • convenience of expression
Reasoning About Knowledge in Ontology Languages • Class membership • If x is an instance of a class C, and C is a subclass of D, then we can infer that x is an instance of D • Equivalence of classes • If class A is equivalent to class B, and class B is equivalent to class C, then A is equivalent to C, too • Consistency • X instance of classes A and B, but A and B are disjoint • This is an indication of an error in the ontology • Classification • Certain property-value pairs are a sufficient condition for membership in a class A; if an individual x satisfies such conditions, we can conclude that x must be an instance of A • Reasoning Support for OWL • Semantics is a prerequisite for reasoning support • Formal semantics and reasoning support are usually provided by mapping an ontology language to a known logical formalism; using automated reasoners that already exist for those formalisms • OWL is (partially) mapped on a description logic, and makes use of reasoners such as FaCT and RACER • Description logics are a subset of predicate logic for which efficient reasoning support is possible
Reasoning Support for OWL • Semantics is a prerequisite for reasoning support • Formal semantics and reasoning support are usually provided by • mapping an ontology language to a known logical formalism • using automated reasoners that already exist for those formalisms • OWL is (partially) mapped on a description logic, and makes use of reasoners such as FaCT and RACER • Description logics are a subset of predicate logic for which efficient reasoning support is possible
Some Limitations of the Expressive Power of RDF Schema • Local scope of properties • rdfs:range defines the range of a property (e.g. eats) for all classes • In RDF Schema we cannot declare range restrictions that apply to some classes only • E.g. we cannot say that cows eat only plants, while other animals may eat meat, too • Disjointness of classes • Sometimes we wish to say that classes are disjoint (e.g. male and female) • Boolean combinations of classes • Sometimes we wish to build new classes by combining other classes using union, intersection, and complement • E.g. person is the disjoint union of the classes male and female • Cardinality restrictions • E.g. a person has exactly two parents, a course is taught by at least one lecturer • Special characteristics of properties • Transitive property (like “greater than”) • Unique property (like “is mother of”) • A property is the inverse of another property (like “eats” and “is eaten by”)
Three Species of OWL • W3C’sWeb Ontology Working Group defined OWL as three different sublanguages: • OWL Full • OWL DL • OWL Lite • Each sublanguage geared toward fulfilling different aspects of requirements • OWL uses XML and RDF for syntax • OWL is based on Description Logic • Description Logic is a fragment of first-order logic • OWL inherits from Description Logic • The open-world assumption • The non-unique-name assumption
OWL • OWL Full • It uses all the OWL languages primitives and allows the combination of these primitives in arbitrary ways with RDF and RDF Schema • OWL Full is fully upward-compatible with RDF, both syntactically and semantically • OWL Full is so powerful that it is undecidable; No complete (or efficient) reasoning support • OWL DL (Description Logic) • It is a sublanguage of OWL Full that restricts application of the constructors from OWL and RDF • Application of OWL’s constructors’ to each other is disallowed; Therefore it corresponds to a well studied description logic • OWL DL permits efficient reasoning support; But we lose full compatibility with RDF • Not every RDF document is a legal OWL DL document. Every legal OWL DL document is a legal RDF document. • OWL-Lite: • Even further restriction limits OWL DL to a subset of the language constructors • OWL Lite excludes enumerated classes, disjointness statements, and arbitrary cardinality. • The advantage of this is a language that is easier to grasp (for users, for tool builders) The disadvantage is restricted expressivity
owl:Ontology <owl:Ontology rdf:about=""> <rdfs:comment>An example OWL ontology </rdfs:comment> <owl:priorVersion rdf:resource="http://www.mydomain.org/uni-ns-old"/> <owl:imports rdf:resource="http://www.mydomain.org/persons"/> <rdfs:label>University Ontology</rdfs:label> </owl:Ontology> • owl:imports is a transitive property
Classes • Classes are defined using owl:Class • owl:Class is a subclass of rdfs:Class • Disjointness is defined using owl:disjointWith <owl:Class rdf:about="#associateProfessor"> <owl:disjointWith rdf:resource="#professor"/> <owl:disjointWith rdf:resource="#assistantProfessor"/> </owl:Class> • owl:equivalentClass defines equivalence of classes <owl:Class rdf:ID="faculty"> <owl:equivalentClass rdf:resource= "#academicStaffMember"/> </owl:Class> • owl:Thing is the most general class, which contains everything • owl:Nothing is the empty class
Properties • In OWL there are two kinds of properties • Object properties, which relate objects to other objects • E.g. is-TaughtBy, supervises • Data type properties, which relate objects to datatype values • E.g. phone, title, age, etc. • Data Type Properties: OWL makes use of XML Schema data types, using the layered architecture of the SW <owl:DatatypeProperty rdf:ID="age"> <rdfs:range rdf:resource= "http://www.w3.org/2001/XLMSchema #nonNegativeInteger"/> </owl:DatatypeProperty> • User-defined data types: Object Properties <owl:ObjectProperty rdf:ID="isTaughtBy"> <owl:domain rdf:resource="#course"/> <owl:range rdf:resource= "#academicStaffMember"/> <rdfs:subPropertyOf rdf:resource="#involves"/> </owl:ObjectProperty>