680 likes | 690 Views
This introduction provides an overview of the Semantic Web, including its technologies and applications in knowledge management, e-commerce, and more.
E N D
Introduction to Semantic WebDr. Bhavani Thuraisingham The University of Texas at DallasOctober 2012
Outline • Introduction • XML • RDF • OWL • RULES • Reference: G. Antoniou and F. vanHarmelen, A Semantic Web Primer, MIT Press, 2004 (second edition, 2008)
From Today’s Web to Semantic web • Today’s web • High recall, low precision: Too many web pages resulting in searches, many not relevant • Sometimes low recall • Results sensitive to vocabulary: Different words even if they mean the same thing do not results in same web pages • Results are single web pages not linked web pages • Semantic web • Machine understandable web pages • Activities on the web such as searching with little or no human intervention • Technologies for knowledge management, e-commerce, interoperability • Solutions to the problems faced by today’s web
Applications • Knowledge Management • Corporation Need: Searching, extracting and maintaining information, uncovering hidden dependencies, viewing information • Semantic web for knowledge management: Organizing knowledge, automated tools for maintaining knowledge, question answering, querying multiple documents, controlling access to documents • Personal Agent • John is a president of a company. He needs to have a surgery for a serious but not a critical illness. With current web he has to check each web page for relevant information, make decisions depending on the information provided • With the semantic web, the agent will retrieve all the relevant information, synthesize the information, ask John if needed, and then present the various options and makes recommendations
Applications: E-Commerce • Business to Consumer • Users shopping on the web; wrapper technology is used to extract information about user preferences etc. and display the products to the user • Use of semantic web: Develop software agents that can interpret privacy requirements, pricing and product information and display timely and correct information to the use; also provides information about the reputation of shops • Business to Business • Organizations work together and carrying out transactions such as collaborating on a product, supply chains etc. With today’s web lack of standards for data exchange • Use of semantic web: XML is a big improvement, but need to agree on vocabulary. Future will be the use of ontologies to agree on meanings and interpretations
XML The XML Language An XML document consists of • a prolog • a number of elements • an optional epilog The prolog consists of • an XML declaration and • an optional reference to external structuring documents <?xml version="1.0" encoding="UTF-16"?> <!DOCTYPE book SYSTEM "book.dtd">
XML Elements • Content may be text, or other elements, or nothing <lecturer> <name>David Billington</name> <phone> +61 − 7 − 3875 507 </phone> </lecturer> • If there is no content, then the element is called empty; it is abbreviated as follows: <lecturer/> for <lecturer></lecturer>
XML Attributes • An empty element is not necessarily meaningless • It may have some properties in terms of attributes • An attribute is a name-value pair inside the opening tag of an element <lecturer name="David Billington" phone="+61 − 7 − 3875 507"/>
Well-Formed XML Documents • Syntactically correct documents • Some syntactic rules: • Only one outermost element (called root element) • Each element contains an opening and a corresponding closing tag • Tags may not overlap • <author><name>Lee Hong</author></name> • Attributes within an element have unique names • Element and tag names must be permissible • An XML document is valid if • it is well-formed • respects the structuring information it uses • There are two ways of defining the structure of XML documents: • DTDs (the older and more restricted way) • XML Schema (offers extended possibilities)
The Tree Model of XML Documents: An Example <email> <head> <from name="Michael Maher" address="michaelmaher@cs.gu.edu.au"/> <to name="Grigoris Antoniou" address="grigoris@cs.unibremen.de"/> <subject>Where is your draft?</subject> </head> <body> Grigoris, where is the draft of the paper you promised me last week? </body> </email>
DTD: Element Type Definition <lecturer> <name>David Billington</name> <phone> +61 − 7 − 3875 507 </phone> </lecturer> DTD for above element (and all lecturer elements): <!ELEMENT lecturer (name,phone)> <!ELEMENT name (#PCDATA)> <!ELEMENT phone (#PCDATA)> • The element types lecturer, name, and phone may be used in the document • A lecturer element contains a name element and a phone element, in that order (sequence) • A name element and a phone element may have any content • In DTDs, #PCDATA is the only atomic type for elements
XML Schema • Significantly richer language for defining the structure of XML documents • Tts syntax is based on XML itself • not necessary to write separate tools • Reuse and refinement of schemas • Expand or delete already existent schemas • Sophisticated set of data types, compared to DTDs(which only supports strings) • An XML schema is an element with an opening tag like <schema "http://www.w3.org/2000/10/XMLSchema" version="1.0"> • Structure of schema elements • Element and attribute typesusing data types
Data Types • There is a variety of built-in data types • Numerical data types: integer, Short etc. • String types: string, ID, IDREF, CDATA etc. • Date and time data types: time, Month etc. • There are also user-defined data types • simple data types, which cannot use elements or attributes • complex data types, which can use these • Complex data types are defined from already existing data types by defining some attributes (if any) and using: • sequence, a sequence of existing data type elements (order is important) • all, a collection of elements that must appear (order is not important) • choice, a collection of elements, of which one will be chosen
A Data Type Example <complexType name="lecturerType"> <sequence> <element name="firstname" type="string" minOccurs="0“ maxOccurs="unbounded"/> <element name="lastname" type="string"/> </sequence> <attribute name="title" type="string" use="optional"/> </complexType>
XML Schema: The Email Example <element name="email" type="emailType"/> <complexType name="emailType"> <sequence> <element name="head" type="headType"/> <element name="body" type="bodyType"/> </sequence> </complexType>
Namespaces • An XML document may use more than one DTD or schema • Since each structuring document was developed independently, name clashes may appear • The solution is to use a different prefix for each DTD or schema • prefix:name
An Example <vu:instructors xmlns:vu="http://www.vu.com/empDTD" xmlns:gu="http://www.gu.au/empDTD" xmlns:uky="http://www.uky.edu/empDTD"> <uky:faculty uky:title="assistant professor" uky:name="John Smith" uky:department="Computer Science"/> <gu:academicStaff gu:title="lecturer" gu:name="Mate Jones" gu:school="Information Technology"/> </vu:instructors>
Addressing and Querying XML Documents: XPATH • In relational databases, parts of a database can be selected and retrieved using SQL • Same necessary for XML documents • Query languages: XQuery, XQL, XML-QL • The central concept of XML query languages is a path expression • Specifies how a node or a set of nodes, in the tree representation of the XML document can be reached • XPath is core for XML query languages • Language for addressing parts of an XML document. • It operates on the tree data model of XML • It has a non-XML syntax
XSL Transformations (XSLT) • XSLT specifies rules with which an input XML document is transformed to • another XML document, an HTML document , plain text • The output document may use the same DTD or schema, or a completely different vocabulary • XSLT can be used independently of the formatting language <author> <name>Grigoris Antoniou</name> <affiliation>University of Bremen</affiliation> <email>ga@tzi.de</email> </author> may be displayed in different ways: • Grigoris Antoniou Grigoris Antoniou • University of Bremen University of Bremen • ga@tzi.dega@tzi.de
Summary • XML is a metalanguage that allows users to define markup • XML separates content and structure from formatting • XML is the de facto standard for the representation and exchange of structured information on the Web • XML is supported by query languages
RDF Drawbacks of XML • XML is a universal metalanguage for defining markup • It provides a uniform frramework for interchange of data and metadata between applications • However, XML does not provide any means of talking about the semantics (meaning) of data • E.g., there is no intended meaning associated with the nesting of tags • It is up to each application to interpret the nesting.
Basic Ideas of RDF • Basic building block: object-attribute-value triple • It is called a statement • Sentence about Billingtonis such a statement • RDF has been given a syntax in XML • This syntax inherits the benefits of XML • Other syntactic representations of RDF possible • The fundamental concepts of RDF are: • resources • properties • statements
Resources • We can think of a resource as an object, a “thing” we want to talk about • E.g. authors, books, publishers, places, people, hotels • Every resource has a URI, a Universal Resource Identifier • A URI can be • a URL (Web address) or • some other kind of unique identifier
Properties • Properties are a special kind of resources • They describe relations between resources • E.g. “written by”, “age”, “title”, etc. • Properties are also identified by URIs • Advantages of using URIs: • Α global, worldwide, unique naming scheme • Reduces the homonym problem of distributed data representation
Statements • Statements assert the properties of resources • A statement is an object-attribute-value triple • It consists of a resource, a property, and a value • Values can be resources or literals • Literals are atomic values (strings) • Three views of a Statement • A triple • A piece of a graph • A piece of XML code Thus an RDF document can be viewed as: • A set of triples • A graph (semantic net) • An XML document
Statements as Triples (http://www.cit.gu.edu.au/~db, http://www.mydomain.org/site-owner, #David Billington) • The triple (x,P,y) can be considered as a logical formula P(x,y) • Binary predicate P relates object x to object y • RDF offers only binary predicates (properties) A Set of Triples as a Semantic Net
RDF Statements in XML • An RDF document is represented by an XML element with the tag rdf:RDF • The content of this element is a number of descriptions, which use rdf:Description tags. • Every description makes a statement about a resource, identified in 3 ways: • an about attribute, referencing an existing resource • an ID attribute, creating a new resource • without a name, creating an anonymous resource <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:mydomain="http://www.mydomain.org/my-rdf-ns"> <rdf:Description rdf:about="http://www.cit.gu.edu.au/~db"> <mydomain:site-owner rdf:resource=“#David Billington“/> </rdf:Description> </rdf:RDF>
Reification • In RDF it is possible to make statements about statements • Grigoris believes that David Billington is the creator of http://www.cit.gu.edu.au/~db • Such statements can be used to describe belief or trust in other statements • The solution is to assign a unique identifier to each statement • It can be used to refer to the statement • Introduce an auxiliary object (e.g. belief1) • relate it to each of the 3 parts of the original statement through the properties subject, predicate and object • In the preceding example • subject of belief1 is David Billington • predicate of belief1 is creator • object of belief1 is http://www.cit.gu.edu.au/~db
The rdf:resource Attribute <rdf:Description rdf:about="CIT1111"> <uni:courseName>Discrete Mathematics</uni:courseName> <uni:isTaughtBy rdf:resource="949318"/> </rdf:Description> <rdf:Description rdf:about="949318"> <uni:name>David Billington</uni:name> <uni:title>Associate Professor</uni:title> </rdf:Description>
Container Elements • Collect a number of resources or attributes about which we want to make statements as a whole • E.g., we may wish to talk about the courses given by a particular lecturer • The content of container elements are named rdf:_1, rdf:_2, etc. • Alternatively rdf:li • Three Types of Container Elements • rdf:Bag an unordered container, allowing multiple occurrences • E.g. members of the faculty board, documents in a folder • rdf:Seq an ordered container, which may contain multiple occurrences • E.g. modules of a course, items on an agenda, an alphabetized list of staff members (order is imposed) • rdf:Alt a set of alternatives • E.g. the document home and mirrors, translations of a document in various languages
A Semantic Web Primer\ Example for a Bag <uni:lecturer rdf:ID="949352" uni:name="Grigoris Antoniou" uni:title="Professor"> <uni:coursesTaught> <rdf:Bag> <rdf:_1 rdf:resource="#CIT1112"/> <rdf:_2 rdf:resource="#CIT3116"/> </rdf:Bag> </uni:coursesTaught> </uni:lecturer>
A Semantic Web Primer Example for Alternative <uni:course rdf:ID="CIT1111" uni:courseName="Discrete Mathematics"> <uni:lecturer> <rdf:Alt> <rdf:li rdf:resource="#949352"/> <rdf:li rdf:resource="#949318"/> </rdf:Alt> </uni:lecturer> </uni:course>
RDF Collections • A limitation of these containers is that there is no way to close them • “these are all the members of the container” • RDF provides support for describing groups containing only the specified members, in the form of RDF collections • list structure in the RDF graph • constructed using a predefined collection vocabulary: rdf:List, rdf:first, rdf:rest and rdf:nil • Shorthand syntax: • "Collection" value for the rdf:parseType attribute: <rdf:Description rdf:about="#CIT2112"> <uni:isTaughtBy rdf:parseType="Collection"> <rdf:Description rdf:about="#949111"/> <rdf:Description rdf:about="#949352"/> <rdf:Description rdf:about="#949318"/> </uni:isTaughtBy> </rdf:Description>
Basic Ideas of RDF Schema • RDF is a universal language that lets users describe resources in their own vocabularies • RDF does not assume, nor does it define semantics of any particular application domain • The user can do so in RDF Schema using: • Classes and Properties • Class Hierarchies and Inheritance • Property Hierarchies
A Semantic Web Primer Classes and their Instances • We must distinguish between • Concrete “things” (individual objects) in the domain: Discrete Maths, David Billington etc. • Sets of individuals sharing properties called classes: lecturers, students, courses etc. • Individual objects that belong to a class are referred to as instances of that class • The relationship between instances and classes in RDF is through rdf:type
A Semantic Web Primer Inheritance in Class Hierarchies • Range restriction: Courses must be taught by academic staff members only • Michael Maher is a professor • He inheritsthe ability to teach from the class of academic staff members • This is done in RDF Schema by fixing the semantics of “is a subclass of” • It is not up to an application (RDF processing software) to interpret “is a subclass of
A Semantic Web Primer Property Hierarchies • Hierarchical relationships for properties • E.g., “is taught by” is a subproperty of “involves” • If a course C is taught by an academic staff member A, then C also involves Α • The converse is not necessarily true • E.g., A may be the teacher of the course C, or • a tutor who marks student homework but does not teach C • P is a subproperty of Q, if Q(x,y) is true whenever P(x,y) is true
A Semantic Web Primer RDF Schema in RDF • The modeling primitives of RDF Schema are defined using resources and properties (RDF itself is used!) • To declare that “lecturer” is a subclass of “academic staff member” • Define resources lecturer, academicStaffMember, and subClassOf • define property subClassOf • Write triple (lecturer,subClassOf,academicStaffMember) • We use the XML-based syntax of RDF
Core Classes • rdfs:Resource, the class of all resources • rdfs:Class, the class of all classes • rdfs:Literal, the class of all literals (strings) • rdf:Property, the class of all properties. • rdf:Statement, the class of all reified statements • Example <rdfs:Classrdf:about="#lecturer"> <rdfs:subClassOfrdf:resource="#staffMember"/> </rdfs:Class>
Core Properties • rdf:type, which relates a resource to its class • The resource is declared to be an instance of that class • rdfs:subClassOf, which relates a class to one of its superclasses • All instances of a class are instances of its superclass • rdfs:subPropertyOf, relates a property to one of its superproperties • rdfs:domain, which specifies the domain of a property P • The class of those resources that may appear as subjects in a triple with predicate P • If the domain is not specified, then any resource can be the subject • rdfs:range, which specifies the range of a property P • The class of those resources that may appear as values in a triple with predicate P <rdf:Propertyrdf:ID="phone"> <rdfs:domainrdf:resource="#staffMember"/> <rdfs:rangerdf:resource="http://www.w3.org/ 2000/01/rdf-schema#Literal"/> </rdf:Property>
Semantics based on Inference Rules • Semantics in terms of RDF triples instead of restating RDF in terms of first-order logic with a sound and complete inference systems • This inference system consists of inference rules of the form: IF E contains certain triples THEN add to E certain additional triples where E is an arbitrary set of RDF triples • Examples IF E contains the triple (?x,?p,?y) THEN E also contains (?p,rdf:type,rdf:property) IF E contains the triples (?u,rdfs:subClassOf,?v) and (?v,rdfs:subclassOf,?w) THEN E also contains the triple (?u,rdfs:subClassOf,?w) IF E contains the triples (?x,rdf:type,?u) and (?u,rdfs:subClassOf,?v) THEN E also contains the triple (?x,rdf:type,?v) • Any resource ?y which appears as the value of a property ?p can be inferred to be a member of the range of ?p • This shows that range definitions in RDF Schema are not used to restrict the range of a property, but rather to infer the membership of the range IF E contains the triples (?x,?p,?y) and (?p,rdfs:range,?u) THEN E also contains the triple (?y,rdf:type,?u)
SPARQL RDF Query Language • SPARQL is based on matching graph patterns • The simplest graph pattern is the triple pattern : • like an RDF triple, but with the possibility of a variable instead of an RDF term in the subject, predicate, or object positions • Combining triple patterns gives a basic graph pattern, where an exact match to a graph is needed to fulfill a pattern
Summary • RDF provides a foundation for representing and processing metadata • RDF has a graph-based data model • RDF has an XML-based syntax to support syntactic interoperability • XML and RDF complement each other because RDF supports semantic interoperability • RDF has a decentralized philosophy and allows incremental building of knowledge, and its sharing and reuse • RDF is domain-independent - RDF Schema provides a mechanism for describing specific domains • RDF Schema is a primitive ontology language • It offers certain modelling primitives with fixed meaning • Key concepts of RDF Schema are class, subclass relations, property, subproperty relations, and domain and range restrictions • There exist query languages for RDF and RDFS, including SPARQL
OWL Requirements for Ontology Languages • Ontology languages allow users to write explicit, formal conceptualizations of domain models • The main requirements are: • a well-defined syntax • efficient reasoning support • a formal semantics • sufficient expressive power • convenience of expression
Reasoning About Knowledge in Ontology Languages • Class membership • If x is an instance of a class C, and C is a subclass of D, then we can infer that x is an instance of D • Equivalence of classes • If class A is equivalent to class B, and class B is equivalent to class C, then A is equivalent to C, too • Consistency • X instance of classes A and B, but A and B are disjoint • This is an indication of an error in the ontology • Classification • Certain property-value pairs are a sufficient condition for membership in a class A; if an individual x satisfies such conditions, we can conclude that x must be an instance of A • Reasoning Support for OWL • Semantics is a prerequisite for reasoning support • Formal semantics and reasoning support are usually provided by mapping an ontology language to a known logical formalism; using automated reasoners that already exist for those formalisms • OWL is (partially) mapped on a description logic, and makes use of reasoners such as FaCT and RACER • Description logics are a subset of predicate logic for which efficient reasoning support is possible
Reasoning Support for OWL • Semantics is a prerequisite for reasoning support • Formal semantics and reasoning support are usually provided by • mapping an ontology language to a known logical formalism • using automated reasoners that already exist for those formalisms • OWL is (partially) mapped on a description logic, and makes use of reasoners such as FaCT and RACER • Description logics are a subset of predicate logic for which efficient reasoning support is possible
Some Limitations of the Expressive Power of RDF Schema • Local scope of properties • rdfs:range defines the range of a property (e.g. eats) for all classes • In RDF Schema we cannot declare range restrictions that apply to some classes only • E.g. we cannot say that cows eat only plants, while other animals may eat meat, too • Disjointness of classes • Sometimes we wish to say that classes are disjoint (e.g. male and female) • Boolean combinations of classes • Sometimes we wish to build new classes by combining other classes using union, intersection, and complement • E.g. person is the disjoint union of the classes male and female • Cardinality restrictions • E.g. a person has exactly two parents, a course is taught by at least one lecturer • Special characteristics of properties • Transitive property (like “greater than”) • Unique property (like “is mother of”) • A property is the inverse of another property (like “eats” and “is eaten by”)
Three Species of OWL • W3C’sWeb Ontology Working Group defined OWL as three different sublanguages: • OWL Full • OWL DL • OWL Lite • Each sublanguage geared toward fulfilling different aspects of requirements • OWL uses XML and RDF for syntax • OWL is based on Description Logic • Description Logic is a fragment of first-order logic • OWL inherits from Description Logic • The open-world assumption • The non-unique-name assumption