Computer communication B

Computer communication B The Semantic Web

Bibliography
• Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, May 2001.
• Breitman, K.K., Casanova, M.A., & Truszkowski, W. (2007). Semantic Web: Concepts, Technologies and Applications. Springer Verlag, London.
• http://www.w3.org/
• Antoniou, G., & Van Harmelen, F. (2004). A Semantic Web Primer (see library or PDF copy).

The semantic web: some definitions • “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (Berners-Lee, T., Hendler, J. & Lassila, O., 2001) • “The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications” (W3C, 2003) • “Soon it will be possible to access the Web resources by content rather than just by keywords” (Anutariya et al., 2001)


Semantic Web: Introduction • The content of the present World Wide Web can currently be accessed and interpreted only by people • The Semantic Web is an extension of the WWW with semantic information that computers can use • With the help of semantic information, the content of pages could be processed automatically and computers could make inferences during a search

The semantic web: characteristics • The Semantic Web is not separate from the WWW; it is a developing part of it • Its infrastructure and characteristics should therefore be shared with the WWW • Use URI (Uniform Resource Identifier) addressing • Use protocols that have a small and universally understood set of commands (like HTTP: Hypertext Transfer Protocol) • Be decentralized (like the WWW) • Function on a large scale

The semantic web: The layer cake

The semantic web
• Two characteristics for the construction of the semantic web
  • Downward compatibility: agents fully aware of a layer should also be able to interpret and use information written at lower levels. For example, agents aware of the semantics of OWL can take full advantage of information written in RDF and RDF Schema.
  • Upward partial understanding: agents fully aware of a layer should take at least partial advantage of information at higher levels. For example, an agent aware only of the RDF and RDF Schema semantics can interpret knowledge written in OWL partly, by disregarding those elements that go beyond RDF and RDF Schema.
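
Upward partial understanding can be sketched roughly as follows. This is a simplified illustration, not how a real RDF agent is built: the agent keeps only statements whose property comes from a vocabulary it knows. The RDF, RDF Schema and OWL namespace URIs are the standard W3C ones; the `EX` namespace and the zoo statements are invented for the example.

```python
# Standard W3C vocabulary namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://www.mysite.com/zoo#"  # hypothetical namespace, for illustration only

# A small graph of (subject, property, object) statements mixing the layers.
graph = [
    (EX + "Dolphin", RDFS + "subClassOf", EX + "Mammal"),
    (EX + "Dolphin", OWL + "disjointWith", EX + "Fish"),
    (EX + "flipper", RDF + "type", EX + "Dolphin"),
]


def understood(statements, known_namespaces):
    """Keep the statements whose property belongs to a known vocabulary."""
    prefixes = tuple(known_namespaces)
    return [s for s in statements if s[1].startswith(prefixes)]


# An agent aware only of RDF and RDF Schema disregards the OWL statement.
rdfs_agent_view = understood(graph, [RDF, RDFS])
# An agent aware of OWL as well can use everything (downward compatibility).
owl_agent_view = understood(graph, [RDF, RDFS, OWL])
print(len(rdfs_agent_view), len(owl_agent_view))
```

Filtering on the property alone is a crude approximation; it only shows the idea of skipping what lies beyond the agent's own layer.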

XML: Extensible Markup Language 1
• It is a general-purpose markup language for creating special-purpose markup languages
• It follows the SGML standard (Standard Generalized Markup Language)
• With XML, users can create their own tags (which is not possible with HTML)
• Differences between HTML and XML
  • HTML (Hypertext Markup Language)
    • Has a fixed set of tags
    • Is most frequently used to define the layout
    • Does not focus on the logical content or on the structure
  • XML
    • Tags can be defined by the user
    • Tags reflect the content
    • The layout is defined in a separate document (stylesheet)

WWW: HTML

Semantic web: XML

XML: Extensible Markup Language 2 • An XML document consists of plain text and markup, in the form of tags • An XML document is interpreted by application programs • An XML document can be represented in the form of a “tree”

XML: Extensible Markup Language 3
• An XML document consists of elements, each formed by
  • A start-tag
  • Content
  • A matching end-tag
• Elements can be nested in a tree form
• Every element is named after its start-tag
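
The tree view of an XML document can be made concrete with a short sketch using Python's standard `xml.etree.ElementTree` module. The document and its tag names (`course`, `title`, `lecturer`, `name`) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are invented for this example.
DOC = """<course>
  <title>The Semantic Web</title>
  <lecturer>
    <name>Jan</name>
  </lecturer>
</course>"""


def outline(element, depth=0):
    """Return one indented line per element, mirroring the nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
print("\n".join(outline(root)))
```

Each element becomes a node named after its start-tag, and nested elements become its children.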

XML: Extensible Markup Language 4
• A start-tag can have zero or more attributes, each consisting of
  • A name (followed by the equals sign)
  • A value (between quotes, typically double quotes)
• Every XML document has to follow a specific syntax:
  • Every start-tag needs a matching end-tag (see previous slide)
  • Elements must be properly nested inside other elements, under a single root element
• An XML document can be associated with an XML Schema (which defines additional constraints on the document structure)
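
As a minimal sketch of these syntax rules, the standard `xml.etree.ElementTree` parser accepts a well-formed document and rejects one whose end-tags do not match. The `book` element and its `lang` attribute are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical documents: element and attribute names are invented.
good = '<book lang="en"><title>A Semantic Web Primer</title></book>'
bad = '<book lang="en"><title>A Semantic Web Primer</book></title>'  # tags cross

book = ET.fromstring(good)
print(book.attrib)  # the attributes of the start-tag, as name/value pairs
print(is_well_formed(good), is_well_formed(bad))
```

Well-formedness is only the basic syntax check; validating against an XML Schema is a separate step that this module does not perform.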

RESOURCE, URIs and NAMESPACES
• A resource is anything that has an identity
  • Digital (e.g. an electronic document)
  • Physical (e.g. a book)
• A URI (Uniform Resource Identifier) is a character string that identifies a resource on the Web
• URIs can follow different schemes
  • FTP (File Transfer Protocol)
  • HTTP (Hypertext Transfer Protocol), e.g. http://www.mysite.com/food.html

Namespaces • A namespace is a context: the domain in which specific element names are defined • Namespaces are identified by a URI • URIref: a URI with an optional fragment identifier attached to it, preceded by #
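
A small sketch of how a URIref splits into URI and fragment, and of how a namespace prefix can abbreviate it, using the standard `urllib.parse` module. The fragment `pizza`, the prefix `food` and the helper `expand` are hypothetical names chosen for illustration.

```python
from urllib.parse import urldefrag, urlsplit

# URI taken from the slides; the fragment "#pizza" is invented.
uriref = "http://www.mysite.com/food.html#pizza"

uri, fragment = urldefrag(uriref)   # split off the part after '#'
scheme = urlsplit(uriref).scheme    # the scheme the URI follows

# A namespace maps a short prefix to a URI; the prefix "food" is an assumption.
NAMESPACES = {"food": "http://www.mysite.com/food.html#"}


def expand(qualified_name):
    """Expand 'prefix:local' into a full URIref using NAMESPACES."""
    prefix, local = qualified_name.split(":", 1)
    return NAMESPACES[prefix] + local


print(uri, fragment, scheme)
print(expand("food:pizza"))
```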

RDF: Resource Description Framework 1 • RDF is a general-purpose language for representing information on the Web • It is useful for representing metadata about Web resources • RDF describes resources (either abstract or concrete subjects) identifiable via a URI • RDF is commonly written in an XML-based syntax (RDF/XML) • Such RDF documents are written as XML documents with the root tag rdf:RDF

RDF Statements • An RDF statement is described by a triple (S, P, O) • S = Subject of the statement (a URIref) • P = Property (predicate) of the statement (a URIref) • O = Object: the value of the property, which can be another resource (a URIref) or a literal (e.g. a string of characters or a number)
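
The triple model can be sketched in plain Python without any RDF library. The `URIRef` marker class, the `EX` namespace and the statements about Jan are all assumptions made up for the example.

```python
class URIRef(str):
    """Marks a string as a URI reference rather than a literal."""


EX = "http://www.mysite.com/people#"  # hypothetical namespace

# Each statement is a (subject, property, object) triple.
triples = [
    (URIRef(EX + "jan"), URIRef(EX + "name"), "Jan"),                       # literal object
    (URIRef(EX + "jan"), URIRef(EX + "hasMother"), URIRef(EX + "anna")),    # resource object
]


def objects(graph, subject, prop):
    """Return every object O such that (subject, prop, O) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == prop]


def is_literal(value):
    """An object is a literal when it is not a URI reference."""
    return not isinstance(value, URIRef)


names = objects(triples, EX + "jan", EX + "name")
mothers = objects(triples, EX + "jan", EX + "hasMother")
print(names, [is_literal(n) for n in names])
print(mothers, [is_literal(m) for m in mothers])
```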

RDF-Schema • An RDF schema provides the basis for modelling hierarchies of classes and properties.
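
A class hierarchy of this kind can be sketched as a simple lookup table of subclass relations. The class names are invented, and the sketch assumes each class has at most one direct superclass (RDF Schema itself allows several).

```python
# Hypothetical hierarchy: each key is declared a subclass of its value.
SUBCLASS_OF = {
    "Dolphin": "MarineMammal",
    "MarineMammal": "Mammal",
    "Mammal": "Animal",
}


def superclasses(cls, subclass_of):
    """Follow the subclass links upward and return all ancestors in order."""
    found = []
    current = cls
    while current in subclass_of:
        current = subclass_of[current]
        if current in found or current == cls:  # guard against cycles
            break
        found.append(current)
    return found


def is_a(cls, other, subclass_of):
    """True if cls equals other or is a direct or indirect subclass of it."""
    return cls == other or other in superclasses(cls, subclass_of)


print(superclasses("Dolphin", SUBCLASS_OF))
print(is_a("Dolphin", "Animal", SUBCLASS_OF))
```

The point of the sketch is that subclass membership is transitive: a statement made about a class also applies to everything below it in the hierarchy.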

Ontology: definitions
• Ontology comes from the Greek: ontos = being + logos = word
• Gruber (1993): “An ontology is a formal explicit specification of a shared conceptualization”
• W3C: “Ontology is a term borrowed from philosophy that refers to the science of describing the kinds of entities in the world and how they are related”
• An ontology should be machine readable
• An ontology is an abstract model

Ontology • Ontology categorizes concepts (which are defined by a set of common properties) into classes based on common characteristics • Ontology is the representation of the knowledge of a domain where a set of objects and their relationships is described by a vocabulary. • Ontologies should provide descriptions for • Classes (things) in the various domains • Relationships among things • Properties of these things

Ontology • Ontologies should satisfy certain demands: • Expressivity: the ontology should be able to describe its domain • Consistency: it should not give contradictory information • It should support reasoning processes • Ontologies are useful in sharing and exchanging information between software agents • Ontologies do not necessarily reflect the way humans think about how knowledge is classified • Ontologies should therefore not be seen as a reflection of human intelligence

Ontology vs Taxonomy
• Taxonomy
  • Is a classification of terms in the form of a hierarchy, typically using a parent-child relationship (e.g. “type of”)
  • Example: the taxonomy of living beings
    Kingdom: Animalia
    Phylum: Chordata
    Subphylum: Vertebrata
    ……

Web Ontology Languages
• They are designed to define ontologies
• They are based on RDF and RDF Schema
• SHOE
• OIL (Ontology Inference Layer)
• OWL (Web Ontology Language) http://www.w3.org/TR/owl-features/
  • It is an ontology description language
  • It is a standard language for the modeling of ontologies
  • Facilitates machine interpretability of Web content (more than XML or RDF)
  • More expressive than RDF Schema: it has additional vocabulary based on description logic

OWL 1
• Describes classes, properties and relations to facilitate machine interpretability of Web content
• OWL is defined as a vocabulary (like RDF) but is semantically richer
• In OWL, classes of entities can be specified in different ways, for example:
  • Which individuals belong to a specific class
  • Which properties the members of that class must have
  • Whether individuals belong to a particular subclass

OWL 2 • Membership in a subclass can be restricted • With RDF Schema it is possible to define subclasses

OWL: example 1 • Definition of “name” • Properties, classes and things are distinguished • Things can be grouped into classes

OWL: example 2 • Definition of “marine mammal” • Precise definition of classes: How many subclasses are there?

OWL: example 2 • What is a dolphin? • A class is defined as the intersection of two classes
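
Defining a class as the intersection of two classes can be illustrated with ordinary sets of individuals. The two classes chosen here (mammals and marine animals) and the individual names are assumptions; the original example is not reproduced on the slide.

```python
# Hypothetical class extensions: each class is the set of its individuals.
mammals = {"dolphin_1", "dog_1"}
marine_animals = {"dolphin_1", "shark_1"}

# The intersection contains exactly the individuals that belong to both classes.
marine_mammals = mammals & marine_animals


def belongs_to(individual, *classes):
    """True if the individual is a member of every given class."""
    return all(individual in cls for cls in classes)


print(marine_mammals)
print(belongs_to("dolphin_1", mammals, marine_animals))
print(belongs_to("shark_1", mammals, marine_animals))
```

In OWL itself the intersection is stated over class descriptions rather than over explicit sets, so this is only an analogy for the membership condition.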

Logic 1: Logic rules
• With logic it is possible to go beyond the explicitly stated information
• It is possible to check whether the information collected so far is consistent or has to be revised
• With the rules of logic it is possible to derive new information
• Two inference rules
  • Modus Ponens
    If A → B
    A
    Therefore B
    Example: “if x is a person then this person has a mother”; “Jan is a person”; therefore “Jan has a mother”
  • Modus Tollens (from “tollere”, to take away)
    If A → B
    ¬B
    Therefore ¬A
    Example: “if x is a person then this person has a mother”; “Jan does not have a mother”; therefore “Jan is not a person”
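
The two inference rules can be sketched as small functions over a table of known truth values. The representation (a rule as a pair of statements, knowledge as a dictionary) is an assumption chosen for brevity, not a standard reasoner design.

```python
# A rule A -> B is written as the pair (A, B).
RULE = ("Jan is a person", "Jan has a mother")


def modus_ponens(rule, known):
    """From A -> B and A, conclude B. Returns (statement, truth) or None."""
    a, b = rule
    if known.get(a) is True:
        return (b, True)
    return None


def modus_tollens(rule, known):
    """From A -> B and not B, conclude not A. Returns (statement, truth) or None."""
    a, b = rule
    if known.get(b) is False:
        return (a, False)
    return None


print(modus_ponens(RULE, {"Jan is a person": True}))
print(modus_tollens(RULE, {"Jan has a mother": False}))
# Knowing only that B is true allows no conclusion about A.
print(modus_ponens(RULE, {"Jan has a mother": True}))
```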

Logic 2 • Logic is the study of the principles of valid inference and demonstration (traditionally considered a part of philosophy) • Logic rules are used to build chains of reasoning • Their formulation is close to that used in natural language • Logic is about the rules themselves, not about the way they have to be applied • Classical logic is unfortunately too limited to model all types of human reasoning • Default reasoning: “if x is a bird then x can fly”, but “if x is a penguin then x cannot fly” • More complicated forms of logic are more difficult to process
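
The bird and penguin example can be sketched as a default rule with an exception that takes priority. This is a toy illustration of the idea, with the rule order hard-coded; real default logics are considerably more involved.

```python
def can_fly(kinds):
    """Decide flight from the set of classes an individual belongs to.

    Returns True, False, or None when no rule applies.
    """
    if "penguin" in kinds:  # the more specific rule overrides the default
        return False
    if "bird" in kinds:     # default: birds can fly
        return True
    return None             # nothing is known either way


print(can_fly({"bird"}))
print(can_fly({"bird", "penguin"}))
print(can_fly({"fish"}))
```

In classical logic the two rules together would make a penguin that is also a bird contradictory; giving the exception priority is what lets the default coexist with it.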

Proof • There is a difference between formulating rules and applying them to solve a problem • There must be an intelligent choice of rules and facts (a good order for the rules) • There should be good strategies for efficient argumentation • Proof systems are important topics in informatics • But not much has been done in the domain of the Semantic Web

Trust
• Trust can be perceived at different levels
• Is the processed information plausible?
• How can trust be verified?
  • Based on human judgment
  • Based on social trust (trusting people who are trusted by other people)
