Semantic Web. Bhargabi Chakrabarti, Reshmi De
OUTLINE • 1. Foundations of Semantic Web • 2. RDF • 3. RDFS • 4. OWL • 5. OWL 2 • 6. Semantic Web Layer Cake • 7. RIF
What does "semantic" mean? • The word semantic itself implies meaning or understanding. As such, the fundamental difference between Semantic Web technologies and other technologies related to data (such as relational databases or the World Wide Web itself) is that the Semantic Web is concerned with the meaning and not the structure of data.
Why do we need the Semantic Web? Consider a typical web page: • Its markup consists of: • rendering information (e.g., font size and colour) • hyperlinks to related content • Its semantic content is accessible to humans but not (easily) to computers…
What information can we see… WWW2002 The Eleventh International World Wide Web Conference Sheraton Waikiki Hotel, Honolulu, Hawaii, USA 7-11 May 2002 1 location 5 days learn interact Registered participants coming from Australia, Canada, Chile, Denmark, France, Germany, Ghana, Hong Kong, India, Ireland, Italy, Japan, Malta, New Zealand, the Netherlands, Norway, Singapore, Switzerland, the United Kingdom, the United States, Vietnam, Zaire Register now On the 7th May Honolulu will provide the backdrop of the Eleventh International World Wide Web Conference. This prestigious event … Speakers confirmed Tim Berners-Lee Tim is the well-known inventor of the Web, … Ian Foster Ian is the pioneer of the Grid, the next generation internet …
What information can a machine see… The same characters, but nothing that tells it which string is the conference name, which is a place, which is a date, or which strings refer to people. To a machine, the page is an undifferentiated sequence of symbols.
Solution: XML markup with "meaningful" tags?
<name>WWW2002 The Eleventh International World Wide Web Conference</name>
<location>Sheraton Waikiki Hotel, Honolulu, Hawaii, USA</location>
<date>7-11 May 2002</date>
<slogan>1 location 5 days learn interact</slogan>
<participants>Registered participants coming from Australia, Canada, Chile, Denmark, France, Germany, Ghana, Hong Kong, India, Ireland, Italy, Japan, Malta, New Zealand, the Netherlands, Norway, Singapore, Switzerland, the United Kingdom, the United States, Vietnam, Zaire</participants>
<introduction>Register now On the 7th May Honolulu will provide the backdrop of the Eleventh International World Wide Web Conference. This prestigious event … Speakers confirmed</introduction>
<speaker>Tim Berners-Lee</speaker>
<bio>Tim is the well-known inventor of the Web,</bio>
…
Machine sees… The same text, now split into <name>, <location>, <date>, <slogan>, <participants>, <introduction>, <speaker> and <bio> elements. The tags delimit the pieces, but to a machine the tag names are just strings: nothing tells it that <speaker> refers to a person or that <date> holds a date. XML adds structure, not meaning.
Solution: to enable machine processing, there are two approaches: • Smarter machines • Smarter data
Approach 1: Smarter machines. Teach computers to understand the meaning of Web data (the Artificial Intelligence (AI) approach): • Natural language processing • Image recognition • Etc.
Approach 2: Smarter data. Make data easier for machines to understand: • Express meaning in a machine-processable format, e.g., metadata • The Semantic Web approach: inject more metadata so that the data become structured.
The Current Web • Minimal machine-processable information: "dumb" links • Resources are linked together, forming the Web • There is no distinction between kinds of resources, or between kinds of links that connect resources.
The Semantic Web: an extension of the current Web • More machine-processable information • To give meaning to resources and links, new standards and languages are being investigated and developed. • The rules and descriptive information made available by these languages allow the types of resources on the Web, and the relationships between resources, to be characterized individually and precisely.
Why is machine processing difficult? Two key problems: • Problem 1: Ambiguity • Problem 2: Language complexity
Ambiguity • "David Booth has VIN #2745534." • Which "David Booth"? • Which "VIN #2745534": vehicle #2745534, or vinyl siding order #2745534? • We need to identify things: • unambiguously • in a uniform way • in a Web-friendly way
Kinds of things to identify • Three kinds of things in the universe: 1) Web resources 2) Non-Web resources: physical objects, e.g., cars, people, houses 3) Abstract concepts, e.g., sizes, colors, verbs, "love", "creator" (e.g., the creator of a document), "airline reservation"
Unambiguously identifying Web resources • Solution (trivial): URLs • http://www.example.org/index.html
Unambiguously identifying physical objects • Many human systems exist: • Vehicle Identification Numbers (VIN) • Product serial numbers • Employee numbers • Problems: • Too many formats • Most are not global in scope • Solution: convert to URIs, e.g., • http://www.example.com/employeeid/85740
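As a minimal sketch of "convert to URIs", assume each local identification scheme is given its own base URI. Apart from the employee-ID pattern above, the example.com base URIs and the helper `to_uri` are hypothetical. Putting the scheme into the URI also removes the "vehicle or vinyl siding order?" ambiguity, because the same number yields two different identifiers.

```python
from urllib.parse import quote

# Hypothetical base URIs, one per local identification scheme (illustration only).
SCHEMES = {
    "employee": "http://www.example.com/employeeid/",
    "vin": "http://www.example.com/vin/",
    "siding_order": "http://www.example.com/sidingorder/",
}


def to_uri(scheme: str, local_id: str) -> str:
    """Turn a (scheme, local identifier) pair into one global URI."""
    base = SCHEMES[scheme]  # unknown schemes raise KeyError on purpose
    # Percent-encode anything that is not safe inside a single path segment.
    return base + quote(local_id, safe="")


print(to_uri("employee", "85740"))  # http://www.example.com/employeeid/85740
print(to_uri("vin", "2745534") == to_uri("siding_order", "2745534"))  # False
```

The design choice is simply "namespace + local ID": the local systems keep their own formats, and the base URI supplies the global scope they lack.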
Unambiguously identifying abstract concepts • Solution: Use URIs Problem: Which URIs? • Need to agree on common vocabulary Solution: Ontology
URI • In computing, a uniform resource identifier (URI) is a string of characters used to identify a name or a resource. URIs can be classified as locators (URLs), as names (URNs), or as both. A uniform resource name (URN) functions like a person's name, while a uniform resource locator (URL) resembles that person's street address. In other words: the URN defines an item's identity, while the URL provides a method for finding it.
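The name/locator distinction shows up in a URI's scheme and components. The following small sketch uses Python's standard `urllib.parse`. The ISBN-based URN is a commonly cited example. Treating every non-`urn:` scheme as a locator is a simplifying assumption of this sketch, not a rule from the URI standard.

```python
from urllib.parse import urlparse


def classify(uri: str) -> str:
    """Rough heuristic: 'urn:' URIs are names; anything else is treated as a locator."""
    return "URN (name)" if urlparse(uri).scheme == "urn" else "URL (locator)"


for uri in ("http://www.example.org/index.html", "urn:isbn:0451450523"):
    parts = urlparse(uri)
    # A URL carries a host (netloc) to fetch from; the URN only names the item.
    print(f"{uri}: {classify(uri)}, scheme={parts.scheme!r}, netloc={parts.netloc!r}")
```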
Ontology "Formal description of concepts and their relationships" In other words: Vocabulary of terms "book", "publication", "greyhound", "dog" And their relationships • "book is-a-kind-of publication" • "greyhound is-a-kind-of dog"
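The "is-a-kind-of" relation can be sketched as a lookup table plus a transitive walk. In this Python sketch, "dog is-a-kind-of animal" is an extra, hypothetical link added only to show a two-step chain. Each term also has a single parent here, whereas real ontologies allow several.

```python
# Each term maps to its direct "is-a-kind-of" parent.
# "dog" -> "animal" is a hypothetical link added for the example.
IS_A = {
    "book": "publication",
    "greyhound": "dog",
    "dog": "animal",
}


def is_kind_of(term: str, ancestor: str) -> bool:
    """Follow is-a-kind-of links upward (assumes the links contain no cycles)."""
    while term in IS_A:
        term = IS_A[term]
        if term == ancestor:
            return True
    return False


print(is_kind_of("greyhound", "animal"))  # True: greyhound -> dog -> animal
print(is_kind_of("book", "dog"))          # False
```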
Ontology • Vocabulary + Structure = Taxonomy • Taxonomy + Relationships, Constraints, Rules = Ontology • Ontology + Instances = Knowledge Base
Structure of an Ontology Ontologies typically have two distinct components: • Names for important concepts in the domain • Elephant is a concept whose members are a kind of animal • Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants • Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years • Background knowledge/constraints on the domain • Adult_Elephants weigh at least 2,000 kg • All Elephants are either African_Elephants or Indian_Elephants • No individual can be both a Herbivore and a Carnivore
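To make the two components concrete, the following Python sketch encodes the defined concept Adult_Elephant and checks the three background constraints against one individual. It is a closed-world check over explicitly asserted concept names, which is a simplification. An OWL reasoner works under the open-world assumption and would not, for example, treat an elephant with no stated subspecies as an error. The individual "jumbo" and its figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    name: str
    asserted: set[str]          # concepts stated explicitly, e.g. {"Elephant"}
    age_years: float = 0.0
    weight_kg: float = 0.0


def is_adult_elephant(x: Individual) -> bool:
    # Defined concept: exactly those elephants whose age is greater than 20 years.
    return "Elephant" in x.asserted and x.age_years > 20


def violations(x: Individual) -> list[str]:
    """Return the background constraints that x breaks (closed-world check)."""
    found = []
    if is_adult_elephant(x) and x.weight_kg < 2000:
        found.append("Adult_Elephants weigh at least 2,000 kg")
    if "Elephant" in x.asserted and not x.asserted & {"African_Elephant", "Indian_Elephant"}:
        found.append("All Elephants are either African_Elephants or Indian_Elephants")
    if {"Herbivore", "Carnivore"} <= x.asserted:
        found.append("No individual can be both a Herbivore and a Carnivore")
    return found


jumbo = Individual("jumbo", {"Elephant", "African_Elephant", "Herbivore"},
                   age_years=25, weight_kg=1500)
print(is_adult_elephant(jumbo))  # True
print(violations(jumbo))         # ['Adult_Elephants weigh at least 2,000 kg']
```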
Dublin Core • One well-known ontology • Defines 15 basic terms for documents and publishing, e.g., "title", "creator", "subject", "publisher" • Each term is unambiguously identified by a URI, e.g., http://purl.org/dc/elements/1.1/creator
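The 15 terms of the Dublin Core Metadata Element Set 1.1 can be listed and turned into their URIs by concatenating the element-set namespace with each term name, as in this short Python sketch. The namespace ends in "/" rather than "#", so the URIs look like ordinary paths.

```python
DC = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

# Namespace + term name = the term's unambiguous URI.
DC_URIS = {term: DC + term for term in DC_ELEMENTS}

print(len(DC_URIS))        # 15
print(DC_URIS["creator"])  # http://purl.org/dc/elements/1.1/creator
```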
Ontology Languages • Wide variety of languages for "Explicit Specification" • Graphical notations • Semantic networks • Topic Maps (see http://www.topicmaps.org/) • UML • RDF • Logic based • Description Logics (e.g., OIL, DAML+OIL, OWL) • Rules (e.g., RuleML, LP/Prolog) • First Order Logic (e.g., KIF) • Conceptual graphs • (Syntactically) higher order logics (e.g., LBase) • Non-classical logics (e.g., F-logic, non-monotonic logics, modal logics) • Probabilistic/fuzzy • Degree of formality varies widely • Increased formality makes languages more amenable to machine processing (e.g., automated reasoning)
What is the Purpose of RDF? • The purpose of RDF (Resource Description Framework) is to give a standard way of specifying data "about" something. • Here's an example of an XML document that specifies data about China's Yangtze river: <?xml version="1.0"?> <River id="Yangtze" xmlns="http://www.geodesy.org/river"> <length>6300 kilometers</length> <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation> <endingLocation>East China Sea</endingLocation> </River> "Here is data about the Yangtze River. It has a length of 6300 kilometers. Its startingLocation is western China's Qinghai-Tibet Plateau. Its endingLocation is the East China Sea."
XML --> RDF: Modify the following XML document (Yangtze.xml) so that it is also a valid RDF document:
<?xml version="1.0"?>
<River id="Yangtze" xmlns="http://www.geodesy.org/river">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
Converted to RDF (Yangtze.rdf):
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
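The conversion can be automated. This Python sketch uses only `xml.etree.ElementTree`. It assumes the input looks like Yangtze.xml (one root element with an `id` attribute and simple text-valued children), so it is not a general XML-to-RDF converter. The helper names `local_name` and `to_rdf` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NEW_NS = "http://www.geodesy.org/river#"  # original namespace plus the "#"

SOURCE = """<?xml version="1.0"?>
<River id="Yangtze" xmlns="http://www.geodesy.org/river">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>"""


def local_name(tag: str) -> str:
    """'{namespace}River' -> 'River'."""
    return tag.rsplit("}", 1)[-1]


def to_rdf(xml_text: str) -> ET.Element:
    old_root = ET.fromstring(xml_text)
    # Re-create the root in the new namespace (the one ending in "#").
    new_root = ET.Element(f"{{{NEW_NS}}}{local_name(old_root.tag)}")
    # The plain id attribute becomes rdf:ID, an attribute in the RDF namespace.
    new_root.set(f"{{{RDF}}}ID", old_root.get("id"))
    for child in old_root:
        prop = ET.SubElement(new_root, f"{{{NEW_NS}}}{local_name(child.tag)}")
        prop.text = child.text
    return new_root


ET.register_namespace("rdf", RDF)
ET.register_namespace("", NEW_NS)  # write river terms with no prefix
rdf_root = to_rdf(SOURCE)
print(ET.tostring(rdf_root, encoding="unicode"))
```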
The RDF Format
1. RDF provides an ID attribute for identifying the resource being described.
2. The ID attribute is in the RDF namespace.
3. Add the "fragment identifier symbol" (#) to the namespace.
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
The RDF Format (cont.)
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
1. River identifies the type (class) of the resource being described.
2. rdf:ID="Yangtze" identifies the resource being described. This resource is an instance of River.
3. length, startingLocation and endingLocation are properties, or attributes, of the type (class).
4. The element contents are the values of the properties.
Namespace Convention (Best Practice) Question: Why was "#" placed at the end of the namespace, e.g., xmlns="http://www.geodesy.org/river#"? Answer: RDF is very concerned with uniquely identifying things: uniquely identifying the type (class) and uniquely identifying the properties. If we concatenate the namespace with the type, we get a unique identifier for the type, e.g., http://www.geodesy.org/river#River. If we concatenate the namespace with a property, we get a unique identifier for the property, e.g., http://www.geodesy.org/river#length, http://www.geodesy.org/river#startingLocation, http://www.geodesy.org/river#endingLocation. Thus, the "#" symbol is simply a mechanism for separating the namespace from the type name and the property names.
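The concatenation rule is easy to see in a few lines of Python. This is a sketch: the variable names are illustrative, and `rpartition` is just one way to split the URI back apart at the "#".

```python
RIVER_NS = "http://www.geodesy.org/river#"

# Namespace + local name gives a globally unique identifier.
type_uri = RIVER_NS + "River"
property_uris = [RIVER_NS + p for p in ("length", "startingLocation", "endingLocation")]

# Because of the "#", the split point can be recovered later.
namespace, _, local = type_uri.rpartition("#")
print(namespace, local)  # http://www.geodesy.org/river River

# Without the "#", the boundary between namespace and name disappears:
print("http://www.geodesy.org/river" + "length")  # http://www.geodesy.org/riverlength
```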
rdf:ID • The value of rdf:ID is a "relative URI". • The "complete URI" is obtained by concatenating the URL of the XML document with "#" and then the value of rdf:ID, e.g., for Yangtze.rdf:
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
Suppose that this RDF/XML document is located at this URL: http://www.china.org/geography/rivers. Then the complete URI for this resource is: http://www.china.org/geography/rivers#Yangtze
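Python's standard `urljoin` performs this resolution. The sketch below treats rdf:ID="X" as the relative reference "#X". The helper `resource_uri` and the mirror URL are hypothetical. The second call shows why depending on the document's location is fragile: the same file, copied elsewhere, silently gives the river a different identity.

```python
from urllib.parse import urljoin


def resource_uri(document_url: str, rdf_id: str) -> str:
    """rdf:ID="X" acts like the relative reference "#X", resolved against the document URL."""
    return urljoin(document_url, "#" + rdf_id)


print(resource_uri("http://www.china.org/geography/rivers", "Yangtze"))
# http://www.china.org/geography/rivers#Yangtze

# The same file copied elsewhere (hypothetical mirror URL) gets a new identity:
print(resource_uri("http://mirror.example.net/copy.rdf", "Yangtze"))
# http://mirror.example.net/copy.rdf#Yangtze
```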
xml:base • On the previous slide we showed how the URL of the document provided the base URI. • Relying on the location of the document is brittle: the URI will break if the document is moved or copied to another location. • A more robust solution is to specify the base URI in the document, e.g.,
<?xml version="1.0"?>
<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#"
       xml:base="http://www.china.org/geography/rivers">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
Resource URI = concatenation(xml:base, '#', rdf:ID) = concatenation(http://www.china.org/geography/rivers, '#', "Yangtze") = http://www.china.org/geography/rivers#Yangtze
rdf:about • Instead of identifying a resource with a relative URI (which then requires a base URI to be prepended), we can give the complete identity of a resource. In that case we use rdf:about rather than rdf:ID, e.g.,
<?xml version="1.0"?>
<River rdf:about="http://www.china.org/geography/rivers#Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
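The three options (document URL, xml:base, rdf:about) can be folded into one small resolver. This Python sketch is a simplification. The function `subject_uri` is hypothetical, it only reads xml:base from the element itself (real RDF/XML parsers also inherit it from ancestors), and it does not handle blank nodes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"  # how ElementTree names xml:base


def subject_uri(elem: ET.Element, document_url: str) -> str:
    """Work out which resource an RDF/XML node element describes."""
    base = elem.get(XML_BASE, document_url)  # xml:base overrides the document URL
    about = elem.get(f"{{{RDF}}}about")
    if about is not None:
        return urljoin(base, about)  # a complete URI is returned unchanged
    rdf_id = elem.get(f"{{{RDF}}}ID")
    if rdf_id is not None:
        return urljoin(base, "#" + rdf_id)
    raise ValueError("no rdf:about or rdf:ID (a blank node, not handled here)")


WITH_BASE = """<River rdf:ID="Yangtze"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://www.geodesy.org/river#"
  xml:base="http://www.china.org/geography/rivers"/>"""

WITH_ABOUT = """<River rdf:about="http://www.china.org/geography/rivers#Yangtze"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://www.geodesy.org/river#"/>"""

# Both print http://www.china.org/geography/rivers#Yangtze, wherever the file lives.
for text in (WITH_BASE, WITH_ABOUT):
    print(subject_uri(ET.fromstring(text), "http://mirror.example.net/copy.rdf"))
```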
The RDF Format
<?xml version="1.0"?>
<Class rdf:ID="Resource"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="uri">
  <property>value</property>
  <property>value</property>
  ...
</Class>
Advantages of using the RDF Format • Interoperability • You may ask: "Why should I bother designing my XML to be in the RDF format?" • Answer: there are numerous benefits: • The RDF format, if widely used, will help to make XML more interoperable: • Tools can instantly characterize the structure: "this element is a type (class), and here are its properties". • The RDF format gives you a structured approach to designing your XML documents: it is a regular, recurring pattern. • It enables you to quickly identify weaknesses and inconsistencies in non-RDF-compliant XML designs. It helps you to better understand your data! • You reap the benefits of both worlds: • You can use standard XML editors and validators to create, edit, and validate your XML. • You can use RDF tools to apply inferencing to the data. • It positions your data for the Semantic Web! (Network effect)
Disadvantage of using the RDF Format • Constrained: the RDF format constrains you on how you design your XML (i.e., you can't design your XML in any arbitrary fashion). • RDF uses namespaces to uniquely identify types (classes), properties, and resources. Thus, you must have a solid understanding of namespaces. • Another XML vocabulary to learn: to use the RDF format you must learn the RDF vocabulary.
Triple: resource/property/value
• http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#length (property) of 6300 kilometers (value)
• http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#startingLocation (property) of western China's ... (value)
• http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#endingLocation (property) of East China Sea (value)
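The triples above can be read straight out of the document. This Python sketch assumes the rdf:about form of Yangtze.rdf (so no base URI is needed) and handles only flat, literal-valued properties. It also emits the rdf:type triple that the River element implies, which the list above leaves out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<?xml version="1.0"?>
<River rdf:about="http://www.china.org/geography/rivers#Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>"""


def expand(tag: str) -> str:
    """ElementTree writes '{ns}name'; RDF uses the plain concatenation 'nsname'.
    Assumes the tag is namespaced."""
    ns, _, name = tag[1:].partition("}")
    return ns + name


def triples(xml_text: str) -> list[tuple[str, str, str]]:
    root = ET.fromstring(xml_text)
    subject = root.get(f"{{{RDF}}}about")
    # The element name River says what type (class) the resource has.
    result = [(subject, RDF + "type", expand(root.tag))]
    for prop in root:  # each child element is a property with a literal value
        result.append((subject, expand(prop.tag), prop.text))
    return result


for s, p, o in triples(DOC):
    print(s, "has a", p, "of", o)
```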
The RDF Format = triples! • The fundamental design pattern of RDF is to structure your XML data as resource/property/value triples!
<?xml version="1.0"?>
<Resource-A>
  <property-A>
    <Resource-B>
      <property-B>
        <Resource-C>
          <property-C>Value-C</property-C>
        </Resource-C>
      </property-B>
    </Resource-B>
  </property-A>
</Resource-A>
Notice that the RDF design pattern is an alternating sequence of resource and property elements. This pattern is known as "striping". The value of a property can be a literal (e.g., length has a value of 6300 kilometers). The value of a property can also be a resource, as shown above: the value of property-A is Resource-B, and the value of property-B is Resource-C.
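Striping maps naturally onto two mutually recursive functions: one for resource (node) elements and one for property elements. In this Python sketch the `ex:` namespace and the rdf:about values a, b, c are hypothetical additions, since the pattern above has neither. The sketch assumes each property element holds either text or exactly one nested resource.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://www.example.org/demo#"  # hypothetical namespace for the example

STRIPED = f"""<ex:Resource-A rdf:about="{EX}a" xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <ex:property-A>
    <ex:Resource-B rdf:about="{EX}b">
      <ex:property-B>
        <ex:Resource-C rdf:about="{EX}c">
          <ex:property-C>Value-C</ex:property-C>
        </ex:Resource-C>
      </ex:property-B>
    </ex:Resource-B>
  </ex:property-A>
</ex:Resource-A>"""


def expand(tag: str) -> str:
    ns, _, name = tag[1:].partition("}")
    return ns + name


def walk_resource(node: ET.Element, out: list) -> str:
    """Resource stripe: record its type, then walk each property child."""
    subject = node.get(f"{{{RDF}}}about")
    out.append((subject, RDF + "type", expand(node.tag)))
    for prop in node:
        walk_property(subject, prop, out)
    return subject


def walk_property(subject: str, prop: ET.Element, out: list) -> None:
    """Property stripe: the value is a nested resource or literal text."""
    children = list(prop)
    if children:  # the stripe continues: the value is a resource
        obj = walk_resource(children[0], out)
    else:         # end of the stripe: the value is a literal
        obj = (prop.text or "").strip()
    out.append((subject, expand(prop.tag), obj))


triples = []
walk_resource(ET.fromstring(STRIPED), triples)
for t in triples:
    print(t)
```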
RDF Model (graph) [Figure: the example drawn as an RDF graph. Legend: an ellipse indicates a resource; a rectangle indicates a literal string value.]
Terminology • As you read the RDF literature you may see the following terminology, which is equivalent to resource/property/value: • Subject: the item playing the role of the resource. • Predicate: the item playing the role of the property. • Object: the item playing the role of the value.
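The subject/predicate/object terminology is exactly the layout of N-Triples, the W3C line-based RDF syntax: each line holds a subject, a predicate, an object and a final ".". This Python sketch writes the Yangtze triples that way. It covers only URIs and plain string literals, and the helper names are hypothetical.

```python
def nt_term(value: str, is_literal: bool) -> str:
    """One N-Triples term: <uri> for resources, "..." for literals."""
    if not is_literal:
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """triples: iterable of (subject URI, predicate URI, object, object_is_literal)."""
    return "\n".join(
        f"{nt_term(s, False)} {nt_term(p, False)} {nt_term(o, lit)} ."
        for s, p, o, lit in triples
    )


YANGTZE = "http://www.china.org/geography/rivers#Yangtze"
RIVER = "http://www.geodesy.org/river#"
print(to_ntriples([
    (YANGTZE, RIVER + "length", "6300 kilometers", True),
    (YANGTZE, RIVER + "endingLocation", "East China Sea", True),
]))
```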
RDF Parser • There is a useful RDF parser on the W3C Web site: http://www.w3.org/RDF/Validator/ • This RDF parser will tell you whether your XML is in the proper RDF format.
What is missing from RDF? • Schema support, which enables reasoning • Solution: use RDFS (RDF Schema)
RDF Schema (RDFS) • RDF gives a formalism for metadata annotation, and a way to write it down in XML, but it does not give any special meaning to vocabulary such as subClassOf or type • RDF Schema allows you to define vocabulary terms and the relations between those terms • It gives "extra meaning" to particular RDF predicates and resources • This "extra meaning", or semantics, specifies how a term should be interpreted
RDF Schema • Extension to RDF to allow definition of application-specific classes and properties • Provides a framework to describe such Classes - similar to OOP • Allows instances and subclasses of classes.
RDF Schema is about creating Taxonomies! [Figure: class hierarchy with the classes NaturallyOccurringWaterSource, BodyOfWater, Stream, Brook, River, Ocean, Tributary, Lake, Sea and Rivulet.] Properties: • length: Literal • emptiesInto: BodyOfWater
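Here is a sketch of the kind of inference RDFS enables, written in Python without an RDF library. Only the class names survive from the figure, so the parent links below are an assumption about how it was drawn (NaturallyOccurringWaterSource at the top, with BodyOfWater and Stream under it). Each class is also given a single parent, whereas RDFS allows several. The two functions mimic RDFS entailment for rdfs:subClassOf (an instance of a class is an instance of its superclasses) and rdfs:range (the object of emptiesInto is a BodyOfWater).

```python
# Assumed reading of the taxonomy figure: child class -> parent class.
SUBCLASS_OF = {
    "BodyOfWater": "NaturallyOccurringWaterSource",
    "Stream": "NaturallyOccurringWaterSource",
    "Ocean": "BodyOfWater",
    "Lake": "BodyOfWater",
    "Sea": "BodyOfWater",
    "River": "Stream",
    "Brook": "Stream",
    "Tributary": "Stream",
    "Rivulet": "Brook",
}

# From "emptiesInto: BodyOfWater" on the slide.
RANGE = {"emptiesInto": "BodyOfWater"}


def inferred_types(asserted_class: str) -> list[str]:
    """subClassOf rule: an instance of C is also an instance of every superclass of C."""
    types = [asserted_class]
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types


def types_from_range(predicate: str) -> list[str]:
    """range rule: whatever appears as the object of the predicate gets the range class."""
    return inferred_types(RANGE[predicate])


print(inferred_types("River"))
# ['River', 'Stream', 'NaturallyOccurringWaterSource']
print(types_from_range("emptiesInto"))  # e.g. for a hypothetical "Yangtze emptiesInto EastChinaSea"
# ['BodyOfWater', 'NaturallyOccurringWaterSource']
```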