An Introduction to RDF and the Semantic Web. Dr. Randy Kaplan. Resource Description Framework (RDF): the least understood standard to come from the W3C, yet it may be the most powerful and the most important if the web is to achieve its potential.
An Introduction to RDF and the Semantic Web • Dr. Randy Kaplan
Resource Description Framework • RDF • Least understood standard to come from the W3C • May be the most powerful and the most important • Needed in order for the web to achieve its potential
Resource Description Framework • Why RDF? • With HTML and XML we can swap our documents easily • No meaning is attached to them - they are just data • RDF addresses the problem of meaning in the data on the web
What We Need To Know • When we exchange data we need to know things like: • Who wrote the data • When was the data written • When was the data last updated • These pieces of data are not data per se but data about the data, or meta-data
XML • Promised to deliver us from the unstructured data that makes up the Internet • XML brings structure to the data • Because HTML combined the appearance of the document with its content, the content was extremely hard to extract • XML separated content from presentation
XML • XML specifically dealt with the data of the content
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <composer>Mozart</composer>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
XML • We could convey some of the same information with different data
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <author>Mozart</author>
</document>
XML • What if we wanted to find all pieces of music composed by Mozart? • We would have to find all documents where the <composer> element had a value of ‘Mozart’. • We would also have to find all documents where the <author> element had a value of ‘Mozart’.
XML • If another element were used to denote the creator of the music, that term would have to be searched for as well • To find all compositions written by Mozart without first identifying every element that designates the creator, all documents would have to use the same term for the creator
XML • This problem could also be solved by declaring that composer in one document means the same thing as written by in another and created by in a third • This would be quite an undertaking, though, as it involves identifying all words and phrases in all languages that have this meaning
Missing • The ability to know that two or more terms mean the same thing is what is missing from the Internet • If we can build this layer into the Internet, it will take the information to a fundamentally different level
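The search problem described above can be sketched in Python. This is only an illustration: the `works_by` helper and the `CREATOR_TAGS` list are hypothetical names, and the two documents are the samples from the earlier slides written with straight quotes. The point is that every new synonym for "creator" has to be added to the list by hand.

```python
import xml.etree.ElementTree as ET

# The two sample documents from the slides.
DOCS = [
    '<music genre="classical"><title>Eine Kleine Nacht Muzik</title>'
    '<composer>Mozart</composer><key>E Flat</key><tempo>2/4</tempo></music>',
    '<document type="classical music"><name>Eine Kleine Nacht Muzik</name>'
    '<author>Mozart</author></document>',
]

# Hypothetical, hand-maintained list of tags that all mean "creator".
CREATOR_TAGS = ["composer", "author"]


def works_by(creator, docs, creator_tags):
    """Return the root elements of documents that name `creator` in any known tag."""
    matches = []
    for text in docs:
        root = ET.fromstring(text)
        for tag in creator_tags:
            if any(element.text == creator for element in root.iter(tag)):
                matches.append(root)
                break  # one match per document is enough
    return matches


found = works_by("Mozart", DOCS, CREATOR_TAGS)
print([root.tag for root in found])  # ['music', 'document']
```

Searching with only `["composer"]` finds the first document and silently misses the second.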
Dublin Core • 1995 • Conference in Dublin, Ohio • Discussed issues of semantics • Agreed on a core set of properties common to all documents • This set of properties became known as the Dublin Core (DC) initiative
Dublin Core • Three examples of core properties • DC.Title • DC.Creator • DC.Subject • 15 core properties were defined in the original Dublin Core set
Dublin Core • The Dublin Core can be applied to XML
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <Creator>Mozart</Creator>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <Creator>Mozart</Creator>
</document>
Dublin Core • Even though we have now used the same element to identify the entity responsible for creating the work, we don't know if the meaning of "Creator" is the same in both of these instances • The only way to be sure is to use a very precise mechanism to identify the element being used
Dublin Core • The Dublin Core can be applied to XML
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <dc:Creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mozart</dc:Creator>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <dc:Creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mozart</dc:Creator>
</document>
• Because both elements are qualified by the same namespace URI, we can now see that they refer to exactly the same concept
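A minimal sketch of why the namespace makes the two elements comparable, using Python's standard `xml.etree.ElementTree`. The helper name `creator` is hypothetical, and the second document deliberately uses a different prefix (`meta`, an assumption made for illustration) to show that the URI, not the prefix, identifies the element.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

MUSIC = (
    f'<music genre="classical" xmlns:dc="{DC}">'
    "<title>Eine Kleine Nacht Muzik</title>"
    "<dc:Creator>Mozart</dc:Creator>"
    "</music>"
)

# Same namespace URI bound to a different prefix (illustrative assumption).
DOCUMENT = (
    f'<document type="classical music" xmlns:meta="{DC}">'
    "<name>Eine Kleine Nacht Muzik</name>"
    "<meta:Creator>Mozart</meta:Creator>"
    "</document>"
)


def creator(xml_text):
    """Return (qualified tag, text) of the Dublin Core Creator child, if any."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace-uri}localname".
    element = root.find(f"{{{DC}}}Creator")
    return None if element is None else (element.tag, element.text)


print(creator(MUSIC) == creator(DOCUMENT))  # True
```

A single lookup on the qualified name works for both documents, so no list of synonyms is needed.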
CD Database • Suppose you keep a small database of CDs on your computer • There is a table in the database as below
Another CD Database • There is a second database kept by another person who has a CD collection • A table in the database is shown below
Comparing Databases • Exchanging Information • If we wanted to share information there would be a problem, since the column names are different • The same solution we used in the XML can be used in the database - the unique identifier
URIs • Uniform Resource Identifiers (URIs) give us a way to ensure that a column of data means the same thing in both databases, so long as the column is labeled with the same URI
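As a rough sketch of this idea, each database can keep its own column names locally and publish a mapping from those names to shared URIs. The column names and rows below are hypothetical, since the original tables are not reproduced here, and `relabel` is an illustrative helper. The URIs are the Dublin Core element URIs, whose published element names are lowercase.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical column names for the two CD databases.
MY_COLUMNS = {"Album": DC + "title", "Composer": DC + "creator"}
THEIR_COLUMNS = {"CD Name": DC + "title", "Written By": DC + "creator"}


def relabel(row, column_uris):
    """Replace each local column name in a row with its shared URI."""
    return {column_uris[name]: value for name, value in row.items()}


mine = relabel(
    {"Album": "Eine Kleine Nacht Muzik", "Composer": "Mozart"}, MY_COLUMNS
)
theirs = relabel(
    {"CD Name": "Eine Kleine Nacht Muzik", "Written By": "Mozart"}, THEIR_COLUMNS
)

# After relabeling, both rows use the same keys and can be compared directly.
print(mine.keys() == theirs.keys())  # True
```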
Other Problems • Unfortunately when we look at the databases we notice some other problems
Other Problems • Problem 1 • Albums which may be the same have different names • Problem 2 • Different names are used to denote the same composers
Taxonomies • These problems can be solved through the use of taxonomy • A taxonomy is a - • Controlled vocabulary of words • Usually about a constrained topic • Unique identifiers are key to developing taxonomies
Taxonomies • If we devised a controlled classification list for assigning each CD to a genre, we would avoid problems like having one CD labeled as classical and another CD labeled as classic
Taxonomies • CD Taxonomy • Jazz • Classical • Soul • Pop • Hip Hop • Folk
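A controlled vocabulary like this can be enforced with a small lookup. The sketch below is one possible approach, not a standard API: `normalize_genre` and the `ALIASES` table are hypothetical, and the alias entries are assumptions chosen to match the classical/classic example.

```python
# The controlled vocabulary from the CD taxonomy.
GENRES = {"Jazz", "Classical", "Soul", "Pop", "Hip Hop", "Folk"}

# Hypothetical table of known variants mapped to the controlled term.
ALIASES = {"classic": "Classical", "hip-hop": "Hip Hop"}


def normalize_genre(label):
    """Map a free-text genre label to a term in the controlled vocabulary."""
    key = label.strip().lower()
    by_lowercase = {genre.lower(): genre for genre in GENRES}
    if key in by_lowercase:
        return by_lowercase[key]
    if key in ALIASES:
        return ALIASES[key]
    raise ValueError(f"{label!r} is not in the CD taxonomy")


print(normalize_genre("classic"))      # Classical
print(normalize_genre(" classical "))  # Classical
```

Rejecting unknown labels, rather than storing them, is what keeps the vocabulary controlled.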
Taxonomies • We are not limited to taxonomies of music • We could have type of performance, e.g., play, movie, live performance, etc.
Moving the Problem • We really didn’t solve the problem we described earlier • We only moved the problem up a level • We now have the problem of having more than one taxonomy for the same thing
Moving the Problem • Consider • http://taxonomies.org/Plays/PorgyAndBess • http://taxonomies.org/Albums/PorgyAndBess • We do not know whether the PorgyAndBess in the first reference is the same as the PorgyAndBess in the second reference
We Need An Authority Figure • Let us imagine that there is some authority that keeps track of all CDs that are released • This is similar to books and their ISBNs, which are unique • We will call the fictitious authority MuzicBiz.org • MuzicBiz.org maintains a central database of CDs that have been released
Unique Identifiers • Since we are guaranteed that these identifiers ALWAYS refer to the same CD, any table row having a specific key will ALWAYS refer to the same CD - there is NO reason to doubt this • Data validity is enforced
Meta-Data • Meta-Data • Data that describes data • Creator, Type, Date are all kinds of meta-data • So far the meta-data we have described consists of two values - an attribute name and an attribute value
Meta-Data • To be precise, we need to add one more piece of information to complete any meta-data we might have • A pair such as Creator = Mozart does not say what Mozart is the creator of, so we also need to identify the thing being described - the so-called DOCUMENT
Triples • The combination of Source, Attribute name, and Value makes what is called in the RDF-biz a TRIPLE and that constitutes a fundamental element in RDF
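A short sketch of why the source is needed. The `Triple` type is an illustrative name, and the second album URI is hypothetical. Name/value pairs alone cannot distinguish two albums by the same creator, while triples can.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    source: str     # what the statement is about
    attribute: str  # attribute name
    value: str      # attribute value


ALBUM_A = "http://MuzicBiz.org/Albums/7655432"
ALBUM_B = "http://MuzicBiz.org/Albums/0000001"  # hypothetical identifier

# Without a source, the two statements collapse into one indistinguishable pair.
pairs = {("Creator", "Mozart"), ("Creator", "Mozart")}

# With a source, each statement stays attached to the album it describes.
triples = {
    Triple(ALBUM_A, "Creator", "Mozart"),
    Triple(ALBUM_B, "Creator", "Mozart"),
}

print(len(pairs), len(triples))  # 1 2
```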
Transporting Triples • We will assume the following - • Meta-data can be expressed as a set of triples • Key to sharing meta-data is the URI • Now given that we accept this representation, the next challenge is to decide how we will share this information (transport)
Sharing Meta-Data and Data • The database contains the information as organized in the table above • We need to transform this data into the accepted form, i.e., triples
Sharing Data and Meta-Data • We have adequately represented the meta-data and it is “ready” for transport via XML • But this table only represents the meta-data and does not relate to any data described by it
Sharing Data and Meta-Data • We need a way to identify the document that the meta-data describes • For this purpose we add a name/value pair that names the URL of the document
Sharing Data and Meta-Data
<document type="News Item"
    url="http://www.ePolitix.com/Articles/0000005a4787.htm"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:Title>I will stand says Portillo</dc:Title>
  <dc:Creator>Craig Hoy</dc:Creator>
  <dc:Subject>Tory leadership contest</dc:Subject>
</document>
RDF: Model and Syntax • RDF Model • In this case the model we are speaking of is the set of triples • The definition of RDF is representation independent • This means that XML is only one way of writing RDF
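To connect this XML back to triples, the sketch below reads the document and emits one (source, attribute, value) triple per child element, using the `url` attribute as the source. The function name `to_triples` is hypothetical, and this is plain XML processing with Python's standard library rather than an RDF parser.

```python
import xml.etree.ElementTree as ET

XML = """<document type="News Item"
    url="http://www.ePolitix.com/Articles/0000005a4787.htm"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:Title>I will stand says Portillo</dc:Title>
  <dc:Creator>Craig Hoy</dc:Creator>
  <dc:Subject>Tory leadership contest</dc:Subject>
</document>"""


def to_triples(xml_text):
    """Turn each child element into a (source, attribute URI, value) triple."""
    root = ET.fromstring(xml_text)
    source = root.get("url")
    triples = []
    for child in root:
        tag = child.tag
        if tag.startswith("{"):
            # ElementTree spells qualified names as "{namespace-uri}localname".
            namespace, _, local = tag[1:].partition("}")
            tag = namespace + local
        triples.append((source, tag, child.text))
    return triples


for triple in to_triples(XML):
    print(triple)
```

Each printed triple carries the article URL as its source, so the meta-data is no longer detached from the document it describes.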
RDF Terminology • In RDF terminology a STATEMENT is used to describe a triple • This term arises from using a triple to make a statement about a document
RDF Terminology • Triples • Resources and Properties • In the RDF specification the name part of the name/value pair is regarded as a PROPERTY • The subject of the meta-data is regarded as a RESOURCE
RDF Terminology • Triples • A triple is the combination of the three parts - a resource with a property and a value
RDF Terminology • A triple can express a relationship between resources • http://MuzicBiz.org/Albums/7655432 --Track--> http://MuzicBiz.org/Tracks/1667653
RDF Terminology • http://MuzicBiz.org/Albums/7655432 --Track--> http://MuzicBiz.org/Tracks/1667653 • In this model, the album is the SUBJECT of our statement and the track is the OBJECT • The two resources are joined by a PREDICATE (here, Track) • The predicate specifies the nature of the relationship between the two resources
RDF Terminology • Notation • When writing about RDF it is useful to be able to show statements or sets of triples for discussion
Notation • English • English is simplest • Craig Hoy is the author of http://www.ePolitix.com/Articles/0000005a4787.htm