An Introduction to Linked Data, Its Applications and Challenges • Samad Paydar (samad.paydar@stu-mail.um.ac.ir) • WTLab Research Group, Ferdowsi University of Mashhad • 2nd October 2009
Outline • The Web of Documents vs. the Web of Data • Linked Data • Linking Open Data Project • Linked Data Technology Stack • Linked Data Applications • Outlook • Similar Developments • Challenges
The Web of Documents • Also known as the traditional Web or the hypertext Web • Analogy: a global filesystem • Designed for: human consumption • Primary objects: documents • Links: untyped, between documents (or parts of documents) • Degree of structure in objects: fairly low • Semantics of content and links: implicit
The Web of Documents: Challenges • The Web has radically altered the way people share knowledge • By lowering the barrier to publishing and accessing documents • But the same is not yet true for applications and data • Traditionally, data on the Web is published in formats such as HTML tables, CSV or XML files, … • Much of the structure and semantics of the data is lost in these formats.
The Web of Documents: Challenges • Data integration • “Show me all the publications from Semantic Web-related conferences in 2007” • Querying across data sources • “Which WWW2008 papers have been written by people from companies of fewer than 100 people?” • Note that all the data required to answer these questions might already be available on the Web, but spread across separate sources.
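A minimal sketch of why the second question becomes answerable once data from separate sources shares one simple model. All identifiers and figures below (`paper:1`, `org:smallco`, the employee counts, the predicate names) are invented for illustration and do not come from any real dataset.

```python
# Hypothetical triples from two independent sources (all names are made up).
CONFERENCE_DATA = [
    ("paper:1", "presentedAt", "conf:WWW2008"),
    ("paper:1", "author", "person:alice"),
    ("paper:2", "presentedAt", "conf:WWW2008"),
    ("paper:2", "author", "person:bob"),
]
COMPANY_DATA = [
    ("person:alice", "worksFor", "org:smallco"),
    ("person:bob", "worksFor", "org:bigco"),
    ("org:smallco", "employees", 40),
    ("org:bigco", "employees", 5000),
]


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def papers_from_small_companies(triples, conference, max_employees=100):
    """Papers at `conference` with an author employed by a small company."""
    result = []
    for paper, predicate, conf in triples:
        if predicate != "presentedAt" or conf != conference:
            continue
        for author in objects(triples, paper, "author"):
            for org in objects(triples, author, "worksFor"):
                sizes = objects(triples, org, "employees")
                if any(size < max_employees for size in sizes) and paper not in result:
                    result.append(paper)
    return result


# Integration is just concatenation, because both sources use the same model.
merged = CONFERENCE_DATA + COMPANY_DATA
print(papers_from_small_companies(merged, "conf:WWW2008"))  # ['paper:1']
```

With only one of the two sources the query returns nothing. The answer exists only after the sources are combined, which is the point of the slide.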
The Web of Data • Analogy: a global data space • Designed for: machines first, humans later • Primary objects: things (or descriptions of things) • Links: typed, between things • Degree of structure in objects: high • Semantics of content and links: explicit
Linked Data • Is about using the Web to create typed links between data from different sources • Refers to data published on the Web in such a way that • It is machine-readable • Its meaning is explicitly defined • It is linked to other datasets • It can be linked to from external datasets
Linked Data and Web of Data • The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions - the Web of Data.
Properties of the Web of Data • It is generic • Can contain any type of data • Data about anything • Anyone can publish data • No constraints on the choice of vocabularies • Entities are connected by RDF links
LOD Project • Linking Open Data Project • A community project • Founded in January 2007 • Supported by the W3C Semantic Web Education and Outreach Group • Goal: to bootstrap the Web of Data by identifying existing datasets that are available under open licenses, converting them to RDF (according to the Linked Data principles), interlinking them with other datasets, and publishing them on the Web
LOD Cloud • The image shows only datasets that are published according to the Linked Data principles and are interlinked with at least one other dataset in the cloud • Each circle represents a dataset • The size of a circle corresponds to the number of triples • Arrows represent the links between datasets • The thickness of an arrow indicates the number of links between the two datasets • Some datasets act as hubs • E.g. DBpedia, Geonames, …
DBpedia • Extracts structured information from Wikipedia and makes it available on the Web under an open license
Geonames • Contains over eight million geographical names • 6.5 million unique features • 2.2 million populated places and 1.8 million alternate names • Features are categorized into one of nine feature classes • and further subcategorized into one of 645 feature codes
LOD Cloud • The content of the cloud is diverse • Data about geographic locations, people, companies, books, scientific publications, films, music, TV programs, genes, proteins, … • Some statistics • The Web of Data currently consists of 4.7 billion RDF triples, interlinked by around 142 million RDF links (May 2009)
A Programmer’s Point of View • Semantic technologies like Linked Data decouple applications from data through the use of a simple, abstract data model • Any application that understands the model can consume any data source published according to the model
Don’t Miss Books • To really get a feel for it, I recommend studying
Linked Data Principles • Berners-Lee, 2006 • Use URIs as names for things • Use HTTP URIs so that people can look up those names • When someone looks up a URI, provide useful information • Include links to other URIs, so that they can discover more things
URI: Uniform Resource Identifier • “URI provides a simple and extensible means for identifying a resource” RFC 3986 • URL: for documents and other entities that can be located on the Web • URI is a more generic means to identify any entity existing in the world
HTTP • Provides URI dereferencing: a simple mechanism for retrieving • Resources that can be serialized as a stream of bytes (e.g. a picture of a dog) • Descriptions of entities that cannot themselves be sent across the network (e.g. the dog itself)
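A minimal sketch of dereferencing with Python's standard `urllib.request`. It assumes the server supports content negotiation, so that an `Accept` header asking for RDF returns a machine-readable description instead of an HTML page. The slides do not mention this detail, and the choice of `application/rdf+xml` and the helper names are assumptions made here for illustration.

```python
import urllib.request


def build_rdf_request(uri, accept="application/rdf+xml"):
    """Build (but do not send) a GET request that asks for an RDF description."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def dereference(uri, timeout=10):
    """Send the request and return (content type, raw bytes). Needs network access."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()


# Building the request has no side effects; dereference() is what goes online.
request = build_rdf_request("http://dbpedia.org/resource/Berlin")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `dereference(...)` on a URI that names a thing (a city, a dog) would return a description of that thing, not the thing itself, which matches the distinction drawn on the slide.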
RDF • HTML provides a means to structure and link documents • RDF provides a generic, graph-based data model to structure and link data that describes things • A triple [subject, predicate, object] • Subject: a URI • Object: a URI or a string literal • Predicate: a URI
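The triple model can be sketched in a few lines of plain Python. This is an illustration of the data model only, not an RDF library: the `Triple` and `Literal` types and the `match` helper are names invented here, and the two statements about Berlin are example data.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A string literal, kept distinct from a URI (which is a plain str here)."""
    value: str


class Triple(NamedTuple):
    subject: str                # a URI
    predicate: str              # a URI
    obj: Union[str, Literal]    # a URI or a string literal


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
COUNTRY = "http://dbpedia.org/ontology/country"
BERLIN = "http://dbpedia.org/resource/Berlin"
GERMANY = "http://dbpedia.org/resource/Germany"

# A graph is simply a collection of triples.
GRAPH = [
    Triple(BERLIN, RDFS_LABEL, Literal("Berlin")),
    Triple(BERLIN, COUNTRY, GERMANY),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


for triple in match(GRAPH, subject=BERLIN):
    print(triple.predicate, "->", triple.obj)
```

Because every statement has the same three-part shape, one generic `match` function can query data about any subject matter.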
RDF Link • RDF links take the form of RDF triples in which the subject is a URI reference in the namespace of one dataset and the object is a URI reference in the namespace of another • Example • S: http://data.linkedmdb.org/resource/film/77 • P: http://www.w3.org/2002/07/owl#sameAs • O: http://dbpedia.org/resource/Pulp_Fiction_%28film%29 • RDF links allow client applications to navigate between data sources to discover additional data
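A rough sketch of how a client might use such a link to gather data from both sources. The three URIs are the ones on the slide. The two in-memory "datasets" and the `http://example.org/vocab/describedBy` predicate are hypothetical stand-ins for what dereferencing each URI would return.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LMDB_FILM = "http://data.linkedmdb.org/resource/film/77"
DBPEDIA_FILM = "http://dbpedia.org/resource/Pulp_Fiction_%28film%29"
DESCRIBED_BY = "http://example.org/vocab/describedBy"  # hypothetical predicate

# Hypothetical contents of the two datasets.
LINKEDMDB = [
    (LMDB_FILM, OWL_SAME_AS, DBPEDIA_FILM),   # the RDF link from the slide
    (LMDB_FILM, DESCRIBED_BY, "LinkedMDB"),
]
DBPEDIA = [
    (DBPEDIA_FILM, DESCRIBED_BY, "DBpedia"),
]


def describe(uri, datasets):
    """Collect (predicate, object) pairs for uri and for every URI that is
    connected to it through owl:sameAs links, in either direction."""
    seen, todo, facts = set(), [uri], []
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for dataset in datasets:
            for s, p, o in dataset:
                if s == current and p == OWL_SAME_AS:
                    todo.append(o)            # follow the link forwards
                elif o == current and p == OWL_SAME_AS:
                    todo.append(s)            # follow the link backwards
                elif s == current:
                    facts.append((p, o))
    return facts


for predicate, value in describe(LMDB_FILM, [LINKEDMDB, DBPEDIA]):
    print(predicate, "->", value)
```

Starting from the LinkedMDB URI alone, the client ends up with statements from both datasets. Without the `owl:sameAs` triple it would see only the LinkedMDB statement.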
RDFS / OWL • Provide a basis for creating vocabularies that can be used to describe entities in the world and how they are related
Linked Data • Linked Data employs • HTTP URIs to identify resources • HTTP Protocol to retrieve resources • RDF data model to represent resources • Therefore, it is built on the general architecture of the Web
Current Applications • Numerous efforts are underway to research and build applications that exploit this Web of Data. At present, these efforts can be broadly classified into three categories: • Linked Data browsers • Linked Data search engines and indexes • Domain-specific Linked Data applications
Linked Data Applications • Linked Data Browsers • Browse things, not just documents • Browse and navigate between data • E.g. Disco, Tabulator, Marbles
Data about Berlin on DBpedia is linked to data about Berlin on Geonames
Linked Data Search Engines and Indexes • Crawl Linked Data from the Web and provide query capabilities over the aggregated data • Human-oriented • E.g. Falcons, SWSE • Application-oriented • E.g. Swoogle, Watson
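The crawl-then-query idea can be sketched as a small keyword index over aggregated triples. This is a toy illustration and not how Falcons, SWSE, Swoogle or Watson are implemented. The `CRAWLED` dictionary stands in for the result of a crawl, and the second source and its URI are hypothetical.

```python
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Stand-in for crawled data: source name -> triples found at that source.
CRAWLED = {
    "dbpedia": [
        ("http://dbpedia.org/resource/Berlin", RDFS_LABEL, "Berlin"),
    ],
    "example-gazetteer": [  # hypothetical second source
        ("http://example.org/places/berlin", RDFS_LABEL, "Berlin city"),
        ("http://example.org/places/mashhad", RDFS_LABEL, "Mashhad"),
    ],
}


def build_index(crawled):
    """Map each lowercase word in a literal to the (subject, source) pairs using it."""
    index = defaultdict(set)
    for source, triples in crawled.items():
        for subject, _predicate, obj in triples:
            if obj.startswith("http"):
                continue  # treat http... objects as URIs, not searchable text
            for word in obj.lower().split():
                index[word].add((subject, source))
    return index


def search(index, keyword):
    """Return the sorted (subject, source) pairs whose literals contain keyword."""
    return sorted(index.get(keyword.lower(), set()))


INDEX = build_index(CRAWLED)
for subject, source in search(INDEX, "Berlin"):
    print(source, subject)
```

A human-oriented engine would render such results as a page of hits, while an application-oriented one would expose the same lookup as an API for other programs.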
Domain-Specific Applications • Revyu • DBpedia Mobile • Talis Aspire • BBC Programmes and BBC Music
DBpedia Mobile • Uses DBpedia, Revyu, and Flickr