Linked Data at Present: Using Linked Data
Ferdowsi University of Mashhad, Web Technology Lab. (WTLab), www.wtlab.um.ac.ir
Linked Data Group (LDG)
Mahboubeh Dadkhah, May 11, 2011
You may already know Linked Data
History
• Linked Data Design Issues by TimBL July 2006
• Linked Open Data Project WWW2007
• First LOD Cloud May 2007
• 1st Linked Data on the Web Workshop WWW2008
• 1st Triplification Challenge 2008
• How to Publish Linked Data Tutorial ISWC2008
• BBC publishes Linked Data 2008
• 2nd Linked Data on the Web Workshop WWW2009
• NY Times announcement SemTech2009 - ISWC09
• 1st Linked Data-a-thon ISWC2009
• 1st How to Consume Linked Data Tutorial ISWC2009
• Data.gov.uk publishes Linked Data 2010
• 2nd How to Consume Linked Data Tutorial WWW2010
• 1st International Workshop on Consuming Linked Data COLD2010
• …
Now that Linked Data is here, what do we do next? Let's make use of it.
Linked Data
• Before using it, we should be sure that we understand its meaning.
• What was the problem? Searching and finding. For example: search for football players who went to the University of Texas at Austin and played for the Dallas Cowboys as cornerback.
Current Web = Internet + links + docs. Why can't we find it?
So, what to do?
• Make it easy for computers/software to find THINGS
• Publish things:
  • As data
  • In a standardized way: RDF
• RDF data can be serialized in different ways:
  • RDF/XML, RDFa, N3, Turtle, JSON
[Example RDF graph: the book http://…/isbn978 (title "Programming the Semantic Web", author Toby Segaran, isbn 978-0-596-15381-6, publisher http://…/publisher1 named O'Reilly) has a review http://…/review1 (description "Awesome Book") whose reviewer http://…/reviewer is named Juan Sequeda; the reviewer is owl:sameAs http://juansequeda.com/id, which livesIn http://dbpedia.org/Austin.]
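To make the serialization point concrete, here is a minimal sketch that builds a simplified subset of the example graph above with Jena (the same library used in the query code later in this deck) and writes it out in two of the listed syntaxes. The http://example.org/... URIs and the ex: property names stand in for the elided URIs in the figure and are purely illustrative; the imports assume the Jena 2.x package names in use around 2011.

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.OWL;

public class BuildExampleGraph {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        String ex = "http://example.org/terms/";   // illustrative vocabulary namespace
        m.setNsPrefix("ex", ex);
        m.setNsPrefix("owl", OWL.getURI());

        Resource book     = m.createResource("http://example.org/isbn978");
        Resource review   = m.createResource("http://example.org/review1");
        Resource reviewer = m.createResource("http://example.org/reviewer");

        // The book and its review, as in the figure
        book.addProperty(m.createProperty(ex, "title"), "Programming the Semantic Web");
        book.addProperty(m.createProperty(ex, "author"), "Toby Segaran");
        book.addProperty(m.createProperty(ex, "isbn"), "978-0-596-15381-6");
        book.addProperty(m.createProperty(ex, "hasReview"), review);

        review.addProperty(m.createProperty(ex, "description"), "Awesome Book");
        review.addProperty(m.createProperty(ex, "hasReviewer"), reviewer);

        // The reviewer and his owl:sameAs link to another identity on the Web
        reviewer.addProperty(m.createProperty(ex, "name"), "Juan Sequeda");
        reviewer.addProperty(OWL.sameAs, m.createResource("http://juansequeda.com/id"));

        // The same triples, serialized in two of the formats mentioned above
        m.write(System.out, "TURTLE");
        m.write(System.out, "RDF/XML");
    }
}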
2009’s Top 10 Linked Data Research Issues
• Data Linking and Fusion
  • linking algorithms and heuristics, identity resolution
  • Web data integration and data fusion
  • evaluating quality and trustworthiness of Linked Data
• Linked Data Application Architectures
  • crawling, caching and querying Linked Data on the Web; optimizations, performance
  • Linked Data browsers, search engines
  • applications that exploit distributed Web datasets
• Data Publishing
  • tools for publishing large data sources as Linked Data on the Web (e.g. relational databases, XML repositories)
  • embedding data into classic Web documents (e.g. GRDDL, RDFa, Microformats)
  • licensing and provenance tracking issues in Linked Data publishing
  • business models for Linked Data publishing and consumption
2010’s Top 10 Linked Data Research Issues
• Linked Data Application Architectures
  • crawling, caching and querying Linked Data
  • dataset dynamics and synchronization
  • Linked Data mining
• Data Linking and Data Fusion
  • linking algorithms and heuristics, identity resolution
  • Web data integration and data fusion
  • link maintenance
  • performance of linking infrastructures/algorithms on Web data
• Quality, Trust and Provenance in Linked Data
  • tracking provenance and usage of Linked Data
  • evaluating quality and trustworthiness of Linked Data
  • profiling of Linked Data sources
• User Interfaces for the Web of Data
  • approaches to visualizing and interacting with distributed Web data
  • Linked Data browsers and search engines
• Data Publishing
  • tools for publishing large data sources as Linked Data on the Web (e.g. relational databases, XML repositories)
  • embedding data into classic Web documents (e.g. RDFa, Microformats)
  • describing data on the Web (e.g. voiD, semantic site maps)
  • licensing issues in Linked Data publishing
2011’s Top 10 Linked Data Research Issues
• Foundations of Linked Data
  • Web architecture and dataspace theory
  • dataset dynamics and synchronisation
  • analyzing and profiling the Web of Data
• Data Linking and Fusion
  • entity consolidation and linking algorithms
  • Web-based data integration and data fusion
  • performance and scalability of integration architectures
• Write-enabling the Web of Data
  • access authentication mechanisms for Linked Datasets (WebID, etc.)
  • authorisation mechanisms for Linked Datasets (WebACL, etc.)
  • enabling write-access to legacy data sources (Google APIs, Flickr API, etc.)
• Data Publishing
  • publishing legacy data sources as Linked Data on the Web
  • cost-benefits of the 5-star LOD plan
• Data Usage
  • tracking provenance of Linked Data
  • evaluating quality and trustworthiness of Linked Data
  • licensing issues in Linked Data publishing
  • distributed query of Linked Data
  • RDF-to-X, turning RDF into legacy data
• Interacting with the Web of Data
  • approaches to visualising Linked Data
  • interacting with distributed Web data
  • Linked Data browsers, indexers and search engines
Linked Data makes the web appear as ONE GIANT HUGE GLOBAL DATABASE!
Do you remember Search and Find?
Query Linked Data with SPARQL Endpoints
• Linked Data sources usually provide a SPARQL endpoint for their dataset(s)
• SPARQL endpoint: a SPARQL query processing service that supports the SPARQL protocol*
• Send your SPARQL query, receive the result
* http://www.w3.org/TR/rdf-sparql-protocol/
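As a minimal sketch of "send your SPARQL query, receive the result", the following assumes Jena 2.x and the public DBpedia endpoint; the resource and property in the query are just an illustrative label lookup:

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class EndpointDemo {
    public static void main(String[] args) {
        String endpoint = "http://dbpedia.org/sparql";
        String query =
            "SELECT ?label WHERE { " +
            "  <http://dbpedia.org/resource/Dallas_Cowboys> " +
            "    <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 5";

        // The SPARQL protocol call: an HTTP request to the endpoint; the result
        // set comes back over the wire and Jena parses it for us.
        QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution sol = results.nextSolution();
                System.out.println(sol.getLiteral("label").getLexicalForm());
            }
        } finally {
            qe.close();
        }
    }
}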
SPARQL queries over multiple datasets: how to do this?
• Issue follow-up queries to different endpoints
• Query a central collection of datasets
• Build a store with copies of relevant datasets
• Use a query federation system
1- Follow-up Queries
• Idea: issue follow-up queries over other datasets based on results from previous queries
• Substituting placeholders in query templates
Example: find a list of companies filtered by some criteria, return DBpedia URIs for them, then look up each URI's rdfs:comment at DBpedia.

String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";
String qTmpl = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
             + "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
String q1 = "SELECT ?s WHERE { ..."; // the company-selection query (elided on the slide)

QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
ResultSet results1 = e1.execSelect();
while (results1.hasNext()) {
    QuerySolution sol = results1.nextSolution();
    // Substitute the DBpedia URI from the first result set into the query template
    String q2 = String.format(qTmpl, sol.getResource("s").getURI());
    QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
    ResultSet results2 = e2.execSelect();
    while (results2.hasNext()) {
        // ... process the comment bindings
    }
    e2.close();
}
e1.close();
1- Follow-up Queries
• Advantage
  • Queried data is up to date
• Drawbacks
  • Requires the existence of a SPARQL endpoint for each dataset
  • Requires program logic
  • Very inefficient
2- Querying a Collection of Datasets
• Idea: use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets
• Example: SPARQL endpoints over a majority of datasets from the LOD cloud, at http://uberblic.org and http://lod.openlinksw.com/sparql
(Linked) Data Marketplaces
• FactForge
  • Integrates some of the most central LOD datasets
  • General-purpose information (not specific to a domain)
  • 1.2 billion explicit and 1 billion inferred statements
  • The largest upper-level knowledge base
  • http://www.FactForge.net
• LinkedLifeData
  • 25 of the most popular life-science datasets
  • 2.7 billion explicit and 1.4 billion inferred statements
  • http://www.LinkedLifeData.com
2- Querying a Collection of Datasets
• Advantage
  • No need for specific program logic
• Drawbacks
  • Queried data might be out of date
  • Not all relevant datasets may be in the collection
3- Own Store of Dataset Copies
• Idea: build your own store with copies of relevant datasets and query it
• Possible stores:
  • Jena TDB http://jena.hpl.hp.com/wiki/TDB
  • Sesame http://www.openrdf.org/
  • OpenLink Virtuoso http://virtuoso.openlinksw.com/
  • 4store http://4store.org/
  • AllegroGraph http://www.franz.com/agraph/
  • etc.
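A minimal sketch of this option using Jena TDB, assuming Jena 2.x with the TDB module on the classpath; the store directory and the dump file name are purely illustrative:

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.util.FileManager;

public class LocalStoreDemo {
    public static void main(String[] args) {
        // Open (or create) a persistent TDB store in a local directory
        Dataset dataset = TDBFactory.createDataset("/tmp/my-tdb-store");
        Model model = dataset.getDefaultModel();

        // Load a local copy (RDF dump) of a relevant dataset; "dump.nt" is illustrative
        FileManager.get().readModel(model, "dump.nt");

        // Query the local copy instead of a remote endpoint
        String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
        QueryExecution qe = QueryExecutionFactory.create(query, model);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.nextSolution());
            }
        } finally {
            qe.close();
            dataset.close();
        }
    }
}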
3- Own Store of Dataset Copies
• Advantages
  • No need for specific program logic
  • Can include all datasets
  • Independent of the existence, availability, and efficiency of SPARQL endpoints
• Drawbacks
  • Requires effort to set up and to operate the store
  • Ideally, data sources provide RDF dumps; what if they don't?
  • How to keep the copies in sync with the originals?
  • Queried data might be out of date
4- Federated Query Processing
• Idea: query a mediator which distributes sub-queries to the relevant sources and integrates the results
4- Federated Query Processing
• DARQ (Distributed ARQ)
  • http://darq.sourceforge.net/
  • Query engine for federated SPARQL queries
  • Extension of ARQ (the query engine for Jena)
  • Last update: June 28, 2006
• Semantic Web Integrator and Query Engine (SemWIQ)
  • http://semwiq.sourceforge.net/
  • Actively maintained!
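Neither DARQ nor SemWIQ is shown here; as a rough sketch of the federation idea, the following uses ARQ's support for the (then draft) SPARQL 1.1 SERVICE keyword, which lets the local query engine act as the mediator and send each SERVICE block to its remote endpoint. It assumes Jena 2.x with an ARQ version recent enough to parse SERVICE, and uses DBpedia only as an illustrative source:

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.query.Syntax;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class FederationSketch {
    public static void main(String[] args) {
        // One query, remote sources named in SERVICE blocks: the local engine acts
        // as the mediator, sending each block to its endpoint and joining the results.
        String queryString =
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "SELECT ?comment WHERE { " +
            "  SERVICE <http://dbpedia.org/sparql> { " +
            "    <http://dbpedia.org/resource/DBpedia> rdfs:comment ?comment " +
            "  } " +
            "} LIMIT 5";

        Query query = QueryFactory.create(queryString, Syntax.syntaxARQ);
        // Run against an empty local model; all data comes from the SERVICE calls
        QueryExecution qe = QueryExecutionFactory.create(query, ModelFactory.createDefaultModel());
        try {
            ResultSetFormatter.out(System.out, qe.execSelect(), query);
        } finally {
            qe.close();
        }
    }
}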
4- Federated Query Processing
• Advantages
  • No need for specific program logic
  • Queried data is up to date
• Drawbacks
  • Requires the existence of a SPARQL endpoint for each dataset
  • Requires effort to set up and configure the mediator
In any case
• You have to know the relevant data sources:
  • When developing the app using follow-up queries
  • When selecting an existing SPARQL endpoint over a collection of dataset copies
  • When setting up your own store with a collection of dataset copies
  • When configuring your query federation system
• You restrict yourself to the selected sources

Automated Link Traversal
• Idea: discover further data by looking up relevant URIs in your application
• Can be combined with the previous approaches
Link Traversal Based Query Execution
• Applies the idea of automated link traversal to the execution of SPARQL queries
• Idea:
  • Intertwine query evaluation with the traversal of RDF links
  • Discover data that might contribute to query results during query execution
• Alternate between:
  • Evaluating parts of the query
  • Looking up URIs in intermediate solutions
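A hand-rolled, minimal sketch of the idea (not the SQUIN/SWClLib implementation): evaluate one triple pattern over the data fetched so far, dereference the URIs that appear in the intermediate solutions to pull more RDF into a local model, then evaluate the next pattern over the grown model. It assumes Jena 2.x, uses a DBpedia resource as an illustrative seed URI, and assumes the looked-up URIs actually serve RDF to Jena's HTTP reader:

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import java.util.ArrayList;
import java.util.List;

public class LinkTraversalSketch {
    public static void main(String[] args) {
        Model local = ModelFactory.createDefaultModel();

        // Seed: dereference a starting URI (illustrative)
        String seed = "http://dbpedia.org/resource/Dallas_Cowboys";
        local.read(seed);   // HTTP GET + parse; assumes the server returns RDF

        // Step 1: evaluate a first pattern over the data fetched so far
        String step1 = "SELECT ?player WHERE { ?player ?p <" + seed + "> } LIMIT 10";
        List<String> playerUris = new ArrayList<String>();
        QueryExecution qe1 = QueryExecutionFactory.create(step1, local);
        ResultSet rs1 = qe1.execSelect();
        while (rs1.hasNext()) {
            QuerySolution sol = rs1.nextSolution();
            if (sol.get("player").isURIResource()) {
                playerUris.add(sol.getResource("player").getURI());
            }
        }
        qe1.close();

        // Step 2: traverse links -- look up the URIs from the intermediate
        // solutions, adding whatever RDF they serve to the local model
        for (String uri : playerUris) {
            try { local.read(uri); } catch (Exception e) { /* skip broken links */ }
        }

        // Step 3: evaluate the next pattern over the grown model
        String step2 = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
                     + "SELECT ?player ?name WHERE { ?player rdfs:label ?name }";
        QueryExecution qe2 = QueryExecutionFactory.create(step2, local);
        ResultSet rs2 = qe2.execSelect();
        while (rs2.hasNext()) {
            System.out.println(rs2.nextSolution());
        }
        qe2.close();
    }
}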
Link Traversal Based Query Execution
• Advantages
  • No need to know all data sources in advance
  • No need for specific programming logic
  • Queried data is up to date
  • Does not depend on the existence of SPARQL endpoints provided by the data sources
• Drawbacks
  • Not as fast as a centralized collection of copies
  • Unsuitable for some queries
  • Results might be incomplete (do we care?)
Implementations
• Semantic Web Client Library (SWClLib) for Java http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
• SWIC for Prolog http://moustaki.org/swic/
• SQUIN http://squin.org
  • Provides SWClLib functionality as a Web service
  • Accessible like a SPARQL endpoint
What is a Linked Data application? A software system that makes use of data from multiple datasets on the web and that benefits from the links between those datasets.
Characteristics of Linked Data Applications
• Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data.
• Discover further information by following the links between different data sources: the fourth principle enables this.
• Combine the consumed Linked Data with data from other sources (not necessarily Linked Data).
• Expose the combined data back to the web following the Linked Data principles.
• Offer value to end-users.