
Linked Data at present Using Linked Data

Ferdowsi University of Mashhad, Web Technology Lab (WTLab), www.wtlab.um.ac.ir. Linked Data Group (LDG). Mahboubeh Dadkhah, May 11, 2011.


Presentation Transcript


  1. Linked Data at present: Using Linked Data. Ferdowsi University of Mashhad, Web Technology Lab (WTLab), www.wtlab.um.ac.ir. Linked Data Group (LDG). Mahboubeh Dadkhah, May 11, 2011

  2. You may know the Linked Data

  3. History
     • Linked Data Design Issues by TimBL, July 2006
     • Linked Open Data Project, WWW2007
     • First LOD Cloud, May 2007
     • 1st Linked Data on the Web Workshop, WWW2008
     • 1st Triplification Challenge, 2008
     • How to Publish Linked Data Tutorial, ISWC2008
     • BBC publishes Linked Data, 2008
     • 2nd Linked Data on the Web Workshop, WWW2009
     • NY Times announcement, SemTech2009 - ISWC09
     • 1st Linked Data-a-thon, ISWC2009
     • 1st How to Consume Linked Data Tutorial, ISWC2009
     • Data.gov.uk publishes Linked Data, 2010
     • 2nd How to Consume Linked Data Tutorial, WWW2010
     • 1st International Workshop on Consuming Linked Data, COLD2010
     • …

  4. May 2007

  5. Cloud statistics

  6. Now that Linked Data is here, what to do next? Let’s make use of it

  7. Linked Data • Before using it, we should be sure that we understand what it means. • What was the problem? Searching and finding. • Example: search for football players who went to the University of Texas at Austin and played for the Dallas Cowboys as cornerback.

  8. Current Web = internet + links + docs. Why can't we find it?

  9. So, what to do? • Make it easy for computers/software to find THINGS • Publish things: • As data • In a standardized way: RDF • RDF data can be serialized in different ways: RDF/XML, RDFa, N3, Turtle, JSON
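To make "publish things as data" concrete, here is a minimal Java sketch that writes one RDF statement as a single N-Triples line (N-Triples is a line-based subset of Turtle). The subject URI `http://example.org/book` is a hypothetical placeholder, and the class and method names are illustrative only; real applications would normally use an RDF library rather than string building.

```java
class TripleSketch {

    /** Wraps an IRI in angle brackets, as N-Triples requires. */
    static String iri(String value) {
        return "<" + value + ">";
    }

    /** Quotes a plain literal, escaping backslashes and double quotes. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** One statement: subject, predicate, object, terminated by " ." */
    static String statement(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    public static void main(String[] args) {
        // http://example.org/book is a made-up identifier for illustration.
        System.out.println(statement(
                iri("http://example.org/book"),
                iri("http://purl.org/dc/terms/title"),
                literal("Programming the Semantic Web")));
    }
}
```

The same statement could equally be written in RDF/XML or RDFa; only the syntax changes, not the triple.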

  10. [Figure: RDF graph describing a book, its review, reviewer and publisher. Node labels: http://…/isbn978 (appears twice), http://…/review1, http://…/reviewer, http://…/publisher1, http://juansequeda.com/id, http://dbpedia.org/Austin, "Programming the Semantic Web", "Awesome Book", "Toby Segaran", "978-0-596-15381-6", "O’Reilly", "Juan Sequeda" (appears twice). Edge labels: hasReview, title, description, hasReviewer, author, isbn, publisher, name, sameAs, livesIn.]

  11. 2009’s Top 10 Linked Data Research Issues
     • Data Linking and Fusion
         • linking algorithms and heuristics, identity resolution
         • Web data integration and data fusion
         • evaluating quality and trustworthiness of Linked Data
     • Linked Data Application Architectures
         • crawling, caching and querying Linked Data on the Web; optimizations, performance
         • Linked Data browsers, search engines
         • applications that exploit distributed Web datasets
     • Data Publishing
         • tools for publishing large data sources as Linked Data on the Web (e.g. relational databases, XML repositories)
         • embedding data into classic Web documents (e.g. GRDDL, RDFa, Microformats)
         • licensing and provenance tracking issues in Linked Data publishing
         • business models for Linked Data publishing and consumption

  12. 2010’s Top 10 Linked Data Research Issues
     • Linked Data Application Architectures
         • crawling, caching and querying Linked Data
         • dataset dynamics and synchronization
         • Linked Data mining
     • Data Linking and Data Fusion
         • linking algorithms and heuristics, identity resolution
         • Web data integration and data fusion
         • link maintenance
         • performance of linking infrastructures/algorithms on Web data
     • Quality, Trust and Provenance in Linked Data
         • tracking provenance and usage of Linked Data
         • evaluating quality and trustworthiness of Linked Data
         • profiling of Linked Data sources
     • User Interfaces for the Web of Data
         • approaches to visualizing and interacting with distributed Web data
         • Linked Data browsers and search engines
     • Data Publishing
         • tools for publishing large data sources as Linked Data on the Web (e.g. relational databases, XML repositories)
         • embedding data into classic Web documents (e.g. RDFa, Microformats)
         • describing data on the Web (e.g. voiD, semantic site maps)
         • licensing issues in Linked Data publishing

  13. 2011’s Top 10 Linked Data Research Issues
     • Foundations of Linked Data
         • Web architecture and dataspace theory
         • dataset dynamics and synchronisation
         • analyzing and profiling the Web of Data
     • Data Linking and Fusion
         • entity consolidation and linking algorithms
         • Web-based data integration and data fusion
         • performance and scalability of integration architectures
     • Write-enabling the Web of Data
         • access authentication mechanisms for Linked Datasets (WebID, etc.)
         • authorisation mechanisms for Linked Datasets (WebACL, etc.)
         • enabling write-access to legacy data sources (Google APIs, Flickr API, etc.)
     • Data Publishing
         • publishing legacy data sources as Linked Data on the Web
         • cost-benefits of the 5 star LOD plan
     • Data Usage
         • tracking provenance of Linked Data
         • evaluating quality and trustworthiness of Linked Data
         • licensing issues in Linked Data publishing
         • distributed query of Linked Data
         • RDF-to-X, turning RDF to legacy data
     • Interacting with the Web of Data
         • approaches to visualising Linked Data
         • interacting with distributed Web data
         • Linked Data browsers, indexers and search engines

  14. Linked Data makes the web appear as ONE GIANT HUGE GLOBAL DATABASE!

  15. Do you remember Search and Find?

  16. Query Linked Data with SPARQL Endpoints • Linked Data sources usually provide a SPARQL endpoint for their dataset(s) • SPARQL endpoint: a SPARQL query processing service that supports the SPARQL protocol* • Send your SPARQL query, receive the result * http://www.w3.org/TR/rdf-sparql-protocol/
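"Send your SPARQL query, receive the result" can be sketched with the Java 11+ HTTP client: under the SPARQL protocol, a query can be sent as an HTTP GET with the query text in the `query` parameter. The class name, the sample query and the choice of the XML results media type are assumptions made for illustration; the network call itself is left commented out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SparqlProtocolSketch {

    /** Builds the GET URI: the URL-encoded query travels in the "query" parameter. */
    static URI requestUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    /** Builds the request and asks for results in the SPARQL XML results format. */
    static HttpRequest request(String endpoint, String sparql) {
        return HttpRequest.newBuilder(requestUri(endpoint, sparql))
                .header("Accept", "application/sparql-results+xml")
                .GET()
                .build();
    }

    /** Sends the query and returns the raw response body. */
    static String send(String endpoint, String sparql) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request(endpoint, sparql), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 3";
        System.out.println(requestUri("http://dbpedia.org/sparql", query));
        // Uncomment to contact the live endpoint (availability not guaranteed):
        // System.out.println(send("http://dbpedia.org/sparql", query));
    }
}
```

Libraries such as Jena wrap these steps and also parse the result set; the sketch only shows what goes over the wire.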

  17. http://www.w3.org/wiki/SparqlEndpoints

  18. http://labs.mondeca.com/sparqlEndpointsStatus/

  19. SPARQL queries over multiple datasets. How to do this? • Issue follow-up queries to different endpoints • Query a central collection of datasets • Build a store with copies of relevant datasets • Use a query federation system

  20. 1- Follow-up Queries • Idea: issue follow-up queries over other datasets based on results from previous queries • Substituting placeholders in query templates

  21. Example (Jena). Query q1 finds a list of companies, filtered by some criteria, and returns DBpedia URIs for them:

      import org.apache.jena.query.*;  // com.hp.hpl.jena.query.* in Jena 2.x releases

      String s1 = "http://cb.semsol.org/sparql";
      String s2 = "http://dbpedia.org/sparql";
      // The rdfs: prefix must be declared, otherwise the query does not parse.
      String qTmpl = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
                   + "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
      String q1 = "SELECT ?s WHERE { ...";
      QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
      ResultSet results1 = e1.execSelect();
      while (results1.hasNext()) {
          QuerySolution sol = results1.nextSolution();
          // Substitute the URI bound to ?s into the follow-up query template.
          String q2 = String.format(qTmpl, sol.getResource("s").getURI());
          QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
          ResultSet results2 = e2.execSelect();
          while (results2.hasNext()) {
              // ...
          }
          e2.close();
      }
      e1.close();

  22. 1- Follow-up Queries • Advantage • Queried data is up-to-date • Drawbacks • Requires the existence of a SPARQL endpoint for each dataset • Requires program logic • Very inefficient

  23. 2- Querying a Collection of Datasets • Idea: Use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets • Example: • SPARQL endpoint over a majority of datasets from the LOD cloud at: http://uberblic.org http://lod.openlinksw.com/sparql

  24. (Linked) Data Marketplaces • FactForge • Integrates some of the most central LOD datasets • General-purpose information (not specific to a domain) • 1.2 billion explicit and 1 billion inferred statements • The largest upper-level knowledge base • http://www.FactForge.net • LinkedLifeData • 25 of the most popular life-science datasets • 2.7 billion explicit and 1.4 billion inferred statements • http://www.LinkedLifeData.com

  25. 2- Querying a Collection of Datasets • Advantage • No need for specific program logic • Drawbacks • Queried data might be out of date • Not all relevant datasets in the collection

  26. 3- Own Store of Dataset Copies • Idea: Build your own store with copies of relevant datasets and query it • Possible stores: • Jena TDB http://jena.hpl.hp.com/wiki/TDB • Sesame http://www.openrdf.org/ • OpenLink Virtuoso http://virtuoso.openlinksw.com/ • 4store http://4store.org/ • AllegroGraph http://www.franz.com/agraph/ • etc.

  27. 3- Own Store of Dataset Copies • Advantages • No need for specific program logic • Can include all datasets • Independent of the existence, availability, and efficiency of SPARQL endpoints • Drawbacks • Requires effort to set up and to operate the store • Ideally, data sources provide RDF dumps; if not? • How to keep the copies in sync with the originals? • Queried data might be out of date

  28. 4- Federated Query Processing • Idea: Querying a mediator which distributes sub-queries to relevant sources and integrates the results
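The mediator idea can be sketched in plain Java. This is a toy illustration, not how DARQ or SemWIQ are implemented: the two "sources" are hypothetical in-memory stand-ins for remote SPARQL endpoints, triples are three-element string lists, and the mediator handles only the fixed two-pattern query `?x p1 o1 . ?x p2 ?y`, using a simple nested-loop (bound) join.

```java
import java.util.ArrayList;
import java.util.List;

class FederationSketch {

    /** A data source that answers one triple pattern; null means "any value". */
    interface Source {
        List<List<String>> match(String s, String p, String o);
    }

    /** In-memory stand-in for a remote SPARQL endpoint (hypothetical). */
    static Source inMemory(List<List<String>> triples) {
        return (s, p, o) -> {
            List<List<String>> hits = new ArrayList<>();
            for (List<String> t : triples) {
                boolean ok = (s == null || s.equals(t.get(0)))
                        && (p == null || p.equals(t.get(1)))
                        && (o == null || o.equals(t.get(2)));
                if (ok) hits.add(t);
            }
            return hits;
        };
    }

    /**
     * Mediator for the query  ?x p1 o1 . ?x p2 ?y
     * Sub-query 1 goes to the first source, sub-query 2 (with ?x bound)
     * goes to the second source; the results are joined on ?x.
     */
    static List<String> query(Source first, String p1, String o1, Source second, String p2) {
        List<String> results = new ArrayList<>();
        for (List<String> left : first.match(null, p1, o1)) {
            String x = left.get(0);
            for (List<String> right : second.match(x, p2, null)) {
                results.add(x + " -> " + right.get(2));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Source companies = inMemory(List.of(
                List.of("ex:acme", "rdf:type", "ex:Company"),
                List.of("ex:bob", "rdf:type", "ex:Person")));
        Source descriptions = inMemory(List.of(
                List.of("ex:acme", "rdfs:comment", "Makes anvils")));
        System.out.println(query(companies, "rdf:type", "ex:Company", descriptions, "rdfs:comment"));
    }
}
```

Unlike the follow-up-query approach, the application here issues a single query to the mediator; the per-source logic lives in the mediator's configuration rather than in application code.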

  29. 4- Federated Query Processing • DARQ (Distributed ARQ) • http://darq.sourceforge.net/ • Query engine for federated SPARQL queries • Extension of ARQ (query engine for Jena) • Last update: June 28, 2006 • Semantic Web Integrator and Query Engine (SemWIQ) • http://semwiq.sourceforge.net/ • Actively maintained!

  30. 4- Federated Query Processing • Advantages • No need for specific program logic • Queried data is up to date • Drawbacks • Requires the existence of a SPARQL endpoint for each dataset • Requires effort to set up and configure the mediator

  31. In any case
     • You have to know the relevant data sources:
         • when developing the app using follow-up queries
         • when selecting an existing SPARQL endpoint over a collection of dataset copies
         • when setting up your own store with a collection of dataset copies
         • when configuring your query federation system
     • You restrict yourself to the selected sources
     Automated Link Traversal
     • Idea: Discover further data by looking up relevant URIs in your application
     • Can be combined with the previous approaches

  32. Link Traversal Based Query Execution • Applies the idea of automated link traversal to the execution of SPARQL queries • Idea: • Intertwine query evaluation with traversal of RDF links • Discover data that might contribute to query results during query execution • Alternately: • Evaluate parts of the query • Look up URIs in intermediate solutions
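The alternation between evaluating part of the query and looking up URIs can be sketched in Java. Everything here is a simplifying assumption: the "web" is a hypothetical in-memory map from a URI to the triples served there (standing in for HTTP lookups), triples are three-element string lists, and the query is fixed to two patterns, `<seed> p1 ?f . ?f p2 ?n`. It is not the algorithm of SQUIN or SWClLib, only an illustration of the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LinkTraversalSketch {

    /** Simulates dereferencing a URI: adds the triples served there to the local dataset. */
    static void lookUp(Map<String, List<List<String>>> web, String uri,
                       List<List<String>> local, Set<String> visited) {
        if (visited.add(uri)) {
            local.addAll(web.getOrDefault(uri, List.of()));
        }
    }

    /** Returns a fresh list of local triples with the given subject and predicate. */
    static List<List<String>> match(List<List<String>> local, String s, String p) {
        List<List<String>> hits = new ArrayList<>();
        for (List<String> t : local) {
            if (t.get(0).equals(s) && t.get(1).equals(p)) hits.add(t);
        }
        return hits;
    }

    /** Evaluates  <seed> p1 ?f . ?f p2 ?n  and returns the bindings of ?n. */
    static List<String> run(Map<String, List<List<String>>> web, String seed, String p1, String p2) {
        List<List<String>> local = new ArrayList<>(); // query-local dataset, grows as we go
        Set<String> visited = new HashSet<>();
        lookUp(web, seed, local, visited);
        List<String> answers = new ArrayList<>();
        for (List<String> t : match(local, seed, p1)) {     // evaluate the first pattern
            String f = t.get(2);
            lookUp(web, f, local, visited);                 // look up the URI bound to ?f
            for (List<String> u : match(local, f, p2)) {    // evaluate the second pattern
                answers.add(u.get(2));
            }
        }
        return answers;
    }

    /** Hypothetical sample data; example.org URIs are placeholders. */
    static Map<String, List<List<String>>> demoWeb() {
        Map<String, List<List<String>>> web = new HashMap<>();
        web.put("http://example.org/alice", List.of(
                List.of("http://example.org/alice", "knows", "http://example.org/bob"),
                List.of("http://example.org/alice", "knows", "http://example.org/carol")));
        web.put("http://example.org/bob", List.of(
                List.of("http://example.org/bob", "name", "Bob")));
        // http://example.org/carol is deliberately absent: a dead link yields no data.
        return web;
    }

    public static void main(String[] args) {
        System.out.println(run(demoWeb(), "http://example.org/alice", "knows", "name"));
    }
}
```

The missing data for the third URI shows why results of this approach can be incomplete: only what is reachable and retrievable during execution contributes to the answer.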

  33-42. Link Traversal Based Query Execution (title-only slides; no further text in the transcript)

  43. Link Traversal Based Query Execution • Advantages • No need to know all data sources in advance • No need for specific programming logic • Queried data is up to date • Does not depend on the existence of SPARQL endpoints provided by the data sources • Drawbacks • Not as fast as a centralized collection of copies • Unsuitable for some queries • Results might be incomplete (do we care?)

  44. Implementations • Semantic Web Client library (SWClLib) for Java http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/ • SWIC for Prolog http://moustaki.org/swic/ • SQUIN http://squin.org • Provides SWClLib functionality as a Web service • Accessible like a SPARQL endpoint

  45. Real World Example

  46. What is a Linked Data application? A software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets

  47. Characteristics of Linked Data Applications • Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data. • Discover further information by following the links between different data sources: the fourth principle enables this. • Combine the consumed linked data with data from other sources (not necessarily Linked Data). • Expose the combined data back to the web following the Linked Data principles. • Offer value to end-users.
