
Paths towards the Sustainable Consumption of Semantic Data on the Web

Aidan Hogan & Claudio Gutierrez


Presentation Transcript


  1. Paths towards the Sustainable Consumption of Semantic Data on the Web Aidan Hogan & Claudio Gutierrez

  2. Growth of the Linked Data Cloud Oct. 2007

  3. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007

  4. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008

  5. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008 Sep. 2008

  6. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008 Sep. 2008 Mar. 2009

  7. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008 Sep. 2008 Mar. 2009 July 2009

  8. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008 Sep. 2008 Mar. 2009 July 2009 Sept. 2010

  9. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008 Sep. 2008 Mar. 2009 July 2009 Sept. 2010 Sept. 2011

  10. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008 Sep. 2008 Mar. 2009 July 2009 Sept. 2010 Sept. 2011 Sept. 2012

  11. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008 Sep. 2008 Mar. 2009 July 2009 Sept. 2010 Sept. 2011 Sept. 2012 Sept. 2013

  12. Growth of the Linked Data Cloud Oct. 2007 Nov. 2007 Feb. 2008 Sep. 2008 Mar. 2009 July 2009 Sept. 2010 Sept. 2011 Sept. 2012 Sept. 2013 June 2014

  13. Growth of the Linked Data Cloud One new dataset in the past year

  14. In loving memory of Linked Data 2007–2012 Survived by its research community

  15. Access Methods • Client has a request/query Q • Server has a dataset D • Client issues Q to server • Server computes and returns response Q(D) [Diagram: the client sends Q over HTTP to the server holding D, which returns Q(D)]

  16. Access Methods • Multiple clients / multiple servers (blurred) • Remote, decentralised, uncoordinated • Web scale [Diagram: several clients, each issuing Q, and several servers, each holding D] Anything else like this?

  17. Access Methods (over the web) “The Web is not a database” -- Alberto Mendelzon, slides from a presentation at the Workshop on Internet Data Management, Washington, November 1998.

  18. Linked Data Access Methods • Dereferencing: • Look up a URI, get an RDF document • Dumps: • Get all data in an archive • SPARQL Queries: • Send a query, get the answers

  19. Dereferencing

  20. Dereferencing (what is it?) Q = “http://dbpedia.org/resource/Columbia” Q(D) =
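As a rough illustration of the lookup on slide 20, the sketch below builds (but does not send) an HTTP request for the URI, with an `Accept` header asking for RDF. The helper name `build_deref_request` and the choice of Turtle and RDF/XML media types are assumptions; the slides do not say which serialisation the server returns.

```python
from urllib.request import Request

def build_deref_request(uri, accept="text/turtle, application/rdf+xml;q=0.9"):
    """Build an HTTP GET request asking for an RDF representation of `uri`."""
    return Request(uri, headers={"Accept": accept}, method="GET")

# Q is the URI itself.
request = build_deref_request("http://dbpedia.org/resource/Columbia")

# Passing `request` to urllib.request.urlopen would perform the lookup
# (network access needed); the response body would play the role of Q(D).
```

The query language here is just "a URI", which is what makes the method simple and, as the following slides argue, coarse.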

  21. Dereferencing (what’s wrong with it?) • Responses vary from server to server • local triples where URI is subject (83%) vs. • local triples where URI is subject or object (55%) [Hogan et al. 2012] Well-defined: For a given Q and D, clients and servers agree on what Q(D) should be.
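To make the "well-defined" issue concrete, here is a minimal sketch of the two server behaviours named on slide 21, run over a toy in-memory dataset. The `ex:` URIs and both function names are made up for illustration; no real server is modelled.

```python
# Toy dataset D: a set of (subject, predicate, object) triples.
# All URIs are illustrative placeholders.
D = {
    ("ex:Chile", "ex:capital", "ex:Santiago"),
    ("ex:Chile", "ex:continent", "ex:SouthAmerica"),
    ("ex:Santiago", "ex:locatedIn", "ex:Chile"),
}

def deref_subject_only(uri, triples):
    """Server policy A: return triples where the URI is the subject."""
    return {t for t in triples if t[0] == uri}

def deref_subject_or_object(uri, triples):
    """Server policy B: return triples where the URI is subject or object."""
    return {t for t in triples if t[0] == uri or t[2] == uri}

a = deref_subject_only("ex:Chile", D)
b = deref_subject_or_object("ex:Chile", D)
```

The same Q over the same D gives two different answers depending on the policy, so a client cannot know in advance what Q(D) will contain.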

  22. Dereferencing (what’s wrong with it?) • Very coarse: • Give me all capitals of South American countries. • Dereference documents for all country URIs • See which ones are in South America, throw away rest • Throw away triples other than capitals Granular: The language for Q allows the client to be specific so as to avoid wasting bandwidth
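The coarse workflow on slide 22 can be sketched as follows, assuming each country URI dereferences to a small document. The documents, predicates, and the `capitals_in` helper are hypothetical; the point is only to count how many fetched triples are thrown away.

```python
# Hypothetical documents returned by dereferencing each country URI.
documents = {
    "ex:Chile": [
        ("ex:Chile", "ex:continent", "ex:SouthAmerica"),
        ("ex:Chile", "ex:capital", "ex:Santiago"),
        ("ex:Chile", "ex:label", "Chile"),
    ],
    "ex:Peru": [
        ("ex:Peru", "ex:continent", "ex:SouthAmerica"),
        ("ex:Peru", "ex:capital", "ex:Lima"),
        ("ex:Peru", "ex:label", "Peru"),
    ],
    "ex:Italy": [
        ("ex:Italy", "ex:continent", "ex:Europe"),
        ("ex:Italy", "ex:capital", "ex:Rome"),
        ("ex:Italy", "ex:label", "Italy"),
    ],
}

def capitals_in(continent, docs):
    """Fetch every country document, then filter on the client side."""
    fetched = 0
    capitals = {}
    for country, triples in docs.items():
        fetched += len(triples)  # every triple crosses the network
        if (country, "ex:continent", continent) not in triples:
            continue  # whole document discarded
        for s, p, o in triples:
            if p == "ex:capital":
                capitals[country] = o
    return capitals, fetched

capitals, fetched = capitals_in("ex:SouthAmerica", documents)
wasted = fetched - len(capitals)
```

In this toy case the client transfers nine triples to keep two; a more granular language for Q would let the server do the filtering.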

  23. Dereferencing (what’s wrong with it?) • No pagination: • Give me some information about Italy. • Load document with 100,000 triples • Throw away 99,900 triples the user won’t read Pagination: A large response Q(D) can be split into chunks
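One possible reading of the pagination property on slide 23 is that the server hands out Q(D) in fixed-size chunks and the client stops when it has enough. The sketch below assumes that reading; `paginate` and the synthetic triples are illustrative, not part of any Linked Data standard.

```python
from itertools import islice

def paginate(triples, page_size):
    """Yield successive lists of at most `page_size` triples."""
    it = iter(triples)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page

# Synthetic response standing in for a large document about one resource.
response = [("ex:Italy", "ex:p%d" % i, "ex:o%d" % i) for i in range(250)]

pages = paginate(response, 100)
first_page = next(pages)                      # the client reads one page and may stop
remaining_sizes = [len(p) for p in pages]     # sizes of the pages never requested
```

Because `paginate` is a generator, pages after the first are only produced if the client asks for them.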

  24. DUMPS

  25. Dumps (what are they?)

  26. Dumps (what’s wrong with them?) [Fernández et al. 2012] • 15× compression for RDF achievable • But same weaknesses as for deref. still apply

  27. Dumps (what’s wrong with them?) [Fernández et al. 2012] • 15× compression for RDF achievable • But same weaknesses as for deref. still apply • Also, no standard access methods: • Various compressions and formats • Linked through generic HTML Accessible: The protocol and formats are defined for automatic access by software agents

  28. SPARQL

  29. SPARQL (what is it?) “What’s the name of the query language again … what’s that? Oh yes, SPARQL.” -- C. Mohan, June 2014, Cartagena, AMW Summer School

  30. SPARQL (what is it?) Q = PREFIX dbo: <http://dbpedia.org/ontology/> ... SELECT ?capital WHERE { ?s a dbo:Country ; dbp:capital ?c ; dcterms:subject category:Countries_in_South_America . ?c rdfs:label ?capital . FILTER (lang(?capital)="en") } Q(D) =

  31. SPARQL (to the rescue?)

  32. SPARQL (what’s wrong with it?) [Buil-Aranda et al. 2012]

  33. SPARQL (what’s wrong with it?) [Buil-Aranda et al. 2012]

  34. SPARQL (what’s wrong with it?) • Simple Protocol And RDF Query Language • SPARQL evaluation: PSPACE-complete • SPARQL 1.1 even more complex [Perez et al. 2009] [Arenas et al. 2013]

  35. SPARQL (what’s wrong with it?) Cacheable: Common requests can be cached and re-used. Queries can be anticipated. Costable: The cost of processing a query can be anticipated before actual processing.
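A minimal sketch of the "cacheable" property, assuming queries are cached by their text after whitespace normalisation. The `CachingEndpoint` class and the stand-in evaluator are hypothetical. Collapsing whitespace is a simplification that would also alter string literals inside a query, and it treats only near-identical query strings as equal, which hints at why expressive queries are hard to cache.

```python
class CachingEndpoint:
    """Wrap a query evaluator with a cache keyed on normalised query text."""

    def __init__(self, evaluate):
        self._evaluate = evaluate
        self._cache = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Collapse runs of whitespace (simplification: also affects literals).
        return " ".join(query.split())

    def query(self, q):
        key = self._key(q)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._evaluate(q)
        self._cache[key] = result
        return result

calls = []

def fake_evaluate(q):
    """Stand-in for real query processing; records each call."""
    calls.append(q)
    return ["row"]

endpoint = CachingEndpoint(fake_evaluate)
r1 = endpoint.query("SELECT ?c WHERE { ?c a dbo:Country . }")
r2 = endpoint.query("SELECT ?c   WHERE {\n ?c a dbo:Country . }")
```

The second call differs only in layout, so it is served from the cache; a query that is semantically equivalent but written differently would still miss.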

  36. SPARQL (what’s wrong with it?)

  37. SPARQL (what’s wrong with it?)

  38. SPARQL (what’s wrong with it?) • Simple Protocol And RDF Query Language • Protocol always expects a perfect answer • No support for partial results, timeouts, exception handling, pagination … Robust: Access method can gracefully handle error cases and provide Quality of Service
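One way a "robust" access method might behave is to return whatever it has when a time budget runs out and say so explicitly. The sketch below assumes that design; it is not part of the SPARQL protocol, and `evaluate_with_budget` and the fake clock are illustrative names.

```python
import time

def evaluate_with_budget(rows, budget_seconds, clock=time.monotonic):
    """Collect rows until done or until the time budget is exhausted.

    Returns a dict with the rows gathered so far and a `complete` flag,
    so the client can tell a partial answer from a full one.
    """
    deadline = clock() + budget_seconds
    results = []
    for row in rows:
        if clock() >= deadline:
            return {"complete": False, "results": results}
        results.append(row)
    return {"complete": True, "results": results}

def make_fake_clock():
    """Deterministic clock for demonstration: returns 0, 1, 2, ... per call."""
    ticks = iter(range(1000))
    return lambda: next(ticks)

partial = evaluate_with_budget(range(10), 2, clock=make_fake_clock())
full = evaluate_with_budget(range(3), 100, clock=make_fake_clock())
```

The explicit `complete` flag is the design choice that matters here: the client is told the answer is partial instead of receiving an error or a silently truncated result.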

  39. SPARQL (what’s wrong with it?) • D is a black-box for the user

  40. SPARQL (what’s wrong with it?) Q = PREFIX dbo: <http://dbpedia.org/ontology/> SELECT (COUNT(?c) as ?count) WHERE { ?c a dbo:Country . } Q(D) = [result not captured in this transcript] Transparent: The client can determine if a dataset D is relevant and the service sufficient.
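With D a black box, probing queries such as the count on slide 40 are the client's main way to learn what a service holds. The sketch below shows how such a query could be packaged as a SPARQL protocol GET request without sending it. The endpoint URL is an assumption (the slides name no endpoint), and `build_sparql_request` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint URL, not from the slides

COUNT_QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (COUNT(?c) as ?count) WHERE { ?c a dbo:Country . }"""

def build_sparql_request(endpoint, query):
    """Encode the query in the `query` URL parameter and ask for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(ENDPOINT, COUNT_QUERY)

# Decode the URL again to check that the query survives the round trip.
sent_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
```

Even when the request succeeds, the returned count says nothing about whether the service applied limits or timeouts, which is the transparency gap the slide points to.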

  41. WRAP-UP

  42. Problem Categories • Standardised • Bandwidth-efficient • Server-processing-efficient • Usable by client

  43. Conclusion • Data access methods crucial • Access = Protocol + Query Language • But current ones don’t work → need something else (or multiple things?) • Open question:
