gStore: Answering SPARQL Queries Via Subgraph Matching

Presented by Guan Wang, Kent State University, October 24, 2011.

Presentation Transcript


  1. gStore: Answering SPARQL Queries Via Subgraph Matching. Presented by Guan Wang, Kent State University, October 24, 2011

  2. Outline: RDF & SPARQL; Previous Solutions for SPARQL Queries; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions

  4. What is RDF • A general-purpose framework that provides structured, machine-understandable metadata for the Web. • It is based on the idea of making statements about resources in the form of subject-predicate-object expressions. These expressions are known as triples in RDF. [Figure: a statement drawn as an edge labeled Predicate from Subject to Object]

  5. RDF Model Example [Figure: page.html -Creator-> Guan; page.html -Title-> Guan's Home Page] As triples (Subject, Predicate, Object): (page.html, Creator, Guan); (page.html, Title, Guan's Home Page).
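To make the triple/graph correspondence concrete, here is a minimal Python sketch using plain tuples (no RDF library is assumed, and the names `triples` and `out_edges` are illustrative). It stores the two statements from the example as (subject, predicate, object) triples and reads them back as labeled edges of a directed graph:

```python
from collections import defaultdict

# The two statements from the example slide as (subject, predicate, object) triples.
triples = [
    ("page.html", "Creator", "Guan"),
    ("page.html", "Title", "Guan's Home Page"),
]

# Read the same triples as a directed, edge-labeled graph:
# subjects and objects become vertices, each predicate becomes an edge label.
out_edges = defaultdict(list)
for subj, pred, obj in triples:
    out_edges[subj].append((pred, obj))

for subj, edges in out_edges.items():
    for pred, obj in edges:
        print(f"{subj} -[{pred}]-> {obj}")
```

This prints one line per edge, `page.html -[Creator]-> Guan` and `page.html -[Title]-> Guan's Home Page`, which is exactly the picture on the slide.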

  6. What is SPARQL • SPARQL is a query language for RDF. It provides a standard format for writing queries that target RDF data and a set of standard rules for processing those queries and returning the results. • The building blocks of a SPARQL query are graph patterns that contain variables. The result of the query is the set of values these variables must take for the pattern to match the RDF graph.

  7. Example of SPARQL Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". } • Names beginning with a ? or a $ are variables. • Graph patterns are given as a list of triple patterns enclosed within braces {}. • The variables named after the SELECT keyword are the variables returned as results (similar to SQL). • The dots separate triple patterns; patterns that share a variable (here ?m) must agree on its value, so each conjunction corresponds to a join.
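The join semantics of the example query can be sketched as a naive nested-loop evaluator in Python. This is an illustration, not how any particular SPARQL engine works: the toy dataset uses hypothetical entity IDs `m1` and `m2` (the dates are real), angle brackets around predicates are dropped, and the helpers `match` and `evaluate` are made up for this sketch:

```python
# Toy dataset: m1 and m2 are hypothetical entity IDs; the dates are real.
triples = [
    ("m1", "hasName", "Abraham Lincoln"),
    ("m1", "BornOnDate", "1809-02-12"),
    ("m1", "DiedOnDate", "1865-04-15"),
    ("m2", "hasName", "Charles Darwin"),
    ("m2", "BornOnDate", "1809-02-12"),
    ("m2", "DiedOnDate", "1882-04-19"),
]

def is_var(term):
    # SPARQL variables start with ? or $.
    return term.startswith(("?", "$"))

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to a different value
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant does not match
    return new

def evaluate(patterns, triples):
    """Nested-loop evaluation: each new pattern is joined on the variables it shares."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

query = [
    ("?m", "hasName", "?name"),
    ("?m", "BornOnDate", "1809-02-12"),
    ("?m", "DiedOnDate", "1865-04-15"),
]
names = [b["?name"] for b in evaluate(query, triples)]
print(names)  # ['Abraham Lincoln']
```

Lincoln and Darwin share a birth date, so the first two patterns still leave two bindings for `?m`; only the join with the third pattern narrows the answer to one name.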

  8. RDF Graph

  9. SPARQL Queries SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". } [Figure: Query Graph]

  10. Subgraph Match vs. SPARQL Queries

  12. Existing Solutions: Three-Column Table. SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". } Drawback: too many self-joins.
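A sketch of why the three-column layout needs self-joins, using Python's built-in `sqlite3` module. The table name `triples`, its columns `s`, `p`, `o`, and the toy rows (with hypothetical IDs `m1`, `m2`) are assumptions for illustration. Each triple pattern becomes one more copy of the same table, joined on the shared subject `?m`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("m1", "hasName", "Abraham Lincoln"),
    ("m1", "BornOnDate", "1809-02-12"),
    ("m1", "DiedOnDate", "1865-04-15"),
    ("m2", "hasName", "Charles Darwin"),
    ("m2", "BornOnDate", "1809-02-12"),
    ("m2", "DiedOnDate", "1882-04-19"),
])

# One alias of the same table per triple pattern; the shared ?m becomes t1.s = t2.s = t3.s.
sql = """
SELECT t1.o AS name
FROM triples AS t1
JOIN triples AS t2 ON t2.s = t1.s
JOIN triples AS t3 ON t3.s = t1.s
WHERE t1.p = 'hasName'
  AND t2.p = 'BornOnDate' AND t2.o = '1809-02-12'
  AND t3.p = 'DiedOnDate' AND t3.o = '1865-04-15'
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('Abraham Lincoln',)]
```

Three triple patterns already cost two self-joins over the whole table; in general a query with n triple patterns needs n copies of the table and n - 1 joins.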

  13. Existing Solutions: Property Table. Drawback: a big waste of space (most subjects have only a few of the properties, so many cells are NULL).
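To see where the space goes, the sketch below (plain Python, toy data, `m1` a hypothetical ID) builds a single property table with one row per subject and one column per predicate, then counts the empty cells. Practical property-table designs cluster subjects into several narrower tables, so this is the worst case rather than a faithful reproduction of any system:

```python
# Toy triples; m1 is a hypothetical entity ID.
triples = [
    ("m1", "hasName", "Abraham Lincoln"),
    ("m1", "BornOnDate", "1809-02-12"),
    ("m1", "DiedOnDate", "1865-04-15"),
    ("page.html", "Creator", "Guan"),
    ("page.html", "Title", "Guan's Home Page"),
]

predicates = sorted({p for _, p, _ in triples})
subjects = sorted({s for s, _, _ in triples})

# One row per subject, one column per predicate; missing values stay None (NULL).
table = {s: {p: None for p in predicates} for s in subjects}
for s, p, o in triples:
    table[s][p] = o

cells = len(subjects) * len(predicates)
nulls = sum(value is None for row in table.values() for value in row.values())
print(f"{nulls} of {cells} cells are NULL")  # 5 of 10 cells are NULL
```

With two unrelated kinds of subject, half of the table is already empty; the fraction grows as datasets mix more heterogeneous resources.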

  14. Existing Solutions: Vertically Partitioned Tables. Drawback: too many merge joins.
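A minimal Python sketch of vertical partitioning, assuming toy data (`m1`, `m2` are hypothetical IDs): each predicate gets its own two-column (subject, object) table sorted by subject, and the example query becomes a chain of merge joins on `?m`. The names `tables` and `merge_join` are illustrative only:

```python
from collections import defaultdict

triples = [
    ("m1", "hasName", "Abraham Lincoln"),
    ("m1", "BornOnDate", "1809-02-12"),
    ("m1", "DiedOnDate", "1865-04-15"),
    ("m2", "hasName", "Charles Darwin"),
    ("m2", "BornOnDate", "1809-02-12"),
    ("m2", "DiedOnDate", "1882-04-19"),
]

# One two-column (subject, object) table per predicate, sorted by subject.
tables = defaultdict(list)
for s, p, o in triples:
    tables[p].append((s, o))
for rows in tables.values():
    rows.sort()

def merge_join(left, right):
    """Merge-join two subject-sorted (subject, value) lists on subject.
    Returns (subject, left_value, right_value) rows, still sorted by subject."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of equal subjects on each side and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == rs:
                j_end += 1
            for _, l_val in left[i:i_end]:
                for _, r_val in right[j:j_end]:
                    out.append((ls, l_val, r_val))
            i, j = i_end, j_end
    return out

born = [(s, o) for s, o in tables["BornOnDate"] if o == "1809-02-12"]
died = [(s, o) for s, o in tables["DiedOnDate"] if o == "1865-04-15"]

# Three triple patterns on ?m need two merge joins.
step1 = [(s, name) for s, name, _ in merge_join(tables["hasName"], born)]
names = [name for _, name, _ in merge_join(step1, died)]
print(names)  # ['Abraham Lincoln']
```

Every further triple pattern that shares `?m` adds one more merge join, which is the cost the slide points at.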

  15. Existing Solutions: RDF-3X. Exploits the fact that an RDF triple has only three elements (subject, predicate, object): builds all six possible permutation indexes and optimizes merge-join orders. Drawback: difficult to handle updates.
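The idea behind the six indexes can be sketched in Python with sorted lists standing in for RDF-3X's compressed B+-trees (an assumption of this sketch, not the engine's actual storage; the helper `lookup` is hypothetical). Keeping one sorted copy per permutation of (s, p, o) means any triple pattern, whichever positions are bound, is a single prefix range scan on some index:

```python
from bisect import bisect_left
from itertools import permutations

triples = [
    ("m1", "hasName", "Abraham Lincoln"),
    ("m1", "BornOnDate", "1809-02-12"),
    ("m1", "DiedOnDate", "1865-04-15"),
    ("m2", "hasName", "Charles Darwin"),
    ("m2", "BornOnDate", "1809-02-12"),
    ("m2", "DiedOnDate", "1882-04-19"),
]

ORDERS = ["".join(p) for p in permutations("spo")]  # spo, sop, pso, pos, osp, ops
POS = {"s": 0, "p": 1, "o": 2}

def reorder(triple, order):
    """Rearrange an (s, p, o) triple into the key order of one index."""
    return tuple(triple[POS[c]] for c in order)

# Six sorted copies of the data, one per permutation.
indexes = {order: sorted(reorder(t, order) for t in triples) for order in ORDERS}

def lookup(s=None, p=None, o=None):
    """Answer one triple pattern (None = unbound) with a single prefix range scan."""
    bound = {"s": s, "p": p, "o": o}
    given = {c for c in "spo" if bound[c] is not None}
    # Choose an index whose key order starts with exactly the bound positions.
    order = next(cand for cand in ORDERS if set(cand[:len(given)]) == given)
    prefix = tuple(bound[c] for c in order[:len(given)])
    index = indexes[order]
    out = []
    i = bisect_left(index, prefix)
    while i < len(index) and index[i][:len(prefix)] == prefix:
        row = dict(zip(order, index[i]))
        out.append((row["s"], row["p"], row["o"]))
        i += 1
    return out

print(lookup(p="BornOnDate", o="1809-02-12"))
```

The same picture explains the update drawback: inserting or deleting a single triple has to modify all six sorted copies.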

  17. Overview of gStore (Store) • Represent an RDF dataset by an RDF graph G and store G as an adjacency list table.
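The slide does not detail the table layout, so the following is only a sketch under the assumption that every subject and object (literals included) gets a row listing its outgoing and incoming labeled edges; `m1` and `m2` are hypothetical IDs. A table like this lets a vertex's actual neighbors be looked up directly, for example when checking whether a candidate vertex really has the edges a query asks for:

```python
from collections import defaultdict

triples = [
    ("m1", "hasName", "Abraham Lincoln"),
    ("m1", "BornOnDate", "1809-02-12"),
    ("m1", "DiedOnDate", "1865-04-15"),
    ("m2", "hasName", "Charles Darwin"),
    ("m2", "BornOnDate", "1809-02-12"),
    ("m2", "DiedOnDate", "1882-04-19"),
]

# Adjacency list table: for every vertex, its outgoing and incoming labeled edges.
adjacency = defaultdict(lambda: {"out": [], "in": []})
for s, p, o in triples:
    adjacency[s]["out"].append((p, o))
    adjacency[o]["in"].append((p, s))

print(adjacency["m1"]["out"])
print(adjacency["1809-02-12"]["in"])
```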

  18. Overview of gStore (Encoding) • Encode each entity and class vertex into a bitstring, called its signature. • Link these vertex signatures according to the RDF graph's structure to form a data signature graph G.
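A simplified sketch of signature encoding in Python, using superimposed coding: each adjacent (predicate, neighbor) pair is hashed to a few bit positions, and a vertex's signature is the bitwise OR over its edges. The length `M`, the bits-per-edge `K`, the SHA-256-based hashing, and the helper names are assumptions for illustration; the encoding in the gStore paper differs in detail. The payoff is a cheap filter: a data vertex can match a query vertex only if its signature contains every bit of the query vertex's signature.

```python
import hashlib

M = 64  # signature length in bits (assumed for illustration)
K = 2   # bit positions set per adjacent edge (assumed for illustration)

def edge_bits(predicate, neighbor):
    """Hash one adjacent (predicate, neighbor) pair to K positions in an M-bit string."""
    mask = 0
    for k in range(K):
        digest = hashlib.sha256(f"{k}|{predicate}|{neighbor}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % M)
    return mask

def signature(edges):
    """Superimpose (bitwise OR) the bits of all edges adjacent to a vertex."""
    sig = 0
    for predicate, neighbor in edges:
        sig |= edge_bits(predicate, neighbor)
    return sig

# Outgoing edges of a hypothetical data vertex m1.
m1_sig = signature([
    ("hasName", "Abraham Lincoln"),
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
])
# Query vertex ?m: only the edges with constant neighbors are encoded in this sketch.
q_sig = signature([
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
])

# Containment test: every bit set in the query signature must be set in the data signature.
is_candidate = (q_sig & m1_sig) == q_sig
print(format(m1_sig, "064b"))
print(is_candidate)  # True
```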

  19. Overview of gStore (VS*-tree)

  21. Encoding Technique

  22. Encoding Technique

  24. VS*-tree • Each leaf node of the tree corresponds to one vertex signature in the data signature graph G. • Given two leaf nodes d1 and d2 in the tree, we introduce an edge between them if and only if there is an edge between d1 and d2 in G. • Given two non-leaf nodes d1 and d2 in the tree, we introduce a super edge from d1 to d2 if and only if there is at least one edge from d1's children to d2's children. • The label of the super edge d1→d2 is the bitwise "OR" of the labels of all edges from d1's children to d2's children.
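The super-edge rule can be sketched in a few lines of Python. The leaf edges, their 4-bit labels, and the `parent` mapping (which leaves sit under tree nodes d1 and d2) are hypothetical; the function lifts edges one level by OR-ing the labels of all child-to-child edges:

```python
from collections import defaultdict

# Hypothetical leaf-level edges: (source leaf, target leaf, 4-bit edge label).
leaf_edges = [
    ("v1", "v3", 0b0001),
    ("v2", "v4", 0b0100),
    ("v2", "v3", 0b1000),
    ("v4", "v1", 0b0010),
]
# Hypothetical tree structure: v1, v2 are children of d1; v3, v4 are children of d2.
parent = {"v1": "d1", "v2": "d1", "v3": "d2", "v4": "d2"}

def super_edges(edges, parent):
    """Lift edges one level up: d1 -> d2 exists iff some child of d1 links to
    some child of d2; its label is the bitwise OR of all such edge labels."""
    labels = defaultdict(int)
    for src, dst, label in edges:
        labels[(parent[src], parent[dst])] |= label
    return dict(labels)

for (src, dst), label in super_edges(leaf_edges, parent).items():
    print(f"{src} -> {dst}: {label:04b}")
```

Because the OR covers every child edge, a query edge label with any bit missing from a super edge's label cannot match any edge below it, so that branch of the tree can be pruned during the search.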

  25. VS*-tree

  26. Query Algorithm

  28. Experiments • Datasets: Yago and DBLP, popular semantic datasets with millions of triples. • Data size: approximately 4 GB.

  29. Experiments (Exact Queries)

  30. Experiments (Wildcard Queries)

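  32. Conclusions • Proposes storing and querying RDF data from a graph database perspective. • Uses the VS*-tree as an index over the vertex bitstrings (signatures), supporting SPARQL queries in a scalable manner. • False positives: signature-based filtering can admit false positives, so candidate matches must be verified against the actual data.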
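Why false positives arise can be shown with hand-picked 4-bit edge encodings (hypothetical values chosen to force a collision). OR-ing two data edges can cover every bit of a third edge the vertex does not actually have, so the signature filter passes and only a check against the real edges rejects the candidate:

```python
# Hypothetical 4-bit encodings, hand-picked so that a collision happens.
EDGE_BITS = {
    ("BornOnDate", "1809-02-12"): 0b0011,
    ("DiedOnDate", "1882-04-19"): 0b1100,
    ("DiedOnDate", "1865-04-15"): 0b0110,  # its bits are covered by the two above
}

def signature(edges):
    """Bitwise OR of the encodings of a vertex's edges."""
    sig = 0
    for edge in edges:
        sig |= EDGE_BITS[edge]
    return sig

data_edges = {("BornOnDate", "1809-02-12"), ("DiedOnDate", "1882-04-19")}  # e.g. Darwin's vertex
query_edges = {("DiedOnDate", "1865-04-15")}

data_sig = signature(data_edges)    # 0b1111
query_sig = signature(query_edges)  # 0b0110

passes_filter = (query_sig & data_sig) == query_sig  # True: every query bit is present
truly_matches = query_edges <= data_edges            # False: the edge does not exist
print(passes_filter, truly_matches)  # True False
```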

  33. References • [ICDE09] Thanh Tran, Haofen Wang, Sebastian Rudolph, Philipp Cimiano, "Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data", DOI 10.1109/ICDE.2009.119. • [VLDB07] Daniel J. Abadi, Adam Marcus, Samuel R. Madden, Kate Hollenbach, "Scalable Semantic Web Data Management Using Vertical Partitioning", VLDB '07, September 23-28, 2007, Vienna, Austria. • [PVLDB08] Cathrin Weiss, Panagiotis Karras, Abraham Bernstein, "Hexastore: Sextuple Indexing for Semantic Web Data Management", PVLDB '08, August 23-28, 2008, Auckland, New Zealand. • [PVLDB08] Thomas Neumann, Gerhard Weikum, "RDF-3X: a RISC-style Engine for RDF", PVLDB '08, August 23-28, 2008, Auckland, New Zealand. • [VLDB11] Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, Dongyan Zhao, "gStore: Answering SPARQL Queries via Subgraph Matching", VLDB '11, August 29 - September 3, 2011, Seattle, Washington. Thank you!
