
An Efficient SQL-based RDF Querying Scheme Eugene Inseok Chong Souripriya Das



Presentation Transcript


  1. An Efficient SQL-based RDF Querying Scheme • Eugene Inseok Chong, Souripriya Das, George Eadon, Jagannathan Srinivasan • New England Development Center, Oracle

  2. Talk Outline • Introduction • Functionality • Design and Implementation • Performance • Conclusions and Future Work

  3. Introduction

  4. RDF (Resource Description Framework) • RDF is a W3C standard for describing resources on the web • Uniform Resource Identifiers (URIs) are used to identify resources • Example: http://www.oracle.com/people#John • RDF triples are used to make statements about a resource • Format: (subject predicate object) • Example: (:John :brotherOf :Mary) • Each triple represents a directed, labeled edge in an RDF graph: :John --:brotherOf--> :Mary

  5. RDF Data and Graph Example • Family Data: (:John :brotherOf :Mary) (:Mary :parentOf :Matt) (:John :name "John") (:Mary :name "Mary") (:Matt :name "Matt") • [Figure: the family data drawn as a directed, labeled graph over :John, :Mary, :Matt and their name literals]

  6. RDF Querying Problem • Given • RDF graphs: the data set to be searched • Graph Pattern: containing a set of variables • Find • Matching Subgraphs • Return • Sets of variable bindings: where each set corresponds to a Matching Subgraph

  7. RDF Query Example • Family Data: (:John :brotherOf :Mary) (:Mary :parentOf :Matt) (:John :name "John") (:Mary :name "Mary") (:Matt :name "Matt") • Graph Pattern (names of Mary's brothers): (?x :brotherOf ?y) (?y :name "Mary") (?x :name ?n) • Variable Bindings: x = :John, y = :Mary, n = "John" • Matching Subgraph: (:John :brotherOf :Mary) (:Mary :name "Mary") (:John :name "John") • [Figure: the family data graph with the matching subgraph highlighted]
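The query example above can be reproduced with a small, naive matcher. This is only an illustrative sketch in Python, not the scheme presented in the talk: the function names (`is_var`, `match_triple`, `match_pattern`) are made up for this example, and every term starting with `?` is assumed to be a variable.

```python
# Family data from the slide, as (subject, predicate, object) tuples.
TRIPLES = [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":name", "John"),
    (":Mary", ":name", "Mary"),
    (":Matt", ":name", "Matt"),
]


def is_var(term):
    # Assumption for this sketch: variables are the terms that start with "?".
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, else None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or reject if it is already bound differently.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def match_pattern(patterns, triples):
    """Return one binding dict per matching subgraph (nested-loop evaluation)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return bindings


# Graph pattern from the slide: names of Mary's brothers.
PATTERN = [("?x", ":brotherOf", "?y"), ("?y", ":name", "Mary"), ("?x", ":name", "?n")]
RESULT = match_pattern(PATTERN, TRIPLES)
```

Each dictionary in `RESULT` is one set of variable bindings, corresponding to one matching subgraph.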

  8. RDF Storage Issues • Need to store RDF <subject, predicate, object> triples where the individual components can be URIs, blank nodes, or literals • Namespaces used in URIs could be long • Multiple triples describe a resource, resulting in repetition of (possibly long) URIs • Different representations are possible for a literal occurring in multiple triples • e.g. 120, 120.0, 12.0e+1, 1.20e+2 • RDF graph may include schema triples • e.g. (:brotherOf rdfs:domain :Male)

  9. RDF Querying Issues in SQL • Support specification of graph pattern-based SQL queries • The same variable can occur in multiple triples of a graph pattern, so processing requires a self-join • e.g. (?x :brotherOf ?y) (?y :name "Mary") (?x :name ?n) • Query processing (e.g., for filter conditions, ORDER BY) requires datatype-specific comparison semantics • Schema Triple: (:age rdfs:range xsd:int) • Graph Pattern: (?x :age ?a) • Filter Condition: a > 60 • ORDER BY: a DESCENDING
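Both issues on this slide can be seen in a few lines. The sketch below uses Python's standard `sqlite3` module with a single, unnormalized `triples` table; the table and column names are assumptions for illustration and are not the storage schema described later in the talk. The last two lines show why datatype-specific comparison matters: comparing numeric literals as strings gives a different answer than comparing them as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-table layout: one row per (subject, predicate, object).
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        (":John", ":brotherOf", ":Mary"),
        (":Mary", ":parentOf", ":Matt"),
        (":John", ":name", "John"),
        (":Mary", ":name", "Mary"),
        (":Matt", ":name", "Matt"),
    ],
)

# One table alias per triple in the graph pattern; a variable shared between
# two triples becomes a join condition between the corresponding aliases.
SELF_JOIN = """
SELECT t1.s AS x, t1.o AS y, t3.o AS n
FROM triples t1
JOIN triples t2 ON t2.s = t1.o          -- ?y shared by triples 1 and 2
JOIN triples t3 ON t3.s = t1.s          -- ?x shared by triples 1 and 3
WHERE t1.p = ':brotherOf'
  AND t2.p = ':name' AND t2.o = 'Mary'
  AND t3.p = ':name'
"""
rows = conn.execute(SELF_JOIN).fetchall()

# Datatype-specific comparison: lexical and numeric ordering disagree.
lexical = "9" > "60"            # string comparison, character by character
numeric = int("9") > int("60")  # comparison under xsd:int semantics
```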

  10. RDF Querying Issues: Inference • Query processing may involve inference • Example: • Data: (:Jim :brotherOf :John) (:John :fatherOf :Mary) • Graph Pattern: (?x :uncleOf ?y) • Result without inference: empty • Rule: (?x :brotherOf ?y) (?y :fatherOf ?z) => (?x :uncleOf ?z) • Inferred data: (:Jim :uncleOf :Mary) • Result with inference: x = :Jim, y = :Mary
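The effect of the rule can be sketched in a few lines of Python. This is a hand-written application of the one rule from the slide, not a general rule engine and not the mechanism used by RDF_MATCH; the helper names `apply_uncle_rule` and `query_uncles` are invented for the example.

```python
# Data from the slide.
DATA = {(":Jim", ":brotherOf", ":John"), (":John", ":fatherOf", ":Mary")}


def apply_uncle_rule(triples):
    """Apply (?x :brotherOf ?y) (?y :fatherOf ?z) => (?x :uncleOf ?z) once."""
    inferred = {
        (x, ":uncleOf", z)
        for (x, p1, y) in triples
        if p1 == ":brotherOf"
        for (y2, p2, z) in triples
        if p2 == ":fatherOf" and y2 == y
    }
    # Return only the triples that are new.
    return inferred - set(triples)


def query_uncles(triples):
    """Evaluate the graph pattern (?x :uncleOf ?y) as sorted (x, y) pairs."""
    return sorted((s, o) for (s, p, o) in triples if p == ":uncleOf")


without_inference = query_uncles(DATA)
with_inference = query_uncles(DATA | apply_uncle_rule(DATA))
```

Without the rule the pattern finds nothing; once the inferred triple is added to the data, the same pattern binds x to :Jim and y to :Mary.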

  11. RDF Querying Approach • General Approach • Create a new (declarative, SQL-like) query language • e.g. RQL, SeRQL, TRIPLE, N3, Versa, SPARQL, RDQL, RDFQL, SquishQL, RSQL, etc. • SQL-based Approach • Introduces a SQL table function, RDF_MATCH, that uses a SPARQL-like graph pattern to express RDF queries • Benefits of SQL-based Approach • Leverages all the powerful constructs in SQL (e.g., SELECT / FROM / WHERE, ORDER BY, GROUP BY, aggregates, joins) to process graph query results • RDF queries can easily be combined with conventional queries on database tables, thereby avoiding staging

  12. Embedding RDF Query in SQL • SELECT … FROM …, TABLE (<RDF query>) t, … WHERE …; • The RDF query is expressed as an RDF_MATCH table function invocation • Use of the RDF_MATCH table function allows embedding a graph query in a SQL query

  13. Functionality

  14. RDF_MATCH Table Function • Input parameters: RDF_MATCH (Pattern, Models, RuleBases, Aliases) • Pattern: graph pattern • Models: data (set of RDF graphs) • RuleBases: rules (0 or more rulebases) • Aliases: list of prefixes for namespaces • Returns a set of columns containing variable bindings • A variable matching a URI is returned as a single VARCHAR2 column with the same name (e.g. x for ?x) • A variable matching a literal is returned as a pair of VARCHAR2 columns: one with the variable's name (e.g. x for ?x) and one with the type (x$type for ?x)

  15. RDF_MATCH Example • Example: student reviewers younger than 25 • SELECT t.r reviewer, t.c conf, t.a age FROM TABLE(RDF_MATCH('(?r rdf:type :Student) (?r :reviewerOf ?c) (?r :age ?a)', RDFModels('reviewers'), NULL, RDFAliases(…))) t WHERE t.a < 25;

  16. Specifying Rules • RDFS rulebase: pre-loaded • Can add user-defined rules • Rule: "Chairperson of Conference is also a reviewer" • ('rb', 'ChairpersonRule', '(?r :ChairpersonOf ?c)', NULL, NULL, '(?r :ReviewerOf ?c)') • Fields, in order: rulebase name, rule name, antecedents, filter condition, aliases, consequents

  17. RDF_MATCH Example with Rulebase • Query: Find reviewers of conferences • SELECT t.r reviewer FROM TABLE(RDF_MATCH('(?r :ReviewerOf ?c)', RDFModels('reviewers'), RDFRules('rb'), NULL)) t; • Data: (:Mary :ChairpersonOf :IDBC2005) • Inferred data: (:Mary :ReviewerOf :IDBC2005)

  18. Design & Implementation

  19. RDF Data Storage • Triples data is stored after normalization in two tables • UriMap (UriID, UriValue, …) contains the mapping of URIs, blank nodes, and literals to internal identifiers • IdTriples (ModelID, SubjectID, PropertyID, ObjectID, …) contains the triple information encoded as three identifiers • Multiple representations of literals: the first occurrence is treated as canonical, and the rest are mapped to the canonical representation • e.g. 120.0 <= 120, 1.20e+2, 12.0e+1
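The normalization described above can be sketched with plain Python dictionaries standing in for the two tables. This is a rough illustration under stated assumptions: the `TripleStore` class is hypothetical, a literal is treated as numeric whenever `float()` can parse it (the real system would rely on datatypes), and the `:item…` and `:weight` names are made-up sample data.

```python
class TripleStore:
    """Toy stand-in for the UriMap and IdTriples tables."""

    def __init__(self):
        self.uri_map = {}     # value text -> internal ID (UriMap lookup)
        self.id_values = {}   # internal ID -> canonical value text
        self.canonical = {}   # numeric value -> ID of its first-seen form
        self.id_triples = []  # (model_id, subject_id, property_id, object_id)

    def _id_for(self, value):
        if value in self.uri_map:
            return self.uri_map[value]
        try:
            key = float(value)  # assumption: parseable text is a numeric literal
        except ValueError:
            key = None          # URI, blank node, or non-numeric literal
        if key is not None and key in self.canonical:
            # A later representation of a known number: reuse the canonical ID.
            value_id = self.canonical[key]
        else:
            value_id = len(self.id_values) + 1
            self.id_values[value_id] = value
            if key is not None:
                self.canonical[key] = value_id
        self.uri_map[value] = value_id
        return value_id

    def add(self, model_id, subject, prop, obj):
        self.id_triples.append(
            (model_id, self._id_for(subject), self._id_for(prop), self._id_for(obj))
        )


store = TripleStore()
store.add(1, ":item1", ":weight", "120.0")    # first occurrence: canonical
store.add(1, ":item2", ":weight", "1.20e+2")  # mapped to the ID of "120.0"
store.add(1, ":item3", ":weight", "120")      # mapped to the ID of "120.0"
object_ids = {triple[3] for triple in store.id_triples}
```

All three object literals end up with the same internal ID, and long values such as URIs are stored once no matter how many triples mention them.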

  20. RDF_MATCH Query Processing • Substitute aliases with namespaces in the search pattern • Convert URIs and literals to internal IDs • Generate Query • Generate a self-join query based on matching variables • Generate SQL subqueries for the rulebase component (if any) • Generate the join result by joining internal IDs with the UriMap table • Use model IDs to restrict the IdTriples table • Compile and execute the generated query
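A simplified version of these steps can be sketched with Python's `sqlite3` module. This is an assumption-laden illustration, not Oracle's implementation: `generate_sql` and `load` are hypothetical helpers, alias substitution and rulebases are left out, constants are resolved to IDs through a subquery, and only the projected variables are joined back to UriMap.

```python
import sqlite3


def generate_sql(patterns, model_id, projection):
    """Build a self-join over IdTriples; returns (sql_text, parameters)."""
    columns = ("SubjectID", "PropertyID", "ObjectID")
    first_seen = {}  # variable -> first column reference that binds it
    where, params = [], []
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        where.append(f"{alias}.ModelID = ?")  # restrict to the model
        params.append(model_id)
        for column, term in zip(columns, pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    where.append(f"{ref} = {first_seen[term]}")  # self-join
                else:
                    first_seen[term] = ref
            else:
                # Convert the URI or literal to its internal ID.
                where.append(f"{ref} = (SELECT UriID FROM UriMap WHERE UriValue = ?)")
                params.append(term)
    select, joins = [], []
    for j, var in enumerate(projection):
        # Join back to UriMap only for variables that are projected.
        select.append(f"u{j}.UriValue AS {var[1:]}")
        joins.append(f"JOIN UriMap u{j} ON u{j}.UriID = {first_seen[var]}")
    tables = " CROSS JOIN ".join(f"IdTriples t{i}" for i in range(len(patterns)))
    sql = (
        f"SELECT {', '.join(select)} FROM {tables} "
        f"{' '.join(joins)} WHERE {' AND '.join(where)}"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UriMap (UriID INTEGER PRIMARY KEY, UriValue TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE IdTriples (ModelID INTEGER, SubjectID INTEGER, "
    "PropertyID INTEGER, ObjectID INTEGER)"
)


def load(model_id, triples):
    """Normalize triples into UriMap and IdTriples."""
    for triple in triples:
        ids = []
        for value in triple:
            conn.execute("INSERT OR IGNORE INTO UriMap (UriValue) VALUES (?)", (value,))
            row = conn.execute(
                "SELECT UriID FROM UriMap WHERE UriValue = ?", (value,)
            ).fetchone()
            ids.append(row[0])
        conn.execute("INSERT INTO IdTriples VALUES (?, ?, ?, ?)", (model_id, *ids))


load(1, [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":name", "John"),
    (":Mary", ":name", "Mary"),
    (":Matt", ":name", "Matt"),
])
pattern = [("?x", ":brotherOf", "?y"), ("?y", ":name", "Mary"), ("?x", ":name", "?n")]
sql, params = generate_sql(pattern, 1, ["?x", "?n"])
rows = conn.execute(sql, params).fetchall()
```

Because `?y` is not in the projection list, no UriMap join is generated for it; the variable still takes part in the self-join through its internal ID.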

  21. Optimization: Table Function Rewrite • TableRewriteSQL( ) • Takes the RDF query (specified via arguments) as input • Generates a SQL string • Substitute the table function call with the generated SQL string • Reparse and execute the resulting query • Advantages • Avoids execution-time overhead (linear in the number of result rows) associated with the table function infrastructure • Leverages SQL optimizer capabilities to optimize the resulting query (including filter condition pushdown)

  22. Optimization: Materialized Join Views • Generic materialized join views (MJVs) • Subject-Subject, Object-Subject, … • Subject-property matrix MJVs (SPMJVs) • Custom, workload-based (e.g., frequent search patterns) • Example: select student name, university, and age • Select r, u, a …… '(?r rdf:type :Student) (?r :enrolledAt ?u) (?r :age ?a)' …… • SPMJV: < Student enrolledAt age >
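The idea behind a subject-property matrix can be illustrated with `sqlite3`. SQLite has no materialized views, so a table created with `CREATE TABLE … AS SELECT` stands in for one here; the table name `spmjv_student`, the unnormalized `triples` layout, and the sample people are all assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Hypothetical sample data.
        (":Ann", "rdf:type", ":Student"),
        (":Ann", ":enrolledAt", ":MIT"),
        (":Ann", ":age", "22"),
        (":Bob", "rdf:type", ":Student"),
        (":Bob", ":enrolledAt", ":BU"),
        (":Bob", ":age", "27"),
        (":Cy", "rdf:type", ":Professor"),
        (":Cy", ":age", "50"),
    ],
)

# Precompute the self-joins once: one row per student, one column per property.
conn.execute(
    """
    CREATE TABLE spmjv_student AS
    SELECT t.s AS r, e.o AS enrolledAt, a.o AS age
    FROM triples t
    JOIN triples e ON e.s = t.s AND e.p = ':enrolledAt'
    JOIN triples a ON a.s = t.s AND a.p = ':age'
    WHERE t.p = 'rdf:type' AND t.o = ':Student'
    """
)

# The three-triple pattern from the slide now needs no joins at query time.
rows = conn.execute(
    "SELECT r, enrolledAt, age FROM spmjv_student ORDER BY r"
).fetchall()
```

The trade-off is the usual one for materialized views: the join work moves from query time to load or refresh time, which pays off for search patterns that occur frequently in the workload.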

  23. Performance

  24. Dataset • WordNet : lexical database for English language • UniProt : large scale (80 million triples) • Protein and annotation data

  25. Experiments • Varying number of triples in search pattern • Varying filter conditions • Varying projection list • Large-scale RDF data • Subject-property MJVs

  26. Varying Number of Triples • ‘(?a wn:hyponymOf ?b) (?b wn:hyponymOf ?c) ….. • Increasing number of self-joins

  27. Varying Number of Triples

  28. Varying Projection List • ‘(?c0 wn:wordForm ?word) (?c0 wn:wordForm ?syn1) (?c1 wn:wordForm ?syn1) …. (5 triples) • Benefit of the projection list optimization • Eliminate joins with UriMap table for variables not referenced outside of RDF_MATCH

  29. Varying Projection List

  30. Large-Scale RDF Data • UniProt: 10M, 20M, 40M, 80M triples • 6 example queries given with UniProt • Number of matches remains constant as dataset size changes (ROWNUM)

  31. UniProt Sample Queries (description; pattern; projection; result limit) • Q1: Display the ranges of transmembrane regions; 6 triples, 5 vars; 3 vars; 15000 rows • Q2: List proteins with publications by authors with matching names; 5 triples, 5 vars, 1 LIKE pred.; 3 vars; 10 rows • Q3: Count the number of times a publication by a specific author is cited; 3 triples, 2 vars; 0 vars; 32 rows • Q4: List resources that are related to proteins annotated with a specific keyword; 3 triples, 2 vars; 1 var; 3000 rows • Q5: List genes associated with human diseases; 7 triples, 5 vars; 3 vars; 750 rows • Q6: List recently modified entries; 2 triples, 2 vars, 1 range pred.; 2 vars; 8000 rows

  32. RDF_MATCH Performance Scalability • Query Response Times
                       Q1      Q2      Q3      Q4     Q5     Q6
      10 M Triples     0.86    < 0.01  < 0.01  0.03   0.18   0.46
      20 M Triples     0.95    < 0.01  < 0.01  0.03   0.19   0.47
      40 M Triples     0.96    < 0.01  < 0.01  0.03   0.18   0.47
      80 M Triples     1.03    < 0.01  < 0.01  0.03   0.20   0.49
      Maximum          0.054   0.002   0.002   0.011  0.065  0.07

  33. Conclusions

  34. Conclusions and Future Work • SQL-based RDF querying scheme • RDF_MATCH table function • Supports graph-pattern based query on RDF data with RDFS and user-defined rules • Efficient Execution • Table Function Rewrite • Materialized Join Views: Generic and Subject-Property • Rule Indexes • Future work • OPTIONAL support – outer-join • Provenance support
