This study focuses on the problem of rewriting queries on SPARQL views in RDF data and explores query optimization techniques. It examines the challenges of virtual views, maintenance, and evaluation, and proposes solutions to improve query performance.
Rewriting Queries on SPARQL Views Le, Wangchao, et al. "Rewriting queries on SPARQL views." Proceedings of the 20th international conference on World wide web. ACM, 2011. 2019/7/16 machida
Author • Wangchao Le • University of Utah • Doctor of Philosophy • About him • He holds a Ph.D. degree from the University of Utah, specializing in data management and analysis. His dissertation is about supporting scalable data analytics on large linked data • Work Synopsis • Cloud query optimization • Database systems and large-scale data management • Query processing and optimization for large graph data • Temporal and multi-version databases • Distributed key-value store • Database security
Introduction • Answering queries over virtual views is encountered in many settings. • Access control. • Data integration. • For example, DBAs use views to enforce security clearance. • Views are virtual. • Maintenance, e.g. ACID. • Rewriting is NP-hard. • Views are materialized. • Evaluation. • Space, updates, etc.
Introduction: Problem definition • Answering queries using views is well studied for relational DBs, but not well understood for RDF and SPARQL • Given a set of view definitions V = {V1, V2, ..., Vn} on an RDF graph G and a SPARQL query Q over the views V, compute a SPARQL query Q’ such that Q’(G) = Q(V(G))
Introduction: RDF data and SPARQL language • The Resource Description Framework (RDF) is commonly used in the semantic web for knowledge representation • DBpedia extracts unstructured information and publishes it in a queryable structured format (RDF) • Unit of RDF data: linked triple
Introduction: RDF data and SPARQL language • The authors are interested in SPARQL queries of the form Q := (SELECT|CONSTRUCT) RD (WHERE GP) • Given an RDF graph G and a graph pattern GP, Q searches for subgraphs in G that match GP • RD is the result description • For SELECT, it projects the values of variables (like SQL) • For CONSTRUCT, it is a set of triple templates that constructs a new RDF graph from the matched subgraphs in G • Finally, there is a Boolean SPARQL query of the form ASK GP, which indicates whether or not GP has a match in G
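The three query forms above can be illustrated with a minimal Python sketch. It assumes triples are plain 3-tuples of strings, variables are strings starting with `?`, and results use set semantics. The function names `match_bgp`, `select`, `construct` and `ask` are hypothetical helpers for illustration, not part of any SPARQL library or of the authors' implementation.

```python
def is_var(term):
    """A term is a variable if it starts with '?' (assumed convention)."""
    return isinstance(term, str) and term.startswith("?")

def match_bgp(patterns, graph, binding=None):
    """Yield every variable binding under which all triple patterns occur in graph."""
    binding = dict(binding or {})
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against an earlier binding.
                if extended.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match_bgp(rest, graph, extended)

def select(variables, patterns, graph):
    """SELECT: project the bound values of the requested variables."""
    return {tuple(b[v] for v in variables) for b in match_bgp(patterns, graph)}

def construct(templates, patterns, graph):
    """CONSTRUCT: instantiate the triple templates for every match."""
    return {tuple(b.get(term, term) for term in template)
            for b in match_bgp(patterns, graph)
            for template in templates}

def ask(patterns, graph):
    """ASK: report whether the graph pattern has at least one match."""
    return next(match_bgp(patterns, graph), None) is not None

# Toy graph (invented data, for illustration only).
G = {("e", "name", "Eric"), ("e", "friend", "a"), ("a", "lives", "Paris")}
where = [("?f0", "name", "Eric"), ("?f0", "friend", "?f1"), ("?f1", "lives", "?l1")]
view_graph = construct([("?f0", "vfriend", "?f1"), ("?f1", "vlives", "?l1")], where, G)
```

Here `view_graph` plays the role of V(G): a user query over the view would then be evaluated against it with `select`.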
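Introduction: A running example
• In SPARQL, a view is defined with CONSTRUCT
• Eric wants to hide his relatives-of-friends (RoF), his friends-of-relatives (FoR), and any distinction between friends-of-friends (FoF) and friends (F), or between relatives-of-relatives (RoR) and relatives (R)
V: CONSTRUCT { ?f0 vfriend ?f1, ?f1 vlives ?l1 }
   WHERE { ?f0 name Eric, ?f0 friend ?f1, ?f1 lives ?l1 }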
Introduction: A running example
VF: CONSTRUCT { ?f0 vfriend ?f1, ?f1 vlives ?l1 }
    WHERE { ?f0 name Eric, ?f0 friend ?f1, ?f1 lives ?l1 }
VFoF: CONSTRUCT { ?f2 vfriend ?f4, ?f4 vlives ?l4 }
    WHERE { ?f2 name Eric, ?f2 friend ?f3, ?f3 friend ?f4, ?f4 lives ?l4 }
VR: CONSTRUCT { ?r0 vrelated ?r1, ?r1 vlives ?l1 }
    WHERE { ?r0 name Eric, ?r0 related ?r1, ?r1 lives ?l1 }
VRoR: CONSTRUCT { ?r2 vrelated ?r4, ?r4 vlives ?l4 }
    WHERE { ?r2 name Eric, ?r2 related ?r3, ?r3 related ?r4, ?r4 lives ?l4 }
Qu: SELECT ?f5 WHERE { ?p vfriend ?f5, ?f5 vlives ?l5, ?p vrelated ?r5, ?r5 vlives ?l5 }
Introduction: Attempting a relational/SQL rewriting (Figure 1) • Someone might be tempted to address the SPARQL rewriting problem by considering the corresponding SQL setting and applying the SQL solutions • Moving from SPARQL to SQL does not reduce the complexity of the problem (Figure 2)
Introduction: Contributions • The authors study the rewriting of SPARQL queries over virtual SPARQL views, propose a native SPARQL rewriting algorithm, and prove that it generates sound and complete rewritings • The authors propose several optimizations of the basic rewriting algorithm to reduce the complexity and size of the rewritten queries • The authors present extensive experiments on two RDF stores on the scalability and portability of their algorithms
Query rewriting in SPARQL • Given a set of view definitions V = {V1, V2, ..., Vn} on an RDF graph G and a SPARQL query Q over the views V, compute a SPARQL query Q’ such that Q’(G) = Q(V(G)) • The authors consider two criteria for the correctness of a rewriting • Soundness: The rewriting is sound iff Q’(G) ⊆ Q(V(G)) • Completeness: The rewriting is complete iff Q(V(G)) ⊆ Q’(G)
Rewriting Algorithm • Two steps to rewrite • Determine, for each triple pattern pi(Xi) in the user query, the set CandVi of candidate views that have a variable mapping to this triple pattern (lines 3-18) • Construct the rewriting as a union of conjunctive queries (lines 19-23)
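The first step can be sketched in Python on the running example. This is a simplified illustration, not the authors' algorithm as published. It assumes that a CONSTRUCT template is a candidate for a query triple pattern when their constants agree and the template variables map consistently onto the query terms. The names `unify` and `candidates` are hypothetical.

```python
from itertools import product

def is_var(term):
    return term.startswith("?")

# CONSTRUCT templates of the four views in the running example.
VIEWS = {
    "F":   [("?f0", "vfriend", "?f1"), ("?f1", "vlives", "?l1")],
    "FoF": [("?f2", "vfriend", "?f4"), ("?f4", "vlives", "?l4")],
    "R":   [("?r0", "vrelated", "?r1"), ("?r1", "vlives", "?l3")],
    "RoR": [("?r2", "vrelated", "?r4"), ("?r4", "vlives", "?l6")],
}

# Triple patterns of the user query Qu.
QUERY = [("?p", "vfriend", "?f5"), ("?f5", "vlives", "?l5"),
         ("?p", "vrelated", "?r5"), ("?r5", "vlives", "?l5")]

def unify(template, pattern):
    """Return a mapping from view variables to query terms, or None on mismatch.

    Simplification (assumption): constants must be identical on both sides.
    """
    mapping = {}
    for t_term, q_term in zip(template, pattern):
        if is_var(t_term):
            if mapping.setdefault(t_term, q_term) != q_term:
                return None
        elif t_term != q_term:
            return None
    return mapping

def candidates(pattern, views):
    """Collect (view name, template number, mapping) for one query triple pattern."""
    found = []
    for name, templates in views.items():
        for number, template in enumerate(templates, start=1):
            mapping = unify(template, pattern)
            if mapping is not None:
                found.append((name, number, mapping))
    return found

cand_sets = [candidates(pattern, VIEWS) for pattern in QUERY]
# The second step picks one candidate per triple pattern; each pick is one conjunctive query.
combinations = list(product(*cand_sets))
```

On this example the candidate sets have sizes 2, 4, 2 and 4, so the union contains 2 × 4 × 2 × 4 = 64 conjunctive queries.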
Rewriting Algorithm • Building the candidate sets CandV1 to CandV4
Qu: SELECT ?f5 WHERE { ?p vfriend ?f5, ?f5 vlives ?l5, ?p vrelated ?r5, ?r5 vlives ?l5 }
Views (① and ② number the triple templates of each view; [V1F, ...] refers to template ① of VF):
VF: CONSTRUCT { ① ?f0 vfriend ?f1, ② ?f1 vlives ?l1 }
    WHERE { ?f0 name Eric, ?f0 friend ?f1, ?f1 lives ?l1 }
VFoF: CONSTRUCT { ① ?f2 vfriend ?f4, ② ?f4 vlives ?l4 }
    WHERE { ?f2 name Eric, ?f2 friend ?f3, ?f3 friend ?f4, ?f4 lives ?l4 }
VR: CONSTRUCT { ① ?r0 vrelated ?r1, ② ?r1 vlives ?l3 }
    WHERE { ?r0 name Eric, ?r0 related ?r1, ?r1 lives ?l3 }
VRoR: CONSTRUCT { ① ?r2 vrelated ?r4, ② ?r4 vlives ?l6 }
    WHERE { ?r2 name Eric, ?r2 related ?r3, ?r3 related ?r4, ?r4 lives ?l6 }
CandV1 (for ?p vfriend ?f5): [V1F, ?f0→?p, ?f1→?f5] [V1FoF, ?f2→?p, ?f4→?f5]
CandV2 (for ?f5 vlives ?l5): [V2F, ?f1→?f5, ?l1→?l5] [V2FoF, ?f4→?f5, ?l4→?l5] [V2R, ?r1→?f5, ?l3→?l5] [V2RoR, ?r4→?f5, ?l6→?l5]
CandV3 (for ?p vrelated ?r5): [V1R, ?r0→?p, ?r1→?r5] [V1RoR, ?r2→?p, ?r4→?r5]
CandV4 (for ?r5 vlives ?l5): [V2F, ?f1→?r5, ?l1→?l5] [V2FoF, ?f4→?r5, ?l4→?l5] [V2R, ?r1→?r5, ?l3→?l5] [V2RoR, ?r4→?r5, ?l6→?l5]
Rewriting Algorithm • Generate one rewritten query, starting with [V1F, ?f0→?p, ?f1→?f5] from CandV1
Q’: SELECT ?f5 WHERE { ?p name Eric, ?p friend ?f5, ?f5 lives ?l’1, ...
• A variable left undefined by the mapping is renamed to a new variable (here ?l1 becomes ?l’1)
Rewriting Algorithm • Generate one rewritten query, choosing V1F, V2FoF, V1R and V2RoR from CandV1 to CandV4
Q’: SELECT ?f5 WHERE {
    ?p name Eric, ?p friend ?f5, ?f5 lives ?l’1,
    ?f’2 name Eric, ?f’2 friend ?f’3, ?f’3 friend ?f5, ?f5 lives ?l5,
    ?p name Eric, ?p related ?r5, ?r5 lives ?l’3,
    ?r’2 name Eric, ?r’2 related ?r’3, ?r’3 related ?r5, ?r5 lives ?l5 }
• In total there are |CandV1| × |CandV2| × |CandV3| × |CandV4| rewritings
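Expanding one choice of candidates into a conjunctive query can be sketched as below. This is a minimal illustration under assumptions: view bodies are lists of 3-tuples, and every view variable without a mapping gets a fresh name built from a per-copy suffix (the slide writes `?l’1`; the sketch writes `?l1_0`). The function `expand` and the dictionary `VIEW_BODIES` are hypothetical names.

```python
from itertools import count

# WHERE clauses of the four views in the running example.
VIEW_BODIES = {
    "F":   [("?f0", "name", "Eric"), ("?f0", "friend", "?f1"), ("?f1", "lives", "?l1")],
    "FoF": [("?f2", "name", "Eric"), ("?f2", "friend", "?f3"),
            ("?f3", "friend", "?f4"), ("?f4", "lives", "?l4")],
    "R":   [("?r0", "name", "Eric"), ("?r0", "related", "?r1"), ("?r1", "lives", "?l3")],
    "RoR": [("?r2", "name", "Eric"), ("?r2", "related", "?r3"),
            ("?r3", "related", "?r4"), ("?r4", "lives", "?l6")],
}

def expand(choice):
    """Turn a list of (view name, mapping) picks into one conjunctive query body."""
    fresh = count()
    body = []
    for view_name, mapping in choice:
        renaming = dict(mapping)   # one independent copy of the view per pick
        suffix = next(fresh)
        for triple in VIEW_BODIES[view_name]:
            rewritten = []
            for term in triple:
                if term.startswith("?") and term not in renaming:
                    # Unmapped view variable: rename it to a fresh variable.
                    renaming[term] = f"{term}_{suffix}"
                rewritten.append(renaming.get(term, term))
            body.append(tuple(rewritten))
    return body

# The choice shown on the slide: V1F, V2FoF, V1R, V2RoR.
choice = [
    ("F",   {"?f0": "?p", "?f1": "?f5"}),
    ("FoF", {"?f4": "?f5", "?l4": "?l5"}),
    ("R",   {"?r0": "?p", "?r1": "?r5"}),
    ("RoR", {"?r4": "?r5", "?l6": "?l5"}),
]
rewritten_body = expand(choice)
```

The resulting body has 3 + 4 + 3 + 4 = 14 triple patterns over the base predicates `name`, `friend`, `related` and `lives`, matching the structure of Q’ above.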
Rewriting Algorithm • Theorem 1. The rewriting Q’ produced by SQR is sound and complete (see proof in Appendix A) • The rewriting introduces no redundancy and misses no results • But can this be better?
Optimizing Individual Rewriting • Generate one rewritten query, choosing V1F and V2F for the first two triple patterns
Q’: SELECT ?f5 WHERE {
    ?p name Eric, ?p friend ?f5, ?f5 lives ?l’1,
    ?f’0 name Eric, ?f’0 friend ?f5, ?f5 lives ?l5,
    ?p name Eric, ?p related ?r5, ?r5 lives ?l’3,
    ?r’2 name Eric, ?r’2 related ?r’3, ?r’3 related ?r5, ?r5 lives ?l5 }
• The two copies of VF can be replaced with one: ?p name Eric, ?p friend ?f5, ?f5 lives ?l5
• Φmerge = [?f0→?p, ?f1→?f5, ?l1→?l5]
Optimizing Individual Rewriting • Theorem 2. Query q’merge, resulting from (i) replacing the two copies of view V in query q’ with one, and (ii) applying Φmerge, computed by Algorithm 2, in place of Φ1 and Φ2, is equivalent to q’ (see proof in Appendix B)
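The merge of two mappings onto the same view can be sketched as follows. This is only the compatibility check and union of mappings, under the assumption that two mappings are mergeable when they agree on every view variable they share. The conditions under which the authors' Algorithm 2 actually permits a merge are not reproduced here, and `merge_mappings` is a hypothetical name.

```python
def merge_mappings(phi1, phi2):
    """Union two variable mappings of the same view, or return None if they conflict."""
    merged = dict(phi1)
    for view_var, target in phi2.items():
        # A shared view variable must map to the same query term in both mappings.
        if merged.setdefault(view_var, target) != target:
            return None
    return merged

# Mappings of template 1 and template 2 of VF from the running example.
phi1 = {"?f0": "?p", "?f1": "?f5"}
phi2 = {"?f1": "?f5", "?l1": "?l5"}
phi_merge = merge_mappings(phi1, phi2)

# A pair that disagrees on ?f1 (V1F for the first pattern, V2F for the fourth).
conflict = merge_mappings(phi1, {"?f1": "?r5", "?l1": "?l5"})
```

With the mappings above, `phi_merge` reproduces Φmerge from the slide, while the conflicting pair is left as two separate copies of the view.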
Pruning Rewritings with Empty Results • Generate one rewritten query, choosing V1FoF and V2R for the first two triple patterns
Q’: SELECT ?f5 WHERE { ?p name Eric, ?p friend ?f’3, ?f’3 friend ?f5, ?f5 lives ?l’4, ?r’0 name Eric, ?r’0 related ?f5, ?f5 lives ?l5 .....
• Here ?f5 is Eric’s friend-of-friend who is also his relative
Pruning Rewritings with Empty Results • It turns out that Eric’s FoF and R are disjoint
Pruning Rewritings with Empty Results • ?f5 is Eric’s friend-of-friend who is also his relative → nobody • The rewritten query Q’ that joins VFoF and VR on ?f5 therefore returns an empty result
Pruning Rewritings with Empty Results: How to detect empty rewritings • Consider a pair of triple patterns (?y1, p1, ?y2) and (?y3, p2, ?y4) where the join equates ?y2 and ?y3 • A(?x): value set of a variable ?x • If A(?x) were stored for every variable in every triple pattern, the problem would be trivial • A(?y2) ∩ A(?y3) = ∅ → empty • A(?y2) ∩ A(?y3) ≠ ∅ → joinable • But storing these sets is expensive space-wise • The authors designed a space-efficient heuristic that works well in practice
Pruning Rewritings with Empty Results: How to detect empty rewritings
• The authors estimate the size of the intersection of A(?y2) and A(?y3), written α(A(?y2) ∩ A(?y3))
• α(A(?y2) ∩ A(?y3)) > τ (τ: preset threshold)
    • They consider the probability that the join is not empty to be high
    • Continue rewriting with the triples (?y1, p1, ?y2) and (?y3, p2, ?y4)
• α(A(?y2) ∩ A(?y3)) < τ
    • They consider the probability that the result is empty to be high
    • Issue an ASK query to verify whether the join is actually empty
    • yes → remove rewritings involving triples (?y1, p1, ?y2) and (?y3, p2, ?y4)
    • no → continue rewriting with the triples (?y1, p1, ?y2) and (?y3, p2, ?y4)
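The decision procedure above can be sketched in Python. The estimate and the ASK check are passed in by the caller, because the slides do not say how α is computed. The name `decide` and the treatment of an estimate exactly equal to τ (handled like the below-threshold case) are assumptions of this sketch.

```python
def decide(estimated_overlap, tau, ask_join_nonempty):
    """Return 'keep' or 'prune' for a pair of joined triple patterns.

    estimated_overlap: estimate of |A(?y2) ∩ A(?y3)| (how it is obtained is not modelled here)
    tau: preset threshold
    ask_join_nonempty: zero-argument callable standing in for an ASK query on the store
    """
    if estimated_overlap > tau:
        # Join is probably non-empty: keep rewriting without touching the store.
        return "keep"
    # Join is probably empty: verify with an ASK query before pruning.
    return "keep" if ask_join_nonempty() else "prune"

# Toy stand-ins (invented data): exact value sets replace the space-efficient estimate.
fof_values = {"carol", "dave"}
relative_values = {"rita"}
overlap = fof_values & relative_values

verdict = decide(len(overlap), tau=1, ask_join_nonempty=lambda: bool(overlap))
```

Because the ASK query only runs when the estimate falls at or below the threshold, a well-chosen τ keeps the number of verification queries against the store small.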
Experiments • Implementation: They implemented the rewriting algorithms and optimizations in C++ • Views and Queries: They defined queries and views on the LUBM benchmark (about students, departments, professors, etc.) • Data: 10M triples from LUBM • Platform: 64-bit Linux with a 2GHz Intel Xeon CPU and 4GB of memory • Stores: Evaluations were performed on MySQL (for the relational experiments), 4store and Jena TDB (RDF stores)
Experiments: Notation • SQR: SPARQL query rewriting without optimizations • OSQR: SPARQL query rewriting with optimizations • SQL: Translation to SQL and rewriting in relational DB
Experimental Results with 4store: Native SPARQL rewriting vs SQL expansion • OSQR result • between one and four orders of magnitude fewer queries • two orders of magnitude faster than both SQR and SQL
Native SPARQL rewriting vs SQL expansion • OSQR result • an order of magnitude fewer rewritings than SQR and SQL • an order of magnitude saving in evaluation time
Optimizing Individual Rewritings • OSQR-M: merging views is the only optimization applied • the number of rewritings is the same as for SQR • saves 10%-70% of evaluation time
Pruning Rewritings with Empty Results • OSQR-P: pruning empty rewritings is the only optimization applied • produces an order of magnitude fewer rewritings • results in an order of magnitude faster evaluation times for query Q
Optimizing the Generation of Rewritings • OSQR-R: OSQR without merging views • results in an order of magnitude fewer ASK queries • results in nearly 60% savings in evaluation time
Experimental Results from Jena TDB • Their algorithm is flexible and store-independent
Related Work • [22] S. Rizvi, A. Mendelzon, S. Sudarshan, and P. Roy. Extending query rewriting techniques for fine-grained access control. In SIGMOD, 2004. • rewrites queries over views for relational DBs • [12] G. Correndo, M. Salvadores, I. Millard, H. Glaser, and N. Shadbolt. SPARQL query rewriting for implementing data integration over linked data. In EDBT, 2010. • performs rewritings using pre-defined rewriting rules • [5] F. Abel et al. Enabling advanced and context dependent access control in RDF stores. In ISWC, 2007. • [20] G. Manjunath et al. Semantic views for controlled access to the semantic web. Tech. Rep. HPL-2008-15, 2008. • general query rewriting in RDF stores • view definitions are customized with high-level languages • query rewriting is performed in an ad-hoc manner
Conclusion • Authors studied rewriting queries over views in the context of SPARQL and RDF data. • They proposed the first sound and complete rewriting algorithm. • Novel optimizations to simplify individual rewritings and prune rewritings with empty results. • Their solution is independent of RDF stores and hence portable.