A Scalable Algorithm for Answering Queries Using Views Rachel Pottinger Qualifying Exam October 29, 1999 Advisor: Alon Levy
Answering Queries Using Views • Problem: access views instead of original relations • Useful in data integration and query optimization • NP-Complete • Many papers on the subject • No empirical testing of algorithms
Data Integration: Query Reformulation
• Data sources are pre-calculated views
• Views are not complete
• Get the most answers possible given the views
• Many data sources
Example sources of car sale information:
• Ford cars: dealer prices, sticker prices, inventory
• Cheap cars: prices, manufacturer
• Used cars: prices, dealer, year
Data Integration Example
Query: find the prices of cars that we can buy at cost
Q(cost) :- dealercost(car, cost) & stickerprice(car, cost)
Views:
V1(price1, price2) :- dealercost(car, price1) & stickerprice(car, price2) & maker(car, “Ford”)
V2(cost) :- dealercost(car, cost) & stickerprice(car, cost) & cheap(car)
Conjunctive rewritings:
Q’1(cost) :- V1(cost, cost)
Q’2(cost) :- V2(cost)
• dealercost, stickerprice, maker, and cheap are database relations
• Variables in a view's head (price1, price2, cost) are distinguished; variables that appear only in the body (car) are existential
• The maximally contained rewriting is the union of the conjunctive rewritings Q’1 and Q’2
Outline • Previous algorithms • Bucket Algorithm [Levy, Rajaraman, Ordille, 1996] • Inverse rules [Duschka, Genesereth, 1997] • Minimum Necessary Connections (MiniCon) Algorithm • Experimental evaluation • Extension to arithmetic comparisons • Conclusions and future work
The Bucket Algorithm • Introduced as part of Information Manifold • Treats subgoals individually
Bucket Algorithm: Populating buckets
• For each subgoal in the query, place relevant views in the subgoal’s bucket
Inputs:
Q(x) :- r1(x,y) & r2(y,x)
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)
Buckets:
r1(x,y): V1(x), V3(x)
r2(y,x): V2(x), V3(x)
Combining Buckets
• For every combination in the Cartesian product of the buckets, check containment in the query
• The Bucket Algorithm will check all possible combinations
Buckets:
r1(x,y): V1(x), V3(x)
r2(y,x): V2(x), V3(x)
Candidate rewritings:
Q’1(x) :- V1(x) & V2(x)
Q’2(x) :- V1(x) & V3(x)
Q’3(x) :- V3(x) & V2(x)
Q’4(x) :- V3(x) & V3(x)
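The bucket-filling and combination steps on these two slides can be sketched in Python as follows. This is a minimal illustration, not the Information Manifold implementation: the tuple encoding of atoms and the names `fill_buckets`, `candidate_rewritings`, and `show` are assumptions made for the example, and the containment check that the algorithm runs on every candidate is left out.

```python
from itertools import product

# Hypothetical encoding: an atom is (predicate, argument tuple);
# a view is (head variables, list of body atoms).
QUERY_HEAD = ("x",)
QUERY_BODY = [("r1", ("x", "y")), ("r2", ("y", "x"))]
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}

def fill_buckets(query_head, query_body, views):
    """One bucket per query subgoal, holding view atoms written in query variables."""
    buckets = []
    for pred, q_args in query_body:
        bucket = []
        for name, (v_head, v_body) in views.items():
            for v_pred, v_args in v_body:
                if v_pred != pred or len(v_args) != len(q_args):
                    continue
                to_view = dict(zip(q_args, v_args))
                # A distinguished query variable must map to a distinguished view variable.
                if any(q in query_head and v not in v_head for q, v in to_view.items()):
                    continue
                to_query = {v: q for q, v in to_view.items()}
                atom = (name, tuple(to_query.get(v, v) for v in v_head))
                if atom not in bucket:
                    bucket.append(atom)
        buckets.append(bucket)
    return buckets

def candidate_rewritings(buckets):
    """Cartesian product of the buckets; each element still needs a containment check."""
    return [list(combo) for combo in product(*buckets)]

def show(atom):
    return f"{atom[0]}({', '.join(atom[1])})"

buckets = fill_buckets(QUERY_HEAD, QUERY_BODY, VIEWS)
candidates = candidate_rewritings(buckets)
for number, combo in enumerate(candidates, start=1):
    print(f"Q'{number}(x) :- " + " & ".join(show(atom) for atom in combo))
```

The number of candidates is the product of the bucket sizes, which is why the combination step dominates the cost as the number of relevant views grows.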
Inverse Rules
• Part of the Infomaster system
• Inverse rules show how to get database tuples from the views
• Cannot be extended to interpreted predicates
• Stops earlier than the Bucket Algorithm
Creating Inverse Rules
• For each view V(X) :- r1(X1) & … & rn(Xn)
• for each j = 1, …, n form an inverse rule: rj(Xj) :- V(X), where every existential variable in Xj is replaced by a Skolem function of the head variables X
Inputs:
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)
Inverse Rules (sfV1, sfV2, sfV3 are Skolem functions):
IR1 r1(a, sfV1(a)) :- V1(a)
IR2 r2(sfV2(d), d) :- V2(d)
IR3 r1(f, sfV3(f)) :- V3(f)
IR4 r2(sfV3(f), f) :- V3(f)
Combining Inverse Rules
• At query time, query over rules
Query: Q(x) :- r1(x,y) & r2(y,x)
Inverse Rules:
IR1 r1(a, sfV1(a)) :- V1(a)
IR2 r2(sfV2(d), d) :- V2(d)
IR3 r1(f, sfV3(f)) :- V3(f)
IR4 r2(sfV3(f), f) :- V3(f)
Tuples: V1(g), V2(h), V3(j), V3(m)
Expansion (query + inverse rules + tuples):
• r1(g, sfV1(g)), r2(sfV2(h), h), r1(j, sfV3(j)), r2(sfV3(j), j)
• r1(m, sfV3(m)), r2(sfV3(m), m)
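To make the expansion concrete, the following Python sketch applies IR1 to IR4 to the four view tuples and then evaluates Q over the resulting r1 and r2 facts. The encoding of a Skolem term as a (function name, argument) pair and the helper names `skolem`, `expand`, and `answer_query` are assumptions made for this example; the rules are hard-coded rather than derived.

```python
# View extensions from the slide: V1(g), V2(h), V3(j), V3(m).
VIEW_TUPLES = {"V1": ["g"], "V2": ["h"], "V3": ["j", "m"]}

def skolem(view, value):
    """Represent the Skolem term sf<view>(value) as a tuple."""
    return ("sf" + view, value)

def expand(view_tuples):
    """Apply the four inverse rules to the view tuples, giving r1 and r2 facts."""
    r1, r2 = set(), set()
    for a in view_tuples.get("V1", []):
        r1.add((a, skolem("V1", a)))      # IR1
    for d in view_tuples.get("V2", []):
        r2.add((skolem("V2", d), d))      # IR2
    for f in view_tuples.get("V3", []):
        r1.add((f, skolem("V3", f)))      # IR3
        r2.add((skolem("V3", f), f))      # IR4
    return r1, r2

def answer_query(r1, r2):
    """Q(x) :- r1(x, y) & r2(y, x); answers containing Skolem terms are dropped."""
    return sorted(x for (x, y) in r1 if (y, x) in r2 and not isinstance(x, tuple))

r1_facts, r2_facts = expand(VIEW_TUPLES)
answers = answer_query(r1_facts, r2_facts)
print(answers)
```

Only j and m survive the join: the facts produced from V1(g) and V2(h) carry different Skolem terms, so they never agree on y.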
Unfolding rules before tuples
Q(x) :- r1(x,y) & r2(y,x)
• r1(x,y) matches IR1 and IR3
• r2(y,x) matches IR2 and IR4
• Use unification to see if rewriting is contained in the query
• No containment check necessary
The MiniCon Algorithm
• Concentrate on variables rather than subgoals to create MiniCon Descriptions (MCDs)
• Combine MCDs that only overlap on distinguished view variables
• No containment check!
MiniCon Description Formation
• Form all MiniCon Descriptions (MCDs) that map all query variables that have to be mapped together
Inputs:
Q(x) :- r1(x,y) & r2(y,x)
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)
MCDs:
view    mapping         subgoals mapped
V3      x → f, y → g    1, 2
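The reason only V3 yields an MCD can be traced with a simplified Python sketch. It is an illustration under stated assumptions, not the full MiniCon procedure: it ignores constants and does not equate distinguished view variables, and the names `form_mcds` and `_grow` are hypothetical. It enforces two conditions: a distinguished query variable must map to a distinguished view variable, and once a query variable maps to an existential view variable, every query subgoal mentioning it must be covered by the same view.

```python
QUERY_HEAD = ("x",)
QUERY_BODY = [("r1", ("x", "y")), ("r2", ("y", "x"))]
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}

def _grow(q_head, q_body, v_head, v_body, mapping, covered, pending):
    """Yield (mapping, covered subgoal indexes) once every pending subgoal is mapped."""
    if not pending:
        yield mapping, covered
        return
    i, rest = pending[0], pending[1:]
    if i in covered:
        yield from _grow(q_head, q_body, v_head, v_body, mapping, covered, rest)
        return
    pred, q_args = q_body[i]
    for v_pred, v_args in v_body:
        if v_pred != pred or len(v_args) != len(q_args):
            continue
        new_map = dict(mapping)
        # The mapping must stay a function from query variables to view variables.
        if not all(new_map.setdefault(q, v) == v for q, v in zip(q_args, v_args)):
            continue
        # Distinguished query variables must land on distinguished view variables.
        if any(q in q_head and new_map[q] not in v_head for q in q_args):
            continue
        done = covered | {i}
        existential = {q for q, v in new_map.items() if v not in v_head}
        # Subgoals that share a variable mapped to an existential view variable
        # are forced into the same MCD.
        forced = [j for j, (_, args) in enumerate(q_body)
                  if j not in done and existential & set(args)]
        yield from _grow(q_head, q_body, v_head, v_body, new_map, done, rest + forced)

def form_mcds(q_head, q_body, views):
    mcds = []
    for name, (v_head, v_body) in views.items():
        for start in range(len(q_body)):
            for mapping, covered in _grow(q_head, q_body, v_head, v_body,
                                          {}, frozenset(), [start]):
                # Subgoals are reported 1-indexed, as on the slide.
                mcd = (name, tuple(sorted(mapping.items())),
                       tuple(sorted(j + 1 for j in covered)))
                if mcd not in mcds:
                    mcds.append(mcd)
    return mcds

mcds = form_mcds(QUERY_HEAD, QUERY_BODY, VIEWS)
print(mcds)
```

V1 fails because y maps to the existential variable b, which forces r2(y,x) into the same MCD, and V1 has no r2 subgoal; V2 fails symmetrically.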
MiniCon Combination
Take all combinations of MCDs that
• map disjoint sets of subgoals
• map all subgoals of the query
MCDs:
view    mapping         subgoals mapped
V3      x → f, y → g    1, 2
Rewriting: Q’(x) :- V3(x)
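The combination step amounts to finding sets of MCDs whose subgoal sets are pairwise disjoint and together cover the query. A minimal Python sketch follows; the dictionary layout of an MCD and the names `combine` and `rewriting` are assumptions made for this example.

```python
# The single MCD from the slide, in a hypothetical dictionary encoding.
MCDS = [
    {"view": "V3", "head": ("f",), "mapping": {"x": "f", "y": "g"},
     "subgoals": frozenset({1, 2})},
]

def combine(mcds, n_subgoals):
    """Return every set of MCDs that covers subgoals 1..n exactly once."""
    target = frozenset(range(1, n_subgoals + 1))
    results = []

    def extend(chosen, covered, start):
        if covered == target:
            results.append(chosen)
            return
        # Scanning forward from `start` avoids emitting the same set twice.
        for i in range(start, len(mcds)):
            subgoals = mcds[i]["subgoals"]
            if not (subgoals & covered):
                extend(chosen + [mcds[i]], covered | subgoals, i + 1)

    extend([], frozenset(), 0)
    return results

def rewriting(combo, query_head):
    """Render a combination as a conjunctive rewriting over the views."""
    atoms = []
    for mcd in combo:
        to_query = {v: q for q, v in mcd["mapping"].items()}
        args = ", ".join(to_query.get(v, v) for v in mcd["head"])
        atoms.append(f"{mcd['view']}({args})")
    return f"Q'({', '.join(query_head)}) :- " + " & ".join(atoms)

rewritings = [rewriting(combo, ("x",)) for combo in combine(MCDS, 2)]
print(rewritings)
```

Because only disjoint covers are considered, far fewer combinations are generated than in the Cartesian product of buckets.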
Experimental Evaluation
Tested performance and scale-up of:
• Bucket Algorithm
• Inverse Rules extended with unification
• MiniCon Algorithm
MiniCon at least as good in all cases, much better in some
Show results for chain queries: Q(a) :- r1(a,b), r2(b,c), r3(c,d), r4(d,e)
Extension: Interpreted Predicates
• Problem is in general undecidable
• We looked at subgoals of the form: var < constant or var > constant
• If a query variable maps to an existential view variable, require that the view's interpreted predicates imply the query's
• Ex: Q(x) :- r1(x,y), y > 17 and V1(a) :- r1(a,b), b > 18. Here y maps to the existential variable b, and b > 18 implies y > 17, so V1 can be used
• Guaranteed to be sound
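For the restricted predicate forms on this slide, the implication test reduces to comparing constants. A minimal Python sketch, assuming each comparison is encoded as an (operator, constant) pair on the same variable; the name `implies` is hypothetical.

```python
def implies(view_cmp, query_cmp):
    """True if `var <op> c_view` implies `var <op> c_query` for every value of var."""
    v_op, v_const = view_cmp
    q_op, q_const = query_cmp
    if v_op != q_op:
        # A lower bound never implies an upper bound, and vice versa.
        return False
    if v_op == ">":
        return v_const >= q_const
    if v_op == "<":
        return v_const <= q_const
    raise ValueError(f"unsupported operator: {v_op}")

# Slide example: the view guarantees b > 18, the query needs y > 17.
slide_example = implies((">", 18), (">", 17))
print(slide_example)
```

A view with b > 16 would fail the test, since some of its tuples could violate y > 17 and the existential variable cannot be filtered afterwards.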
Future Work
• Query Optimization
 - Look for the fastest answer to query
 - Assume that all views are complete
 - Require equivalent rewritings
 - Need to allow overlap on subgoals mapped
• A fuller comparison of interpreted predicates
Conclusions • Scalability of previous algorithms understood • MiniCon Algorithm invented • First experimental comparison of algorithms for answering queries using views • Extensions to binding patterns, interpreted predicates • New maximally contained rewriting form
Maximally Contained Rewritings
Q’ is a maximally contained rewriting of a query Q using the views V = V1, …, Vn if
• For any database D, and extensions v1, …, vn of the views such that vi ⊆ Vi(D) for 1 ≤ i ≤ n, Q’(v1, …, vn) ⊆ Q(D)
• There is no other query Q1 such that
 (1) Q’(v1, …, vn) ⊆ Q1(v1, …, vn)
 (2) Q1(v1, …, vn) ⊆ Q(D),
 and there exists at least one database for which (1) is a strict subset
Containment Checks
• Q1 ⊆ Q2 if, for every database, the answer to Q1 is a subset of the answer to Q2
• m is a containment mapping from Vars(Q2) to Vars(Q1) if
 - m maps every subgoal in the body of Q2 to a subgoal in the body of Q1
 - m maps the head of Q2 to the head of Q1
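The definition above can be checked by brute force for small conjunctive queries. The Python sketch below tries every assignment of Vars(Q2) to the variables of Q1; the query encoding and the name `containment_mapping` are assumptions made for this example, and the exhaustive search is chosen for clarity, not efficiency. It is applied to the expansions of two candidate rewritings from the Bucket Algorithm example.

```python
from itertools import product

def containment_mapping(q2, q1):
    """Return a mapping m from Vars(q2) to Vars(q1) showing q1 ⊆ q2, or None.

    A query is (head variables, list of (predicate, args) atoms).
    """
    (head2, body2), (head1, body1) = q2, q1
    vars2 = sorted(set(head2) | {a for _, args in body2 for a in args})
    vars1 = sorted(set(head1) | {a for _, args in body1 for a in args})
    atoms1 = set(body1)
    for image in product(vars1, repeat=len(vars2)):
        m = dict(zip(vars2, image))
        # m must map the head of q2 to the head of q1 ...
        if tuple(m[a] for a in head2) != tuple(head1):
            continue
        # ... and every body subgoal of q2 to some body subgoal of q1.
        if all((pred, tuple(m[a] for a in args)) in atoms1 for pred, args in body2):
            return m
    return None

QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
# Expansion of Q'(x) :- V3(x): replace V3(x) by its body.
EXPANSION_V3 = (("x",), [("r1", ("x", "g")), ("r2", ("g", "x"))])
# Expansion of Q'1(x) :- V1(x) & V2(x).
EXPANSION_V1_V2 = (("x",), [("r1", ("x", "b")), ("r2", ("c", "x"))])

mapping_v3 = containment_mapping(QUERY, EXPANSION_V3)
mapping_v1_v2 = containment_mapping(QUERY, EXPANSION_V1_V2)
print(mapping_v3, mapping_v1_v2)
```

The second check fails because y would have to map to both b and c, which is why the candidate V1(x) & V2(x) is not a contained rewriting as written.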
Inverse Rules With Unification
• Find all inverse rules that match each query subgoal; place them in the bucket for that subgoal
• For each rule in the first bucket
 - For each other subgoal i, attempt to unify the rules so far with all elements in the bucket for i
 - If we cannot unify with anything in that bucket, break out of the loop; otherwise, recurse
Correctness requirements • We need both soundness and completeness • A sound rewriting has a valid containment mapping from the variables of the query to the variables of the view • For completeness we need only to check rewritings of length less than or equal to that of the query
Extensions to XML • Need to choose a query language • Containment checks should still hold • Need to check to make sure that restructured elements are distinguished • May even be more scalable vs Inverse Rules, Bucket Algorithm