210 likes | 389 Views
Answering Queries Using Views. Presented by: Mahmoud ELIAS. Plan. Introduction Motivations Views Problem definition When is a view usable for a query ? Answering queries using views for Data Integration Conclusions Bibliography. Introduction.
E N D
Answering Queries Using Views Presented by: Mahmoud ELIAS
Plan • Introduction • Motivations • Views • Problem definition • When is a view usable for a query? • Answering queries using views for Data Integration • Conclusions • Bibliography
Introduction • Relevance to a wide variety of data management problems • Example: University schema: • Prof(name, area) • Course(c-number, title) • Teaches(prof, c-number, quarter, evaluation) • Registered(student, c-number, quarter) • Major(student, dept) • WorksIn(prof, dept) • Advises(prof, student)
Motivations • Query optimization: • Speed up the computation of the query. • Be attention of indexes. • Maintaining physical Data independenceVs • Data Integration: • uniform query interface to a multitude of autonomous heterogeneous data sources
Motivation (cont.) • Data warehouse design: • we must be able to answer all the required queries over the warehouse using only these views • Semantic data caching • Check whether the cached results of a previously computed query can be used for a new query
Building views System analysis The user will never be satisfied feedback statistics Choosing and creating views
Problem definition • A view is a derived relation defined in terms of stored base relations. • A query Q1 is said to be containedin a query Q2 (Q1 Q2), if for all databases D, the set of tuples computed for Q1 is a subset of those computed for Q2 • Two queries are said to be equivalentif Q1 Q2and Q2 Q1 • Given a query Q and a set of view definitions V1,V2,…,Vm, arewritingof the query using the views is a query expression Q’ that refers only to the given views. • Equivalent rewriting vs. maximally-contained rewriting
When is a View usable for a Query? • A view can be useful for a query if the set of relations it mentions overlaps with that of the query, and it selects some of the attributes selected by the query. • When the view contains grouping and aggregation but the query does not, then unless the query removes duplicates in the select clause, the view cannot be used to answer a query.
Example1: • select Advises.prof, Advises.student, Registered.quarter from Registered, Teaches, Advises where Registered.c-number=Teaches.c-number and Registered.quarter=Teaches. quarter and Advises.prof=Teaches.prof and Advises.student= Registered.student and Registered.quarter ≥ « winter98 » • create view V1 as select Registered.student, Teaches.prof, Registered.quarter from Registered, Teaches where Registered.c-number=Teaches.c-number and Registered.quarter=Teaches. quarter and Registered.quarter ≥ « winter97 »
Example2: • create view V2 as select c-number, year, max(evaluation) as maxeval, count(*) asofferings from Teaches where c-number ≥ 400 groupBy c-number, year • select year, count(*), max(evaluation) from Teaches where c-number ≥ 500 groupBy year • select year, sum(offerings), Max(maxeval) from V2 where c-number ≥ 500 groupBy year
Conjunctive queries head body • Safe - each variable in the head appears in the body. • No arithmetic comparisons in predicates. subgoal
The Bucket algorithm • Nb of query rewritings that need to be considered can be drastically reduced if we first consider each subgoal in the query in isolation, and determine which views may be relevant to each subgoal. • 2 steps: • Create a bucket for each subgoal in Q • Queryrewritings that are conjunctive queries, each consisting of one conjunct from every bucket.
Example V1(st,c-n,qu,ti) :- Registered(st,c-n,qu), Course(c-n,ti), c-n ≥ 500, qu ≥ Aur98 V2(st,pr,c-n,qu) :- Registered(st,c-n,qu), Teaches(pr,c-n,qu) V3(st,c-n) :- Registered(st,c-n,qu), year ≤ Aut94 V4(pr,c-n,ti,qu) :- Registered(st,c-n,qu), Course(c-n,ti), Teaches(pr,c-n,qu), qu ≤ Aut97 Q(S,C,P) :- Teaches(P,C,Q), Registered(S,C,Q), Course(C,T), C ≥300, Q ≥ Aut95
Example (cont.) All combinations: q’(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’) , V1(S’,C,Q’,T) q’(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T') q’(S,C,P) :- V4(P,C,T’,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’) q’(S,C,P) :- V2(S,P,C,Q), V4(P,C,T’,Q) The algorithm produces a maximally-contained rewriting
The Inverse-rules algorithm • Construct a set of rules that invert the view definitions V3(dept, c-number) :- Major(student, dept), Registered(student,c-number) Major(f1(dept,X), dept) :- V3(dept,X) Registered(f1(Y,c-number),c-number) :- V3(Y,c-number) Q(dept) :- Major(student,dept), Registered(student,444) V3 = {(CS, 444), (EE, 444), (CS,333)} Registered = {(f1(CS,444), CS), (f1(EE,444), EE), (f1(CS,333), CS)} Major = {(f1(CS,444), 444), (f1(EE,444), 444), (f1(CS,333), 333)} Q = {CS, EE} • more efficient rewriting unfold the inverse rules and remove redundant subgoals from the unfolded rules.
Bucket vs. Inverse-rules • Both algorithms produce a maximally-contained rewriting • Computing buckets is similar in spirit to that of computing the inverse rules: compute the views that are relevant to single atoms of the DB relations. • The Bucket algo. Computes the relevant views by taking into consideration the context in which th atom appears in the query. • The inverse rules can be computed once, and be applicable to any query.
The MiniCon algorithm • It addresses the limitations of previous algorithms • instead of building rewritings by combining rewritings for each of the query subgoal or the DB relation, we consider how each of the variables in the query can interact with the available views (MiniCon Description MCD) Q(D) :- Major(S,D), Registered(S,444,Q), Advises(P,S) V1(dept) :- Major(student,dept), Registered(student,444,quarter) V2(prof,dept,area) :- Advises(prof,student), Prof(name,area) V3(dept,c-number) :- Major(student,dept), Registered(student,c- number,quarter), Advises(prof,student)
Results • The key advantage of the MiniCon algorithm is that the second phase of the algorithm considers much fewer combinations of MCDs compared to the Cartesian product of the buckets or compared to the number of unfoldings of inverse rules.
Conclusions • Using views to answer queries is an important problem. Especially for information integration on the web. • Query containment and containment mappings provide the key for solving the problem.
Conclusions (cont.) • The variants of the problem are NP-complete. This is not too bad, since queries are usually short. • In many practical cases, there is an algorithm for solving the problem.
Bibliography [Lev00]Alon Y. Levy. Answering Queries Using Views: A Survey. Department of Computer Science and Engineering, University ofWashington, pages 1-43, 2000 [Mit99]Prasenjit Mitra. An Algorithm for Answering Queries Efficiently Using Views. Infolab, Stanford University, pages 1-13, September,1999. [DG97b] Oliver M. Duschka and Michael R. Genesereth. Query Planning inInfoMaster. In Proceedings of the ACM Symposium on Applied Computing, San Jose, CA, 1997.