Answering Queries Using Views

Answering Queries Using Views Presented by: Mahmoud ELIAS

Plan • Introduction • Motivations • Views • Problem definition • When is a view usable for a query? • Answering queries using views for Data Integration • Conclusions • Bibliography

Introduction • Relevance to a wide variety of data management problems • Example: University schema: • Prof(name, area) • Course(c-number, title) • Teaches(prof, c-number, quarter, evaluation) • Registered(student, c-number, quarter) • Major(student, dept) • WorksIn(prof, dept) • Advises(prof, student)

Motivations • Query optimization: • Speed up the computation of the query. • Be attention of indexes. • Maintaining physical Data independenceVs • Data Integration: • uniform query interface to a multitude of autonomous heterogeneous data sources

Motivation (cont.) • Data warehouse design: • we must be able to answer all the required queries over the warehouse using only these views • Semantic data caching • Check whether the cached results of a previously computed query can be used for a new query

Building views System analysis The user will never be satisfied feedback statistics Choosing and creating views

Problem definition • A view is a derived relation defined in terms of stored base relations. • A query Q1 is said to be containedin a query Q2 (Q1 Q2), if for all databases D, the set of tuples computed for Q1 is a subset of those computed for Q2 • Two queries are said to be equivalentif Q1 Q2and Q2 Q1 • Given a query Q and a set of view definitions V1,V2,…,Vm, arewritingof the query using the views is a query expression Q’ that refers only to the given views. • Equivalent rewriting vs. maximally-contained rewriting

When is a View usable for a Query? • A view can be useful for a query if the set of relations it mentions overlaps with that of the query, and it selects some of the attributes selected by the query. • When the view contains grouping and aggregation but the query does not, then unless the query removes duplicates in the select clause, the view cannot be used to answer a query.

Example1: • select Advises.prof, Advises.student, Registered.quarter from Registered, Teaches, Advises where Registered.c-number=Teaches.c-number and Registered.quarter=Teaches. quarter and Advises.prof=Teaches.prof and Advises.student= Registered.student and Registered.quarter ≥ « winter98 » • create view V1 as select Registered.student, Teaches.prof, Registered.quarter from Registered, Teaches where Registered.c-number=Teaches.c-number and Registered.quarter=Teaches. quarter and Registered.quarter ≥ « winter97 »

Example2: • create view V2 as select c-number, year, max(evaluation) as maxeval, count(*) asofferings from Teaches where c-number ≥ 400 groupBy c-number, year • select year, count(*), max(evaluation) from Teaches where c-number ≥ 500 groupBy year • select year, sum(offerings), Max(maxeval) from V2 where c-number ≥ 500 groupBy year

Conjunctive queries head body • Safe - each variable in the head appears in the body. • No arithmetic comparisons in predicates. subgoal

The Bucket algorithm • Nb of query rewritings that need to be considered can be drastically reduced if we first consider each subgoal in the query in isolation, and determine which views may be relevant to each subgoal. • 2 steps: • Create a bucket for each subgoal in Q • Queryrewritings that are conjunctive queries, each consisting of one conjunct from every bucket.

Example V1(st,c-n,qu,ti) :- Registered(st,c-n,qu), Course(c-n,ti), c-n ≥ 500, qu ≥ Aur98 V2(st,pr,c-n,qu) :- Registered(st,c-n,qu), Teaches(pr,c-n,qu) V3(st,c-n) :- Registered(st,c-n,qu), year ≤ Aut94 V4(pr,c-n,ti,qu) :- Registered(st,c-n,qu), Course(c-n,ti), Teaches(pr,c-n,qu), qu ≤ Aut97 Q(S,C,P) :- Teaches(P,C,Q), Registered(S,C,Q), Course(C,T), C ≥300, Q ≥ Aut95

Example (cont.) All combinations: q’(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’) , V1(S’,C,Q’,T)  q’(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T') q’(S,C,P) :- V4(P,C,T’,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’) q’(S,C,P) :- V2(S,P,C,Q), V4(P,C,T’,Q) The algorithm produces a maximally-contained rewriting

The Inverse-rules algorithm • Construct a set of rules that invert the view definitions V3(dept, c-number) :- Major(student, dept), Registered(student,c-number) Major(f1(dept,X), dept) :- V3(dept,X) Registered(f1(Y,c-number),c-number) :- V3(Y,c-number) Q(dept) :- Major(student,dept), Registered(student,444) V3 = {(CS, 444), (EE, 444), (CS,333)}  Registered = {(f1(CS,444), CS), (f1(EE,444), EE), (f1(CS,333), CS)} Major = {(f1(CS,444), 444), (f1(EE,444), 444), (f1(CS,333), 333)}  Q = {CS, EE} • more efficient rewriting  unfold the inverse rules and remove redundant subgoals from the unfolded rules.

Bucket vs. Inverse-rules • Both algorithms produce a maximally-contained rewriting • Computing buckets is similar in spirit to that of computing the inverse rules: compute the views that are relevant to single atoms of the DB relations. • The Bucket algo. Computes the relevant views by taking into consideration the context in which th atom appears in the query. • The inverse rules can be computed once, and be applicable to any query.

The MiniCon algorithm • It addresses the limitations of previous algorithms • instead of building rewritings by combining rewritings for each of the query subgoal or the DB relation, we consider how each of the variables in the query can interact with the available views (MiniCon Description MCD) Q(D) :- Major(S,D), Registered(S,444,Q), Advises(P,S) V1(dept) :- Major(student,dept), Registered(student,444,quarter) V2(prof,dept,area) :- Advises(prof,student), Prof(name,area) V3(dept,c-number) :- Major(student,dept), Registered(student,c- number,quarter), Advises(prof,student)

Results • The key advantage of the MiniCon algorithm is that the second phase of the algorithm considers much fewer combinations of MCDs compared to the Cartesian product of the buckets or compared to the number of unfoldings of inverse rules.

Conclusions • Using views to answer queries is an important problem. Especially for information integration on the web. • Query containment and containment mappings provide the key for solving the problem.

Conclusions (cont.) • The variants of the problem are NP-complete. This is not too bad, since queries are usually short. • In many practical cases, there is an algorithm for solving the problem.

Bibliography [Lev00]Alon Y. Levy. Answering Queries Using Views: A Survey. Department of Computer Science and Engineering, University ofWashington, pages 1-43, 2000 [Mit99]Prasenjit Mitra. An Algorithm for Answering Queries Efficiently Using Views. Infolab, Stanford University, pages 1-13, September,1999. [DG97b] Oliver M. Duschka and Michael R. Genesereth. Query Planning inInfoMaster. In Proceedings of the ACM Symposium on Applied Computing, San Jose, CA, 1997.

Answering Queries Using Views

Answering Queries Using Views

Presentation Transcript

Answering Queries Using Views: The Last Frontier

Answering Queries Using Views LMSS 95

Answering Queries Using Views: A Survey

Answering Top-k Queries Using Views

Answering Queries Using Views: A Survey

Answering Top-k Queries Using Views

Query Answering using Views

Optimizing Queries Using Materialized Views

Answering queries across mappings

Answering Queries: Problems

Answering Approximate Queries Efficiently

Answering Queries using views: A survey

Answering Top-k Queries Using Views

Answering Tree Pattern Queries Using Views

Answering Queries Using Views

A Scalable Algorithm for Answering Queries Using Views

Answering Queries using Templates with Binding Patterns

Generating Efficient Plans for Queries Using Views

Answering Approximate Queries Efficiently

A Scalable Algorithm for Answering Queries Using Views