View Usability and Safety for the Answering of Top-k Queries via Materialized Views
Eftychia Baikousi, Panos Vassiliadis
University of Ioannina, Dept. of Computer Science
Forecast
• Problem of answering a top-k query through materialized top-n views
• Theoretical guarantees when a top-n materialized view can answer a top-k query
• Algorithmic techniques for answering a top-k query from a materialized view
• Properties of the safe areas of views
DOLAP 2009, Hong Kong, 6 Nov 2009
Contents
• Motivation & Problem Definition
• Overview of the Method
  • Theoretical guarantees
  • Strictness of theorem
  • Safe area properties
• Experiments
• Conclusions
• Future extensions
Top-k query
Given
• a relation R (id, x1, x2, x3) and
• a query Q, sum(x1, x2, x3)
Find the k tuples with the highest grades according to Q
[Figure: relation R and its top-2 tuples]
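As an illustration of the definition above, here is a minimal Python sketch of a top-k query under the sum(x1, x2, x3) scoring function. The function name `top_k` and the sample tuples are hypothetical; they are not the values shown in the slide's figure.

```python
import heapq

def top_k(rows, k):
    """Return the k rows (id, x1, x2, x3) with the highest x1 + x2 + x3, best first."""
    return heapq.nlargest(k, rows, key=lambda r: r[1] + r[2] + r[3])

# Hypothetical relation R(id, x1, x2, x3), made up for illustration.
R = [(1, 5, 3, 1), (2, 2, 8, 4), (3, 7, 7, 2), (4, 1, 1, 1)]

top2 = top_k(R, 2)  # the two tuples with the highest total grade
```

`heapq.nlargest` avoids sorting the whole relation, which matters when k is much smaller than the number of tuples.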
Motivating Example
Telecommunication Company
• Executives see sales reports on PDAs
• Given a relation Region (id, name, today_traffic, yesterday_traffic, budget, ..)
• and a materialized view V of the top-2 regions according to the query Q: 0.6*difftraffic + 0.4*budget
[Figure: tables V and Region]
• Can a new top-k query (e.g. 0.5*difftraffic + 0.3*budget) be answered from V?
Problem definition
• Given
  • a base relation R(ID, X, Y)
  • a materialized view V(ID, X, Y, s) that contains the top-n tuples of the form (id, s), where s is defined as s = w (a·x + y) and w, a are positive parameters
  • a query Q(ID, X, Y, sQ) that requests the top k ≤ n tuples of the form (id, sQ), where sQ is defined as sQ = wQ (aQ·x + y) and wQ, aQ are positive parameters
• Introduce
  • an algorithm that decides whether V by itself is suitable to answer Q and computes Q’s answer
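A short worked step, added here as a reading aid rather than taken from the slides: because w is positive, it rescales scores without changing their order, so the ranking induced by a view or a query depends only on its slope parameter a. For two tuples t = (x, y) and t' = (x', y'):

```latex
\begin{align*}
s(t) \ge s(t')
  &\iff w\,(a\,x + y) \ge w\,(a\,x' + y') \\
  &\iff a\,x + y \ge a\,x' + y' \qquad \text{(dividing by } w > 0\text{)}
\end{align*}
```

The same argument applies to sQ with wQ and aQ, so comparing a view with a query reduces to comparing a with aQ.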
Related Work
Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis: “Answering Top-k Queries Using Views”, VLDB ’06
• Answer a top-k query Q by making use of ranking views V
• LPTA in 2 steps
  • SelectViews (V, Q)
    • Selects an efficient subset of views U for answering Q
    • U contains the sorted lists over each attribute of the relation
  • Answer Q from U
    • Linear programming adaptation of the TA algorithm
    • Stopping condition: solution of linear program ≤ min (top-k)
Related Work – Geometric Representation (0)
• Assume
  • Relation R (ID, X, Y)
  • Two views Vu (id, Score1) and Vd (id, Score2)
  • Query Q (id, Score)
• Scoring functions of the form Score = w (a·x + y)
• Depicted as the line y = (1/a)·x
Related Work – Geometric Representation (1)
• M: the kth tuple in Q
• Stopping condition: the sweeping line crosses position A1B
• Any point below line AB has a smaller score than M with regard to Q
Related Work – Geometric Representation (2)
• Stopping condition: the intersection point S of the two sweeping lines lies on line AB
• Any point below line AB has a smaller score than M with regard to Q
Related Work
• SelectViews (V, Q) is data dependent
  • based on an estimation of the last tuple of Q according to the data distribution
• No theoretically established guarantees that the set of views will answer Q
Overview of the method
• Theoretical guarantees for answering a query Q via a view VU
• Theoretical guarantees are too strict
• Parallelism of safe areas
Example
• V: top-3 with score x+2y
• Q: top-1 with score 2x+y
[Figure: relation R]
Construction of safe area
• VU(ID, X, Y, sU)
  • contains the top-n tuples
  • with score sU = wU(aU·x + y)
• tN: the nth tuple in VU
• LU: the line xNU yNU, perpendicular to VU, passing through tN and meeting the X and Y axes at xNU and yNU
• LQ: the line xNU yQ, perpendicular to Q, passing through xNU
Safe area
• The safe area is defined as the area “above” line LQ (shaded area)
• Observations
  • Any tuple in the safe area has a higher score (with regard to Q) than any tuple outside the safe area
  • Tuples in the safe area belong to both VU and Q
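The construction of LU and LQ can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, attribute values are assumed non-negative, the positive weights wU and wQ are dropped because they do not affect ranking, and only the configuration a_q >= a_u is handled, since that is the case in which anchoring LQ at the X-axis intercept xNU bounds the Q-score of every tuple lying on or below LU.

```python
def safe_threshold(a_u, a_q, t_n):
    """Q-score (a_q*x + y) that a tuple must exceed to lie above L_Q.

    a_u, a_q: positive slope parameters of the view and the query.
    t_n: (x, y) of the n-th (lowest-ranked) tuple stored in the view.
    Assumption of this sketch: x, y >= 0 and a_q >= a_u.
    """
    if a_q < a_u:
        raise ValueError("this sketch only covers a_q >= a_u")
    x_n, y_n = t_n
    c_n = a_u * x_n + y_n   # L_U is the line a_u*x + y = c_n through t_n
    x_nu = c_n / a_u        # L_U meets the X axis at (x_nu, 0)
    y_q = a_q * x_nu        # L_Q: a_q*x + y = y_q, meets the Y axis at (0, y_q)
    return y_q

def in_safe_area(a_q, threshold, t):
    """True if tuple t = (x, y) lies strictly above L_Q."""
    x, y = t
    return a_q * x + y > threshold

# Hypothetical numbers: view slope 0.5, query slope 2, n-th view tuple at (2, 3).
thr = safe_threshold(0.5, 2.0, (2.0, 3.0))
```

The strict comparison in `in_safe_area` is a conservative choice made for this sketch so that ties with tuples outside the view are never counted as safe.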
Answering Q from VU
• THEOREM 1: VU can answer Q if the safe area contains at least k tuples
• The converse does not always hold
Answering Q from VU cont.
• THEOREM 2: VU may still be able to answer Q when the safe area contains fewer than k tuples
• This holds when the area (yellow triangle) defined by
  • line LU,
  • the X-axis, and
  • line L1, which produces the lowest possible score for Q from the tuples of VU,
  is void of tuples
Algorithm TestViewSuitability
• Three main steps
  • Step 1: Compute the safe area (Q, V)
  • Step 2: Count the tuples in V that belong to the safe area
  • Step 3: If there are at least k, then return (true), else return (false)
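The three steps can be sketched in Python as below. This is a hedged illustration, not the authors' code: the function name `check_view_suitability`, the tuple layout, and the sample view are assumptions. Attribute values are assumed non-negative, only the case a_q >= a_u is covered, and the view is usable when at least k tuples are safe, following Theorem 1.

```python
def check_view_suitability(view, a_u, a_q, k):
    """Decide whether a top-n view can answer a top-k query, and answer it if so.

    view: list of (id, x, y), sorted by a_u*x + y, best first (n-th tuple last).
    Returns (True, top-k tuples for the query) or (False, []).
    Assumption of this sketch: x, y >= 0 and a_q >= a_u.
    """
    if a_q < a_u:
        raise ValueError("this sketch only covers a_q >= a_u")

    def q_score(t):
        return a_q * t[1] + t[2]

    # Step 1: safe area = tuples whose Q-score exceeds the bound derived
    # from the n-th view tuple (line L_Q anchored at the X-intercept of L_U).
    _, x_n, y_n = view[-1]
    threshold = a_q * (a_u * x_n + y_n) / a_u

    # Step 2: collect the view tuples that fall inside the safe area.
    safe = [t for t in view if q_score(t) > threshold]

    # Step 3: the view suffices if at least k tuples are safe.
    if len(safe) < k:
        return False, []
    safe.sort(key=q_score, reverse=True)
    return True, safe[:k]

# Hypothetical top-3 view under score x + 2y (slope a_u = 0.5),
# queried with score 2x + y (slope a_q = 2), as in the earlier example slide.
V = [(1, 10.0, 4.0), (2, 2.0, 6.0), (3, 1.0, 3.0)]
ok, answer = check_view_suitability(V, 0.5, 2.0, 1)
```

A `False` result only means that the safe-area test is inconclusive; by Theorem 2, the view might still be able to answer the query.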
Combining two views
• Lines LQU, LQD (both perpendicular to Q)
  • characterize the safe areas for VU and VD
• LQU ║ LQD
  • the safe area of one view (VU) is encompassed in the safe area of the other view (VD)
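Why the two lines are parallel can be seen with a short derivation. This is a reading aid under the assumption that both lines are iso-score lines of Q; the constants c_U and c_D are hypothetical names for the Q-score thresholds obtained from VU and VD.

```latex
\begin{align*}
L_Q^U &: \; a_Q\,x + y = c_U, &
L_Q^D &: \; a_Q\,x + y = c_D .
\end{align*}
Both lines have slope $-a_Q$, hence $L_Q^U \parallel L_Q^D$. The safe areas are the
half-planes $\{(x, y) : a_Q\,x + y > c\}$, so
\[
c_U \ge c_D \;\Longrightarrow\;
\{\, a_Q\,x + y > c_U \,\} \subseteq \{\, a_Q\,x + y > c_D \,\},
\]
that is, the safe area of $V_U$ is contained in the safe area of $V_D$.
```

When the inequality is reversed, the containment is reversed as well; either way one safe area encloses the other.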
Experimental methodology
• Test the following methods
  • Our algorithm
  • TA algorithm (it can guarantee view usability correctness)
• For the following goals
  • Effectiveness
    • Number of queries answered by views
  • Efficiency
    • Time savings from usage of queries
Experimental methodology
• Experimental parameters:
  • Synthetic data sets:
    • Random data sets of different sizes for a relation of the form R (ID, X, Y)
    • Sequence of queries with random coefficients and result size k
Effectiveness
• Percentage of views used for 100 queries
Effectiveness
• Percentage of views used for different time spans
Efficiency
• Time savings from the usage of queries for different database sizes and requested results
• Conflicting case
  • The number of stored results rises, while the savings drop
    • Due to the size of used memory
    • Memory allocation becomes slow
    • Probably one view is able to answer a lot of queries
• Savings increase for reasonable k’s of size 0.1%
Conclusions
• We have provided theoretical and algorithmic results for the problem of answering top-k queries via materialized views
• Theoretical and algorithmic results:
  • Theorem 1: Theoretical guarantees for a view to answer a top-k query
  • Theorem 2: Strictness of Theorem 1
  • Parallelism of safe areas
Future Work
• Optimization in case of time and storage constraints
• View Caching
  • Hierarchical structures for the set of views
  • Sorting techniques
Thank you for your attention! … many thanks to our hosts!