This talk explores the interaction between multidimensional skylines and functional dependencies and how functional dependencies can help with full and partial materialization of skycubes.
On the interaction between multidimensional skylines and functional dependencies • Sofian Maabout, University of Bordeaux, CNRS • Joint work with Nicolas Hanusse, Patrick Kamnang Wanko, Carlos Ordonez
Skyline query • O is in the skyline iff there is no other O’ better than O • Skyline={a, b, c, d} not dominated by any hotel • Intuitively, skyline points represent the best tradeoff
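The dominance test behind this definition can be sketched in a few lines of Python. This is only an illustration: the hotel values, the attribute names `price` and `distance`, and the convention that smaller values are better are all assumptions, chosen so that the result matches the skyline {a, b, c, d} quoted on the slide.

```python
def dominates(t, u, dims):
    """t dominates u: at least as good on every dimension, strictly better on one.
    Assumption: smaller values are better on every dimension."""
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def skyline(rows, dims):
    """Naive quadratic skyline: keep the rows that no other row dominates."""
    return [t for t in rows if not any(dominates(u, t, dims) for u in rows)]

# Hypothetical hotels (not the data shown in the talk).
hotels = [
    {"name": "a", "price": 50, "distance": 9},
    {"name": "b", "price": 60, "distance": 6},
    {"name": "c", "price": 80, "distance": 3},
    {"name": "d", "price": 120, "distance": 1},
    {"name": "e", "price": 90, "distance": 7},  # dominated by b
]

print([h["name"] for h in skyline(hotels, ["price", "distance"])])
```

Hotel e is worse than b on both criteria, so it is the only one left out.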
Multidimensional skylines • Users are allowed to ask queries using any combination of dimensions • CEO: Best hotels = offering a swimming pool and air conditioning • Student: Best hotels = cheapest and free wifi • Skycube = set of all possible skylines • How to optimize all these multidimensional skylines? • Precompute ALL of them → Full Skycube • Precompute a SUBSET of them → Partial Skycube
This talk • How functional dependencies can help full and partial materialization of skycubes
Skyline Queries and Data Quality • Discarding low-quality records is one dimension of data cleaning • Compare tuples wrt their respective quality parameters • Best tuples = those with the best tradeoff wrt quality parameters
Skyline Queries and Data Quality • Zip → City • Phone → Name
Skyline Queries and Data Quality • t1, t3 and t4 are involved in a Zip → City violation • t1 and t2 are involved in a Phone → Name violation • t1’s salary is less precise than t2’s
Skyline Queries and Data Quality • Sky(#FDs, SU) = {t4, t5}
Functional dependencies & multidimensional skylines A B BC A B A • Theorem: If X → Y then Sky(X) ⊆ Sky(XY)
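The theorem on this slide says that a functional dependency X → Y forces Sky(X) to be included in Sky(XY). A small check can make this concrete. The table `T`, the helper names and the "smaller is better" convention below are assumptions made for illustration, not the example shown in the talk.

```python
def dominates(t, u, dims):
    # Assumption: smaller values are preferred on every dimension.
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def sky_ids(rows, dims):
    """Identifiers of the tuples that no other tuple dominates on dims."""
    return {t["id"] for t in rows if not any(dominates(u, t, dims) for u in rows)}

def holds_fd(rows, lhs, rhs):
    """True iff the functional dependency lhs -> rhs is satisfied by rows."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical table: A -> B holds, A -> C does not.
T = [
    {"id": "t1", "A": 1, "B": 3, "C": 2},
    {"id": "t2", "A": 2, "B": 1, "C": 1},
    {"id": "t3", "A": 1, "B": 3, "C": 5},
    {"id": "t4", "A": 3, "B": 2, "C": 0},
]

print(holds_fd(T, "A", "B"), sky_ids(T, "A") <= sky_ids(T, "AB"))
print(holds_fd(T, "A", "C"), sky_ids(T, "A") <= sky_ids(T, "AC"))
```

With A → B the inclusion is guaranteed. Without A → C nothing is guaranteed, and here t3 belongs to Sky(A) but is dominated by t1 on AC.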
Closed subspaces • X is closed iff X ↛ A for every A not in X • The minimal FD’s satisfied by T are • C is closed • AB is not closed
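The definition of a closed subspace can be checked directly by brute force. The sketch below enumerates every non-empty subspace and tests the definition. The table `T` is hypothetical: it was picked so that C is closed and AB is not, as stated on the slide, but it is not the slide's table.

```python
from itertools import combinations

def holds_fd(rows, lhs, rhs_attr):
    """True iff lhs -> rhs_attr is satisfied by rows."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        if seen.setdefault(key, t[rhs_attr]) != t[rhs_attr]:
            return False
    return True

def closed_subspaces(rows, dims):
    """Non-empty subspaces X that determine no attribute outside X."""
    closed = []
    for k in range(1, len(dims) + 1):
        for X in combinations(dims, k):
            outside = [a for a in dims if a not in X]
            if not any(holds_fd(rows, X, a) for a in outside):
                closed.append("".join(X))
    return closed

# Hypothetical table in which AB -> C is the only non-trivial minimal FD.
T = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 1, "C": 1},
    {"A": 2, "B": 2, "C": 2},
]

print(closed_subspaces(T, "ABC"))
```

The enumeration is exponential in the number of dimensions, so it is only meant to illustrate the definition on toy inputs.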
Example • Red: closed subspace
Skycube computation If partial materialization, just stop here
Skycube computation Need of an efficient procedure
Mining Closed Subspaces • Intuitive idea: • For every A, find the maximal X s.t. X ↛ A • Every x ⊆ X is potentially closed • The intersection of these sets of x’s gives the closed subspaces • We adapt N. Hanusse, SM: A parallel algorithm for computing borders. CIKM’11
Mining Closed Subspaces • Maximal subspaces not determining B
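The intuitive idea can be sketched as follows: for each attribute A, collect the maximal subspaces that do not determine A, then keep the subspaces that fall under one of these maximal sets for every attribute they do not contain. This is a brute-force illustration on a hypothetical table, not the parallel border algorithm of the CIKM’11 paper; all function names are made up for the sketch.

```python
from itertools import combinations

def holds_fd(rows, lhs, rhs_attr):
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        if seen.setdefault(key, t[rhs_attr]) != t[rhs_attr]:
            return False
    return True

def maximal_non_determining(rows, dims, target):
    """Maximal X (not containing target) such that X does not determine target."""
    others = [a for a in dims if a != target]
    maximal = []
    for k in range(len(others), 0, -1):          # largest candidates first
        for X in combinations(others, k):
            s = frozenset(X)
            if any(s <= m for m in maximal):
                continue                          # already below a maximal set
            if not holds_fd(rows, X, target):
                maximal.append(s)
    return maximal

def closed_from_borders(rows, dims):
    borders = {a: maximal_non_determining(rows, dims, a) for a in dims}
    closed = []
    for k in range(1, len(dims) + 1):
        for X in combinations(dims, k):
            s = frozenset(X)
            # X is closed iff, for each A outside X, X lies under a maximal
            # subspace that does not determine A.
            if all(any(s <= m for m in borders[a]) for a in dims if a not in s):
                closed.append("".join(X))
    return closed

# Hypothetical table (AB -> C is its only non-trivial minimal FD).
T = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 1, "C": 1},
    {"A": 2, "B": 2, "C": 2},
]

print(closed_from_borders(T, "ABC"))
```

The sketch relies on the fact that "X does not determine A" is inherited by every subset of X, which is why the maximal sets are enough to describe all candidates.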
Subspace Closure • Let X be a subspace. • Let Closed = {Y | Y is closed} • Then, X+ = smallest Y ∈ Closed s.t. X ⊆ Y
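Given the set of closed subspaces, the closure is a simple lookup. The sketch below assumes the closed subspaces are already known (the list used here is hypothetical) and that the full space is among them, so a superset always exists. Since X → X+ holds, the theorem Sky(X) ⊆ Sky(XY) gives Sky(X) ⊆ Sky(X+), which suggests how a skyline materialized for a closed subspace can serve queries on the subspaces below it.

```python
def closure(X, closed):
    """Smallest closed subspace containing X.
    Assumes `closed` contains the full space, so the candidate list is never empty."""
    xs = frozenset(X)
    supersets = [frozenset(Y) for Y in closed if xs <= frozenset(Y)]
    return "".join(sorted(min(supersets, key=len)))

# Hypothetical set of closed subspaces over dimensions A, B, C.
closed = ["A", "B", "C", "AC", "BC", "ABC"]

print(closure("AB", closed))
print(closure("A", closed))
```

Here AB is not closed and its closure is the whole space ABC, while A is its own closure.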
Closed Subspaces • Lattice of subspaces over dimensions A, B, C, D: ABCD • ABC, ABD, ACD, BCD • AB, AC, AD, BC, BD, CD • A, B, C, D
Experiments • Our approach versus other proposals for fully computing the skycube: • QGS & QGL: Lee et al. VLDBJ’14 • BUS & TDS: Pei et al. TODS’06 • Orion: Raïssi et al. VLDB’10 • Our approach versus closed skycubes, a lossless compression technique: Raïssi et al. VLDB’10 • Assess query evaluation time
Experiments: (1) compute all skylines • Synthetic data sets • Independent • Correlated • Anti-correlated
Experiments: (1) Full Skycube • Synthetic data sets • Speedup = execution time of algorithm X / execution time of our algorithm FMC
Experiments: (2) query optimization • 1000 random skyline queries • 0.31% of the 2^20 queries are materialized • 49 ms to answer 1K skyline queries from the materialized ones, instead of 99.92 seconds from the underlying data • Speedup > 2000
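As a quick sanity check on these figures, the arithmetic can be written out. Only the numbers quoted on the slide are used; the rounded results are approximations.

```latex
\[
\text{speedup} \;=\; \frac{99.92\ \text{s}}{49\ \text{ms}}
\;=\; \frac{99.92}{0.049} \;\approx\; 2039 \;>\; 2000,
\qquad
0.0031 \times 2^{20} \;=\; 0.0031 \times 1\,048\,576 \;\approx\; 3251
\ \text{materialized skylines}.
\]
```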
Experiments: (3) comparison with closed skycubes • Identify equivalent skylines and store just one copy → compression of the whole set of skylines • E.g., Sky(C), Sky(D) and Sky(CD) are equivalent
Experiments: (3) comparison with closed skycubes • Number of materialized skylines (time to find and materialize them) • Synthetic correlated data: n=100K, d=20: MICS = 20 sec, Closed didn’t finish after 36 hours • More details in N. Hanusse, SM, P. Kamnang Wanko, C. Ordonez: Skycube Materialization Using the Topmost skyline of Functional Dependencies. TODS’16
Incomparability Dependencies • Definition: X ↬ Y iff t[X] = t’[X] ⇒ t[Y] and t’[Y] are incomparable • Theorem: Sky(X) satisfies X ↬ Y ⇔ Sky(X) ⊆ Sky(XY) • Property: X → Y ⇒ X ↬ Y
Incomparability Dependencies • FDs do not detect Sky(B) ⊆ Sky(AB), while Sky(B) satisfies B ↬ A • IncoDs detect that Sky(B) ⊄ Sky(BC), because Sky(B) doesn’t satisfy B ↬ C
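A minimal sketch of an incomparability-dependency check follows. It assumes that "incomparable" means neither tuple dominates the other (so equal values count as incomparable) and that smaller values are better. The table `T` is hypothetical and was built to reproduce the situation on this slide: B → A fails on the table, yet Sky(B) satisfies B ↬ A.

```python
from itertools import combinations

def dominates(t, u, dims):
    # Assumption: smaller values are preferred on every dimension.
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def skyline(rows, dims):
    return [t for t in rows if not any(dominates(u, t, dims) for u in rows)]

def ids(rows):
    return {t["id"] for t in rows}

def holds_fd(rows, lhs, rhs):
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def satisfies_incod(rows, X, Y):
    """rows satisfy X ~> Y: tuples equal on X never dominate each other on Y."""
    for t, u in combinations(rows, 2):
        if all(t[a] == u[a] for a in X):
            if dominates(t, u, Y) or dominates(u, t, Y):
                return False
    return True

# Hypothetical table: B -> A is violated by t3 and t4, which are outside Sky(B).
T = [
    {"id": "t1", "A": 1, "B": 1, "C": 1},
    {"id": "t2", "A": 1, "B": 1, "C": 2},
    {"id": "t3", "A": 5, "B": 2, "C": 0},
    {"id": "t4", "A": 6, "B": 2, "C": 0},
]

sky_b = skyline(T, "B")
print(holds_fd(T, "B", "A"), satisfies_incod(sky_b, "B", "A"))
print(ids(sky_b) <= ids(skyline(T, "AB")), ids(sky_b) <= ids(skyline(T, "BC")))
```

The dependency is tested on Sky(B) only, which is why it can succeed where the FD, tested on the whole table, fails.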
Prioritized Skyline • Expression = Sky(AB & CD) • First compute Sky(AB) • If t[AB] = t’[AB] and t ∈ Sky(AB), then t and t’ are compared wrt C and D • Kießling. Foundations of preferences in database systems. VLDB’02 • Chomicki et al. Preference elicitation in prioritized skyline queries. VLDBJ’12 • Ciaccia et al. Output-sensitive Evaluation of Prioritized Skyline Queries. SIGMOD’15
Prioritized Skyline • Sky(AB) = {t1, t2, t3, t4} • t1[AB] = t2[AB] and t1 dominates t2 wrt CD • Sky(AB & CD) = {t1, t3, t4}
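One way to read the prioritized expression is as a level-by-level dominance test: a later level is consulted only when two tuples agree on all earlier levels. The sketch below follows that reading. The table is hypothetical and was chosen to reproduce the outcome on this slide; smaller values are assumed to be better.

```python
def dominates(t, u, dims):
    # Assumption: smaller values are preferred on every dimension.
    return all(t[d] <= u[d] for d in dims) and any(t[d] < u[d] for d in dims)

def prio_dominates(t, u, levels):
    """t beats u under X1 & ... & Xm: the first level on which they differ decides."""
    for dims in levels:
        if dominates(t, u, dims):
            return True
        if any(t[d] != u[d] for d in dims):
            return False      # they differ on this level but t does not dominate
    return False              # equal on every level

def prioritized_skyline(rows, levels):
    return [t["id"] for t in rows
            if not any(prio_dominates(u, t, levels) for u in rows)]

# Hypothetical table (not the one shown in the talk).
T = [
    {"id": "t1", "A": 1, "B": 4, "C": 1, "D": 1},
    {"id": "t2", "A": 1, "B": 4, "C": 2, "D": 3},
    {"id": "t3", "A": 2, "B": 2, "C": 5, "D": 5},
    {"id": "t4", "A": 4, "B": 1, "C": 0, "D": 9},
    {"id": "t5", "A": 3, "B": 5, "C": 0, "D": 0},
]

print(prioritized_skyline(T, ["AB"]))
print(prioritized_skyline(T, ["AB", "CD"]))
```

With a single level the function reduces to the ordinary skyline, and adding the level CD removes t2 because t1 agrees with it on AB and dominates it on CD.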
Prioritized Skyline • Let the expression be X1 & … & Xi & … & Xm • If X1…Xi-1 → X and X ⊆ Xi, then it can be rewritten as X1 & … & Xi\X & … & Xm • AB → C ⇒ Sky(AB & CD) = Sky(AB & D)
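The rewriting rule can be applied mechanically once the FDs are known. The sketch below applies it exhaustively, using the standard attribute-closure computation to find what the earlier levels determine. The function names are made up, and dropping a level that becomes empty is an extra assumption not stated on the slide.

```python
def attr_closure(attrs, fds):
    """Attributes determined by attrs under fds, given as (lhs, rhs) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def simplify(levels, fds):
    """Remove from each level the attributes determined by the earlier levels."""
    result = []
    prefix = set()
    for dims in levels:
        implied = attr_closure(prefix, fds)
        kept = "".join(a for a in dims if a not in implied)
        if kept:                      # assumption: an emptied level is dropped
            result.append(kept)
        prefix |= set(dims)
    return result

print(simplify(["AB", "CD"], [("AB", "C")]))
```

The rule is sound because two tuples that agree on the earlier levels must also agree on every attribute those levels determine, so such attributes can never decide a comparison.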
Conclusion • Functional dependencies are helpful for both full and partial skycube materialization • Incomparability dependencies characterize skyline inclusions • Semantic optimization of prioritized skylines with FDs
Some open questions • Is it possible to come up with a Chase-like procedure for the semantic optimization of prioritized skylines? • What about order dependencies? • Incremental maintenance • Approximate skylines and approximate FDs • t[A] is preferred to s[A] iff s[A] - t[A] exceeds a given threshold • X → Y iff ∀ t, s: t[X] ~ s[X] ⇒ t[Y] ~ s[Y]
Thanks • Questions