Refreshing the Sky: The Compressed Skycube with Efficient Support for Frequent Updates
Tian Xia and Donghui Zhang
Northeastern University
SIGMOD 2006, Chicago, IL

Skyline Query
• A classic example revisited: hotels in Nassau.

  Hotel  Price  Dist. to Beach
  t1     3      2
  t2     4      7
  t3     9      5
  t4     4      6
  t5     2      3
  t6     6      1
  t7     1      4

• The smaller, the better.
• If no object is better than ti in all dimensions, ti is a skyline object.
[Figure: hotels t1 to t7 plotted with Price on the x-axis and Dist. to Beach on the y-axis.]

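The skyline of the hotel table can be checked with a short brute-force sketch. It assumes the usual dominance test (no worse in every dimension and strictly better in at least one); the names `dominates` and `skyline` are illustrative and do not come from the slides.

```python
# Hotels from the slide: name -> (price, distance to beach); smaller is better.
hotels = {
    "t1": (3, 2), "t2": (4, 7), "t3": (9, 5), "t4": (4, 6),
    "t5": (2, 3), "t6": (6, 1), "t7": (1, 4),
}

def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Return the sorted names of all objects that no other object dominates."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )

print(skyline(hotels))  # ['t1', 't5', 't6', 't7']
```

The quadratic scan is only meant to make the definition concrete; it is not an efficient skyline algorithm.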
Subspace Skyline Query
• What if users may issue skyline queries based on arbitrary subsets of dimensions?
• Results of subspace skylines can be very different!

Objects of 4 dimensions:
      u1  u2  u3  u4
  t1  3   4   2   5
  t2  4   6   7   2
  t3  9   7   5   6
  t4  4   3   6   1
  t5  2   2   3   1
  t6  6   1   1   3
  t7  1   3   4   1

[Figures: skyline in (u1, u3); skyline in (u3, u4).]

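To see how different two subspace skylines can be, the sketch below restricts the dominance test to a chosen subset of dimensions and evaluates the two subspaces named on the slide. The helper names and the brute-force approach are assumptions made for illustration.

```python
DIMS = ("u1", "u2", "u3", "u4")
# The seven 4-dimensional objects from the slide.
OBJECTS = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1),
}

def dominates(a, b, idx):
    """Dominance restricted to the dimension indices in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def subspace_skyline(data, subspace):
    """Skyline of data when only the dimensions in subspace are considered."""
    idx = [DIMS.index(u) for u in subspace]
    return sorted(
        t for t, p in data.items()
        if not any(dominates(q, p, idx) for s, q in data.items() if s != t)
    )

print(subspace_skyline(OBJECTS, ("u1", "u3")))  # ['t1', 't5', 't6', 't7']
print(subspace_skyline(OBJECTS, ("u3", "u4")))  # ['t5', 't6']
```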
Skycube (Yuan, et al., VLDB 2005)
• A d-dimensional space contains 2^d − 1 subspaces, and the subspaces of various users' interests are unpredictable.
• On-the-fly computation does not achieve fast response time for an online system.
• Skycube is the collection of all subspace skyline results.

Subspace lattice (top level: full-space skyline):
  u1u2u3u4
  u1u2u3  u1u2u4  u1u3u4  u2u3u4
  u1u2  u1u3  u1u4  u2u3  u2u4  u3u4
  u1  u2  u3  u4

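The subspace count can be confirmed by enumerating the lattice directly. This is a minimal sketch using the standard library's `itertools.combinations`; the function name `subspaces` is illustrative.

```python
from itertools import combinations

def subspaces(dims):
    """Yield every non-empty subset of dims, smallest subsets first."""
    for k in range(1, len(dims) + 1):
        yield from combinations(dims, k)

cuboids = list(subspaces(("u1", "u2", "u3", "u4")))
print(len(cuboids))  # 15, i.e. 2**4 - 1
print(cuboids[-1])   # ('u1', 'u2', 'u3', 'u4')
```

A Skycube stores one skyline per element of this list, which is why its size grows exponentially with the dimensionality.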
Our Motivations (1)
• In many subspace skyline applications, the data change constantly.
  • Example: In an online hotel-booking system, room prices change with availability.
• The previous Skycube paper focused only on the initial computation of the Skycube.
• A straightforward re-computation upon each update is extremely inefficient!

Our Motivations (2)
• The complete Skycube contains a huge number of duplicates.
  • Drawback 1: Waste of storage.
  • Drawback 2: Difficult to maintain.
• The large size of the Skycube and the large number of duplicates make updating the Skycube inherently expensive.

Our Motivations (2)

Dataset:
      u1  u2  u3  u4
  t1  3   4   2   5
  t5  2   2   3   1
  t6  6   1   1   3
  t7  1   3   4   1
  t4  4   3   6   1
  t9  2   2   3   7
  t2  4   6   7   2
  t3  9   7   5   6
  t8  6   5   3   8

Full-space skyline objects: t1, t5, t6, t7
Other skyline objects (not in full-space): t4, t9

Corresponding Skycube:
  Cuboid            Skyline
  u1                t7
  u2                t6
  u3                t6
  u4                t5, t7, t4
  u1, u2            t5, t6, t7, t9
  u1, u3            t1, t5, t6, t7, t9
  u1, u4            t7
  u2, u3            t6
  u2, u4            t5, t6
  u3, u4            t5, t6
  u1, u2, u3        t1, t5, t6, t7, t9
  u1, u2, u4        t5, t6, t7
  u1, u3, u4        t1, t5, t6, t7
  u2, u3, u4        t5, t6
  u1, u2, u3, u4    t1, t5, t6, t7

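The amount of duplication in this example can be measured by materializing the whole Skycube. The sketch below recomputes every cuboid by brute force from the nine objects on the slide; this is only an illustration with hypothetical helper names, not the computation method of the Skycube paper.

```python
from itertools import combinations

DIMS = ("u1", "u2", "u3", "u4")
DATA = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}

def dominates(a, b, idx):
    """Dominance restricted to the dimension indices in idx."""
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def skyline(data, subspace):
    idx = [DIMS.index(u) for u in subspace]
    return sorted(t for t, p in data.items()
                  if not any(dominates(q, p, idx) for s, q in data.items() if s != t))

def skycube(data):
    """Map every non-empty subspace to its skyline."""
    return {u: skyline(data, u)
            for k in range(1, len(DIMS) + 1)
            for u in combinations(DIMS, k)}

cube = skycube(DATA)
entries = sum(len(sky) for sky in cube.values())
distinct = set().union(*cube.values())
print(len(cube), entries, len(distinct))          # 15 39 6
print(sum("t6" in sky for sky in cube.values()))  # 12
```

Six distinct objects occupy 39 slots across the 15 cuboids, and t6 alone is stored 12 times.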
Our Motivations (3) – Tradeoffs
• The complete Skycube:
  • Fast query response.
  • High update cost.
• On-the-fly computation:
  • Slow query response.
  • Low update cost.

Our Solution – The Compressed Skycube
• We propose a new storage model for the Skycube, which greatly reduces the storage.
• We propose a new object-aware update scheme, which avoids unnecessary disk access and cuboid computation.
• By taking advantage of the compact structure and our update scheme, the Compressed Skycube achieves both fast query response and very efficient updates.

Outline
• Background and Motivations
• The Compressed Skycube
• Experimental Results
• Conclusions

Minimum Subspace
• DEFINITION: Given an object t, the minimum subspaces of t, denoted as mss(t), satisfy the following two conditions:
  • For any subspace U in mss(t), t is in the skyline of U;
  • And for any subspace V ⊂ U, t is not in the skyline of V.

Minimum Subspace
• Object t6 appears in the skylines of 12 cuboids.
• The minimum subspaces of t6 are only 2 cuboids.

Minimum Subspaces:
  t1    {u1, u3}
  t5    {u4}, {u1, u2}, {u1, u3}
  t6    {u2}, {u3}
  t7    {u1}, {u4}
  t4    {u4}
  t9    {u1, u2}, {u1, u3}

Skycube:
  Cuboid            Skyline
  u1                t7
  u2                t6
  u3                t6
  u4                t5, t7, t4
  u1, u2            t5, t6, t7, t9
  u1, u3            t1, t5, t6, t7, t9
  u1, u4            t7
  u2, u3            t6
  u2, u4            t5, t6
  u3, u4            t5, t6
  u1, u2, u3        t1, t5, t6, t7, t9
  u1, u2, u4        t5, t6, t7
  u1, u3, u4        t1, t5, t6, t7
  u2, u3, u4        t5, t6
  u1, u2, u3, u4    t1, t5, t6, t7

The Compressed Skycube
• DEFINITION: The Compressed Skycube (CSC) consists of non-empty cuboids U, such that an object t is stored in a cuboid U if and only if U ∈ mss(t).

Minimum Subspaces:
  t1    {u1, u3}
  t5    {u4}, {u1, u2}, {u1, u3}
  t6    {u2}, {u3}
  t7    {u1}, {u4}
  t4    {u4}
  t9    {u1, u2}, {u1, u3}

Cuboids of CSC:
  Cuboid    Skyline
  u1        t7
  u2        t6
  u3        t6
  u4        t5, t7, t4
  u1, u2    t5, t9
  u1, u3    t1, t5, t9

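The definition can be turned into a small construction sketch: compute every subspace skyline, then keep an object in a cuboid only when it appears in no skyline of a proper subset of that cuboid. This brute-force build is an assumption made for illustration (the slides do not describe how the CSC is first constructed), and `build_csc` and `proper_subsets` are hypothetical names.

```python
from itertools import combinations

DIMS = ("u1", "u2", "u3", "u4")
DATA = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}

def dominates(a, b, idx):
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def skyline(data, subspace):
    idx = [DIMS.index(u) for u in subspace]
    return {t for t, p in data.items()
            if not any(dominates(q, p, idx) for s, q in data.items() if s != t)}

def proper_subsets(subspace):
    """Yield every non-empty proper subset of subspace."""
    for k in range(1, len(subspace)):
        yield from combinations(subspace, k)

def build_csc(data):
    cube = {u: skyline(data, u)
            for k in range(1, len(DIMS) + 1)
            for u in combinations(DIMS, k)}
    csc = {}
    for u, sky in cube.items():
        # t belongs to cuboid u only if u is a minimum subspace of t.
        minimal = sorted(t for t in sky
                         if all(t not in cube[v] for v in proper_subsets(u)))
        if minimal:  # only non-empty cuboids are kept
            csc[u] = minimal
    return csc

csc = build_csc(DATA)
for cuboid, members in csc.items():
    print(cuboid, members)
print(sum(len(m) for m in csc.values()))  # 11
```

On this dataset the result has six cuboids holding 11 entries in total, matching the "Cuboids of CSC" table.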
Querying CSC
• Overview example: query space Uq = {u2, u3, u4}.
• Search within the cuboids which are the subsets of Uq.
• Compare the objects only within a candidate cuboid to filter out false positives.

Cuboids of CSC:
  Cuboid    Skyline
  u1        t7
  u2        t6
  u3        t6
  u4        t5, t7, t4
  u1, u2    t5, t9
  u1, u3    t1, t5, t9

Objects:
      u1  u2  u3  u4
  t1  3   4   2   5
  t5  2   2   3   1
  t6  6   1   1   3
  t7  1   3   4   1
  t4  4   3   6   1
  t9  2   2   3   7

Querying CSC
• LEMMA 1: Given a query space Uq and an object t, if Ui ⊄ Uq for every subspace Ui in mss(t), then t is not in the skyline of Uq.
• Lemma 1 implies two important facts:
  • Only the existing cuboids that are subsets of Uq need to be searched.
  • No other cuboids need to be accessed or computed in the query process.
• Example: Uq = {u2, u3, u4}, and t9 can be safely pruned.

Cuboids of CSC:
  Cuboid    Skyline
  u1        t7
  u2        t6
  u3        t6
  u4        t5, t7, t4
  u1, u2    t5, t9
  u1, u3    t1, t5, t9

Objects:
      u1  u2  u3  u4
  t1  3   4   2   5
  t5  2   2   3   1
  t6  6   1   1   3
  t7  1   3   4   1
  t4  4   3   6   1
  t9  2   2   3   7

Querying CSC
• LEMMA 2 (Local Comparison): To check a candidate t in a cuboid V ⊆ Uq, we only need to compare t with the objects within the same cuboid.
• Example: Uq = {u2, u3, u4}, and the skyline of Uq is {t5, t6}.
  • No comparison is needed for t6, and t5, t7, t4 are only compared locally with each other.

Cuboids of CSC:
  Cuboid    Skyline
  u1        t7
  u2        t6
  u3        t6
  u4        t5, t7, t4
  u1, u2    t5, t9
  u1, u3    t1, t5, t9

Objects:
      u1  u2  u3  u4
  t1  3   4   2   5
  t5  2   2   3   1
  t6  6   1   1   3
  t7  1   3   4   1
  t4  4   3   6   1
  t9  2   2   3   7

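The two lemmas suggest a query procedure along the following lines. This is a sketch under stated assumptions: the CSC and the object table are hard-coded from the slides, and `query` is a hypothetical function name rather than the authors' implementation.

```python
DIMS = ("u1", "u2", "u3", "u4")
# Only the objects stored in the CSC are needed to answer a query.
DATA = {
    "t1": (3, 4, 2, 5), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t4": (4, 3, 6, 1), "t9": (2, 2, 3, 7),
}
CSC = {
    ("u1",): ["t7"],
    ("u2",): ["t6"],
    ("u3",): ["t6"],
    ("u4",): ["t5", "t7", "t4"],
    ("u1", "u2"): ["t5", "t9"],
    ("u1", "u3"): ["t1", "t5", "t9"],
}

def dominates(a, b, idx):
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def query(csc, data, uq):
    """Answer a subspace skyline query from the CSC alone."""
    idx = [DIMS.index(u) for u in uq]
    result = set()
    for cuboid, members in csc.items():
        if not set(cuboid) <= set(uq):
            continue  # Lemma 1: cuboids that are not subsets of Uq are skipped
        for t in members:
            # Lemma 2: compare t only with objects in the same cuboid, on Uq.
            if not any(dominates(data[s], data[t], idx) for s in members if s != t):
                result.add(t)
    return sorted(result)

print(query(CSC, DATA, ("u2", "u3", "u4")))        # ['t5', 't6']
print(query(CSC, DATA, ("u1", "u2", "u3", "u4")))  # ['t1', 't5', 't6', 't7']
```

For Uq = {u2, u3, u4} only the cuboids u2, u3 and u4 are visited, so t9 and t1 are never touched, and t7 and t4 are eliminated by t5 inside cuboid u4.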
Updating CSC
• Intuitions:
  • Not all updates of objects need to access the dataset.
  • Not all updates of objects need to re-compute the skyline of a cuboid.
• These intuitions are supported by our theorems.
• D: full-space; sky(D): full-space skyline.
• t: object before update; tnew: object after update.

Update cases:
  t ∉ sky(D): no dataset (disk) access.
    • tnew ∉ sky(D): no cuboid computation; existing CSC objects are not changed.
    • tnew ∈ sky(D): existing CSC objects may be removed or move to other cuboids.
  t ∈ sky(D): may access dataset (disk); insert new skyline objects.

• Considering the proportion of full-space skyline objects in the whole dataset, the t ∉ sky(D) cases cover most updates.

Updating CSC
• t ∉ sky(D) and tnew ∉ sky(D)
• Key points:
  • Compare tnew with existing full-space skyline objects (sky(D)).
  • mss(tnew) is determined by any dominating object in sky(D).
• Example: tnew = t9 = (2, 2, 3, 7), with mss(t9) = {u1, u2}, {u1, u3}.

Cuboids of CSC:
  Cuboid    Skyline
  u1        t7
  u2        t6
  u3        t6
  u4        t5, t7, t4
  u1, u2    t5, t9
  u1, u3    t1, t5, t9

Minimum Subspaces:
  t1    {u1, u3}
  t5    {u4}, {u1, u2}, {u1, u3}
  t6    {u2}, {u3}
  t7    {u1}, {u4}
  t4    {u4}

Objects:
      u1  u2  u3  u4
  t1  3   4   2   5
  t5  2   2   3   1
  t6  6   1   1   3
  t7  1   3   4   1
  t4  4   3   6   1

Updating CSC
• t ∉ sky(D) and tnew ∈ sky(D)
• Key points:
  • Existing objects may be removed or move to super-set cuboids.
  • Determining mss(tnew) is not straightforward in this case. A new recursion-based approach is proposed to avoid unnecessary computations.
• Example: tnew = t10 = (1, 3, 1, 3), with mss(t10) = {u1}, {u3}.

Cuboids of CSC:
  Cuboid    Skyline
  u1        t7, t10
  u2        t6
  u3        t6, t10
  u4        t5, t7, t4
  u1, u2    t5, t9
  u1, u3    t1, t5, t9

Minimum Subspaces:
  t1     {u1, u3}
  t5     {u4}, {u1, u2}, {u1, u3}
  t6     {u2}, {u3}
  t7     {u1}, {u4}
  t4     {u4}
  t9     {u1, u2}, {u1, u3}
  t10    {u1}, {u3}

Objects:
      u1  u2  u3  u4
  t1  3   4   2   5
  t5  2   2   3   1
  t6  6   1   1   3
  t7  1   3   4   1
  t4  4   3   6   1
  t9  2   2   3   7

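The first step of an update, deciding whether the new object joins the full-space skyline, only needs a comparison against sky(D). The sketch below shows that step for the two example objects t9 and t10. It assumes the object being replaced was not itself in sky(D), uses hypothetical helper names, and does not attempt the recursion-based computation of mss(tnew) that the slides mention.

```python
# Full-space skyline sky(D) of the example dataset.
SKY_D = {
    "t1": (3, 4, 2, 5), "t5": (2, 2, 3, 1),
    "t6": (6, 1, 1, 3), "t7": (1, 3, 4, 1),
}

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def dominators(t_new, sky):
    """Full-space skyline objects that dominate t_new (empty if t_new joins sky(D))."""
    return sorted(name for name, p in sky.items() if dominates(p, t_new))

def dominated_by(t_new, sky):
    """Full-space skyline objects that t_new pushes out of sky(D)."""
    return sorted(name for name, p in sky.items() if dominates(t_new, p))

t9 = (2, 2, 3, 7)
t10 = (1, 3, 1, 3)
print(dominators(t9, SKY_D))     # ['t5']  -> t9 stays outside sky(D)
print(dominators(t10, SKY_D))    # []      -> t10 enters sky(D)
print(dominated_by(t10, SKY_D))  # ['t1']  -> t1 is no longer a full-space skyline object
```

Because sky(D) is typically far smaller than the dataset, this check stays in memory, which is consistent with the claim that such updates need no dataset access.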
Storage Comparison
• Settings:
  • Dimensionality (full-space) – [4, 8]; default = 6.
  • Cardinality – [100K, 500K]; default = 300K.
  • Distribution: Independent, Correlated, Anti-correlated.
• Note: the charts use a logarithmic scale.

Query Performance
• Queries on the complete skycube do not involve computations, so their time is not reported.
• This set of experiments verifies that the query response of the CSC is indeed very fast.

Update Performance
• General update:
  • Updates are from random objects in the whole dataset.
  • Skycube is re-computed from scratch.
• Full-space skyline update:
  • Updates are from random full-space skyline objects.
  • Skycube is re-computed from existing skylines plus new candidates.

Conclusions
• We addressed update support for the skycube in dynamic environments, and provided an efficient and scalable solution for online skyline query systems.
• We proposed a compact structure, the compressed skycube, which uses about 10% of the disk space of the complete skycube and offers fast query response.
• We proposed an object-aware update scheme, such that different updates trigger different amounts of computation. Our CSC outperforms the Skycube in update by several orders of magnitude.

Thank you!
Tian Xia and Donghui Zhang. Refreshing the Sky: the Compressed Skycube with Efficient Support for Frequent Updates. SIGMOD 2006.
Questions?