Parallel Distributed Processing of Constrained Skyline Queries by Filtering. Bin Cui, Hua Lu, Quanqing Xu, Lijiang Chen, Yafei Dai, Yongluan Zhou. ICDE 08.
Outline • Introduction • Problem Definition • Parallel Distributed Skyline Processing • Experimental Study • Conclusion
Introduction • A distributed computing environment consists of multiple computers (sites). The query-originating site S_org communicates directly with every other site, and all sites can compute at the same time (in parallel). • For instance, stock information databases may be available at different places, such as the New York Stock Exchange, the London Stock Exchange, and the Tokyo Stock Exchange. For each stock, an agent must consider multiple attributes. A skyline query over these distributed databases therefore helps the agent find the interesting stocks, namely those not dominated by any other stock.
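Problem Definition • The query-originating site S_org communicates directly with every other site S_i. • Example (smaller values are better): D = {p(2,6), q(2,4), r(3,3)}. q dominates p because it is no worse in both attributes and strictly better in the second; q and r are not dominated by any point. Skyline of D: {q, r}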
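A minimal Python sketch of the dominance test and a naive skyline, assuming (as in the example) that smaller values are better in every attribute. The names `dominates` and `skyline` are illustrative, not taken from the paper, and the quadratic scan is only meant for small inputs.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q, assuming smaller values are better:
    p is no worse than q in every dimension and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates.
    A point never dominates itself, so no self-exclusion is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

D = {"p": (2, 6), "q": (2, 4), "r": (3, 3)}
print(skyline(list(D.values())))  # [(2, 4), (3, 3)], i.e. {q, r}
```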
Parallel Distributed Skyline Processing • Computing local skylines and rMBBs in parallel • Parallel distributed query execution • Merge
(Cont.) • Computing local skylines and rMBBs in parallel • Green block: the MBR of all points at the site • Local skyline: {(1,4), (3,3), (5,2)}
(Cont.) • Blue block: the local skyline and its rMBB (reduced MBB). The rMBB encloses only the local skyline points {(1,4), (3,3), (5,2)}.
(Cont.) • Parallel Distributed Query Execution • Each site has an rMBB and a local skyline set. The rMBB is represented by two points: its lower-left corner rMBB.min and its upper-right corner rMBB.max.
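A sketch of how a site could derive the two rMBB corners from its local skyline, assuming the rMBB is simply the smallest axis-aligned box around the local skyline points. The helper `rmbb` is hypothetical. Note that rMBB.min need not be a data point: for the local skyline {(1,4), (3,3), (5,2)} it is (1,2).

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def rmbb(local_skyline: List[Point]) -> Tuple[Point, Point]:
    """Return (rMBB.min, rMBB.max): per-dimension minima and maxima of the local skyline."""
    if not local_skyline:
        raise ValueError("an empty local skyline has no rMBB")
    columns = list(zip(*local_skyline))  # one tuple of values per dimension
    return tuple(map(min, columns)), tuple(map(max, columns))

print(rmbb([(1, 4), (3, 3), (5, 2)]))  # ((1, 2), (5, 4))
```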
(Cont.) [Figure: rMBB 1 and rMBB 2, their lower-left corners rMBB 1.min and rMBB 2.min, and the corresponding dominating regions rMBB 1.min.DR and rMBB 2.min.DR]
(Cont.) • Execution plan: partition the sites into groups such that sites in different groups are incomparable • Partitioned into: {{A}, {B,C,D,E}, {F,G}}
(Cont.) • Although B and D are incomparable with each other, they are assigned to the same group as C and E, because each of them is comparable with C (and E).
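The grouping behaves like connected components of a "comparable" relation, which is why B and D end up together through C. The sketch below assumes two sites are comparable when the dominating region of one rMBB's lower-left corner (rMBB.min.DR) reaches the other rMBB, i.e. the other box's upper-right corner is at least rMBB.min in every dimension. This test, the function names, and the example boxes are assumptions for illustration, not taken from the paper.

```python
from collections import deque
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (rMBB.min, rMBB.max)

def reaches(a: Box, b: Box) -> bool:
    """ASSUMED test: a.min's dominating region reaches box b, i.e. b.max >= a.min
    in every dimension (smaller values are better)."""
    return all(b_hi >= a_lo for b_hi, a_lo in zip(b[1], a[0]))

def comparable(a: Box, b: Box) -> bool:
    return reaches(a, b) or reaches(b, a)

def partition(sites: Dict[str, Box]) -> List[List[str]]:
    """Connected components of the 'comparable' relation, found by breadth-first search."""
    seen, groups = set(), []
    for start in sites:
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:
            cur = queue.popleft()
            group.append(cur)
            for other in sites:
                if other not in seen and comparable(sites[cur], sites[other]):
                    seen.add(other)
                    queue.append(other)
        groups.append(sorted(group))
    return groups

# Hypothetical rMBBs: B and D are incomparable, but both are comparable with C.
SITES = {
    "A": ((0, 20), (1, 21)),
    "B": ((2, 12), (3, 13)),
    "C": ((4, 4), (14, 14)),
    "D": ((12, 2), (13, 3)),
}
print(partition(SITES))  # [['A'], ['B', 'C', 'D']]
```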
(Cont.) • Picking filtering points: 1. MaxDist: maximize the distance between filtering points, so that their dominating regions overlap little. 2. MaxSum: maximize the total dominating region of the filtering points, so that each filtering point's dominating region is large. 3. Random: pick filtering points at random.
(Cont.) • Assume 2 filtering points. • Max distance (MaxDist): choose (1,5) and (6,2)
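A brute-force sketch of MaxDist, assuming Euclidean distance and, for k > 2, that "far apart" means maximizing the smallest pairwise distance among the chosen points (our assumption; the slides only show k = 2). With the four candidates from these slides and k = 2 it returns (1,5) and (6,2). Exhaustive search over combinations is only reasonable for small candidate sets.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def spread(points: Sequence[Point]) -> float:
    """Smallest pairwise Euclidean distance within a candidate set (0 for one point)."""
    return min((dist(a, b) for a, b in combinations(points, 2)), default=0.0)

def max_dist_pick(candidates: List[Point], k: int) -> Tuple[Point, ...]:
    """Exhaustive MaxDist sketch: the k-subset whose closest pair is farthest apart."""
    return max(combinations(candidates, k), key=spread)

print(max_dist_pick([(1, 5), (2, 4), (4, 3), (6, 2)], 2))  # ((1, 5), (6, 2))
```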
(Cont.) • Assume 2 filtering points. • Max Dominating Region (MaxSum): choose (2,4) and (4,3). Dominating region of each candidate pair:
(1,5) (2,4): 4
(1,5) (4,3): 4
(1,5) (6,2): 0
(2,4) (4,3): 6 (max)
(2,4) (6,2): 4
(4,3) (6,2): 4
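The scores in this table can be reproduced if each candidate's dominating region is clipped to the box whose upper-right corner is (6,5), the rMBB.max of the four candidates, and overlapping area is counted once. For example, (2,4) and (4,3) each cover an area of 4 and their overlap (from (4,4) up to (6,5)) covers 2, so the pair scores 4 + 4 - 2 = 6. The sketch below encodes that reading; the clipping rule is our inference from the numbers, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[int, int]

def clipped_area(p: Point, upper: Point) -> int:
    """Area of p's dominating region inside the box ending at `upper` (smaller is better)."""
    return max(0, upper[0] - p[0]) * max(0, upper[1] - p[1])

def union_area(points: Sequence[Point], upper: Point) -> int:
    """Area covered by the union of the points' clipped dominating regions.
    Inclusion-exclusion works because the intersection of dominating regions is the
    dominating region of the component-wise maximum of the points."""
    total = 0
    for r in range(1, len(points) + 1):
        for subset in combinations(points, r):
            corner = (max(x for x, _ in subset), max(y for _, y in subset))
            total += (-1) ** (r + 1) * clipped_area(corner, upper)
    return total

def max_sum_pick(candidates: List[Point], k: int) -> Tuple[Point, ...]:
    """Exhaustive MaxSum sketch: the k-subset covering the largest clipped area."""
    upper = (max(x for x, _ in candidates), max(y for _, y in candidates))  # rMBB.max
    return max(combinations(candidates, k), key=lambda s: union_area(s, upper))

cands = [(1, 5), (2, 4), (4, 3), (6, 2)]
for pair in combinations(cands, 2):
    print(pair, union_area(pair, (6, 5)))
print(max_sum_pick(cands, 2))  # ((2, 4), (4, 3))
```

The same scoring with a single filtering point also matches the later example for site A: for the local skyline {(1,5), (2,4), (4,3)} (rMBB.max = (4,5)) the scores are 0, 2 and 0, so (2,4) is picked.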
(Cont.) • Computing local skylines and rMBBs in parallel.
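A sketch of this first step for the sites A to D of this example. Only the points that appear on the following slides are known, so the site contents below are a hypothetical reconstruction. A thread pool stands in for independent sites computing at the same time; CPython threads do not speed up this CPU-bound work, they only model the concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Box = Tuple[Point, Point]

# Hypothetical site contents: only points that appear on the following slides.
SITES: Dict[str, List[Point]] = {
    "A": [(1, 5), (2, 4), (4, 3)],
    "B": [(2, 7), (5, 4)],
    "C": [(6, 2), (10, 1)],
    "D": [(8, 2), (10, 0)],
}

def dominates(p: Point, q: Point) -> bool:
    """p dominates q (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def local_step(points: List[Point]) -> Tuple[List[Point], Box]:
    """One site's work: local skyline plus its rMBB (min corner, max corner)."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    return sky, (tuple(map(min, zip(*sky))), tuple(map(max, zip(*sky))))

def run_sites(sites: Dict[str, List[Point]]) -> Dict[str, Tuple[List[Point], Box]]:
    """Run every site's local step concurrently and collect the results by site name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(local_step, pts) for name, pts in sites.items()}
        return {name: fut.result() for name, fut in futures.items()}

for name, (sky, box) in run_sites(SITES).items():
    print(name, sky, box)
```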
(Cont.) • Partitioned into {{A,B},{C,D}}
(Cont.) • Assume 1 filtering point. Site A picks (2,4), the point with the largest dominating region ((1,5): 0, (2,4): 2, (4,3): 0). • (2,4) is compared with B's points and dominates (2,7) and (5,4). • Skyline of partition {A,B}: {(1,5), (2,4), (4,3)}
(Cont.) • Assume 1 filtering point. Site C picks (6,2). • (6,2) is compared with D's points and dominates (8,2). • C's (10,1) is compared with D's (10,0) and is dominated by it. • Skyline of partition {C,D}: {(6,2), (10,0)}
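Both partitions can be sketched with one function, assuming that the first site of each group picks a single filtering point with the largest dominating region clipped to its rMBB (ties go to the first point), the other sites discard local skyline points dominated by it, and the survivors are merged with a final dominance check. The protocol details and names here are assumptions; the slides only show the outcomes, which the sketch reproduces.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def dominates(p: Point, q: Point) -> bool:
    """p dominates q (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pick_filter_point(sky: List[Point]) -> Point:
    """Single filtering point: largest dominating region clipped to the rMBB; ties go to the first."""
    ux, uy = max(x for x, _ in sky), max(y for _, y in sky)  # rMBB.max
    return max(sky, key=lambda p: max(0, ux - p[0]) * max(0, uy - p[1]))

def process_partition(site_skylines: List[List[Point]]) -> List[Point]:
    """ASSUMED protocol: the first site sends its filtering point to the others,
    they drop dominated local skyline points, and the survivors are merged."""
    first, *others = site_skylines
    f = pick_filter_point(first)
    survivors = [p for sky in others for p in sky if not dominates(f, p)]
    candidates = first + survivors
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]

A, B = [(1, 5), (2, 4), (4, 3)], [(2, 7), (5, 4)]
C, D = [(6, 2), (10, 1)], [(8, 2), (10, 0)]
print(process_partition([A, B]))  # [(1, 5), (2, 4), (4, 3)]
print(process_partition([C, D]))  # [(6, 2), (10, 0)]
```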
(Cont.) • Merge the skylines of {A,B} and {C,D}: {(1,5), (2,4), (4,3), (6,2), (10,0)}
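Since sites in different groups are incomparable, the final merge can essentially concatenate the partition skylines. The sketch below still re-checks dominance, which is cheap at this size and guards against a bad partition; the function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def dominates(p: Point, q: Point) -> bool:
    """p dominates q (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def merge(partition_skylines: List[List[Point]]) -> List[Point]:
    """Concatenate the partition skylines; the dominance re-check should remove nothing
    when the groups are truly incomparable."""
    merged = [p for sky in partition_skylines for p in sky]
    return sorted(p for p in merged if not any(dominates(q, p) for q in merged))

print(merge([[(1, 5), (2, 4), (4, 3)], [(6, 2), (10, 0)]]))
# [(1, 5), (2, 4), (4, 3), (6, 2), (10, 0)]
```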
Experimental Study • Independent Datasets
(Cont.) • Anti-correlated Datasets
(Cont.) • NBA Dataset
(Cont.) • Performance with Different Numbers of Filtering Points
Conclusion • Setting the percentage of filtering points to 10% gives better performance. • MaxSum performs better than MaxDist and Random.