Parallel Skyline Queries

Parallel Skyline Queries Foto Afrati, Paraschos Koutris, Dan Suciu, Jeffrey Ullman University of Washington

What is the Skyline? • A d-dimensional set R • A point x dominates x’ if for all k: x(k) ≤ x’(k) • The skyline of R is the set of all non-dominated points of R [Figure labels: skyline, domination]
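
The definition above can be checked with a quadratic-time reference implementation. This is a minimal sketch, not the parallel algorithm from the talk. The function names are illustrative, and the extra condition that a point does not dominate itself is an assumption added so that the skyline is non-empty.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(x: Point, y: Point) -> bool:
    """x dominates y: x(k) <= y(k) in every dimension k.
    Assumption: a point never dominates itself (x != y)."""
    return x != y and all(a <= b for a, b in zip(x, y))

def skyline(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (input order is preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

R = [(1, 4), (2, 2), (3, 3), (4, 1), (2, 5)]
print(skyline(R))  # [(1, 4), (2, 2), (4, 1)]
```

Here (3, 3) is rejected because (2, 2) dominates it, and (2, 5) is rejected because (1, 4) dominates it.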

Contributions • We design algorithms for skyline queries based on two parallel models: • MP: perfect load balancing [Koutris, Suciu ’11] • GMP: weaker load balancing [Afrati, Ullman ’10] • We present 3 algorithms with theoretical guarantees for: • number of synchronization steps • load balance

Previous Approaches • Several efficient algorithms for skyline queries exist in the literature • Parallel algorithms use various partitionings: • Grid-based partitioning [WZFZAA ’06] • Random partitioning [CRZ ’07] • Angle-based space partitioning [VDK ’08] • Hyperplane projections [KYZ ’11] • Previous approaches typically require a logarithmic number of communication steps; our algorithms need only 1 or 2 steps

Massively Parallel Models • P servers: R is partitioned into R1, R2, …, RP • n = |R| • The algorithm alternates between communication and computation steps • MP model: each node holds O(n/P) data • GMP model: each node holds O(P^ε · n/P) data, where 0 ≤ ε < 1 • ε = 0: GMP = MP • ε = 1 (limiting case): sequential computation on one node

An Example • How do we compute set intersection in one step in the MP model? • Hash each value x (from R or S) to a server • Intersection: Q(x) :- R(x), S(x) • Communication phase: send tuple R(x) to server h(x); send tuple S(x) to server h(x) • Computation phase: output a tuple only if it occurs twice
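
The one-step intersection can be simulated on a single machine. This is a minimal sketch under some assumptions: the P servers are modelled as in-memory dictionaries, Python's built-in `hash` stands in for h, and the function name is illustrative. Instead of counting occurrences, each server records which relation a value arrived from, which is equivalent when R and S are sets.

```python
from collections import defaultdict

def parallel_intersection(R, S, P):
    """Simulate Q(x) :- R(x), S(x) on P servers in one communication step."""
    # Communication phase: route every tuple to server h(x).
    servers = [defaultdict(set) for _ in range(P)]
    for name, relation in (("R", R), ("S", S)):
        for x in set(relation):
            servers[hash(x) % P][x].add(name)
    # Computation phase: a server outputs x only if it received both R(x) and S(x).
    out = set()
    for received in servers:
        out |= {x for x, names in received.items() if names == {"R", "S"}}
    return out

print(parallel_intersection([1, 2, 3, 4], [3, 4, 5], P=3))  # {3, 4}
```

Because both copies of a value are hashed to the same server, no server needs to see any other server's data to decide membership.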

The Broadcast Step • In addition to regular communication steps, we allow broadcast steps: • the amount of data exchanged is independent of n • Known results: • Q(x,y) :- R(x), S(x,y) can be computed in 1 MP step iff a broadcast step is allowed [Koutris, Suciu ’11] • Q(x,y) :- R(x), S(x,y), T(y) cannot be computed in 1 MP step [Koutris, Suciu ’11], but can be computed in 1 GMP step with ε = 1/2 [Afrati, Ullman ’10]

Outline of our Approach • Broadcast: • Grid-based partitioning into cells • Pre-processing the cells to compute the relaxed skyline • Communication: • Careful distribution of the cells (with their data) to the servers • Computation: • Local computation of the skyline at each server

Bucketizing • Partition R into M buckets along some dimension, such that each bucket contains O(n/M) points • Equivalently, compute (M+1) partition points: -∞ = b0, b1, …, bM = +∞ • M = P or M = P^(1/(d-1)) • Algorithm: • Local: each server evenly partitions its data into M buckets • Broadcast: the servers exchange M×P partition points • Local: each server picks every P-th value as a partition point [Figure labels: bucketize across dimension 1, bucketize across dimension 2]
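
The three bucketizing steps can be sketched as follows for a single dimension. This is a single-machine simulation with illustrative names. It assumes each server holds at least M points, and it assumes each server reports the upper end of each of its M local chunks, a detail the slide does not specify.

```python
import bisect
import random

def local_partition_points(values, M):
    """Local step: M values that cut the sorted local data into even chunks."""
    s = sorted(values)
    return [s[(i + 1) * len(s) // M - 1] for i in range(M)]

def global_partition_points(servers, dim, M):
    """Broadcast step followed by the second local step, for one dimension."""
    P = len(servers)
    # Broadcast: M * P values in total, independent of n.
    exchanged = sorted(
        b for data in servers
        for b in local_partition_points([p[dim] for p in data], M)
    )
    # Every P-th exchanged value becomes one of b1 .. b(M-1).
    inner = [exchanged[j * P - 1] for j in range(1, M)]
    return [float("-inf")] + inner + [float("inf")]

def bucket_of(value, b):
    """Index i with b[i] < value <= b[i+1]."""
    return bisect.bisect_left(b, value) - 1

random.seed(0)
P, M, n = 4, 4, 4000
servers = [[(random.random(), random.random()) for _ in range(n // P)]
           for _ in range(P)]
b = global_partition_points(servers, dim=0, M=M)
sizes = [0] * M
for data in servers:
    for p in data:
        sizes[bucket_of(p[0], b)] += 1
print(sizes)
```

With equal-sized local chunks and distinct values, a counting argument gives at most 2n/M points per bucket: each bucket contains P of the exchanged values, and every server contributes at most one more local chunk than the number of its values that fall in the bucket.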

Cells • A cell is an intersection of buckets from all dimensions • Every point belongs to exactly one cell • Every cell holds O(n/P) data (and not O(n/P^d)!) • In each cell, we can keep only the candidates for skyline points [Figure labels: candidate, rejected]

Cells • We are interested in the non-empty cells • Any cell that is strictly dominated by another non-empty cell does not contribute to the skyline: none of its points belong to the final skyline [Figure labels: strict domination, domination]

Relaxed Skyline of Cells • The relaxed skyline consists of the non-empty cells that are not strictly dominated by any non-empty cell • We focus on the relaxed skyline of non-empty cells [Figure labels: relaxed skyline, skyline]
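
The relaxed skyline can be computed directly from the list of non-empty cells. This is a minimal sketch with illustrative names. It assumes a cell is identified by its tuple of bucket indices, one per dimension, and that cell a strictly dominates cell b when a has a smaller bucket index in every dimension.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, ...]  # one bucket index per dimension

def strictly_dominates(a: Cell, b: Cell) -> bool:
    """Assumed definition: a is in a strictly lower bucket in every dimension."""
    return all(i < j for i, j in zip(a, b))

def relaxed_skyline(nonempty: Iterable[Cell]) -> Set[Cell]:
    """Non-empty cells that no non-empty cell strictly dominates."""
    cells = set(nonempty)
    return {c for c in cells if not any(strictly_dominates(o, c) for o in cells)}

cells = [(0, 2), (1, 1), (1, 2), (2, 0), (2, 2)]
print(sorted(relaxed_skyline(cells)))  # [(0, 2), (1, 1), (1, 2), (2, 0)]
```

Cell (2, 2) is dropped because (1, 1) strictly dominates it. Cell (1, 2) stays even though (1, 1) and (0, 2) each share a bucket with it, since sharing a bucket rules out strict domination.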

On Relaxed Skylines • To compute the skyline points of a cell B, we need to compare with cells that: • belong to the relaxed skyline • weakly dominate B (have a coordinate in common with B) [Figure label: cell B]

A Naïve Approach • Try the following: • Partition into P buckets (M = P) • Allocate to servers the cells in the relaxed skyline, plus the cells that weakly dominate them: O(n/P) data per cell • Locally compute the skyline points • This works if the relaxed skyline is small • But the relaxed skyline can have as many as Ω(P^(d-1)) cells for dimension d

A 1-step Algorithm • Choose a coarser bucketization (fewer than P buckets) • This gives a weakly load-balanced algorithm with maximum load O((n/P) · P^((d-2)/(d-1))) • ε = (d-2)/(d-1) (ε = 0 implies GMP = MP) • Corollary: for d = 2 dimensions, we obtain a perfectly load-balanced algorithm for MP
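
One way to read the load bound is to assume the coarser bucketization uses M = P^(1/(d-1)) buckets, the second choice of M listed on the Bucketizing slide. Under that assumption the stated maximum load equals the size of a single bucket, n/M:

```latex
\[
\frac{n}{M}
  = \frac{n}{P^{1/(d-1)}}
  = \frac{n}{P}\cdot P^{\,1-\frac{1}{d-1}}
  = \frac{n}{P}\cdot P^{\frac{d-2}{d-1}},
\qquad\text{so}\qquad
\varepsilon = \frac{d-2}{d-1}.
\]
\[
d = 2:\ \varepsilon = 0,\ \text{load } O\!\left(\frac{n}{P}\right);
\qquad
d = 3:\ \varepsilon = \tfrac{1}{2},\ \text{load } O\!\left(\frac{n}{\sqrt{P}}\right).
\]
```

The d = 2 case recovers the corollary: the load is O(n/P), which is the MP guarantee.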

A 2-step Algorithm • Step 1: group the cells in the relaxed skyline by bucket, for every dimension [Figure labels: Server 1, Server 2, …]

A 2-step Algorithm • For each bucket, compute the local skyline • A point is a skyline point iff it is a local skyline point in each of the d buckets that contain it • Step 2: intersect the local skylines [Figure: a point that is in the skyline of its y-bucket but not of its x-bucket; labels: x-bucket, y-bucket]
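
The two steps can be simulated on one machine to check the "iff" claim against a brute-force skyline. This is a minimal sketch with illustrative names, not the distributed implementation. Each (dimension, bucket) group stands in for a server, and the partition points are computed centrally for brevity instead of through the broadcast procedure. The points are assumed to be distinct, with at least M of them.

```python
import bisect
import random
from collections import defaultdict

def dominates(x, y):
    return x != y and all(a <= b for a, b in zip(x, y))

def skyline(points):
    return {p for p in points if not any(dominates(q, p) for q in points)}

def boundaries(points, dim, M):
    """M - 1 interior partition points that split dimension `dim` evenly."""
    s = sorted(p[dim] for p in points)
    return [s[i * len(s) // M - 1] for i in range(1, M)]

def two_step_skyline(points, M):
    d = len(points[0])
    cuts = [boundaries(points, k, M) for k in range(d)]
    cells = defaultdict(list)
    for p in points:
        cell = tuple(bisect.bisect_left(cuts[k], p[k]) for k in range(d))
        cells[cell].append(p)
    # Pre-processing: keep only the cells in the relaxed skyline.
    relaxed = {c for c in cells
               if not any(all(i < j for i, j in zip(o, c)) for o in cells)}
    # Step 1: one group per (dimension, bucket); compute its local skyline.
    groups = defaultdict(list)
    for c in relaxed:
        for k in range(d):
            groups[(k, c[k])].extend(cells[c])
    local = {key: skyline(pts) for key, pts in groups.items()}
    # Step 2: keep a point iff it is in the local skyline of all d of its buckets.
    return {p for c in relaxed for p in cells[c]
            if all(p in local[(k, c[k])] for k in range(d))}

random.seed(1)
pts = [tuple(random.random() for _ in range(3)) for _ in range(300)]
print(two_step_skyline(pts, M=4) == skyline(pts))
```

The restriction to relaxed-skyline cells matters here: a point dominated only by points in strictly lower buckets in every dimension would otherwise survive all of its local skylines.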

A 1-Step Algorithm for 3D • Key idea: to reject this point, we only need the minimum x-coordinate from cell B [Figure label: cell B]

A 1-Step Algorithm for 3D • The observation reduces the number of points that need to be communicated • With smart partitioning, we can achieve perfect load balance in 1 step • However, the property holds only for 2 and 3 dimensions

Conclusion • 3 algorithms for skyline queries: • 2 steps + perfect load balance • 1 step + some replication • 1 step + perfect load balance for d < 4 • Open Questions: • Can we compute the skyline in 1 step with perfect load balance for more than 3 dimensions? • A more general question: what classes of queries can we compute in the MP model with perfect or weaker load balance guarantees?

Thank you!

Interior Cells • Two cells are co-linear if they share exactly two coordinates • A cell i is interior if every co-linear cell in Sr(J) belongs to the same hyperplane as i; otherwise, it is a corner cell • Interior cells are easy to handle: we can send the whole plane to a single processor

Corner Cells • We group the corner cells into lines • Border cells are the minimal/maximal cells of each line • Fact: lines meet only on border cells • Grouping: each line is a group; a cell is assigned to the lexicographically first line it belongs to

Assigning the groups • We have two ways to assign groups to servers • The first is deterministic and greedily assigns a group to any server that is not overloaded (M=P) • The second is randomized and sends each group randomly to some server (M = P log P)

About the MP model • [Koutris, Suciu ’11] A dichotomy result on conjunctive queries that can be computed in 1 step with perfect load balancing • Easy queries: • Q(x,y,z) :- R(x,y), S(y,z) • Q(x,y,z) :- R(x), S(x,y), T(x,y,z) • Hard queries: • Q(x,y) :- R(x), S(x,y), T(y) • Q(x,y) :- R(x), S(x), T(y)
