Adaptive Ordering of Pipelined Stream Filters

Adaptive Ordering of Pipelined Stream Filters Babu, Motwani, Munagala, Nishizawa, and Widom SIGMOD 2004 Jun 13-18, 2004 presented by Joshua Lee Mingzhu Wei Some slides are adapted from the presentation slides in sigmod2004

Presentation Overview • Motivation • A-GREEDY Algorithm • INDEPENDENT Algorithm • SWEEP Algorithm • LOCALSWAPS Algorithm • Adaptive Ordering of MultiwayJoins Over Streams • Conclusion

Motivation • Many queries on stream data involve processing commutative sets of filters • Each filter is a stateless predicate that indicates whether a tuple should be further processed • A tuple is output only if it satisfies all filters • Applying filters in different orders can affect the processing time of the query • The changing nature of stream data requires adaptive algorithms for filter ordering

Motivation: Example • Consider the following example usage of a stream processing system: • A network traffic monitoring application built on a stream engine is used to monitor the amount of common traffic flowing through four routers, R1, R2, R3, and R4 over the last ten minutes.

Motivation: Filter Ordering • If the filters are dependent, then the order of probing affects processing time • For instance, it could be known that most traffic through R1 comes from R3 or R4, but rarely from R2 • For incoming data on R1, probing W2 first would drop a tuple quickest, thus saving the time of probing W3 and W4

Motivation: Filter Ordering Challenges • Filter selectivities may be correlated • The candidate filter orderings to consider for N filters is N!, which increases quickly • Attributes of stream data and arrival characteristics can change over time

Motivation: Adaptive Filter Ordering Challenges • Run-time overhead incurred for continuous monitoring of statistics • Convergence properties required to prove an adaptive algorithm generates an approximation to correct answer • Speed of adaptivity in the face of changing data characteristics • There is a three-way tradeoff: Algorithms that adapt quickly and have good convergence properties incur a lot of run-time overhead

A-GREEDY Algorithm • The cost of a query plan ordering is • ti is the cost of processing one tuple in filter i • d(j|j-1) is the conditional probability that filter j will drop a tuple, given that no filter from 1 to j-1 dropped it

A-GREEDY Algorithm: Static Version and Greedy Invariant • Static greedy algorithm • Choose the filter with highest unconditional drop probability d(i|0) as the first filter • Choose the filter with the highest conditional drop probability d(j|1) as the next filter • Choose the filter with highest conditional drop probability d(k|2) as the next filter • And so on • Greedy Invariant occurs when a filter ordering satisfies the following

A-GREEDY Algorithm: Profiler • Two logical components of A-Greedy: • profiler: continuously collects and maintains statistics about filter selectivities and processing costs • Reoptimizer: detects and corrects violations of the GI in the current filter ordering • Challenge faced by the profiler: n2n-1conditional selectivities for n filters • Solution: profile of recently dropped tuples

A-GREEDY Algorithm: Profiler • Profile: a sliding window of profile tuples • A profile tuple contains n boolean attributes b1,…,bn corresponding to n filters • Dropped tuples are sampled with some probability p • If a tuple e is chosen for profiling, it will be tested by all remaining filters • A new profile tuple e: bi = 1 if Fi drop e, otherwise bi=0

A-GREEDY Algorithm: Reoptimizer • The reoptimizer maintains an ordering O such that O satisfies the GI • How does the reoptimizer use profile to derive estimates of conditional selectivities? • It incrementally maintains a view over the profile window

A-GREEDY Algorithm: Reoptimizer • Matrix: View over the profile window: n×n • V[i, j]: number of tuples in the profile window that were dropped by Ff(j) but not dropped by Ff (1)Ff (2),…, F f (i-1) V[i, j] is proportional to d (j|i-1)

A-GREEDY Algorithm: Run-time Overhead and Adaptivity • Profile-tuple creation: needs additional n-I evaluations for a tuple dropped by Ff(i) . • Profile-window maintenance: insertion and deletion of profile tuples, averages of filter processing times • Matrix-view update: every update would cause access to up to n2/4 entries • Detection and correction of GI violations • Good convergence properties • Fast adaptivity

Variants of A Greedy • Can we sacrifice some of A-Greedy’s convergence properties or adaptivity speed to reduce its run-time overhead? • Three variants of A greedy are proposed

SWEEP Algorithm • Proceeds in stages • During one stage, only checks for GI violations involving the filter at one specific position j • Do not need to maintain entire matrix view: bf (1) ,…,b f (j) • For each profiled tuple, needs to additionally evaluate F f (j) only • By rotating j over 2,…,n, eventually detects and corrects all GI violations • Same convergence properties as A-Greedy • Reduced view and need for additional evaluations less overhead • slower adaptivity: only 1 filter is profiled in each stage

INDEPENDENT Algorithm • Assumes filters are independent we only need to maintain estimates of unconditional selectivities • Lower view maintenance overhead • Fast adaptivity • Convergence: if assumption holds, optimal. Otherwise, can be O(n) times worse than GI orderings

LOCALSWAP Algorithm • Monitors “local” violations only, i.e., violations involving adjacent filters • LocalSwaps detects situations where a swap between adjacent filters • Only need to maintain two diagonals of the view • For each profiled tuple dropped by Ff(i), only needs to additionally evaluate F f(i+1)

Adaptive Ordering of MultiwayJoins Over Streams • MJoins maintain an ordering of {S0, S1,…, Sn-1}-{Si} for each stream Si • New tuples arriving from Si is joined with other stream windows in that order • Two-phase join algorithm • Drop-probing phase: the new tuple is used to probe all other windows in the specified order. If any window drops it, no further processing will be needed for it • output-generation phase: if no window drops the tuple, proceeds as conventional MJoins • Drop probing resembles pipelined filters • A-Greedy and its variants can be used to determine the orderings

Conclusion • A-Greedy has good convergence properties and fast adaptivity, but incurs significant run-time overhead • Three variants of A-Greedy are proposed, each lying at a different points along the tradeoff spectrum among convergence, runtime overhead, and speed of adaptivity

Adaptive Ordering of Pipelined Stream Filters