State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries. Samrat Ganguly Sudeept Bhatnagar NEC Laboratories America Inc. Princeton, NJ, USA. Song Wang Elke Rundensteiner Database Systems Research Group Worcester Polytechnic Institute Worcester, MA, USA.
Computation Sharing for Stream Processing
[Figure: SPJA query network. Continuous queries are registered; streaming data flows in and streaming results flow out; operators shown are σ, П, Agg, and windows w1, w2, w3.]
• New Challenges:
  • In-memory processing of stateful operators
  • Stateful operators with various window constraints
32nd VLDB Conference, Seoul, Korea, 2006

Window Constraints for Stateful Operators
[Figure: window join of streams A and B, with input buffers (Buffer A, Buffer B) and join states A[w] and B[w].]
• Time-based sliding window constraints
  • Each tuple has a timestamp
  • Only tuples whose timestamps lie within a time frame of W of each other can form an output
• Observations:
  • States in the operator dominate memory usage
  • State size is proportional to the input rate and the window length
  • Join CPU cost is proportional to the state size

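As a rough illustration of the window constraint above, here is a minimal sketch of a symmetric time-based sliding-window equi-join. The class name `SlidingWindowJoin`, the `(timestamp, key)` tuple layout, and the assumption that tuples arrive in non-decreasing timestamp order are choices made for this example and are not taken from the talk.

```python
from collections import deque


class SlidingWindowJoin:
    """Symmetric time-based sliding-window equi-join of streams A and B.

    Hypothetical sketch: tuples are (timestamp, key) pairs and are assumed
    to arrive in non-decreasing timestamp order.
    """

    def __init__(self, window):
        self.window = window
        # One state per input stream, kept in arrival (timestamp) order.
        self.state = {"A": deque(), "B": deque()}

    def process(self, stream, ts, key):
        other = "B" if stream == "A" else "A"
        opposite = self.state[other]
        # Purge: drop opposite-state tuples that are W or more older.
        while opposite and ts - opposite[0][0] >= self.window:
            opposite.popleft()
        # Probe: pair the new tuple with every stored tuple of equal key.
        # Results are always ordered as (A tuple, B tuple).
        new = (ts, key)
        results = [
            (new, old) if stream == "A" else (old, new)
            for old in opposite
            if old[1] == key
        ]
        # Insert: keep the new tuple for future probes from the other stream.
        self.state[stream].append(new)
        return results
```

Every probe scans the opposite state, so in this sketch both the memory footprint and the per-tuple CPU cost grow with the number of tuples held in the window, in line with the observations on the slide.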
Motivation Example
Q1: SELECT A.* FROM Temperature A, Humidity B WHERE A.LocationId = B.LocationId WINDOW w1 min
Q2: SELECT A.* FROM Temperature A, Humidity B WHERE A.LocationId = B.LocationId AND A.Value > Threshold WINDOW w2 min
Let: w1 < w2
[Figure: two separate plans. Q1 joins states A[w1] and B[w1]; Q2 applies σA to stream A and joins states A[w2] and B[w2].]
• Observations:
  • State A[w1] overlaps with state A[w2]
  • State B[w1] overlaps with state B[w2]
  • Joined results of Q1 and Q2 overlap

Sharing with Selection Pull-up [CDF02, HFA+03]
[Figure: a single shared join over states A[w2] and B[w2]; a router R forwards results with |Ta-Tb| < w1 to Q1 and all results, through σA, to Q2.]
• Selection pull-up
• Using the larger window (w2)
• [CDF02]: J. Chen, D. J. DeWitt, and J. F. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE’02.
• [HFA+03]: M. A. Hammad, M. J. Franklin, W. G. Aref, and A. K. Elmagarmid. Scheduling for shared window joins over data streams. In VLDB’03.

Sharing with Selection Pull-up [CDF02, HFA+03]
• Pros
  • Single join operator
• Cons
  • Wasted computation without early filtering
  • Wasted state memory without early filtering
  • Per-output-tuple routing cost

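The per-output-tuple routing cost listed above can be pictured with a small sketch of the router that sits on top of the shared w2 join. The function name `route_shared_results` and the dictionary fields `ts` and `value` are hypothetical, chosen only for this example.

```python
def route_shared_results(joined, w1, threshold):
    """Split the output of the shared w2 join between Q1 and Q2.

    `joined` is a list of (a, b) pairs produced by a join that already
    enforced the larger window w2. Field names are illustrative only.
    """
    q1_out, q2_out = [], []
    for a, b in joined:
        # Q1 only accepts pairs that also satisfy its smaller window w1.
        if abs(a["ts"] - b["ts"]) < w1:
            q1_out.append((a, b))
        # Q2's selection on A.Value was pulled above the join, so it is
        # checked on every joined pair rather than once per input tuple.
        if a["value"] > threshold:
            q2_out.append((a, b))
    return q1_out, q2_out
```

Both checks run once per joined pair, and A tuples that fail the selection still occupy join state and take part in probing, which is the wasted work named in the cons.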
Stream Partition with Selection Pushdown [KFH04]
[Figure: a split operator S partitions stream A by A.Value (<= Threshold, > Threshold) into two joins over states A[w1], B[w1] and A[w2], B[w2]; a router R on |Ta-Tb| (< w1, all) and a union U assemble the results for Q1 and Q2.]
• Split stream A by A.Value
• Route shared join results
• [KFH04]: S. Krishnamurthy, M. J. Franklin, J. M. Hellerstein, and G. Jacobson. The case for precision sharing. In VLDB’04.

Stream Partition with Selection Pushdown [KFH04]
• Pros
  • Selection pushdown: no wasted join computation
• Cons
  • Multiple join operators
  • Duplicated state memory in multiple join operators
  • Per-output-tuple routing cost

State-Slice: New Sharing Paradigm
• Key Ideas:
  • State-slice concept for sliding window join
  • Pipelined chain of join slices
• Prospective Benefits:
  • Fine-grained selection push-down
  • Pipelined join operators
  • Avoiding per-tuple routing cost

One-way State Sliced Window Join
[Figure: A tuples are stored in the state of stream A, [w1, w2]; an arriving B tuple probes that state and emits joined results; purged A tuples and the propagated B tuple leave the operator.]
• The sliding window has a lower bound: [w1, w2]
• A B tuple only probes A tuples that are at least w1, but at most w2, older than itself

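The behaviour described above might be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the class name `OneWaySlicedJoin`, the `(timestamp, key)` tuple layout, in-order arrival, and the choice to treat the upper bound as exclusive (mirroring the `|Ta-Tb| < W1` routing condition used earlier in the talk) are all assumptions.

```python
from collections import deque


class OneWaySlicedJoin:
    """One-way join slice over the window range [w_start, w_end).

    Hypothetical sketch: tuples are (timestamp, key) pairs arriving in
    non-decreasing timestamp order; only B tuples probe.
    """

    def __init__(self, w_start, w_end):
        self.w_start = w_start
        self.w_end = w_end
        self.state_a = deque()  # A tuples, oldest first

    def insert_a(self, a):
        """Store an A tuple in this slice's state."""
        self.state_a.append(a)

    def probe_b(self, b):
        """Purge, probe, and propagate for one B tuple.

        Returns (joined pairs, purged A tuples, propagated B tuple).
        """
        purged = []
        # Purge A tuples that are w_end or more older than the B tuple.
        while self.state_a and b[0] - self.state_a[0][0] >= self.w_end:
            purged.append(self.state_a.popleft())
        # Probe: keep only A tuples at least w_start older, with equal key.
        joined = [
            (a, b)
            for a in self.state_a
            if b[0] - a[0] >= self.w_start and a[1] == b[1]
        ]
        # The B tuple itself is handed on unchanged for any later slice.
        return joined, purged, b
```

For example, with `OneWaySlicedJoin(2, 5)`, an A tuple at time 4 does not join a B tuple at time 5 (only 1 unit older) but does join a B tuple at time 6.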
The Chain of One-way State-Sliced Joins
[Figure: A tuples enter join J1 (state of stream A: [0, w1]) and are passed through queue(s) to join J2 (state of stream A: [w1, w2]); a B tuple probes J1 and then J2; a union U of the two outputs gives the joined result.]
• Split state memory into a chain of joins
• No overlap of state memory in the chain of joins

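To make the chain idea concrete, the sketch below runs a chain of one-way slices in a single loop. It is a simplification under stated assumptions: the function name `chain_join`, the `(stream, timestamp, key)` event format, in-order arrival, and exclusive upper bounds are illustrative choices, and the queues between operators are replaced by direct hand-over of purged tuples.

```python
from collections import deque


def chain_join(events, boundaries):
    """Run a chain of one-way state-sliced joins (B probes A).

    events:     list of (stream, ts, key), stream in {"A", "B"},
                in non-decreasing timestamp order.
    boundaries: [0, w1, w2, ...]; slice i covers [boundaries[i], boundaries[i+1]).
    Returns one list of (a, b) result pairs per slice.
    """
    n_slices = len(boundaries) - 1
    states = [deque() for _ in range(n_slices)]
    per_slice = [[] for _ in range(n_slices)]
    for stream, ts, key in events:
        if stream == "A":
            # New A tuples always enter the first slice [0, w1).
            states[0].append((ts, key))
            continue
        b = (ts, key)
        for i in range(n_slices):
            state = states[i]
            w_end = boundaries[i + 1]
            # Purge: A tuples too old for this slice move to the next
            # slice, or are dropped after the last one.
            while state and ts - state[0][0] >= w_end:
                old = state.popleft()
                if i + 1 < n_slices:
                    states[i + 1].append(old)
            # Probe: everything left in this slice lies inside its range.
            per_slice[i].extend((a, b) for a in state if a[1] == key)
    return per_slice
```

Each A tuple lives in exactly one slice at a time, so the states do not overlap. A query with window w1 would read only the first slice's output, while a query with window w2 would take the union of the first two.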
From One-way to Two-way Binary Join
[Figure: joins J1 and J2 hold the states of stream A ([0, w1] and [w1, w2]) and of stream B ([0, w1] and [w1, w2]); each A tuple and each B tuple enters as a male and a female reference; tuples pass from J1 to J2 through queue(s); a union U produces the joined result.]
• Intuitively a combination of two one-way joins
• Two references for each A or B tuple
  • Male tuples are used to probe states
  • Female tuples are inserted into, and cross-purged from, their respective states

State-Sliced Join Chain: The Example
[Figure: Q1 and Q2 share a two-slice chain over streams A and B. Slice s1 covers [0, w1] and slice s2 covers [w1, w2]; a selection σA sits inside the chain and a union U merges the slice outputs.]
• States of sliced joins in a chain are disjoint from each other
  • Minimizes state memory usage
• Selection can be pushed down into the middle of the join chain
  • Avoids unnecessary resource waste
• No routing step is needed
  • Avoids per-output-tuple routing cost completely

Summary: State-Sliced Join Chain
• Pros:
  • Minimized memory usage
  • Reduced routing cost
  • No need for operator synchronization in the chain
• Cons:
  • Stream traffic between pipelined joins
  • Purge cost

Sharing via Chains: Memory-Optimal Chain
• No Selection:
  [Figure: streams A and B feed a chain of slices s1, s2, s3, ..., sN over [0, w1], [w1, w2], [w2, w3], ..., [wN-1, wN]; unions U combine slice outputs for Q1, Q2, Q3, ..., QN.]
• With Selection:
  [Figure: the same chain with selections σ1, σ2, σ3, ..., σN and σ’1, σ’2, σ’3, ... placed along the chain and on the slice outputs.]

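For the no-selection case, the layout of the chain follows directly from the query windows. The helper below is a small sketch of that bookkeeping; the name `mem_optimal_chain` and the dictionary-based interface are assumptions made for the example, and selections are deliberately left out.

```python
def mem_optimal_chain(windows):
    """Derive slice ranges and per-query slice lists from query windows.

    windows: dict mapping a query name to its window length, e.g.
             {"Q1": 2, "Q2": 5}. Selections are not modelled here.
    Returns (slices, plan): slices is a list of (start, end) ranges and
    plan maps each query to the slices whose outputs it unions.
    """
    # Slice boundaries are 0 followed by the distinct window lengths.
    bounds = [0] + sorted(set(windows.values()))
    slices = list(zip(bounds, bounds[1:]))
    # A query with window w reads every slice that ends at or before w.
    plan = {
        query: [s for s in slices if s[1] <= w]
        for query, w in windows.items()
    }
    return slices, plan
```

With windows 2, 5, and 9 for Q1, Q2, and Q3, this yields the slices (0, 2), (2, 5), (5, 9), and Q2 unions the outputs of the first two.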
Mem-Optimal Chain CPU-Optimal Chain?
[Figure: streams A and B feed slices s1 to s5 over [0, w1], [w1, w2], [w2, w3], [w3, w4], [w4, w5]; unions U combine slice outputs for Q1 to Q5.]
• Overheads:
  • Too many operators may increase the system context-switch cost
  • Too many sliced states increase the purging cost

Merging Sliced Joins
[Figure: slices si over [wi-1, wi] through sj over [wj-1, wj] are merged into one slice over [wi-1, wj]; a router R on |Ta-Tb| (< wi, ..., ≥ wj-1) feeds the unions for Qi, ..., Qj.]
• Tradeoff:
  • Gain from merging
    • Reduces the number of join operators
    • Reduces extra purging cost
  • Loss from merging
    • Introduces routing cost
    • Increases memory usage due to selection pull-up
• Cost model for CPU usage

CPU-Opt. Chain: Search Space & Solution
[Figure: a graph with nodes v0, v1, v2, v3, v4, v5 for the window boundaries w0, w1, w2, w3, w4, w5, next to an example chain with slices s1, s2, s3 over [0, w2], [w2, w3], [w3, w5], routers R on |Ta-Tb| (< w1, < w4), and unions U for Q1 to Q5.]
• Legend:
  • vi: window start/end time
  • vi to vj: one slice window
• Shortest path problem

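The shortest-path formulation can be sketched as follows. Because every edge vi to vj goes forward (i < j), a simple dynamic program over the boundaries is enough here; it stands in for any shortest-path algorithm. The talk's actual CPU cost model is not reproduced in these slides, so the edge weight is passed in as a function, and `toy_cost` below is a made-up placeholder used only to exercise the search.

```python
def cpu_optimal_chain(bounds, slice_cost):
    """Pick the cheapest chain of (possibly merged) slices.

    bounds:     [w0, w1, ..., wN] with w0 = 0, sorted ascending.
    slice_cost: function (start, end, merged) -> cost of one join slice
                covering [start, end], where `merged` is the number of
                original slices it spans. Supplied by the caller.
    Returns (list of (start, end) slices, total cost).
    """
    n = len(bounds) - 1
    best = [float("inf")] * (n + 1)  # best[j]: cheapest cost to reach vj
    prev = [None] * (n + 1)          # predecessor of vj on that path
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + slice_cost(bounds[i], bounds[j], j - i)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk the predecessor links back from vN to v0.
    chain, j = [], n
    while j > 0:
        i = prev[j]
        chain.append((bounds[i], bounds[j]))
        j = i
    chain.reverse()
    return chain, best[n]


def toy_cost(start, end, merged):
    """Made-up cost: fixed operator overhead plus a routing penalty that
    applies only to merged slices. Not the cost model from the talk."""
    overhead = 3
    routing = (end - start) if merged > 1 else 0
    return overhead + routing
```

With `bounds = [0, 1, 2, 3, 10, 11]` and `toy_cost`, the search merges the three short slices at the front and keeps the long one separate, which reflects the tradeoff between per-operator overhead and routing cost described for merging.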
Summary: Mem-Opt. vs. CPU-Opt. Join Chain
[Figure: diagram relating Selection PullUp Sharing, the Mem-Opt. Chain, and the CPU-Opt. Chain through the State Slice and State Merge steps.]
• Mem-Optimal:
  • Minimized memory usage
  • Higher system overhead
  • Higher purging cost
• CPU-Optimal:
  • Minimized CPU usage
  • More memory usage if selection is pulled up to merge slices

Experimental WPI Stream Engine: CAPE
• Software Demonstration, VLDB’04

Experiment Study 1: Memory Consumption

Experiment Study 2: Total Service Rate

Experiment Study 3: Mem-Opt. vs. CPU-Opt.
[Figure panels: Window Distributions Used for 12 Queries; Small-Large: 12 Queries; Small-Large: 24 Queries.]

Conclusion
• Pipelined state-sliced join chain
• Mem-Optimal chain construction
• CPU-Optimal chain construction
• Implemented in CAPE
• Performance evaluation

Thank You!
Visit the CAPE Homepage: http://davis.wpi.edu/dsrg/CAPE/index.html
Supported by: CRI grant CNS 05-51584