CS 410/510 Data Streams
Lecture 15: How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams
Kristin Tufte, David Maier
How Soccer Players Would do Stream Joins
• Handshake Join
  • Evaluates window-based stream joins
  • Highly parallelizable
  • Implementation on multi-core machine and FPGA
• Previous stream join execution strategies
  • Sequential execution based on operational semantics
Let’s talk about stream joins
• Join window of R with window of S
• Focus on sliding windows here
• Scan, Insert, Invalidate
• How might I parallelize?
• Partition and replicate
• Time-based windows vs. tuple-based windows
Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
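The scan, insert, invalidate steps above can be sketched as a sequential join over tuple-based sliding windows. This is a minimal sketch, not the paper's code: the function name `sliding_window_join`, the `(stream, tuple)` event format, and the `predicate` callback are assumptions made for illustration.

```python
from collections import deque


def sliding_window_join(events, window_size, predicate):
    """Sequential join of two streams over tuple-based sliding windows.

    events: iterable of (stream, tuple) pairs, stream is "R" or "S".
    window_size: number of tuples kept per stream (assumed >= 1).
    predicate: function (r, s) -> bool; any join condition works.
    """
    windows = {"R": deque(), "S": deque()}
    results = []
    for stream, tup in events:
        other = "S" if stream == "R" else "R"
        # Scan: compare the new tuple with the other stream's window.
        for candidate in windows[other]:
            r, s = (tup, candidate) if stream == "R" else (candidate, tup)
            if predicate(r, s):
                results.append((r, s))
        # Insert: add the new tuple to its own window.
        windows[stream].append(tup)
        # Invalidate: drop the oldest tuple once the window is full.
        if len(windows[stream]) > window_size:
            windows[stream].popleft()
    return results
```

A time-based window would replace the length check in the invalidate step with a timestamp comparison.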
So, Handshake Join…
Traditional Stream Join
• Parallelization needs partitioning; possibly replication
• Needs central coordination
Handshake Join
• Entering tuple pushes oldest tuple out
• No central coordination
• Same semantics
• May introduce disorder
[Figure: traditional stream join vs. handshake join, each with Input A and Input B]
Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
Parallelization
• Each core gets a segment of each window
• Data flow: act locally on new data arrival and passing on data
• Good for shared-nothing setups
• Simple communication – interact with neighbors; avoid bottlenecks
Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
Parallelization - Observations
• Parallelizes tuple-based windows and non-equi-join predicates
• As written, compares all tuples – could hash at each node to optimize
• Note the data transfer costs between cores: each tuple is processed at each core
• Soccer players have short arms, hardware is NUMA
Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
Scalability
• Data flow + point-to-point communication
• Additional cores: support larger window sizes or reduce the workload per core
• “directly turn any degree of parallelism into higher throughput or larger supported window sizes”
• “can trivially be scaled up to handle larger join windows, higher throughput rates, or more compute-intensive join predicates”
Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
Encountering Tuples
• An item in either window encounters all current items in the other window
• Immediate scan strategy
• Flexible segment boundaries (cores)
• Other local implementations
Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
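The data flow can be illustrated with a single-threaded simulation, assuming tuple-based windows split into equal segments across a chain of cores. This is a sketch under those assumptions, not the authors' implementation: `handshake_join`, `enter_r`, and `enter_s` are hypothetical names, and because every step runs sequentially the simulation does not show the races that arise with real message passing.

```python
from collections import deque


def handshake_join(events, num_cores, segment_size, predicate):
    """Single-threaded simulation of the handshake data flow.

    R tuples enter at core 0 and move right; S tuples enter at the
    last core and move left. A tuple is compared only with the
    other stream's tuples held by the core it is entering
    (immediate scan). segment_size is assumed to be >= 1.
    """
    r_segments = [deque() for _ in range(num_cores)]
    s_segments = [deque() for _ in range(num_cores)]
    results = []

    def enter_r(core, r):
        for s in s_segments[core]:          # scan the local S segment
            if predicate(r, s):
                results.append((r, s))
        r_segments[core].append(r)
        if len(r_segments[core]) > segment_size:
            oldest = r_segments[core].popleft()
            if core + 1 < num_cores:        # push oldest to right neighbor
                enter_r(core + 1, oldest)   # else it leaves the window

    def enter_s(core, s):
        for r in r_segments[core]:          # scan the local R segment
            if predicate(r, s):
                results.append((r, s))
        s_segments[core].append(s)
        if len(s_segments[core]) > segment_size:
            oldest = s_segments[core].popleft()
            if core - 1 >= 0:               # push oldest to left neighbor
                enter_s(core - 1, oldest)

    for stream, tup in events:
        if stream == "R":
            enter_r(0, tup)
        else:
            enter_s(num_cores - 1, tup)
    return results
```

With `num_cores=1` the simulation degenerates to a sequential sliding-window join. With more cores, a pair is compared at the moment the two tuples first share a core, so each pair is reported at most once, though later than in the sequential case.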
Handshake Join with Message Passing
• Lock-step processing (tuple-based windows)
• FIFO queues with message passing
• Missed join-pair
Two-phase forwarding
• Asymmetric synchronization (replication on one core only)
• Keep copies of forwarded tuples until ack received
• Ack for s4 must be processed between r5 and r6
Load Balancing & Synchronization
• Even distribution not needed for correctness
• Maintain mostly even-sized local S windows
• Synchronize at pipeline ends to manage windows
FPGA Implementation
• Tuple-based windows that fit into memory
• Common clock signal; lock-step processing
• Nested-loops join processing
Performance
• Scalability on multi-core CPU
• Scalability on FPGAs; 8 tuples/window
Before we move on…
• The handshake join work focuses on sliding windows
• How would their algorithm and implementation work for tumbling windows?
• What if we did tumbling windows only?
Query-Aware Partitioning for Monitoring Massive Network Data Streams
• OC-768 Networks
  • 100 million packets/sec
  • 2x40 Gbit/sec
• Query plan partitioning
  • Issues: “heavy” operators, non-uniform resource consumption
• Data stream partitioning
Let’s partition the data…
  SELECT time, srcIP, destIP, srcPort, destPort,
         COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
  FROM TCP
  GROUP BY time, srcIP, destIP, srcPort, destPort
• Computes packet summaries between src and dest for network monitoring
• Round robin partitioning -> worst case a single flow results in n partial flows
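The worst case for round robin can be seen in a few lines. This is a minimal sketch: the helper names `round_robin` and `partitions_per_flow` and the dictionary packet format are assumptions for illustration only.

```python
from collections import defaultdict


def round_robin(packets, n):
    """Deal packets to n partitions in arrival order."""
    parts = [[] for _ in range(n)]
    for i, pkt in enumerate(packets):
        parts[i % n].append(pkt)
    return parts


def partitions_per_flow(parts):
    """For each flow key, count how many partitions saw part of it."""
    seen = defaultdict(set)
    for node, part in enumerate(parts):
        for pkt in part:
            key = (pkt["srcIP"], pkt["destIP"],
                   pkt["srcPort"], pkt["destPort"])
            seen[key].add(node)
    return {key: len(nodes) for key, nodes in seen.items()}
```

Eight packets of one flow dealt to four nodes leave a partial flow on every node, so the final aggregator has to merge four partial results for a single group.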
And, we might want a HAVING…
  SELECT time, srcIP, destIP, srcPort, destPort,
         COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
  FROM TCP
  GROUP BY time, srcIP, destIP, srcPort, destPort
  HAVING OR_AGGR(flags) = ATTACK_PATTERN
• Round robin partitioning -> no node can apply HAVING
• CPU and network load on final aggregator is high
So, let’s partition better…
  SELECT time, srcIP, destIP, srcPort, destPort,
         COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
  FROM TCP
  GROUP BY time, srcIP, destIP, srcPort, destPort
  HAVING OR_AGGR(flags) = ATTACK_PATTERN
• What about partitioning on srcIP, destIP, srcPort, destPort (partition flows)?
• Yeah! Nodes can compute and apply HAVING locally …
• But, what if I have more than one query?
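When every packet of a flow lands on the same node, each node can evaluate the aggregation and the HAVING clause by itself. The sketch below assumes a CRC32-based partitioner, a made-up bit value for `ATTACK_PATTERN`, and only a subset of the aggregates in the query; none of these come from the paper.

```python
import zlib

ATTACK_PATTERN = 0b11  # hypothetical flag pattern, for illustration only


def partition_by_flow(packets, n):
    """Send every packet of a flow to the same one of n nodes."""
    parts = [[] for _ in range(n)]
    for pkt in packets:
        key = (pkt["srcIP"], pkt["destIP"], pkt["srcPort"], pkt["destPort"])
        parts[zlib.crc32(repr(key).encode()) % n].append(pkt)
    return parts


def local_query(part):
    """GROUP BY time and flow key, then apply HAVING on OR of flags."""
    groups = {}
    for pkt in part:
        gkey = (pkt["time"], pkt["srcIP"], pkt["destIP"],
                pkt["srcPort"], pkt["destPort"])
        g = groups.setdefault(gkey, {"count": 0, "sum_len": 0, "or_flags": 0})
        g["count"] += 1
        g["sum_len"] += pkt["len"]
        g["or_flags"] |= pkt["flags"]
    # HAVING is safe here because each group is complete on this node.
    return {k: v for k, v in groups.items() if v["or_flags"] == ATTACK_PATTERN}
```

The union of the per-node results equals the result of running the query on the whole stream, so the final aggregator only concatenates.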
But I need to run lots of queries…
• Large numbers of simultaneous queries are common (e.g., 50)
• Subqueries place different requirements on partitioning
• Dynamic repartitioning for each query?
  • That’s what the parallel DBs do…
• Splitting 80 Gbit/sec -> specialized network hardware
• Partition stream once and only once…
Partitioning Limitations
• Program partitioning in FPGAs
  • TCP fields (src, dest IP) – ok
  • Fields from HTTP – not ok
• Can’t re-partition every time the workload changes
Query-Aware Partitioning
• Analysis framework
  • Determine optimal partitioning
• Partition-aware distributed query optimizer
  • Takes advantage of existing partitions
• Compatible partitioning
  • Maximizes amount of data reduction done locally
  • Formal definition of compatible partitioning
  • Compatible partitioning – aggregations & joins
GS Uses Tumbling Windows (only)
  SELECT tb, srcIP, destIP, sum(len)
  FROM PKT
  GROUP BY time/60 as tb, srcIP, destIP
• Time attribute is ordered (increasing)
  SELECT time, PKT1.srcIP, PKT1.destIP, PKT1.len + PKT2.len
  FROM PKT1 JOIN PKT2
  WHERE PKT1.time = PKT2.time
    and PKT1.srcIP = PKT2.srcIP
    and PKT1.destIP = PKT2.destIP
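Because the time attribute is increasing, a tumbling-window aggregate can emit a bucket as soon as a packet from the next bucket arrives. The sketch below illustrates the first query under that ordering assumption; the generator name `tumbling_sum` and the dictionary packet format are hypothetical.

```python
def tumbling_sum(packets, width=60):
    """Yield (tb, srcIP, destIP, sum_len) per tumbling window.

    Assumes packets arrive in non-decreasing time order, so a
    change in time // width means the previous bucket is closed.
    """
    current_tb = None
    sums = {}
    for pkt in packets:
        tb = pkt["time"] // width
        if current_tb is not None and tb != current_tb:
            # The window has tumbled: flush every group of the old bucket.
            for (src, dst), total in sums.items():
                yield (current_tb, src, dst, total)
            sums = {}
        current_tb = tb
        key = (pkt["srcIP"], pkt["destIP"])
        sums[key] = sums.get(key, 0) + pkt["len"]
    # End of a finite input; an unbounded stream never reaches this flush.
    for (src, dst), total in sums.items():
        yield (current_tb, src, dst, total)
```

Only the state of the current bucket is kept, which is one reason tumbling windows are cheap to evaluate.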
Query Example
flows:
  SELECT tb, srcIP, destIP, COUNT(*) as cnt
  FROM TCP
  GROUP BY time/60 as tb, srcIP, destIP
heavy_flows:
  SELECT tb, srcIP, max(cnt) as max_cnt
  FROM flows
  GROUP BY tb, srcIP
flow_pairs:
  SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
  FROM heavy_flows S1, heavy_flows S2
  WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
Query Example (flows, heavy_flows, flow_pairs): questions
• Which partitioning scheme is optimal for each of the queries?
• How to reconcile potentially conflicting partitioning requirements?
• How can we use information about existing partitioning in a distributed query optimizer?
What if we could only partition on destIP?
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
Partition compatibility
  SELECT tb, srcIP, destIP, sum(len)
  FROM PKT
  GROUP BY time/60 as tb, srcIP, destIP
• Partitioning on (time/60, srcIP, destIP) -> execute aggregation locally then union
• (srcIP, destIP, srcPort, destPort) can’t aggregate locally
• P is compatible with Q if, for every time window, the output of Q is equal to a stream union of the output of Q running on partitions produced by P
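The definition can be tried out on sample data: run Q on each partition, union the outputs, and compare with Q on the whole input. This is a sketch with a deliberately simple toy hash (sum of integer attribute values); the names `query`, `partition`, and `compatible_on` are assumptions. A `True` result on one sample does not prove compatibility in general, but a `False` result is a counterexample.

```python
def query(packets):
    """SELECT tb, srcIP, destIP, sum(len) GROUP BY time/60, srcIP, destIP."""
    out = {}
    for p in packets:
        key = (p["time"] // 60, p["srcIP"], p["destIP"])
        out[key] = out.get(key, 0) + p["len"]
    return sorted(key + (total,) for key, total in out.items())


def partition(packets, attrs, n):
    """Toy partitioner: sum of the (integer) attribute values mod n."""
    parts = [[] for _ in range(n)]
    for p in packets:
        parts[sum(p[a] for a in attrs) % n].append(p)
    return parts


def compatible_on(packets, attrs, n):
    """Does the union of per-partition outputs equal the global output?"""
    union = sorted(row for part in partition(packets, attrs, n)
                   for row in query(part))
    return union == query(packets)
```

Two packets that share srcIP and destIP but differ in srcPort can land on different nodes when the ports are part of the partitioning key. Each node then emits its own partial sum for the same group, and the union no longer matches the global output.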
Should we partition on temporal attributes?
• If we partition on temporal attributes:
  • Processor allocation changes with time epochs
  • May help avoid bad hash functions
  • Might lead to incorrect results if using panes
  • Tuples correlated in time tend to be correlated on the temporal attribute – bad for load balancing
• Exclude temporal attributes from partitioning
What partitionings work for aggregation queries?
• Group-bys on scalar expressions of source input attributes
• Ignore grouping on aggregations in lower-level queries
• Any subset of a compatible partitioning is also compatible
  SELECT expr1, expr2, .., exprn
  FROM STREAM_NAME
  WHERE tup_predicate
  GROUP BY temp_var, gb_var1, ..., gb_varm
  HAVING group_predicate
What partitionings work for join queries?
• Equality predicates on scalar expressions of source stream attributes
• Any non-empty subset of a compatible partitioning is also compatible
• Need to reconcile partitioning of S and R
  SELECT expr1, expr2, .., exprn
  FROM STREAM1 AS S {LEFT|RIGHT|FULL} [OUTER] JOIN STREAM2 AS R
  WHERE STREAM1.ts = STREAM2.ts
    and STREAM1.var11 = STREAM2.var21
    and STREAM1.var1k = STREAM2.var2k
    and other_predicates
Now, multiple queries…
tcp_flows:
  SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), sum(len)
  FROM TCP
  GROUP BY time/60 as tb, srcIP, destIP, srcPort, destPort
  Partitioning set: {sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}
flow_cnt:
  SELECT tb, srcIP, destIP, count(*)
  FROM tcp_flows
  GROUP BY tb, srcIP, destIP
  Partitioning set: {sc_exp(srcIP), sc_exp(destIP)}
Result: {sc_exp(srcIP), sc_exp(destIP)}
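For this example the reconciled result is the intersection of the two partitioning sets. The sketch below treats every `sc_exp` as the identity, so reconciliation reduces to plain set intersection; that simplification and the function name `reconcile` are assumptions, and the paper's treatment of scalar expressions is more general.

```python
def reconcile(*partitioning_sets):
    """Intersect the partitioning sets of several queries.

    Simplifying assumption: each scalar expression is the attribute
    itself, so two expressions agree only if the attribute names match.
    """
    result = set(partitioning_sets[0])
    for attrs in partitioning_sets[1:]:
        result &= set(attrs)
    return result


tcp_flows = {"srcIP", "destIP", "srcPort", "destPort"}
flow_cnt = {"srcIP", "destIP"}
print(sorted(reconcile(tcp_flows, flow_cnt)))
```

Adding a query that groups only on destPort makes the intersection empty, which is why a fully compatible partitioning often does not exist once many queries run together.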
Now, multiple queries… (tcp_flows and flow_cnt, continued)
• Fully compatible partitioning set likely to be empty
• Partition to minimize cost of execution
Query Plan Transformation
• Main idea: push aggregation operator below merge to allow aggregations to execute independently on partitions
• Main idea: partial aggregates (think panes)
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
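Pushing the aggregation below the merge can be pictured as a sub-aggregate on each partition followed by a super-aggregate after the merge. This is a minimal sketch for COUNT(*) and sum(len) grouped by time/60, srcIP, destIP; the function names `sub_aggregate` and `super_aggregate` are assumptions for illustration.

```python
def sub_aggregate(part):
    """Runs on each partition: partial (count, sum_len) per group."""
    partial = {}
    for p in part:
        key = (p["time"] // 60, p["srcIP"], p["destIP"])
        cnt, total = partial.get(key, (0, 0))
        partial[key] = (cnt + 1, total + p["len"])
    return partial


def super_aggregate(partials):
    """Runs after the merge: combine partial results per group.

    COUNT(*) becomes a sum of partial counts; SUM stays a sum.
    """
    final = {}
    for partial in partials:
        for key, (cnt, total) in partial.items():
            c, t = final.get(key, (0, 0))
            final[key] = (c + cnt, t + total)
    return final
```

This works for any partitioning, including round robin, because count and sum are decomposable. The merge then carries one partial row per group and partition instead of every packet.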
Performance
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008