
CS 410/510 Data Streams, Lecture 15: How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams

Kristin Tufte, David Maier


Presentation Transcript


  1. CS 410/510 Data Streams Lecture 15: How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams Kristin Tufte, David Maier Data Streams: Lecture 15

  2. How Soccer Players Would do Stream Joins • Handshake Join • Evaluate window-based stream joins • Highly parallelizable • Implementation on multi-core machine and FPGA • Previous stream join execution strategies • Sequential execution based on operational semantics

  3. Let’s talk about stream joins • Join window of R with window of S • Focus on sliding windows here • Scan, Insert, Invalidate • How might I parallelize? • Partition and replicate • Time-based windows vs. tuple-based windows Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
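The Scan/Insert/Invalidate loop for a time-based sliding-window join can be sketched sequentially (a minimal illustration to fix ideas, not the paper's code; the 10-unit window, the `(timestamp, key)` tuple layout, and the `predicate` interface are assumptions made for the example):

```python
from collections import deque

WINDOW = 10  # window length in time units (an assumption for illustration)

def stream_join(stream_r, stream_s, predicate):
    """Symmetric window join: each arriving tuple scans the opposite
    window, is inserted into its own window, and expired tuples are
    invalidated. Tuples are (timestamp, key) pairs."""
    win_r, win_s = deque(), deque()
    results = []
    # Merge the two inputs by timestamp to simulate interleaved arrival.
    arrivals = sorted([("R", t) for t in stream_r] + [("S", t) for t in stream_s],
                      key=lambda x: x[1][0])
    for origin, (ts, key) in arrivals:
        own, other = (win_r, win_s) if origin == "R" else (win_s, win_r)
        # Invalidate: drop tuples that have slid out of either window.
        for win in (win_r, win_s):
            while win and win[0][0] <= ts - WINDOW:
                win.popleft()
        # Scan: probe the opposite window with the new tuple.
        for ts2, key2 in other:
            if predicate(key, key2):
                results.append((ts, key, ts2, key2))
        # Insert: append the new tuple to its own window.
        own.append((ts, key))
    return results
```

With an equi-join predicate such as `lambda a, b: a == b`, this produces every pair whose timestamps lie within one window of each other; the question the paper asks is how to run this scan loop on many cores at once.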

  4. So, Handshake Join… • Traditional stream join: parallelization needs partitioning, possibly replication, and central coordination • Handshake join: an entering tuple pushes the oldest tuple out • No central coordination • Same semantics as the traditional join • May introduce disorder in the output Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011

  5. Parallelization • Each core gets a segment of each window • Data flow: act locally on newly arriving data and pass data on • Good for shared-nothing setups • Simple communication – interact with neighbors; avoid bottlenecks Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
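The per-core data flow can be sketched as a small simulation (an illustrative sketch, not the paper's implementation: R tuples enter at the left and ripple rightward as segments overflow, S tuples enter at the right and ripple leftward, and each tuple scans the local segment of the opposite window on arrival; the `Core` class, fixed segment capacity, and equi-join predicate are assumptions made for the example):

```python
# Pipeline of cores holding segments of both windows. Because R and S flow
# in opposite directions, a pair is compared at most once: after meeting,
# the two tuples only move further apart.

class Core:
    def __init__(self, capacity):
        self.r_seg, self.s_seg = [], []   # local segments of both windows
        self.capacity = capacity          # tuples per segment (assumed fixed)
        self.results = []

    def accept_r(self, r):
        # Scan the local S segment with the entering R tuple.
        self.results += [(r, s) for s in self.s_seg if r[1] == s[1]]
        self.r_seg.append(r)
        # On overflow, hand the oldest R tuple to the right neighbor.
        return self.r_seg.pop(0) if len(self.r_seg) > self.capacity else None

    def accept_s(self, s):
        self.results += [(r, s) for r in self.r_seg if r[1] == s[1]]
        self.s_seg.append(s)
        return self.s_seg.pop(0) if len(self.s_seg) > self.capacity else None

def run(cores, r_in, s_in):
    for r in r_in:                        # R flows left to right
        for c in cores:
            r = c.accept_r(r)
            if r is None:
                break
    for s in s_in:                        # S flows right to left
        for c in reversed(cores):
            s = c.accept_s(s)
            if s is None:
                break
```

Note how all communication is with a core's immediate neighbor, which is exactly why the scheme avoids central coordination and bottlenecks.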

  6. Parallelization - Observations • Supports tuple-based windows and non-equi-join predicates • As written, compares all tuples – could hash at each node to optimize • Note the data-transfer cost between cores; each tuple is processed at every core • Soccer players have short arms, hardware is NUMA Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011

  7. Scalability • Data flow + point-to-point communication • Additional cores allow larger window sizes or reduce the workload per core • “directly turn any degree of parallelism into higher throughput or larger supported window sizes” • “can trivially be scaled up to handle larger join windows, higher throughput rates, or more compute-intensive join predicates” Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011

  8. Encountering Tuples • An item in either window encounters all current tuples in the other window • Immediate scan strategy • Flexible segment boundaries (cores) • Other local implementations Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011

  9. Handshake Join with Message Passing • Lock-step processing (tuple-based windows) • FIFO queues with message passing • Missed join pairs: tuples forwarded in opposite directions at the same moment can pass each other without being compared

  10. Two-phase forwarding • Asymmetric synchronization (replication on one core only) • Keep copies of forwarded tuples until ack received • Ack for s4 must be processed between r5 and r6
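The copy-until-ack idea can be sketched for one pair of neighboring cores (class and message names here are invented for illustration, not the paper's protocol; the point is that an S tuple arriving while an R tuple is still in flight is probed against the retained copy, so the pair is not missed):

```python
class LeftCore:
    """Sender side: forwards R tuples rightward but keeps a copy until
    the receiver acknowledges, so S tuples arriving in the meantime can
    still be joined against in-flight tuples."""
    def __init__(self):
        self.r_seg = []        # local R segment
        self.pending = {}      # forwarded-but-unacknowledged copies, by id
        self.results = []

    def forward_r(self, link):
        r = self.r_seg.pop(0)
        self.pending[r[0]] = r              # phase 1: forward, retain a copy
        link.append(("r", r))

    def receive(self, msg):
        kind, payload = msg
        if kind == "ack":
            self.pending.pop(payload, None)  # phase 2: ack, drop the copy
        elif kind == "s":
            # Probe the local segment AND the unacknowledged copies.
            for r in self.r_seg + list(self.pending.values()):
                if r[1] == payload[1]:
                    self.results.append((r, payload))

class RightCore:
    """Receiver side: stores the tuple and acknowledges it."""
    def __init__(self):
        self.r_seg = []

    def receive(self, msg, ack_link):
        kind, r = msg
        if kind == "r":
            self.r_seg.append(r)
            ack_link.append(("ack", r[0]))
```

The slide's example corresponds to processing the ack in the right place in the message order: until the ack arrives, the copy guarantees no comparison is lost.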

  11. Load Balancing & Synchronization • Even distribution not needed for correctness • Maintain mostly even-sized local S windows • Synchronize at the pipeline ends to manage windows

  12. FPGA Implementation • Tuple-based windows that fit into memory • Common clock signal; lock-step processing • Nested-loops join processing

  13. Performance • Scalability on multi-core CPU • Scalability on FPGAs (8 tuples/window)

  14. Before we move on… • The soccer-join paper focuses on sliding windows • How would its algorithm and implementation work for tumbling windows? • What if we did tumbling windows only?

  15. Query-Aware Partitioning for Monitoring Massive Network Data Streams • OC-768 networks • 100 million packets/sec • 2 x 40 Gbit/sec • Query plan partitioning • Issues: “heavy” operators, non-uniform resource consumption • Data stream partitioning

  16. Let’s partition the data…
SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
• Computes packet summaries between src and dest for network monitoring • Round-robin partitioning -> worst case, a single flow is split into n partial flows

  17. And, we might want a HAVING…
SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
HAVING OR_AGGR(flags) = ATTACK_PATTERN
• Round-robin partitioning -> no node can apply the HAVING clause locally • CPU and network load on the final aggregator is high

  18. So, let’s partition better…
SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
HAVING OR_AGGR(flags) = ATTACK_PATTERN
• What about partitioning on (srcIP, destIP, srcPort, destPort), i.e. partitioning by flow? • Yeah! Nodes can aggregate and apply the HAVING clause locally … • But what if I have more than one query?
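Flow-key partitioning can be sketched as follows (a simplified illustration: the time bucket and MIN/MAX are omitted, and `ATTACK_PATTERN` and the packet layout are made up for the example; the point is that the GROUP BY and the HAVING both run per node, and the final step is a plain union):

```python
from collections import defaultdict

N_NODES = 4
ATTACK_PATTERN = 0x12    # hypothetical flag pattern, for illustration only

def flow_key(pkt):
    return (pkt["srcIP"], pkt["destIP"], pkt["srcPort"], pkt["destPort"])

def partition(pkt):
    return hash(flow_key(pkt)) % N_NODES   # all tuples of a flow -> one node

def local_aggregate(packets):
    """Per-node GROUP BY flow key; OR_AGGR(flags) and HAVING run locally."""
    groups = defaultdict(lambda: {"cnt": 0, "len": 0, "flags": 0})
    for p in packets:
        g = groups[flow_key(p)]
        g["cnt"] += 1
        g["len"] += p["len"]
        g["flags"] |= p["flags"]           # OR_AGGR
    # HAVING OR_AGGR(flags) = ATTACK_PATTERN, applied without coordination
    return {k: v for k, v in groups.items() if v["flags"] == ATTACK_PATTERN}

def run_query(packets):
    nodes = [[] for _ in range(N_NODES)]
    for p in packets:
        nodes[partition(p)].append(p)      # route once, on the flow key
    result = {}
    for n in nodes:
        result.update(local_aggregate(n))  # final step is a plain union
    return result
```

Contrast this with round-robin routing, where the packets of one flow land on several nodes, each holding only a partial OR of the flags, so no node can evaluate the HAVING clause.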

  19. But I need to run lots of queries… • Large numbers of simultaneous queries are common (e.g., 50) • Subqueries place different requirements on partitioning • Dynamic repartitioning for each query? • That’s what the parallel DBs do… • Splitting 80 Gbit/sec requires specialized network hardware • Partition the stream once and only once…

  20. Partitioning Limitations • Program partitioning in FPGAs • TCP fields (src, dest IP) – ok • Fields from HTTP – not ok • Can’t re-partition every time the workload changes

  21. Query-Aware Partitioning • Analysis framework • Determine optimal partitioning • Partition-aware distributed query optimizer • Takes advantage of existing partitions

  22. Query-Aware Partitioning • Analysis framework • Determine optimal partitioning • Partition-aware distributed query optimizer • Takes advantage of existing partitions • Compatible partitioning • Maximizes amount of data reduction done locally • Formal definition of compatible partitioning • Compatible partitioning – aggregations & joins

  23. GS Uses Tumbling Windows (only)
SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP
• Time attribute is ordered (increasing)
SELECT time, PKT1.srcIP, PKT1.destIP, PKT1.len + PKT2.len
FROM PKT1 JOIN PKT2
WHERE PKT1.time = PKT2.time and PKT1.srcIP = PKT2.srcIP and PKT1.destIP = PKT2.destIP
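Because the time attribute is ordered and the windows tumble, each bucket can be emitted as soon as the stream moves past it. A sketch of the first query over a tuple stream (the packet layout is assumed for illustration):

```python
from collections import defaultdict

def tumbling_sum(pkts):
    """SELECT tb, srcIP, destIP, sum(len) FROM PKT GROUP BY time/60 as tb,
    srcIP, destIP, over an input ordered by time: a bucket is complete
    (and can be emitted) as soon as the stream moves past it."""
    current_tb, groups = None, defaultdict(int)
    for p in pkts:                                   # ordered by time
        tb = p["time"] // 60
        if current_tb is not None and tb != current_tb:
            for (sip, dip), total in groups.items():
                yield (current_tb, sip, dip, total)  # emit the closed bucket
            groups.clear()
        current_tb = tb
        groups[(p["srcIP"], p["destIP"])] += p["len"]
    for (sip, dip), total in groups.items():         # flush the final bucket
        yield (current_tb, sip, dip, total)
```

The constant per-bucket state (no sliding invalidation) is what makes tumbling-window queries easy to run independently on partitions.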

  24. Query Example
flows: SELECT tb, srcIP, destIP, COUNT(*) as cnt FROM TCP GROUP BY time/60 as tb, srcIP, destIP
heavy_flows: SELECT tb, srcIP, max(cnt) as max_cnt FROM flows GROUP BY tb, srcIP
flow_pairs: SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt FROM heavy_flows S1, heavy_flows S2 WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008

  25. Query Example (queries as on slide 24) • Which partitioning scheme is optimal for each of the queries?

  26. Query Example (queries as on slide 24) • How to reconcile potentially conflicting partitioning requirements?

  27. Query Example (queries as on slide 24) • How can we use information about existing partitioning in a distributed query optimizer?

  28. What if we could only partition on destIP? Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008

  29. Partition compatibility
SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP
• Partitioning on (time/60, srcIP, destIP) -> execute the aggregation locally, then union • Partitioning on (srcIP, destIP, srcPort, destPort) splits groups, so nodes can’t aggregate locally

  30. Partition compatibility
SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP
• Partitioning on (time/60, srcIP, destIP) -> execute the aggregation locally, then union • Partitioning on (srcIP, destIP, srcPort, destPort) splits groups, so nodes can’t aggregate locally • P is compatible with Q if, for every time window, the output of Q equals the stream union of the outputs of Q run on the partitions produced by P
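The compatibility definition can be checked empirically for this query: count how many result groups are split across partitions; zero splits means the union of the local outputs equals the global output. A sketch (here partitioning on srcIP is compatible with the query, partitioning on srcPort is not; the tuple layout is assumed for illustration):

```python
from collections import defaultdict

def q(tuples):
    """The aggregation query: GROUP BY time/60, srcIP, destIP; SUM(len)."""
    out = defaultdict(int)
    for t in tuples:
        out[(t["time"] // 60, t["srcIP"], t["destIP"])] += t["len"]
    return dict(out)

def groups_split(tuples, part_key, n=2):
    """Partition by part_key, run q on each partition, and count result
    groups that appear in more than one partition. Zero means the union of
    the local outputs equals q on the whole stream, i.e. the partitioning
    is compatible with q."""
    parts = defaultdict(list)
    for t in tuples:
        parts[hash(part_key(t)) % n].append(t)
    seen, split = set(), 0
    for p in parts.values():
        for k in q(p):
            if k in seen:
                split += 1
            seen.add(k)
    return split
```

Intuitively: a partitioning is compatible when every tuple of any one result group lands in the same partition, which holds whenever the partition key is a function of the group-by attributes.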

  31. Should we partition on temporal attributes? • If we partition on temporal attributes: • Processor allocation changes with time epochs • May help avoid bad hash functions • Might lead to incorrect results if using panes • Tuples correlated in time tend to be correlated on the temporal attribute – bad for load balancing • Conclusion: exclude temporal attributes from partitioning

  32. What partitionings work for aggregation queries? • Group-bys on scalar expressions of source input attributes • Ignore grouping on aggregations in lower-level queries • Any subset of a compatible partitioning is also compatible
SELECT expr1, expr2, ..., exprn
FROM STREAM_NAME
WHERE tup_predicate
GROUP BY temp_var, gb_var1, ..., gb_varm
HAVING group_predicate

  33. What partitionings work for join queries? • Equality predicates on scalar expressions of source stream attributes • Any non-empty subset of a compatible partitioning is also compatible • Need to reconcile the partitionings of S and R
SELECT expr1, expr2, ..., exprn
FROM STREAM1 AS S {LEFT|RIGHT|FULL} [OUTER] JOIN STREAM2 AS R
WHERE STREAM1.ts = STREAM2.ts and STREAM1.var11 = STREAM2.var21 and ... and STREAM1.var1k = STREAM2.var2k and other_predicates

  34. Now, multiple queries…
tcp_flows: SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), sum(len) FROM TCP GROUP BY time/60 as tb, srcIP, destIP, srcPort, destPort
Compatible partitionings: {sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}
flow_cnt: SELECT tb, srcIP, destIP, count(*) FROM tcp_flows GROUP BY tb, srcIP, destIP
Compatible partitionings: {sc_exp(srcIP), sc_exp(destIP)}
Result: {sc_exp(srcIP), sc_exp(destIP)}

  35. Now, multiple queries…
tcp_flows: SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), sum(len) FROM TCP GROUP BY time/60 as tb, srcIP, destIP, srcPort, destPort
Compatible partitionings: {sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}
flow_cnt: SELECT tb, srcIP, destIP, count(*) FROM tcp_flows GROUP BY tb, srcIP, destIP
Compatible partitionings: {sc_exp(srcIP), sc_exp(destIP)}
• A fully compatible partitioning set is likely to be empty • Partition to minimize the cost of execution

  36. Query Plan Transformation • Main idea: push the aggregation operator below the merge so aggregations execute independently on partitions • Main idea: partial aggregates (think panes) Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
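The partial-aggregate transformation can be sketched for a COUNT/SUM query (group key and record layout are assumptions for the example): each partition emits (count, sum) partials, and the aggregation above the merge only combines partials, which is valid because COUNT and SUM are distributive.

```python
from collections import defaultdict

def partial_agg(tuples):
    """Low-level aggregate on one partition: group -> [count, sum(len)]."""
    out = defaultdict(lambda: [0, 0])
    for t in tuples:
        g = out[(t["time"] // 60, t["srcIP"])]
        g[0] += 1
        g[1] += t["len"]
    return dict(out)

def final_merge(partials):
    """Aggregation pushed above the merge: combine per-partition partials
    instead of raw tuples. Counts and sums simply add up."""
    out = defaultdict(lambda: [0, 0])
    for part in partials:
        for k, (c, s) in part.items():
            out[k][0] += c
            out[k][1] += s
    return dict(out)
```

The payoff is the same as with panes: the merge step touches one partial per group per partition rather than every raw tuple.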

  37. Performance Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
