Discretized Streams

Discretized Streams: Fault-Tolerant Streaming Computation at Scale. Matei Zaharia, Tathagata Das (TD), Haoyuan (HY) Li, Timothy Hunter, Scott Shenker, Ion Stoica. Motivation: many big-data applications need to process large data streams in near-real time.

Presentation Transcript


  1. Discretized Streams: Fault-Tolerant Streaming Computation at Scale. Matei Zaharia, Tathagata Das (TD), Haoyuan (HY) Li, Timothy Hunter, Scott Shenker, Ion Stoica

  2. Motivation • Many big-data applications need to process large data streams in near-real time • Require tens to hundreds of nodes • Require second-scale latencies • Examples: website monitoring, ad monetization, fraud detection

  3. Challenge • Stream processing systems must recover from failures and stragglers quickly and efficiently • More important for streaming systems than batch systems • Traditional streaming systems don’t achieve these properties simultaneously

  4. Outline • Limitations of Traditional Streaming Systems • Discretized Stream Processing • Unification with Batch and Interactive Processing

  5. Traditional Streaming Systems • Continuous operator model • Each node runs an operator with in-memory mutable state • For each input record, state is updated and new records are sent out • Mutable state is lost if node fails • Various techniques exist to make state fault-tolerant [Diagram: input records flowing through node 1, node 2 and node 3, each holding mutable state]

  6. Fault-tolerance in Traditional Systems: Node Replication [e.g. Borealis, Flux] • A separate set of “hot failover” nodes processes the same data streams • A synchronization protocol ensures exact ordering of records in both sets • On failure, the system switches over to the failover nodes • Fast recovery, but 2x hardware cost [Diagram: the input goes to both the primary nodes and the hot failover nodes, which are linked by the sync protocol]

  7. Fault-tolerance in Traditional Systems: Upstream Backup [e.g. TimeStream, Storm] • Each node maintains a backup of the records it has forwarded since the last checkpoint • A “cold failover” node is maintained • On failure, upstream nodes replay the backup records serially to the failover node to recreate the state • Only need 1 standby, but slow recovery [Diagram: upstream nodes replay their backup of the input to the cold failover node]
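
The upstream backup scheme can be illustrated with a minimal plain-Python sketch. The class names (`CountingOperator`, `UpstreamNode`) and the per-key counting operator are hypothetical, chosen only for illustration; they are not taken from TimeStream, Storm, or the talk.

```python
class CountingOperator:
    """Continuous operator holding in-memory mutable state (a count per key)."""

    def __init__(self, state=None):
        self.state = dict(state or {})

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1


class UpstreamNode:
    """Keeps a backup of every record forwarded since the last checkpoint."""

    def __init__(self):
        self.backup = []

    def send(self, downstream, record):
        self.backup.append(record)
        downstream.process(record)

    def on_checkpoint(self):
        # Records covered by the checkpoint no longer need to be replayed.
        self.backup.clear()

    def replay_to(self, standby):
        # Serial replay: one record at a time, in the original order.
        for record in self.backup:
            standby.process(record)


upstream = UpstreamNode()
primary = CountingOperator()
for url in ["a", "b", "a"]:
    upstream.send(primary, url)

checkpoint = dict(primary.state)  # downstream state is checkpointed
upstream.on_checkpoint()

for url in ["b", "c"]:
    upstream.send(primary, url)
state_before_failure = dict(primary.state)

# The primary fails: a cold standby starts from the checkpoint and the
# upstream node replays its backup to recreate the lost state.
standby = CountingOperator(checkpoint)
upstream.replay_to(standby)
recovered_state = standby.state
```

Recovery cost in this sketch grows with the number of records buffered since the last checkpoint, because a single standby processes them one by one.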

  8. Slow Nodes in Traditional Systems • Neither node replication nor upstream backup handles stragglers [Diagram: the Node Replication and Upstream Backup setups with their inputs]

  9. Our Goal • Scales to hundreds of nodes • Achieves second-scale latency • Tolerates node failures and stragglers • Sub-second fault and straggler recovery • Minimal overhead beyond base processing

  11. Why is it hard? • Stateful continuous operators tightly integrate “computation” with “mutable state” • This makes it harder to define clear boundaries at which computation and state can be moved around [Diagram: input records enter a stateful continuous operator that holds mutable state and emits output records]

  12. Dissociate computation from state • Make state immutable and break computation into small, deterministic, stateless tasks • This defines clear boundaries where state and computation can be moved around independently [Diagram: stateless tasks fed by input 1, input 2 and input 3, with state 1 and state 2 passed between them]
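
A minimal sketch of what "immutable state plus deterministic, stateless tasks" can look like, in plain Python. The function name `count_task` and the use of `MappingProxyType` as a read-only view are illustrative assumptions, not part of the talk.

```python
from types import MappingProxyType


def count_task(prev_state, batch):
    """Stateless, deterministic task: (old state, input batch) -> new state.

    The previous state is never modified; a new read-only mapping is returned.
    """
    new_state = dict(prev_state)
    for key in batch:
        new_state[key] = new_state.get(key, 0) + 1
    return MappingProxyType(new_state)


state0 = MappingProxyType({})
state1 = count_task(state0, ["a", "b"])
state2 = count_task(state1, ["a"])

# Because the task is deterministic and state1 is immutable, the task can be
# re-executed anywhere (for example after a failure) with the same result.
state2_again = count_task(state1, ["a"])
```

Since a task only reads its inputs, the boundary between "state" and "computation" is simply the function call, which is what allows either to be moved or recomputed independently.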

  13. Batch Processing Systems!

  14. Batch Processing Systems • Batch processing systems like MapReduce divide data into small partitions and jobs into small, deterministic, stateless map / reduce tasks [Diagram: immutable input dataset -> stateless map tasks (M) -> immutable map outputs -> stateless reduce tasks (R) -> immutable output dataset]

  15. Parallel Recovery • Failed tasks are re-executed on the other nodes in parallel [Diagram: immutable input dataset -> stateless map tasks (M) -> stateless reduce tasks (R) -> immutable output dataset, with the failed tasks rerun on other nodes]

  16. Discretized Stream Processing

  17. Discretized Stream Processing • Run a streaming computation as a series of small, deterministic batch jobs • Store intermediate state data in cluster memory • Try to make batch sizes as small as possible to get second-scale latencies

  18. Discretized Stream Processing • Input: replicated dataset stored in memory • Output or State: non-replicated dataset stored in memory [Diagram: the input stream is cut into intervals (time = 0 - 1, time = 1 - 2, …); batch operations turn each interval's input into the output / state stream]
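
The discretization step can be sketched in plain Python as follows. The helper `discretize`, the `(timestamp, value)` record shape, and the trivial "count the records" batch job are assumptions made for illustration.

```python
def discretize(records, interval, num_intervals):
    """Group (timestamp, value) records into one immutable batch per interval."""
    batches = [[] for _ in range(num_intervals)]
    for timestamp, value in records:
        batches[int(timestamp // interval)].append(value)
    return [tuple(batch) for batch in batches]


records = [(0.1, "a"), (0.7, "b"), (1.2, "a"), (2.5, "c"), (2.9, "a")]

# One batch per 1-second interval: [0, 1), [1, 2), [2, 3).
batches = discretize(records, interval=1.0, num_intervals=3)

# The streaming computation is then a small batch job run once per interval.
outputs = [len(batch) for batch in batches]
```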

  19. Example: Counting page views • A Discretized Stream (DStream) is a sequence of immutable, partitioned datasets • Can be created from live data streams or by applying bulk, parallel transformations on other DStreams • Creating a DStream: views = readStream("http:...", "1 sec") • Transformations: ones = views.map(ev => (ev.url, 1)); counts = ones.runningReduce((x,y) => x+y) [Diagram: for each interval (t: 0 - 1, t: 1 - 2), map turns views into ones and reduce turns ones into counts]
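
The page-view example can be simulated without a cluster. The sketch below is plain Python, not the Spark Streaming API: `map_batch` and `running_reduce` are hypothetical stand-ins for the slide's `map` and `runningReduce`, and each event is represented directly by its URL.

```python
def map_batch(batch, fn):
    """Apply fn to every record of one interval's batch."""
    return [fn(record) for record in batch]


def running_reduce(prev_counts, pairs, fn):
    """Fold this interval's (key, value) pairs into the previous totals.

    Returns a new dict, so the previous interval's dataset stays unchanged.
    """
    counts = dict(prev_counts)
    for key, value in pairs:
        counts[key] = fn(counts[key], value) if key in counts else value
    return counts


# One list of viewed URLs per 1-second interval (made-up sample data).
views = [["/home", "/about"], ["/home"], ["/home", "/docs"]]

counts = {}
counts_stream = []
for batch in views:
    ones = map_batch(batch, lambda url: (url, 1))
    counts = running_reduce(counts, ones, lambda x, y: x + y)
    counts_stream.append(counts)
```

Each element of `counts_stream` plays the role of one dataset in the `counts` DStream: the totals as of the end of that interval.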

  20. Fine-grained Lineage • Datasets track fine-grained operation lineage • Datasets are periodically checkpointed asynchronously to prevent long lineages [Diagram: views -> map -> ones -> reduce -> counts for t: 0 - 1, t: 1 - 2, t: 2 - 3]
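
A rough sketch of lineage tracking and checkpointing in plain Python. The `Dataset` class and its methods are hypothetical, and the checkpoint here is taken synchronously for simplicity, whereas the slide describes asynchronous checkpointing.

```python
class Dataset:
    """A dataset that remembers how to recompute itself from its parents."""

    def __init__(self, name, compute, parents=()):
        self.name = name
        self.compute = compute
        self.parents = tuple(parents)
        self.checkpoint = None

    def lineage_depth(self):
        # A checkpointed dataset (or a source) cuts the lineage chain.
        if self.checkpoint is not None or not self.parents:
            return 0
        return 1 + max(parent.lineage_depth() for parent in self.parents)

    def materialize(self):
        if self.checkpoint is not None:
            return self.checkpoint
        return self.compute(*[parent.materialize() for parent in self.parents])

    def save_checkpoint(self):
        self.checkpoint = self.materialize()


def make_source(name, data):
    return Dataset(name, lambda: list(data))


def add_counts(prev, ones):
    out = dict(prev)
    for key, value in ones:
        out[key] = out.get(key, 0) + value
    return out


inputs = [["a", "b"], ["a"], ["c", "a"], ["b"]]
counts = Dataset("counts@start", lambda: {})
history = []
for t, batch in enumerate(inputs):
    views = make_source(f"views@{t}", batch)
    ones = Dataset(f"ones@{t}", lambda v: [(url, 1) for url in v], [views])
    counts = Dataset(f"counts@{t}", add_counts, [counts, ones])
    history.append(counts)

depth_before = history[3].lineage_depth()
history[2].save_checkpoint()  # truncate the lineage at t = 2
depth_after = history[3].lineage_depth()
result = history[3].materialize()
```

Without the checkpoint, the lineage of `counts` grows by one step per interval; after it, recomputing the latest dataset only needs the checkpoint and the most recent interval.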

  21. Parallel Fault Recovery • Lineage is used to recompute partitions lost due to failures • Datasets on different time steps are recomputed in parallel • Partitions within a dataset are also recomputed in parallel [Diagram: views -> map -> ones -> reduce -> counts for t: 0 - 1, t: 1 - 2, t: 2 - 3]
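
Parallel recovery can be sketched with a thread pool standing in for the surviving nodes. Everything below is an illustrative assumption: the `(time step, partition)` keys, the `to_ones` transformation, and the failure pattern are made up, and threads only mimic what would be separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Replicated input, indexed as input_data[time_step][partition].
input_data = {
    0: {0: ["a", "b"], 1: ["c"]},
    1: {0: ["a"], 1: ["b", "b"]},
    2: {0: ["c", "a"], 1: ["a"]},
}


def to_ones(partition):
    """Deterministic transformation recorded in the lineage."""
    return [(url, 1) for url in partition]


ones = {
    (t, p): to_ones(part)
    for t, parts in input_data.items()
    for p, part in parts.items()
}
reference = dict(ones)

# A node holding partition 1 of every time step fails.
lost = [key for key in ones if key[1] == 1]
for key in lost:
    del ones[key]


def recompute(key):
    t, p = key
    return key, to_ones(input_data[t][p])


# Lost partitions from different time steps are recomputed concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    for key, value in pool.map(recompute, lost):
        ones[key] = value
```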

  22. Comparison to Upstream Backup • Discretized Stream Processing: parallelism within a batch and parallelism across time intervals • Upstream Backup: stream replayed serially to rebuild state • Faster recovery than upstream backup, without the 2x cost of node replication [Diagram: views, ones and counts for t: 0 - 1, t: 1 - 2, t: 2 - 3]

  23. How much faster than Upstream Backup? • Recovery time = time taken to recompute and catch up • Depends on available resources in the cluster • Lower system load before failure allows faster recovery • Parallel recovery with 10 nodes is faster than with 5 nodes • Parallel recovery with 5 nodes is faster than upstream backup

  24. Parallel Straggler Recovery • Straggler mitigation techniques • Detect slow tasks (e.g. 2X slower than other tasks) • Speculatively launch more copies of the tasks in parallel on other machines • Masks the impact of slow nodes on the progress of the system
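
The detection rule and the effect of a speculative copy can be sketched as below. The use of the median as the "other tasks" baseline, the function names, and all timings are assumptions for illustration; the slide only states the "2X slower" example threshold.

```python
from statistics import median


def find_stragglers(running_times, factor=2.0):
    """Return tasks running more than `factor` times the median task time."""
    typical = median(running_times.values())
    return sorted(
        task for task, secs in running_times.items() if secs > factor * typical
    )


def finish_time_with_speculation(original_finish, launch_time, copy_duration):
    """The task is done as soon as either the original or the copy finishes."""
    return min(original_finish, launch_time + copy_duration)


# Elapsed seconds per task (made-up numbers); task-3 sits on a slow node.
running_times = {"task-0": 1.0, "task-1": 1.1, "task-2": 0.9, "task-3": 4.0}
stragglers = find_stragglers(running_times)

# A copy launched at t = 4.0 s on a healthy node needs about 1.0 s, while the
# original would only have finished at t = 10.0 s.
effective_finish = finish_time_with_speculation(10.0, 4.0, 1.0)
```

Speculation is only safe here because tasks are deterministic and stateless, so running two copies cannot produce conflicting results.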

  25. Evaluation

  26. Spark Streaming • Implemented using the Spark processing engine* • Spark allows datasets to be stored in memory, and automatically recovers them using lineage • Modifications were required to reduce job launching overheads from seconds to milliseconds [ *Resilient Distributed Datasets - NSDI, 2012 ]

  27. How fast is Spark Streaming? • Can process 60M records/second on 100 nodes at 1 second latency • Tested with 100 4-core EC2 instances and 100 streams of text • Count the sentences having a keyword • WordCount over 30 sec sliding window
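
A sliding-window word count over per-interval batches can be sketched in plain Python. This is not the benchmark code: `windowed_word_counts` is a hypothetical helper, and the demo uses a 3-batch window instead of 30 seconds to keep the example small.

```python
from collections import Counter, deque


def windowed_word_counts(batches, window):
    """Word counts over the last `window` batches, updated incrementally."""
    recent = deque()
    totals = Counter()
    results = []
    for batch in batches:
        counts = Counter(word for line in batch for word in line.split())
        recent.append(counts)
        totals.update(counts)            # add the newest interval
        if len(recent) > window:
            totals.subtract(recent.popleft())  # drop the oldest interval
            totals = +totals             # discard words whose count fell to 0
        results.append(dict(totals))
    return results


# One list of text lines per interval (made-up sample data).
batches = [["a b"], ["a"], ["c"], ["b b"]]
results = windowed_word_counts(batches, window=3)
```

Adding the newest batch and subtracting the oldest avoids recounting the whole window at every interval.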

  28. How does it compare to others? Throughput comparable to other commercial stream processing systems [ Refer to the paper for citations ]

  29. How fast can it recover from faults? • Recovery time improves with more frequent checkpointing and more nodes [Chart: Word Count over 30 sec window, with the failure marked]

  30. How fast can it recover from stragglers? • Speculative execution of slow tasks masks the effect of stragglers

  31. Unification with Batch and Interactive Processing

  32. Unification with Batch and Interactive Processing • Discretized Streams creates a single programming and execution model for running streaming, batch and interactive jobs • Combine live data streams with historic data: liveCounts.join(historicCounts).map(...) • Interactively query live streams: liveCounts.slice("21:00", "21:05").count()
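
The two queries can be mimicked on ordinary Python dictionaries. This is a sketch of the idea only, not the Spark API: `join` and `slice_batches` are hypothetical helpers, the data is made up, and treating the end of the time range as inclusive is an assumption.

```python
def join(left, right):
    """Inner join of two keyed datasets: key -> (left value, right value)."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


live_counts = {"/home": 12, "/docs": 3, "/new": 1}
historic_counts = {"/home": 1000, "/docs": 400, "/old": 50}

# Live counts paired with historic counts for the same page.
joined = join(live_counts, historic_counts)


def slice_batches(batches, start, end):
    """Batches whose zero-padded HH:MM start time lies in [start, end]."""
    return [batches[t] for t in sorted(batches) if start <= t <= end]


live_batches = {
    "20:59": ["/home"],
    "21:00": ["/home", "/docs"],
    "21:03": ["/home"],
    "21:06": ["/new"],
}
count = sum(len(batch) for batch in slice_batches(live_batches, "21:00", "21:05"))
```

Because each interval's data is an ordinary immutable dataset, the same operations used for batch jobs apply to live and historic data alike.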

  33. App combining live + historic data Mobile Millennium Project: Real-time estimation of traffic transit times using live and past GPS observations • Markov chain Monte Carlo simulations on GPS observations • Very CPU intensive • Scales linearly with cluster size

  34. Recent Related Work • Naiad – Full cluster rollback on recovery • SEEP – Extends continuous operators to enable parallel recovery, but does not handle stragglers • TimeStream – Recovery similar to upstream backup • MillWheel – State stored in BigTable, transactions per state update can be expensive

  35. Takeaways • Large scale streaming systems must handle faults and stragglers • Discretized Streams model streaming computation as a series of batch jobs • Uses simple techniques to exploit parallelism in streams • Scales to 100 nodes with 1 second latency • Recovers from failures and stragglers very fast • Spark Streaming is open source - spark-project.org • Used in production by ~ 10 organizations!
