
Dryad and dataflow systems



Presentation Transcript


  1. Dryad and dataflow systems Michael Isard misard@microsoft.com Microsoft Research 4th June, 2014

  2. Talk outline • Why is dataflow so useful? • What is Dryad? • An engineering sweet spot • Beyond Dryad • Conclusions

  3. Computation on large datasets • Performance mostly efficient resource use • Locality • Data placed correctly in memory hierarchy • Scheduling • Get enough work done before being interrupted • Decompose into independent batches • Parallel computation • Control communication and synchronization • Distributed computation • Writes must be explicitly shared

  4. Computational model • Vertices are independent • State and scheduling • Dataflow very powerful • Explicit batching and communication [Diagram: inputs → processing vertices → channels → outputs]

  5. Why dataflow now? • Collection-oriented programming model • Operations on collections of objects • Turn spurious (unordered) for into foreach • Not every for is foreach • Aggregation (sum, count, max, etc.) • Grouping • Join, Zip • Iteration • LINQ since ca 2008, now Spark via Scala, Java
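The collection operations listed above can be illustrated in a few lines. This is my own sketch (not from the talk), using plain Python in place of LINQ operators; the `orders` data is invented for illustration:

```python
from collections import defaultdict

orders = [("alice", 3), ("bob", 1), ("alice", 2)]

# Aggregation: sum, count, max over a collection
total = sum(qty for _, qty in orders)        # 6
largest = max(orders, key=lambda o: o[1])    # ("alice", 3)

# Grouping: key -> list of values
by_user = defaultdict(list)
for user, qty in orders:
    by_user[user].append(qty)

# Join (orders with emails on user) and Zip
emails = {"alice": "a@example.com", "bob": "b@example.com"}
joined = [(u, q, emails[u]) for u, q in orders if u in emails]
zipped = list(zip(["a", "b"], [1, 2]))

print(total, largest, dict(by_user), joined, zipped)
```

Each of these loops is a `foreach` in disguise: no iteration depends on the order of the previous one, which is exactly what lets a system run them data-parallel.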

  6. Well-chosen syntactic sugar. Given some lines of text, find the most commonly occurring words: read the lines from a file; split each line into its constituent words; count how many times each word appears; find the words with the highest counts. • var lines = FS.ReadAsLines(inputFileName); • var words = lines.SelectMany(x => x.Split(' ')); • var counts = words.CountInGroups(); • var highest = counts.OrderByDescending(x => x.count).Take(10); • Lambda expressions • Type inference (Collection<KeyValuePair<string,int>>) • int SortKey(KeyValuePair<string,int> x) { return x.count; } instead of int SortKey(void* x) { return ((KeyValuePair<string,int>*)x)->count; } • Generics and extension methods: Collection<T> Take(this Collection<T> c, int count) { … } instead of FooCollection FooTake(FooCollection c, int count) { … } [Diagram: example run — blue,4; yellow,3; red,2]
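The same four-step pipeline can be rendered in Python (my sketch, not the talk's code); the example input reproduces the blue/yellow/red counts shown on the slide:

```python
from collections import Counter

def top_words(lines, n=10):
    """Split lines into words, count per word, return the n most common."""
    words = (w for line in lines for w in line.split(" ") if w)
    return Counter(words).most_common(n)   # OrderByDescending(...).Take(n)

print(top_words(["blue blue yellow", "red yellow blue", "blue yellow red"], 3))
```

The point of the sugar is that each line states one data-parallel step; none of them says *how* the data is partitioned or moved.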

  7. Collections compile to dataflow • Each operator specifies a single data-parallel step • Communication between steps explicit • Collections reference collections, not individual objects! • Communication under control of the system • Partition, pipeline, exchange automatically • LINQ innovation: embedded user-defined functions, e.g. var words = lines.SelectMany(x => x.Split(' ')); • Very expressive • Programmer 'naturally' writes pure functions
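One way to see "collections compile to dataflow" is deferred execution: each operator call records a node in a plan rather than computing anything, so the system can later partition and place the steps. A minimal sketch (names and structure are my own, not LINQ internals):

```python
class Node:
    """One operator in a dataflow plan; nothing executes at call time."""
    def __init__(self, op, parents=()):
        self.op, self.parents = op, list(parents)

    def select_many(self, f):
        # In a real system f (the user-defined function) would be captured
        # and shipped to the workers; here we only record the step name.
        return Node("SelectMany", [self])

    def count_in_groups(self):
        return Node("CountInGroups", [self])

    def plan(self):
        """Yield operator names in dependency (topological) order."""
        for p in self.parents:
            yield from p.plan()
        yield self.op

lines = Node("Read")
counts = lines.select_many(lambda x: x.split(" ")).count_in_groups()
print(list(counts.plan()))   # ['Read', 'SelectMany', 'CountInGroups']
```

Because the program only describes the plan, the system is free to reorder, pipeline, and distribute it before anything runs.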

  8. Distributed sorting: var sorted = set.OrderBy(x => x.key) • Sample • Compute histogram • Range partition by key • Sort locally [Diagram: set → sorted]
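The sample → histogram → range-partition → local-sort plan can be sketched in a single process (assumptions mine: in a real deployment each partition would live on a different machine):

```python
import bisect, random

def range_sort(items, num_parts=4, sample_size=8):
    # 1. Sample keys; the sorted sample stands in for the histogram step.
    sample = sorted(random.sample(items, min(sample_size, len(items))))
    # 2. Choose num_parts-1 boundary keys from the sample.
    step = max(1, len(sample) // num_parts)
    boundaries = sample[step::step][:num_parts - 1]
    # 3. Range-partition every item by its key.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for x in items:
        parts[bisect.bisect_right(boundaries, x)].append(x)
    # 4. Sort each partition locally; concatenation is globally sorted,
    #    because every key in part i is <= every key in part i+1.
    for p in parts:
        p.sort()
    return [x for p in parts for x in p]

print(range_sort(list(range(20, 0, -1))))
```

Sampling matters: boundaries drawn from the data keep the partitions roughly equal in size, so no single machine becomes the straggler.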

  9. Quiet revolution in parallelism • Programming model is more attractive • Simpler, more concise, readable, maintainable • Program is easier to optimize • Programmer separates computation and communication • System can re-order, distribute, batch, etc. etc.

  10. Talk outline • Why is dataflow so useful? • What is Dryad? • An engineering sweet spot • Beyond Dryad • Conclusions

  11. What is Dryad? • General-purpose DAG execution engine ca 2005 • Cited as inspiration for e.g. Hyracks, Tez • Engine behind Microsoft Cosmos/SCOPE • Initially MSN Search/Bing, now used throughout MSFT • Core of research batch cluster environment ca 2009 • DryadLINQ • Quincy scheduler • TidyFS

  12. What Dryad does • Abstracts cluster resources • Set of computers, network topology, etc. • Recovers from transient failures • Rerun computations on machine or network fault • Speculate duplicates for slow computations • Schedules a local DAG of work at each vertex

  13. Scheduling and fault tolerance • DAG makes things easy • Schedule from source to sink in any order • Re-execute subgraph on failure • Execute “duplicates” for slow vertices
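A toy scheduler (my sketch, not Dryad's code) shows why the DAG makes both properties easy: a vertex runs once its inputs exist, and a lost output is recovered by re-running just that vertex, because its inputs are still on disk and the computation is deterministic:

```python
def run_dag(edges, compute):
    """edges: vertex -> list of input vertices. Runs inputs before consumers."""
    outputs = {}
    def execute(v):
        if v not in outputs:
            ins = [execute(u) for u in edges.get(v, [])]   # sources first
            outputs[v] = compute(v, ins)
        return outputs[v]
    for v in edges:
        execute(v)
    return outputs

edges = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
compute = lambda v, ins: v + "".join(ins)   # deterministic, pure
out = run_dag(edges, compute)
assert out["d"] == "dcab"

# Fault tolerance: if c's output is lost, re-execute only c from its inputs.
del out["c"]
out["c"] = compute("c", [out["a"], out["b"]])
print(out["c"])   # same result as before: "cab"
```

The same property justifies duplicates for slow vertices: running a second copy is harmless because both copies produce identical output.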

  17. Resources are virtualized • Each graph vertex is a process • Writes outputs to disk (usually) • Reads inputs from upstream nodes’ output files • Graph generally larger than cluster RAM • 1TB partitioned input, 250MB part size, 4000 parts • Cluster is shared • Don’t size program for exact cluster • Use whatever share of resources are available

  18. Integrated system • Collection-oriented programming model (LINQ) • Partitioned file system (TidyFS) • Manages replication and distribution of large data • Cluster scheduler (Quincy) • Jointly schedule multiple jobs at a time • Fine-grain multiplexing between jobs • Balance locality and fairness • Monitoring and debugging (Artemis) • Within job and across jobs

  19.–20. Dryad cluster scheduling [Animation: the scheduler placing work on the cluster]

  21. Quincy without preemption

  22. Quincy with preemption

  23. Dryad features • Well-tested at scales up to 15k cluster computers • In heavy production use for 8 years • Dataflow graph is mutable at runtime • Repartition to avoid skew • Specialize matrices dense/sparse • Harden fault-tolerance

  24. Talk outline • Why is dataflow so useful? • What is Dryad? • An engineering sweet spot • Beyond Dryad • Conclusions

  25. Stateless DAG dataflow • MapReduce, Dryad, Spark, … • Stateless vertex constraint hampers performance • Iteration and streaming overheads • Why does this design keep repeating?

  26. Software engineering • Fault tolerance well understood • E.g., Chandy-Lamport, rollback recovery, etc. • Basic mechanism: checkpoint plus log • Stateless DAG: no checkpoint! • Programming model “tricked” user • All communication on typed channels • Only channel data needs to be persisted • Fault tolerance comes without programmer effort • Even with UDFs

  27. Talk outline • Why is dataflow so useful? • What is Dryad? • An engineering sweet spot • Beyond Dryad • Conclusions

  28. What about stateful dataflow? • Naiad • Add state to vertices • Support streaming and iteration • Opportunities • Much lower latency • Can model mutable state with dataflow • Challenges • Scheduling • Coordination • Fault tolerance

  29. Timely dataflow [Diagram: batch processing, stream processing, and graph processing unified by timely dataflow]

  30. Batching (synchronous) vs. streaming (asynchronous) • Batching: requires coordination; supports aggregation • Streaming: no coordination needed; aggregation is difficult

  31. Batch DAG execution • Central coordinator

  32. Streaming DAG execution

  33. Streaming DAG execution • Inline coordination

  34. Batch iteration • Central coordinator

  35. Streaming iteration

  36. Messages • B.SendBy(edge, message, time) • C.OnRecv(edge, message, time) • Messages are delivered asynchronously [Diagram: vertices B → C → D]

  37. Notifications • C.SendBy(_, _, time) • D.NotifyAt(time) • D.OnRecv(_, _, time) • D.OnNotify(time) • Notifications support batching • OnNotify(time) guarantees no more messages at time or earlier [Diagram: vertices B → C → D]
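The message/notification contract can be sketched as a vertex object. The method names follow the slides; the implementation (a counting vertex, and the external "progress tracker" reduced to an explicit call) is my own assumption:

```python
class CountingVertex:
    """Counts messages per timestamp; OnNotify fires once a time is complete."""
    def __init__(self):
        self.counts = {}        # time -> number of messages received
        self.requested = set()  # times for which NotifyAt was called
        self.delivered = []     # completed (time, count) notifications

    def on_recv(self, edge, message, time):
        self.counts[time] = self.counts.get(time, 0) + 1

    def notify_at(self, time):
        self.requested.add(time)

    def on_notify(self, time):
        # Guarantee: no more messages at `time` or earlier will arrive,
        # so the per-time aggregate is now final.
        if time in self.requested:
            self.delivered.append((time, self.counts.get(time, 0)))

d = CountingVertex()
d.notify_at(1)
for t, m in [(1, "x"), (2, "y"), (1, "z")]:
    d.on_recv("B->D", m, t)      # asynchronous delivery, any order
d.on_notify(1)                   # progress tracking decides time 1 is done
print(d.delivered)               # [(1, 2)]
```

Messages give streaming's low latency; the notification gives exactly the synchronization point that aggregation needs, and nothing more.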

  38. Coordination in timely dataflow • Local scheduling with global progress tracking • Coordination with a shared counter, not a scheduler • Efficient, scalable implementation

  39. Interactive graph analysis [Diagram: dataflow of joins (⋈) and max over 32K tweets/s and 10 queries/s]

  40. 32 × 8-core 2.1 GHz AMD Opteron servers, 16 GB RAM per server, Gigabit Ethernet • Query latency — max: 140 ms; 99th percentile: 70 ms; median: 5.2 ms

  41. Mutable state • In batch DAG systems collections are immutable • Functional definition in terms of preceding subgraph • Adding streaming or iteration introduces mutability • Collection varies as function of epoch, loop iteration

  42. Key-value store as dataflow var lookup = data.join(query, d => d.key, q => q.key) • Modeled random access with dataflow… • Add/remove key is streaming update to data • Look up key is streaming update to query • High throughput requires batching • But that was true anyway, in general
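The join on this slide can be sketched as two event handlers over shared state, one per input stream (structure and names are my own; a real Naiad operator would also carry timestamps):

```python
class JoinLookup:
    """Key-value lookups as a streaming join of a data stream and a query stream."""
    def __init__(self):
        self.data = {}        # state built from the 'data' input
        self.pending = {}     # queries that arrived before their key
        self.results = []     # emitted (query id, value) pairs

    def on_data(self, key, value):
        """Add/update a key: a streaming update to the data input."""
        self.data[key] = value
        for qid in self.pending.pop(key, []):   # answer buffered queries
            self.results.append((qid, value))

    def on_query(self, qid, key):
        """Look up a key: a streaming update to the query input."""
        if key in self.data:
            self.results.append((qid, self.data[key]))
        else:
            self.pending.setdefault(key, []).append(qid)

kv = JoinLookup()
kv.on_query("q1", "k1")      # arrives before the data: buffered
kv.on_data("k1", 10)         # join completes, q1 is answered
kv.on_query("q2", "k1")      # immediate match
print(kv.results)            # [("q1", 10), ("q2", 10)]
```

Per-event calls like this are why high throughput requires batching, as the slide notes: the dataflow model is fine, but one event per message is not.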

  43. What can’t dataflow do? • Programming model for mutable state? • Not as intuitive as functional collection manipulation • Policies for placement still primitive • Hash everything and hope • Great research opportunities • Intersection of OS, network, runtime, language

  44. Talk outline • Why is dataflow so useful? • What is Dryad? • An engineering sweet spot • Beyond Dryad • Conclusions

  45. Conclusions • Dataflow is a great structuring principle • We know good programming models • We know how to write high-performance systems • Dataflow is the status quo for batch processing • Mutable state is the current research frontier Apache 2.0 licensed source on GitHub http://research.microsoft.com/en-us/um/siliconvalley/projects/BigDataDev/
