Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Presented by: Theodoros Ioannou
Why Dryad
• An efficient way to build parallel and distributed applications
• Takes advantage of multicore servers
• Exploits data parallelism
• Motivation: GPUs, MapReduce, and parallel databases
What is Dryad
• A dataflow graph made of vertices and channels
• Dryad executes the vertices, which communicate through the channels
What is Dryad (cont'd)
• Vertices: sequential programs supplied by the programmer
• Channels: files, TCP pipes, or shared-memory FIFOs (a toy model of the graph follows below)
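To make the abstraction concrete, here is a minimal Python sketch of the vertex/channel model. This is not the real Dryad API (which is C++); the names `Vertex`, `Graph`, `vertex`, and `channel` are illustrative assumptions. Vertices wrap sequential functions, channels are edges, and the graph runs in topological order.

```python
from collections import defaultdict, deque

class Vertex:
    def __init__(self, name, fn):
        self.name = name   # unique vertex name
        self.fn = fn       # the sequential program: list of inputs -> output

class Graph:
    def __init__(self):
        self.vertices = {}
        self.out = defaultdict(list)   # channels: producer -> consumers
        self.indeg = defaultdict(int)  # number of incoming channels per vertex

    def vertex(self, name, fn):
        v = Vertex(name, fn)
        self.vertices[name] = v
        return v

    def channel(self, src, dst):
        self.out[src.name].append(dst.name)
        self.indeg[dst.name] += 1

    def run(self):
        # Topological execution: a vertex runs once all of its inputs arrived.
        indeg = {n: self.indeg[n] for n in self.vertices}
        inputs = defaultdict(list)
        ready = deque(n for n in self.vertices if indeg[n] == 0)
        results = {}
        while ready:
            n = ready.popleft()
            results[n] = self.vertices[n].fn(inputs[n])
            for m in self.out[n]:
                inputs[m].append(results[n])
                indeg[m] -= 1
                if indeg[m] == 0:
                    ready.append(m)
        return results

g = Graph()
a = g.vertex("read", lambda _: [3, 1, 4, 1, 5])
b = g.vertex("sort", lambda ins: sorted(ins[0]))
c = g.vertex("sum",  lambda ins: sum(ins[0]))
g.channel(a, b)
g.channel(b, c)
print(g.run()["sum"])   # 14
```

In real Dryad the channel implementation (file, TCP pipe, or FIFO) is chosen per edge; here every channel is just an in-memory value passed along.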
Differences from other systems
• Allows developers to define the communication between vertices
• More difficult, but provides better options
• A programmer can master it in a few weeks
• Not as restrictive as MapReduce: multiple inputs and outputs
• Scales from multicore computers to clusters (~1,800 machines)
System Overview
• Everything is based on the communication flow
• Every vertex runs on a CPU of the cluster
• Channels are the data flows between the vertices
• The logical communication graph is mapped to physical resources at run time (sketched below)
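A hedged sketch of the run-time mapping idea: logical vertices are bound to physical machines only when the job executes. The round-robin policy and machine names here are illustrative assumptions, not Dryad's actual scheduler (which is greedy and locality-aware, as later slides discuss).

```python
from itertools import cycle

def map_to_cluster(vertex_names, machines):
    """Assign each logical vertex to a physical machine (toy policy)."""
    slot = cycle(machines)
    return {v: next(slot) for v in vertex_names}

placement = map_to_cluster(["read", "sort", "sum"], ["m01", "m02"])
print(placement)   # {'read': 'm01', 'sort': 'm02', 'sum': 'm01'}
```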
SQL Example
"It finds all the objects in the database that have neighboring objects within 30 arc seconds such that at least one of the neighbors has a color similar to the primary object's color."
Execution
• Input: the data file is a distributed file
• The graph is changed dynamically according to the positions of the data-file partitions
• Output: the result is again a distributed file
• The scheduler on the job manager (JM) keeps a history of each vertex
• On a failure the job would be terminated; re-executing (replicating) vertices avoids that
• Versioning is used to obtain the right result
• The job only fails if a vertex is re-run more than a threshold number of times (see the sketch below)
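A minimal sketch of this retry-with-versioning policy, under an assumed in-process model: each execution of a vertex gets a version number so downstream vertices consume the output of exactly one successful run. `MAX_RETRIES` and the exception-based failure signal are illustrative, not values or mechanisms from the paper.

```python
MAX_RETRIES = 5   # illustrative threshold, not the paper's actual value

def execute_with_retries(vertex_fn, inputs):
    version = 0
    while version < MAX_RETRIES:
        version += 1
        try:
            # Tag the output with the version so downstream vertices can
            # consume the result of exactly one successful execution.
            return version, vertex_fn(inputs)
        except Exception:
            continue   # schedule another version of the vertex
    raise RuntimeError("vertex failed too many times; terminating the job")

print(execute_with_retries(sum, [1, 2, 3]))   # (1, 6)
```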
Execution (cont'd)
• The JM assumes it is the only job running on the cluster
• It uses a greedy scheduling algorithm
• Vertex programs are deterministic: they give the same result whenever you run them
• On a vertex failure the JM is notified or gets a heartbeat timeout
• If the channels are FIFOs or pipes, all connected vertices are killed and re-executed (see the sketch below)
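The last point can be sketched as a graph traversal: when channels are TCP pipes or shared-memory FIFOs, a failed vertex's inputs cannot be replayed from disk, so its whole connected component over pipe/FIFO edges must be re-executed. The data structures below are assumptions for illustration.

```python
def vertices_to_restart(failed, pipe_channels):
    """pipe_channels: dict vertex -> set of neighbours joined by pipe/FIFO."""
    stack, doomed = [failed], set()
    while stack:
        v = stack.pop()
        if v in doomed:
            continue
        doomed.add(v)
        stack.extend(pipe_channels.get(v, ()))
    return doomed

pipes = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(vertices_to_restart("B", pipes))   # e.g. {'A', 'B', 'C'}
```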
Execution (cont'd)
• Vertices run on the machines (or cluster) as close as possible to the data they use (sketched below)
• Because the JM cannot know the amount of intermediate data in advance, a dynamic solution is needed
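A hedged sketch of locality-preferring placement: given the machines that already hold a vertex's input data, pick a free one among them if possible. This is a simplification of the paper's greedy, locality-aware scheduling; the machine names are made up.

```python
def place_vertex(input_locations, free_machines):
    """input_locations: machines holding the data; prefer running locally."""
    for m in free_machines:
        if m in input_locations:
            return m                  # run next to the data
    return free_machines[0]           # otherwise fall back to any free machine

print(place_vertex({"m07", "m12"}, ["m03", "m12", "m20"]))   # m12
```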
Experiments
• First: an SQL query turned into a Dryad application (compared against SQL Server while varying the number of machines used)
• Second: a simple MapReduce data-mining operation turned into a Dryad application (10.2 TB of data on 1,800 machines)
• Uses horizontal partitioning of the data, pipelined parallelism within processes, and inter-partition exchange operations to move partial results (sketched below)
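A toy sketch of the inter-partition "exchange" idea: partial results computed on each partition are re-partitioned on a key so that records with equal keys meet in the same downstream partition. The hash function and record layout are illustrative assumptions, kept deterministic so the example output is stable.

```python
def exchange(partitions, key, n_out):
    out = [[] for _ in range(n_out)]
    for part in partitions:           # partial results, one list per node
        for record in part:
            # deterministic toy hash on the key
            bucket = sum(map(ord, key(record))) % n_out
            out[bucket].append(record)
    return out

parts = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
print(exchange(parts, key=lambda r: r[0], n_out=2))
# [[('b', 2)], [('a', 1), ('a', 3), ('c', 4)]]
```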
Shortcomings and Future Work
• The programmer can manipulate inter-process communication, which can lead to deadlocks
• The programmer should know the physical resources of the system, which breaks the abstraction
• The assumption of one job on the cluster means only one job can run at a time
• SQL experiment: fewer capabilities than SQL Server
• MapReduce experiment: only shows that the system works "sufficiently well" for those cases, with no detailed results
• Future work: use statistics to predict resources before executing a known program ("we may be able to...")
• Simplicity is sacrificed: the code is more relaxed (less constrained) than MapReduce's