Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks • Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly • Microsoft Research, Silicon Valley • Presented by: Thomas Hummel
Agenda • Introduction • System Overview • Dryad Graph • Program Development • Program Execution • Experimental Results • Future Work
Introduction • Problem • How can efficient distributed programs be written easily? • Environment • Parallel Processors • High-Speed Links • Single Administrative Domain • Low-Level Issues Hidden From the Developer
Introduction • Parallel Execution • Goal: Faster Execution • Parallelism Specified Automatically or Manually • Existing Approaches • GPU Shaders • Distributed Databases • MapReduce
Introduction • Graph Model • Vertices Are Programs • Edges Are Communication Links • Forces a Parallel Mindset • A Necessary Abstraction
Introduction • GPU Shaders • Low Level • Hardware Specific • MapReduce • Simplicity Is Paramount • Performance Sacrificed • Distributed Databases • Implicit Communication • Optimized Via Relational Algebra
Introduction • Dryad • Fine-Grained Communication Control • Multiple Input/Output Sets per Vertex • Developer Must Consider Resources • Execution Engine • Executes a DAG of Programs • Vertex Outputs Feed Other Vertices' Inputs • No Recursion (Graph Must Be Acyclic)
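To make the execution-engine rule concrete, here is a minimal Python sketch, assuming each vertex program is an ordinary function that receives a list of its predecessors' outputs. The names `topological_order` and `run_dag` are hypothetical, and everything runs in one process, unlike Dryad; the sketch only shows outputs flowing to downstream inputs and a cycle (recursion) being rejected before anything runs.

```python
from collections import deque

def topological_order(vertices, edges):
    """Order vertices so every edge (src, dst) has src before dst.

    Raises ValueError on a cycle, since a job graph must be acyclic.
    """
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(vertices):
        raise ValueError("job graph contains a cycle")
    return order

def run_dag(programs, edges):
    """Run each vertex program once, passing it its predecessors' outputs."""
    order = topological_order(list(programs), edges)
    preds = {v: [] for v in programs}
    for src, dst in edges:
        preds[dst].append(src)
    outputs = {}
    for v in order:
        outputs[v] = programs[v]([outputs[p] for p in preds[v]])
    return outputs

# Hypothetical three-vertex job: read -> sort -> sum.
programs = {
    "read": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs[0]),
    "sum": lambda inputs: sum(inputs[0]),
}
result = run_dag(programs, [("read", "sort"), ("sort", "sum")])
print(result["sum"])  # 6
```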
System Overview • Dryad Job • DAG • Data Passed Along Edges • Each Vertex Is a Program • Message Structure • User Defined • Channel Types • Shared Memory • TCP • Files
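As an illustration of a user-defined message structure travelling over a file channel, the sketch below assumes a hypothetical `WordCount` record serialized as JSON lines. The JSON encoding, the `FileChannel` class, and both vertex functions are assumptions made for this example; the point is only that the producer fully writes its output before the consumer reads it back.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class WordCount:
    # Hypothetical user-defined item type carried on the edge.
    word: str
    count: int

class FileChannel:
    """File-backed channel: the producer writes a whole stream, the consumer reads it later."""
    def __init__(self, path):
        self.path = path

    def write_all(self, items):
        with open(self.path, "w", encoding="utf-8") as f:
            for item in items:
                f.write(json.dumps(asdict(item)) + "\n")

    def read_all(self):
        with open(self.path, encoding="utf-8") as f:
            return [WordCount(**json.loads(line)) for line in f]

def count_vertex(words, out):
    """Upstream vertex: count words and emit one record per distinct word."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    out.write_all(WordCount(w, c) for w, c in sorted(counts.items()))

def top_vertex(inp):
    """Downstream vertex: pick the most frequent word."""
    return max(inp.read_all(), key=lambda r: r.count)

with tempfile.TemporaryDirectory() as tmp:
    channel = FileChannel(os.path.join(tmp, "edge_0.jsonl"))
    count_vertex(["a", "b", "a"], channel)
    result = top_vertex(channel)
print(result)  # WordCount(word='a', count=2)
```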
System Overview • System Organization • Job Manager • Name Server • Daemons (Worker Nodes)
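The division of labour between these three components can be sketched in a few classes. This is a hypothetical, single-process model: the name server is an in-memory registry of daemons, and the job manager assigns vertices round-robin, which is a simplification (a real scheduler would also weigh data locality and machine load).

```python
from dataclasses import dataclass, field

@dataclass
class Daemon:
    """Per-machine daemon that runs vertex programs for the job manager."""
    machine: str
    log: list = field(default_factory=list)

    def run(self, vertex_name, program, *args):
        self.log.append(vertex_name)
        return program(*args)

class NameServer:
    """In-memory registry of the daemons that are available."""
    def __init__(self):
        self._daemons = {}

    def register(self, daemon):
        self._daemons[daemon.machine] = daemon

    def available(self):
        return list(self._daemons.values())

class JobManager:
    """Finds daemons through the name server and hands out vertices round-robin."""
    def __init__(self, name_server):
        self.name_server = name_server

    def dispatch(self, vertices):
        daemons = self.name_server.available()
        if not daemons:
            raise RuntimeError("no daemons registered")
        results = {}
        for i, (name, program, args) in enumerate(vertices):
            daemon = daemons[i % len(daemons)]
            results[name] = (daemon.machine, daemon.run(name, program, *args))
        return results

ns = NameServer()
for machine in ["node1", "node2"]:
    ns.register(Daemon(machine))
out = JobManager(ns).dispatch([
    ("v0", sum, ([1, 2],)),
    ("v1", max, ([4, 9],)),
    ("v2", len, ("abc",)),
])
print(out)  # {'v0': ('node1', 3), 'v1': ('node2', 9), 'v2': ('node1', 3)}
```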
Dryad Graph • Graph Description Language • “Embedded” in C++ • Combine Sub-Graphs • C++ Class • Inherited By Vertex Program • Program Name • Program Factory
Dryad Graph • Vertex Creation • C++ Class • Inherited By Vertex Program • Program Name • Program Factory • A Single Vertex Is a Graph • Factory Called • Program-Specific Arguments Applied
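The factory pattern described here might look like the following minimal sketch, assuming a registry that maps program names to classes derived from a common base. `register`, `create_vertex`, `VertexProgram`, and `Grep` are hypothetical names; the returned one-vertex `Graph` reflects the point that a single vertex is itself a graph that can later be composed.

```python
from dataclasses import dataclass

_FACTORIES = {}  # program name -> factory (here simply the class)

def register(name):
    """Class decorator recording a factory under a program name (hypothetical helper)."""
    def wrap(cls):
        _FACTORIES[name] = cls
        return cls
    return wrap

class VertexProgram:
    """Base class every vertex program inherits from."""
    def __init__(self, **args):
        self.args = args  # program-specific arguments applied at creation time

    def run(self, items):
        raise NotImplementedError

@dataclass(frozen=True)
class Graph:
    vertices: tuple
    inputs: tuple
    outputs: tuple

def create_vertex(name, **args):
    """Call the factory registered for `name`, then wrap the vertex as a one-vertex graph."""
    vertex = _FACTORIES[name](**args)
    return Graph((vertex,), (vertex,), (vertex,))

@register("grep")
class Grep(VertexProgram):
    def run(self, items):
        return [line for line in items if self.args["pattern"] in line]

g = create_vertex("grep", pattern="err")
print(g.vertices[0].run(["ok", "err: disk", "err: net"]))  # ['err: disk', 'err: net']
```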
Dryad Graph • Edge Creation • Composition (Combine) Operation • Two Graphs • Varying Edge Assignment Methods (e.g. Pointwise, Complete Bipartite)
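Two of the assignment methods, loosely following the pointwise (`>=`) and complete bipartite (`>>`) compositions described in the Dryad paper, can be sketched as plain functions. The function names, the `R[i]` naming for clones, and the round-robin wrap when the two sides differ in size are assumptions of this sketch, not the actual C++ operators.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Graph:
    """A (sub)graph: its vertices, edges, and designated input/output vertices."""
    vertices: tuple
    edges: tuple
    inputs: tuple
    outputs: tuple

def vertex(name):
    # A single vertex is a graph whose one vertex is both input and output.
    return Graph((name,), (), (name,), (name,))

def clone(name, k):
    """k independent copies of the same vertex program."""
    names = tuple(f"{name}[{i}]" for i in range(k))
    return Graph(names, (), names, names)

def _join(a, b, new_edges):
    # The combined graph keeps A's inputs and B's outputs as its interface.
    return Graph(a.vertices + b.vertices, a.edges + b.edges + tuple(new_edges),
                 a.inputs, b.outputs)

def pointwise(a, b):
    """Pair A's outputs with B's inputs, wrapping round-robin over the smaller side."""
    n = max(len(a.outputs), len(b.inputs))
    return _join(a, b, [(a.outputs[i % len(a.outputs)], b.inputs[i % len(b.inputs)])
                        for i in range(n)])

def bipartite(a, b):
    """Connect every output of A to every input of B."""
    return _join(a, b, [(o, i) for o in a.outputs for i in b.inputs])

g = bipartite(pointwise(clone("R", 2), clone("X", 2)), vertex("M"))
print(g.edges)
# (('R[0]', 'X[0]'), ('R[1]', 'X[1]'), ('X[0]', 'M'), ('X[1]', 'M'))
```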
Dryad Graph • Communication Channel • File I/O By Default • TCP • Shared Memory • Pitfall: Connected Vertices Must Run In the Same Process • Deadlock Avoidance • Acyclic (DAG) Structure
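The same-process restriction can be checked before a job is submitted. The sketch below assumes a hypothetical plan that tags each edge with a channel kind (file unless stated otherwise) and maps each vertex to a process id; it reports every shared-memory edge whose endpoints were placed in different processes.

```python
from enum import Enum

class ChannelKind(Enum):
    FILE = "file"          # default: written to disk, read later by the consumer
    TCP = "tcp"            # network pipe between two running vertices
    SHARED_MEMORY = "shm"  # in-memory FIFO

def make_edge(src, dst, kind=ChannelKind.FILE):
    """Hypothetical helper: edges use file channels unless told otherwise."""
    return (src, dst, kind)

def check_placement(edges, process_of):
    """Return (src, dst) pairs whose shared-memory channel crosses processes."""
    return [
        (src, dst)
        for src, dst, kind in edges
        if kind is ChannelKind.SHARED_MEMORY and process_of[src] != process_of[dst]
    ]

edges = [
    make_edge("read", "parse", ChannelKind.SHARED_MEMORY),
    make_edge("parse", "agg", ChannelKind.TCP),
    make_edge("agg", "write"),
]
placement = {"read": "p1", "parse": "p1", "agg": "p2", "write": "p3"}
print(check_placement(edges, placement))                     # []
print(check_placement(edges, {**placement, "parse": "p2"}))  # [('read', 'parse')]
```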
Program Development • Vertex Program Development • C++ Base Classes • Status And Errors Reported to Job Manager • Standard “Main” Method • Channel Readers/Writers • Supplied Via Argument List • Legacy Programs • C++ Wrapper
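A Python analogue of the vertex base class could look like this. It is a sketch, assuming readers are iterables and writers are lists passed in as argument lists, and that a hypothetical `StatusReporter` stands in for the connection to the job manager; `LegacyWrapper` adapts an existing line-processing function in the way a C++ wrapper adapts a legacy program.

```python
from abc import ABC, abstractmethod

class StatusReporter:
    """Stand-in for the vertex's connection to the job manager."""
    def __init__(self):
        self.messages = []

    def report(self, vertex, status, detail=""):
        self.messages.append((vertex, status, detail))

class Vertex(ABC):
    def __init__(self, name, reporter):
        self.name = name
        self.reporter = reporter

    def execute(self, readers, writers):
        """Run main() and report completion or the error to the job manager."""
        try:
            self.main(readers, writers)
        except Exception as exc:
            self.reporter.report(self.name, "failed", repr(exc))
            return False
        self.reporter.report(self.name, "completed")
        return True

    @abstractmethod
    def main(self, readers, writers):
        """Standard entry point; channel readers and writers arrive as lists."""

class Concat(Vertex):
    def main(self, readers, writers):
        for reader in readers:
            writers[0].extend(reader)

class LegacyWrapper(Vertex):
    """Adapts an existing function that maps a list of lines to a list of lines."""
    def __init__(self, name, reporter, legacy_fn):
        super().__init__(name, reporter)
        self.legacy_fn = legacy_fn

    def main(self, readers, writers):
        lines = [line for reader in readers for line in reader]
        writers[0].extend(self.legacy_fn(lines))

reporter = StatusReporter()
joined, upper = [], []
Concat("cat", reporter).execute([["a"], ["b", "c"]], [joined])
LegacyWrapper("upper", reporter, lambda lines: [s.upper() for s in lines]).execute([joined], [upper])
failed_ok = LegacyWrapper("bad", reporter, lambda lines: 1 / 0).execute([joined], [[]])
print(upper, failed_ok)  # ['A', 'B', 'C'] False
```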
Program Development • Pipelined Execution • Assumes Sequential Vertex Code • Event-Based Programming • Channels Are Asynchronous • Thread Pool • Optimized For Vertices
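Pipelining can be approximated with a thread pool and one queue per channel, as in this sketch. Blocking `queue.Queue` objects stand in for Dryad's asynchronous, event-driven channels, and `_END` is a hypothetical end-of-stream marker; each stage forwards an item as soon as it is processed, so downstream stages start work before upstream ones finish.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_END = object()  # end-of-stream marker

def stage(fn, inbox, outbox):
    """Apply fn to each item as it arrives and forward the result downstream."""
    try:
        while True:
            item = inbox.get()
            if item is _END:
                return
            outbox.put(fn(item))
    finally:
        outbox.put(_END)  # always close the downstream channel, even on error

def run_pipeline(fns, items):
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    with ThreadPoolExecutor(max_workers=len(fns)) as pool:
        futures = [pool.submit(stage, fn, queues[i], queues[i + 1])
                   for i, fn in enumerate(fns)]
        for item in items:
            queues[0].put(item)
        queues[0].put(_END)
        for f in futures:
            f.result()  # re-raise any error from a stage
    results = []
    while True:
        item = queues[-1].get()
        if item is _END:
            return results
        results.append(item)

print(run_pipeline([lambda x: x + 1, lambda x: x * 10], [1, 2, 3]))  # [20, 30, 40]
```

Closing the downstream queue in `finally` keeps a failing stage from leaving later stages blocked forever.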
Program Execution • Job Manager • Job Ends If the JM Machine Fails • Different Schemes Possible To Avoid This • Versioning System For Execution Instances • Vertex Execution • Starts When All Input Channels Are Ready • User Can Specify Execution Machine • Can Be Re-Run On Failure • Job Ends After All Vertices Have Run
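The readiness and re-run rules can be sketched as a small scheduling loop. It assumes a hypothetical `attempt_fn(vertex, version)` hook that returns `True` on success, gives each retry a new version number (echoing the versioning of execution instances), and uses an arbitrary limit of three attempts before the job is declared failed.

```python
def schedule(vertices, preds, attempt_fn, max_attempts=3):
    """Run each vertex once all of its inputs are ready; retry failures as new versions.

    preds: vertex -> list of upstream vertices.
    Returns vertex -> the version number that succeeded.
    """
    done = {}
    pending = list(vertices)
    while pending:
        ready = [v for v in pending if all(p in done for p in preds.get(v, []))]
        if not ready:
            raise RuntimeError("no runnable vertex; the graph may contain a cycle")
        for v in ready:
            for version in range(1, max_attempts + 1):
                if attempt_fn(v, version):
                    done[v] = version
                    break
            else:
                raise RuntimeError(f"vertex {v} failed {max_attempts} times; job fails")
            pending.remove(v)
    return done

# Simulated flaky run: the first execution of B fails, its second version succeeds.
failures = {("B", 1)}

def attempt(v, version):
    return (v, version) not in failures

versions = schedule(["A", "B", "C"], {"B": ["A"], "C": ["B"]}, attempt)
print(versions)  # {'A': 1, 'B': 2, 'C': 1}
```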
Program Execution • Fault Tolerance • Re-Run Vertex If Failed • Channel Re-Creation (File Recreation) • TCP/Shared Memory Failures Cause Failures On All Connected Vertices • Staged Execution Allows Intermediate Error Checking
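Working out which vertices must be re-executed after a failure can be sketched as a graph search. The sketch assumes edges tagged `"file"`, `"tcp"`, or `"shm"`: a failure spreads across TCP and shared-memory channels to every connected vertex, and a lost input file is recreated by re-running the vertex that produced it. `rerun_set` is a hypothetical name.

```python
def rerun_set(failed, edges, missing_files=()):
    """Vertices to re-execute after `failed` crashes.

    edges: list of (src, dst, kind) with kind in {"file", "tcp", "shm"}.
    missing_files: (src, dst) file edges whose data was lost.
    """
    # Vertices joined by TCP or shared-memory channels fail together, so collect
    # the whole group connected through those edges, in either direction.
    pipe_neighbors = {}
    for src, dst, kind in edges:
        if kind != "file":
            pipe_neighbors.setdefault(src, set()).add(dst)
            pipe_neighbors.setdefault(dst, set()).add(src)
    to_rerun, stack = set(), [failed]
    while stack:
        v = stack.pop()
        if v in to_rerun:
            continue
        to_rerun.add(v)
        stack.extend(pipe_neighbors.get(v, ()))
        # A lost input file is regenerated by re-running the vertex that wrote it.
        for src, dst in missing_files:
            if dst == v:
                stack.append(src)
    return to_rerun

edges = [("A", "B", "file"), ("B", "C", "tcp"), ("C", "D", "file")]
print(sorted(rerun_set("C", edges)))                                # ['B', 'C']
print(sorted(rerun_set("C", edges, missing_files=[("A", "B")])))  # ['A', 'B', 'C']
```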
Experimental Results • SQL Operation • 10-Computer Cluster • Gigabit Connections • Data Mining Operation • 1800-Computer Cluster • 10 TB Data Set • 11-Minute Execution Time
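A rough back-of-envelope reading of the data-mining numbers, assuming decimal terabytes, that the whole data set is read exactly once, and that work is spread evenly across all 1800 computers (none of which the slide states):

```latex
\begin{aligned}
\text{aggregate rate} &\approx \frac{10 \times 10^{12}\,\text{B}}{11 \times 60\,\text{s}}
  = \frac{10^{13}\,\text{B}}{660\,\text{s}} \approx 1.5 \times 10^{10}\,\text{B/s} \approx 15\,\text{GB/s} \\
\text{per-machine rate} &\approx \frac{10^{13}\,\text{B}}{660\,\text{s} \times 1800}
  \approx 8.4 \times 10^{6}\,\text{B/s} \approx 8.4\,\text{MB/s}
\end{aligned}
```

This ignores data exchanged between stages, stragglers, and startup overhead, so it is only an average scan rate, not a measured figure.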
Future Work • Scripting Language • Nebula • Additional Abstraction • SSIS Integration • SQL Server Integration • Distributed SQL Queries • Query Optimizer