Pregel: A System for Large-Scale Graph Processing. Presented by Dylan Davis. Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (GOOGLE, INC.)
Overview • What is a graph? • Graph Problems • The Purpose of Pregel • Model of Computation • C++ API • Implementation • Applications • Experiments
What is a graph? G = (V, E): a set of vertices V and a set of edges E. [Figure: Binary Tree]
Graph Problems • Social Network Connections • Network Routing
The Purpose of Pregel • Google was interested in applications that could perform internet-related graph algorithms, such as PageRank, so they designed Pregel to perform these tasks efficiently. • It is a scalable, general-purpose system for implementing graph algorithms in a distributed environment. • Focus on “Thinking Like a Vertex” and parallelism
Model of Computation (Vertex) [Figure: a vertex with its Vertex ID and Vertex Value; each edge is labelled with an Edge Value and a Vertex ID]
Model of Computation (Superstep) [Figure: Compute() calls grouped into Superstep 0, Superstep 1, and Superstep 2; axis: Execution Time]
Model of Computation (Vertex Actions) A vertex can: • Modify its value and the values of its outgoing edges • Receive messages sent in the previous superstep • Send messages • Request topology changes [Figure: vertex with Vertex ID and Vertex Value]
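The superstep model and the vertex actions above can be sketched in a single process. This is a minimal illustration, not the Pregel API: the `Vertex` struct, the `propagate_max` function, and the inbox/outbox vectors are hypothetical names, and the "vote to halt, reactivate on message" rule is taken from the Pregel paper rather than from these slides. Each vertex keeps the largest value it has seen and forwards it only when it changes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical single-process stand-in for a Pregel vertex (not the real API).
struct Vertex {
    int value;                   // vertex value, modifiable by the vertex itself
    std::vector<int> out_edges;  // target vertex IDs
    bool active;                 // cleared when the vertex votes to halt
};

// Runs supersteps until every vertex is inactive and no messages are in flight.
// Each vertex adopts the largest value it has seen and forwards it on change.
// Returns the number of supersteps in which at least one vertex ran.
int propagate_max(std::vector<Vertex>& graph) {
    const std::size_t n = graph.size();
    std::vector<std::vector<int>> inbox(n);  // messages sent in the previous superstep
    int superstep = 0;
    while (true) {
        std::vector<std::vector<int>> outbox(n);  // messages for the next superstep
        bool any_ran = false;
        for (std::size_t id = 0; id < n; ++id) {
            Vertex& v = graph[id];
            if (!inbox[id].empty()) v.active = true;  // a message reactivates a vertex
            if (!v.active) continue;
            any_ran = true;
            // ---- body of the per-vertex Compute() ----
            bool changed = false;
            for (int m : inbox[id]) {
                if (m > v.value) { v.value = m; changed = true; }
            }
            if (superstep == 0 || changed) {
                for (int target : v.out_edges) {
                    assert(target >= 0 && static_cast<std::size_t>(target) < n);
                    outbox[target].push_back(v.value);
                }
            }
            v.active = false;  // vote to halt until a new message arrives
        }
        if (!any_ran) break;
        inbox = std::move(outbox);  // messages become visible in the next superstep
        ++superstep;
    }
    return superstep;
}
```

On a four-vertex ring with values 3, 6, 2, 1, every vertex ends with value 6. Swapping the two message vectors at the end of each loop iteration is what enforces the rule that a message sent in one superstep is only read in the next.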
C++ API (Message Passing) [Figure: Message Buffer whose entries pair a Destination Vertex ID with a Message Value]
C++ API (Combiners & Aggregators) [Figures: Combiner; Aggregator]
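Since the slide keeps only the two figure titles, here is a rough sketch of what each concept does, following the Pregel paper's description. The names `Message`, `combine_min`, and `aggregate_sum` are hypothetical and do not mirror the real C++ API. A combiner merges messages bound for the same vertex before delivery; an aggregator reduces one contribution per vertex into a global value.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// One outgoing message: (destination vertex ID, message value).
using Message = std::pair<int, int>;

// Combiner sketch: collapse all messages bound for the same destination into
// one, keeping the minimum. Safe only because min is commutative and associative.
std::map<int, int> combine_min(const std::vector<Message>& outgoing) {
    std::map<int, int> combined;
    for (const Message& m : outgoing) {
        auto it = combined.find(m.first);
        if (it == combined.end()) {
            combined.emplace(m.first, m.second);
        } else {
            it->second = std::min(it->second, m.second);
        }
    }
    assert(combined.size() <= outgoing.size());  // combining never adds messages
    return combined;
}

// Aggregator sketch: reduce one contribution per vertex to a global value
// (here a sum) that all vertices could read in the following superstep.
long long aggregate_sum(const std::vector<int>& contributions) {
    return std::accumulate(contributions.begin(), contributions.end(), 0LL);
}
```

A min combiner fits algorithms such as shortest paths, where a vertex only cares about the smallest incoming distance, so sending one message per destination is enough.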
C++ API (Topology Mutations) [Figure labels: V, Superstep]
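One way to picture topology mutations is as requests queued during a superstep and applied at the superstep boundary. The sketch below uses the ordering given in the Pregel paper (edge removals, then vertex removals, then vertex additions, then edge additions). The `Graph` alias, the `Mutations` struct, and `apply_mutations` are hypothetical names, and the paper's conflict handlers are left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Graph = std::map<int, std::set<int>>;  // vertex ID -> out-edge targets

// Mutation requests collected during one superstep (hypothetical container).
struct Mutations {
    std::vector<std::pair<int, int>> remove_edges;  // (source, target)
    std::vector<int> remove_vertices;
    std::vector<int> add_vertices;
    std::vector<std::pair<int, int>> add_edges;     // (source, target)
};

// Applies the queued requests before the next superstep's Compute() calls:
// edge removals, vertex removals, vertex additions, edge additions.
void apply_mutations(Graph& g, const Mutations& m) {
    for (const auto& e : m.remove_edges) {
        auto it = g.find(e.first);
        if (it != g.end()) it->second.erase(e.second);
    }
    for (int v : m.remove_vertices) g.erase(v);  // removes v's out-edges with it
    for (int v : m.add_vertices) g.emplace(v, std::set<int>{});
    for (const auto& e : m.add_edges) {
        auto it = g.find(e.first);
        // Vertex additions ran first, so the source of a new edge must exist.
        assert(it != g.end() && "edge source must exist");
        it->second.insert(e.second);
    }
}
```

Running additions after removals means a request to add an edge from a freshly added vertex succeeds within the same boundary, which is the reason for the ordering. Edges pointing at a removed vertex are not cleaned up in this sketch.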
C++ API (Input and Output) [Figure: graph given as an adjacency matrix with row and column labels 0 to 4]
    0 1 2 3 4
0   0 0 1 1 0
1   0 0 0 1 1
2   1 1 0 1 1
3   0 1 1 0 1
4   1 1 1 0 0
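As an illustration of loading such input, the sketch below turns a 0/1 adjacency matrix into per-vertex out-edge lists. Two assumptions are involved: the slide's numbers are read as a 5×5 matrix with row and column labels 0 to 4, and a 1 in row i, column j is taken to mean a directed edge from i to j. The function name `to_out_edges` is hypothetical, and the actual Pregel readers and writers are not shown in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts a square 0/1 adjacency matrix into per-vertex out-edge lists,
// treating a nonzero entry at (row i, column j) as a directed edge i -> j.
std::vector<std::vector<int>> to_out_edges(const std::vector<std::vector<int>>& matrix) {
    const std::size_t n = matrix.size();
    std::vector<std::vector<int>> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        assert(matrix[i].size() == n);  // the matrix must be square
        for (std::size_t j = 0; j < n; ++j) {
            if (matrix[i][j] != 0) out[i].push_back(static_cast<int>(j));
        }
    }
    return out;
}
```

Under that reading, vertex 0 gets out-edges to 2 and 3, and vertex 4 gets out-edges to 0, 1, and 2.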
Implementation (Program Execution) Flow: • Copies of the user program start running: one master copy and many worker copies • The master assigns graph partitions to workers • The master splits the user input data and assigns it to workers, which load the vertex data • Workers run supersteps (call Compute() and send messages) • Workers save the output
Implementation (Fault Tolerance) [Figure: workers Save() their state at a Checkpoint; after a worker fails (X), workers Recover from the checkpoint and Recompute()]
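The checkpoint-and-recompute idea can be sketched with a toy deterministic computation. Everything here is hypothetical scaffolding: `State`, `run_superstep`, and `run_with_checkpoints` are illustrative names, the checkpoint lives in memory rather than in persistent storage, and a "failure" simply discards the current state once. The point is only that reloading the last checkpoint and repeating the lost supersteps reproduces the failure-free result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using State = std::vector<long long>;  // one value per vertex

// Hypothetical deterministic superstep: every vertex adds its left neighbour's
// previous value (ring layout). Stands in for real Compute() calls.
State run_superstep(const State& prev) {
    const std::size_t n = prev.size();
    State next(n);
    for (std::size_t i = 0; i < n; ++i) next[i] = prev[i] + prev[(i + n - 1) % n];
    return next;
}

// Runs `total` supersteps, saving a checkpoint every `interval` supersteps.
// If `fail_at` is in [0, total), the state is lost once, just before that
// superstep, and execution resumes from the most recent checkpoint.
// `recomputed` reports how many supersteps had to be executed again.
State run_with_checkpoints(State state, int total, int interval, int fail_at,
                           int& recomputed) {
    assert(interval > 0 && !state.empty());
    State checkpoint = state;
    int checkpoint_step = 0;
    bool failed = false;
    recomputed = 0;
    int step = 0;
    while (step < total) {
        if (step % interval == 0) {  // Save()
            checkpoint = state;
            checkpoint_step = step;
        }
        if (step == fail_at && !failed) {
            failed = true;
            recomputed = step - checkpoint_step;
            state = checkpoint;      // Recover: reload the saved state
            step = checkpoint_step;  // Recompute() from the checkpointed superstep
            continue;
        }
        state = run_superstep(state);
        ++step;
    }
    return state;
}
```

With 10 supersteps, a checkpoint every 4, and a failure before superstep 6, two supersteps (4 and 5) are repeated. A shorter interval reduces recomputation after a failure but spends more time saving state.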
Implementation (Worker) [Figure: Worker]
Implementation (Master) [Figure: Master holding the List of Workers and their Partitions]
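A small sketch of the bookkeeping this implies: the Pregel paper gives the default partitioning function as hash(ID) mod N, where N is the number of partitions, and the master records which worker owns each partition. The round-robin assignment, the use of `std::hash`, and the names `partition_of` and `assign_partitions` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Default partitioning as described in the Pregel paper: hash(ID) mod N.
// std::hash is a stand-in; the slides do not say which hash function is used.
std::size_t partition_of(std::uint64_t vertex_id, std::size_t num_partitions) {
    assert(num_partitions > 0);
    return std::hash<std::uint64_t>{}(vertex_id) % num_partitions;
}

// Hypothetical master-side table: partition index -> worker index, round-robin.
std::vector<std::size_t> assign_partitions(std::size_t num_partitions,
                                           std::size_t num_workers) {
    assert(num_workers > 0);
    std::vector<std::size_t> owner(num_partitions);
    for (std::size_t p = 0; p < num_partitions; ++p) owner[p] = p % num_workers;
    return owner;
}
```

Because the mapping depends only on the vertex ID, any worker can work out which partition a destination vertex belongs to, and then look up the owning worker, without asking the master for each message.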
Applications (Shortest Path) [Figure: example graph; surviving labels 1, 2, 5, 3]
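The slide's figure did not survive, so here is a minimal single-process sketch of vertex-centric single-source shortest paths in the spirit of the Pregel paper's example. It is not the paper's code: `Edge`, `shortest_paths`, and the inbox/outbox vectors are hypothetical, and the source is seeded with a distance-0 message instead of a source check inside Compute(). Each vertex takes the minimum incoming distance and, if it improves, forwards that distance plus the edge weight to its neighbours.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

constexpr long long kInf = std::numeric_limits<long long>::max();

struct Edge {
    int target;
    long long weight;
};

// Vertex-centric single-source shortest paths, simulated in one process.
// adj[v] lists v's out-edges; weights are assumed to be non-negative.
std::vector<long long> shortest_paths(const std::vector<std::vector<Edge>>& adj,
                                      int source) {
    const std::size_t n = adj.size();
    assert(source >= 0 && static_cast<std::size_t>(source) < n);
    std::vector<long long> dist(n, kInf);
    std::vector<std::vector<long long>> inbox(n);
    inbox[source].push_back(0);  // seed: the source "receives" distance 0
    bool pending = true;
    while (pending) {  // one loop iteration per superstep
        std::vector<std::vector<long long>> outbox(n);
        pending = false;
        for (std::size_t v = 0; v < n; ++v) {
            if (inbox[v].empty()) continue;  // halted vertices stay idle
            const long long best = *std::min_element(inbox[v].begin(), inbox[v].end());
            if (best < dist[v]) {
                dist[v] = best;
                for (const Edge& e : adj[v]) {
                    outbox[e.target].push_back(best + e.weight);
                    pending = true;
                }
            }
        }
        inbox = std::move(outbox);
    }
    return dist;
}
```

For a made-up graph with edges 0→1 (weight 1), 0→2 (5), 1→2 (2), and 2→3 (3), the distances from vertex 0 are 0, 1, 3, and 6, and an unreachable vertex keeps `kInf`. The computation stops when a superstep sends no messages.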
Experiments (Description) • Measure the execution time of Pregel running the Single-Source Shortest Path algorithm. • Use a cluster of 300 multicore commodity PCs. • Run Pregel on binary-tree graphs and on a more realistic, randomly generated graph. • Reported times exclude initialization, graph generation, and result verification. • Failure recovery is not included in these runs, which reduces overhead.
Conclusion • Pregel is a model suitable for large-scale graph computing, with a production-quality, scalable, and fault-tolerant implementation. • Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges. • The model is flexible enough to express a broad set of algorithms.