This article presents a detailed exploration of the Mizan system, focusing on its concepts, challenges, implementation, and evaluation in comparison to other graph processing frameworks like Pregel and Giraph. Mizan aims for adaptive and agnostic graph processing, offering dynamic load balancing and efficient migration planning for various graph algorithms. The discussion delves into worker communication, computation load balance, migration strategies, and the potential trade-offs in Mizan's design.
Mizan: Graph Processing System xuejilong@gmail.com
Topics • Brief introduction to Pregel • EuroSys'13 paper: Mizan • Our ideas
Pregel • A System for Large-Scale Graph Processing • MapReduce vs. Pregel • Hadoop vs. Hama, Giraph...
Inside Pregel • [Figure: graph data → graph partitioning → Worker 1, Worker 2, …, Worker n → output] • Communication cost • Load balance
Problems of Pregel • [Figure: Worker 1, Worker 2, …, Worker n, shown for computation and for communication]
Mizan • A system for Dynamic Load Balancing in Large-scale Graph Processing
Load balance • Current methods for load balancing: • Hash partitioning (Giraph, Pregel) • Range partitioning • Min-cut partitioning (METIS) • Sophisticated partitioning techniques
Mizan goal • Build a system that: • Is adaptive • Is agnostic to the graph structure • Requires no a priori knowledge of algorithm behavior
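The two simplest schemes on this slide can be sketched in a few lines. This is only an illustration, assuming vertex IDs are integers in the range 0 to num_vertices - 1; the function names are hypothetical and are not taken from Pregel, Giraph, or Mizan.

```python
def hash_partition(vertex_id: int, num_workers: int) -> int:
    """Assign a vertex to a worker by hashing its ID (here: a plain modulo)."""
    return vertex_id % num_workers


def range_partition(vertex_id: int, num_vertices: int, num_workers: int) -> int:
    """Assign contiguous ID ranges of (almost) equal size to each worker."""
    chunk = -(-num_vertices // num_workers)  # ceiling division
    return vertex_id // chunk


if __name__ == "__main__":
    for v in range(10):
        print(v, hash_partition(v, 3), range_partition(v, 10, 3))
```

Both schemes balance the number of vertices per worker, but neither looks at how many messages a vertex sends or how long it computes, which is the gap dynamic load balancing tries to close.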
Graph algorithms • Stationary Graph Algorithms: • matrix-vector multiplication • PageRank • finding weakly connected components • Non-stationary Graph Algorithms: • DMST: distributed minimal spanning tree • graph queries • advertisement propagation
Mizan • Monitoring • Migration planning
Monitoring • Outgoing messages • Incoming messages • Response time
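As a rough illustration of the three monitored metrics, the sketch below records them per vertex and sums them per worker. The VertexStats record and the summarize_by_worker helper are hypothetical names introduced here; the slides do not describe Mizan's actual data structures.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class VertexStats:
    """Metrics gathered for one vertex during one superstep (assumed layout)."""
    vertex_id: int
    worker: int
    outgoing: int          # messages sent by the vertex
    incoming: int          # messages received by the vertex
    response_time: float   # time spent running the vertex's compute step


def summarize_by_worker(stats: Iterable[VertexStats]) -> Dict[int, Dict[str, float]]:
    """Sum the three metrics over all vertices hosted on each worker."""
    totals = defaultdict(lambda: {"outgoing": 0, "incoming": 0, "response_time": 0.0})
    for s in stats:
        totals[s.worker]["outgoing"] += s.outgoing
        totals[s.worker]["incoming"] += s.incoming
        totals[s.worker]["response_time"] += s.response_time
    return dict(totals)
```

The per-worker totals are the kind of summary a migration planner could compare across workers at the end of a superstep.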
Migration Planning • Five-step migration planning: • Identify the source of imbalance • Select the migration objective • Pair over-utilized workers with under-utilized ones • Select vertices to migrate • Migrate vertices
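Steps 1, 3, and 4 above can be sketched as follows. This is not Mizan's code: the z-score threshold, the highest-with-lowest pairing rule, and the greedy vertex selection are assumptions chosen to keep the example small, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def find_overloaded(times: Dict[str, float], z_threshold: float = 1.5) -> List[str]:
    """Step 1 (assumed test): flag workers whose metric is a z-score outlier."""
    mu = mean(times.values())
    sigma = pstdev(times.values())
    if sigma == 0:
        return []  # all workers identical, nothing to balance
    return [w for w, t in times.items() if (t - mu) / sigma > z_threshold]


def pair_workers(times: Dict[str, float]) -> List[Tuple[str, str]]:
    """Step 3 (assumed rule): pair the most loaded worker with the least loaded,
    the second most loaded with the second least loaded, and so on."""
    ranked = sorted(times, key=times.get, reverse=True)
    half = len(ranked) // 2
    return list(zip(ranked[:half], reversed(ranked)))


def select_vertices(costs: Dict[str, float], budget: float) -> List[str]:
    """Step 4 (assumed greedy): pick the costliest vertices that fit the budget,
    where the budget is the amount of load the source should give away."""
    chosen, total = [], 0.0
    for vertex, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        if total + cost <= budget:
            chosen.append(vertex)
            total += cost
    return chosen
```

For example, with per-worker times of 10, 11, 9, and 30 seconds, only the 30-second worker is flagged, and it is paired with the 9-second worker as its migration target.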
Step 2 • Select the migration objective • Candidates: outgoing messages, incoming messages, response time • Compute the correlation between: • Outgoing messages and response time • Incoming messages and response time • Default: response time
Step 5 • Migrate vertices • When all workers arrive at the migration barrier • Migrated data: • Vertex ID • State • Edge information (friends list) • The received messages it will process
Mizan implementation • Challenge 1: Vertex Ownership • Huge data • Frequent vertex migration • Techniques: • Each vertex has a home_worker • A DHT maintains the home_worker location
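The objective selection in Step 2 can be illustrated with a plain Pearson correlation over per-worker measurements. The min_corr cutoff below is an assumption made for this sketch (the slide only says that response time is the default), and the function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_objective(outgoing, incoming, response, min_corr: float = 0.5) -> str:
    """Pick the message metric that best tracks response time, else the default."""
    c_out = pearson(outgoing, response)
    c_in = pearson(incoming, response)
    if max(c_out, c_in) < min_corr:  # assumed cutoff, not from the slides
        return "response_time"
    return "outgoing_messages" if c_out >= c_in else "incoming_messages"
```

If neither message count explains the variation in response time, balancing on message counts would not help, so the sketch falls back to response time itself.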
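The migrated data listed above maps naturally onto a single record per vertex. The sketch below assumes each worker keeps its vertices in a dictionary keyed by vertex ID; VertexPayload and migrate are hypothetical names, and real migration would serialize the record and send it over the network rather than move it between in-process dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VertexPayload:
    """Everything that travels with a vertex when it changes workers."""
    vertex_id: int
    state: float                                    # current vertex value
    edges: List[int] = field(default_factory=list)  # neighbor IDs (friends list)
    pending_messages: List[float] = field(default_factory=list)  # to process next


def migrate(vertex_id: int,
            source: Dict[int, VertexPayload],
            target: Dict[int, VertexPayload]) -> VertexPayload:
    """Move one vertex from the source worker's store to the target's.
    Assumed to be called only while all workers wait at the migration barrier."""
    payload = source.pop(vertex_id)
    target[vertex_id] = payload
    return payload
```

Carrying the pending messages along lets the vertex run its next superstep on the new worker without any messages being lost or re-sent.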
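One way to picture the home_worker idea is a lookup table sharded by home worker: the home worker of a vertex is fixed by hashing its ID, and only that worker records where the vertex currently lives. The sketch below is a single-process stand-in for such a distributed table; the LocationDirectory class and its methods are hypothetical and the modulo hash is an assumption.

```python
from typing import Dict, List


class LocationDirectory:
    """Single-process stand-in for a table sharded across home workers."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        # tables[h] maps vertex_id -> current worker, for vertices whose home is h
        self.tables: List[Dict[int, int]] = [dict() for _ in range(num_workers)]

    def home_worker(self, vertex_id: int) -> int:
        """The home worker never changes, so any worker can compute it locally."""
        return vertex_id % self.num_workers

    def locate(self, vertex_id: int) -> int:
        """Ask the home worker where the vertex is; unmigrated vertices are at home."""
        home = self.home_worker(vertex_id)
        return self.tables[home].get(vertex_id, home)

    def record_migration(self, vertex_id: int, new_worker: int) -> None:
        """Update only the home worker's entry when a vertex moves."""
        self.tables[self.home_worker(vertex_id)][vertex_id] = new_worker
```

With this layout a migration updates one entry on one worker, instead of broadcasting the new location of every moved vertex to all workers.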
Mizan implementation • Challenge 2: Large Message Size
Evaluation • Experiments: • Implemented Mizan using C++ and MPI • 12 machines with i5 processors and 16 GB RAM
Evaluation • Benchmarks: • Static: dynamic migration disabled • Work Stealing (WS): Pregel version • Mizan
Evaluation • Static Mizan vs. Giraph:
Evaluation • PageRank on three systems:
Evaluation • Migration costs:
Evaluation • Non-stationary algorithm:
Evaluation • Migration overhead:
Really? • Some arguable parts of Mizan: • Cost: migration planning requires multiple rounds of global information synchronization (S1, S2, S3) • Especially in S3, which maintains a global order • Large amounts of data are transferred during migration • Migration can lead to more cross-worker communication • Is centralized management worse than decentralized? • Not friendly to graph mutation ……