Mizan: Graph Processing System xuejilong@gmail.com
Topics • Brief introduction to Pregel • EuroSys'13 paper: Mizan • Our ideas
Pregel • A System for Large-Scale Graph Processing • MapReduce vs. Pregel • Hadoop vs. Hama, Giraph...
Inside Pregel [Diagram labels: graph data, graph partitioning, Worker 1, Worker 2, …, Worker n, output] • Communication cost • Load balance
Problems of Pregel [Diagram labels: Worker 1, Worker 2, …, Worker n, shown once under "Computation" and once under "Communication"]
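To make the Pregel model concrete, here is a minimal single-process sketch of vertex-centric supersteps, using maximum-value propagation as the example. It is an illustration only: the function name `run_max_value` and the dictionary-based graph layout are assumptions made for this sketch, and a real system distributes the vertices over workers.

```python
from collections import defaultdict

def run_max_value(edges, values):
    """Propagate the largest value through a graph, one superstep at a time.

    edges:  dict mapping vertex -> list of out-neighbours (hypothetical layout)
    values: dict mapping vertex -> initial numeric value
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    inbox = defaultdict(list)
    active = set(values)              # every vertex is active in superstep 0
    superstep = 0
    while active:
        outbox = defaultdict(list)
        for v in active:
            new_value = max([values[v]] + inbox[v])
            changed = new_value > values[v]
            values[v] = new_value
            # Send in superstep 0 or after a change; otherwise the vertex halts.
            if superstep == 0 or changed:
                for dst in edges.get(v, []):
                    outbox[dst].append(new_value)
        # Barrier: messages sent now become visible only in the next superstep.
        inbox = outbox
        active = set(outbox)          # an incoming message re-activates a vertex
        superstep += 1
    return values, superstep
```

The loop ends when no messages are in flight, which corresponds to every vertex having voted to halt.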
Mizan • A system for Dynamic Load Balancing in Large-scale Graph Processing
Load balance • Current methods for load balancing: • Hash partitioning (Giraph, Pregel) • Range partitioning • Min-cut partitioning (METIS) • Sophisticated partitioning techniques
Mizan goal • Build a system that: • Is adaptive • Is agnostic to the graph structure • Requires no a priori knowledge of algorithm behavior
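The difference between hash and range partitioning can be sketched in a few lines. This is a simplified illustration, assuming integer vertex IDs in the range 0..num_vertices-1 and a plain modulo as the hash; the function names are hypothetical and not taken from any of the systems above.

```python
def hash_partition(vertex_id, num_workers):
    """Assign a vertex to a worker by hashing its ID (here: plain modulo)."""
    return vertex_id % num_workers

def range_partition(vertex_id, num_vertices, num_workers):
    """Assign contiguous ID ranges of (almost) equal size to each worker."""
    chunk = -(-num_vertices // num_workers)   # ceiling division
    return vertex_id // chunk

def cut_edges(edges, assign):
    """Count edges whose endpoints land on different workers."""
    return sum(1 for u, v in edges if assign(u) != assign(v))

# A chain 0-1-2-...-7 split over two workers.
chain = [(i, i + 1) for i in range(7)]
hash_cut = cut_edges(chain, lambda v: hash_partition(v, 2))
range_cut = cut_edges(chain, lambda v: range_partition(v, 8, 2))
```

On this chain the modulo hash cuts all 7 edges while range partitioning cuts only 1, which shows how the choice of partitioning drives communication cost. Neither scheme looks at runtime behavior.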
Graph algorithms • Stationary Graph Algorithms: • matrix-vector multiplication • PageRank • finding weakly connected components • Non-stationary Graph Algorithms: • DMST: distributed minimal spanning tree • graph queries • advertisement propagation
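One way to see why PageRank counts as stationary: every vertex sends one message along every out-edge in every superstep, so the message volume does not change over time. The sketch below records that count per superstep. It assumes a small graph with no dangling vertices, and the function name and damping default are choices made for this illustration.

```python
def pagerank_message_counts(edges, num_supersteps=5, damping=0.85):
    """Run a fixed number of PageRank supersteps and count messages sent.

    edges: dict mapping vertex -> non-empty list of out-neighbours;
           every neighbour must also be a key (assumption of this sketch).
    Returns (rank dict, list of message counts per superstep).
    """
    n = len(edges)
    rank = {v: 1.0 / n for v in edges}
    counts = []
    for _ in range(num_supersteps):
        incoming = {v: 0.0 for v in edges}
        sent = 0
        for v, outs in edges.items():
            for dst in outs:
                incoming[dst] += rank[v] / len(outs)   # one message per out-edge
                sent += 1
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in edges}
        counts.append(sent)
    return rank, counts
```

In a non-stationary algorithm the set of active vertices, and with it the per-superstep message count, changes as the computation proceeds.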
Mizan • Monitoring • Migration planning
Monitoring • Outgoing messages • Incoming messages • Response time
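The three monitored metrics could be held in a small per-vertex record and summed per worker. This is a sketch under assumptions: the names `VertexStats` and `summarize_worker` are hypothetical, and whether messages between vertices on the same worker are counted is left open here.

```python
from dataclasses import dataclass

@dataclass
class VertexStats:
    """Metrics collected for one vertex during one superstep."""
    outgoing: int = 0            # messages sent
    incoming: int = 0            # messages received
    response_time: float = 0.0   # time spent computing, in seconds

def summarize_worker(stats):
    """Aggregate per-vertex statistics into one summary for a worker."""
    return VertexStats(
        outgoing=sum(s.outgoing for s in stats),
        incoming=sum(s.incoming for s in stats),
        response_time=sum(s.response_time for s in stats),
    )
```

Only the per-worker summary would need to be exchanged between workers; the per-vertex records stay local until vertices are selected for migration.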
Migration Planning • Five-step migration planning: • Identify the source of imbalance • Select the migration objective • Pair over-utilized workers with under-utilized ones • Select vertices to migrate • Migrate vertices
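Steps 1 and 3 can be sketched as follows. This is an illustrative guess at the mechanics, not the exact procedure: imbalance is flagged with a simple z-score test whose threshold is a hypothetical value, and pairing matches the most loaded worker with the least loaded one, the second most with the second least, and so on.

```python
from statistics import mean, pstdev

def find_outliers(load, threshold=1.5):
    """Return workers whose load deviates strongly from the mean (z-score test).

    load: dict mapping worker name -> value of the chosen metric.
    threshold: hypothetical cut-off chosen for this sketch.
    """
    values = list(load.values())
    sigma = pstdev(values)
    if sigma == 0:
        return []                                  # perfectly balanced
    mu = mean(values)
    return [w for w, x in load.items() if abs(x - mu) / sigma > threshold]

def pair_workers(load):
    """Pair the i-th most loaded worker with the i-th least loaded one."""
    ordered = sorted(load, key=load.get, reverse=True)
    return [(ordered[i], ordered[-1 - i]) for i in range(len(ordered) // 2)]
```

With an odd number of workers the median worker stays unpaired in this sketch.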
Step 2 • Select the migration objective • Candidates: outgoing messages, incoming messages, response time • Compute the correlation between: • Outgoing messages and response time • Incoming messages and response time • Default: response time
Step 5 • Migrate vertices • When all workers arrive at the migration barrier • Migrated data: • Vertex ID • State • Edge information (friends list) • Received messages it has yet to process
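The selection in Step 2 can be sketched with a Pearson correlation over per-worker samples. The cut-off `min_corr` below is a hypothetical parameter introduced for this illustration; the slide only says that response time is the default.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def select_objective(outgoing, incoming, response_time, min_corr=0.5):
    """Pick the message metric that best explains response time, else the default."""
    scores = {
        "outgoing": pearson(outgoing, response_time),
        "incoming": pearson(incoming, response_time),
    }
    best = max(scores, key=scores.get)
    # Fall back to response time when neither metric correlates well enough.
    return best if scores[best] >= min_corr else "response_time"
```

For example, if workers that send more messages are also consistently slower, the sketch returns "outgoing"; if neither message count tracks response time, it returns "response_time".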
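The data moved in Step 5 can be pictured as one record per vertex. The sketch below models each worker's vertex store as a plain dictionary; the class name, field names, and `migrate` helper are hypothetical and only mirror the four items listed above.

```python
from dataclasses import dataclass, field

@dataclass
class MigratedVertex:
    """Everything that travels with a vertex when it changes workers."""
    vertex_id: int
    state: float
    out_edges: list = field(default_factory=list)         # edge information
    pending_messages: list = field(default_factory=list)  # not yet processed

def migrate(vertex_id, source, target):
    """Move one vertex record from the source store to the target store.

    source, target: dicts mapping vertex ID -> MigratedVertex.
    Intended to run only once all workers have reached the migration barrier.
    """
    target[vertex_id] = source.pop(vertex_id)
```

Carrying the pending messages along means the vertex can resume computing on the new worker in the next superstep without asking senders to retransmit.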
Mizan implementation • Challenge 1: Vertex Ownership • Huge data • Frequent vertex migration • Techniques: • Each vertex has a home_worker • A DHT (distributed hash table) maintains the home_worker location
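A minimal sketch of the home_worker idea: the home of a vertex is derived from its ID and never changes, and the home keeps track of where the vertex currently lives. The class below simulates all workers in one process; the modulo rule and every name except home_worker are assumptions made for illustration.

```python
class VertexDirectory:
    """Hash-based lookup of the worker that currently owns a vertex."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        # tables[w] holds current locations for vertices whose home is worker w.
        self.tables = [dict() for _ in range(num_workers)]

    def home_worker(self, vertex_id):
        """Fixed home of a vertex, computed from its ID (assumed: modulo)."""
        return vertex_id % self.num_workers

    def locate(self, vertex_id):
        """Ask the home worker where the vertex is; default is the home itself."""
        home = self.home_worker(vertex_id)
        return self.tables[home].get(vertex_id, home)

    def record_migration(self, vertex_id, new_worker):
        """Tell the home worker that the vertex has moved."""
        self.tables[self.home_worker(vertex_id)][vertex_id] = new_worker
```

Usage: after `d = VertexDirectory(4)` and `d.record_migration(6, 0)`, a call to `d.locate(6)` returns the new owner while `d.home_worker(6)` stays the same, so any worker can find a migrated vertex with one lookup at its home.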
Mizan implementation • Challenge 2: Large Message Size
Evaluation • Experiments: • Implemented Mizan using C++ and MPI • 12 machines with i5 processors and 16 GB RAM
Evaluation • Benchmarks: • Static: disables any dynamic migration • Work Stealing (WS): Pregel version • Mizan.
Evaluation • Static Mizan vs. Giraph:
Evaluation • PageRank on three systems:
Evaluation • Migration costs:
Evaluation • Non-stationary algorithm:
Evaluation • Migration overhead:
Really? • Some debatable aspects of Mizan: • Cost: migration planning requires multiple rounds of global information synchronization (S1, S2, S3) • Especially S3, which maintains a global ordering • Large amounts of data are transferred during migration • Migration may lead to more cross-worker communication • Is centralized management worse than decentralized? • Not friendly to graph mutation ……