
Comprehensive Overview of Mizan for Large-Scale Graph Processing Systems

This article presents a detailed exploration of the Mizan system, focusing on its concepts, challenges, implementation, and evaluation in comparison to other graph processing frameworks like Pregel and Giraph. Mizan aims for adaptive and agnostic graph processing, offering dynamic load balancing and efficient migration planning for various graph algorithms. The discussion delves into worker communication, computation load balance, migration strategies, and the potential trade-offs in Mizan's design.

antoiner

Presentation Transcript


  1. Mizan: Graph Processing System xuejilong@gmail.com

  2. Topics • Brief introduction to Pregel • EuroSys'13 paper: Mizan • Our ideas

  3. Pregel • A System for Large-Scale Graph Processing • MapReduce vs. Pregel • Hadoop vs. Hama, Giraph...

  4. Inside Pregel • [Diagram: graph data, graph partitioning, Pregel workers 1 to n, output] • Communication cost • Load balance

  5. Inside Pregel

  6. Problems of Pregel • [Diagram: workers 1 to n, labeled Computation and Communication]

  7. Mizan • A system for Dynamic Load Balancing in Large-scale Graph Processing

  8. Load balance • Current methods for load balancing: • Hash partitioning (Giraph, Pregel) • Range partitioning • Min-cut partitioning (METIS) • Sophisticated partitioning techniques
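
The first two static methods on this slide can be sketched as follows. This is only an illustration: the modulo hash, the `boundaries` list, and both function names are assumptions made for the example, not code from Pregel, Giraph, or Mizan.

```python
import bisect


def hash_partition(vertex_id: int, num_workers: int) -> int:
    """Assign a vertex to a worker by taking its ID modulo the worker count."""
    return vertex_id % num_workers


def range_partition(vertex_id: int, boundaries: list[int]) -> int:
    """Assign a vertex to the worker whose contiguous ID range contains it.

    boundaries[i] is the first vertex ID owned by worker i + 1 (hypothetical layout).
    """
    return bisect.bisect_right(boundaries, vertex_id)


vertex_ids = list(range(12))
by_hash = [hash_partition(v, 3) for v in vertex_ids]
by_range = [range_partition(v, [4, 8]) for v in vertex_ids]
```

Both schemes fix the placement before the computation starts, so neither can react if some vertices later turn out to be much more expensive than others.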

  9. Mizan goal • Build a system that: • Is adaptive • Is agnostic to the graph structure • Requires no a priori knowledge of algorithm behavior

  10. Graph algorithms • Stationary Graph Algorithms: • matrix-vector multiplication • PageRank • finding weakly connected components • Non-stationary Graph Algorithms: • DMST: distributed minimal spanning tree • graph queries • advertisement propagation

  11. Mizan • Monitoring • Migration planning

  12. Monitoring • Outgoing messages • Incoming messages • Response time

  13. Migration Planning • Five-step migration planning: • Identify the source of imbalance • Select the migration objective • Pair over-utilized workers with under-utilized ones • Select vertices to migrate • Migrate vertices

  14. Step 1
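
One way to identify the source of imbalance is to flag workers whose response time in the last superstep is an outlier. The sketch below assumes a z-score test with a threshold of 1.5; the slides do not state the detection rule, so the test, the threshold, and the function name are all assumptions.

```python
from statistics import mean, pstdev


def find_overutilized(response_times: dict[str, float],
                      z_threshold: float = 1.5) -> list[str]:
    """Return workers whose response time is an outlier above the mean."""
    values = list(response_times.values())
    avg = mean(values)
    std = pstdev(values)
    if std == 0:
        # All workers took the same time, so nothing is imbalanced.
        return []
    return [w for w, t in response_times.items()
            if (t - avg) / std > z_threshold]


# Hypothetical per-worker response times (seconds) for one superstep.
times = {"w1": 10.0, "w2": 11.0, "w3": 9.0, "w4": 10.0, "w5": 30.0}
overloaded = find_overutilized(times)
```

If no worker is flagged, the remaining planning steps can be skipped for that superstep.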

  15. Step 2 • Select the migration objective • Outgoing messages, incoming messages, response time • Compute correlation between: • Outgoing messages and response time • Incoming messages and response time • Default: response time
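
The objective selection on this slide can be sketched as below. The Pearson coefficient, the `min_corr` cutoff of 0.5, and the function names are assumptions chosen for the example; the slide only says that the two correlations are computed and that response time is the default.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_objective(outgoing: list[float], incoming: list[float],
                     response: list[float], min_corr: float = 0.5) -> str:
    """Pick the metric that best explains response time, else fall back to it."""
    corr_out = pearson(outgoing, response)
    corr_in = pearson(incoming, response)
    if max(corr_out, corr_in) < min_corr:
        return "response_time"
    return "outgoing_messages" if corr_out >= corr_in else "incoming_messages"


# Hypothetical per-worker statistics for one superstep.
outgoing = [100, 120, 110, 400]
incoming = [200, 190, 210, 205]
response = [1.0, 1.2, 1.1, 4.0]
objective = select_objective(outgoing, incoming, response)
```

Here the slow worker also sends far more messages than the others, so outgoing messages are selected as the quantity to balance.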

  16. Step 3
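
Pairing over-utilized workers with under-utilized ones can be sketched by ranking all workers on the chosen objective and matching the ends of the ranking. The rank-and-match rule and the names below are assumptions for illustration, not the system's actual procedure.

```python
def pair_workers(load: dict[str, float]) -> list[tuple[str, str]]:
    """Pair the most loaded worker with the least loaded one, and so on inward."""
    ranked = sorted(load, key=load.get, reverse=True)
    pairs = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        pairs.append((ranked[lo], ranked[hi]))
        lo += 1
        hi -= 1
    return pairs


# Hypothetical value of the migration objective on each worker.
load = {"w1": 90.0, "w2": 20.0, "w3": 60.0, "w4": 35.0, "w5": 50.0}
pairs = pair_workers(load)
```

With an odd number of workers the median one stays unpaired. Note that the ranking requires every worker to see every other worker's statistic, which is a global synchronization step.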

  17. Step 4
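
Selecting vertices to migrate can be sketched as a greedy choice on the over-utilized worker. The rule used here (move the costliest vertices that fit within half the load gap between the paired workers) is an assumption for illustration; the slides do not give the selection rule.

```python
def select_vertices(vertex_cost: dict[int, float],
                    source_load: float, target_load: float) -> list[int]:
    """Greedily pick vertices whose total cost covers at most half the load gap."""
    budget = (source_load - target_load) / 2
    chosen = []
    moved = 0.0
    # Consider the most expensive vertices first.
    for vid, cost in sorted(vertex_cost.items(),
                            key=lambda kv: kv[1], reverse=True):
        if moved + cost <= budget:
            chosen.append(vid)
            moved += cost
    return chosen


# Hypothetical per-vertex cost on the over-utilized worker.
vertex_cost = {1: 30.0, 2: 25.0, 3: 10.0, 4: 5.0}
to_migrate = select_vertices(vertex_cost, source_load=70.0, target_load=20.0)
```

Capping the moved cost at half the gap keeps the source from becoming the under-utilized side after the move.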

  18. Step 5 • Migrate vertices • When all workers arrive at the migration barrier • Migrated data: • Vertex ID • State • Edge information (friends list) • The received messages it will process
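
The migrated data listed on this slide can be modeled as a single record. The sketch below is a single-process illustration under assumed names (`MigratedVertex`, `migrate`, and workers represented as plain dictionaries); a real system would serialize the record and send it over the network.

```python
from dataclasses import dataclass, field


@dataclass
class MigratedVertex:
    vertex_id: int
    state: float                                   # current vertex value
    edges: list[int] = field(default_factory=list)  # neighbor IDs
    pending_messages: list[float] = field(default_factory=list)


def migrate(vertex: MigratedVertex,
            source: dict[int, MigratedVertex],
            target: dict[int, MigratedVertex]) -> None:
    """Move a vertex between worker stores; meant to run at the migration barrier."""
    del source[vertex.vertex_id]
    target[vertex.vertex_id] = vertex


worker_a = {7: MigratedVertex(7, 0.25, [3, 9], [0.1, 0.05])}
worker_b: dict[int, MigratedVertex] = {}
migrate(worker_a[7], worker_a, worker_b)
```

Carrying the pending messages along with the vertex lets it run its next superstep on the new worker without losing input.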

  19. Mizan

  20. Mizan implementation

  21. Mizan implementation • Challenge 1: Vertex Ownership • Huge data • Frequent vertex migration • Techniques: • Each vertex has a home_worker • DHT maintains home_worker location
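
The home_worker idea can be sketched as a lookup table split across workers: a vertex's home is derived from its ID, and the home records where the vertex currently lives. The modulo hash, the `VertexDirectory` class, and the in-memory tables are assumptions for this single-process illustration.

```python
class VertexDirectory:
    """Toy location service: table i holds vertices whose home worker is i."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self.tables: list[dict[int, int]] = [dict() for _ in range(num_workers)]

    def home_worker(self, vertex_id: int) -> int:
        # The home never changes, so any worker can compute it locally.
        return vertex_id % self.num_workers

    def locate(self, vertex_id: int) -> int:
        """Ask the home worker where the vertex is; default is the home itself."""
        home = self.home_worker(vertex_id)
        return self.tables[home].get(vertex_id, home)

    def record_migration(self, vertex_id: int, new_worker: int) -> None:
        self.tables[self.home_worker(vertex_id)][vertex_id] = new_worker


directory = VertexDirectory(4)
before = directory.locate(10)
directory.record_migration(10, 0)
after = directory.locate(10)
```

Because only the home worker's entry is updated on migration, other workers do not need to be notified each time a vertex moves.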

  22. Mizan implementation • Challenge 2: Large Message Size

  23. Evaluation • Experiments: • Implemented Mizan using C++ and MPI • 12 machines, each with an i5 processor and 16 GB RAM

  24. Evaluation • Benchmarks: • Static: disables any dynamic migration • Work Stealing (WS): Pregel version • Mizan

  25. Evaluation • Static Mizan vs. Giraph:

  26. Evaluation • PageRank on three systems:

  27. Evaluation • Migration costs:

  28. Evaluation • Non-stationary algorithm:

  29. Evaluation • Migration overhead:

  30. Really? • Some arguable parts of Mizan: • Cost: migration planning and multiple rounds of global information synchronization (S1, S2, S3) • Especially in S3, maintaining a global order • Large amounts of data transferred during migration • Migration may lead to more cross-communication • Is centralized management worse than decentralized? • Not friendly to graph mutation ……
