
Managing Large Graphs on Multi-Cores With Graph Awareness


Presentation Transcript


  1. Managing Large Graphs on Multi-Cores With Graph Awareness Vijayan, Ming, Xuetian, Frank, Lidong, Maya (Microsoft Research)

  2. Motivation
  • Tremendous increase in graph data and applications
  • A new class of graph applications requires real-time responses
  • Even batch-processed workloads have strict time constraints
  • The multi-core revolution: multi-cores are the default on most machines, and large-scale multi-cores with terabytes of main memory can run workloads that are traditionally run on distributed systems
  • Existing graph-processing systems lack support for both of these trends

  3. A High-level Description of Grace
  Outline: Overview; Details of optimizations; Details on transactions; Subset of results
  • Grace is an in-memory graph management and processing system
  • Implements several optimizations: graph-specific and multi-core-specific
  • Supports snapshots and transactional updates on graphs
  • Evaluation shows that these optimizations help Grace run several times faster than the alternatives

  4. An Overview of Grace
  • Keeps an entire graph in memory, split into smaller partitions
  • Exposes a C-style API for writing graph workloads, iterative workloads, and updates
  • Design driven by two trends: graph-specific locality, and partitionable/parallelizable workloads
  Grace API example:
      v = GetVertex(Id)
      for (i = 0; i < v.degree; i++)
          neigh = v.GetNeighbor(i)
  [Figure: iterative programs (e.g., PageRank) run over the Grace API; graph and multi-core optimizations place the partitions (vertices A through E) onto cores (Core 0, Core 1), connected by net/RPC]
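To make the API shape concrete, here is a minimal, self-contained C++ sketch of the traversal pattern in the snippet above. The Vertex struct, the toy adjacency-list backing, and the main driver are assumptions for illustration, not the actual Grace implementation:

    #include <cstdio>
    #include <vector>

    // Hypothetical rendering of the calls from the slide; the real Grace API
    // and its types are not reproduced here.
    struct Vertex {
        int id;
        int degree;
        const int* neighbors;                              // ids of out-neighbors
        int GetNeighbor(int i) const { return neighbors[i]; }
    };

    static std::vector<std::vector<int>> g_adj = {{1, 2}, {2}, {3}, {0}};  // toy graph

    Vertex GetVertex(int id) {
        return Vertex{id, static_cast<int>(g_adj[id].size()), g_adj[id].data()};
    }

    int main() {
        // The traversal pattern from the slide: visit every neighbor of a vertex.
        Vertex v = GetVertex(0);
        for (int i = 0; i < v.degree; i++) {
            int neigh = v.GetNeighbor(i);
            std::printf("vertex %d -> vertex %d\n", v.id, neigh);
        }
        return 0;
    }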

  5. Data Structures
  [Figure: the data structures in a partition: a vertex log; an edge log holding each vertex's edges contiguously (edges of A, edges of B, edges of C); an edge pointer array; a vertex index; and a vertex allocation map]
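A rough sketch of how the structures named in the figure could fit together in code; the field names, types, and exact layout are assumptions based on the figure, not Grace's actual definitions:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct VertexRecord {
        std::uint64_t id;          // global vertex id
        std::uint32_t edgeCount;   // number of out-edges stored in the edge log
    };

    struct Partition {
        std::vector<VertexRecord>  vertexLog;    // one record per vertex
        std::vector<std::uint64_t> edgeLog;      // all edge lists, stored back to back
        std::vector<std::size_t>   edgePointer;  // slot i's edges start at edgeLog[edgePointer[i]]
        std::vector<std::size_t>   vertexIndex;  // vertex id -> slot in vertexLog
        std::vector<bool>          allocMap;     // which vertexLog slots are in use

        // Locate the first out-edge of a vertex by going through the vertex
        // index and the edge pointer array.
        const std::uint64_t* EdgesOf(std::uint64_t id) const {
            std::size_t slot = vertexIndex[id];
            return &edgeLog[edgePointer[slot]];
        }
    };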

  6. Graph-Aware Partitioning & Placement
  • Partitioning and placement: are they useful on a single machine?
  • Yes, to take advantage of multi-cores and memory hierarchies
  • Solve them using graph partitioning algorithms: divide a graph into sub-graphs while minimizing edge cuts
  • Grace provides an extensible library
  • Graph-aware: heuristic-based, spectral partitioning, Metis
  • Graph-agnostic: hash partitioning
  • Achieve a better layout by recursive graph partitioning: recursively partition until a sub-graph fits in a cache line, then recompose all the sub-graphs to get the vertex layout
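As an illustration of the recursive-layout idea, here is a small sketch. PartitionInTwo below is a trivial stand-in for an edge-cut-minimizing partitioner (e.g., Metis), and the cache-line threshold is an assumption:

    #include <cstddef>
    #include <utility>
    #include <vector>

    using VertexSet = std::vector<int>;

    // Trivial stand-in: split the set in half. A graph-aware partitioner
    // (e.g., Metis) would instead minimize edge cuts between the two halves.
    std::pair<VertexSet, VertexSet> PartitionInTwo(const VertexSet& v) {
        std::size_t mid = v.size() / 2;
        return {VertexSet(v.begin(), v.begin() + mid),
                VertexSet(v.begin() + mid, v.end())};
    }

    // Recursively partition until a sub-graph is small enough to sit in one
    // cache line, then emit its vertices; concatenating the leaves gives the
    // final in-memory vertex order.
    void LayoutOrder(const VertexSet& vertices, std::size_t cacheLineVertices,
                     VertexSet& order) {
        if (vertices.size() <= cacheLineVertices || vertices.size() <= 1) {
            order.insert(order.end(), vertices.begin(), vertices.end());
            return;
        }
        auto halves = PartitionInTwo(vertices);
        LayoutOrder(halves.first, cacheLineVertices, order);
        LayoutOrder(halves.second, cacheLineVertices, order);
    }

    int main() {
        VertexSet all = {0, 1, 2, 3, 4, 5, 6, 7};
        VertexSet order;
        LayoutOrder(all, 2 /* assumed vertices per cache line */, order);
        return 0;
    }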

  7. Platform for Parallel Iterative Computations
  The iterative computation platform implements the "bulk synchronous parallel" (BSP) model.
  [Figure: each iteration runs parallel computations on the partitions, propagates updates, and ends at a barrier before the next iteration begins]
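A bare-bones sketch of that loop: one worker thread per partition computes an iteration, the join acts as the barrier, and updates are applied before the next iteration. The partition count and the two callbacks are placeholders, not Grace's internals:

    #include <thread>
    #include <vector>

    const int kPartitions = 4;   // assumption: one worker per partition
    const int kIterations = 10;  // assumption: fixed number of iterations

    // Placeholder callbacks; a real workload would run, e.g., PageRank on one
    // partition and buffer updates destined for other partitions.
    void ComputePartition(int part, int iter) { (void)part; (void)iter; }
    void ApplyUpdates() {}

    int main() {
        for (int iter = 0; iter < kIterations; iter++) {
            std::vector<std::thread> workers;
            for (int p = 0; p < kPartitions; p++)
                workers.emplace_back(ComputePartition, p, iter);
            for (std::thread& t : workers) t.join();  // the barrier: wait for all partitions
            ApplyUpdates();                            // propagate updates before the next iteration
        }
        return 0;
    }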

  8. Load Balancing and Updates Batching
  • Problem 1: overloaded partitions can hurt performance
  • Solution 1: load balancing, implemented by sharing a portion of an overloaded partition's vertices with other cores
  • Problem 2: updates arriving in arbitrary order can increase cache misses
  • Solution 2: updates batching, implemented by grouping updates by their destination partition and issuing them in a round-robin fashion
  [Figure: partitions Part0, Part1, and Part2 pinned to Core0, Core1, and Core2, exchanging batched updates at the barrier; neighboring vertices A through D share a cache line]
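A small sketch of the batching idea: buffer updates in per-destination queues and flush them one destination partition at a time. The Update type, the partition-of function, and the delivery step are illustrative assumptions:

    #include <vector>

    struct Update { int targetVertex; double value; };

    const int kPartitions = 4;
    int PartitionOf(int vertex) { return vertex % kPartitions; }  // toy placement

    // One buffer per destination partition.
    std::vector<std::vector<Update>> outbox(kPartitions);

    // Group updates by their destination partition instead of sending them
    // in arrival order.
    void SendUpdate(const Update& u) {
        outbox[PartitionOf(u.targetVertex)].push_back(u);
    }

    // Issue the buffered updates one destination partition at a time,
    // cycling through the partitions in round-robin order at the barrier.
    void Flush() {
        for (int p = 0; p < kPartitions; p++) {
            for (const Update& u : outbox[p]) {
                (void)u;  // delivery to partition p is left abstract in this sketch
            }
            outbox[p].clear();
        }
    }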

  9. Transactions on Graphs
  • Grace supports structural changes to a graph:
      BeginTransaction()
      AddVertex(X)
      AddEdge(X, Y)
      EndTransaction()
  • Transactions use snapshot isolation
  • Instantaneous snapshots using copy-on-write (CoW) techniques
  • CoW can disturb the careful memory layout!
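For concreteness, a sketch of issuing a structural change through the calls listed above; the C++ signatures and the stub bodies are assumptions, and only the call names follow the slide:

    #include <cstdint>

    // Stub versions of the calls named on the slide; the snapshot/commit
    // behavior described in the comments is an assumption.
    void BeginTransaction()                        { /* start a snapshot-isolated transaction */ }
    void AddVertex(std::uint64_t x)                { (void)x; /* buffer the new vertex */ }
    void AddEdge(std::uint64_t x, std::uint64_t y) { (void)x; (void)y; /* buffer the new edge x -> y */ }
    void EndTransaction()                          { /* validate and atomically commit */ }

    int main() {
        // Add a vertex and connect it to an existing one in a single transaction;
        // concurrent readers keep seeing the pre-transaction snapshot until commit.
        BeginTransaction();
        AddVertex(42);
        AddEdge(42, 7);
        EndTransaction();
        return 0;
    }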

  10. Evaluation
  • Graphs: Web (v: 88M, e: 275M), sparse; Orkut (v: 3M, e: 223M), dense
  • Workloads: N-hop-neighbor queries, BFS, DFS, PageRank, Weakly-Connected Components, Shortest Path
  • Architectures: Intel Xeon, 12 cores (2 chips with 6 cores each); AMD Opteron, 48 cores (4 chips with 12 cores each)
  • Questions: How well do partitioning and placement work? How useful are load balancing and updates batching? How does Grace compare to other systems?

  11. Partitioning and Placement Performance
  [Figure: PageRank speedup on Intel for varying numbers of partitions of the Orkut and Web graphs]
  • Observation: for smaller numbers of partitions, the partitioning algorithm didn't make a big difference. Reason: all the partitions fit within the cores of a single chip, minimizing communication cost.
  • Observation: careful vertex arrangement works better when graph partitioning is used, for sparse graphs. Reason: graph partitioning puts neighbors in the same partition, which enables better placement.
  • Observation: placing neighboring vertices close together improves performance significantly. Reason: L1, L2, and L3 cache misses and data-TLB misses are reduced.

  12. Load Balancing and Updates Batching
  [Figure: PageRank speedup on Intel for varying numbers of partitions of the Orkut and Web graphs, with retired loads]
  • Observation: load balancing and updates batching didn't improve performance for the Web graph. Reason: sparse graphs can be partitioned better, and there are fewer updates to send.
  • Observation: batching updates gives a bigger performance improvement for the Orkut graph. Reason: updates batching reduces remote cache accesses.

  13. Comparing Grace, BDB, and Neo4j
  [Figure: running time (s) of Grace, BDB, and Neo4j]

  14. Conclusion
  • Grace explores graph-specific and multi-core-specific optimizations
  • What worked and what didn't (in our setup; your mileage might differ):
  • Careful vertex placement in memory gave good improvements
  • Partitioning and updates batching worked in most cases, but not always
  • Load balancing wasn't as useful
