BFS Optimization in MIC Jul. 9 2014 Heng LIN PACMAN Group Tsinghua University
Background • Built on the Graph500 benchmark framework • http://www.graph500.org
Optimization • Graph layout optimization. • In the edge-pair generation phase, remap vertex ids in descending order of vertex degree. • In the graph construction phase, sort each vertex's neighbors by descending degree. • Warm up the bfs_tree data structure. • 4.68 -> 11.12 GTEPS
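The two layout steps above can be sketched as follows. This is a minimal illustration, not the deck's actual implementation: the function names `relabel_by_degree` and `build_adjacency` are hypothetical, the graph is assumed undirected, and ties are broken by old id only to keep the result deterministic.

```python
from collections import Counter

def relabel_by_degree(edges):
    """Return the edge list with ids remapped so that id 0 has the highest degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Descending degree; ties broken by old id (an assumption, for determinism).
    order = sorted(degree, key=lambda x: (-degree[x], x))
    new_id = {old: new for new, old in enumerate(order)}
    return [(new_id[u], new_id[v]) for u, v in edges], new_id

def build_adjacency(edges, num_vertices):
    """Build undirected adjacency lists, each sorted by descending neighbor degree."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Snapshot the degrees first so the sort key never reads a list being sorted.
    deg = [len(nbrs) for nbrs in adj]
    for nbrs in adj:
        nbrs.sort(key=lambda w: (-deg[w], w))
    return adj
```

After relabeling, high-degree vertices share low ids and sit at the front of every neighbor list, which is presumably what improves locality during the search.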
Related Work • Graph500 June 2014 list release. • K computer ranks 1st.
Related Work
[1] Traversing Trillions of Edges in Real-time: Graph Exploration on Large-scale Parallel Machines. IPDPS’14 (1.5 TTEPS work)
[2] Fast and Energy-efficient Breadth-First Search on a Single NUMA System. ISC’14
[3] NUMA-optimized Parallel Breadth-First Search on Multicore Single-node System. Big Data’13
[4] Parallel Distributed Breadth First Search on GPU. HiPC’13
[5] Highly Scalable Graph Search for the Graph500 Benchmark. HPDC’12
Related Work (Checconi et al., IPDPS’14) • Data decomposition -> 1D • Vertices partitioned among nodes. • Neighbors partitioned among threads within a node.
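A 1D decomposition assigns every vertex to exactly one node. The sketch below assumes a contiguous block partition, which the slide does not specify; `block_size`, `owner`, and `local_id` are hypothetical helper names.

```python
def block_size(num_vertices, num_nodes):
    """Vertices per node, rounded up so every vertex has an owner."""
    return -(-num_vertices // num_nodes)

def owner(v, num_vertices, num_nodes):
    """Node that owns global vertex v under a contiguous block partition."""
    return v // block_size(num_vertices, num_nodes)

def local_id(v, num_vertices, num_nodes):
    """Index of vertex v inside its owner's block."""
    return v % block_size(num_vertices, num_nodes)
```

With 10 vertices on 4 nodes the block size is 3, so vertex 4 lives on node 1 at local index 1.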
Related Work (Checconi et al., IPDPS’14) • Data structures -> CSR based. • Coarse index for vertices. • Shortcuts in the edge list.
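For readers unfamiliar with CSR (compressed sparse row), a plain version can be sketched as below. It shows only the basic offsets-plus-columns layout; the coarse index and edge-list shortcuts mentioned on the slide are not modeled, and `build_csr` and `neighbors` are hypothetical names.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, cols) from a list of directed (u, v) edges."""
    offsets = [0] * (num_vertices + 1)
    # Count the out-degree of each vertex.
    for u, _ in edges:
        offsets[u + 1] += 1
    # Prefix sum turns the counts into start positions.
    for i in range(num_vertices):
        offsets[i + 1] += offsets[i]
    cols = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each row
    for u, v in edges:
        cols[cursor[u]] = v
        cursor[u] += 1
    return offsets, cols

def neighbors(offsets, cols, u):
    """Neighbors of u are the slice cols[offsets[u]:offsets[u + 1]]."""
    return cols[offsets[u]:offsets[u + 1]]
```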
Related Work (Checconi et al., IPDPS’14) • Search pruning -> Direction optimization. • Top-down + bottom-up • Load balance -> Split huge vertices.
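The top-down plus bottom-up idea can be illustrated with a single-threaded sketch. The switching rule used here (go bottom-up once the frontier exceeds a fixed fraction of all vertices) and the `switch_fraction` parameter are assumptions for illustration, not the heuristic from the paper; the graph is assumed undirected so that `adj[v]` also lists potential parents of `v`.

```python
def bfs_direction_optimizing(adj, root, switch_fraction=0.05):
    """Return a parent array; -1 marks vertices unreachable from root."""
    n = len(adj)
    parent = [-1] * n
    parent[root] = root
    frontier = {root}
    while frontier:
        next_frontier = set()
        if len(frontier) > switch_fraction * n:
            # Bottom-up: every unvisited vertex scans its neighbors and
            # stops at the first one found in the current frontier.
            for v in range(n):
                if parent[v] == -1:
                    for u in adj[v]:
                        if u in frontier:
                            parent[v] = u
                            next_frontier.add(v)
                            break
        else:
            # Top-down: every frontier vertex claims its unvisited neighbors.
            for u in sorted(frontier):
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.add(v)
        frontier = next_frontier
    return parent
```

The early `break` in the bottom-up branch is where the pruning comes from: once a vertex finds one parent, its remaining edges are never examined.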
Related Work (Checconi et al., IPDPS’14) • Algorithm overview
Related Work (Checconi et al., IPDPS’14) • Communication. • Each thread has a buffer for every other thread. • A header contains the source thread and the buffer size. • 24 bits for the local part of the destination vertex + 24 bits for the local part of the source vertex • Differential encoding scheme (24 + 8 bits).
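The 24 + 24 bit message body can be illustrated by packing both local ids into one 48-bit word. The field order (destination in the high bits), the little-endian byte layout, and the names `pack_edge`, `unpack_edge`, and `to_wire` are assumptions; the slide gives only the field widths. The differential scheme is left out because the slide does not describe how its 24 + 8 bits are used.

```python
LOCAL_BITS = 24
LOCAL_MASK = (1 << LOCAL_BITS) - 1

def pack_edge(dst_local, src_local):
    """Pack two 24-bit local vertex ids into a single 48-bit integer."""
    if not (0 <= dst_local <= LOCAL_MASK and 0 <= src_local <= LOCAL_MASK):
        raise ValueError("local ids must fit in 24 bits")
    return (dst_local << LOCAL_BITS) | src_local

def unpack_edge(word):
    """Inverse of pack_edge: return (dst_local, src_local)."""
    return (word >> LOCAL_BITS) & LOCAL_MASK, word & LOCAL_MASK

def to_wire(word):
    """Serialize the 48-bit word as 6 bytes (little-endian, by assumption)."""
    return word.to_bytes(6, "little")
```

Sending only the local parts works because the receiving thread already knows which node and thread a buffer came from (via the header) and which vertices it owns itself.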