A Parallelization of State-of-the-Art Graph Bisection Algorithms

Nan Dun, Kenjiro Taura, Akinori Yonezawa • Graduate School of Information Science and Technology, The University of Tokyo

Problem Description • Graph Partition • Goal: To minimize the edge cut • K-partition • Bisection (Bipartition) • Problem Complexity • Finding the best partition, or even an approximate partition, is NP-hard 1)2) • Solutions • Heuristics • Non-deterministic • On the Grid • Graph partitioning problem: given an undirected graph G=(V,E), find a partition (L,R) of V satisfying |L|=|R| that minimizes the number of edges between L and R. • [Figure: example graph on vertices 1 to 6, bisected into L={1,2,3} and R={4,5,6}] SWoPP 2006
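The definition above can be made concrete with a minimal Python sketch. The edge list is an assumption chosen for illustration (two triangles joined by one edge); the edges of the slide's example graph are not recoverable from the transcript, and the function names `edge_cut` and `is_bisection` are hypothetical.

```python
def edge_cut(edges, left):
    """Count the edges that have exactly one endpoint in `left`."""
    return sum(1 for u, v in edges if (u in left) != (v in left))


def is_bisection(vertices, left):
    """True if `left` and its complement split the vertex set into equal halves."""
    vertex_set, left_set = set(vertices), set(left)
    return left_set <= vertex_set and 2 * len(left_set) == len(vertex_set)


# Hypothetical 6-vertex graph: triangle 1-2-3, triangle 4-5-6, bridge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
vertices = range(1, 7)

balanced = is_bisection(vertices, {1, 2, 3})
good_cut = edge_cut(edges, {1, 2, 3})  # only the bridge 3-4 crosses
poor_cut = edge_cut(edges, {1, 2, 4})  # both triangles are split
```

On this toy graph the bisection L={1,2,3}, R={4,5,6} cuts a single edge, while L={1,2,4} cuts five.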

Practical Applications • In Mathematics • Analysis of sparse systems of linear equations • In Computer Science • Modeling data placement on distributed memory to minimize communication • In Various Other Domains • VLSI Design • Transportation Networks • Communication Networks

Bisection Flow • Bisection Initialization • Random Initialization • Half-Half Initialization • Region Growing • Bisection Refinement • Kernighan-Lin 3)4) • Tabu Search 7) • Fixed Tabu Search • Reactive Tabu Search • [Figure: flow from Bisection Initialization to an Initial Bisection, then Bisection Refinement to the Final Bisection]

Min-Max Greedy Growing 7) • Min: search for the vertices whose addition causes the minimal edge cut • Max: break ties by maximizing internal connections • [Figure: candidate vertices A, B, C next to the growing set "addset"]
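The Min and Max rules can be sketched as follows. This is a simplified illustration, not the exact procedure of reference 7): it assumes a single set is grown from one seed vertex until it holds half of the vertices, and the function name and the `adj` adjacency-dictionary layout are hypothetical.

```python
import random


def min_max_greedy_bisect(adj, seed_vertex=None, rng=None):
    """Grow one side from a seed; adj maps each vertex to a set of neighbours."""
    rng = rng or random.Random(0)
    vertices = list(adj)
    start = seed_vertex if seed_vertex is not None else rng.choice(vertices)
    grown = {start}
    while len(grown) < len(vertices) // 2:
        best_key, best = None, []
        for v in vertices:
            if v in grown:
                continue
            inside = sum(1 for u in adj[v] if u in grown)
            outside = len(adj[v]) - inside
            # Min: change in edge cut if v joins the grown set.
            # Max: among ties, prefer more connections into the grown set.
            key = (outside - inside, -inside)
            if best_key is None or key < best_key:
                best_key, best = key, [v]
            elif key == best_key:
                best.append(v)
        grown.add(rng.choice(best))  # remaining ties are broken at random
    return grown, set(vertices) - grown


# Hypothetical graph: triangle 1-2-3, triangle 4-5-6, bridge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
left, right = min_max_greedy_bisect(adj, seed_vertex=1)
```

Starting from vertex 1, the sketch adds vertex 2 and then vertex 3, so the grown side is the first triangle and only the bridge edge is cut.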

Kernighan-Lin 3)4) • 1. Calculate the gain of each vertex • 2. Search for a series of pairs that leads to the maximal edge-cut reduction when swapped • 3. Swap the pairs of vertices obtained in step 2 and lock them from further swaps in the current pass • 4. Iterate steps 1, 2, 3 until the edge cut stops improving • Swapping a Pair of Vertices • gain := # of internal edges - # of external edges • Example: gain(B) = -1, gain(C) = -2 • ΔCut of swapping B and C = gain(B) + gain(C) + 2 = -1 (the +2 accounts for the edge between B and C, which stays cut after the swap) • [Figure: vertices A, B, C, D before and after swapping B and C]
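The gain arithmetic on this slide can be checked with a short Python sketch. It uses the slide's convention (gain = internal edges minus external edges, so a negative value means the cut shrinks). The four-vertex graph is an assumption built to reproduce gain(B) = -1 and gain(C) = -2; the actual edges of the slide's figure are not recoverable, and the function names are hypothetical.

```python
def gain(adj, side, v):
    """Slide convention: (# internal edges) - (# external edges) of v."""
    internal = sum(1 for u in adj[v] if side[u] == side[v])
    external = len(adj[v]) - internal
    return internal - external


def swap_delta_cut(adj, side, a, b):
    """Change in edge cut if a and b (on opposite sides) are swapped."""
    # An edge between a and b is counted as external by both gains,
    # yet it stays cut after the swap, hence the +2 correction.
    adjacent = 1 if b in adj[a] else 0
    return gain(adj, side, a) + gain(adj, side, b) + 2 * adjacent


def cut_size(adj, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for v in adj for u in adj[v] if side[v] != side[u]) // 2


# Hypothetical graph with edges A-B, A-C, B-C, B-D; left = {A, B}, right = {C, D}.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
side = {"A": 0, "B": 0, "C": 1, "D": 1}

before = cut_size(adj, side)
predicted = swap_delta_cut(adj, side, "B", "C")
swapped = dict(side, B=1, C=0)
after = cut_size(adj, swapped)
```

Here the cut drops from 3 to 2, matching the predicted ΔCut of -1.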

Tabu Search 7) • Kernighan-Lin Like • Swaps pairs of vertices according to their gains • Temporarily Forbidden • Previously swapped vertices are temporarily forbidden to move for a period of time (the Tabu Length) • Tabu Length: a fraction (the Tabu Fraction) of |V| • E.g.: Tabu Fraction = 0.01, |V| = 1000, Tabu Length = 0.01 x |V| = 10 • Previously swapped pairs are allowed to move again after 10 other swaps • To escape local minima
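A minimal sketch of the tabu bookkeeping described above, assuming the tabu length is the tabu fraction times |V| rounded to the nearest integer (the rounding rule and the class name `TabuList` are assumptions, not taken from the slides).

```python
class TabuList:
    """Tracks which vertices are temporarily forbidden to move."""

    def __init__(self, num_vertices, tabu_fraction):
        # Tabu length as a fraction of |V|; at least one swap.
        self.length = max(1, round(tabu_fraction * num_vertices))
        self.swap_count = 0
        self.released_at = {}  # vertex -> swap count at which it is free again

    def record_swap(self, u, v):
        """Register a swap of u and v and forbid both for `length` more swaps."""
        self.swap_count += 1
        for x in (u, v):
            self.released_at[x] = self.swap_count + self.length

    def is_tabu(self, x):
        return self.released_at.get(x, 0) > self.swap_count


tabu = TabuList(num_vertices=1000, tabu_fraction=0.01)
tabu.record_swap("a", "b")
for i in range(9):                      # nine other swaps: still forbidden
    tabu.record_swap(("x", i), ("y", i))
still_tabu = tabu.is_tabu("a")
tabu.record_swap(("x", 9), ("y", 9))    # the tenth other swap releases the pair
released = not tabu.is_tabu("a")
```

With |V| = 1000 and a tabu fraction of 0.01, the swapped pair stays forbidden for 10 subsequent swaps, as in the slide's example.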

Graph Types – Tabu Lengths • Vertex Degree • Denser random graphs tend to prefer smaller tabu lengths, while denser geometric graphs tend to prefer larger tabu lengths 8) • Distribution of Vertex Degree • Graphs with a uniform distribution of vertex degree tend to have a single best-fitting tabu length • [Figure: edge-cut versus tabu fraction for two graphs: |V| = 35000, |E| = 346572, degree max 43, min 3, avg. 19.8; and |V| = 17758, |E| = 54196, degree max 573, min 1, avg. 6.1]

RRTS 7) • Synthesis of Heuristics • Heuristics complement each other • Reactive • Tries each tabu length to see which is better • Adaptive to various graphs • Best Quality • Goes beyond local minima • Long Running Time • Scoring Phase • REACTIVE-RANDOMIZED-TABU-SEARCH: score each tabu length by small runs of TS; do I times: initial bisection by Min-Max; do J times: TS with the high-scored tabu length; refine by Kernighan-Lin runs • R. Battiti and A. A. Bertossi. Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning. IEEE Transactions on Computers, Vol. 48, April 1999.

Multi-level for Large Graphs • Coarsening Phase • Coarsen a large graph into a smaller one using a matching scheme • Coarsen over multiple levels • Bisection Phase • Bisecting small graphs is usually very fast • Uncoarsening Phase • Map the bisection back to the original graph • Perform refinement at each uncoarsening level • METIS 5)12) • [Figure: Matching Scheme]
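One coarsening step can be sketched as below. The slides do not say which matching scheme is used, so this sketch assumes a random maximal matching: each unmatched vertex is merged with a randomly chosen unmatched neighbour. The function name and data layout are hypothetical.

```python
import random


def coarsen_once(adj, rng=None):
    """Collapse a random maximal matching; adj maps vertex -> set of neighbours.

    Returns (coarse_adj, coarse_of): coarse_adj[c][d] is the number of fine
    edges between coarse vertices c and d, and coarse_of maps each fine
    vertex to its coarse vertex (used later to project a bisection back).
    """
    rng = rng or random.Random(0)
    order = list(adj)
    rng.shuffle(order)
    coarse_of, next_id = {}, 0
    for v in order:
        if v in coarse_of:
            continue
        coarse_of[v] = next_id
        unmatched = [u for u in adj[v] if u not in coarse_of]
        if unmatched:  # merge v with one unmatched neighbour
            coarse_of[rng.choice(unmatched)] = next_id
        next_id += 1
    coarse_adj = {c: {} for c in range(next_id)}
    for v, neighbours in adj.items():
        for u in neighbours:
            cv, cu = coarse_of[v], coarse_of[u]
            if cv != cu:  # edges inside a merged pair disappear
                coarse_adj[cv][cu] = coarse_adj[cv].get(cu, 0) + 1
    return coarse_adj, coarse_of


# Hypothetical input: a path 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
coarse_adj, coarse_of = coarsen_once(path, random.Random(1))
```

Repeating this step yields the multi-level hierarchy; `coarse_of` is what an uncoarsening phase would use to map a bisection of the small graph back onto the larger one.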

Comparison of Heuristics

Comparison of Heuristics • METIS • Extremely Fast • Uses the Multi-level Technique • High-Quality Bisections, but worse than RRTS • Multi-level lacks global optimization during the coarsening phase • RRTS • Very Slow • The Scoring Phase is time-consuming • "Best-ever" Bisections • Adaptive to various kinds of graphs • FTS with Known Tabu Length • Much faster than RRTS • Results comparable to RRTS

A Naive Parallelization • Run RRTS independently on each node • Equivalent to simply scaling up the number of iterations • Generate different seeds for different nodes • Heuristics are sensitive to the initial state • 10% to 20% improvement • [Figure: Dispatch Graphs to nodes each running RRTS100, then Synthesize Results]
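The dispatch-and-synthesize pattern can be sketched in Python. This is an illustration only: a random balanced bisection stands in for one RRTS run, and a local thread pool stands in for Grid nodes. All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def randomized_run(edges, vertices, seed):
    """Stand-in for one RRTS run: a seeded random balanced bisection."""
    rng = random.Random(seed)
    shuffled = list(vertices)
    rng.shuffle(shuffled)
    left = set(shuffled[: len(shuffled) // 2])
    cut = sum(1 for u, v in edges if (u in left) != (v in left))
    return cut, seed, left


def best_of_independent_runs(edges, vertices, seeds, workers=4):
    """Dispatch one run per seed, then keep the smallest edge cut."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: randomized_run(edges, vertices, s), seeds))
    # Ties on the cut are broken by the smaller seed to stay deterministic.
    return min(results, key=lambda r: (r[0], r[1])), results


# Hypothetical graph: triangle 1-2-3, triangle 4-5-6, bridge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
best, all_runs = best_of_independent_runs(edges, range(1, 7), seeds=range(20))
```

Because every run is independent, adding workers only adds more samples of the same distribution, which is why the slide calls this equivalent to scaling up iterations.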

Statistical Properties of Cut Size • Incidence of Bests • Average quality is good • Only 0.25% of the results are the best • General Property • The distribution becomes more sharply peaked as |V| grows • The distribution tends towards Gaussian 8) • Mean and variance scale linearly with |V| • [Figure: histogram of count versus edge-cut for the graph with |V| = 35000, |E| = 346572, degree max 43, min 3, avg. 19.80; RRTS100 on 400 nodes provided by Grid Challenge Federation]
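A rough worked calculation shows what a 0.25% incidence implies. It assumes, as a simplification not stated in the slides, that each run independently reaches the best cut with probability p = 0.0025; the probability that at least one of N runs does so is then 1 - (1 - p)^N. The function names are hypothetical.

```python
import math


def prob_at_least_one_best(p_best, runs):
    """Chance that at least one of `runs` independent runs hits the best cut."""
    return 1.0 - (1.0 - p_best) ** runs


def runs_needed(p_best, target):
    """Smallest number of independent runs reaching probability `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_best))


p = 0.0025                                   # 0.25% incidence from the slide
chance_400 = prob_at_least_one_best(p, 400)  # 400 runs, one per node
runs_for_95 = runs_needed(p, 0.95)
```

Under this independence assumption, 400 runs find the best cut only about 63% of the time, and roughly 1,200 runs would be needed for a 95% chance.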

Issues of Parallelizing Heuristics • Hard with the Message-Passing Model (MPI) • J.R. Gilbert and E. Zmijewski 9): A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming • Par-METIS (Parallel METIS) • Par-METIS only parallelizes the coarsen-uncoarsen part • Hard to Be Efficient (statistical property) • Even if we could parallelize a heuristic efficiently • The fraction of iterations that reach the best bisections is still small • If we cooperatively run independent instances on the Grid • How many nodes are needed to reach the best partition? • Where is a good threshold?

Contribution of Phases • Initial Phase • Removes a large portion of the edge cut • Good initial partitions lead to good final partitions • Running time is consistent across runs; good initial partitions leave more time for refinement • TS and KL Phase • Reductions tend to be alike • More iterations, better results • [Figure: ΔEdge-Cut versus Best Edge-Cuts]

Results from the Same Initial Bisections • Given the Same Initial Partitions • The best initial partitions lead to the best final partitions • FTS and KL tend to be deterministic • Fewer swaps are available • The diversity of edge cuts can be eliminated by distributing only one phase • Running FTS and KL on one node is enough • [Figure: count histogram; FTS and KL performed on the same initial partitions, 50 nodes]

Multi-level Scoring • Mainly Used to Adapt to Large-Scale Graphs • If |V| = 1000, Tabu = 0.01 x 1000 = 10 • If |V| = 100000, Tabu = 0.01 x 100000 = 1000 • Tunes the tabu length to fit specific graphs better • Level-1 scoring distinguishes graphs by their types • Level-2 scoring searches for a better tabu length for the specific graph • [Figure: edge-cut versus Level-1 tabu fraction, and edge-cut versus Level-2 tabu fraction]
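A two-level search over tabu fractions might look like the following sketch. The slides do not specify how the Level-2 candidates relate to the Level-1 winner; this sketch assumes they are finer steps added to it. On a graph with |V| = 100000, a Level-1 step of 0.01 changes the tabu length by 1000, while a Level-2 step of 0.001 changes it by 100. The names `two_level_scoring` and `toy_score`, and the toy scoring function itself, are hypothetical stand-ins for short TS runs.

```python
def best_fraction(candidates, score):
    """Return the candidate tabu fraction with the lowest score (edge cut)."""
    return min(candidates, key=score)


def two_level_scoring(score, level1, level2_offsets):
    """Pick a coarse fraction first, then refine it with finer steps."""
    coarse = best_fraction(level1, score)
    # Assumption: Level-2 candidates are the coarse winner plus finer offsets.
    fine_candidates = [coarse] + [round(coarse + d, 6) for d in level2_offsets]
    return coarse, best_fraction(fine_candidates, score)


def toy_score(fraction):
    """Stand-in for the edge cut of a short TS run; the optimum 0.034 is made up."""
    return abs(fraction - 0.034)


level1 = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]
offsets = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007]
coarse, fine = two_level_scoring(toy_score, level1, offsets)
```

In a real run, `score` would launch a short tabu search with the given fraction and return the resulting edge cut; each candidate can be scored by a separate group of nodes.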

Final Approaches • Do Not Use Multi-level Partitioning • To preserve the best quality • Do Not Parallelize the Heuristics Themselves • Not a good trade-off • Parallelize the Scoring Phase • One group of nodes scores one tabu length • With the multi-level scoring technique • Parallelize the Initial Phase Only • Remove the diversity of edge cuts as soon as possible • Take advantage of distributed runs to remove the diversity of edge cuts • Reduce computing effort as much as possible • Further refinement can be done on a single node • Use the GXP Cluster Shell • "mw" command: mw M {{ W }}

Full Picture • Multi-Level Scoring • Level-1 tabu fractions scored (S): 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07 • High-scored Level-1 tabu fraction • Level-2 tabu fractions scored (S): 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007 • High-scored Level-2 tabu fraction • Initial Phase • Multiple Init runs • Best initial partitions • Refinement Phase • FTS and KL

Conclusions • Bisection Quality • "Best-ever" partitions • Edge-Cut_OUR ≤ Edge-Cut_RRTS ≤ Edge-Cut_METIS • Bisection Time • Comparable and reasonable • Time_METIS < Time_OUR << Time_RRTS • 10x speedup compared to RRTS • Adapted to the Grid Environment • Scalable performance • Convenient usage • Good fault tolerance

Thank you for your attention!
