A Parallelization of State-of-the-Art Graph Bisection Algorithms
Nan Dun, Kenjiro Taura, Akinori Yonezawa
Graduate School of Information Science and Technology, The University of Tokyo
Problem Description
• Graph Partition
  • Goal: minimize the edge cut
  • K-way partition
  • Bisection (bipartition)
• Problem Complexity
  • Finding the best partition and finding approximate partitions are both NP-hard 1)2)
• Solutions
  • Heuristics
  • Non-deterministic
  • On the Grid
Graph partitioning problem: given an undirected graph G=(V,E), find a partition (L,R) of V satisfying |L|=|R| that minimizes the number of edges between L and R.
[Figure: example graph on vertices 1 to 6, bisected into L={1,2,3} and R={4,5,6}]
SWoPP 2006
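The objective stated above can be written down in a few lines. This is a minimal sketch: the function names and the 6-vertex example graph (two triangles joined by one edge) are assumptions for illustration, since the edges of the graph drawn on the slide are not recoverable.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def edge_cut(edges: Iterable[Edge], left: Set[int]) -> int:
    """Count the edges that have exactly one endpoint in `left`."""
    return sum(1 for u, v in edges if (u in left) != (v in left))


def is_bisection(vertices: Set[int], left: Set[int]) -> bool:
    """Check |L| = |R|, allowing a difference of one when |V| is odd."""
    right = vertices - left
    return left <= vertices and abs(len(left) - len(right)) <= 1


# Hypothetical example graph (not the one from the slide):
# triangle 1-2-3 and triangle 4-5-6, joined by the edge (3, 4).
VERTICES = {1, 2, 3, 4, 5, 6}
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

if __name__ == "__main__":
    print(edge_cut(EDGES, {1, 2, 3}))  # cut of L={1,2,3}, R={4,5,6}
    print(edge_cut(EDGES, {1, 2, 4}))  # a worse bisection of the same graph
```

Every heuristic in the following slides searches for a set `left` that keeps `is_bisection` true while driving `edge_cut` down.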
Practical Applications
• In Mathematics
  • Analysis of sparse systems of linear equations
• In Computer Science
  • Modeling data placement on distributed memory to minimize communication
• In Various Other Domains
  • VLSI design
  • Transportation networks
  • Communication networks
Bisection Flow
• Bisection Initialization
  • Random initialization
  • Half-half initialization
  • Region growing
• Bisection Refinement
  • Kernighan-Lin 3)4)
  • Tabu Search 7)
    • Fixed Tabu Search
    • Reactive Tabu Search
[Figure: flow diagram with the labels Bisection Initialization, Initial Bisection, Bisection Refinement, Final Bisection]
Min-Max Greedy Growing 7)
• Min: search for the vertices whose addition causes the minimal edge cut
• Max: break ties by maximizing internal connections
[Figure: candidate vertices A, B, C next to the set being grown (addset)]
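The Min and Max rules can be sketched as a selection key. The code below is an illustration under assumptions, not the authors' implementation: it assumes two random seed vertices, strictly alternating growth of the two sets, and random resolution of any ties that remain. The names `min_max_greedy` and `TWO_TRIANGLES` are hypothetical.

```python
import random
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def min_max_greedy(graph: Graph, seed: int = 0) -> Tuple[Set[int], Set[int]]:
    """Grow two sets alternately from random seed vertices (assumed scheme)."""
    rng = random.Random(seed)
    free = set(graph)
    first, second = rng.sample(sorted(free), 2)
    sides = [{first}, {second}]
    free -= {first, second}
    turn = 0
    while free:
        addset, other = sides[turn], sides[1 - turn]

        def rank(v: int) -> Tuple[int, int, float]:
            new_cut = len(graph[v] & other)    # Min: edges that would be cut
            internal = len(graph[v] & addset)  # Max: connections into addset
            return (new_cut, -internal, rng.random())

        chosen = min(sorted(free), key=rank)
        addset.add(chosen)
        free.remove(chosen)
        turn = 1 - turn  # alternate so the sizes differ by at most one
    return sides[0], sides[1]


# Hypothetical test graph: two triangles joined by the edge (3, 4).
TWO_TRIANGLES: Graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

if __name__ == "__main__":
    print(min_max_greedy(TWO_TRIANGLES, seed=1))
```

The result depends on where the seeds land, which is one reason the later slides treat the initial phase as worth repeating with different random seeds.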
Kernighan-Lin 3)4)
1. Calculate the gain of each vertex
2. Search for a series of pairs that leads to the maximal edge-cut reduction if swapped
3. Swap the pairs of vertices obtained in step 2 and lock them from further swaps in the current pass
4. Iterate steps 1, 2, 3 until the edge cut stops decreasing
Swapping a pair of vertices (example with vertices A, B, C, D):
gain(B) = -1, gain(C) = -2
ΔCut of swapping B, C = gain(B) + gain(C) + 2 = -1
* gain := # of internal edges - # of external edges
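The gain arithmetic can be checked on a small graph. The sketch below uses the slide's definition (gain = internal edges minus external edges). The 4-vertex graph is a hypothetical one chosen to reproduce gain(B) = -1 and gain(C) = -2, because the edges of the graph drawn on the slide are not recoverable. The "+ 2" term is modeled as applying when the swapped vertices are adjacent: their shared edge stays cut after the swap but was counted as external by both gains.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]
Sides = Dict[str, int]  # vertex -> 0 (left) or 1 (right)


def gain(graph: Graph, side: Sides, v: str) -> int:
    """Slide's definition: # of internal edges - # of external edges."""
    internal = sum(1 for u in graph[v] if side[u] == side[v])
    external = len(graph[v]) - internal
    return internal - external


def swap_delta(graph: Graph, side: Sides, a: str, b: str) -> int:
    """Change in edge cut if a and b (on opposite sides) are swapped."""
    shared = 1 if b in graph[a] else 0
    return gain(graph, side, a) + gain(graph, side, b) + 2 * shared


def edge_cut(graph: Graph, side: Sides) -> int:
    return sum(1 for u in graph for v in graph[u]
               if u < v and side[u] != side[v])


# Hypothetical graph: edges A-B, A-C, B-C, B-D; left = {A, B}, right = {C, D}.
G: Graph = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
SIDE: Sides = {"A": 0, "B": 0, "C": 1, "D": 1}

if __name__ == "__main__":
    before = edge_cut(G, SIDE)
    swapped = dict(SIDE, B=1, C=0)
    print(gain(G, SIDE, "B"), gain(G, SIDE, "C"))
    print(swap_delta(G, SIDE, "B", "C"), edge_cut(G, swapped) - before)
```

A negative ΔCut means the swap reduces the edge cut, so step 2 of the algorithm looks for the pairs with the most negative values.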
Tabu Search 7)
• Kernighan-Lin like
  • Swaps pairs of vertices according to their gains
• Temporarily forbidden
  • Previously swapped vertices are temporarily forbidden to move for a period of time (the Tabu Length)
  • Tabu Length: a fraction (the Tabu Fraction) of |V|
  • E.g., Tabu Fraction = 0.01, |V| = 1000: Tabu Length = 0.01 × |V| = 10, so previously swapped pairs are allowed to move again after 10 other swaps
• Purpose: to escape local minima
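The tabu-length arithmetic and the "forbidden for a while" rule can be sketched with a bounded queue. This is an illustration only: the class name `TabuList`, the rounding rule, and the use of `collections.deque` are assumptions, not details given in the slides.

```python
from collections import deque
from typing import Deque, FrozenSet, Hashable, Iterable


def tabu_length(tabu_fraction: float, num_vertices: int) -> int:
    """Tabu Length = Tabu Fraction x |V| (rounded, at least 1: an assumption)."""
    return max(1, round(tabu_fraction * num_vertices))


class TabuList:
    """Remembers the most recent swaps; older ones drop out automatically."""

    def __init__(self, length: int) -> None:
        self._recent: Deque[FrozenSet[Hashable]] = deque(maxlen=length)

    def forbid(self, pair: Iterable[Hashable]) -> None:
        """Record a swapped pair of vertices as tabu."""
        self._recent.append(frozenset(pair))

    def is_tabu(self, vertex: Hashable) -> bool:
        return any(vertex in pair for pair in self._recent)


if __name__ == "__main__":
    tabu = TabuList(tabu_length(0.01, 1000))  # length 10, as on the slide
    tabu.forbid(("u", "v"))
    for i in range(10):                       # 10 other swaps
        tabu.forbid((f"a{i}", f"b{i}"))
    print(tabu.is_tabu("u"))                  # the first pair is free again
```

With `maxlen` equal to the tabu length, a pair becomes movable again exactly after that many other swaps have been recorded, which matches the example on the slide.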
Graph Types – Tabu Lengths
[Figure: edge cut versus tabu fraction for two graphs. Graph 1: |V| = 35000, |E| = 346572, degree max 43, min 3, avg. 19.8. Graph 2: |V| = 17758, |E| = 54196, degree max 573, min 1, avg. 6.1]
• Vertex degree
  • Denser random graphs tend to prefer smaller tabu lengths, while denser geometric graphs tend to prefer larger tabu lengths 8)
• Distribution of vertex degree
  • Graphs with a uniform distribution of vertex degree tend to have a single best-fitting tabu length
RRTS 7) (Reactive Randomized Tabu Search)
• Synthesis of heuristics
  • The heuristics complement each other
• Reactive
  • Tries each tabu length to see which is better
  • Adapts to various graphs
• Best quality
  • Goes beyond local minima
• Long running time
  • Scoring phase
REACTIVE RANDOMIZED TABU SEARCH
  Score each tabu length by small runs of TS
  do I times
    Initial bisection by Min-Max
    do J times: TS with a high-scored tabu length
    Refine by Kernighan-Lin runs
R. Battiti and A. A. Bertossi. Greedy, Prohibition, and Reactive Heuristics for Graph Partitioning. IEEE Transactions on Computers, Vol. 48, April 1999.
Multi-level for Large Graphs
• Coarsening phase
  • Coarsen a large graph into a smaller one using a matching scheme
  • Multi-level coarsening
• Bisection phase
  • Bisecting small graphs is usually very fast
• Uncoarsening phase
  • Map the bisection back to the original graph
  • Perform refinement at each uncoarsening level
• METIS 5)12)
[Figure: matching scheme]
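One coarsening level and the projection back can be sketched as follows. The slide does not say which matching scheme is used, so this sketch assumes a plain random matching. All function names are hypothetical, and the refinement step that would follow each projection is omitted.

```python
import random
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def random_matching(graph: Graph, seed: int = 0) -> Dict[int, int]:
    """Map each vertex to a coarse vertex; matched neighbours share one id."""
    rng = random.Random(seed)
    order = sorted(graph)
    rng.shuffle(order)
    coarse_of: Dict[int, int] = {}
    next_id = 0
    for v in order:
        if v in coarse_of:
            continue
        coarse_of[v] = next_id
        unmatched = [u for u in sorted(graph[v]) if u not in coarse_of]
        if unmatched:
            coarse_of[rng.choice(unmatched)] = next_id
        next_id += 1
    return coarse_of


def coarsen(graph: Graph, coarse_of: Dict[int, int]) -> Dict[Tuple[int, int], int]:
    """Coarse edge weights: number of fine edges between two coarse vertices."""
    weights: Dict[Tuple[int, int], int] = {}
    for u in graph:
        for v in graph[u]:
            a, b = coarse_of[u], coarse_of[v]
            if u < v and a != b:
                key = (min(a, b), max(a, b))
                weights[key] = weights.get(key, 0) + 1
    return weights


def project(coarse_side: Dict[int, int], coarse_of: Dict[int, int]) -> Dict[int, int]:
    """Uncoarsen: every fine vertex inherits the side of its coarse vertex."""
    return {v: coarse_side[c] for v, c in coarse_of.items()}


# Hypothetical test graph: two triangles joined by the edge (3, 4).
FINE: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
               4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
```

Because contracted edges can never be cut, the weighted cut of a coarse bisection equals the cut of its projection, so any improvement has to come from the refinement performed at each uncoarsening level.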
Comparison of Heuristics
• METIS
  • Extremely fast
    • Uses the multi-level technique
  • High-quality bisections, but worse than RRTS
    • Multi-level lacks global optimization during the coarsening phase
• RRTS
  • Very slow
    • The scoring phase is time-consuming
  • "Ever-best" bisections
    • Adapts to many kinds of graphs
• FTS with a known tabu length
  • Much faster than RRTS
  • Results comparable to RRTS
A Naive Parallelization
[Figure: graphs are dispatched to the nodes, each node runs RRTS100, and the results are synthesized]
• Run RRTS independently on each node
  • Simply equivalent to scaling up the number of iterations
• Generate different seeds for different nodes
  • The heuristics are sensitive to the initial bisection
  • 10% to 20% improvement
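The dispatch-and-synthesize pattern can be sketched on one machine. This is only an illustration: a random balanced bisection stands in for an RRTS run, a local thread pool stands in for the Grid nodes, and all names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Sequence, Tuple

Edge = Tuple[int, int]


def one_run(edges: Sequence[Edge], vertices: Iterable[int], seed: int) -> Tuple[int, int]:
    """Stand-in for one independent heuristic run; returns (edge cut, seed)."""
    rng = random.Random(seed)      # a different seed per "node"
    order = sorted(vertices)
    rng.shuffle(order)
    left = set(order[: len(order) // 2])
    cut = sum(1 for u, v in edges if (u in left) != (v in left))
    return cut, seed


def best_of_independent_runs(edges: Sequence[Edge], vertices: Iterable[int],
                             seeds: Sequence[int], workers: int = 4) -> Tuple[int, int]:
    """Dispatch one run per seed, then keep the smallest edge cut."""
    vertex_list: List[int] = sorted(vertices)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: one_run(edges, vertex_list, s), seeds))
    return min(results)


# Hypothetical test graph: two triangles joined by the edge (3, 4).
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

if __name__ == "__main__":
    print(best_of_independent_runs(EDGES, range(1, 7), seeds=range(8)))
```

Since the runs never communicate, adding workers only adds samples from the same cut-size distribution, which is the sense in which the slide calls this equivalent to scaling up iterations.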
Statistical Properties of Cut-size
• Incidence of bests
  • Average quality is good
  • Only 0.25% of the results are the best
• General properties
  • The distribution becomes more peaked as |V| grows
  • The distribution tends towards Gaussian 8)
  • Mean and variance scale linearly with |V|
[Figure: count versus edge cut; |V| = 35000, |E| = 346572, degree max 43, min 3, avg 19.80; RRTS100 on 400 nodes provided by Grid Challenge Federation]
Issues of Parallelizing Heuristics
• Hard with the message-passing model (MPI)
  • J. R. Gilbert and E. Zmijewski 9): A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming
  • Par-METIS (Parallel METIS)
    • Par-METIS only parallelizes the coarsen-uncoarsen part
• Hard to be efficient (statistical property)
  • Even if we could parallelize the heuristic efficiently
    • The fraction of iterations that reach the best bisection is still small
  • If we cooperatively run independent instances on the Grid
    • How many nodes will lead to the best partition?
    • When will a good threshold come?
Contribution of Phases
• Initial phase
  • Removes a large portion of the edge cut
  • Good initial partitions lead to good final partitions
  • Running time is consistent across runs; good initial partitions save time for refinement
• TS and KL phases
  • Their reductions tend to be alike
  • More iterations give better results
[Figure: ΔEdge-Cut per phase and best edge cuts]
Results from Same Initial Bisections
• Given the same initial partitions
  • The best initial partitions lead to the best final partitions
• FTS and KL tend to be deterministic
  • Fewer swaps are available
• The diversity of edge cuts can be cancelled by distributing only one phase
  • Running FTS and KL on one node is enough
[Figure: count of results when FTS and KL are performed on the same initial partitions, 50 nodes]
Multi-level Scoring
[Figure: edge cut versus Level-1 tabu fraction, and edge cut versus Level-2 tabu fraction]
• Mainly used to adapt to large-scale graphs
  • If |V| = 1000, Tabu = 0.01 × 1000 = 10
  • If |V| = 100000, Tabu = 0.01 × 100000 = 1000
• Tunes the tabu length to fit specific graphs better
  • Level-1 scoring distinguishes graphs by their types
  • Level-2 scoring tests for a better tabu length for the specific graph
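A two-level search over tabu fractions might look like the sketch below. The slides do not spell out how the Level-2 grid relates to the Level-1 winner, so this sketch assumes Level-2 scans a finer grid (step 0.001) centred on the best Level-1 fraction. The `score` callable is a placeholder for "edge cut obtained by a short TS run", with lower being better.

```python
from typing import Callable, List, Sequence, Tuple


def two_level_scoring(score: Callable[[float], float],
                      level1: Sequence[float],
                      step: float = 0.001,
                      radius: int = 3) -> Tuple[float, float]:
    """Return (best Level-1 fraction, best Level-2 fraction); lower score wins."""
    best1 = min(level1, key=score)
    # Assumed refinement rule: a finer grid around the Level-1 winner.
    level2: List[float] = [round(best1 + k * step, 6)
                           for k in range(-radius, radius + 1)
                           if best1 + k * step > 0]
    best2 = min(level2, key=score)
    return best1, best2


LEVEL1 = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]

if __name__ == "__main__":
    # Synthetic score with its optimum at 0.023, for illustration only.
    print(two_level_scoring(lambda f: abs(f - 0.023), LEVEL1))
```

Each candidate fraction is scored independently of the others, so each one can be handed to a separate group of nodes.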
Final Approaches
• Not to use multi-level partitioning
  • To preserve the "best" quality
• Not to parallelize the heuristics themselves
  • Not a good trade-off
• To parallelize the scoring phase
  • One group of nodes scores one tabu length
  • With the multi-level scoring technique
• To parallelize the initial phase only
  • Remove the diversity of edge cuts as soon as possible
    • Take advantage of the distributed runs to remove the diversity of edge cuts
  • Reduce computing effort as much as possible
    • Further refinement can be done on a single node
• To use the GXP cluster shell
  • "mw" command: mw M {{ W }}
Full Picture
[Figure: overall pipeline. Multi-level scoring: scoring jobs S: 0.01 to S: 0.07 select the high-scored Level-1 tabu fraction, then scoring jobs S: 0.001 to S: 0.007 select the high-scored Level-2 tabu fraction. Initial phase: several Init jobs yield the best initial partitions. Refinement phase: FTS and KL.]
Conclusions
• Bisection quality
  • "Ever-best" partitions
  • Edge-Cut_OUR ≤ Edge-Cut_RRTS ≤ Edge-Cut_METIS
• Bisection time
  • Comparable and reasonable
  • Time_METIS < Time_OUR << Time_RRTS
  • Speed-up of 10 compared to RRTS
• Adapted to the Grid environment
  • Scalable performance
  • Convenient usage
  • Good fault tolerance
Thank you for your attention!