A Push-Relabel-Based Maximum Cardinality Bipartite Matching Algorithm on GPUs
Mehmet Deveci (1,3), Kamer Kaya (1), Bora Uçar (4), and Ümit V. Çatalyürek (1,2)
(1) Dept. of Biomedical Informatics, The Ohio State University
(2) Dept. of Electrical & Computer Engineering, The Ohio State University
(3) Dept. of Computer Science & Engineering, The Ohio State University
(4) CNRS and LIP, ENS Lyon
Matching
• Problem: Given a graph, find a set of vertex-disjoint edges of maximum cardinality.
• In this work, we focus on bipartite graphs.
• Applications:
  • Bioinformatics
  • Scheduling
  • Image processing
  • Sparse linear solvers
Deveci et al. "A Push-Relabel-Based Max Cardinality Bipartite Matching Algorithm on GPUs"
Matching on Bipartite Graphs
• A matching M is a subset of edges E' such that no vertex is incident to more than one edge in E'.
[Figure: bipartite graph with rows 1-5 and columns 1-5]
Matching on Bipartite Graphs
[Figure: two panels, MAXIMAL and MAXIMUM, each on a bipartite graph with vertices 1-5 on both sides; path shown: r5–c5–r4–c3–r2–c2]
Max. Cardinality Matching Algorithms
• Augmenting-path-based algorithms:
  • search for augmenting paths
  • increase the matching cardinality whenever they find one
  • stop when none exists (the matching is then maximum)
  • differ in how they search for augmenting paths (DFS, BFS, or hybrid)
  • have multicore [Azad13] and GPU [Deveci13] parallelizations
• Push-relabel approach [Cherkassky98]
• Pseudo-flow algorithms [Hochbaum98 and Chandran11]
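The augmenting-path idea listed above can be sketched sequentially in Python. This is a minimal DFS-based sketch and not any of the cited parallel implementations. The function name `augmenting_path_matching` and the input layout (`adj[v]` lists the rows adjacent to column `v`) are assumptions made for illustration.

```python
def augmenting_path_matching(adj, n_rows):
    """DFS-based augmenting-path matching (illustrative sketch).

    adj[v] is the list of rows adjacent to column v (assumed layout).
    Returns (row_match, size), where row_match[u] is the column matched
    to row u, or -1 if row u is unmatched.
    """
    row_match = [-1] * n_rows

    def try_augment(v, visited):
        # Look for an augmenting path that starts at column v.
        for u in adj[v]:
            if u in visited:
                continue
            visited.add(u)
            # Row u is free, or its current column can be re-matched elsewhere.
            if row_match[u] == -1 or try_augment(row_match[u], visited):
                row_match[u] = v
                return True
        return False

    # Each successful search increases the matching cardinality by one.
    size = sum(try_augment(v, set()) for v in range(len(adj)))
    return row_match, size
```

When no column has an augmenting path left, the matching is maximum, which is the stopping condition named in the list above.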
Push-relabel-based Matching
• A matching is a flow from rows to columns.
• PR-based algorithms push the load on an unmatched column vertex to a judiciously chosen row; if the target row is already matched, the push is a double push.
• The vertices are labeled. Each label is a lower bound on the shortest distance to an unmatched row vertex.
• Initially, all rows are labeled 0 and all columns are labeled 1.
• After each push, the vertices are relabeled.
[Figure: example showing vertex labels over successive pushes; numeric contents not recoverable from the extraction]
Push-relabel-based Matching
• PR-based algorithms
  • search and augment simultaneously
  • do not construct augmenting paths explicitly
  • follow them implicitly, repeatedly augmenting a prefix of a hypothetical augmenting path
• Global relabeling
• Processing order of active (unmatched) columns
  • FIFO, LIFO, etc.
• Search spread
Push-relabel based algorithm
• How close is a vertex to an unmatched row?
• A heuristic to follow speculative augmenting paths.
• Change the matching membership of the edges.
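The push and relabel steps described on the preceding slides can be sketched as a sequential Python routine. This is a hedged illustration, not the authors' code. The FIFO processing order, the label limit `n_rows + n_cols`, the relabel rule (column label becomes the chosen row's label plus 1, and the row's label grows by 2), and the name `push_relabel_matching` are assumptions chosen to be consistent with the slide text.

```python
from collections import deque


def push_relabel_matching(adj, n_rows):
    """Sequential push-relabel bipartite matching (illustrative sketch).

    adj[v] is the list of rows adjacent to column v (assumed layout).
    Returns col_match, where col_match[v] is the row matched to column v or -1.
    """
    n_cols = len(adj)
    limit = n_rows + n_cols           # labels at or above this mean "unreachable"
    row_label = [0] * n_rows          # initially all rows are labeled 0
    col_label = [1] * n_cols          # initially all columns are labeled 1
    row_match = [-1] * n_rows
    col_match = [-1] * n_cols
    active = deque(v for v in range(n_cols) if adj[v])  # FIFO order (assumed)

    while active:
        v = active.popleft()
        # Choose the neighboring row with the smallest label.
        u = min(adj[v], key=lambda r: row_label[r])
        if row_label[u] >= limit:
            continue                  # no unmatched row is reachable from v
        col_label[v] = row_label[u] + 1          # relabel the column
        w = row_match[u]
        if w != -1:                   # double push: the old column is displaced
            col_match[w] = -1
            active.append(w)
        row_match[u] = v              # push: change matching membership
        col_match[v] = u
        row_label[u] += 2             # relabel the row
    return col_match
```

Rows never become unmatched once matched, so an unmatched row always keeps label 0, which matches the reading of a label as a lower bound on the distance to an unmatched row.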
Parallel Push-Relabel
• There are several parallel push-relabel implementations for maximum flow problems [Vineet08, Hussein07, and He10].
• This is the first study that focuses on maximum cardinality matching on GPUs.
• Unlike the previous GPU-based PR implementations, we develop an atomic- and lock-free implementation.
Push-relabel based GPU algorithm
• At first the labels are exact, but after some iterations they are not. Global relabeling (GR) resets them to their exact values. What is its overhead, and how frequently should it run?
• We allow errors in the matching: two active column vertices can be matched to the same row.
Push-relabel based GPU algorithm
• The number of active vertices decreases, at first very quickly and then slowly. Keeping them in an array and shrinking it dynamically reduces the number of threads.
• A lock on u could help to avoid conflicts. Without locks, using an active column array while avoiding inconsistencies is not straightforward.
G-PR
• Uses an active vertex list, which
  • reduces the total work done by threads
  • reduces the imbalance between threads in warps (denser warps)
• Works in 2 phases: detecting inconsistencies, then the actual push-relabel.
[Figure: example with Columns and Rows arrays and the active vertex list; numeric contents not recoverable from the extraction]
G-PR (continued)
[Figure: successive frames of the G-PR example on the Columns and Rows arrays; numeric contents not recoverable from the extraction]
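The two-phase idea (tolerate conflicting writes, then detect and repair inconsistencies) can be mimicked by a sequential, round-based simulation in Python. This is not the authors' CUDA kernel. It only assumes that every active column in a round reads the same label snapshot and writes without synchronization, so the last writer to a row wins. The name `gpr_rounds` and all details of the update rules are assumptions for illustration.

```python
def gpr_rounds(adj, n_rows):
    """Round-based simulation of lock-free push-relabel (illustrative sketch).

    adj[v] lists the rows adjacent to column v (assumed layout).
    Returns col_match, where col_match[v] is the row matched to column v or -1.
    """
    n_cols = len(adj)
    limit = n_rows + n_cols
    row_label = [0] * n_rows
    col_label = [1] * n_cols
    row_match = [-1] * n_rows
    col_match = [-1] * n_cols
    active = [v for v in range(n_cols) if adj[v]]

    while active:
        # Phase 1: a column is consistent only if its row points back to it.
        pending = []
        for v in active:
            u = col_match[v]
            if u == -1 or row_match[u] != v:
                col_match[v] = -1
                pending.append(v)

        # Phase 2: every pending column pushes using the same snapshot,
        # as if all of them ran concurrently without locks or atomics.
        old_labels = list(row_label)
        old_row_match = list(row_match)
        next_active = set()
        for v in pending:
            u = min(adj[v], key=lambda r: old_labels[r])
            if old_labels[u] >= limit:
                continue              # column v cannot be matched
            col_label[v] = old_labels[u] + 1
            row_label[u] = old_labels[u] + 2
            row_match[u] = v          # conflicting writes: last writer wins
            col_match[v] = u
            next_active.add(v)        # re-checked for consistency next round
            if old_row_match[u] != -1:
                next_active.add(old_row_match[u])  # displaced column
        active = sorted(next_active)
    return col_match
```

In this sketch two columns may both record the same row in `col_match` during a round, which corresponds to the permitted "errors in matching"; the next round's first phase resets the loser and makes it active again.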
Improvements
• Global relabeling is used to improve the runtime.
  • Implemented on the GPU, based on parallel BFS.
  • Choosing the frequency is not straightforward in parallel implementations.
• Shrinking the active vertex array.
  • The number of active column vertices decreases over time. The active vertex array is shrunk using a parallel prefix sum implementation.
[Figure: active vertex array example; extracted values: 1 -1 -1 -1 -1 -1 7 -1 1 7]
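Shrinking an array with a prefix sum is commonly done as stream compaction: flag the live entries, take an exclusive prefix sum of the flags to obtain output positions, then scatter. The Python sketch below runs sequentially and only illustrates the index arithmetic that a parallel version would distribute over threads. The name `compact_active` and the convention that -1 marks an emptied slot are assumptions.

```python
from itertools import accumulate


def compact_active(active, empty=-1):
    """Compact an active-vertex array via an exclusive prefix sum (sketch)."""
    # 1 for a live entry, 0 for an emptied slot (assumed marker: -1).
    flags = [0 if v == empty else 1 for v in active]
    # Exclusive prefix sum: offsets[i] is the output index of entry i.
    offsets = [0] + list(accumulate(flags))[:-1]
    out = [empty] * sum(flags)
    # Scatter step: each live entry writes to its own slot, so no two
    # writes collide; this is what makes the step parallelizable.
    for i, v in enumerate(active):
        if flags[i]:
            out[offsets[i]] = v
    return out
```

With this hypothetical convention, `compact_active([1, -1, -1, -1, -1, -1, 7, -1])` yields `[1, 7]`, so later rounds need only two threads instead of eight.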
Experiments
• 28 real graphs from various applications.
• Performance compared against sequential PR and against multicore (P-DBFS) and manycore (G-HKDW) augmenting-path matching algorithms.
• Studied the effect of the global relabeling frequency and of the shrinking technique.
• CPU implementations
  • 2.27GHz dual quad-core Intel Xeon CPUs, 8 threads
  • 48GB RAM
• GPU
  • NVIDIA Tesla C2050
  • 2.6GB of global memory
Global Relabeling Frequency
• G-PR-First => without active list
• G-PR-NoShr => with active list, without shrinking
• G-PR-Shr => with active list and shrinking after each global relabeling
• Adaptive => the next global relabeling (GR) step is adjusted w.r.t. the max level of the current GR
• Fix => fixed frequency for GR
Performance Profiles
• Performance profile: a point (x, y) on an algorithm's curve means that the algorithm is at most x times worse than the best performance on a fraction y of all instances.
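The definition above translates directly into a small computation. The following Python sketch assumes runtimes are given as a dictionary mapping an algorithm name to a list of per-instance times (lower is better); the function name `performance_profile` and this input layout are assumptions, and no data from the experiments is reproduced here.

```python
def performance_profile(times, x):
    """Fraction of instances on which each algorithm is within x of the best.

    times maps an algorithm name to its per-instance runtimes (assumed layout);
    all lists must have the same length and use the same instance order.
    """
    algos = list(times)
    n = len(times[algos[0]])
    # Best (smallest) runtime achieved by any algorithm on each instance.
    best = [min(times[a][i] for a in algos) for i in range(n)]
    return {
        a: sum(times[a][i] <= x * best[i] for i in range(n)) / n
        for a in algos
    }
```

Evaluating the function over a range of x values gives the curve for each algorithm; the value at x = 1 is the fraction of instances on which that algorithm is the fastest.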
Individual Speedups
• Minimum speedup: 0.31; maximum speedup: 12.60; average speedup: 3.05.
• G-PR is faster than the sequential implementation on 23 of the 28 graphs and at least 2 times faster on 19 of them.
Conclusions
• A lock- and atomic-free implementation of a GPU-parallel PR algorithm is presented.
• We investigated the effect of the global relabeling frequency and developed a strategy to amortize the cost of global and push relabeling.
• The proposed G-PR algorithm obtained speedups ranging from 0.31 to 12.60, averaging 3.05, on a set of 28 graphs with respect to a recent sequential push-relabel-based implementation.
References
• A. Azad, M. Halappanavar, S. Rajamanickam, E. G. Boman, A. Khan, and A. Pothen. Multithreaded algorithms for maximum matching in bipartite graphs. In: 26th IPDPS, pp. 860–872. IEEE, 2012.
• M. Deveci, K. Kaya, B. Uçar, and Ü. V. Çatalyürek. GPU accelerated maximum cardinality matching algorithms for bipartite graphs. Euro-Par 2013, pp. 850–861.
• B. V. Cherkassky, A. V. Goldberg, P. Martin, J. C. Setubal, and J. Stolfi. Augment or push: A computational study of bipartite matching and unit-capacity flow algorithms. Journal of Experimental Algorithmics, 3:8, 1998.
• D. S. Hochbaum. The pseudoflow algorithm and the pseudoflow-based simplex for the maximum flow problem. In: 6th International Conference on Integer Programming and Combinatorial Optimization, London, UK, 1998, pp. 325–337.
• B. G. Chandran and D. S. Hochbaum. Practical and theoretical improvements for bipartite matching using the pseudoflow algorithm. arXiv preprint arXiv:1105.1569, 2011.
• V. Vineet and P. J. Narayanan. CUDA cuts: Fast graph cuts on the GPU. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'08). IEEE, 2008.
• M. Hussein, A. Varshney, and L. Davis. On implementing graph cuts on CUDA. First Workshop on General Purpose Processing on Graphics Processing Units, 2007.
• Z. He and B. Hong. Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-hybrid platforms. In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS). IEEE, 2010.
Thanks
• For more information
  • Email umit@bmi.osu.edu
  • Visit http://bmi.osu.edu/~umit or http://bmi.osu.edu/hpc
• Acknowledgement of Support