This thesis explores various algorithms for vertex-weighted matching in bipartite and general graphs, with applications in sparse matrix computations, graph algorithms, scheduling problems, and more.
Algorithms for Vertex-Weighted Matching. Mahantesh Halappanavar, Thesis Defense. Advisor: Alex Pothen. Committee: Jessica Crouch, Bruce Hendrickson, Stephan Olariu, Mohammad Zubair. 23 January, 2009
A Graph A graph G is a pair (V, E) • V is a set of vertices • E, a set of edges, represents a binary relation on V • Bipartite and nonbipartite • Weighted and unweighted
A Matching A matching M is a subset of edges such that no two edges in M are incident on the same vertex
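To make the definition concrete, here is a minimal C++ sketch (my own illustration, not part of the thesis software): a graph stored as an edge list and a check that a chosen edge subset is a matching, i.e., that it covers no vertex twice.

```cpp
// Minimal sketch (illustrative only): an edge-list graph and a check that
// a chosen edge set is a matching, i.e., no vertex is covered twice.
#include <cstdio>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (u, v), vertices numbered 0..n-1

// Returns true if no two edges in M share an endpoint.
bool isMatching(int n, const std::vector<Edge>& M) {
    std::vector<bool> covered(n, false);
    for (const Edge& e : M) {
        if (covered[e.first] || covered[e.second]) return false;
        covered[e.first] = covered[e.second] = true;
    }
    return true;
}

int main() {
    // Small example with 4 vertices and two candidate edge sets.
    std::vector<Edge> ok  = {{0, 1}, {2, 3}};
    std::vector<Edge> bad = {{0, 1}, {1, 2}};  // both edges touch vertex 1
    std::printf("%d %d\n", isMatching(4, ok), isMatching(4, bad));  // prints: 1 0
}
```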
Applications of Matchings • Sparse matrix computations • Matrix preconditioning • Block Triangular Form • Multilevel Graph Algorithms • Graph partitioners • Graph clustering • Scheduling Problem • High speed network switching • Facility scheduling problem • Bioinformatics • Homology detection • Structural alignment
Types of Matchings • Cardinality Matching: bipartite, general • Edge-Weighted Matching: exact and approximate; bipartite, general • Vertex-Weighted Matching: exact and approximate; bipartite, general
Our Contributions • New ⅔-approx algorithm (bipartite MVM) • Better understanding of the MVM problem • New ½-approx algorithms (MVM) • Parallel ½-approx algorithm for weighted matching • Software implementations
Proposed Serial Algorithms • B = Bipartite; G = General graphs • n is the number of vertices and m is the number of edges • d2 is the average number of distinct alternating paths, of length at most 3 edges, starting at a vertex
Augmentation • An alternating path: a path whose edges alternate between M and E \ M • An augmenting path: an alternating path whose endpoints are both unmatched • Augmentation by symmetric difference: replace M with M ⊕ P to grow the matching by one edge
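As a small illustration of augmentation by symmetric difference, the following C++ sketch (my own simplification; the edge representation and the function names are assumptions, not the thesis code) replaces M with M ⊕ P for an augmenting path P, increasing the matching cardinality by one.

```cpp
// Illustrative sketch of augmentation by symmetric difference (M xor P).
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Store each edge with its endpoints in increasing order.
Edge norm(int u, int v) { return {std::min(u, v), std::max(u, v)}; }

// M: current matching as a set of normalized edges.
// path: vertices of an M-augmenting path.
// Path edges that are in M get removed, the others get added,
// so the matching gains exactly one edge.
std::set<Edge> augment(std::set<Edge> M, const std::vector<int>& path) {
    for (std::size_t i = 0; i + 1 < path.size(); ++i) {
        Edge e = norm(path[i], path[i + 1]);
        if (M.count(e)) M.erase(e); else M.insert(e);
    }
    return M;
}

int main() {
    std::set<Edge> M = {norm(1, 2)};        // one matched edge: 1-2
    std::vector<int> P = {0, 1, 2, 3};      // augmenting path 0-1-2-3
    std::set<Edge> M2 = augment(M, P);      // result: {0-1, 2-3}
    for (Edge e : M2) std::printf("(%d,%d) ", e.first, e.second);
    std::printf("\n");
}
```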
Restricted Bipartite Graph [figure: an example weighted bipartite graph with vertex sets S and T, and the restricted subgraph formed from it]
GLOBALOPTIMAL (M*) and GLOBALTWOTHIRD (M⅔) • First task (both algorithms): Sort the S vertices in decreasing order of weights • Second task, GLOBALOPTIMAL: Compute a maximum cardinality matching by processing S vertices in the pre-computed order • Second task, GLOBALTWOTHIRD: Compute a matching by processing S vertices in the pre-computed order, using augmenting paths of length at most three
Execution of GLOBALTWOTHIRD [figure: an example bipartite graph with S-vertex weights 5, 4, 3, 2, 1] w(M*) = 5+4+3+2 = 14; w(M⅔) = 5+4+2 = 11; approximation ratio = 11/14 > 2/3
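The two tasks above can be sketched roughly as follows. This is my own simplified C++ rendering of the GLOBALTWOTHIRD idea on a toy bipartite instance (adjacency lists, weights only on S, my own example graph), not the Milan implementation: S-vertices are processed in decreasing weight order, trying an augmenting path of length one and then of length three.

```cpp
// Rough sketch of the GLOBALTWOTHIRD idea (simplified, illustrative only).
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    // Toy instance: 5 S-vertices with weights, 3 T-vertices, adjacency of S.
    std::vector<double> w = {5, 4, 3, 2, 1};
    std::vector<std::vector<int>> adjS = {{0, 1}, {0}, {0}, {1, 2}, {2}};
    const int nS = static_cast<int>(w.size());
    const int nT = 3;

    std::vector<int> order(nS), matchS(nS, -1), matchT(nT, -1);
    std::iota(order.begin(), order.end(), 0);
    // First task: sort S-vertices in decreasing order of weights.
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return w[a] > w[b]; });

    // Second task: process S-vertices in that order with short augmenting paths.
    for (int s : order) {
        // Length-1 augmenting path: an unmatched neighbor.
        for (int t : adjS[s])
            if (matchT[t] == -1) { matchS[s] = t; matchT[t] = s; break; }
        if (matchS[s] != -1) continue;
        // Length-3 augmenting path: s - t - s2 - t2 with t2 unmatched.
        for (int t : adjS[s]) {
            int s2 = matchT[t];                      // t is matched, so s2 exists
            for (int t2 : adjS[s2]) {
                if (matchT[t2] == -1) {
                    matchS[s2] = t2; matchT[t2] = s2;  // re-match s2 along the path
                    matchS[s]  = t;  matchT[t]  = s;   // match s
                    break;
                }
            }
            if (matchS[s] != -1) break;
        }
    }
    double total = 0;
    for (int s = 0; s < nS; ++s) if (matchS[s] != -1) total += w[s];
    std::printf("weight of matched S-vertices = %g\n", total);  // 11 on this toy graph
}
```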
Complete Solution • Compute MS on the S-restricted bipartite graph and MT on the T-restricted bipartite graph • Combine MS and MT into a matching M of the general bipartite graph using the Mendelsohn-Dulmage technique
Proofs of Correctness • Exact algorithm • ⅔-approx algorithm
Exact: Reachability Property Given: G = (S, T, E), weights on S, and a matching MS For every MS-matched vertex s' reachable from an MS-unmatched vertex si via an MS-alternating path: w(s') ≥ w(si)
Exact: Reachability Optimality Given: G = (S, T, E), weights on S, and a maximum cardinality matching MS If MS satisfies the reachability property, then it is also a maximum vertex-weight matching Proof: by contradiction, supposing w(s') > w(s) [figure: an alternating path P between s and s' with M* and MS edges]
⅔-approx: Skeleton of the Proof • Consider concurrent execution of algorithms GLOBALOPTIMAL and GLOBALTWOTHIRD • At a given step, both algorithms process the same vertex si ∈ S • A failed vertex is a vertex that is matched in M*, but not in M⅔ • Intuition: Show that for every failed vertex, there are two distinct vertices matched in M⅔ that are heavier than the failed vertex
Induction: Step k Consider the execution when vertex sk fails [figure: failed vertices s1, …, sk with their M*-mates and the M⅔-matched vertices si,a,k and si,b,k]
Proof: compare the matchings M*,k and M⅔,k [figure: cases (a)-(d) for vertex si and the M⅔-matched vertices si,a,k and si,b,k]
For Failed Vertex sk There are two distinct M⅔-matched vertices “heavier” than sk Proof: Vertices are processed in decreasing order of weights
A Potential Problem State of vertices that failed earlier [figure: s1, …, sk with the M⅔-matched vertices si,a,k and si,b,k] Possibility: w(si) > w(si,b,k)
Counting Technique For every failed vertex si, i ∈ I = {1, …, k}, there are two distinct M⅔-matched vertices heavier than si (not necessarily on the alternating path) Proof: Induct on the failed steps • Step 1: [figure: s1 with the M⅔-matched vertices s1,a and s1,b]
Counting Technique • Step 2: [figure: failed vertices s1 and s2 with the M⅔-matched vertices s1,a, s1,b, s2,a, s2,b]
Related Work • 2004: Jaap-Henk Hoepman • Showed a parallel algorithm as a variant of Preis’s algorithm • One vertex per processor (theoretical) • 2007: Fredrik Manne and Rob Bisseling • Extended Hoepman’s work • Showed a parallel algorithm as a variant of Luby’s algorithm (maximal independent set problem) • Limited experimental results (32 processors)
Data Distribution [figure: a graph partitioned between processors P0 and P1, with ghost vertices and cross-edges]
Serial Pointer-based Algorithm • For each vertex, set a pointer to the heaviest neighbor • If two vertices point to each other, then add the (locally dominating) edge to the matching • Remove all edges incident on the matched edges, reset the pointers, and repeat (see the sketch below)
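A serial C++ sketch of this locally dominant edge idea follows; it is my own simplification (toy graph, variable names, and structure are assumptions), not the MatchBoxP or Milan source.

```cpp
// Rough sketch of the serial pointer-based 1/2-approx algorithm.
#include <cstdio>
#include <vector>

struct Edge { int to; double w; };

int main() {
    // Small weighted undirected graph as adjacency lists.
    const int n = 4;
    std::vector<std::vector<Edge>> adj(n);
    auto addEdge = [&](int u, int v, double w) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    };
    addEdge(0, 1, 5.0); addEdge(1, 2, 8.0); addEdge(2, 3, 3.0);

    std::vector<int> mate(n, -1);   // matched partner, -1 if unmatched
    std::vector<int> ptr(n, -1);    // pointer to heaviest available neighbor

    auto setPointer = [&](int u) {
        ptr[u] = -1;
        double best = -1.0;
        for (const Edge& e : adj[u])
            if (mate[e.to] == -1 && e.w > best) { best = e.w; ptr[u] = e.to; }
    };
    for (int u = 0; u < n; ++u) setPointer(u);

    bool changed = true;
    while (changed) {
        changed = false;
        for (int u = 0; u < n; ++u) {
            int v = ptr[u];
            // Locally dominating edge: u and v point at each other.
            if (mate[u] == -1 && v != -1 && ptr[v] == u && mate[v] == -1) {
                mate[u] = v; mate[v] = u;
                changed = true;
            }
        }
        // Edges incident on matched vertices are now unavailable;
        // reset pointers for the vertices that are still unmatched.
        for (int u = 0; u < n; ++u)
            if (mate[u] == -1) setPointer(u);
    }
    for (int u = 0; u < n; ++u) std::printf("mate[%d] = %d\n", u, mate[u]);
}
```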
Execution of Pointer-based Algorithm [figure: step-by-step execution on a small weighted graph] The algorithm is parallel in nature
Our algorithm: Parallel Pointer-based • Initialization of data structures • Phase 1: Independent Computation • Identify locally-dominant edges • Send messages as needed (cross-edges) • Phase 2: Shared Computation • Receive messages • Computation based on the messages received • Send messages as needed • Repeat until no more edges can be matched
Phase 1: Independent Computation • For each local vertex, set a pointer to the heaviest neighbor • If it points to a ghost vertex, enqueue a REQUEST message to its owner • Repeat: • Vertices pointing to each other: Match • Remove incident edges; enqueue UNAVAILABLE messages for cross-edges • Reset pointers; enqueue messages as needed • Repeat until no more edges can be matched • Send all queued messages
Phase 2: Shared Computation S: set of ghost vertices; Counter[vg] = local degree of ghost vertex vg • WHILE (S ≠ ∅) DO • Receive a message M(vl, vg, type) • Process based on type (REQUEST/UNAVAILABLE/FAILURE) • Dominating? Match; remove incident edges; send UNAVAILABLE messages for cross-edges • Reset pointers; send messages as needed • Update: • Counter[vg]: decrement the counter • S: remove vg from S when Counter[vg] = 0 • Send FAILURE messages if some vertex cannot be matched
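The counter-based termination of Phase 2 can be illustrated with a serial C++ sketch. This is not the MatchBoxP code: it simulates incoming messages with a local queue and keeps only the bookkeeping that decides when a ghost vertex is finished; the message types follow the slide, everything else (names, values) is assumed.

```cpp
// Serial sketch of only the Phase-2 bookkeeping (termination via counters).
// Real MPI sends/receives and the matching updates themselves are omitted.
#include <cstdio>
#include <map>
#include <queue>
#include <set>

enum class MsgType { REQUEST, UNAVAILABLE, FAILURE };
struct Msg { int vLocal; int vGhost; MsgType type; };

int main() {
    // Assumed local view: two ghost vertices with local degrees 2 and 1,
    // i.e., that many cross-edges (hence that many expected messages) each.
    std::map<int, int> counter = {{10, 2}, {11, 1}};
    std::set<int> S = {10, 11};

    // Simulated incoming messages from neighboring processes.
    std::queue<Msg> inbox;
    inbox.push({0, 10, MsgType::REQUEST});
    inbox.push({1, 10, MsgType::UNAVAILABLE});
    inbox.push({2, 11, MsgType::FAILURE});

    while (!S.empty() && !inbox.empty()) {
        Msg m = inbox.front(); inbox.pop();
        // ... the real algorithm would match/unmatch, reset pointers, and
        // enqueue outgoing messages here, depending on m.type ...
        if (--counter[m.vGhost] == 0)   // every expected message for this
            S.erase(m.vGhost);          // ghost vertex has now arrived
    }
    std::printf("remaining ghost vertices: %zu\n", S.size());  // prints: 0
}
```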
Performance of Serial Pointer-based Algorithm • 2.4 GHz Intel Xeon (64-bit) with 32 GB RAM • Exact algorithm: O(|V|³) • Pointer-based ½-approx algorithm: • O(|E|Δ), where Δ is the maximum degree • O(|E|) expected with random edge weights The approximation algorithm is fast
Relative Performance of Approx Algorithms Pointer-based algorithm is relatively fast and computes matchings of the same quality as the others.
Platform Details • Franklin: A massively parallel processing (MPP) Cray XT-4 system at NERSC with 9,660 compute nodes • 38,640 processor cores • Theoretical peak: 356 Tflop/sec. • Compute Node: One 2.3 GHz quad core AMD Opteron processor; 8 GB RAM • Network: SeaStar2 router (3D torus topology) • Software: • PGI C++ compilers (–fast*) • Cray MPICH2 • * -fast = -O2 -Munroll=c:1 -Mnoframe -Mlre -Mvect=sse -Mscalarsse -Mcache_align -Mflushz
Test Set • Five-point Grid Graphs • Random Geometric Graphs • Scalable Synthetic Graphs (SSCA#2) • Graphs from Applications [plot legend: Maximum Time, Average Time; time T versus number of processes P]
Five-Point Grid Graphs, 4k × 4k (|V| = 16,000,000; |E| = 31,992,000)
Random Geometric Graphs (|V| = 640,000; |E| = 3,080,872) • SSCA#2 Graph (|V| = 524,288; |E| = 10,008,022) • Graphs from Applications (source: University of Florida Sparse Matrix Collection)
Conclusions • New ⅔-approx Algorithm (bipartite MVM) • Failed vertices; counting technique • Better understanding of the MVM problem • Reachability property • New ½-approx MVM Algorithms • Restricted reachability property • Parallel ½-approx Algorithm for weighted matching • Scalable implementation and analysis • Software Implementations • Parallel: MatchBoxP (C++, MPI and STL) • Serial: Milan (C++ and STL)
Conclusions • Parallel Algorithms: • The structure of the graph as well as the partitioning affect performance • Memory limitations affect data structures (ghost vertices), and therefore algorithm design and performance • Hybrid implementations (MPI + OpenMP) can provide better performance • Fewer partitions imply less communication
Future Work • Experimental analysis of serial vertex-weighted matching algorithms • Parallel experiments on machines with thousands of cores at Purdue and the DOE leadership-class facilities • Parallel graph generators • Software engineering: data structures, error handling, documentation, etc. • Parallel optimal matching algorithm
Some Open Problems • Proof of correctness for Algorithm LOCALTWOTHIRD • ⅔-approx algorithm for nonbipartite graphs • ¾-approx algorithm for bipartite graphs