Parallel Prim’s Algorithm on dense graphs with a novel extension Ekaterina Gonina and Laxmikant Kale Parallel Programming Lab, Department of Computer Science, UIUC
Problem Statement • Parallel implementation of Prim's algorithm on 100% dense graph • > 10,000 vertices • all vertices connected • Find minimum spanning tree • Goals: • Get speedup from running on multiple processors • Process large problems that don't fit on one processor
Graph Representation • Symmetric adjacency matrix • Distributed across processors • pseudo-random function to generate edge weights • Consistent on any number of processors
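One way to get edge weights that are consistent on any number of processors is to derive each weight from the vertex pair itself rather than from a shared random stream. The sketch below is an assumption about how this could be done, not the authors' generator: `edge_weight`, `local_rows`, `build_local_block`, the SHA-256 hash, the block row distribution, and the weight range are all hypothetical choices.

```python
import hashlib

def edge_weight(i: int, j: int, seed: int = 0, max_weight: int = 1000) -> int:
    """Deterministic pseudo-random weight for the undirected edge {i, j}."""
    if i == j:
        return 0  # no self-loops on the diagonal
    lo, hi = (i, j) if i < j else (j, i)  # order the pair so w(i, j) == w(j, i)
    digest = hashlib.sha256(f"{seed}:{lo}:{hi}".encode()).digest()
    return 1 + int.from_bytes(digest[:8], "big") % max_weight

def local_rows(rank: int, num_procs: int, n: int) -> range:
    """Contiguous block of adjacency-matrix rows owned by processor `rank`."""
    base, extra = divmod(n, num_procs)
    start = rank * base + min(rank, extra)
    return range(start, start + base + (1 if rank < extra else 0))

def build_local_block(rank: int, num_procs: int, n: int, seed: int = 0) -> dict:
    """Generate only the rows this processor owns; no communication needed."""
    return {i: [edge_weight(i, j, seed) for j in range(n)]
            for i in local_rows(rank, num_procs, n)}
```

Because each weight depends only on the seed and the vertex pair, the assembled matrix is identical whether it is generated on 2 processors or 2,000.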
Baseline Parallel Implementation • Data is generated across processors • Vertex 0 is taken to be the initial “in tree” set • Each processor finds its closest local vertex to the “in tree” set • MIN reduction selects the globally closest vertex • The new vertex is broadcast to all processors • Repeat until all vertices have been processed
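The baseline steps can be sketched as a sequential Python simulation. This is not the authors' code: the "processors" are plain loops, the cyclic vertex distribution is an assumption, and `make_dense_graph` and `baseline_prim` are hypothetical names. The reduction and broadcast are modeled by a `min` over the per-processor results.

```python
import random

def make_dense_graph(n, seed=0):
    """Symmetric, fully connected weight matrix (illustrative generator)."""
    rng = random.Random(seed)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.randint(1, 1000)
    return w

def baseline_prim(w, num_procs):
    """Return the MST weight, adding one vertex per simulated reduction."""
    n = len(w)
    # Assumed cyclic distribution: processor p owns vertices p, p+P, p+2P, ...
    owned = [range(p, n, num_procs) for p in range(num_procs)]
    in_tree = [False] * n
    in_tree[0] = True              # vertex 0 starts the tree
    key = w[0][:]                  # key[v] = cheapest edge from v to the tree
    total = 0
    for _ in range(n - 1):
        # Each processor finds its closest local vertex to the tree.
        local_best = []
        for vertices in owned:
            candidates = [(key[v], v) for v in vertices if not in_tree[v]]
            if candidates:
                local_best.append(min(candidates))
        # MIN reduction: pick the globally closest vertex.
        dist, v = min(local_best)
        # Broadcast: every processor learns v and updates its own keys.
        in_tree[v] = True
        total += dist
        for vertices in owned:
            for u in vertices:
                if not in_tree[u] and w[v][u] < key[u]:
                    key[u] = w[v][u]
    return total
```

Each of the n - 1 iterations costs one reduction and one broadcast, which is where the communication overhead of the baseline comes from.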
Limitations of the Baseline Approach • Some speedup when running the algorithm on small graphs • Communication time dominates computation, resulting in no speedup for larger graphs
New Approach • The cost of reducing one number is about the same as reducing a small collection of numbers • New approach to increase efficiency: add multiple vertices in each iteration
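The claim about reduction cost can be illustrated with the standard latency-bandwidth (alpha-beta) model of a tree-based reduction. The function name and the parameter values below are illustrative placeholders, not measurements from the machines used in this work.

```python
import math

def allreduce_time(num_procs, num_values,
                   alpha=5e-6,          # assumed per-message latency (seconds)
                   beta=1e-9,           # assumed per-byte transfer time (seconds)
                   bytes_per_value=8):
    """Estimated time of a tree-based all-reduce of `num_values` numbers."""
    steps = math.ceil(math.log2(num_procs))   # tree depth
    return steps * (alpha + beta * bytes_per_value * num_values)

# Compare reducing a single value against reducing a small array of K = 8.
one_value = allreduce_time(64, 1)
eight_values = allreduce_time(64, 8)
ratio = eight_values / one_value
```

Under these assumed parameters the latency term dominates for small messages, so `ratio` stays close to 1. That is the motivation for reducing K candidates at once.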
“Checks” • Guarantee that adding each vertex in the K array still yields an MST • Check 1: does any of the vertices in the array have a shorter edge to the current vertex in question? • Check 2: does any of the vertices in the array have an edge to a vertex not in the tree that is shorter than the current vertex’s edge? • (Figure: example graph with vertices A through G)
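The two checks can be sketched as a function that counts how many leading entries of the K array are safe to add. This is one interpretation of the slide, with assumptions: the array is sorted by distance to the tree, each candidate is compared only against the candidates before it, and `count_valid` is a hypothetical name.

```python
def count_valid(candidates, key, in_tree, w):
    """Number of leading candidates that can be added in one iteration.

    candidates: the K closest non-tree vertices, sorted by key (ascending)
    key[v]:     weight of the cheapest edge from v to the current tree
    in_tree[v]: True if v is already in the tree
    w:          symmetric adjacency matrix
    """
    if not candidates:
        return 0
    n = len(w)
    valid = 1  # the closest vertex is always safe (an ordinary Prim step)
    for i in range(1, len(candidates)):
        current = candidates[i]
        earlier = candidates[:i]
        # Check 1: an earlier candidate has a shorter edge to the current one.
        if any(w[e][current] < key[current] for e in earlier):
            break
        # Check 2: an earlier candidate has an edge to some other non-tree
        # vertex that is shorter than the current vertex's edge to the tree.
        others = [u for u in range(n)
                  if not in_tree[u] and u != current and u not in earlier]
        if any(w[e][u] < key[current] for e in earlier for u in others):
            break
        valid += 1
    return valid
```

If both checks pass for a candidate, adding the earlier candidates cannot bring any vertex closer to the tree than that candidate already is, so it is still the vertex Prim's algorithm would pick next.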
Enhanced Algorithm
function primsMST(graph G){
    add vertex 0 to VerticesInTree
    numVerticesInTree = 1
    while(numVerticesInTree < number of vertices in G){
        // on each processor
        array[K] = findKclosestVertices(VerticesInTree)
        globalArray[K] = AllReduce(array[K])
        // globalArray now has the K globally closest vertices to the tree
        K1 = determineValidVertices(globalArray) // checks
        // K1 = number of vertices valid for this processor
        K2 = AllReduce(K1)
        // K2 = number of globally valid vertices
        add globalArray[1..K2] to VerticesInTree
        numVerticesInTree += K2
    }
}
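A minimal runnable sketch of this loop, simulated sequentially in Python, is given below. Several details are assumptions rather than the authors' implementation: processors are plain loops over a cyclic vertex distribution, K2 is taken as the MIN of the per-processor K1 values, and all function names are hypothetical. `prim_weight` is ordinary sequential Prim, included only to cross-check the result.

```python
import random

def make_dense_graph(n, seed=0):
    rng = random.Random(seed)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.randint(1, 1000)
    return w

def update_keys(key, in_tree, w, v):
    for u in range(len(w)):
        if not in_tree[u] and w[v][u] < key[u]:
            key[u] = w[v][u]

def prim_weight(w):
    """Reference: sequential Prim, one vertex per iteration."""
    n = len(w)
    in_tree = [True] + [False] * (n - 1)
    key, total = w[0][:], 0
    for _ in range(n - 1):
        d, v = min((key[u], u) for u in range(n) if not in_tree[u])
        in_tree[v] = True
        total += d
        update_keys(key, in_tree, w, v)
    return total

def count_valid_local(cands, key, in_tree, w, owned):
    """K1: leading candidates that pass both checks on this processor's vertices."""
    valid = min(1, len(cands))
    for i in range(1, len(cands)):
        current, earlier = cands[i], cands[:i]
        for u in owned:
            if in_tree[u] or u in earlier:
                continue
            # u == current is check 1; any other u is check 2.
            if any(w[e][u] < key[current] for e in earlier):
                return valid
        valid += 1
    return valid

def enhanced_prim(w, num_procs, k):
    """Return (MST weight, number of iterations), adding up to k vertices each."""
    n = len(w)
    owned = [range(p, n, num_procs) for p in range(num_procs)]
    in_tree = [True] + [False] * (n - 1)
    key, total, iterations, num_in_tree = w[0][:], 0, 0, 1
    while num_in_tree < n:
        # Each processor proposes its k closest local vertices.
        local = [sorted((key[v], v) for v in vs if not in_tree[v])[:k]
                 for vs in owned]
        # First reduction: merge into the k globally closest vertices.
        cands = [v for _, v in sorted(x for arr in local for x in arr)[:k]]
        # Second reduction (assumed MIN): globally valid prefix length.
        k2 = min(count_valid_local(cands, key, in_tree, w, vs) for vs in owned)
        iterations += 1
        for v in cands[:k2]:
            in_tree[v] = True
            total += key[v]   # key[v] is unchanged by earlier additions (check 1)
            num_in_tree += 1
        for v in cands[:k2]:
            update_keys(key, in_tree, w, v)
    return total, iterations
```

With `k=1` the loop degenerates to the baseline and runs n - 1 iterations; with larger `k` it returns the same tree weight in no more iterations.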
Vertices added per reduction vs. iteration number • 50,000 vertices • The number of vertices added per reduction “ceilings out” at K=8; trying to add more vertices simply wastes work
Cray XT3 results • 100,000 vertices • 50,000 vertices
Additional Experiments • Do work during the reduction – update data structures as if all K vertices were added • Increased CPU utilization from 25% to 40%
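The idea of updating data structures as if all K vertices were added can be sketched as speculate, then commit or redo. This is a sequential illustration only, with no real overlap: `wait_for_k2` is a hypothetical callable standing in for the completion of the reduction, and the fallback of recomputing from the saved state is an assumption about how a failed speculation might be handled.

```python
def apply_additions(key, in_tree, w, vertices):
    """Return updated copies of (key, in_tree) after adding `vertices`."""
    key, in_tree = key[:], in_tree[:]
    for v in vertices:
        in_tree[v] = True
    for v in vertices:
        for u in range(len(w)):
            if not in_tree[u] and w[v][u] < key[u]:
                key[u] = w[v][u]
    return key, in_tree

def step_with_speculation(key, in_tree, w, cands, wait_for_k2):
    # Speculative work: in a real run this would execute while the
    # reduction that computes K2 is still in flight.
    speculative = apply_additions(key, in_tree, w, cands)
    k2 = wait_for_k2()                 # the reduction result arrives
    if k2 == len(cands):
        return speculative             # all K were valid: commit
    # Otherwise discard the speculation and redo with the valid prefix.
    return apply_additions(key, in_tree, w, cands[:k2])
```

The original `key` and `in_tree` lists are never modified, so a failed speculation costs only the wasted update work.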
Summary • Parallel Prim’s algorithm allows solving large problems that do not fit on one processor • The baseline shows no speedup on larger graphs because communication time exceeds computation time • Significant improvement from adding multiple vertices per iteration • More interesting work can be done by overlapping communication with computation
Thank you! Questions?