
All-Pairs-Shortest-Paths for Large Graphs on the GPU




Presentation Transcript


    1. All-Pairs-Shortest-Paths for Large Graphs on the GPU Gary J. Katz1,2, Joe Kider1 (1University of Pennsylvania, 2Lockheed Martin IS&GS)

    2. 1 What Will We Cover? Quick overview of Transitive Closure and All-Pairs Shortest Path; uses for Transitive Closure and All-Pairs; GPUs: what are they and why do we care?; the GPU problem with performing Transitive Closure and All-Pairs; the solution: the Block Processing Method; memory formatting in global and shared memory; results.

    3. 2 Previous Work “A Blocked All-Pairs Shortest-Paths Algorithm”, Venkataraman et al. “Parallel FPGA-based All-Pairs Shortest-Paths in a Directed Graph”, Bondhugula et al. “Accelerating large graph algorithms on the GPU using CUDA”, Harish et al.

    4. 3 NVIDIA GPU Architecture Processors are grouped into blocks. Each block has access to a shared memory. Blocks cannot communicate with each other. All blocks have access to global GPU memory. The GPU does not have access to CPU memory except for explicit memcpys made by the CPU. Mention CUDA, an extension of C that allows you to access the L1 cache (shared memory) explicitly.

    5. 4 Quick Overview of Transitive Closure The Transitive Closure of G is defined as the graph G* = (V, E*), where E* = {(i, j) : there is a path from vertex i to vertex j in G} -Introduction to Algorithms, T. Cormen Also explain what all-pairs shortest path is here…

    6. 5 Quick Overview of All-Pairs Shortest-Path The All-Pairs Shortest-Path of G is defined, for every pair of vertices u, v ∈ V, as the shortest (least weight) path from u to v, where the weight of a path is the sum of the weights of its constituent edges. -Introduction to Algorithms, T. Cormen

    7. 6 Uses for Transitive Closure and All-Pairs Social Network Analysis, Communications Networks, Travel Networks, Medical

    8. 7 Floyd-Warshall Algorithm K tells you which node we are connecting through.
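For reference, the Floyd-Warshall algorithm shown on this slide can be sketched in plain C as below (the size N, the INF sentinel, and the function name are illustrative choices, not from the slides). Replacing the (min, +) update with (OR, AND) on a boolean adjacency matrix yields transitive closure instead of shortest paths.

```c
#define N 4
#define INF 1000000  /* large sentinel standing in for "no edge" */

/* Standard O(V^3) Floyd-Warshall. After iteration k of the outer loop,
   dist[i][j] is the shortest i -> j path using only vertices 0..k as
   intermediates -- this is what "K" on the slide refers to. */
void floyd_warshall(int dist[N][N])
{
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}
```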

    9. 8 Parallel Floyd-Warshall

    10. 9 The Question How do we calculate the transitive closure on the GPU so as to take advantage of shared memory and accommodate data sizes that do not fit in memory? What were the issues with the previous parallel implementation of transitive closure?

    11. 10 Block Processing of Floyd-Warshall We want to perform partial processing of a sub-block of the matrix, using a method that allows a hierarchy of processing to occur. What we want to determine is what memory needs to be loaded to process a block of the data matrix.

    12. 11 Block Processing of Floyd-Warshall Start out with the simple case.

    13. 12 Block Processing of Floyd-Warshall We can compute the transitive closure for up to N = 4. We want to eventually compute the transitive closure for the total N, not the partial N. Let's start by trying to compute the transitive closure for the entire graph up to K = 4.
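The "closure up to K = 4" idea on this slide amounts to truncating the outer k loop of Floyd-Warshall. A minimal C sketch (the size N, the INF sentinel, and the function name are mine, not from the slides):

```c
#define N 8
#define INF 1000000  /* sentinel standing in for "no edge" */

/* Floyd-Warshall with the k loop cut short: afterwards, dist[i][j] holds
   the shortest i -> j path that uses only vertices 0..kmax-1 as
   intermediates -- the "entire graph up to K" from the slide. */
void floyd_warshall_partial(int dist[N][N], int kmax)
{
    for (int k = 0; k < kmax; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}
```

Paths that would need a disallowed intermediate vertex are simply not discovered yet; later passes with larger kmax complete them.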

    14. 13 Block Processing of Floyd-Warshall Let's look at the corners.

    15. 14 Block Processing of Floyd-Warshall Let's look at the corners.

    16. 15 Block Processing of Floyd-Warshall

    17. 16 Block Processing of Floyd-Warshall Let's look at the corners.

    18. 17 Increasing the Number of Blocks

    19. 18 Increasing the Number of Blocks

    20. 19 Increasing the Number of Blocks

    21. 20 Increasing the Number of Blocks

    22. 21 Increasing the Number of Blocks

    23. 22 Increasing the Number of Blocks

    24. 23 Increasing the Number of Blocks

    25. 24 Increasing the Number of Blocks
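The build-up across slides 12-25 follows the three-phase blocked Floyd-Warshall pattern of Venkataraman et al.: process the diagonal tile for the current block of k values, then the tiles sharing its row or column, then all remaining tiles. A serial C sketch of that structure, assuming N is a multiple of the tile width B (all names and sizes here are illustrative, not the authors' code):

```c
#define N 8
#define B 4              /* tile width; N must be a multiple of B */
#define INF 1000000      /* sentinel standing in for "no edge" */

/* Relax tile (ti, tj) of the distance matrix, using as intermediates
   only the B vertices belonging to tile block tk. */
static void relax_tile(int d[N][N], int ti, int tj, int tk)
{
    for (int k = tk * B; k < (tk + 1) * B; k++)
        for (int i = ti * B; i < (ti + 1) * B; i++)
            for (int j = tj * B; j < (tj + 1) * B; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

/* Three-phase blocked Floyd-Warshall over a T x T grid of tiles. */
void blocked_floyd_warshall(int d[N][N])
{
    int T = N / B;
    for (int b = 0; b < T; b++) {
        relax_tile(d, b, b, b);              /* phase 1: diagonal tile   */
        for (int t = 0; t < T; t++) {
            if (t == b) continue;
            relax_tile(d, b, t, b);          /* phase 2: tiles in row b  */
            relax_tile(d, t, b, b);          /*          and in column b */
        }
        for (int i = 0; i < T; i++)          /* phase 3: all other tiles */
            for (int j = 0; j < T; j++)
                if (i != b && j != b)
                    relax_tile(d, i, j, b);
    }
}
```

Within each phase, every tile depends only on tiles finished in an earlier phase, which is what lets the GPU process all tiles of a phase in parallel, one tile per thread block.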

    26. 25 Running it on the GPU Using CUDA Written by NVIDIA to access the GPU as a parallel processor; no need to use a graphics API. Explain that each thread will load one memory location from global memory into shared memory; then we'll sync the threads and run the partial Floyd-Warshall.
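The per-thread-block behavior described here (stage a tile into shared memory, synchronize, run the partial Floyd-Warshall) can be sketched serially in C. In this sketch the `shared` array stands in for CUDA `__shared__` memory and the copy loops stand in for the B x B threads each loading one element before a `__syncthreads()`; it is an illustration of the idea, not the authors' kernel:

```c
#define B 4              /* tile width, i.e. the thread-block dimensions */
#define INF 1000000      /* sentinel standing in for "no edge" */

/* Serial sketch of the diagonal-tile step for one thread block:
   stage a B x B tile from "global" memory into a fast local buffer,
   run Floyd-Warshall restricted to that tile's vertices, write back. */
void process_diagonal_tile(int *global, int n, int tile)
{
    int shared[B][B];    /* stands in for __shared__ memory  */
    int base = tile * B;

    /* In CUDA, each thread copies one element, then __syncthreads(). */
    for (int i = 0; i < B; i++)
        for (int j = 0; j < B; j++)
            shared[i][j] = global[(base + i) * n + (base + j)];

    /* Partial Floyd-Warshall: k ranges only over this tile's vertices. */
    for (int k = 0; k < B; k++)
        for (int i = 0; i < B; i++)
            for (int j = 0; j < B; j++)
                if (shared[i][k] + shared[k][j] < shared[i][j])
                    shared[i][j] = shared[i][k] + shared[k][j];

    /* Write the finished tile back to global memory. */
    for (int i = 0; i < B; i++)
        for (int j = 0; j < B; j++)
            global[(base + i) * n + (base + j)] = shared[i][j];
}
```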

    27. 26 Partial Memory Indexing SP1

    28. 27 Memory Format for All-Pairs Solution All-Pairs requires twice the memory footprint of Transitive Closure

    29. 28 Results

    30. 29 Results

    31. 30 Results

    32. 31 Conclusion Advantages of the algorithm: relatively easy to implement, cheap hardware, much faster than the standard CPU version, can work for any data size.

    33. 32 Backup

    34. 33 CUDA Compute Unified Device Architecture Extension of C Automatically creates thousands of threads to run on a graphics card Used to create non-graphical applications Pros: allows the user to design algorithms that will run in parallel; easy to learn, an extension of C; has a CPU version, implemented by kicking off threads. Cons: low-level, C-like language; requires understanding of GPU architecture to fully exploit. Slide based upon David Kirk lecture http://courses.ece.uiuc.edu/ece498/al/lectures/lecture6-cuda-fixed.ppt
