1. All-Pairs-Shortest-Paths for Large Graphs on the GPU
Gary J Katz (1,2), Joe Kider (1)
(1) University of Pennsylvania
(2) Lockheed Martin IS&GS
2. What Will We Cover?
Quick overview of Transitive Closure and All-Pairs Shortest Path
Uses for Transitive Closure and All-Pairs
GPUs, What are they and why do we care?
The GPU problem with performing Transitive Closure and All-Pairs
Solution, The Block Processing Method
Memory formatting in global and shared memory
Results
3. Previous Work
A Blocked All-Pairs Shortest-Paths Algorithm
Venkataraman et al.
Parallel FPGA-based All-Pairs Shortest Path in a Directed Graph
Bondhugula et al.
Accelerating large graph algorithms on the GPU using CUDA
Harish
4. NVIDIA GPU Architecture
Processors are grouped into blocks
Each block has access to its own shared memory
Blocks cannot communicate with each other
All blocks have access to global GPU memory
The GPU cannot access CPU memory except through explicit memcpys issued by the CPU
Mention CUDA:
Extension of C
Lets you manage the on-chip shared memory explicitly, in effect a programmer-controlled L1 cache
5. Quick Overview of Transitive Closure
The Transitive Closure of a directed graph G = (V, E) is defined as the graph G* = (V, E*), where E* = {(i, j) : there is a path from vertex i to vertex j in G}
- Introduction to Algorithms, T. Cormen
Also explain what all pairs shortest path is here
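The definition can be checked directly on a small graph. The following is a minimal sketch, not the method used in the talk: it builds E* by running a breadth-first search from every vertex. The function name `transitive_closure` and the edge-list input are assumptions made for illustration, and the sketch counts the empty path from a vertex to itself, as the matrix formulation in Introduction to Algorithms does.

```python
from collections import deque


def transitive_closure(n, edges):
    """Return E* as a set of (i, j) pairs for a directed graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)

    closure = set()
    for s in range(n):
        seen = set()
        queue = deque([s])  # the empty path reaches s itself
        while queue:
            v = queue.popleft()
            if v in seen:
                continue
            seen.add(v)
            closure.add((s, v))  # v is reachable from s
            queue.extend(adj[v])
    return closure


# Chain 0 -> 1 -> 2 plus an isolated vertex 3.
print(sorted(transitive_closure(4, [(0, 1), (1, 2)])))
```

Edge (0, 2) appears in the result although it is not in E, because the path 0 -> 1 -> 2 exists.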
6. Quick Overview of All-Pairs-Shortest-Path
The All-Pairs Shortest-Path of G is defined for every pair of vertices u, v ∈ V as the shortest (least weight) path from u to v, where the weight of a path is the sum of the weights of its constituent edges.
- Introduction to Algorithms, T. Cormen
Also explain what all pairs shortest path is here
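To make "weight of a path" concrete, here is a small sketch that applies the definition literally: it enumerates every simple path between two distinct vertices and keeps the lightest one. It is only practical for tiny graphs and assumes no negative-weight cycles. The names `path_weight` and `shortest_path_brute_force` and the dictionary-of-edges input are illustrative assumptions, not part of the talk.

```python
from itertools import permutations


def path_weight(weights, path):
    """Sum the weights of the consecutive edges along the path."""
    return sum(weights[(a, b)] for a, b in zip(path, path[1:]))


def shortest_path_brute_force(vertices, weights, u, v):
    """Return (weight, path) for the lightest simple path u -> v, or None. Assumes u != v."""
    others = [x for x in vertices if x not in (u, v)]
    best = None
    for r in range(len(others) + 1):
        for middle in permutations(others, r):
            path = (u, *middle, v)
            # Keep the candidate only if every hop is an actual edge.
            if all(edge in weights for edge in zip(path, path[1:])):
                w = path_weight(weights, path)
                if best is None or w < best[0]:
                    best = (w, path)
    return best


weights = {(0, 1): 4, (0, 2): 1, (2, 1): 2}
print(shortest_path_brute_force([0, 1, 2], weights, 0, 1))
```

The direct edge 0 -> 1 weighs 4, while the detour through vertex 2 weighs 1 + 2 = 3, so the detour is the shortest path.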
7. Uses for Transitive Closure and All-Pairs
Social Network Analysis
Communications networks
Travel Networks
Medical
8. Floyd-Warshall Algorithm
k tells you which node we are connecting through
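The slide's figure is not reproduced in this transcript, so the following is a sketch of the textbook Floyd-Warshall loop rather than the exact code shown in the talk. It assumes a dense distance matrix with `INF` for missing edges and zeros on the diagonal; `floyd_warshall` is an illustrative name.

```python
INF = float("inf")


def floyd_warshall(dist):
    """Return all-pairs shortest distances for a dense n x n weight matrix."""
    n = len(dist)
    d = [row[:] for row in dist]  # leave the input untouched
    for k in range(n):            # k: the node we are connecting through
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


graph = [
    [0,   4,   1],
    [INF, 0,   INF],
    [INF, 2,   0],
]
print(floyd_warshall(graph)[0][1])
```

After iteration k, `d[i][j]` holds the shortest distance that uses only nodes 0..k as intermediate stops, which is why the k loop must be outermost.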
9. Parallel Floyd-Warshall
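The figure for this slide is missing from the transcript. The usual basis for a parallel version is that, for a fixed k, every (i, j) update is independent: row k and column k do not change during step k as long as the diagonal is zero and there are no negative cycles. The sketch below illustrates that property on the CPU by visiting the cells in a shuffled order, which stands in for an arbitrary thread schedule. The name `fw_any_order` and the `rng` parameter are assumptions for illustration.

```python
import random

INF = float("inf")


def fw_any_order(dist, rng=None):
    """Floyd-Warshall where the cells of each k-step are visited in arbitrary order."""
    n = len(dist)
    d = [row[:] for row in dist]
    cells = [(i, j) for i in range(n) for j in range(n)]
    for k in range(n):              # steps stay sequential: step k needs step k-1
        if rng is not None:
            rng.shuffle(cells)      # any order models any thread interleaving
        for i, j in cells:          # on a GPU: one thread per (i, j)
            via_k = d[i][k] + d[k][j]
            if via_k < d[i][j]:
                d[i][j] = via_k
    return d


graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(fw_any_order(graph, random.Random(0)) == fw_any_order(graph))
```

Only the k loop carries a dependency, so a parallel implementation needs a synchronisation point between consecutive values of k but none within a single step.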
10. The Question
How do we calculate the transitive closure on the GPU to:
Take advantage of shared memory
Accommodate data sizes that do not fit in memory
What were the issues with the previous parallel implementation of Transitive closure?
11. Block Processing of Floyd-Warshall
Want to perform partial processing of a sub-block of the matrix.
Want a method that will allow a hierarchy of processing to occur.
What we want to determine is what memory needs to be loaded to process a block of the data matrix.
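The goals above match the blocked scheme of Venkataraman et al. listed under Previous Work. Since the slide figures are missing, the three-phase ordering below is an assumption about what they show: first the diagonal block, then the blocks in its row and column, then everything else. The sketch runs on the CPU, and the names `fw_block` and `blocked_floyd_warshall` and their parameters are illustrative. Each call to `fw_block` touches at most three b x b blocks, which answers the question of what memory must be loaded to process one block.

```python
INF = float("inf")


def fw_block(d, c, a, s, b):
    """Relax block C using d[i][k] from block A and d[k][j] from block S.

    c, a, s are (row, col) origins of b x b blocks inside the matrix d.
    """
    for k in range(b):
        for i in range(b):
            for j in range(b):
                via = d[a[0] + i][a[1] + k] + d[s[0] + k][s[1] + j]
                if via < d[c[0] + i][c[1] + j]:
                    d[c[0] + i][c[1] + j] = via


def blocked_floyd_warshall(dist, b):
    """All-pairs shortest paths computed one b x b block at a time."""
    n = len(dist)
    assert n % b == 0, "sketch assumes the block size divides n"
    d = [row[:] for row in dist]
    starts = range(0, n, b)
    for p in starts:
        # Phase 1: the diagonal block depends only on itself.
        fw_block(d, (p, p), (p, p), (p, p), b)
        # Phase 2: blocks sharing a row or column with the diagonal block.
        for q in starts:
            if q != p:
                fw_block(d, (p, q), (p, p), (p, q), b)  # row block
                fw_block(d, (q, p), (q, p), (p, p), b)  # column block
        # Phase 3: every remaining block needs one row block and one column block.
        for r in starts:
            for q in starts:
                if r != p and q != p:
                    fw_block(d, (r, q), (r, p), (p, q), b)
    return d


graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(blocked_floyd_warshall(graph, 2))
```

Within phases 2 and 3 the blocks are independent of one another, so each could be handed to a separate GPU thread block.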
12. Block Processing of Floyd-Warshall
Start out with the simple case
13. Block Processing of Floyd-Warshall
We can compute the transitive closure for up to N = 4
We want to eventually compute the transitive closure for the total N, not the partial N
Let's start by trying to compute the transitive closure for the entire graph up to K = 4.
14. Block Processing of Floyd-Warshall
Let's look at the corners
15. Block Processing of Floyd-Warshall
Let's look at the corners
16. Block Processing of Floyd-Warshall
17. Block Processing of Floyd-Warshall
Let's look at the corners
18. Increasing the Number of Blocks
19. Increasing the Number of Blocks
20. Increasing the Number of Blocks
21. Increasing the Number of Blocks
22. Increasing the Number of Blocks
23. Increasing the Number of Blocks
24. Increasing the Number of Blocks
25. Increasing the Number of Blocks
26. Running it on the GPU
Using CUDA
Written by NVIDIA to access GPU as a parallel processor
Do not need to use graphics API
Explain that each thread will load one memory location from global memory into the shared memory
Then we'll sync the threads and run the partial Floyd-Warshall
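The load, sync, compute sequence described in the notes can be emulated on the CPU. This is a Python stand-in for a CUDA kernel, not the authors' code: the loop over `threads` plays the role of the threads in one block, the comments mark where `__syncthreads()` would go, and the flat row-major layout of `global_mem` is an assumption. `process_tile_in_shared` is a hypothetical name.

```python
INF = float("inf")


def process_tile_in_shared(global_mem, n, row0, col0, b):
    """Emulate one thread block updating the b x b tile at (row0, col0) in place."""
    threads = [(ty, tx) for ty in range(b) for tx in range(b)]
    shared = [[INF] * b for _ in range(b)]  # stands in for on-chip shared memory

    # Step 1: each thread copies exactly one element from global to shared memory.
    for ty, tx in threads:
        shared[ty][tx] = global_mem[(row0 + ty) * n + (col0 + tx)]
    # Barrier: in CUDA, __syncthreads() so that the whole tile is loaded.

    # Step 2: partial Floyd-Warshall restricted to the tile.
    for k in range(b):
        for ty, tx in threads:
            via_k = shared[ty][k] + shared[k][tx]
            if via_k < shared[ty][tx]:
                shared[ty][tx] = via_k
        # Barrier: all threads finish step k before any starts step k + 1.

    # Step 3: each thread writes its element back to global memory.
    for ty, tx in threads:
        global_mem[(row0 + ty) * n + (col0 + tx)] = shared[ty][tx]


# 4 x 4 matrix stored row-major; process the 3 x 3 tile in the top-left corner.
mem = [
    0,   4,   1,   7,
    INF, 0,   INF, 7,
    INF, 2,   0,   7,
    7,   7,   7,   0,
]
process_tile_in_shared(mem, 4, 0, 0, 3)
print(mem[1])
```

Staging the tile in shared memory means each global element is read once and written once, while the b relaxation steps run entirely against the faster on-chip copy.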
27. Partial Memory Indexing SP1
28. Memory Format for All-Pairs Solution
All-Pairs requires twice the memory footprint of Transitive Closure
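The slide does not say where the extra memory goes. One common reason, assumed here, is that a shortest-path solution keeps two values per vertex pair: the distance and a pointer from which the path can be rebuilt, whereas transitive closure needs only a single reachability value. The sketch below stores a next-hop matrix alongside the distances; `floyd_warshall_with_paths` and `reconstruct` are illustrative names.

```python
INF = float("inf")


def floyd_warshall_with_paths(dist):
    """Return (d, nxt): shortest distances and the next hop on each shortest path."""
    n = len(dist)
    d = [row[:] for row in dist]
    # Second n x n matrix: nxt[i][j] is the vertex that follows i on the way to j.
    nxt = [[j if dist[i][j] != INF else None for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
                    nxt[i][j] = nxt[i][k]  # head toward k first
    return d, nxt


def reconstruct(nxt, u, v):
    """Rebuild the vertex sequence of the shortest path u -> v, or [] if none."""
    if nxt[u][v] is None:
        return []
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path


graph = [
    [0,   4,   1],
    [INF, 0,   INF],
    [INF, 2,   0],
]
d, nxt = floyd_warshall_with_paths(graph)
print(d[0][1], reconstruct(nxt, 0, 1))
```

Both matrices are n x n, so under this assumption the footprint is twice that of a single-matrix solution.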
29. Results
30. Results
31. Results
32. Conclusion
Advantages of Algorithm
Relatively Easy to Implement
Cheap Hardware
Much Faster than standard CPU version
Can work for any data size
33. Backup
34. CUDA
Compute Unified Device Architecture
Extension of C
Automatically creates thousands of threads to run on a graphics card
Used to create non-graphical applications
Pros:
Allows user to design algorithms that will run in parallel
Easy to learn, extension of C
Has CPU version, implemented by kicking off threads
Cons:
Low level, C like language
Requires understanding of GPU architecture to fully exploit
Slide based upon David Kirk lecture http://courses.ece.uiuc.edu/ece498/al/lectures/lecture6-cuda-fixed.ppt