Performance study of multi-GPU acceleration of LU Factorization
Paul Brandon Abbott, University of Denver
Dr. Yifeng Zhu, University of Maine
Outline
• Motivation
• What is the LU Factorization?
• Method
• Results
• Conclusion
Motivations
• Multi-GPU acceleration requires setup!
  • CUDA
  • P-Threads
• Communication may be required.
  • Synchronization.
  • Memory Transfers.
• Overhead time is generated.
Fundamental Question: Will multiple GPUs still obtain better performance than a single GPU (or CPU), given that there is overhead in communication?
Why factor?
• Solving linear equations becomes trivial.
• The inverse of A is also easy to compute.
• Determinant of A? No problem.
• Analogous to factoring a polynomial.
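The first three bullets follow directly once A is written as a product of triangular factors. A short sketch, assuming A = LU with L unit lower triangular and U upper triangular, and assuming no row pivoting is needed (with pivoting, a permutation matrix appears and the determinant picks up a sign):

```latex
\begin{aligned}
A x = b \;&\Longrightarrow\; L y = b \ \text{(forward substitution)}, \qquad U x = y \ \text{(back substitution)} \\
A^{-1} &= U^{-1} L^{-1} \\
\det(A) &= \det(L)\,\det(U) = \prod_{i=1}^{n} u_{ii}
\end{aligned}
```

Each triangular solve costs on the order of n² operations, so once the factors are available, every additional right-hand side b is cheap compared with the factorization itself.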
How to factor: Step One and Step Two, repeated … (step labels only; the slide diagrams are not preserved)
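The slide text keeps only the step labels, so what each step does is not recoverable from it. One common reading, offered here as an assumption rather than as the talk's method, is right-looking elimination without pivoting: step one divides the column below the pivot by the pivot, and step two updates the remaining submatrix. The function name `lu_factor_in_place` and the row-major layout are illustrative choices; the code is plain host-side C++ that also compiles as CUDA host code.

```cuda
// In-place LU factorization of a row-major n x n matrix (no pivoting; assumed).
// On success the strict lower triangle holds L (its unit diagonal is implied)
// and the upper triangle, including the diagonal, holds U.
// Returns false if a zero pivot is encountered.
bool lu_factor_in_place(float* a, int n) {
    for (int k = 0; k < n; ++k) {
        float pivot = a[k * n + k];
        if (pivot == 0.0f) return false;

        // Step one (assumed): scale the column below the pivot.
        // These multipliers are column k of L.
        for (int i = k + 1; i < n; ++i) {
            a[i * n + k] /= pivot;
        }

        // Step two (assumed): update the trailing submatrix by subtracting
        // multiplier * (row k of U) from each remaining row.
        for (int i = k + 1; i < n; ++i) {
            for (int j = k + 1; j < n; ++j) {
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
            }
        }
    }
    return true;
}
```

Every entry of the step-two update is independent of the others for a fixed k, which is the part that lends itself to one GPU thread per entry.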
CUDA – Using a GPU
Steps to Use GPU:
• Copy memory onto device.
• Specify blocks and threads.
• Execute.
• Copy memory back.
Grid of Thread-Blocks
Blocks have 512 threads each. The GPU does the scheduling to handle as many blocks as possible.
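The four steps and the 512-thread blocks can be sketched with a deliberately small kernel. This is a minimal illustration, not code from the talk: the kernel `scale`, the helper `scale_on_gpu`, and the choice of scaling a vector are all hypothetical. Only standard CUDA runtime calls are used.

```cuda
#include <cuda_runtime.h>

// Each thread scales one element. The guard covers the last, partially
// filled block when n is not a multiple of the block size.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: scales h_data[0..n) in place on the current GPU.
// Expects n >= 1 (a launch with zero blocks is rejected by the runtime).
cudaError_t scale_on_gpu(float* h_data, int n, float factor) {
    float* d_data = nullptr;
    size_t bytes = sizeof(float) * n;

    // 1. Copy memory onto the device.
    cudaError_t err = cudaMalloc(&d_data, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) { cudaFree(d_data); return err; }

    // 2. Specify blocks and threads: 512 threads per block, as on the slide,
    //    and enough blocks to cover all n elements.
    const int threads = 512;
    const int blocks = (n + threads - 1) / threads;

    // 3. Execute. The GPU schedules the blocks onto its multiprocessors.
    scale<<<blocks, threads>>>(d_data, n, factor);
    err = cudaGetLastError();
    if (err != cudaSuccess) { cudaFree(d_data); return err; }

    // 4. Copy memory back. This call also waits for the kernel to finish.
    err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err;
}
```

Steps 1 and 4 are the memory-transfer overhead the talk is concerned with: for small inputs they can cost more than the kernel itself.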
Multi-GPU Setup: int main() { … starts POSIX Threads, each thread runs CUDA, and then: Done!
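The setup on this slide, a main program that starts POSIX threads which each drive CUDA, might look roughly like the sketch below. It is an assumption-laden illustration: one host thread per GPU, an even split of a vector across GPUs, and the names `GpuTask`, `gpu_worker`, `scale_chunk`, and `scale_on_all_gpus` are all hypothetical. How the talk actually partitions the LU factorization is not shown in the slide text.

```cuda
#include <cuda_runtime.h>
#include <pthread.h>
#include <vector>

__global__ void scale_chunk(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Work description for one host thread / one GPU (hypothetical).
struct GpuTask {
    int device;          // GPU index this thread binds to
    float* h_chunk;      // this GPU's slice of the host array
    int n;               // number of elements in the slice
    float factor;
    cudaError_t status;  // result reported back to main
};

static void* gpu_worker(void* arg) {
    GpuTask* t = static_cast<GpuTask*>(arg);
    // Bind this host thread to its GPU; later CUDA calls target that device.
    t->status = cudaSetDevice(t->device);
    if (t->status != cudaSuccess) return nullptr;

    float* d = nullptr;
    size_t bytes = sizeof(float) * t->n;
    t->status = cudaMalloc(&d, bytes);
    if (t->status != cudaSuccess) return nullptr;

    t->status = cudaMemcpy(d, t->h_chunk, bytes, cudaMemcpyHostToDevice);
    if (t->status == cudaSuccess) {
        scale_chunk<<<(t->n + 511) / 512, 512>>>(d, t->n, t->factor);
        t->status = cudaGetLastError();
    }
    if (t->status == cudaSuccess) {
        t->status = cudaMemcpy(t->h_chunk, d, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d);
    return nullptr;
}

// Splits h_data across all visible GPUs, one POSIX thread per GPU.
// Returns the number of GPUs used, or -1 on any failure (in which case
// some slices may already have been scaled).
int scale_on_all_gpus(float* h_data, int n, float factor) {
    int gpus = 0;
    if (n < 1 || cudaGetDeviceCount(&gpus) != cudaSuccess || gpus < 1) return -1;
    if (gpus > n) gpus = n;  // never hand a GPU an empty slice

    std::vector<pthread_t> threads(gpus);
    std::vector<GpuTask> tasks(gpus);
    int offset = 0, started = 0;
    bool ok = true;
    for (int g = 0; g < gpus; ++g) {
        int len = n / gpus + (g < n % gpus ? 1 : 0);  // near-even split
        tasks[g] = {g, h_data + offset, len, factor, cudaSuccess};
        offset += len;
        if (pthread_create(&threads[g], nullptr, gpu_worker, &tasks[g]) != 0) {
            ok = false;
            break;
        }
        ++started;
    }
    // Synchronization: main waits here until every started GPU thread is done.
    for (int g = 0; g < started; ++g) {
        pthread_join(threads[g], nullptr);
        if (tasks[g].status != cudaSuccess) ok = false;
    }
    return ok ? gpus : -1;
}
```

Thread creation, per-device allocation, the transfers, and the joins are all overhead that a single-GPU run does not pay, so this layout only wins when each GPU's slice is large enough to amortize it.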
Conclusion
• The size of the working dataset influences the efficiency of multiple GPUs.
• There is a time to use:
  • The CPU
  • The GPU
  • Multiple GPUs