
Hardware Acceleration Using GPUs

M Anirudh. Guide: Prof. Sachin Patkar. VLSI Consortium, April 4, 2008.



  1. Hardware Acceleration Using GPUs
     M Anirudh
     Guide: Prof. Sachin Patkar
     VLSI Consortium
     April 4, 2008

  2. Advantages of Using Graphics Processors
     • Parallel architectures with lots of ALUs
     • High memory bandwidth
     • Cheap, fast and scalable
     • New generation within 2 years
     • High Gflops/$
     Cons
     • No double precision yet (only single-precision floating-point operations)
     • Loss of precision (not fully IEEE 754 compliant)

  3. NVIDIA GeForce 8 Series cards
     • Currently using the 8500GT to test our algorithms
     • The 8500GT has 16 processors, a theoretical peak floating-point performance of 28.8 Gflops, and a memory bandwidth of 12.8 GB/s
     • Scalable architecture
     • 8800 GTX: 128 processors, ~350 Gflops and 86.4 GB/s
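The peak figures on this slide are consistent with counting one multiply-add (2 flops) per processor per shader clock. The following is a minimal sketch of that arithmetic, not the author's calculation. The shader clocks (0.9 GHz for the 8500GT, 1.35 GHz for the 128-processor 8800 card) are assumptions that do not appear in the slides, and `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(processors, shader_clock_ghz, flops_per_clock=2):
    """Theoretical peak in Gflops, assuming one multiply-add (2 flops)
    per processor per clock. Clock values passed in are assumptions."""
    return processors * shader_clock_ghz * flops_per_clock


# Assumed shader clocks; only the processor counts come from the slide.
peak_8500gt = peak_gflops(16, 0.9)     # compare with the slide's 28.8 Gflops
peak_8800 = peak_gflops(128, 1.35)     # compare with the slide's ~350 Gflops

print(peak_8500gt, peak_8800)
```

Under these assumptions the 8500GT value matches the quoted 28.8 Gflops, and the 128-processor value lands just under the quoted ~350 Gflops.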

  4. GeForce 8500GT Architecture
     [Diagram labels: Thread Scheduler; two Control units; two groups of eight ALUs; two Local Memory blocks; Shared Memory; Global Memory]

  5. Programming Model
     • Massively multi-threaded
     • Thread hierarchy: threads -> warps -> blocks -> grid
     • Shared memory and global memory
     • Coalesced memory access: 5 GB/s to 70 GB/s
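The thread hierarchy and the idea behind coalescing can be illustrated with a plain-Python model of the index arithmetic. This is a sketch only: it mirrors the CUDA expression `blockIdx.x * blockDim.x + threadIdx.x` but runs on the CPU, all function names are hypothetical, and "contiguous addresses" is a simplification of the hardware's actual coalescing rules, which are stricter (they also involve alignment).

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def global_thread_id(block_idx, block_dim, thread_idx):
    """Python mirror of blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def warp_of(thread_idx):
    """Index of the warp, within its block, that a thread belongs to."""
    return thread_idx // WARP_SIZE


def warp_addresses(block_idx, block_dim, warp, stride=1, elem_bytes=4):
    """Byte offsets read by the 32 threads of one warp when the thread
    with global index i reads array element i * stride."""
    first = warp * WARP_SIZE
    return [
        global_thread_id(block_idx, block_dim, t) * stride * elem_bytes
        for t in range(first, first + WARP_SIZE)
    ]


def is_contiguous(addresses, elem_bytes=4):
    """True if neighbouring threads touch neighbouring elements
    (a simplified stand-in for 'coalesced')."""
    return all(b - a == elem_bytes for a, b in zip(addresses, addresses[1:]))


unit = warp_addresses(block_idx=0, block_dim=256, warp=0, stride=1)
strided = warp_addresses(block_idx=0, block_dim=256, warp=0, stride=2)
print(is_contiguous(unit), is_contiguous(strided))
```

In this model a unit-stride access pattern is contiguous across the warp and a stride-2 pattern is not, which is the kind of difference the bandwidth range on the slide points to.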

  6. Results
     • Matrix-vector operations are slow because of the data transfer from host to device.
     • 10 Gflops on the GPU for matrix-matrix multiplication, compared to 2+ Gflops on the CPU and 6 Gflops reported using BLAS.
     • An Nvidia 8800 card has also been observed to reach up to 180 Gflops for matrix-matrix multiplication using optimized algorithms.
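One way to see why transfer dominates matrix-vector products but not matrix-matrix products is to compare flops performed per byte moved between host and device. The sketch below is a rough cost model, not a measurement: the 2 GB/s host-device link bandwidth is an assumed value, the 10 Gflops compute rate is taken from the slide, and all function names are hypothetical.

```python
def matvec_cost(n, elem_bytes=4):
    """(flops, bytes moved) for y = A x with an n x n single-precision A:
    A and x go to the device, y comes back."""
    flops = 2 * n * n
    bytes_moved = (n * n + 2 * n) * elem_bytes
    return flops, bytes_moved


def matmul_cost(n, elem_bytes=4):
    """(flops, bytes moved) for C = A B with n x n matrices:
    A and B go to the device, C comes back."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * elem_bytes
    return flops, bytes_moved


def effective_gflops(flops, bytes_moved, compute_gflops, link_gb_per_s):
    """Achieved rate when transfer time is added to compute time."""
    seconds = flops / (compute_gflops * 1e9) + bytes_moved / (link_gb_per_s * 1e9)
    return flops / seconds / 1e9


n = 1024
LINK_GB_PER_S = 2.0   # assumed host-device bandwidth, not from the slides
GPU_GFLOPS = 10.0     # matrix-matrix rate reported on the slide

for name, cost in (("matvec", matvec_cost), ("matmul", matmul_cost)):
    flops, moved = cost(n)
    print(name, flops / moved, effective_gflops(flops, moved, GPU_GFLOPS, LINK_GB_PER_S))
```

Matrix-vector multiplication performs about half a flop per transferred byte regardless of `n`, so the link caps its effective rate. Matrix-matrix multiplication performs roughly `n / 6` flops per byte, so the transfer cost is amortized as `n` grows.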

  7. Conclusion
     • Most reported GPU performance figures are ~30-40% of theoretical peak; these are still 5x-10x faster than a CPU
     • Considerable understanding and work are required to fully optimize code
     • Matrix-matrix operations are easily an order of magnitude faster than on a CPU
     Future Work
     • The aim is to develop optimized routines for LU decomposition, Cholesky, Conjugate Gradient, etc.
     • Try to incorporate these routines into the DC Analyzer, both to improve performance and to handle larger data sizes.
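The fraction-of-peak claim can be checked against the figures quoted earlier in the deck. A small sketch, using only numbers from the slides (10 Gflops achieved against a 28.8 Gflops peak on the 8500GT, and 180 Gflops against ~350 Gflops on the 8800 card); `fraction_of_peak` is a hypothetical helper.

```python
def fraction_of_peak(achieved_gflops, peak_gflops):
    """Achieved performance as a fraction of theoretical peak."""
    return achieved_gflops / peak_gflops


frac_8500gt = fraction_of_peak(10.0, 28.8)   # this work, matrix-matrix
frac_8800 = fraction_of_peak(180.0, 350.0)   # reported, optimized algorithms

print(f"8500GT: {frac_8500gt:.0%}, 8800: {frac_8800:.0%}")
```

The 8500GT result falls inside the 30-40% band, while the optimized 8800 figure sits above it at roughly half of peak.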
