
  1. CUDA (Compute Unified Device Architecture) By Sandeep A. Ganage, M.Tech (IT); Guided by Dr. R. C. Thool

  2. Terms • What is GPGPU? • General-Purpose computing on a Graphics Processing Unit • Using graphics hardware for non-graphics computations • What is CUDA? • Compute Unified Device Architecture • A software architecture for managing data-parallel programming

  3. Introduction What is a GPU? • It is a processor optimized for 2D/3D graphics, video, visual computing, and display. • It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing. • It provides real-time visual interaction with computed objects via graphics, images, and video. • It serves as both a programmable graphics processor and a scalable parallel computing platform. • Heterogeneous systems combine a GPU with a CPU.

  4. GPU Architecture

  5. Processing Element • Processing element = thread processor = ALU

  6. Memory Architecture • Constant Memory • Texture Memory • Device Memory
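As a rough CUDA C illustration of these three spaces (a sketch, not from the slides; all names are hypothetical, and the texture reference uses the legacy texture-reference API of this deck's era):

```c
#include <cuda_runtime.h>

// Illustrative declarations only, one symbol per memory space:
__constant__ float c_coeffs[16];   // constant memory: small, cached, read-only in kernels
texture<float, 1> t_ref;           // texture memory: cached reads via the texture unit

int main(void)
{
    float *d_buf;                                       // device (global) memory
    cudaMalloc((void **)&d_buf, 1024 * sizeof(float));  // allocate on the GPU
    cudaFree(d_buf);
    return 0;
}
```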

  7. CPU vs. GPU • CPU • Fast caches • Branching adaptability • High performance • Multicore • GPU • Multiple ALUs • Fast onboard memory • High throughput on parallel tasks • Executes a program on each fragment/vertex • CPUs are great for task parallelism • GPUs are great for data parallelism

  8. Computing Capability • GPU: ~369 GIPS • CPU: ~177,730 MIPS • Why..??

  9. CPU vs. GPU • GPUs contain a much larger number of dedicated ALUs than CPUs. • GPUs also contain extensive support for the stream-processing paradigm, which is related to SIMD (Single Instruction, Multiple Data) processing. • Each processing unit on a GPU has local memory, which improves data manipulation and reduces fetch time.

  10. CPU vs. GPU More transistors devoted to data processing

  11. “What is CUDA” • CUDA is a set of development tools for creating applications that execute on the GPU (Graphics Processing Unit). • The CUDA compiler uses a variation of C, with future support for C++. • CUDA was developed by NVIDIA and as such runs only on NVIDIA GPUs of the G8x series and up. • CUDA was released on February 15, 2007 for PC, with a beta version for Mac OS X on August 19, 2008.

  12. Why CUDA • CUDA provides the ability to use a high-level language such as C to develop applications that take advantage of the high performance and scalability that the GPU architecture offers. • GPUs allow the creation of a very large number of concurrently executing threads at very low system-resource cost. • CUDA also exposes fast shared memory (16 KB per multiprocessor) that can be shared between threads (see the sketch below). • Full support for integer and bitwise operations. • Compiled code runs directly on the GPU.
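As a rough illustration of that shared memory (a sketch, not from the slides; the reverseTile kernel and the 256-thread block size are assumptions), each block stages a tile in on-chip shared memory before using it:

```c
// Each block of 256 threads loads a tile into shared memory,
// synchronizes, then writes the tile back out reversed.
__global__ void reverseTile(const int *in, int *out)
{
    __shared__ int tile[256];                       // fast on-chip shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    tile[threadIdx.x] = in[i];                      // stage from global memory
    __syncthreads();                                // wait for the whole tile
    out[i] = tile[blockDim.x - 1 - threadIdx.x];    // reversed within the block
}
```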

  13. Software Requirements/Tools • CUDA device driver • CUDA Software Development Kit • Emulator • CUDA Toolkit • Occupancy calculator • Visual profiler

  14. How does CUDA work? • We need to allocate space in the GPU’s memory for the variables. • The video card has no I/O devices, so we need to copy the input data from host memory into GPU memory, using the variables allocated in the previous step. • We need to specify the code to execute (the kernel). • Copy the results back to the memory in the host computer. • A code sketch of these steps follows the diagrams below.

  15. Initially: [diagram: host memory holds array; GPU card memory is empty]

  16. Allocate memory on the GPU card: [diagram: array in host memory; array_d allocated in GPU card memory]

  17. Copy content from the host’s memory to the GPU card’s memory: [diagram: array copied into array_d]

  18. Execute code on the GPU: [diagram: GPU multiprocessors operate on array_d]

  19. Copy results back to the host memory: [diagram: array_d copied back into array]
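Putting slides 14-19 together, here is a minimal host-side sketch using the same names (array on the host, array_d on the device); the addOne kernel is hypothetical, standing in for whatever work the demo performs:

```c
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element.
__global__ void addOne(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

int main(void)
{
    const int N = 1024;
    float array[N], *array_d;                     // host copy and device copy
    for (int i = 0; i < N; i++) array[i] = (float)i;

    cudaMalloc((void **)&array_d, N * sizeof(float));   // slide 16: allocate
    cudaMemcpy(array_d, array, N * sizeof(float),
               cudaMemcpyHostToDevice);                 // slide 17: copy in
    addOne<<<(N + 255) / 256, 256>>>(array_d, N);       // slide 18: execute
    cudaMemcpy(array, array_d, N * sizeof(float),
               cudaMemcpyDeviceToHost);                 // slide 19: copy back
    cudaFree(array_d);

    printf("array[0] = %f\n", array[0]);          // expect 1.000000
    return 0;
}
```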

  20. The Kernel • We must write the code that will be executed by the stream processors on the GPU card. • That code, called the kernel, is downloaded and executed, simultaneously and in lock-step fashion, on several (all?) stream processors on the GPU card. • How is every instance of the kernel going to know which piece of data it is working on? (See the sketch after the diagrams below.)

  21. In the GPU: [diagram: Block 0 and Block 1 of processing elements mapped onto array elements]

  22. In the GPU: [diagram: within each block, Threads 0-3 map onto consecutive array elements]
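This answers the question from slide 20: each kernel instance derives a unique global index from the built-in blockIdx and threadIdx variables. A minimal sketch (the square kernel is hypothetical) matching the two-block, four-thread layout pictured above:

```c
// Each instance computes its own global index from its block and thread IDs.
__global__ void square(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // block offset + thread id
    data[i] = data[i] * data[i];
}
// Launched as square<<<2, 4>>>(array_d); the 8 threads cover indices 0..7,
// e.g. Block 1, Thread 2 works on i = 1*4 + 2 = 6.
```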

  23. To compile: • nvcc simple.c simple.cu –o simple • The compiler generates code for both the host and the GPU. • Demo on cuda.littlefe.net …

  24. Testing - Matrices • Test the multiplication of two matrices. • Create two matrices with random floating-point values. • We tested with matrices of various dimensions…
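A kernel of the kind this test describes might look like the following (a sketch, not the deck’s actual benchmark code; the matMul name and row-major layout are assumptions):

```c
// C = A * B for n x n row-major matrices; one thread per output element.
__global__ void matMul(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];  // dot product of row and column
        C[row * n + col] = sum;
    }
}
```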

  25. Results:

  26. Applications of CUDA • Electrodynamics and Electromagnetics • Nuclear Physics, Molecular Dynamics, and Computational Chemistry • Video, Imaging, and Vision Applications • Game Industry • MATLAB, LabVIEW, Mathematica, R • Weather and Ocean Modeling • Financial Computing and Options Pricing • Medical Imaging, CT, MRI • Government and Defence • Geophysics

  27. Conclusions.. • GPGPU harnesses the power of the GPU to execute computations that would otherwise be performed on the CPU. • With the Runtime API and Driver API provided by NVIDIA device drivers and the NVCC compiler, massively parallel computations can run up to hundreds of times faster than on a CPU. • CUDA is a leading platform for SIMD-style data-parallel programming. • Many scientific applications, simulators, high-end computations, and research-oriented simulations can be performed easily on the GPU with CUDA.

  28. References • Song Jun Park, “An Analysis of GPU Parallel Computing,” 2009 DoD High Performance Computing Modernization Program Users Group Conference • Ian Buck, “GPU Computing: Programming a Massively Parallel Processor,” International Symposium on Code Generation and Optimization (CGO ’07) • Nobuhiro Funatsu and Yoshimitsu Kuroki, “Fast Parallel Processing using GPU in Computing L1-PCA Bases,” IEEE, 2010 • John Owens, “GPU Computing: Heterogeneous Computing for Future Systems,” IEEE, 2009 • Jason Sanders and Edward Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, NVIDIA Corporation

  29. Courtesy… Supercomputing 2008 Education Program

  30. Thank you… Queries..???
