CUDA (Compute Unified Device Architecture) By Sandeep A. Ganage, M.Tech (IT) Guided by Dr. R. C. Thool
Terms • What is GPGPU? • General-Purpose computing on a Graphics Processing Unit • Using graphic hardware for non-graphic computations • What is CUDA? • Compute Unified Device Architecture • Software architecture for managing data-parallel programming
Introduction What is a GPU? • It is a processor optimized for 2D/3D graphics, video, visual computing, and display. • It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing. • It provides real-time visual interaction with computed objects via graphics, images, and video. • It serves as both a programmable graphics processor and a scalable parallel computing platform. • Heterogeneous systems combine a GPU with a CPU.
Processing Element • Processing element = thread processor = ALU
Memory Architecture • Constant Memory • Texture Memory • Device Memory
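As a rough illustration of two of these memory spaces, the sketch below (assuming the CUDA runtime API and a CUDA-capable GPU) places two read-only coefficients in constant memory with cudaMemcpyToSymbol and passes the data array through ordinary device (global) memory. The kernel name affine, the helper run_constant_memory_demo, and the sample values are hypothetical; texture memory is not shown.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 8

// Constant memory: small, cached, read-only inside kernels, written by the host.
__constant__ float c_scale[2];

// Device (global) memory arrives as an ordinary pointer argument.
__global__ void affine(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Every thread reads the same constant coefficients.
        data[i] = c_scale[0] * data[i] + c_scale[1];
    }
}

// Hypothetical helper: returns the number of elements that differ from the CPU result.
int run_constant_memory_demo(void) {
    float h[N], out[N];
    for (int i = 0; i < N; ++i) h[i] = (float)i;
    const float scale[2] = {2.0f, 1.0f};                      // y = 2x + 1

    float *d = NULL;
    cudaMalloc((void **)&d, N * sizeof(float));               // device memory
    cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_scale, scale, sizeof(scale));        // constant memory

    affine<<<1, N>>>(d, N);
    cudaMemcpy(out, d, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    int errors = 0;
    for (int i = 0; i < N; ++i)
        if (out[i] != 2.0f * h[i] + 1.0f) ++errors;
    return errors;
}

int main(void) {
    int errors = run_constant_memory_demo();
    printf("%s\n", errors == 0 ? "constant memory demo OK" : "mismatch");
    return errors != 0;
}
```

Constant memory suits values that every thread reads but none writes; bulk data that threads read and write stays in device memory.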
CPU vs. GPU • CPU • Fast caches • Branching adaptability • High performance • Multicore • GPU • Multiple ALUs • Fast onboard memory • High throughput on parallel tasks • Executes program on each fragment/vertex • CPUs are great for task parallelism • GPUs are great for data parallelism
Computing Capability • GPU: 369 GIPS vs. CPU: 177,730 IPS • Why??
CPU vs. GPU • GPUs contain a much larger number of dedicated ALUs than CPUs. • GPUs also provide extensive support for the stream processing paradigm, which is related to SIMD (Single Instruction, Multiple Data) processing. • Each processing unit on the GPU has local memory, which improves data manipulation and reduces fetch time.
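To make the data-parallel (SIMD-style) idea concrete, here is a minimal sketch contrasting a sequential CPU loop with a CUDA kernel in which each thread applies the same statement to its own element. SAXPY (y = a*x + y) is used only as a stand-in workload; the function names and sizes are illustrative assumptions, not taken from the slides.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Sequential CPU version: one core walks the loop element by element.
void saxpy_cpu(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Data-parallel GPU version: the loop disappears; each thread runs the
// same instructions on a different element.
__global__ void saxpy_gpu(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: runs both versions on the same input, returns the mismatch count.
int compare_saxpy(int n) {
    size_t bytes = n * sizeof(float);
    float *x = (float *)malloc(bytes);
    float *y_cpu = (float *)malloc(bytes);
    float *y_gpu = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = (float)i; y_cpu[i] = y_gpu[i] = 1.0f; }

    saxpy_cpu(n, 3.0f, x, y_cpu);

    float *dx, *dy;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y_gpu, bytes, cudaMemcpyHostToDevice);
    saxpy_gpu<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);  // enough 256-thread blocks to cover n
    cudaMemcpy(y_gpu, dy, bytes, cudaMemcpyDeviceToHost);

    int errors = 0;
    for (int i = 0; i < n; ++i) if (y_cpu[i] != y_gpu[i]) ++errors;

    cudaFree(dx); cudaFree(dy);
    free(x); free(y_cpu); free(y_gpu);
    return errors;
}

int main(void) {
    int errors = compare_saxpy(10000);
    printf("%s\n", errors == 0 ? "CPU and GPU agree" : "mismatch");
    return errors != 0;
}
```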
CPU vs. GPU • The GPU devotes more of its transistors to data processing.
“What is CUDA?” • CUDA is a set of development tools for creating applications that execute on the GPU (Graphics Processing Unit). • The CUDA compiler uses a variation of C, with support for C++ planned. • CUDA was developed by NVIDIA and therefore runs only on NVIDIA GPUs of the G8x series and later. • CUDA was released on February 15, 2007 for PC, with a beta version for Mac OS X on August 19, 2008.
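One way to check whether a machine has a suitable NVIDIA GPU is to query its devices through the CUDA runtime API. This is a minimal sketch: count_cuda_devices is a hypothetical helper, while cudaGetDeviceCount, cudaGetDeviceProperties and the cudaDeviceProp fields used here belong to the runtime API. G8x-series GPUs report compute capability 1.x.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of CUDA-capable devices (0 if none or no driver).
int count_cuda_devices(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    return count;
}

int main(void) {
    int n = count_cuda_devices();
    if (n == 0) {
        printf("No CUDA-capable NVIDIA GPU found\n");
        return 1;
    }
    for (int dev = 0; dev < n; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // major.minor is the compute capability; shared memory is per block.
        printf("Device %d: %s, compute capability %d.%d, %zu bytes shared memory per block\n",
               dev, prop.name, prop.major, prop.minor, prop.sharedMemPerBlock);
    }
    return 0;
}
```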
Why CUDA • CUDA lets developers use a high-level language such as C to write applications that exploit the performance and scalability of the GPU architecture. • GPUs can create a very large number of concurrently executing threads at very low system resource cost. • CUDA also exposes fast shared memory (16 KB per multiprocessor) that threads within a block can share. • Full support for integer and bitwise operations. • Compiled code runs directly on the GPU.
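Shared memory is what lets the threads of one block cooperate. The sketch below, a common tree-style reduction rather than anything from the original slides, has each block load a tile into __shared__ memory, synchronize with __syncthreads(), and sum the tile; the host then adds the per-block results. The block size of 256 and the helper gpu_sum_1_to_n are assumptions for illustration (256 ints use 1 KB of shared memory).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS 256   // threads per block; must be a power of two for this reduction

// Each block sums THREADS consecutive elements using shared memory.
__global__ void block_sum(const int *in, int *block_sums, int n) {
    __shared__ int cache[THREADS];            // shared by all threads of this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();                          // wait until the whole tile is loaded

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) block_sums[blockIdx.x] = cache[0];
}

// Hypothetical helper: sums 1..n (per-block partial sums on GPU, final pass on host).
long long gpu_sum_1_to_n(int n) {
    int blocks = (n + THREADS - 1) / THREADS;
    int *h_in = (int *)malloc(n * sizeof(int));
    int *h_partial = (int *)malloc(blocks * sizeof(int));
    for (int i = 0; i < n; ++i) h_in[i] = i + 1;

    int *d_in, *d_partial;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_partial, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<blocks, THREADS>>>(d_in, d_partial, n);
    cudaMemcpy(h_partial, d_partial, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    long long total = 0;
    for (int b = 0; b < blocks; ++b) total += h_partial[b];

    cudaFree(d_in); cudaFree(d_partial);
    free(h_in); free(h_partial);
    return total;
}

int main(void) {
    printf("sum of 1..1000 = %lld\n", gpu_sum_1_to_n(1000));
    return 0;
}
```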
Software Requirements/Tools • CUDA device driver • CUDA Software Development Kit • Emulator • CUDA Toolkit • Occupancy calculator • Visual profiler
How CUDA Works • We need to allocate space in the GPU’s memory for the variables. • The video card has no I/O devices, so we need to copy the input data from the host computer’s memory into the GPU’s memory, using the variables allocated in the previous step. • We need to specify the code (the kernel) to execute on the GPU. • We copy the results back to the host computer’s memory.
Initially: the data (array) is in the host’s memory only.
Allocate memory in the GPU card: space named array_d is reserved in the GPU card’s memory.
Copy content from the host’s memory to the GPU card memory: array (host) is copied into array_d (GPU card).
Execute code on the GPU: the GPU’s multiprocessors (MPs) run the kernel on array_d.
Copy results back to the host memory: array_d (GPU card) is copied back into array (host).
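The four steps above can be sketched with the CUDA runtime API, reusing the names array and array_d from the slides. This is a minimal sketch under assumed details: the kernel double_elements, the helper run_four_steps, and the size N = 16 are illustrative, and error checking is reduced to the allocation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 16

// Kernel: each thread doubles one element of array_d.
__global__ void double_elements(int *array_d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) array_d[i] *= 2;
}

// Hypothetical helper: returns 0 if every element was doubled, -1 if allocation failed.
int run_four_steps(void) {
    int array[N];
    for (int i = 0; i < N; ++i) array[i] = i;            // initially: data in host memory only

    int *array_d = NULL;
    if (cudaMalloc((void **)&array_d, N * sizeof(int)) != cudaSuccess)  // 1. allocate on the GPU
        return -1;
    cudaMemcpy(array_d, array, N * sizeof(int), cudaMemcpyHostToDevice); // 2. copy host -> GPU
    double_elements<<<2, N / 2>>>(array_d, N);                           // 3. execute the kernel
    cudaMemcpy(array, array_d, N * sizeof(int), cudaMemcpyDeviceToHost); // 4. copy results back
    cudaFree(array_d);

    int errors = 0;
    for (int i = 0; i < N; ++i) if (array[i] != 2 * i) ++errors;
    return errors;
}

int main(void) {
    int result = run_four_steps();
    printf("%s\n", result == 0 ? "all elements doubled" : "failed");
    return result != 0;
}
```

The device-to-host cudaMemcpy waits for the kernel to finish, so no explicit synchronization is needed before reading array.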
The Kernel • We must write the code that will be executed by the stream processors in the GPU card. • That code, called the kernel, is downloaded to the card and executed simultaneously, in lock-step fashion, by several (all?) of its stream processors. • How does each instance of the kernel know which piece of data it is working on?
In the GPU: array elements are mapped onto processing elements, which are grouped into Block 0 and Block 1, each containing Thread 0 through Thread 3.
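In CUDA, the answer to "which piece of data?" is that each kernel instance combines the built-in blockIdx, blockDim and threadIdx variables into a global index. The sketch below reproduces the figure's layout (2 blocks of 4 threads over 8 array elements) and records which block and thread handled each element; the names record_ids and check_mapping are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCKS 2
#define THREADS_PER_BLOCK 4
#define N (BLOCKS * THREADS_PER_BLOCK)

// Each kernel instance derives "its" array element from its block and thread IDs.
__global__ void record_ids(int *block_of, int *thread_of, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global element index
    if (i < n) {
        block_of[i] = blockIdx.x;
        thread_of[i] = threadIdx.x;
    }
}

// Hypothetical helper: counts elements whose (block, thread) pair breaks
// the mapping i = block * THREADS_PER_BLOCK + thread.
int check_mapping(int print) {
    int block_of[N], thread_of[N];
    int *d_block, *d_thread;
    cudaMalloc((void **)&d_block, N * sizeof(int));
    cudaMalloc((void **)&d_thread, N * sizeof(int));
    record_ids<<<BLOCKS, THREADS_PER_BLOCK>>>(d_block, d_thread, N);
    cudaMemcpy(block_of, d_block, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(thread_of, d_thread, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_block); cudaFree(d_thread);

    int errors = 0;
    for (int i = 0; i < N; ++i) {
        if (print) printf("element %d -> block %d, thread %d\n", i, block_of[i], thread_of[i]);
        if (block_of[i] != i / THREADS_PER_BLOCK || thread_of[i] != i % THREADS_PER_BLOCK) ++errors;
    }
    return errors;
}

int main(void) { return check_mapping(1) != 0; }
```

With this layout, element 5 is handled by thread 1 of block 1, matching the second block in the figure.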
To compile: • nvcc simple.c simple.cu -o simple • The compiler generates the code for both the host and the GPU. • Demo on cuda.littlefe.net …
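The point that nvcc generates code for both the host and the GPU can be seen in a single source file. The contents of the slides' simple.c and simple.cu are not shown, so this is a hypothetical single-file variant (built with nvcc simple.cu -o simple): a __host__ __device__ function is compiled for both processors, a __global__ kernel only for the GPU, and plain host code is handed to the system's host compiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10

// __host__ __device__: nvcc compiles this function twice, once for the CPU
// and once for the GPU.
__host__ __device__ int square(int x) { return x * x; }

// __global__: compiled for the GPU only and launched from host code.
__global__ void square_kernel(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = square(i);            // calls the GPU copy of square()
}

// Ordinary host code: forwarded by nvcc to the host compiler (e.g. gcc or cl).
int compare_host_and_device(void) {
    int out[N];
    int *d_out;
    cudaMalloc((void **)&d_out, N * sizeof(int));
    square_kernel<<<1, N>>>(d_out, N);
    cudaMemcpy(out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int errors = 0;
    for (int i = 0; i < N; ++i)
        if (out[i] != square(i)) ++errors;    // calls the CPU copy of square()
    return errors;
}

int main(void) {
    int errors = compare_host_and_device();
    printf("%s\n", errors == 0 ? "host and device results agree" : "mismatch");
    return errors != 0;
}
```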
Testing - Matrices • Test the multiplication of two matrices. • Create two matrices with random floating-point values. • We tested with matrices of various dimensions…
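A test of this kind might look like the following sketch: a naive matrix-multiplication kernel with one thread per output element on a 2D grid of 16 x 16 blocks, checked against a CPU loop on random values in [0, 1]. The kernel, the helper test_matmul, the tile size and the tolerance are assumptions; the slides do not show the original test code or its results.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

#define TILE 16

// C = A * B for square n x n row-major matrices; one thread per element of C.
__global__ void matmul(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Hypothetical helper: multiplies two random n x n matrices on the GPU and
// returns the number of entries that differ from the CPU result beyond a tolerance.
int test_matmul(int n, unsigned seed) {
    size_t bytes = (size_t)n * n * sizeof(float);
    float *A = (float *)malloc(bytes), *B = (float *)malloc(bytes), *C = (float *)malloc(bytes);
    srand(seed);
    for (int i = 0; i < n * n; ++i) {
        A[i] = (float)rand() / RAND_MAX;      // random values in [0, 1]
        B[i] = (float)rand() / RAND_MAX;
    }

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);  // covers n not divisible by TILE
    matmul<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);

    int errors = 0;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += A[r * n + k] * B[k * n + c];
            if (std::fabs(ref - C[r * n + c]) > 1e-3f * n) ++errors;  // allow rounding differences
        }

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(A); free(B); free(C);
    return errors;
}

int main(void) {
    int errors = test_matmul(100, 1u);
    printf("%s\n", errors == 0 ? "GPU result matches CPU" : "mismatch");
    return errors != 0;
}
```

The tolerance is needed because the GPU may fuse multiply-adds and round differently from the CPU, so exact equality of float results is not expected.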
Applications of CUDA • Electrodynamics and Electromagnetics • Nuclear Physics, Molecular Dynamics and Computational Chemistry • Video, Imaging and Vision Applications • Game Industry • MATLAB, LabVIEW, Mathematica, R • Weather and Ocean Modeling • Financial Computing and Options Pricing • Medical Imaging, CT, MRI • Government and Defence • Geophysics
Conclusions • GPGPU harnesses the GPU to perform general computations that would otherwise run on the CPU. • With the Runtime API and Driver API provided by NVIDIA, together with the NVCC compiler, massively parallel computations can run up to hundreds of times faster than on a CPU for suitable workloads. • CUDA is a strong platform for SIMD-style, data-parallel computation. • Many scientific applications, simulators, high-end computations, and research-oriented simulations can be performed easily on the GPU with CUDA.
Courtesy: Supercomputing 2008 Education Program