CUDA Overview: A Fast Introduction
João Gabriel Felipe Machado Gazolla
Advisor: Dr. Esteban Clua
Topics
• What is CUDA?
• Where to Download?
• How to Install
• Architecture
• Performance
• Visual Studio Integration
• Examples
• How to Learn More about CUDA and GPUs?
• Study Plan
• References
• Discussion
Goal
“...Explain the basics of CUDA...”
What is CUDA?
• CUDA stands for Compute Unified Device Architecture.
• CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through industry-standard programming languages.
CUDA Performance
CPU Scenario
• Specific code example: a population of 1024 soldiers
• Fitness function: soldierScore(x), which returns unit points (for example, 12387)
• One processor scores one soldier at a time: soldierScore(soldier[i]) for soldier[0...1023]
• Total time: (1024/1) * time(soldierScore())
GPU Scenario
• Specific code example: a population of 1024 soldiers
• Fitness function: soldierScore(x), which returns unit points (12387 ... 12494 ... 15912)
• GeForce XXXX+ with 256 processors: soldier[i] ... soldier[i+n] are scored at the same time
• Total time: (1024/256) * time(soldierScore())
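The two scenarios above can be sketched in CUDA C++ roughly as follows. This is only an illustration under stated assumptions: the slides name soldierScore but do not define it, so the Soldier fields, the scoring formula, and the helper names scoreOnCpu and scoreOnGpu are all hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical soldier record; the slides do not define its fields.
struct Soldier {
    int strength;
    int speed;
};

// Hypothetical fitness function (the slides only give the name soldierScore).
// __host__ __device__ lets the same code run on the CPU and on the GPU.
__host__ __device__ int soldierScore(const Soldier& s) {
    return 100 * s.strength + 10 * s.speed;
}

// CPU scenario: one processor scores the soldiers one after another.
std::vector<int> scoreOnCpu(const std::vector<Soldier>& soldiers) {
    std::vector<int> scores(soldiers.size());
    for (size_t i = 0; i < soldiers.size(); ++i)
        scores[i] = soldierScore(soldiers[i]);
    return scores;
}

// GPU scenario: each thread scores exactly one soldier.
__global__ void scoreKernel(const Soldier* soldiers, int* scores, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) scores[i] = soldierScore(soldiers[i]);
}

// Host wrapper: copy soldiers to the device, launch, copy scores back.
std::vector<int> scoreOnGpu(const std::vector<Soldier>& soldiers) {
    int n = (int)soldiers.size();
    std::vector<int> scores(n);
    if (n == 0) return scores;

    Soldier* dSoldiers = nullptr;
    int* dScores = nullptr;
    cudaMalloc(&dSoldiers, n * sizeof(Soldier));
    cudaMalloc(&dScores, n * sizeof(int));
    cudaMemcpy(dSoldiers, soldiers.data(), n * sizeof(Soldier),
               cudaMemcpyHostToDevice);

    int threads = 256;                         // threads per block (assumed)
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    scoreKernel<<<blocks, threads>>>(dSoldiers, dScores, n);

    cudaMemcpy(scores.data(), dScores, n * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dSoldiers);
    cudaFree(dScores);
    return scores;
}
```

Calling scoreOnGpu on a 1024-element vector should give the same scores as scoreOnCpu. The (1024/256) estimate on the slide is an idealization: a real run also pays for the two memory copies.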
What do I need to run CUDA?
Where to Download CUDA?
What to Download?
Is It Worth It?
• 5% faster? 20% faster? 300% faster? 900% faster?
Unified Architecture - CUDA
• Low Cost, Supercomputing for the Masses
Is It Worth It? Speedups
• At a 100x speedup:
• 1 year becomes about 3 days
• 1 day becomes about 15 minutes
• 2 minutes become 1.2 seconds
Example: Crowd Simulation
• 1,000,000 bodies
Architecture
• CPUs vs GPUs
GPU – The Evolution
• Fixed-Function GPUs
• Programmable GPUs
• Unified Architecture
GPU – The Evolution: Fixed-Function GPUs
• Not a programmable architecture
• No access to the processor
• Only APIs
GPU – The Evolution: Programmable GPUs
• Architecture oriented to computer graphics
Unified Architecture - CUDA
Getting VS2008 for Free
VS2008 Integration
• Install VS2008
VS2008 Integration
• Command line:
• $(CUDA_BIN_PATH)\nvcc.exe -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Od,/Zi,/RTC1,/MDd -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\kernel.obj kernel.cu
• Outputs:
• $(ConfigurationName)\kernel.obj
CUDA VS Wizard
CUDA and Linux
• Turn off Compiz
• Downgrade G++ and GCC from 4.3 to 4.1
CUDA and Eclipse
Software Architecture
CUDA and Threads
• Why program in threads?
• Load balancing
• Share the load among processors
• Maximum use of each processor
CUDA and Threads
• How many threads have you ever created?
• CUDA allows thousands and thousands of threads = a cluster of threads
Threads – Management Costs
• CPU: few threads. If we need 1000 instructions to switch threads, it’s OK.
• GPU: thousands of threads. Needing 1000 instructions per switch is NOT OK.
CUDA - Synchronization
• Must be explicit
• “…synchronization is accomplished using the function syncthreads, which acts as a barrier or memory fence…” (in CUDA C the call is written __syncthreads())
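A minimal sketch of explicit synchronization, assuming a block size of 256 threads: each block adds up its slice of the input in shared memory, and __syncthreads() keeps any thread from reading a partial sum before its neighbour has written it. The names blockSum and sumOnGpu are hypothetical and do not come from the slides.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // threads per block (assumed; must be a power of two)

// Each block adds up to BLOCK_SIZE input values using shared memory.
__global__ void blockSum(const int* in, int* blockTotals, int n) {
    __shared__ int partial[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();  // barrier: all loads are done before anyone reads

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();  // reached by every thread, active or not
    }
    if (tid == 0) blockTotals[blockIdx.x] = partial[0];
}

// Host helper: sums a vector with the kernel above.
long long sumOnGpu(const std::vector<int>& data) {
    int n = (int)data.size();
    if (n == 0) return 0;
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

    int* dIn = nullptr;
    int* dTotals = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotals, blocks * sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK_SIZE>>>(dIn, dTotals, n);

    std::vector<int> totals(blocks);
    cudaMemcpy(totals.data(), dTotals, blocks * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dTotals);

    long long sum = 0;
    for (int t : totals) sum += t;  // add the per-block totals on the host
    return sum;
}
```

The barrier only covers threads of the same block, which is why the per-block totals are combined on the host here instead of inside the kernel.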
CUDA – Important Definitions
• CUDA extends the C language through kernels
• *.cu: CUDA source files
• Each kernel is a function that will be executed N times on the device
Conventions
• Host: the CPU
• Device: the GPU
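Functions in CUDA
• Function qualifiers state where a function is executed and where it is called from; for example, __global__ marks a kernel, which is called on the host and executed on the device
• Combinations are also possible
• No recursion on the device (GPU)
• No static variables
• Device memory is managed with cudaMalloc() and cudaFree()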
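As a hedged illustration of the points above, the sketch below uses the three standard function qualifiers (__device__, __host__ __device__ as one possible combination, and __global__) together with cudaMalloc() and cudaFree(). The function names square, plusOne, transform and transformOnGpu are made up for this example.

```cuda
#include <vector>
#include <cuda_runtime.h>

// __device__: executed on the GPU, callable only from GPU code.
__device__ float square(float x) { return x * x; }

// __host__ __device__: a combination, compiled for both CPU and GPU.
__host__ __device__ float plusOne(float x) { return x + 1.0f; }

// __global__: a kernel, executed on the GPU and launched from the host.
__global__ void transform(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = plusOne(square(data[i]));
}

// Unqualified functions are host functions: called and executed on the CPU.
std::vector<float> transformOnGpu(std::vector<float> values) {
    int n = (int)values.size();
    if (n == 0) return values;
    size_t bytes = n * sizeof(float);

    float* dData = nullptr;
    if (cudaMalloc(&dData, bytes) != cudaSuccess)
        return std::vector<float>();  // allocation failed: return nothing

    cudaMemcpy(dData, values.data(), bytes, cudaMemcpyHostToDevice);
    transform<<<(n + 127) / 128, 128>>>(dData, n);
    cudaMemcpy(values.data(), dData, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dData);  // every cudaMalloc needs a matching cudaFree
    return values;
}
```

With the input {1, 2, 3}, each element x becomes x * x + 1.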
CUDA and the Limits of Memory Bandwidth
• Reuse your data!
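One common way to reuse data is to stage it in on-chip shared memory. The sketch below is an assumed example, not taken from the slides: a 3-point neighbour sum in which each input value is loaded from global memory once per block and then read by up to three threads from shared memory. TILE, sum3 and sum3OnGpu are hypothetical names.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define TILE 128  // threads per block (an assumed value)

// out[i] = in[i-1] + in[i] + in[i+1], with out-of-range neighbours as 0.
__global__ void sum3(const int* in, int* out, int n) {
    __shared__ int tile[TILE + 2];  // one extra "halo" cell on each side
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int t = threadIdx.x + 1;        // this thread's position in the tile

    tile[t] = (i < n) ? in[i] : 0;
    if (threadIdx.x == 0)           // first thread also loads the left halo
        tile[0] = (i > 0 && i - 1 < n) ? in[i - 1] : 0;
    if (threadIdx.x == blockDim.x - 1)  // last thread loads the right halo
        tile[t + 1] = (i + 1 < n) ? in[i + 1] : 0;
    __syncthreads();                // the tile is complete from here on

    // Three reads per thread, all served from shared memory.
    if (i < n) out[i] = tile[t - 1] + tile[t] + tile[t + 1];
}

// Host helper: runs sum3 over a whole vector.
std::vector<int> sum3OnGpu(const std::vector<int>& in) {
    int n = (int)in.size();
    std::vector<int> out(n);
    if (n == 0) return out;

    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dIn, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    sum3<<<(n + TILE - 1) / TILE, TILE>>>(dIn, dOut, n);

    cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

The kernel assumes it is launched with exactly TILE threads per block, as sum3OnGpu does.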
Architecture
• Hide implementation details
• HW evolution
Threads, Blocks and Grids
• One kernel: one grid
• Each block: many threads
• All threads inside a block share the same memory area
• Threads in different blocks do not share their local memory among them
• Threads in different blocks cannot cooperate
Threads, Blocks and Grids
• Each block: up to 512 threads (the limit on compute capability 1.x GPUs; devices of compute capability 2.0 and later allow up to 1024)
Threads, Blocks and Grids
• __global__ void KernelFunction(...);
• dim3 DimGrid(100, 10); // Grid of 1000 blocks
• dim3 DimBlock(4, 8, 8); // Each block has 256 threads
• size_t SharedMemBytes = 32;
• KernelFunction<<<DimGrid, DimBlock, SharedMemBytes>>>(...);
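The launch configuration above can be tried out with a small kernel. In this sketch the grid and block dimensions and the 32 bytes of shared memory are the slide's values, while the kernel recordIds and the wrapper launchLikeTheSlide are hypothetical. Each thread flattens its 2D block index and 3D thread index into a single position, and the block shares its id through the dynamically sized shared array.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Every thread stores the id of the block it belongs to.
__global__ void recordIds(int* blockOfThread) {
    extern __shared__ int scratch[];  // size set by the 3rd launch parameter

    // Flatten the 2D block index and the 3D thread index.
    int blockId = blockIdx.y * gridDim.x + blockIdx.x;
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int threadId = (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x
                   + threadIdx.x;

    if (threadId == 0) scratch[0] = blockId;  // one thread writes...
    __syncthreads();
    // ...and every thread of the same block reads the value back.
    blockOfThread[blockId * threadsPerBlock + threadId] = scratch[0];
}

// Host wrapper using the configuration from the slide.
std::vector<int> launchLikeTheSlide() {
    dim3 dimGrid(100, 10);       // 100 * 10 = 1000 blocks
    dim3 dimBlock(4, 8, 8);      // 4 * 8 * 8 = 256 threads per block
    size_t sharedMemBytes = 32;  // room for 8 ints per block
    int total = 1000 * 256;      // one output slot per thread

    std::vector<int> result(total, -1);
    int* dOut = nullptr;
    cudaMalloc(&dOut, total * sizeof(int));

    recordIds<<<dimGrid, dimBlock, sharedMemBytes>>>(dOut);

    cudaMemcpy(result.data(), dOut, total * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return result;
}
```

The returned vector has 256,000 entries: the first 256 belong to block 0, the next 256 to block 1, and so on up to block 999.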
Some code...
    // Kernel definition
    __global__ void vecAdd(float* A, float* B, float* C) {...}
    int main() {
        // Kernel invocation
        vecAdd<<<1, N>>>(A, B, C);
    }
• __global__ defines that it’s a kernel:
• Called on the host
• Executed on the device
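For reference, here is one way the vecAdd fragment could be completed. The slide elides the kernel body as {...}, so the body below is an assumption, as is the host helper addVectors. The launch keeps the slide's <<<1, N>>> form, which only works while N fits in a single block.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Kernel definition. The body is assumed: thread i adds element i.
__global__ void vecAdd(float* A, float* B, float* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

// Host helper: adds two equally sized vectors with one block of N threads.
// Returns an empty vector if the launch is rejected (for example, N too big).
std::vector<float> addVectors(const std::vector<float>& a,
                              const std::vector<float>& b) {
    int N = (int)a.size();  // assumed equal to b.size()
    std::vector<float> c(N);
    if (N == 0) return c;
    size_t bytes = N * sizeof(float);

    float* dA = nullptr;
    float* dB = nullptr;
    float* dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    // Kernel invocation: 1 block, N threads.
    vecAdd<<<1, N>>>(dA, dB, dC);
    bool launched = (cudaGetLastError() == cudaSuccess);
    if (launched)
        cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (!launched) c.clear();
    return c;
}
```

Larger vectors would need several blocks and an index of the form blockIdx.x * blockDim.x + threadIdx.x.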