Exploiting Parallelism on GPUs Matt Mukerjee David Naylor
Parallelism on GPUs • $100 NVIDIA video card → 192 cores • (Build Blacklight for ~$2000 ???) • Incredibly low power • Ubiquitous • Question: use it for general computation? • General-Purpose GPU (GPGPU)
GPU Hardware • Very specific design constraints • Designed for SIMD workloads (e.g. shaders) • Zero-overhead thread scheduling • Little caching (compared to CPUs) • Constantly stalled on memory access → MASSIVE # of threads per core to hide latency • Much finer-grained threads, launched as “kernels”
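The latency-hiding point above can be sketched with a minimal (hypothetical, not from the slides) CUDA kernel: vastly oversubscribing the cores with lightweight threads is what lets the zero-overhead scheduler swap in ready warps while others stall on memory.

```cuda
#include <cstdio>

// Trivially parallel vector add: each GPU thread handles ONE element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];                         // one element per thread
}

int main() {
    const int n = 1 << 20;                // 1M elements
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2.0f * i; }

    // 4096 blocks x 256 threads = ~1M threads on a card with ~192 cores:
    // the oversubscription is what hides memory latency.
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[42] = %f\n", c[42]);        // 42 + 84 = 126.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```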
Thread Blocks • GPUs are SIMD • So how does multithreading work? • Threads that take a different branch are halted, then run after the others finish • Single Instruction Multiple….?
CUDA is a SIMT architecture • Single Instruction, Multiple Thread • Threads in a block execute the same instruction, issued by a shared multi-threaded instruction unit
Observation Fitting the data structures needed by the threads running on one multiprocessor into its on-chip memory requires application-specific tuning.
Example: MapReduce on CUDA The intermediate data structures are too big for the cache on one SM!
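A hypothetical sketch (not the slides' actual MapReduce code) of the tuning problem: each block stages a tile of intermediate pairs in the SM's shared memory, and the tile size `TILE` is an assumed, per-application knob. Shared memory per SM was only 16-48 KB on GPUs of this era, so an oversized tile either fails to launch or cuts how many blocks can reside on one SM.

```cuda
#define TILE 1024   // tuned so one block's tile fits in SM shared memory

// Toy "map" phase: bucket each input by a key, staged through shared memory.
__global__ void mapPhase(const int *input, int *keys, int *vals, int n) {
    __shared__ int localKeys[TILE];   // TILE * 4 bytes
    __shared__ int localVals[TILE];   // TILE * 4 bytes -> 8 KB total per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        localKeys[threadIdx.x] = input[i] % 16;  // toy map: key = bucket id
        localVals[threadIdx.x] = 1;              // toy map: count of 1
    }
    __syncthreads();                  // whole tile staged before any reuse

    if (i < n) {                      // write the staged tile back out
        keys[i] = localKeys[threadIdx.x];
        vals[i] = localVals[threadIdx.x];
    }
}
```

Doubling `TILE` here doubles the shared-memory footprint per block, which is exactly the application-specific trade-off the observation above describes.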
Problem Only one code branch within a block executes at a time
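The cost of this can be shown with a hypothetical pair of kernels. In the first, neighboring threads disagree on the condition, so the hardware runs the `then` side with the `else` threads masked off and then the `else` side: the warp pays for both branches. In the second, the condition depends on the warp index rather than the lane, so no warp ever diverges.

```cuda
// Divergent: even/odd threads within one 32-thread warp take different
// paths, so both paths execute serially.
__global__ void divergent(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)            // even lanes take one path...
        x[i] = x[i] * 2.0f;
    else                       // ...odd lanes wait, then take the other
        x[i] = x[i] + 1.0f;
}

// Divergence-free: the condition is uniform across each 32-thread warp,
// so every warp executes exactly one of the two paths.
__global__ void uniform(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((i / 32) % 2 == 0)     // whole warps agree on the condition
        x[i] = x[i] * 2.0f;
    else
        x[i] = x[i] + 1.0f;
}
```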
Problem If two multiprocessors share a cache line, there are more memory accesses than necessary.
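One way to avoid this, sketched here as a hypothetical layout choice: give each block a contiguous, cache-line-aligned chunk of the output, so no line is touched by two blocks (and hence two multiprocessors). An interleaved assignment, where element `i` goes to block `i % gridDim.x`, would put neighbors from the same line on different SMs.

```cuda
// Contiguous, line-aligned partitioning of out[0..n) across blocks.
__global__ void chunkedUpdate(float *out, int n) {
    const int LINE = 32;  // 32 floats = 128-byte line (assumed line size)
    // Round the per-block chunk up to a whole number of cache lines.
    int chunk = ((n + gridDim.x - 1) / gridDim.x + LINE - 1) / LINE * LINE;
    int start = blockIdx.x * chunk;
    int end   = min(start + chunk, n);
    // Threads of THIS block cooperatively cover only this block's chunk,
    // so every cache line in it belongs to a single multiprocessor.
    for (int i = start + threadIdx.x; i < end; i += blockDim.x)
        out[i] += 1.0f;
}
```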