

  1. Harnessing Massively Parallel Processors (http://www.ece.ubc.ca/~matei/): Introduction to GPU Architecture and Programming Model. Acknowledgement: some slides borrowed from presentations by Kayvon Fatahalian, Mark Harris, Samer Al-Kiswany.

  2. YVR to Paris: which plane is better?
     Plane       | Speed    | Passengers | Flight time
     Boeing 747  | 610 mph  | 470        | 10.5 hours
     Concorde    | 1350 mph | 132        | 5 hours

  3. Same idea for GPUs
     • Specialized for data-intensive, highly parallel computations (exactly what the graphics hardware does well)
     • More transistors allocated to processing data rather than to caching and control flow (compared to CPUs)

  4. Outline
     • Hardware: GPU Architecture Intuition
     • Software: Programming Model
     • Optimizations

  5.–11. GPU Architecture Intuition (figure-only slides)

  12. Your data is not ready …

  13. Storing contexts

  14. (imagined)

  15. NVIDIA (still idealized but closer to reality), in NVIDIA terminology:
     • 480 stream processors ("CUDA cores")
     • 15 multiprocessors
     • SIMT execution

  16. NVIDIA GeForce GTX 480 (one multiprocessor)
     • A multiprocessor contains 32 CUDA cores
     • Two groups of threads (warps) are selected each clock (two instruction streams are fetched, decoded, and executed in parallel)
     • Up to 48 warps are interleaved, totalling 1536 CUDA threads
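     These per-multiprocessor numbers can be checked at runtime. A minimal sketch using the CUDA runtime's device-property query (device index 0 and the printed fields are assumptions; cores per multiprocessor is not reported directly by this API):

     // A minimal sketch: query the numbers on this slide from the CUDA runtime.
     #include <cstdio>
     #include <cuda_runtime.h>

     int main() {
         cudaDeviceProp prop;
         cudaGetDeviceProperties(&prop, 0);   // assumption: device 0 is the GPU of interest
         printf("Device:               %s\n", prop.name);
         printf("Multiprocessors:      %d\n", prop.multiProcessorCount);
         printf("Warp size:            %d threads\n", prop.warpSize);
         printf("Max threads per SM:   %d\n", prop.maxThreadsPerMultiProcessor);
         // On a GeForce GTX 480 this reports 15 multiprocessors, warp size 32,
         // and 1536 resident threads per multiprocessor (48 warps).
         return 0;
     }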

  17. So far: Processing. Next: Accessing data

  18. Summary so far
     Three major ideas (employed to varying degrees by all modern processors):
     • Employ multiple processing cores
       – Simpler cores (embrace thread-level parallelism over ILP)
     • Amortize instruction stream processing over cores (SIMD)
       – Increase compute capability with little extra cost
     • Use multi-threading to make more efficient use of processing resources (hide latencies, fill all available resources)
     Due to the high arithmetic capability of modern chips, many parallel applications (on both CPUs and GPUs) are bandwidth bound.
     GPUs push throughput computing concepts to extreme scales:
     • Notable differences in memory system design

  19. Program Flow and Host-Level Issues

  20. GPU Architecture (block diagram): the host machine connects to a GPU containing multiprocessors 1..N. Each multiprocessor has processors 1..M with their own registers, a shared memory, and an instruction unit; the GPU also has constant memory, texture memory, and global memory.

  21. SIMD Architecture. Four memories:
     • Device (a.k.a. global): slow (400–600 cycles access latency), large (256 MB – 1 GB)
     • Shared: fast (4 cycles access latency), small (128 KB)
     • Texture: read only
     • Constant: read only
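     As a rough illustration of where these memory spaces show up in CUDA C (a sketch under the usual CUDA conventions; the kernel, array names, and sizes are invented, and texture memory is omitted since it is bound through separate APIs):

     // Sketch: how global, shared, and constant memory appear in CUDA C.
     #include <cuda_runtime.h>

     __constant__ float coeff[16];                 // constant memory: read-only from kernels

     __global__ void scale(const float *in, float *out, int n) {
         __shared__ float tile[256];               // shared memory: fast, per-block scratchpad
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i < n) {
             tile[threadIdx.x] = in[i];            // 'in' and 'out' live in global (device) memory
             out[i] = tile[threadIdx.x] * coeff[0];
         }
     }

     int main() {
         const int n = 1 << 20;
         float *d_in, *d_out;
         cudaMalloc(&d_in,  n * sizeof(float));    // global memory allocations
         cudaMalloc(&d_out, n * sizeof(float));
         cudaMemset(d_in, 0, n * sizeof(float));

         float h_coeff[16] = {2.0f};
         cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff));  // populate constant memory

         scale<<<n / 256, 256>>>(d_in, d_out, n);
         cudaDeviceSynchronize();
         cudaFree(d_in);
         cudaFree(d_out);
         return 0;
     }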

  22. GPU Architecture – Program Flow
     1. Preprocessing
     2. Data transfer in (host to GPU)
     3. GPU processing
     4. Data transfer out (GPU to host)
     5. Postprocessing
     TTotal = TPreprocessing + TDataHtoG + TProcessing + TDataGtoH + TPostProc
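     A minimal host-side sketch of this five-step flow (the kernel, names, and sizes are illustrative, not the slide's code):

     // Sketch of the five steps and the terms of TTotal; names are illustrative.
     #include <cstdlib>
     #include <cuda_runtime.h>

     __global__ void process(float *data, int n) {
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i < n) data[i] = data[i] * data[i];
     }

     int main() {
         const int n = 1 << 20;
         float *h = (float *)malloc(n * sizeof(float));
         for (int i = 0; i < n; ++i) h[i] = (float)i;                  // 1. preprocessing (TPreprocessing)

         float *d;
         cudaMalloc(&d, n * sizeof(float));
         cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  // 2. data transfer in (TDataHtoG)
         process<<<(n + 255) / 256, 256>>>(d, n);                      // 3. GPU processing (TProcessing)
         cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);  // 4. data transfer out (TDataGtoH)
         // 5. postprocessing on the host would consume h[] here (TPostProc)

         cudaFree(d);
         free(h);
         return 0;
     }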

  23. ?

  24. Outline
     • Hardware
     • Software: Programming Model
     • Optimizations

  25. Add vectors
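     The slide's vector-addition code is not captured in the transcript; the sketch below is the canonical CUDA version of the example (array names and launch configuration are assumptions):

     // Canonical CUDA vector addition; a sketch, not the slide's exact code.
     #include <cstdio>
     #include <cuda_runtime.h>

     __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
         int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
         if (i < n) c[i] = a[i] + b[i];
     }

     int main() {
         const int n = 1024;
         const size_t bytes = n * sizeof(float);
         float h_a[n], h_b[n], h_c[n];
         for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

         float *d_a, *d_b, *d_c;
         cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
         cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
         cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

         vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
         cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

         printf("c[10] = %.1f\n", h_c[10]);                // expect 30.0
         cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
         return 0;
     }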

  26. GPU Programming Model
     Programming model: a software representation of the hardware

  27. GPU Programming Model
     • Block
     • Kernel: a function on the grid
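     A sketch of how a kernel is launched as a function over a grid of blocks (a hypothetical 2D example; the sizes and names are invented for illustration):

     // Sketch: a kernel is a function executed by every thread of a grid of blocks.
     #include <cuda_runtime.h>

     __global__ void kernel2D(float *out, int width, int height) {
         // Each thread derives its global (x, y) position from block and thread indices.
         int x = blockIdx.x * blockDim.x + threadIdx.x;
         int y = blockIdx.y * blockDim.y + threadIdx.y;
         if (x < width && y < height)
             out[y * width + x] = (float)(x + y);
     }

     int main() {
         const int width = 640, height = 480;
         float *d_out;
         cudaMalloc(&d_out, width * height * sizeof(float));

         dim3 block(16, 16);                               // one block = 16x16 threads
         dim3 grid((width + block.x - 1) / block.x,        // enough blocks to cover the data
                   (height + block.y - 1) / block.y);
         kernel2D<<<grid, block>>>(d_out, width, height);  // the kernel runs once per thread in the grid
         cudaDeviceSynchronize();
         cudaFree(d_out);
         return 0;
     }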

  28.–29. GPU Programming Model (figure-only slides)
