
CUDA Lecture 2 History of GPUs



Presentation Transcript


  1. CUDA Lecture 2: History of GPUs. Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

  2. Graphics in a Nutshell
  • Make great images: intricate shapes, complex optical effects, seamless motion
  • Make them fast: invent clever techniques, use every trick imaginable, build monster hardware
  (Image: Eugene d’Eon, David Luebke, Eric Enderton, in Proc. EGSR 2007 and GPU Gems 3)
  History of GPUs – Slide 2

  3. The Graphics Pipeline
  Vertex Transform & Lighting → Triangle Setup & Rasterization → Texturing & Pixel Shading → Depth Test & Blending → Framebuffer


  5. The Graphics Pipeline: Vertex Transform & Lighting
  • Transform vertices from “world space” to “image space”
  • Compute per-vertex lighting

  6. The Graphics Pipeline: Triangle Setup & Rasterization
  • Convert the geometric representation (vertices) to an image representation (fragments)
  • Interpolate per-vertex quantities across pixels

  7. The Graphics Pipeline
  [Stage diagram repeated: Vertex Transform & Lighting → Triangle Setup & Rasterization → Texturing & Pixel Shading → Depth Test & Blending → Framebuffer]

  8. The Graphics Pipeline (Vertex → Rasterize → Pixel → Test & Blend → Framebuffer)
  • Key abstraction of real-time graphics
  • Hardware used to look like this:
  • One chip/board per stage
  • Fixed data flow through the pipeline

  9. The Graphics Pipeline (Vertex → Rasterize → Pixel → Test & Blend → Framebuffer)
  • Everything fixed-function, with a certain number of modes
  • The number of modes for each stage grew over time
  • Hard to optimize hardware
  • Developers always wanted more flexibility

  10. The Graphics Pipeline (Vertex → Rasterize → Pixel → Test & Blend → Framebuffer)
  • Remains a key abstraction
  • Vertex and pixel processing became programmable, and new stages were added
  • GPU architecture increasingly centers around shader execution

  11. The Graphics Pipeline (Vertex → Rasterize → Pixel → Test & Blend → Framebuffer)
  • Exposed an (at first limited) instruction set for some stages
  • Limited instructions and instruction types, and no control flow, at first
  • Expanded to a full ISA

  12. Why GPUs Scale So Nicely
  • Workload and programming model provide lots of parallelism
  • Applications provide large groups of vertices at once; vertices can be processed in parallel (apply the same transform to all vertices)
  • Triangles contain many pixels; pixels from a triangle can be processed in parallel (apply the same shader to all pixels)
  • Very efficient hardware to hide serialization bottlenecks

  13. With Moore’s Law…
  [Diagram: as transistor budgets grow, pipeline stages are replicated — multiple vertex units (Vrtx 0–2), multiple rasterizers, and multiple pixel and blend units operate side by side.]

  14. More Efficiency
  • Note that we do the same thing for lots of pixels/vertices
  • A warp = 32 threads launched together
  • They usually execute together as well
  [Diagram: many control units each paired with one ALU, versus a single shared control unit driving many ALUs.]

  15. What Is (Historical) GPGPU?
  • All this performance attracted developers
  • To use GPUs, they re-expressed their algorithms as general-purpose computations, using GPUs and the graphics API in applications other than 3-D graphics
  • Pretend to be graphics: disguise data as textures or geometry, disguise the algorithm as render passes
  • Fool the graphics pipeline into doing computation to take advantage of the GPU’s massive parallelism
  • The GPU accelerates the critical path of the application

  16. General-Purpose GPUs (GPGPUs)
  • Data-parallel algorithms leverage GPU attributes:
  • Large data arrays, streaming throughput
  • Fine-grain SIMD parallelism
  • Low-latency floating-point (FP) computation
  • Applications (see http://GPGPU.org):
  • Game effects (FX), physics, image processing
  • Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting

  17. Previous GPGPU Constraints
  • Dealing with the graphics API: working with the corner cases of the graphics API
  • Addressing modes: limited texture size/dimension
  • Shader capabilities: limited outputs
  • Instruction sets: lack of integer & bit ops
  • Communication limited: between pixels; no scatter (a[i] = p)
  [Diagram: the per-thread fragment-program model — input registers, texture, constants, and temp registers feed a fragment program (one per shader, per context), which writes only output registers and framebuffer memory.]

  18. Summary: Early GPGPUs
  • To use GPUs, developers re-expressed algorithms as graphics computations
  • Very tedious, limited usability
  • Still had some very nice results
  • This was the lead-up to CUDA

  19. Compute Unified Device Architecture (CUDA)
  • General-purpose programming model
  • The user kicks off batches of threads on the GPU
  • GPU = dedicated, super-threaded, massively data-parallel co-processor
  • Targeted software stack: compute-oriented drivers, language, and tools

  20. Compute Unified Device Architecture (CUDA)
  • Driver for loading computation programs onto the GPU
  • Standalone driver, optimized for computation
  • Interface designed for compute: graphics-free API
  • Data sharing with OpenGL buffer objects
  • Guaranteed maximum download & readback speeds
  • Explicit GPU memory management

  21. Example of the Physical Reality behind CUDA
  [Diagram: a CPU (the host) connected to a GPU with its own local DRAM (the device).]

  22. Parallel Computing on a GPU
  • 8-series GPUs deliver 25 to 200+ GFLOPS on compiled parallel C applications
  • Available in laptops, desktops, and clusters (e.g. GeForce 8800, Tesla D870)
  • GPU parallelism is doubling every year
  • Programming model scales transparently

  23. Parallel Computing on a GPU
  • Programmable in C with CUDA tools
  • Multithreaded SPMD model uses application data parallelism and thread parallelism (e.g. Tesla S870)

  24. Final Thoughts
  • GPUs evolve as hardware and software evolve
  • The five-stage graphics pipeline
  • An example of GPGPU
  • Intro to CUDA

  25. End Credits
  • Reading: Chapter 2 of “Programming Massively Parallel Processors” by Kirk and Hwu
  • Based on original material from:
  • The University of Illinois at Urbana-Champaign: David Kirk, Wen-mei W. Hwu
  • The University of Minnesota: Weijun Xiao
  • Stanford University: Jared Hoberock, David Tarjan
  • Revision history: last updated 5/24/2011
