CUDA Lecture 2: History of GPUs
Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
Graphics in a Nutshell
• Make great images
  • Intricate shapes
  • Complex optical effects
  • Seamless motion
• Make them fast
  • Invent clever techniques
  • Use every trick imaginable
  • Build monster hardware
Image credit: Eugene d’Eon, David Luebke, Eric Enderton, in Proc. EGSR 2007 and GPU Gems 3
History of GPUs – Slide 2
The Graphics Pipeline
[Pipeline diagram: Vertex Transform & Lighting → Triangle Setup & Rasterization → Texturing & Pixel Shading → Depth Test & Blending → Framebuffer]
History of GPUs – Slide 3
The Graphics Pipeline
[Diagram: the pipeline as above, focusing on Vertex Transform & Lighting]
• Transform from “world space” to “image space”
• Compute per-vertex lighting
History of GPUs – Slide 5
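To make the stage concrete, here is a minimal sketch (not from the slides) of the arithmetic a vertex undergoes, assuming a row-major 4x4 world-to-clip matrix and simple Lambertian lighting. Mat4, transformVertex, and lambert are illustrative names; real fixed-function hardware did not run C code.

    struct Mat4 { float m[4][4]; };              // hypothetical row-major 4x4 matrix

    __host__ __device__
    float4 transformVertex(const Mat4 &M, float4 v)   // world space -> clip space
    {
        float4 r;
        r.x = M.m[0][0]*v.x + M.m[0][1]*v.y + M.m[0][2]*v.z + M.m[0][3]*v.w;
        r.y = M.m[1][0]*v.x + M.m[1][1]*v.y + M.m[1][2]*v.z + M.m[1][3]*v.w;
        r.z = M.m[2][0]*v.x + M.m[2][1]*v.y + M.m[2][2]*v.z + M.m[2][3]*v.w;
        r.w = M.m[3][0]*v.x + M.m[3][1]*v.y + M.m[3][2]*v.z + M.m[3][3]*v.w;
        return r;
    }

    __host__ __device__
    float lambert(float3 n, float3 l)            // per-vertex diffuse lighting term
    {
        float d = n.x*l.x + n.y*l.y + n.z*l.z;   // dot(normal, light direction)
        return d > 0.0f ? d : 0.0f;              // clamp: no light from behind
    }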
The Graphics Pipeline
[Diagram: the pipeline as above, focusing on Triangle Setup & Rasterization]
• Convert geometric representation (vertex) to image representation (fragment)
• Interpolate per-vertex quantities across pixels
History of GPUs – Slide 6
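A hedged sketch of the interpolation idea, assuming the rasterizer has already produced barycentric weights w0, w1, w2 (summing to 1) for a pixel inside a triangle; the function name and signature are illustrative, not an actual hardware interface.

    // Illustrative only: blend a per-vertex quantity (here a color) across a pixel
    // using barycentric weights w0 + w1 + w2 = 1 produced during rasterization.
    __host__ __device__
    float3 interpolateColor(float3 c0, float3 c1, float3 c2,
                            float w0, float w1, float w2)
    {
        float3 c;
        c.x = w0*c0.x + w1*c1.x + w2*c2.x;
        c.y = w0*c0.y + w1*c1.y + w2*c2.y;
        c.z = w0*c0.z + w1*c1.z + w2*c2.z;
        return c;
    }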
The Graphics Pipeline
[Diagram: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer]
• Key abstraction of real-time graphics
• Hardware used to look like this
  • One chip/board per stage
  • Fixed data flow through pipeline
History of GPUs – Slide 8
The Graphics Pipeline
[Diagram: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer]
• Everything fixed function, with a certain number of modes
• Number of modes for each stage grew over time
• Hard to optimize hardware
• Developers always wanted more flexibility
History of GPUs – Slide 9
The Graphics Pipeline
[Diagram: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer]
• Remains a key abstraction
• Hardware used to look like this
• Vertex and pixel processing became programmable, new stages added
• GPU architecture increasingly centers around shader execution
History of GPUs – Slide 10
The Graphics Pipeline
[Diagram: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer]
• Exposing an (at first limited) instruction set for some stages
• Limited instructions and instruction types, and no control flow at first
• Expanded to a full ISA
History of GPUs – Slide 11
Why GPUs Scale So Nicely
• Workload and programming model provide lots of parallelism
• Applications provide large groups of vertices at once
  • Vertices can be processed in parallel
  • Apply same transform to all vertices
• Triangles contain many pixels
  • Pixels from a triangle can be processed in parallel
  • Apply same shader to all pixels
• Very efficient hardware to hide serialization bottlenecks
History of GPUs – Slide 12
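That same-transform-to-every-vertex pattern maps directly onto the later CUDA model. As a hedged illustration (not part of the original lecture), one thread can be assigned per vertex; the kernel name, the uniform scale-plus-offset transform, and the launch configuration are assumptions for the sketch.

    // Illustrative only: one thread per vertex, every thread applying the same
    // transform (here a uniform scale plus offset) to its own vertex.
    __global__ void transformVertices(float4 *verts, int n, float scale, float4 offset)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global vertex index
        if (i < n) {                                     // guard the final partial block
            float4 v = verts[i];
            verts[i] = make_float4(v.x * scale + offset.x,
                                   v.y * scale + offset.y,
                                   v.z * scale + offset.z,
                                   v.w);
        }
    }

    // Host side: one thread per vertex, 256 threads per block, e.g.
    //   transformVertices<<<(n + 255) / 256, 256>>>(d_verts, n, 2.0f, offset);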
With Moore’s Law…
[Diagram: replicated pipeline units, with multiple Vertex, Raster, Pixel, and Blend units processing Vrtx 0–2 and Pixel 0–3 in parallel]
History of GPUs – Slide 13
More Efficiency
• Note that we do the same thing for lots of pixels/vertices
• A warp = 32 threads launched together
  • Usually execute together as well
[Diagram: many Control + ALU pairs versus a single Control unit shared by many ALUs]
History of GPUs – Slide 14
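As a small illustration (not from the slides) of the warp idea, the kernel below records, for each thread of a 1-D block, which warp it belongs to and its lane within that warp. warpSize is the built-in CUDA constant (32 on current hardware); the kernel and array names are made up for the sketch.

    // Illustrative only: record each thread's warp and lane within its block.
    __global__ void warpInfo(int *warpIds, int *laneIds, int n)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid < n) {
            warpIds[tid] = threadIdx.x / warpSize;   // which warp within the block
            laneIds[tid] = threadIdx.x % warpSize;   // lane 0..31 within that warp
        }
    }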
What Is (Historical) GPGPU?
• All this performance attracted developers
• To use GPUs, they re-expressed their algorithms as general-purpose computations, using the GPU and its graphics API in applications other than 3-D graphics
  • Pretend to be graphics: disguise data as textures or geometry, disguise the algorithm as render passes
  • Fool the graphics pipeline into doing computation to take advantage of the massive parallelism of the GPU
• GPU accelerates the critical path of the application
History of GPUs – Slide 15
General Purpose GPUs (GPGPUs)
• Data-parallel algorithms leverage GPU attributes
  • Large data arrays, streaming throughput
  • Fine-grain SIMD parallelism
  • Low-latency floating point (FP) computation
• Applications – see http://GPGPU.org
  • Game effects (FX) physics, image processing
  • Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting
History of GPUs – Slide 16
Previous GPGPU Constraints
• Dealing with the graphics API
  • Working with the corner cases of the graphics API
• Addressing modes
  • Limited texture size/dimension
• Shader capabilities
  • Limited outputs
• Instruction sets
  • Lack of integer & bit ops
• Communication limited
  • Between pixels
  • Scatter a[i] = p
[Diagram: the early GPGPU fragment-processor model: Input Registers, Fragment Program, Texture, Constants, Temp Registers, Output Registers, FB Memory, annotated per thread / per shader / per context]
History of GPUs – Slide 17
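For contrast with these constraints, here is a minimal CUDA sketch (names are illustrative) of two things that were awkward through the graphics API: per-thread scatter (a[i] = p) and integer/bit operations. It assumes the idx entries are distinct so the scattered writes do not race.

    // Illustrative only: per-thread scatter and integer/bit operations, both
    // awkward or impossible through the old graphics APIs, are ordinary in CUDA.
    __global__ void scatterAndBitOps(int *a, const int *idx, const int *p, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            a[idx[i]] = p[i];               // scatter: a[i] = p, one write per thread
            a[idx[i]] |= (1 << (i & 31));   // native integer & bit operations
        }
    }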
Summary: Early GPGPUs
• To use GPUs, algorithms had to be re-expressed as graphics computations
• Very tedious, limited usability
• Still produced some very nice results
• This was the lead-up to CUDA
History of GPUs – Slide 18
Compute Unified Device Architecture (CUDA)
• General-purpose programming model
  • User kicks off batches of threads on the GPU
  • GPU = dedicated super-threaded, massively data-parallel co-processor
• Targeted software stack
  • Compute-oriented drivers, language, and tools
History of GPUs – Slide 19
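A minimal sketch of that model, using the classic SAXPY example rather than anything from the lecture: the kernel below is the program each thread runs, and the commented launch line is how the host kicks off a batch (grid) of threads.

    // Illustrative only: the CUDA model in miniature. The kernel is the program
    // each thread runs; the host launches a whole grid (batch) of such threads.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one element per thread
        if (i < n) y[i] = a * x[i] + y[i];
    }

    // Host side: kick off enough 256-thread blocks to cover n elements, e.g.
    //   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);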
Compute Unified Device Architecture (CUDA)
• Driver for loading computation programs onto the GPU
  • Standalone driver, optimized for computation
  • Interface designed for compute – graphics-free API
  • Data sharing with OpenGL buffer objects
  • Guaranteed maximum download & readback speeds
  • Explicit GPU memory management
History of GPUs – Slide 20
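A hedged sketch of the explicit memory management this slide refers to, using CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaFree); the wrapper function and buffer names are invented for the example, and error checking is omitted.

    #include <cuda_runtime.h>

    // Illustrative only: explicit GPU memory management with the CUDA runtime API.
    void runOnDevice(const float *h_in, float *h_out, int n)
    {
        size_t bytes = (size_t)n * sizeof(float);
        float *d_buf = 0;
        cudaMalloc(&d_buf, bytes);                               // allocate device DRAM
        cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice);  // download to the GPU
        // ... launch kernels that read and write d_buf here ...
        cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost); // read results back
        cudaFree(d_buf);                                         // release device memory
    }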
Example of Physical Reality behind CUDA
[Diagram: a CPU (host) connected to a GPU with its own local DRAM (device)]
History of GPUs – Slide 21
Parallel Computing on a GPU
• 8-series GPUs deliver 25 to 200+ GFLOPS on compiled parallel C applications
  • Available in laptops, desktops, and clusters
• GPU parallelism is doubling every year
• Programming model scales transparently
[Images: GeForce 8800, Tesla D870]
History of GPUs – Slide 22
Parallel Computing on a GPU
• Programmable in C with CUDA tools
• Multithreaded SPMD model uses application data parallelism and thread parallelism
[Image: Tesla S870]
History of GPUs – Slide 23
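A hedged sketch of the SPMD style, not taken from the lecture: every thread runs the same program on its own slice of the data, and a grid-stride loop lets one kernel cover any array length with whatever grid size is launched, which is one way the model scales transparently. The kernel name and parameters are assumptions for the example.

    // Illustrative only: SPMD with a grid-stride loop. Every thread runs this same
    // program on its own elements, and any grid size covers any n, so the code
    // needs no change as GPUs gain more processors.
    __global__ void scaleArray(float *data, float factor, int n)
    {
        int stride = blockDim.x * gridDim.x;                    // total threads launched
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;     // this thread's first element
             i < n;
             i += stride)                                       // then hop by the grid size
            data[i] *= factor;
    }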
Final Thoughts
• GPUs evolve as hardware and software evolve
• The five-stage graphics pipeline
• An example of GPGPU
• Intro to CUDA
History of GPUs – Slide 24
End Credits
• Reading: Chapter 2, “Programming Massively Parallel Processors” by Kirk and Hwu.
• Based on original material from
  • The University of Illinois at Urbana-Champaign: David Kirk, Wen-mei W. Hwu
  • The University of Minnesota: Weijun Xiao
  • Stanford University: Jared Hoberock, David Tarjan
• Revision history: last updated 5/24/2011.
History of GPUs – Slide 25