GPU

smilliman

Presentation Transcript


  1. GPU
     • Precision, Power, Programmability
     • CPU: x60/decade, 6 GFLOPS, 6 GB/sec
     • GPU: x1000/decade, 20 GFLOPS, 25 GB/sec
     • Arithmetic heavy (read OR write): faster hardware
     • Parallelization
     • Multi-billion $ entertainment market drives innovation
     • 32-bit Floating point
     • Programmable (graphics, physics, general purpose data-flow)
     • Can’t simply “port” CPU code to GPU
     David Luebke et al. GPGPU, SIGGRAPH 2004
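
The per-decade growth figures on this slide are easier to compare as yearly rates. The following is a minimal sketch, assuming steady compound growth over the decade (the slide does not state this); the function name `annual_growth` is made up for illustration.

```python
# Convert a "x N per decade" growth figure into an equivalent yearly factor,
# assuming constant compound growth (an assumption, not stated on the slide).
def annual_growth(per_decade_factor: float) -> float:
    return per_decade_factor ** (1.0 / 10.0)

cpu_annual = annual_growth(60.0)    # CPU: x60 per decade
gpu_annual = annual_growth(1000.0)  # GPU: x1000 per decade

print(f"CPU: about x{cpu_annual:.2f} per year")
print(f"GPU: about x{gpu_annual:.2f} per year")
```

Under that assumption, the CPU figure corresponds to roughly 1.5x per year and the GPU figure to roughly 2x per year.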

  2. History of the 3D graphics industry
     • 60s:
         • Line drawings, hidden lines, parametric surfaces (B-splines…)
         • Automated drafting & machining for car, airplane, and ship manufacturers
     • 70s:
         • Mainframes, Vector tubes (HP…)
         • Software: Solids (CSG), Ray Tracing, Z-buffer for hidden lines
     • 80s:
         • Graphics workstations ($50K-$1M): Frame buffers, rasterizers, GL, PHIGS
         • VR: CAVEs and head-mounted displays
         • CAD/CAM & GIS: CATIA, SDRC, PTC
         • Sun, HP, IBM, SGI, E&S, DEC
     • 90s:
         • PCs ($2K): Graphics boards, OpenGL, Java3D
         • CAD+Videogames+Animations: AutoCAD, SolidWorks…, Alias-Wavefront
         • Intel, many board vendors
     • 00s:
         • Laptops, PDAs, Cell Phones: Parallel graphic chips
         • Everything will be graphics, 3D, animated, interactive
         • Nvidia, Sony, Nokia

  3. History of GPU
     • Pre-GPU Graphics Acceleration
         • SGI, Evans & Sutherland. Introduced concepts like vertex transformation and texture mapping. Very expensive!
     • First-Generation GPU (up to 1998)
         • Nvidia TNT2, ATI Rage, Voodoo3. Vertex transformation on CPU, limited set of math operations.
     • Second-Generation GPU (1999-2000)
         • GeForce 256, GeForce2, Radeon 7500, Savage3D. Transformation & Lighting. More configurable, still not programmable.
     • Third-Generation GPU (2001)
         • GeForce3, GeForce4 Ti, Xbox, Radeon 8500. Vertex programmability, pixel-level configurability.
     • Fourth-Generation GPU (2002 onward)
         • GeForce FX series, Radeon 9700 and on. Vertex-level and pixel-level programmability.

  4. Architecture
     • Application
     • Vertex Shader → transformed vertices, normals, colors
     • Geometry Shader
     • Rasterizer → fragments (surfels per pixel)
     • Fragment Shader (with texture) → pixel color, depth, stencil
     • Compositor
     • Display

  5. Buffers
     • Color: 8-bit index to color table, float/16-bit true color…
     • Depth: 24-bit or float (0 at back plane)
     • Back and front: display front, update back, swap
     • Stereo: Shutter glasses, HMD. Alternate frames
     • Auxiliary: off-screen working space. Helps reduce passes.
     • Stencil: 8 bits (left-over of depth buffer). <, >… mask, ++
     • Accumulation: sum, scale (supersampling, blur)
     • P-buffer, superbuffers: Render to texture
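
The "display front, update back, swap" rule for the back and front buffers can be sketched as follows. This is only an illustration in plain Python; the class `DoubleBuffer` and its methods are hypothetical names, not part of any graphics API.

```python
class DoubleBuffer:
    """Toy front/back color buffers (hypothetical helper for illustration)."""

    def __init__(self, width, height, clear=0):
        self.front = [[clear] * width for _ in range(height)]  # what is displayed
        self.back = [[clear] * width for _ in range(height)]   # what is being drawn

    def draw(self, x, y, color):
        # All rendering goes to the back buffer, so the viewer never
        # sees a half-finished frame.
        self.back[y][x] = color

    def swap(self):
        # Exchange the roles of the two buffers once the frame is complete.
        self.front, self.back = self.back, self.front


fb = DoubleBuffer(4, 2)
fb.draw(1, 0, 255)
shown_before = fb.front[0][1]  # still the cleared value
fb.swap()
shown_after = fb.front[0][1]   # the newly drawn value is now visible
```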

  6. Fragment operations
     • Depth tests: <, <=, >, >=, ==, Zdepth-interval
     • Stencil test: mask?, counter, parity.
     • Alpha tests: compare to reference alpha
     • Alpha blending: +, max, min, replace, blend
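
A per-fragment sketch of the alpha test, depth test, and alpha blending listed above. Several things here are assumptions made for illustration: the function name `process_fragment`, the dictionary layout, the "greater than reference" alpha comparison, and the ordering (alpha test, then depth test, then blend), which follows the classic OpenGL fixed-function order. The stencil test is left out to keep it short.

```python
import operator

# Configurable depth comparisons, as in the list of depth tests above.
DEPTH_FUNCS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq,
}

def process_fragment(pixel, frag, depth_func="<", alpha_ref=0.0):
    """Return the new pixel after testing and blending one fragment.

    pixel: {"color": (r, g, b), "depth": d}
    frag:  {"color": (r, g, b), "depth": d, "alpha": a}
    """
    # Alpha test: discard fragments whose alpha does not exceed the reference.
    if not frag["alpha"] > alpha_ref:
        return pixel
    # Depth test: keep the fragment only if the comparison passes.
    if not DEPTH_FUNCS[depth_func](frag["depth"], pixel["depth"]):
        return pixel
    # Alpha blending ("blend" mode): weighted mix of source and destination.
    a = frag["alpha"]
    color = tuple(a * s + (1.0 - a) * d
                  for s, d in zip(frag["color"], pixel["color"]))
    return {"color": color, "depth": frag["depth"]}


pixel = {"color": (0.0, 0.0, 0.0), "depth": 1.0}
passing = {"color": (1.0, 0.0, 0.0), "depth": 0.5, "alpha": 0.5}
failing = {"color": (0.0, 1.0, 0.0), "depth": 2.0, "alpha": 0.5}

blended = process_fragment(pixel, passing)   # passes both tests, gets blended
rejected = process_fragment(pixel, failing)  # fails the "<" depth test
```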

  7. Data Parallelism in GPUs
     • Data flow: vertices → fragments → pixels
     • Parallelism at each stage
     • No shared or static data (except textures)
     • ALU-heavy (multiple ALUs per stage in pipe)
     • Fight memory latency with more computation

  8. GPGPU
     • Stream: collection of records (pixels, vertices…)
         • Stored in Textures (a computational grid)
     • Kernel: Function applied to each element in stream
         • Transform, evolve (no dependency between records)
     • Matrix algebra
     • Image/volume processing
     • Physical simulation
     • Global illumination
         • Ray tracing
         • Photon mapping
         • Radiosity
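
The stream/kernel model above can be sketched on the CPU as a function mapped independently over every element of a 2D grid standing in for a texture. The names `run_kernel` and `scale_kernel` are hypothetical, and a real GPU would run the per-element calls in parallel rather than in a loop.

```python
def run_kernel(kernel, grid):
    """Apply `kernel` to every record of a 2D grid (a stand-in for a texture).

    Each output element depends only on its own input element, so the
    calls could run in any order, or all at once.
    """
    return [[kernel(record) for record in row] for row in grid]

def scale_kernel(record):
    # Example kernel: scale one RGBA record by a constant factor.
    r, g, b, a = record
    return (r * 0.5, g * 0.5, b * 0.5, a)

stream = [
    [(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)],
    [(0.0, 0.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0)],
]
result = run_kernel(scale_kernel, stream)
```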

  9. Computational Resources
     • Programmable parallel processors
         • Vertex & Fragment pipelines
     • Rasterizer
         • Mostly useful for interpolating addresses (texture coordinates) and per-vertex constants
     • Texture unit
         • Read-only memory interface
     • Render to texture (or Copy to texture)
         • Write-only memory interface

  10. Vertex Processor
     • Fully programmable (SIMD / MIMD)
     • Processes 4-vectors (RGBA / XYZW)
     • Capable of scatter but not gather (A[i,j]=x;)
         • Can change the location of current vertex
         • Cannot read info from other vertices
         • Can only read a small constant memory
     • Vertex Texture Fetch
         • Random access memory for vertices
         • Arguably still not gather
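
Scatter, in the sense of `A[i,j]=x;` above, means each element chooses where its result is written. A minimal 1D sketch, with the hypothetical helper `scatter` standing in for a vertex program that moves its own output location:

```python
def scatter(values, addresses, size, fill=0):
    """Write values[k] to out[addresses[k]]: the writer picks the address."""
    out = [fill] * size
    for value, address in zip(values, addresses):
        # Each element computes its own destination, but never reads
        # what other elements hold.
        out[address] = value
    return out

scattered = scatter([10, 20, 30], [2, 0, 1], size=4)
```

Here `scattered` is `[20, 30, 10, 0]`: the value 10 went to index 2, 20 to index 0, and 30 to index 1, while index 3 keeps the fill value.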

  11. Fragment Processor
     • May be invoked at each pixel by drawing a full screen quad
     • Fully programmable (SIMD)
     • Processes 4-vectors (RGBA / XYZW)
     • Random access memory read (textures)
     • Capable of gather (x=A[i+1,j];) and some scatter
         • RAM read (texture), but no RAM write
         • Output address fixed to a specific pixel
             • But can change that address
     • Typically more useful than vertex processor
         • More fragment pipelines than vertex pipelines
         • Gather
         • Direct output (fragment processor is at end of pipeline)
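
Gather, in the sense of `x=A[i+1,j];` above, means the output address is fixed while the read address is computed. A minimal sketch in which each output element (i, j) reads the input at (i+1, j); clamping at the last row is an assumption made here, mimicking clamp-to-edge texture addressing, and `gather_next_row` is a hypothetical name.

```python
def gather_next_row(A):
    """For every fixed output position (i, j), read the input at (i+1, j).

    Reads past the last row are clamped to the last row (an assumption,
    similar to clamp-to-edge texture addressing).
    """
    rows = len(A)
    return [
        [A[min(i + 1, rows - 1)][j] for j in range(len(A[i]))]
        for i in range(rows)
    ]

A = [[1, 2],
     [3, 4],
     [5, 6]]
gathered = gather_next_row(A)
```

The result is `[[3, 4], [5, 6], [5, 6]]`: every output position is written exactly once, and only the read address varies.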

  12. Branching
     • Not supported or expensive
     • Avoid, replace by math
     • Depth test
     • Stencil test
     • Occlusion query (conditional execution)
     • Pre-computation (region of interest, use to set stencil mask)
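
One common way to "replace by math" is to compute both outcomes and select between them with a 0/1 weight. The sketch below mirrors the `step` and `mix` built-ins of shading languages such as GLSL, written here as plain Python helpers; the threshold and the two formulas are made-up examples.

```python
def step(edge, x):
    # 0.0 if x < edge, else 1.0 (a comparison result used as a number).
    return float(x >= edge)

def mix(a, b, t):
    # Linear interpolation; with t in {0.0, 1.0} it acts as a selector.
    return a * (1.0 - t) + b * t

def shade_branchy(x):
    # Version with an explicit branch.
    if x >= 0.5:
        return x * 2.0
    return x * 0.5

def shade_branch_free(x):
    # Both sides are always evaluated; the 0/1 weight picks one.
    return mix(x * 0.5, x * 2.0, step(0.5, x))

samples = [0.0, 0.25, 0.5, 0.75, 1.0]
branchy = [shade_branchy(x) for x in samples]
branch_free = [shade_branch_free(x) for x in samples]
```

The trade-off is that both sides are computed for every element, which pays off only when the two sides are cheap compared with the cost of a divergent branch.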
