1 / 37

GPU Memory Model Overview

GPU Memory Model Overview. Aaron Lefohn University of California, Davis. Overview. GPU Memory Model GPU-Based Data Structures Introduction to Frame Buffer Objects. Memory Hierarchy. CPU and GPU Memory Hierarchy. Disk. CPU Main Memory. CPU Caches. GPU Video Memory. CPU Registers.

tim
Download Presentation

GPU Memory Model Overview

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GPU Memory Model Overview Aaron Lefohn University of California, Davis

  2. Overview • GPU Memory Model • GPU-Based Data Structures • Introduction to Frame Buffer Objects

  3. Memory Hierarchy • CPU and GPU Memory Hierarchy Disk CPU Main Memory CPU Caches GPU Video Memory CPU Registers GPU Caches GPU Constant Registers GPU Temporary Registers

  4. CPU Memory Model • Random Memory Access at Any Program Point • Read/write to registers • Read/write to local (stack) memory • Read/write to global (heap) memory • Read/write to disk

  5. GPU Memory Model • Much more restricted memory access • GPU Kernels • Read/write to registers • No local stack memory • No disk access • Read-only global memory access v = a[i]; • Write to global memory at end of pass • Pre-computed memory addresses (no scatter) i = foo(a); a[i] = bar(b); • Write location set by fragment position a[fragPos] = bar(b);

  6. Vertex Processor Fragment Processor Rasterizer GPU Memory Model • Where is GPU Data Stored? • Vertex buffer • Texture • Frame buffer PS3.0 GPUs Texture Buffer Frame Buffer(s) Vertex Buffer

  7. Vertex Processor Fragment Processor Rasterizer Render-to-Texture • Idea • Write rendering result to texture memory • Enables GPU-based computational iterations PS3.0 GPUs Texture Data Vertex Data

  8. Render-to-Texture • OpenGL Support • Save up to 16, 32-bit floating values per pixel • Multiple Render Targets (MRTs) on ATI and NVIDIA • Copy-to-texture • glCopyTexSubImage • Render-to-texture • GL_EXT_framebuffer_object

  9. Vertex Processor Fragment Processor Rasterizer Render-to-Vertex-Array • Idea • Write rendering results to vertex array • Allows GPU to loop back to beginning of pipeline PS3.0 GPUs Texture Data Vertex Data

  10. Render-To-Vertex-Array • OpenGL Support • Copy-to-vertex-array • GL_ARB_pixel_buffer_object • NVIDIA and ATI • Render-to-vertex-array • Maybe someday as an extension to GL_EXT_framebuffer_object • Semantics Still Under Development…

  11. Vertex Processor Fragment Processor Rasterizer Fbuffer: Capturing Fragments • Idea • Save all fragment values instead of one per pixel • “Rasterization-Order FIFO Buffer” PS3.0 GPUs TextureData Frame Buffer(s) Vertex Data

  12. Fbuffer: Capturing Fragments • Details • Designed for multi-pass rendering with transparent geometry • Mark and Proudfoot, Graphics Hardware 2001 http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/ • New possibilities for GPGPU • Varying number of results per pixel • RTT and RTVA with an fbuffer • OpenGL Support • ATI Radeon 9800 and newer ATI GPUs • Still not exposed to user (ask for it!)

  13. Overview • GPU Memory Model • GPU-Based Data Structures • Introduction to Frame Buffer Objects

  14. GPU-Based Data Structures • Building Blocks • GPU memory addresses • Address Generation • Address Use • Pointers

  15. CPU Vertex Processor Fragment Processor Rasterizer GPU Memory Addresses • Where Are GPU Addresses Generated? • CPU Vertex stream or textures • Vertex processor Input stream, ALU ops or textures • Rasterizer Interpolation • Fragment processor Input stream, ALU ops or textures

  16. Fragment Processor Rasterizer CPU GPU Memory Addresses • Where Are Addresses Used? • Vertex textures (PS3.0 GPUs) • Fragment textures Texture Data Vertex Processor

  17. GPU Memory Addresses • Floating-Point Addressing • Normalized addresses [0,1] • GL_TEXTURE_1D, _2D, _3D, _CUBE • Non-Normalized addresses [0,N] • GL_TEXTURE_RECTANGLE • See Ian Buck’s warning about unaddressable texels

  18. 0 1 2 3 GPU Memory Addresses • Pointers • Store addresses in texture • Dependent texture read • Example: See Tim Purcell’s case study float2 addr = tex2D( addrTex, texCoord ); float2 data = tex2D( dataTex, addr ); Address Texture Data Texture 0 3 Data 1 3 Data 2 1 Data 1 3 Data

  19. GPU-Based Data Structures • Building Blocks • GPU memory addresses • Address Generation • Address Use • Pointers • Multi-dimensional arrays and structs • Sparse representations

  20. Easier GPU Data Structures? • This section describes the low-level details of implementing basic GPGPU data structures • See the “Glift” talk for a high-level view of GPU data structures • Glift is a GPU data structure template library • Now for the gory details…

  21. GPU Arrays • Large 1D Arrays • Current GPUs limit 1D array sizes to 2048 or 4096 • Pack into 2D memory • 1D-to-2D address translation

  22. GPU Arrays • 3D Arrays • Problem • GPUs do not have 3D frame buffers • No render-to-slice-of-3D-texture yet • Solutions • Stack of 2D slices • Multiple slices per 2D buffer

  23. GPU Arrays • Problems With 3D Arrays for GPGPU • Cannot read stack of 2D slices as 3D texture • Must know which slices are needed in advance • Visualization of 3D data difficult • Solutions • Flat 3D textures • Need render-to-slice-of-3D-texture • Maybe eventually with GL_EXT_framebuffer_object • Volume rendering of flattened 3D data • “Deferred Filtering: Rendering from Difficult Data Formats,” GPU Gems 2, Ch. 41, p. 667

  24. GPU Arrays • Higher Dimensional Arrays • Pack into 2D buffers • N-D to 2D address translation • Same problems as 3D arrays if data does not fit in a single 2D texture

  25. GPU Structures • Store each member in a different “array” (texture) • Update structs with Multiple-Render Targets (MRTs) struct Foo { struct Foo { float4 a; float4 a[N]; float4 b; float4 b[N]; }; }; Foo foo[N];

  26. GPU-Based Data Structures • Building Blocks • GPU memory addresses • Address Generation • Address Use • Pointers • Multi-dimensional arrays • Sparse representations

  27. Sparse Data Structures • Why Sparse Data Structures? • Reduce memory pressure • Reduce computational workload • Examples • Sparse matrices • Krueger et al., Siggraph 2003 • Bolz et al., Siggraph 2003 • Deformable implicit surfaces (sparse volumes/PDEs) • Lefohn et al., IEEE Visualization 2003 Premoze et al. Eurographics 2003

  28. Sparse Data Structures • Basic Idea • Pack “active” data elements into GPU memory • For more information • Linear algebra section in this course : Static structures • Adaptive grid case study in this course : Dynamic structures

  29. GPU Data Structures • Conclusions • Fundamental GPU memory primitive is a fixed-size 2D array • GPGPU needs more general memory model • Building and modifying complex GPU-based data structures is an open research topic…

  30. Overview • GPU Memory Model • GPU-Based Data Structures • Introduction to Frame Buffer Objects

  31. Introduction to Frame Buffer Objects • Where is the “Pbuffer Survival Guide”? • Gone!!! • Frame Buffer objects (FBO) recently replaced pbuffers • Frame Buffers enable simple, intuitive, fast render-to-texture in OpenGL • http://oss.sgi.com/projects/ogl-sample/registry/EXT/framebuffer_object.txt

  32. Frame Buffer Objects • What is an FBO? • A struct that holds pointers to memory objects • Each bound memory object can be a frame buffer rendering surface

  33. Frame Buffer Objects • Which memory can be bound to an FBO? • Textures • Render buffers • Depth, stencil, color • Traditional write-only frame buffer surfaces • Created with new API

  34. Frame Buffer Objects • FBO Tips and Tricks • Driver support still evolving • Try test in GPUBench to test your driver/GPU • Do not bind textures of different size or format to the same FBO • Attach/unattach textures is nearly as fast as glDrawBuffer if format/size is the same • Changing FBOs to use a different format/size is slower than keeping same format, but still faster than wglMakeCurrent/glXMakeCurrent

  35. Conclusions • GPU Memory Model Evolving • Writable GPU memory forms loop-back in an otherwise feed-forward streaming pipeline • Memory model will continue to evolve as GPUs become more general stream processors • GPGPU Data Structures • Basic memory primitive is limited-size, 2D texture • Use address translation to fit all array dimensions into 2D • See “Glift” talk for GPU data structure template library

  36. Acknowledgements • Nick Triantos, Craig Kolb, Cass Everitt, Chris Seitz, NVIDIA • Mark Segal, Rob Mace, Arcot Preetham, Evan Hart, ATI • Brian Budge, Ph.D. student at UCDavis and NVIDIA intern • The other GPGPU Siggraph 2005 course presenters • John Owens, Ph.D. advisor, Univ. of California Davis • Ross Whitaker, M.S. advisor, SCI Institute, Univ. of Utah • National Science Foundation Graduate Fellowship • Pixar Animation Studios, summer internships

  37. Questions? • Thank you! • Google “Lefohn GPU” • http://graphics.cs.ucdavis.edu/~lefohn/

More Related