
Shankar Krishnan

Techniques for Data Storage and Transfer in the Graphics Pipeline. Shankar Krishnan. AT&T Labs - Research.


Presentation Transcript


  1. Techniques for Data Storage and Transfer in the Graphics Pipeline Shankar Krishnan AT&T Labs - Research

  2. Graphics Hardware Pipeline [diagram: Vertices -> Vertex Transformation -> Transformed Vertices -> Primitive Assembly and Rasterization (with Vertex Connectivity) -> Fragments and Pixel Positions -> Fragment Texturing and Coloring -> Colored Fragments -> Raster Operations -> Pixel Updates] Courtesy: The Cg Tutorial [Fernando and Kilgard]

  3. Data Movement • A big issue • Still forms the main bottleneck in rendering • In spite of off-chip bandwidth increases • Efficiency • Move data faster • Wider interfaces • Move data less • Optimize locality • Pipelined architecture • Caching

  4. Programmable Graphics Pipeline [diagram: 3D Application or Game -> 3D API Commands -> 3D API: OpenGL or Direct3D -> CPU-GPU Boundary -> GPU Command & Data Stream -> GPU Front End -> Vertex Index Stream -> Primitive Assembly -> Assembled Primitives -> Rasterization and Interpolation -> Pixel Location Stream -> Raster Operations -> Pixel Updates -> Frame Buffer; Pre-transformed Vertices -> Programmable Vertex Processor -> Transformed Vertices; Pre-transformed Fragments -> Programmable Fragment Processor -> Transformed Fragments] Courtesy: Cg Book [Fernando and Kilgard]

  5. Classic Immediate Mode • Convenient • Per-vertex, per-primitive • Matches client’s view of data • Very slow glColor3f(1, 0, 0); glBegin(GL_TRIANGLES); glVertex3f(…); … glEnd();

  6. Vertex Arrays • Data packed into an array • Format specified by type and organization • glVertexPointer(size, type, stride, data_ptr) • Stays on the client side • Reduces function call overhead • Client side data is too far from pipeline
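The role of the size and stride arguments can be sketched in plain C, without an OpenGL context. The helper `attrib_ptr` and the `Vertex` layout below are hypothetical names used only for illustration; the sketch follows the OpenGL convention that a stride of 0 means the attributes are tightly packed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenGL call): address of vertex i's attribute,
   computed from the same size/stride information passed to glVertexPointer.
   A stride of 0 means the attributes are tightly packed. */
static const float *attrib_ptr(const void *base, int size, size_t stride, size_t i)
{
    assert(base != NULL && size >= 2 && size <= 4);
    size_t step = stride ? stride : (size_t)size * sizeof(float);
    return (const float *)((const unsigned char *)base + i * step);
}

/* Interleaved layout: position (x, y, z) followed by color (r, g, b). */
typedef struct { float pos[3]; float col[3]; } Vertex;

static const Vertex demo_verts[2] = {
    { {0.0f, 1.0f, 2.0f}, {1.0f, 0.0f, 0.0f} },
    { {3.0f, 4.0f, 5.0f}, {0.0f, 1.0f, 0.0f} },
};
```

With an interleaved array, the position and color pointers share one stride, `sizeof(Vertex)`, and differ only in their base address.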

  7. Possible Solutions • Data resides on server side • Allow the pipeline to interpret data • Applications • GPU-based simulation • Cloth simulation • N-body problems • Graph drawing

  8. Possible Solutions - II • Four possible ways to accomplish it • VAR / PDR • Now obsolete • VBO / PBO • Uses new vertex and pixel buffer object extensions • Works on all NV3x architectures • Very efficient • Vertex texture (NV_vertex_program3) • Only works on GeForce 6 series • Superbuffers (yet to be released)

  9. What is VAR/PDR? • Makes vertex arrays work better • Compiled vertex arrays • Relaxes coherency requirements • Multipass rendering • VAR allows the GPU to pull vertex data (via DMA) • Requires AGP or video memory

  10. Using VAR • Allocate video memory data = wglAllocateMemoryNV(size, .1, .1, 1); • Enable memory glVertexArrayRangeNV(size, data); glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV); • Copy data and set memory pointers memcpy(data, app_array, size); glEnableClientState(GL_VERTEX_ARRAY); glVertexPointer(3, GL_FLOAT, 0, data); • Draw primitives glDrawArrays(GL_TRIANGLES, 0, count); /* the first argument is the primitive mode, not the data pointer */

  11. Multipass Methods • Pipelined structure of computation forces multiple passes • Need to have access to data from previous passes • Vertex and image data • How does the pipeline have access to this information?

  12. Getting data back I: Readbacks [diagram: 3D API: OpenGL or Direct3D, GPU Front End, Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Fragment Processor, Raster Operations, Frame Buffer] • Readbacks transfer data from the frame buffer to the CPU. • They are very general (any buffer can be transferred) • Partial buffers can be transferred • They are slow: reverse data transfer across the PCI/AGP bus is very slow (PCIe is expected to be a lot better)

  13. Getting data back II: Copy-to-texture [diagram: GPU Front End, Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Fragment Processor, Raster Operations, Frame Buffer] • Copy-to-texture transfers data from frame buffer to texture. • Transfer does not cross GPU-CPU boundary. • Partial buffers can be transferred • Not very flexible: depth and stencil buffers cannot be transferred in this way, and copy to texture is still somewhat slow. • Loss of precision in the copy.

  14. Getting data back III: Render-to-texture [diagram: GPU Front End, Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Fragment Processor, Raster Operations] • Render-to-texture renders directly into a texture. • Transfer does not cross GPU-CPU boundary. • Fastest way to transfer data to fragment processor • Only works with depth and color buffers (not stencil). • Render-to-texture is the best method for reading data back after a computation.

  15. Using Render-to-texture • Successor of off-screen rendering using pbuffers • pbuffers - own rendering context like frame buffers • Using the render-texture extension is tricky • You have to set up a pbuffer context, activate it, and then render to this context • Once deactivated, you can read from the bound texture • You cannot write to a texture and read it simultaneously • Mark Harris (NVIDIA) has written a RenderTexture class that wraps all of this
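Because a texture cannot be read and written in the same pass, multipass computations commonly alternate between two buffers, a pattern often called ping-ponging. The slides do not name this technique; the sketch below models it on the CPU, with two plain arrays standing in for textures and a hypothetical `smooth_pass` standing in for a fragment program.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one rendering pass: reads only from src, writes only to dst.
   Each output is the average of an element and its two neighbours
   (indices clamped at the ends). */
static void smooth_pass(const float *src, float *dst, size_t n)
{
    assert(src != dst && n > 0);
    for (size_t i = 0; i < n; ++i) {
        float l = src[i > 0 ? i - 1 : i];
        float r = src[i + 1 < n ? i + 1 : i];
        dst[i] = (l + src[i] + r) / 3.0f;
    }
}

/* Runs `passes` passes, swapping the roles of the two buffers after each
   one. Returns the buffer that holds the final result. */
static float *run_passes(float *a, float *b, size_t n, int passes)
{
    float *src = a, *dst = b;
    for (int p = 0; p < passes; ++p) {
        smooth_pass(src, dst, n);
        float *t = src;   /* the buffer just written becomes the next input */
        src = dst;
        dst = t;
    }
    return src;
}
```

On the GPU the swap would correspond to switching which texture is bound for reading and which is the render target; no data is copied.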

  16. Pixel Data Range - PDR • For image intensive applications • Once video memory is allocated, create pixel data range glPixelDataRangeNV(GL_READ_PIXEL_DATA_RANGE_NV, size, data); • During readbacks glEnableClientState(GL_READ_PIXEL_DATA_RANGE_NV); /* bind read buffer / rendertexture */ glReadPixels(0, 0, width, height, GL_RGB, GL_FLOAT, data);

  17. Demo – Graph Drawing • Graph layouts as an n-body system • Assume connected graph • All pairs of vertices experience a repulsive force, edges contribute attractive force • Positions are stored in floating-point texture • Updated using fragment programs • Render to texture each time step • Start with random initial positions • VAR/PDR to feed positions back to the fragment program
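A CPU reference for one time step of such a layout might look like the sketch below. The slides do not give the force laws, so the ones used here are assumptions: repulsion of magnitude `k_rep / d` between every pair and a linear spring `k_att * d` along every edge. All names (`layout_step`, `Vec2`, `Edge`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;
typedef struct { int a, b; } Edge;

/* One explicit time step of a force-directed layout.
   ASSUMED force laws (not from the slides): repulsion k_rep / d between
   every pair of vertices, spring attraction k_att * d along every edge. */
static void layout_step(Vec2 *pos, Vec2 *force, size_t n,
                        const Edge *edges, size_t m,
                        float k_rep, float k_att, float dt)
{
    assert(pos != NULL && force != NULL && (edges != NULL || m == 0));
    for (size_t i = 0; i < n; ++i)
        force[i].x = force[i].y = 0.0f;

    /* All-pairs repulsion: magnitude k_rep / d along (dx, dy) / d. */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j) {
            float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y;
            float d2 = dx * dx + dy * dy + 1e-6f;  /* avoid division by zero */
            float fx = k_rep * dx / d2, fy = k_rep * dy / d2;
            force[i].x += fx; force[i].y += fy;
            force[j].x -= fx; force[j].y -= fy;
        }

    /* Edge attraction: pulls the two endpoints toward each other. */
    for (size_t e = 0; e < m; ++e) {
        int a = edges[e].a, b = edges[e].b;
        float dx = pos[b].x - pos[a].x, dy = pos[b].y - pos[a].y;
        force[a].x += k_att * dx; force[a].y += k_att * dy;
        force[b].x -= k_att * dx; force[b].y -= k_att * dy;
    }

    for (size_t i = 0; i < n; ++i) {
        pos[i].x += dt * force[i].x;
        pos[i].y += dt * force[i].y;
    }
}
```

In the demo described above, the equivalent of the per-vertex force sum runs in a fragment program, with positions read from a floating-point texture and the new positions rendered to texture each time step.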

  18. VAR Benefits • Easy to mix and match data • Share vertex array with multiple index arrays • Implements LODs • Memory-mapped • Fine-grained control of updates • No extra copies • Well-suited for immediate mode rendering

  19. Problems with VAR • Breaks server/client paradigm • No internal memory management • Developer should specify type of memory used (AGP, system or video) • Memory management goes through a semaphore/fence like system

  20. Server-Side Buffer Objects [diagram: Application (Client) and Server; the buffer object is byte memory + state on the server] • Byte memory (mappable) • Resident server side • Efficient rendering • Sharable

  21. Buffer Objects • Overcomes most of the problems listed with VAR • Encapsulates data within “buffer objects” • Provides an interface to read/write to buffers directly or via “mapping” • Automatically figures out where to allocate memory depending on usage • STATIC, DYNAMIC, STREAM, READ, WRITE, COPY • Avoids unnecessary data copies

  22. Modifying Buffer Object Data • Functional • glBufferSubDataARB(target, offset, size, data); • glGetBufferSubDataARB(target, offset, size, data); • Always safe • Memory mapped • glMapBufferARB(target, access); • glUnmapBufferARB(target); • May not necessarily be faster • Could result in loss of data

  23. Vertex Arrays using VBOs /* create system memory and fill data */ data = malloc(size); memcpy(data, …); /* create buffer object */ GLuint buffer; glGenBuffersARB(1, &buffer); glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffer); /* initialize data of buffer object */ glBufferDataARB(GL_ARRAY_BUFFER_ARB, size, data, GL_STATIC_DRAW_ARB); /* rendering loop */ while (1) { glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffer); glVertexPointer(4, GL_FLOAT, 0, (char *)NULL); glEnableClientState(GL_VERTEX_ARRAY); glDrawArrays(…); glDisableClientState(GL_VERTEX_ARRAY); } /* delete buffer object */ glDeleteBuffersARB(1, &buffer);

  24. VBO Targets • GL_ARRAY_BUFFER_ARB • Contains vertex attributes like positions, normals, per-vertex color etc. • Data can be organized interleaved or packed together (controlled using stride parameter) • GL_ELEMENT_ARRAY_BUFFER_ARB • Contains element pointer • Stores only indices of elements • Can be used together to implement LODs
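The LOD idea can be sketched without OpenGL: one shared vertex array and one index array per level of detail. The data and the helper `fetch_indexed` below are hypothetical; the helper only mimics the dereferencing step that an indexed draw performs.

```c
#include <assert.h>
#include <stddef.h>

/* Five vertices (x, y, z) on a line, shared by both levels of detail. */
static const float shared_verts[5][3] = {
    {0.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 0.0f}, {2.0f, 0.0f, 0.0f},
    {3.0f, 0.0f, 0.0f}, {4.0f, 0.0f, 0.0f}
};

/* Index arrays: LOD 0 visits every vertex, LOD 1 every other one. */
static const unsigned short lod0[] = {0, 1, 2, 3, 4};
static const unsigned short lod1[] = {0, 2, 4};

/* Hypothetical CPU stand-in for an indexed draw: copies the vertices
   referenced by idx into out and returns how many were fetched. */
static size_t fetch_indexed(const float (*v)[3], size_t nverts,
                            const unsigned short *idx, size_t count,
                            float (*out)[3])
{
    assert(v != NULL && idx != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        assert(idx[i] < nverts);
        for (int c = 0; c < 3; ++c)
            out[i][c] = v[idx[i]][c];
    }
    return count;
}
```

Switching level of detail then means choosing a different index array; the vertex data itself is stored once.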

  25. Pixel Buffer object (PBO) • Extends the functionality of VBOs • Same buffer object can be bound to both vertex arrays and pixel commands • Adds two more targets • GL_PIXEL_PACK_BUFFER_ARB • glReadPixels() writes into buffer object • GL_PIXEL_UNPACK_BUFFER_ARB • glDrawPixels(), glTexImage2D() read from object • Can be used to render to vertex array

  26. PBO Test • Courtesy of nVidia • Allows developers to experiment with various combinations of texture transfer • Image upload to GPU • Readback • Internal and external image formats

  27. Render to Vertex Array int nverts = 100; /* 4 floats per vertex */ /* create buffer object */ GLuint buffer; glGenBuffersARB(1, &buffer); glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, buffer); /* initialize data of buffer object; the size is given in bytes */ glBufferDataARB(GL_PIXEL_PACK_BUFFER_ARB, nverts*4*sizeof(GLfloat), NULL, GL_DYNAMIC_DRAW_ARB); /* render vertex data into a 100 x 1 piece of framebuffer using fragment program */ cgGLBindProgram(fragProg); cgGLEnableProfile(…); glDrawBuffer(GL_BACK); renderVertexData(…); cgGLDisableProfile(…); /* read vertex data from the framebuffer into the buffer object */ glReadBuffer(GL_BACK); glReadPixels(0, 0, nverts, 1, GL_RGBA, GL_FLOAT, (char *)NULL); /* change the binding of buffer object */ glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffer); glEnableClientState(GL_VERTEX_ARRAY); glVertexPointer(4, GL_FLOAT, 0, (char *)NULL); glDrawArrays(…, nverts);
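Two details of this listing are easy to get wrong: buffer sizes are given in bytes, and the RGBA float pixels read back are reinterpreted, unchanged, as (x, y, z, w) vertices. The plain C sketch below illustrates both; `pack_buffer_bytes`, `Vertex4`, and `pixels_to_vertices` are hypothetical names, and the `memcpy` only models the reinterpretation (on the GPU no copy to the CPU takes place).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed for a buffer that receives nverts RGBA float pixels. */
static size_t pack_buffer_bytes(size_t nverts)
{
    return nverts * 4 * sizeof(float);
}

typedef struct { float x, y, z, w; } Vertex4;

/* CPU model of the rebinding step: pixel i's (R, G, B, A) becomes
   vertex i's (x, y, z, w) with no conversion. */
static void pixels_to_vertices(const float *rgba, Vertex4 *out, size_t nverts)
{
    assert(rgba != NULL && out != NULL);
    assert(sizeof(Vertex4) == 4 * sizeof(float));
    memcpy(out, rgba, pack_buffer_bytes(nverts));
}
```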

  28. Render to Vertex Array [diagram: GPU Front End, Programmable Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations; data paths labeled VBO/PBO and texture]

  29. Demo – Particle System • Use of VBO/PBO in performing render to vertex array • Another particle system • Position and velocity stored as float buffers • Has ability to provide real-time collision detection by introducing obstacles • Million particle system at 20fps on GeForce 6800

  30. Vertex Programs Accessing Textures • So far, we could get by without the need for the vertex program to have access to textures • Especially intermediate results • Consider displacement mapping to simulate waves in the ocean • Each pass updates a displacement map • Vertex positions are updated accordingly • Can be done with dynamic bump maps • Texture access greatly improves the performance • Vertex skinning etc.
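The per-vertex work in the displacement-mapping example can be modeled on the CPU as below. This is a sketch under assumptions, not the presenter's code: the height map is a single-channel float array, sampling is nearest-texel, and displacement is along +y. `fetch_nearest` and `displace` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Nearest-texel lookup in a w x h single-channel height map,
   with texture coordinates u, v in [0, 1] (clamped at the edges). */
static float fetch_nearest(const float *tex, int w, int h, float u, float v)
{
    assert(tex != NULL && w > 0 && h > 0);
    int x = (int)(u * (float)w);
    int y = (int)(v * (float)h);
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return tex[(size_t)y * (size_t)w + (size_t)x];
}

typedef struct { float x, y, z; } Vec3;

/* What a vertex program with texture access would do per vertex:
   sample the displacement map and move the vertex along +y (assumed axis). */
static Vec3 displace(Vec3 p, float u, float v,
                     const float *tex, int w, int h, float scale)
{
    p.y += scale * fetch_nearest(tex, w, h, u, v);
    return p;
}
```

Each simulation pass would update the height map, and the next draw would pick up the new heights without the vertex data being rewritten.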

  31. Vertex Textures • Available only on the GeForce 6 series • Part of the NV_vertex_program3 extension GLuint vertex_texture; glGenTextures(1, &vertex_texture); glBindTexture(GL_TEXTURE_2D, vertex_texture); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST_MIPMAP_NEAREST); glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA_FLOAT32_ATI, width, height, 0, GL_RGBA, GL_FLOAT, data); • No automatic mipmap computation • Used for texture lookup in vertex program

  32. Vertex Textures [diagram: GPU Front End, Programmable Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations; data path labeled texture]

  33. Summary of memory flow • Readback [diagram: CPU, Vertex program, Fragment program, Frame buffer] • Copy-to-Texture [diagram: CPU, Vertex program, Fragment program, Frame buffer] • Render-to-Texture [diagram: CPU, Vertex program, Fragment program]

  34. Summary of memory flow • VBO/PBO transfer [diagram: Vertex program, Fragment program] • nv40 texture reference in vertex prog. [diagram: Vertex program, Fragment program]

  35. Questions?
