Shankar Krishnan

Techniques for Data Storage and Transfer in the Graphics Pipeline Shankar Krishnan AT&T Labs - Research

Graphics Hardware Pipeline
[Figure: Vertices → Vertex Transformation → Transformed Vertices → Primitive Assembly and Rasterization → Fragments → Fragment Texturing and Coloring → Colored Fragments → Raster Operations → Pixel Updates. Additional streams: Vertex Connectivity, Pixel Positions.]
Courtesy: The Cg Tutorial [Fernando and Kilgard]

Data Movement
• A big issue
• Still forms the main bottleneck in rendering
• In spite of off-chip bandwidth increases
• Efficiency
• Move data faster
• Wider interfaces
• Move data less
• Optimize locality
• Pipelined architecture
• Caching

Programmable Graphics Pipeline
[Figure: 3D Application or Game → 3D API Commands → 3D API: OpenGL or Direct3D → CPU-GPU Boundary → GPU Command & Data Stream → GPU Front End → Vertex Index Stream → Primitive Assembly → Assembled Primitives → Rasterization and Interpolation → Pixel Location Stream → Raster Operations → Pixel Updates → Frame Buffer. The Programmable Vertex Processor takes Pre-transformed Vertices and returns Transformed Vertices; the Programmable Fragment Processor takes Pre-transformed Fragments and returns Transformed Fragments.]
Courtesy: Cg Book [Fernando and Kilgard]

Classic Immediate Mode
• Convenient
• Per-vertex, per-primitive
• Matches client’s view of data
• Very slow
    glColor3f(1, 0, 0);
    glBegin(GL_TRIANGLES);
    glVertex3f(…);
    …
    glEnd();

Vertex Arrays
• Data packed into an array
• Format specified by type and organization
• glVertexPointer(size, type, stride, data_ptr)
• Stays on the client side
• Reduces function call overhead
• Client side data is too far from pipeline
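The stride parameter above is easiest to see with a concrete layout. The following is a minimal CPU-only sketch, assuming a hypothetical interleaved `Vertex` struct (position followed by color); the struct and the helper names `vertex_stride`, `color_offset`, and `position_of` are illustrative and not part of OpenGL. It mimics how a base pointer plus a stride locates each vertex, with a stride of 0 meaning tightly packed, as in glVertexPointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: position followed by color. */
typedef struct {
    float position[3];
    float color[3];
} Vertex;

/* Byte stride between consecutive vertices in the interleaved array. */
size_t vertex_stride(void) { return sizeof(Vertex); }

/* Byte offset of the color attribute inside one vertex. */
size_t color_offset(void) { return offsetof(Vertex, color); }

/* Address of the position of vertex i, computed from a base pointer and
   a byte stride. A stride of 0 is treated as "tightly packed 3 floats". */
const float *position_of(const void *base, size_t stride, size_t i)
{
    assert(base != NULL);
    if (stride == 0)
        stride = 3 * sizeof(float);
    return (const float *)((const unsigned char *)base + i * stride);
}
```

With such a layout, the position array would be described by glVertexPointer(3, GL_FLOAT, sizeof(Vertex), base), and the color array would start color_offset() bytes past the same base with the same stride.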

Possible Solutions
• Data resides on server side
• Allow the pipeline to interpret data
• Applications
• GPU-based simulation
• Cloth simulation
• N-body problems
• Graph drawing

Possible Solutions - II
• Four possible ways to accomplish it
• VAR / PDR
• Now obsolete
• VBO / PBO
• Uses new vertex and pixel buffer object extensions
• Works on all NV3x architectures
• Very efficient
• Vertex texture (NV_vertex_program3)
• Only works on GeForce 6 series
• Superbuffers (yet to be released)

What is VAR/PDR?
• VAR: vertex array range; PDR: pixel data range
• Makes vertex arrays work better
• Compiled vertex arrays
• Relaxes coherency requirements
• Multipass rendering
• VAR allows the GPU to pull vertex data (via DMA)
• Requires AGP or video memory

Using VAR
• Create video memory
    data = wglAllocateMemoryNV(size, .1, .1, 1);
• Enable memory
    glVertexArrayRangeNV(size, data);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
• Copy data and set memory pointers
    memcpy(data, app_array, size);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, data);
• Draw primitives
    glDrawArrays(mode, 0, count); // mode is the primitive type, e.g. GL_TRIANGLES

Multipass Methods
• Pipelined structure of computation forces multiple passes
• Need to have access to data from previous passes
• Vertex and image data
• How does the pipeline have access to this information?

Getting data back I: Readbacks
[Figure: pipeline diagram with 3D API: OpenGL or Direct3D, GPU Front End, Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Fragment Processor, Raster Operations, Frame Buffer]
• Readbacks transfer data from the frame buffer to the CPU.
• They are very general (any buffer can be transferred)
• Partial buffers can be transferred
• They are slow: reverse data transfer across PCI/AGP bus is very slow (PCIe is expected to be a lot better)

Getting data back II: Copy-to-texture
[Figure: pipeline diagram with GPU Front End, Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Fragment Processor, Raster Operations, Frame Buffer]
• Copy-to-texture transfers data from frame buffer to texture.
• Transfer does not cross GPU-CPU boundary.
• Partial buffers can be transferred
• Not very flexible: depth and stencil buffers cannot be transferred in this way, and copy to texture is still somewhat slow.
• Loss of precision in the copy.

Getting data back III: Render-to-texture
[Figure: pipeline diagram with GPU Front End, Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Fragment Processor, Raster Operations]
• Render-to-texture renders directly into a texture.
• Transfer does not cross GPU-CPU boundary.
• Fastest way to transfer data to fragment processor
• Only works with depth and color buffers (not stencil).
Render-to-texture is the best method for reading data back after a computation.

Using Render-to-texture
• Successor of off-screen rendering using pbuffers
• pbuffers - own rendering context like frame buffers
• Using the render-texture extension is tricky
• You have to set up a pbuffer context, activate it, and then render to this context
• Once deactivated, you can read from the bound texture
• You cannot write to a texture and read it simultaneously
• Mark Harris (NVIDIA) has written a RenderTexture class that wraps all of this
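Because a texture cannot be written and read in the same pass, multipass computations commonly alternate between two buffers, reading from one and writing to the other, then swapping roles. The slides do not spell this out, so the following is only a CPU-side sketch of that idea using plain float arrays in place of textures; the `PingPong` struct, `run_pass`, and the example pass `average_right` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Two buffers standing in for two render targets (hypothetical type). */
typedef struct {
    float *buf[2];
    int read;   /* index of the buffer holding the previous pass's results */
    size_t n;   /* number of elements in each buffer */
} PingPong;

/* A pass computes output element i from the read-only source buffer. */
typedef float (*PassFn)(const float *src, size_t i, size_t n);

/* Run one pass: read from one buffer, write to the other, then swap. */
void run_pass(PingPong *pp, PassFn fn)
{
    assert(pp != NULL && fn != NULL);
    const float *src = pp->buf[pp->read];
    float *dst = pp->buf[1 - pp->read];
    for (size_t i = 0; i < pp->n; ++i)
        dst[i] = fn(src, i, pp->n);
    pp->read = 1 - pp->read;
}

/* Example pass: average each element with its right neighbor (wrapping). */
float average_right(const float *src, size_t i, size_t n)
{
    return 0.5f * (src[i] + src[(i + 1) % n]);
}
```

After each call, `pp->buf[pp->read]` holds the newest results, and the buffer that was just read is free to become the next pass's output.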

Pixel Data Range - PDR
• For image intensive applications
• Once video memory is allocated, create pixel data range
    glPixelDataRangeNV(GL_READ_PIXEL_DATA_RANGE_NV, size, data);
• During readbacks
    glEnableClientState(GL_READ_PIXEL_DATA_RANGE_NV);
    // bind read buffer / rendertexture
    glReadPixels(0, 0, width, height, GL_RGB, GL_FLOAT, data);

Demo – Graph Drawing
• Graph layouts as an n-body system
• Assume connected graph
• All pairs of vertices experience a repulsive force, edges contribute attractive force
• Positions are stored in floating-point texture
• Updated using fragment programs
• Render to texture each time step
• Start with random initial positions
• VAR/PDR to feed positions back to the fragment program
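One time step of such a layout can be sketched on the CPU as follows. The slides do not give the force laws, so this sketch assumes a repulsion that falls off with distance and a linear spring along each edge, integrated with a plain Euler step; the constants `k_rep`, `k_att`, `dt` and the names `Vec2`, `Edge`, and `layout_step` are all illustrative. In the demo, the equivalent update runs in a fragment program over a floating-point position texture.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;
typedef struct { int a, b; } Edge;   /* indices of the two endpoints */

/* Advance all positions by one time step (assumed force model). */
void layout_step(Vec2 *pos, Vec2 *force, int n, const Edge *edges, int m,
                 float k_rep, float k_att, float dt)
{
    assert(pos != NULL && force != NULL && n > 0);
    for (int i = 0; i < n; ++i)
        force[i].x = force[i].y = 0.0f;

    /* Repulsion between all pairs of vertices. */
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            float dx = pos[i].x - pos[j].x;
            float dy = pos[i].y - pos[j].y;
            float d2 = dx * dx + dy * dy + 1e-6f;   /* avoid division by zero */
            float s = k_rep / d2;
            force[i].x += s * dx;  force[i].y += s * dy;
            force[j].x -= s * dx;  force[j].y -= s * dy;
        }
    }

    /* Attraction along edges (linear spring). */
    for (int e = 0; e < m; ++e) {
        int a = edges[e].a, b = edges[e].b;
        float dx = pos[b].x - pos[a].x;
        float dy = pos[b].y - pos[a].y;
        force[a].x += k_att * dx;  force[a].y += k_att * dy;
        force[b].x -= k_att * dx;  force[b].y -= k_att * dy;
    }

    /* Explicit Euler update of the positions. */
    for (int i = 0; i < n; ++i) {
        pos[i].x += dt * force[i].x;
        pos[i].y += dt * force[i].y;
    }
}
```

The all-pairs loop is the expensive part; it is the piece that maps naturally onto one fragment per vertex on the GPU.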

VAR Benefits
• Easy to mix and match data
• Share vertex array with multiple index arrays
• Implements LODs
• Memory-mapped
• Fine-grained control of updates
• No extra copies
• Well-suited for immediate mode rendering

Problems with VAR
• Breaks server/client paradigm
• No internal memory management
• Developer should specify type of memory used (AGP, system or video)
• Memory management goes through a semaphore/fence like system

Server-Side Buffer Objects
[Figure: Application on the client side; byte memory + state on the server side]
• Byte memory (mappable)
• Resident server side
• Efficient rendering
• Sharable

Buffer Objects
• Overcomes most of the problems listed with VAR
• Encapsulates data within “buffer objects”
• Provides an interface to read/write to buffers directly or via “mapping”
• Automatically figures out where to allocate memory depending on usage
• STATIC, DYNAMIC, STREAM, READ, WRITE, COPY
• Avoids unnecessary data copies

Modifying Buffer Object Data
• Functional
• glBufferSubDataARB(target, offset, size, data);
• glGetBufferSubDataARB(target, offset, size, data);
• Always safe
• Memory mapped
• glMapBufferARB(target, access);
• glUnmapBufferARB(target);
• May not necessarily be faster
• Could result in loss of data

Vertex Arrays using VBOs
    // create system memory and fill data
    data = malloc(size);
    memcpy(data, …);
    // create buffer object
    GLuint buffer;
    glGenBuffersARB(1, &buffer);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffer);
    // initialize data of buffer object
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, size, data, GL_STATIC_DRAW_ARB);
    // rendering loop
    while (1) {
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffer);
        // with a buffer bound, the pointer argument is a byte offset into it
        glVertexPointer(4, GL_FLOAT, 0, (char *)NULL);
        glEnableClientState(GL_VERTEX_ARRAY);
        glDrawArrays(…);
        glDisableClientState(GL_VERTEX_ARRAY);
    }
    // delete buffer object
    glDeleteBuffersARB(1, &buffer);

VBO Targets
• GL_ARRAY_BUFFER_ARB
• Contains vertex attributes like positions, normals, per-vertex color etc.
• Data can be organized interleaved or packed together (controlled using stride parameter)
• GL_ELEMENT_ARRAY_BUFFER_ARB
• Contains element pointer
• Stores only indices of elements
• Can be used together to implement LODs
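The LOD idea amounts to keeping one shared vertex array and several index arrays of different density, then picking one index array per frame. The following is a minimal CPU-only sketch of the selection step; the `LodLevel` struct, the distance thresholds, and `select_lod` are hypothetical and only illustrate the bookkeeping, not any particular LOD scheme from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* One level of detail: a list of indices into the shared vertex array. */
typedef struct {
    const unsigned short *indices;
    size_t count;
    float max_distance;   /* use this level up to this viewing distance */
} LodLevel;

/* Pick the first level whose max_distance covers the viewing distance.
   Levels are assumed sorted from finest to coarsest; the coarsest level
   is used when the object is farther away than every threshold. */
const LodLevel *select_lod(const LodLevel *levels, size_t nlevels,
                           float distance)
{
    assert(levels != NULL && nlevels > 0);
    for (size_t i = 0; i < nlevels; ++i) {
        if (distance <= levels[i].max_distance)
            return &levels[i];
    }
    return &levels[nlevels - 1];
}
```

In a renderer, the chosen level's index data would live in an element array buffer and be drawn with glDrawElements, while the vertex data stays bound once in the array buffer.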

Pixel Buffer Object (PBO)
• Extends the functionality of VBOs
• Same buffer object can be bound to both vertex arrays and pixel commands
• Adds two more targets
• GL_PIXEL_PACK_BUFFER_ARB
• glReadPixels() writes into buffer object
• GL_PIXEL_UNPACK_BUFFER_ARB
• glDrawPixels(), glTexImage2D() read from object
• Can be used to render to vertex array

PBO Test
• Courtesy of NVIDIA
• Allows developers to experiment with various combinations of texture transfer
• Image upload to GPU
• Readback
• Internal and external image formats

Render to Vertex Array
    int nverts = 100; // 4 floats per vertex
    // create buffer object
    GLuint buffer;
    glGenBuffersARB(1, &buffer);
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, buffer);
    // initialize data of buffer object (size is given in bytes)
    glBufferDataARB(GL_PIXEL_PACK_BUFFER_ARB, nverts*4*sizeof(GLfloat), NULL, GL_DYNAMIC_DRAW_ARB);
    // render vertex data into a 100 x 1 piece of framebuffer using fragment program
    cgGLBindProgram(fragProg);
    cgGLEnableProfile(…);
    glDrawBuffer(GL_BACK);
    renderVertexData(…);
    cgGLDisableProfile(…);
    // read vertex data from the framebuffer into the bound buffer object
    glReadBuffer(GL_BACK);
    glReadPixels(0, 0, nverts, 1, GL_RGBA, GL_FLOAT, (char *)NULL);
    // change the binding of buffer object
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffer);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(4, GL_FLOAT, 0, (char *)NULL);
    glDrawArrays(…, nverts);
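Buffer object sizes are specified in bytes, so a buffer that receives an nverts x 1 strip of RGBA float pixels needs four floats per vertex. The following small sketch works out that arithmetic on the CPU; it assumes a 4-byte `float` standing in for GLfloat, and the helper names `vertex_buffer_bytes` and `component_index` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* One RGBA float pixel becomes one (x, y, z, w) vertex. */
enum { COMPONENTS_PER_VERTEX = 4 };

/* Bytes needed to hold nverts vertices read back as RGBA floats. */
size_t vertex_buffer_bytes(size_t nverts)
{
    return nverts * COMPONENTS_PER_VERTEX * sizeof(float);
}

/* Index, in floats, of component c (0..3) of vertex v in the buffer. */
size_t component_index(size_t v, size_t c)
{
    assert(c < COMPONENTS_PER_VERTEX);
    return v * COMPONENTS_PER_VERTEX + c;
}
```

For the 100 vertices used on the slide this gives 100 * 4 * sizeof(float) bytes, which is the value the buffer allocation needs rather than the bare component count.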

Render to Vertex Array
[Figure: pipeline diagram with GPU Front End, Programmable Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations; data paths labeled VBO/PBO and texture]

Demo – Particle System
• Use of VBO/PBO in performing render to vertex array
• Another particle system
• Position and velocity stored as float buffers
• Has ability to provide real-time collision detection by introducing obstacles
• Million particle system at 20 fps on GeForce 6800

Vertex Programs Accessing Textures
• So far, we could get by without the need for the vertex program to have access to textures
• Especially intermediate results
• Consider displacement mapping to simulate waves in the ocean
• Each pass updates a displacement map
• Vertex positions are updated accordingly
• Can be done with dynamic bump maps
• Texture access greatly improves the performance
• Vertex skinning etc.

Vertex Textures
• Available only on the GeForce 6 series
• Part of the NV_vertex_program3 extension
    GLuint vertex_texture;
    glGenTextures(1, &vertex_texture);
    glBindTexture(GL_TEXTURE_2D, vertex_texture);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST_MIPMAP_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA_FLOAT32_ATI, width, height, 0, GL_RGBA, GL_FLOAT, data);
• No automatic mipmap computation
• Used for texture lookup in vertex program
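What a vertex program does with such a texture in the displacement mapping case can be sketched on the CPU: fetch a height with nearest filtering and move the vertex by it. This sketch assumes a single-channel height map, texture coordinates in [0, 1], and displacement along the y axis; `fetch_nearest`, `displace`, and the `scale` parameter are hypothetical names, and the actual vertex program in the talk may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Nearest-neighbor fetch from a w x h single-channel height map,
   with (u, v) in [0, 1] and out-of-range coordinates clamped. */
float fetch_nearest(const float *tex, int w, int h, float u, float v)
{
    assert(tex != NULL && w > 0 && h > 0);
    int x = (int)(u * (float)w);
    int y = (int)(v * (float)h);
    if (x < 0) x = 0;
    if (x > w - 1) x = w - 1;
    if (y < 0) y = 0;
    if (y > h - 1) y = h - 1;
    return tex[y * w + x];
}

/* Displace n vertices (xyz triples) along y by the sampled height.
   uv holds one (u, v) pair per vertex. */
void displace(float *positions, const float *uv, size_t n,
              const float *tex, int w, int h, float scale)
{
    assert(positions != NULL && uv != NULL);
    for (size_t i = 0; i < n; ++i) {
        float height = fetch_nearest(tex, w, h, uv[2 * i], uv[2 * i + 1]);
        positions[3 * i + 1] += scale * height;
    }
}
```

In a multipass wave simulation, each pass would rewrite the height map and the next frame's vertex pass would sample the updated values without the data leaving the GPU.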

Vertex Textures
[Figure: pipeline diagram with GPU Front End, Programmable Vertex Processor, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations; data path labeled texture]

Summary of memory flow
[Figure: three diagrams. Readback: CPU, Vertex program, Fragment program, Frame buffer. Copy-to-Texture: CPU, Vertex program, Fragment program, Frame buffer. Render-to-Texture: CPU, Vertex program, Fragment program.]

Summary of memory flow
[Figure: two diagrams, each with Vertex program and Fragment program. VBO/PBO transfer; nv40 texture reference in vertex prog.]

Questions?
