Programmable Graphics Hardware
CS 446: Real-Time Rendering & Game Technology
David Luebke
University of Virginia
Recap: Advanced Texturing
• Billboards
  • Screen-aligned, world-aligned
• Point sprites
• Imposters
  • Trees, buildings, portal textures, billboard clouds
  • Dynamic imposters for “caching” rendering results
• Depth textures
• Multitexturing
  • Low-res light maps, hi-res decals, etc.
Real-Time Rendering
Textures: Other Important Stuff
• Render to texture – framebuffer objects (FBOs)
  • Multiple render targets
• Environment maps
  • Sphere map, cube maps (hardware supported)
• Shadow maps
  • A depth texture rendered from light source (more later)
• Relief textures
  • Demo now, details later
Textures: Still More Stuff
• Normal maps – especially for bump mapping
• Gloss maps, reflectance maps, etc.
• Generally:
  • Think of textures as global memory for fragment programs, with built-in filtering
  • Just starting to be able to access textures in vertex programs too (NVIDIA hardware only, today)
• Deferred shading
• Projective texture mapping
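The "global memory with built-in filtering" view can be sketched on the CPU. The following is a minimal Python illustration, not GPU code: the function name `sample_bilinear`, the single-channel list-of-rows texture, and the clamp-to-edge addressing are all assumptions made for the example.

```python
import math


def sample_bilinear(texture, u, v):
    """Read a single-channel 2D texture at normalized (u, v) with bilinear filtering.

    `texture` is a list of rows; texel centers sit at (i + 0.5) / size.
    """
    height, width = len(texture), len(texture[0])
    # Move to texel space, shifted so integer coordinates land on texel centers.
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(ix, iy):
        # Clamp-to-edge addressing (an assumption; hardware offers several modes).
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy][ix]

    # Blend the four nearest texels: first along x, then along y.
    top = (1 - fx) * texel(x0, y0) + fx * texel(x0 + 1, y0)
    bottom = (1 - fx) * texel(x0, y0 + 1) + fx * texel(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom
```

A fragment program gets this blend from a single texture read; the sketch only makes visible what the "filtering" part of that read does.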
Next topic: Cg
• Many of the techniques we discuss in this class do not depend on programmable graphics hardware
  • But even those are often easier to implement with it!
  • And programmable graphics opens up an endless number of tricks and techniques that could not have been efficiently implemented before
• So, the next topic is a brief intro to Cg
  • My apologies to those of you who’ve seen this
  • My apologies to those of you who haven’t
Acknowledgement & Aside
• Much of this lecture comes from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology
• For this reason, and because the lab is outfitted with NVIDIA cards, we will focus on NVIDIA tech
  • I try to mention similarities and differences with ATI, the other main GPU vendor, in lecture and slides
• Note: many/most images are from NVIDIA as well
The Graphics Pipeline
• A simplified graphics pipeline
• Note that pipe widths vary
• Many caches, FIFOs, and so on not shown
[Diagram: Application (CPU) -> Vertices (3D) -> Transform & Light -> Xformed, Lit Vertices (2D) -> Assemble Primitives -> Screenspace triangles (2D) -> Rasterize -> Fragments (pre-pixels) -> Shade -> Final Pixels (Color, Depth). Also labeled: Graphics State, the CPU/GPU boundary, Video Memory (Textures), and Render-to-texture.]
GPU Pipeline: Transform
• Transform & light (a.k.a. vertex processor)
  • Transform from “world space” to “image space”
  • Compute per-vertex lighting
Courtesy Mark Harris
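The world-space to image-space step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hardware's implementation: `mat_vec` and `to_window` are hypothetical names, the matrix is a plain list of rows, and the window origin is assumed to be at the lower left. On real hardware the perspective divide and viewport mapping are fixed-function steps that follow the vertex processor.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_window(mvp, position, width, height):
    """Transform a 3D position to window coordinates plus depth."""
    # Homogeneous clip-space position: the vertex processor's main output.
    x, y, z, w = mat_vec(mvp, list(position) + [1.0])
    # Perspective divide gives normalized device coordinates in [-1, 1].
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport mapping to pixel units (lower-left origin is an assumption).
    return ((ndc_x * 0.5 + 0.5) * width, (ndc_y * 0.5 + 0.5) * height, ndc_z)
```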
GPU Pipeline: Rasterize
• Rasterizer
  • Convert geometric rep. (vertex) to image rep. (fragment)
    • Fragment = image fragment
    • Pixel + associated data: color, depth, stencil, etc.
  • Interpolate per-vertex quantities across pixels
Courtesy Mark Harris
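A rough CPU sketch of "interpolate per-vertex quantities across pixels", using barycentric weights from edge functions. The names `edge` and `rasterize` are hypothetical, the attribute is a single scalar, and the interpolation is linear in screen space (real hardware also applies perspective correction, which is omitted here).

```python
def edge(a, b, p):
    """Signed area term: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, c0, c1, c2, width, height):
    """Return (x, y, value) fragments for a 2D triangle with one scalar per vertex."""
    fragments = []
    area = edge(v0, v1, v2)
    if area == 0:
        return fragments  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Barycentric weights; all non-negative means the sample is covered.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                fragments.append((x, y, w0 * c0 + w1 * c1 + w2 * c2))
    return fragments
```

Each returned tuple plays the role of a fragment: a pixel location plus the interpolated data a fragment processor would receive.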
GPU Pipeline: Shade
• Fragment processors (multiple in parallel)
  • Compute a color for each pixel
  • Optionally read colors from textures (images)
Courtesy Mark Harris
The Modern Graphics Pipeline
• Programmable vertex processor!
• Programmable pixel processor!
[Diagram: the same pipeline, with the Transform & Light stage labeled Vertex Processor and the Shade stage labeled Fragment Processor. Data labels are unchanged: Vertices (3D), Xformed, Lit Vertices (2D), Screenspace triangles (2D), Fragments (pre-pixels), Final Pixels (Color, Depth). Also labeled: Graphics State, the CPU/GPU boundary, Video Memory (Textures), and Render-to-texture.]
The Coming Soon Graphics Pipeline
• Programmable primitive assembly!
• More flexible memory access!
[Diagram: Application -> Vertex Processor -> Assemble Primitives (labeled Geometry Processor) -> Rasterize -> Fragment Processor. Data labels are unchanged: Vertices (3D), Xformed, Lit Vertices (2D), Screenspace triangles (2D), Fragments (pre-pixels), Final Pixels (Color, Depth). Also labeled: Graphics State, the CPU/GPU boundary, Video Memory (Textures), and Render-to-texture.]
Precision
• 32-bit IEEE floating-point throughout pipeline
  • Framebuffer
  • Textures
  • Fragment processor
  • Vertex processor
  • Interpolants
Multiple data types in hardware
• Can support 32-bit IEEE floating point throughout pipeline
  • Vertices, interpolants, framebuffer, textures, computations
• Fragment processor also supports:
  • 16-bit “half” floating point, 12-bit fixed point
  • These may be faster than 32-bit
• Framebuffer/textures also support:
  • Large variety of fixed-point formats
  • E.g., classical 8-bit per component RGBA, BGRA, etc.
  • These formats use less memory bandwidth than FP32
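The size and precision trade-off between these formats can be checked with Python's standard `struct` module, whose `e` and `f` codes are IEEE half and single precision. This is only a CPU-side illustration; the constant names and helper functions are hypothetical, and the 12-bit fixed-point format is not covered.

```python
import struct

# Four-component pixel layouts (names are illustrative, not API identifiers).
RGBA8 = "<4B"      # classical 8 bits per component
RGBA_FP16 = "<4e"  # 16-bit "half" floats
RGBA_FP32 = "<4f"  # 32-bit IEEE floats


def bytes_per_pixel(fmt):
    """Storage needed for one pixel in the given struct layout."""
    return struct.calcsize(fmt)


def round_trip(value, code):
    """Store a Python float in the given format and read it back."""
    return struct.unpack("<" + code, struct.pack("<" + code, value))[0]


half_error = abs(round_trip(0.1, "e") - 0.1)    # error after storing as FP16
single_error = abs(round_trip(0.1, "f") - 0.1)  # error after storing as FP32
```

The byte counts come out as 4, 8, and 16 per pixel, which is the bandwidth argument on the slide; the round-trip errors show what the narrower formats give up in return.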
Vertex processor capabilities
• 4-vector FP32 operations
• Condition codes + true data-dependent control flow
  • Conditional branches, subroutine calls, jump table
  • Useful for avoiding extra work, e.g.:
    • Don’t do animation, skinning if vertex will be clipped
    • Do displacement mapping only for vertices near silhouette
• Transcendental arithmetic instructions (e.g. COS)
• User clip-plane support
• Texture reads (up to 4 textures, unlimited lookups)
Vertex processor limitations
• No arbitrary memory write
• No “vertex kill”
  • Can put vertex off-screen
  • Can make degenerate primitives
• Only 32-bit texture formats supported
NV40-G70 vertex processor resources
• 65535 instructions per program
• Other statistics (NV30, not sure about NV40-G70):
  • 16 temporary 4-vector registers
  • 256 “uniform” parameter registers
  • 2 address registers (4-vector)
  • 6 clip-distance outputs
Fragment processor: texture mapping
• Texture reads are just another instruction
  • Allows computed texture coordinates, nested to arbitrary depth
    • This is a big difference between NVIDIA and ATI right now
  • Allows multiple uses of a single texture unit
• Optional LOD control – can specify filter extent
• Think of it as a memory-read instruction, with optional user-controlled filtering
Fragment processor capabilities
• Dynamic branching
• Conditional fragment-kill instruction
• Read access to window-space position
• Read/write access to fragment Z (but not stencil)
• Multiple render targets
• Built-in derivative instructions
  • Partial derivatives w.r.t. screen-space x or y
  • Useful for anti-aliasing shaders
• FP32, FP16, and fixed-point data
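To see why screen-space derivatives help with anti-aliasing, here is a small Python sketch. It assumes derivatives are estimated by differencing neighboring pixels within a 2x2 block, which is a common description of how hardware approximates them but is not taken from these slides; `ddx`, `ddy`, and `aa_step` are hypothetical names, and `shade` stands in for any per-pixel quantity.

```python
def ddx(shade, x, y):
    """Approximate d(shade)/dx by differencing the two pixels in this 2x2 block row."""
    x0 = x - (x % 2)
    return shade(x0 + 1, y) - shade(x0, y)


def ddy(shade, x, y):
    """Approximate d(shade)/dy by differencing the two pixels in this 2x2 block column."""
    y0 = y - (y % 2)
    return shade(x, y0 + 1) - shade(x, y0)


def aa_step(threshold, value, width):
    """Step function softened over `width`, the value's change per pixel."""
    if width <= 0:
        return 1.0 if value >= threshold else 0.0
    t = (value - threshold) / width + 0.5
    return min(max(t, 0.0), 1.0)
```

A shader would typically pass something like `abs(ddx) + abs(ddy)` as `width`, so that a hard edge in a procedural pattern is blended over about one pixel instead of aliasing.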
Fragment processor limitations
• Dynamic branching less efficient than vertex proc.
  • Especially for non-coherent branching (< ~30x30 pixels)
  • Can do a lot with condition codes
• No indexed reads from registers
  • I.e., no indexed arrays
  • Must use texture reads instead
• No arbitrary memory write
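The "use texture reads instead of indexed arrays" workaround can be sketched as follows. This is a CPU illustration only: the table is modeled as a 1D texture with nearest-neighbor sampling, and `fetch_nearest` and `indexed_read` are hypothetical names.

```python
import math


def fetch_nearest(texture, u):
    """Nearest-neighbor read of a 1D texture at normalized coordinate u."""
    n = len(texture)
    i = int(math.floor(u * n))
    return texture[min(max(i, 0), n - 1)]  # clamp to the valid texel range


def indexed_read(texture, index):
    """Emulate table[index] by aiming a texture read at that texel's center."""
    u = (index + 0.5) / len(texture)
    return fetch_nearest(texture, u)
```

Aiming at the texel center (the `+ 0.5`) keeps the read away from texel boundaries, where rounding could otherwise select a neighbor.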
Fragment processor resources
• 65535+ instructions
• Nearly unlimited constants
  • Each constant counts as one instruction
• 16 texture units (NV30, still?), reuse as often as desired
• 10 FP32 x 4 perspective-correct inputs (e.g. tex coords)
• Up to 4 128-bit framebuffer “color” outputs
  • Can pack as 4 x FP32, 8 x FP16, etc.
• Can also set the depth output
  • 24 or 32 bits, depending on stencil
  • Changing depth in fragment program may disable Z-optimizations
GPU vendor differences
• Note: this slide will be dated almost instantly
• NVIDIA: as described in previous slides
• ATI hardware today (1900XT current high-end part):
  • No vertex texture fetch (but good render-to-vertex-array)
  • Far fewer levels of computed texture coordinates
  • Better at fine-grained (less coherent) dynamic branching
• ATI Xenos (Xbox 360 chip):
  • Unified shader model: vertex proc == pixel proc
  • Scatter support: shaders can write arbitrary memory locations
Cg – “C for Graphics”
• Cg is a high-level GPU programming language
• Designed by NVIDIA and Microsoft
• Competes with the (quite similar) GL Shading Language, a.k.a. GLslang
Programming in assembly is painful

Assembly:
    …
    FRC R2.y, C11.w;
    ADD R3.x, C11.w, -R2.y;
    MOV H4.y, R2.y;
    ADD H4.x, -H4.y, C4.w;
    MUL R3.xy, R3.xyww, C11.xyww;
    ADD R3.xy, R3.xyww, C11.z;
    TEX H5, R3, TEX2, 2D;
    ADD R3.x, R3.x, C11.x;
    TEX H6, R3, TEX2, 2D;
    …

Cg:
    …
    L2weight = timeval - floor(timeval);
    L1weight = 1.0 - L2weight;
    ocoord1 = floor(timeval)/64.0 + 1.0/128.0;
    ocoord2 = ocoord1 + 1.0/64.0;
    L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
    L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
    …

• Easier to read and modify
• Cross-platform
• Combine pieces
• etc.
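For readers who want to step through the arithmetic, the scalar part of the Cg fragment above translates directly to Python. The function name `weights_and_coords` is hypothetical, and the two `f2tex2D` texture fetches are left out because they need a texture that the slide does not define.

```python
import math


def weights_and_coords(timeval):
    """Mirror the scalar lines of the Cg snippet: two blend weights, two lookup coords."""
    l2_weight = timeval - math.floor(timeval)  # fractional part of timeval
    l1_weight = 1.0 - l2_weight                # the two weights sum to 1
    # First lookup coordinate: integer part of timeval scaled by 1/64, plus a 1/128 offset.
    ocoord1 = math.floor(timeval) / 64.0 + 1.0 / 128.0
    # Second lookup coordinate: 1/64 further along.
    ocoord2 = ocoord1 + 1.0 / 64.0
    return l1_weight, l2_weight, ocoord1, ocoord2
```

For example, `weights_and_coords(2.25)` gives weights 0.75 and 0.25 and coordinates 0.0390625 and 0.0546875.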
Some points in the design space
• CPU languages
  • C – close to the hardware; general purpose
  • C++, Java, Lisp – require memory management
• RenderMan – specialized for shading
• Real-time shading languages
  • Stanford shading language
  • Creative Labs shading language
Design strategy
• Start with C (and a bit of C++)
  • Minimizes number of decisions
  • Gives you known mistakes instead of unknown ones
• Allow subsetting of the language
• Add features desired for GPUs
  • To support GPU programming model
  • To enable high performance
• Tweak to make it fit together well
How are GPUs different from CPUs?
• GPU is a stream processor
  • Multiple programmable processing units
  • Connected by data flows
[Diagram: Application -> Vertex Processor -> Assembly & Rasterization -> Fragment Processor -> Framebuffer Operations -> Framebuffer, with Textures shown alongside.]
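Cg separates vertex & fragment programs
[Diagram: Application -> Vertex Processor -> Assembly & Rasterization -> Fragment Processor -> Framebuffer Operations -> Framebuffer, with Textures shown alongside. A separate "Program" box is attached to the Vertex Processor and to the Fragment Processor.]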
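The stream-processor picture, with one program plugged into each programmable stage, can be mimicked with Python generators. This is a toy model under heavy assumptions: every function name is hypothetical, and assembly and rasterization are reduced to passing each transformed vertex through as a single fragment.

```python
def vertex_stage(vertices, vertex_program):
    """Run the vertex program independently on each element of the vertex stream."""
    for v in vertices:
        yield vertex_program(v)


def assembly_and_rasterization(transformed):
    """Toy stand-in: treat every vertex as a point that yields exactly one fragment."""
    for v in transformed:
        yield v


def fragment_stage(fragments, fragment_program):
    """Run the fragment program independently on each element of the fragment stream."""
    for f in fragments:
        yield fragment_program(f)


def run_pipeline(vertices, vertex_program, fragment_program):
    """Connect the stages by data flow; the two programs are supplied separately."""
    stream = vertex_stage(vertices, vertex_program)
    stream = assembly_and_rasterization(stream)
    return list(fragment_stage(stream, fragment_program))
```

Neither program sees the other's data except through the stream between them, which is the property that lets the hardware run many copies of each stage in parallel.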
Cg programs have two kinds of inputs
• Varying inputs (streaming data)
  • e.g. normal vector – comes with each vertex
  • This is the default kind of input
• Uniform inputs (a.k.a. graphics state)
  • e.g. modelview matrix
• Note: Outputs are always varying

    vout MyVertexProgram(float4 normal,
                         uniform float4x4 modelview)
    {
      …
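The varying/uniform distinction can be illustrated on the CPU: a uniform is bound once per draw, while a varying arrives with each vertex. The sketch below is an assumption-laden analogy, not Cg; `my_vertex_program` and `draw` are hypothetical, and the program body simply multiplies the normal by the matrix because the slide elides the real body.

```python
from functools import partial


def my_vertex_program(normal, modelview):
    """Per-vertex function: `normal` is varying, `modelview` is uniform."""
    return [sum(modelview[r][c] * normal[c] for c in range(4)) for r in range(4)]


def draw(normals, modelview):
    """Bind the uniform once, then stream the varying input through the program."""
    program = partial(my_vertex_program, modelview=modelview)
    return [program(n) for n in normals]  # one varying output per vertex
```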
Binding VP outputs to FP inputs
• Let compiler do it
  • Define a single structure
  • Use it for vertex-program output
  • Use it for fragment-program input

    struct vout {
      float4 color;
      float4 texcoord;
      …
    };
Binding VP outputs to FP inputs
• Do it yourself
  • Specify register bindings for VP outputs
  • Specify register bindings for FP inputs
  • May introduce HW dependence
  • Necessary for mixing Cg with assembly

    struct vout {
      float4 color : TEX3;
      float4 texcoord : TEX5;
      …
    };
Some inputs and outputs are special
• E.g. the position output from vert prog
  • This output drives the rasterizer
  • It must be marked

    struct vout {
      float4 color;
      float4 texcoord;
      float4 position : HPOS;
    };
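As a loose CPU analogy for the shared structure and the specially marked position, consider the Python sketch below. Everything in it is hypothetical (`VOut`, `vertex_program`, `rasterizer_input`, `fragment_program`); it only shows the data flow, where one record type is written by the vertex program, its position field is consumed by the rasterizer, and the remaining fields reach the fragment program.

```python
from dataclasses import dataclass


@dataclass
class VOut:
    """Shared record: vertex-program output and fragment-program input."""
    color: tuple
    texcoord: tuple
    position: tuple  # plays the role of the field marked HPOS


def vertex_program(position, color):
    """Fill every field; the position is what the rasterizer will use."""
    return VOut(color=color, texcoord=(0.0, 0.0, 0.0, 1.0), position=position)


def rasterizer_input(vout):
    """The rasterizer reads only the specially marked position."""
    return vout.position


def fragment_program(vin):
    """The fragment program reads the other fields, here just the color."""
    return vin.color
```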