Programmable Graphics Hardware

David Luebke, 4/1/2014

Admin
• Handout: Cg in two pages
• OLS 002a lab
• Impact on homework

Acknowledgement & Aside
• The bulk of this lecture comes from slides from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology
• For this reason, and because the lab is outfitted with GeForce 3 cards, we will focus on NVIDIA tech

Outline
• Programmable graphics
  • NVIDIA’s next-generation technology: GeForce FX (code name NV30)
• Programming programmable graphics
  • NVIDIA’s Cg language

GPU Programming Model
[Diagram: pipeline stages. CPU: Application. GPU: Vertex Processor, Assembly & Rasterization, Fragment Processor, Framebuffer Operations, Framebuffer; Textures]

32-bit IEEE floating-point throughout pipeline
• Framebuffer
• Textures
• Fragment processor
• Vertex processor
• Interpolants

Hardware supports several other data types
• Fragment processor also supports:
  • 16-bit “half” floating point
  • 12-bit fixed point
  • These may be faster than 32-bit on some HW
• Framebuffer/textures also support:
  • Large variety of fixed-point formats
  • E.g., classical 8-bit per component
  • These formats use less memory bandwidth than FP32

Vertex processor capabilities
• 4-vector FP32 operations, as in GeForce3/4
• True data-dependent control flow
  • Conditional branch instruction
  • Subroutine calls, up to 4 deep
  • Jump table (for switch statements)
• Condition codes
• New arithmetic instructions (e.g. COS)
• User clip-plane support

Vertex processor has high resource limits
• 256 instructions per program (effectively much higher w/branching)
• 16 temporary 4-vector registers
• 256 “uniform” parameter registers
• 2 address registers (4-vector)
• 6 clip-distance outputs

Fragment processor has clean instruction set
• General and orthogonal instructions
• Much better than previous generation
• Same syntax as vertex processor:
    MUL R0, R1.xyz, R2.yxw;
• Full set of arithmetic instructions: RCP, RSQ, COS, EXP, …

Fragment processor has flexible texture mapping
• Texture reads are just another instruction (TEX, TXP, or TXD)
• Allows computed texture coordinates, nested to arbitrary depth
• Allows multiple uses of a single texture unit
• Optional LOD control – specify filter extent
• Think of it as… a memory-read instruction, with optional user-controlled filtering

Additional fragment processor capabilities
• Read access to window-space position
• Read/write access to fragment Z
• Built-in derivative instructions
  • Partial derivatives w.r.t. screen-space x or y
  • Useful for anti-aliasing
• Conditional fragment-kill instruction
• FP32, FP16, and fixed-point data
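
To make the derivative bullet concrete, here is a minimal CPU-side sketch in C of screen-space partial derivatives computed as finite differences between neighbouring pixels. The slide does not say how the hardware computes them; the 2x2 block of pixels and the function names `ddx_quad` and `ddy_quad` are assumptions made for illustration only.

```c
/* A 2x2 block of per-pixel values: quad[row][col], where col steps
 * along screen-space x and row steps along screen-space y.
 * The 2x2 layout is an assumption, not something the slides state. */

/* Partial derivative w.r.t. screen-space x: difference between
 * horizontally adjacent pixels in the given row. */
float ddx_quad(const float quad[2][2], int row)
{
    return quad[row][1] - quad[row][0];
}

/* Partial derivative w.r.t. screen-space y: difference between
 * vertically adjacent pixels in the given column. */
float ddy_quad(const float quad[2][2], int col)
{
    return quad[1][col] - quad[0][col];
}
```

A large derivative of a texture coordinate means the texture is changing quickly from pixel to pixel, which is the cue an anti-aliasing shader would use to widen its filter.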

Fragment processor limitations
• No branching
  • But, can do a lot with condition codes
• No indexed reads from registers
  • Use texture reads instead
• No memory writes

Fragment processor has high resource limits
• 1024 instructions
• 512 constants or uniform parameters
  • Each constant counts as one instruction
• 16 texture units
  • Reuse as many times as desired
• 8 FP32 x 4 perspective-correct inputs
• 128-bit framebuffer “color” output (use as 4 x FP32, 8 x FP16, etc…)

NV30 CineFX Technology Summary
[Diagram: GPU programming model pipeline (Application, Vertex Processor, Assembly & Rasterization, Fragment Processor, Framebuffer Operations, Framebuffer, Textures)]
• FP32 throughout pipeline
• Clean instruction sets
• True branching in vertex processor
• Dependent texture in fragment processor
• High resource limits

Programming in assembly is painful
Assembly:
    …
    FRC R2.y, C11.w;
    ADD R3.x, C11.w, -R2.y;
    MOV H4.y, R2.y;
    ADD H4.x, -H4.y, C4.w;
    MUL R3.xy, R3.xyww, C11.xyww;
    ADD R3.xy, R3.xyww, C11.z;
    TEX H5, R3, TEX2, 2D;
    ADD R3.x, R3.x, C11.x;
    TEX H6, R3, TEX2, 2D;
    …
Cg:
    …
    L2weight = timeval - floor(timeval);
    L1weight = 1.0 - L2weight;
    ocoord1 = floor(timeval)/64.0 + 1.0/128.0;
    ocoord2 = ocoord1 + 1.0/64.0;
    L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
    L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
    …
• Easier to read and modify
• Cross-platform
• Combine pieces
• etc.
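
The arithmetic in the high-level snippet can be checked on the CPU. The following is a minimal C sketch of the four scalar computations only (the two texture reads are left out). The struct `blend_lookup`, the function names, and the reading of 1/64 as one texel and 1/128 as half a texel of a 64-entry table are assumptions for illustration; the slide does not state the texture size.

```c
/* Floor without libm: truncate, then step down for negative fractions.
 * Stands in for the floor() call in the slide's snippet. */
static float floor_f(float t)
{
    int i = (int)t;
    if ((float)i > t)
        i -= 1;
    return (float)i;
}

/* Hypothetical container for the four scalars the snippet computes. */
typedef struct {
    float l1_weight;  /* blend weight of the first table entry  */
    float l2_weight;  /* blend weight of the second table entry */
    float ocoord1;    /* texture coordinate of the first entry  */
    float ocoord2;    /* texture coordinate of the second entry */
} blend_lookup;

blend_lookup compute_blend_lookup(float timeval)
{
    blend_lookup r;
    float base = floor_f(timeval);

    r.l2_weight = timeval - base;               /* fractional part      */
    r.l1_weight = 1.0f - r.l2_weight;           /* weights sum to one   */
    r.ocoord1 = base / 64.0f + 1.0f / 128.0f;   /* assumed texel centre */
    r.ocoord2 = r.ocoord1 + 1.0f / 64.0f;       /* next entry along     */
    return r;
}
```

For example, `compute_blend_lookup(2.25f)` gives weights 0.75 and 0.25 and coordinates 0.0390625 and 0.0546875.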

Quick Demo

Cg – C for Graphics
• Cg is a GPU programming language
• Designed by NVIDIA and Microsoft
• Compilers available in beta versions from both companies

Design goals for Cg
• Enable algorithms to be expressed…
  • Clearly, and
  • Efficiently
• Provide interface continuity
  • Focus on DX9-generation HW and beyond
  • But provide support for DX8-class HW too
  • Support both OpenGL and Direct3D
• Allow easy, incremental adoption

Easy adoption for applications
• Avoid owning the application’s data
  • No scene graph
  • No buffering of vertex data
• Compiler sits on top of existing APIs
  • User can examine assembly-code output
  • Can compile either at run time, or at application-development time
• Allow partial adoption, e.g. use a Cg vertex program with an assembly fragment program
• Support current hardware

Some points in the design space
• CPU languages
  • C – close to the hardware; general purpose
  • C++, Java, Lisp – require memory management
  • RenderMan – specialized for shading
• Real-time shading languages
  • Stanford shading language
  • Creative Labs shading language

Design strategy
• Start with C (and a bit of C++)
  • Minimizes number of decisions
  • Gives you known mistakes instead of unknown ones
• Allow subsetting of the language
• Add features desired for GPUs
  • To support GPU programming model
  • To enable high performance
• Tweak to make it fit together well

How are current GPUs different from CPUs?
• GPU is a stream processor
  • Multiple programmable processing units
  • Connected by data flows
[Diagram: GPU programming model pipeline (Application, Vertex Processor, Assembly & Rasterization, Fragment Processor, Framebuffer Operations, Framebuffer, Textures)]

Cg uses separate vertex and fragment programs
[Diagram: GPU programming model pipeline, with a Program attached to the Vertex Processor and a Program attached to the Fragment Processor]

Cg programs have two kinds of inputs
• Varying inputs (streaming data)
  • e.g. normal vector – comes with each vertex
  • This is the default kind of input
• Uniform inputs (a.k.a. graphics state)
  • e.g. modelview matrix
• Note: Outputs are always varying
    vout MyVertexProgram(float4 normal, uniform float4x4 modelview) { …
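
The varying/uniform split can be mimicked on the CPU. The sketch below, in plain C rather than Cg, treats the vertex program as a function that is called once per stream element: the varying input changes on every call, while the uniform input is shared by all calls. The names `my_vertex_program` and `run_vertex_stream`, the row-major matrix layout, and the choice to simply multiply the input by the matrix are all assumptions for illustration, not part of Cg.

```c
#include <stddef.h>

typedef struct { float x, y, z, w; } float4;
typedef struct { float4 row[4]; } float4x4;  /* row-major storage: an assumption */

static float dot4(float4 a, float4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Stand-in for the slide's MyVertexProgram.
 * 'normal' plays the varying input, 'modelview' the uniform input. */
float4 my_vertex_program(float4 normal, const float4x4 *modelview)
{
    float4 out;
    out.x = dot4(modelview->row[0], normal);
    out.y = dot4(modelview->row[1], normal);
    out.z = dot4(modelview->row[2], normal);
    out.w = dot4(modelview->row[3], normal);
    return out;
}

/* Stand-in for the hardware: runs the program once per stream element.
 * Only the varying input differs between iterations. */
void run_vertex_stream(const float4 *normals, float4 *outputs,
                       size_t count, const float4x4 *modelview)
{
    size_t i;
    for (i = 0; i < count; ++i)
        outputs[i] = my_vertex_program(normals[i], modelview);
}
```

Because each call reads only its own varying input and the shared uniform state, the iterations are independent, which is what lets a stream processor run many of them in parallel.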

Two ways to bind VP outputs to FP inputs
• Let compiler do it
  • Define a single structure
  • Use it for vertex-program output
  • Use it for fragment-program input
    struct vout {
        float4 color;
        float4 texcoord;
        …
    };

Two ways to bind VP outputs to FP inputs
• Do it yourself
  • Specify register bindings for VP outputs
  • Specify register bindings for FP inputs
  • May introduce HW dependence
  • Necessary for mixing Cg with assembly
    struct vout {
        float4 color : TEX3;
        float4 texcoord : TEX5;
        …
    };

Some inputs and outputs are special
• e.g. the position output from the vertex program
  • This output drives the rasterizer
  • It must be marked
    struct vout {
        float4 color;
        float4 texcoord;
        float4 position : HPOS;
    };

How are current GPUs different from CPUs?
• Greater variation in basic capabilities
  • Most processors don’t yet support branching
  • Vertex processors don’t support texture mapping
  • Some processors support additional data types
• Compiler can’t hide these differences
  • Least-common-denominator is too restrictive
  • We expose differences via language profiles (list of capabilities and data types)
  • Over time, profiles will converge

How are current GPUs different from CPUs?
• Optimized for 4-vector arithmetic
  • Useful for graphics – colors, vectors, texcoords
  • Easy way to get high performance/cost
• C philosophy says: expose these HW data types
  • Cg has vector data types and operations, e.g. float2, float3, float4
  • Makes it obvious how to get high performance
  • Cg also has matrix data types, e.g. float3x3, float3x4, float4x4

Some vector operations
    //
    // Clamp components of 3-vector to [minval,maxval] range
    //
    float3 clamp(float3 a, float minval, float maxval) {
        a = (a < minval.xxx) ? minval.xxx : a;
        a = (a > maxval.xxx) ? maxval.xxx : a;
        return a;
    }
• ? : is per-component for vectors
• Swizzle – replicate and/or rearrange components
• Comparisons between vectors are per-component, and produce a vector result
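
For readers more familiar with C than Cg, the per-component behaviour of the clamp above can be spelled out by hand. This is a C sketch of the same computation, with each vector comparison and each vector `?:` expanded into three scalar ones; the names `float3` (as a C struct), `select_f` and `clamp3` are illustrative, not Cg built-ins.

```c
typedef struct { float x, y, z; } float3;

/* Scalar select: one lane of what Cg's ?: does on a vector. */
static float select_f(int cond, float if_true, float if_false)
{
    return cond ? if_true : if_false;
}

/* Clamp each component of a to [minval, maxval]. */
float3 clamp3(float3 a, float minval, float maxval)
{
    /* a = (a < minval.xxx) ? minval.xxx : a;  expanded per component */
    a.x = select_f(a.x < minval, minval, a.x);
    a.y = select_f(a.y < minval, minval, a.y);
    a.z = select_f(a.z < minval, minval, a.z);

    /* a = (a > maxval.xxx) ? maxval.xxx : a;  expanded per component */
    a.x = select_f(a.x > maxval, maxval, a.x);
    a.y = select_f(a.y > maxval, maxval, a.y);
    a.z = select_f(a.z > maxval, maxval, a.z);
    return a;
}
```

Each component is tested and replaced independently, so clamping (-1.0, 0.5, 3.0) to [0, 1] yields (0.0, 0.5, 1.0).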

Cg has arrays too
• Declared just as in C
• But, arrays are distinct from built-in vector types: float4 != float[4]
• Language profiles may restrict array usage
    vout MyVertexProgram(float3 lightcolor[10], …) { …

How are current GPUs different from CPUs?
• No support for pointers
  • Arrays are first-class data types in Cg
• No integer data type
  • Cg adds “bool” data type for boolean operations
  • This change isn’t obvious except when declaring vars

Cg basic data types
• All profiles:
  • float
  • bool
• All profiles with texture lookups:
  • sampler1D, sampler2D, sampler3D, samplerCUBE
• NV_fragment_program profile:
  • half -- half-precision float
  • fixed -- fixed point [-2,2)
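
To give a feel for what a fixed-point value in [-2,2) can represent, here is a C sketch that quantizes a float to a 12-bit code and back. It assumes the 12 bits are a two's-complement integer spread uniformly over [-2,2), which gives a step of 4/4096 = 1/1024; the rounding rule and the saturation at the ends of the range are further assumptions, and the function names are hypothetical. The slides give only the bit count and the range.

```c
#include <stdint.h>

#define FIXED_FRAC_BITS 10      /* assumed: step of 1/1024            */
#define FIXED_MIN (-2048)       /* code for -2.0                      */
#define FIXED_MAX 2047          /* code for the largest value below 2 */

/* Quantize to the nearest 12-bit code, saturating outside [-2,2). */
int16_t float_to_fixed12(float v)
{
    float scaled = v * (float)(1 << FIXED_FRAC_BITS);

    /* Saturate in floating point first so the cast below is always in range. */
    if (scaled <= (float)FIXED_MIN)
        return (int16_t)FIXED_MIN;
    if (scaled >= (float)FIXED_MAX)
        return (int16_t)FIXED_MAX;

    /* Round to nearest, halves away from zero (an assumed rounding rule). */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a 12-bit code back to a float. */
float fixed12_to_float(int16_t q)
{
    return (float)q / (float)(1 << FIXED_FRAC_BITS);
}
```

Under these assumptions 0.5 maps to code 512, and any input of 2.0 or more saturates to code 2047, which converts back to 1.9990234375.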

Other Cg capabilities
• Function overloading
• Function parameters are value/result
  • Use “out” modifier to declare return value
    void foo (float a, out float b) { b = a; }
• “discard” statement – fragment kill
    if (a > b) discard;

Cg Built-in functions
• Texture mapping (in fragment profiles)
• Math
  • Dot product
  • Matrix multiply
  • Sin/cos/etc.
  • Normalize
• Misc
  • Partial derivative (when supported)
• See spec for more details

New vector operators
• Swizzle – replicate/rearrange elements
    a = b.xxyy;
• Write mask – selectively over-write
    a.w = 1.0;
• Vector constructor builds vector
    a = float4(1.0, 0.0, 0.0, 1.0);
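
C has none of these operators, so writing them out as ordinary functions shows exactly what each one does. The following C sketch is an emulation for illustration; `make_float4`, `swizzle_xxyy` and `write_w` are hypothetical helper names, not part of Cg.

```c
typedef struct { float x, y, z, w; } float4;

/* Vector constructor: float4(x, y, z, w) in Cg. */
float4 make_float4(float x, float y, float z, float w)
{
    float4 v;
    v.x = x;
    v.y = y;
    v.z = z;
    v.w = w;
    return v;
}

/* Swizzle: a = b.xxyy; replicates x and y into all four slots. */
float4 swizzle_xxyy(float4 b)
{
    return make_float4(b.x, b.x, b.y, b.y);
}

/* Write mask: a.w = value; overwrites w and leaves x, y, z untouched. */
float4 write_w(float4 a, float value)
{
    a.w = value;
    return a;
}
```

Starting from b = (1, 2, 3, 4), the swizzle gives (1, 1, 2, 2), and writing 1.0 through the w mask then gives (1, 1, 2, 1).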

Change to constant-typing mechanism
• In C, it’s easy to accidentally use high precision
    half x, y;
    x = y * 2.0; // Double-precision multiply!
• Not in Cg
    x = y * 2.0; // Half-precision multiply
• Unless you want to
    x = y * 2.0f; // Float-precision multiply
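
The C half of this comparison can be observed directly, because `sizeof` reports the size of the type the compiler chose for an expression. The sketch below uses `float` as a stand-in for `half`, since standard C has no half type; the function names are hypothetical.

```c
#include <stddef.h>

/* Size of the type C assigns to y * 2.0. The unsuffixed constant 2.0 is a
 * double, so the float operand is promoted and the product is a double. */
size_t size_of_unsuffixed_product(float y)
{
    return sizeof(y * 2.0);
}

/* Size of the type C assigns to y * 2.0f. Both operands are float,
 * so the product stays a float. */
size_t size_of_suffixed_product(float y)
{
    return sizeof(y * 2.0f);
}
```

The first function returns `sizeof(double)` and the second `sizeof(float)`, which is the accidental promotion the slide warns about. Cg's rule, as described on the slide, is that an unsuffixed constant does not force the expression to a higher precision.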

Dot product, matrix multiply
• Dot product
  • dot(v1,v2); // returns a scalar
• Matrix multiplications:
  • matrix-vector: mul(M, v); // returns a vector
  • vector-matrix: mul(v, M); // returns a vector
  • matrix-matrix: mul(M, N); // returns a matrix
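
The difference between mul(M, v) and mul(v, M) is easy to get wrong, so here is a C sketch of both for the 3x3 case: the first treats v as a column vector, the second as a row vector. The C names `dot3`, `mul_mv` and `mul_vm` and the row-major struct layout are assumptions for illustration; in Cg all three operations are built-ins.

```c
typedef struct { float x, y, z; } float3;
typedef struct { float3 row[3]; } float3x3;  /* row-major storage: an assumption */

/* dot(v1, v2): returns a scalar. */
float dot3(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* mul(M, v): v is a column vector; each output is a row of M dotted with v. */
float3 mul_mv(float3x3 m, float3 v)
{
    float3 out;
    out.x = dot3(m.row[0], v);
    out.y = dot3(m.row[1], v);
    out.z = dot3(m.row[2], v);
    return out;
}

/* mul(v, M): v is a row vector; each output is v dotted with a column of M. */
float3 mul_vm(float3 v, float3x3 m)
{
    float3 out;
    out.x = v.x * m.row[0].x + v.y * m.row[1].x + v.z * m.row[2].x;
    out.y = v.x * m.row[0].y + v.y * m.row[1].y + v.z * m.row[2].y;
    out.z = v.x * m.row[0].z + v.y * m.row[1].z + v.z * m.row[2].z;
    return out;
}
```

With M = [[1,2,3],[4,5,6],[7,8,9]] and v = (1, 0, 2), `mul_mv` gives (7, 16, 25) while `mul_vm` gives (15, 18, 21); the two only agree when M is symmetric.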

Demos and Examples
• Show Assn 4a skeleton code
• Show Cg effects browser
  • Fresnel
  • Simple lighting

Cg runtime API helps applications use Cg
• Compile a program
• Select active programs for rendering
• Pass “uniform” parameters to program
• Pass “varying” (per-vertex) parameters
• Load vertex-program constants
• Other housekeeping

Runtime is split into three libraries
• API-independent layer – cg.lib
  • Compilation
  • Query information about object code
• API-dependent layer – cgGL.lib and cgD3D.lib
  • Bind to compiled program
  • Specify parameter values
  • etc.
• NB: A new API has been introduced since the following slides were written
  • See user’s manual for details
  • Pay attention to the basic idea here

Runtime API for OpenGL
    // Create cgContext to hold vertex-profile code
    VertexContext = cgCreateContext();
    // Add vertex-program source text to vertex-profile context
    // This is where compilation currently occurs
    cgAddProgram(VertexContext, CGVertProg, cgVertexProfile, NULL);
    // Get handle to 'main' vertex program
    VertexProgramIter = cgProgramByName(VertexContext, "main");
    cgGLLoadProgram(VertexProgramIter, ProgId);
    VertKdBind = cgGetBindByName(VertexProgramIter, "Kd");
    TestColorBind = cgGetBindByName(VertexProgramIter, "I.TestColor");
    texcoordBind = cgGetBindByName(VertexProgramIter, "I.texcoord");

Runtime API for OpenGL
    //
    // Bind uniform parameters
    //
    cgGLBindUniform4f(VertexProgramIter, VertKdBind, 1.0, 1.0, 0.0, 1.0);
    …
    // Prepare to render
    cgGLEnableProgramType(cgVertexProfile);
    cgGLEnableProgramType(cgFragmentProfile);
    …
    // Immediate-mode vertex
    glNormal3fv(&CubeNormals[i][0]);
    cgGLBindVarying2f(VertexProgramIter, texcoordBind, 0.0, 0.0);
    cgGLBindVarying3f(VertexProgramIter, TestColorBind, 1.0, 0.0, 0.0);
    glVertex3fv(&CubeVertices[CubeFaces[i][0]][0]);

Cg Summary
• C-like language
  • With capabilities for GPUs
• Compatible with Microsoft’s HLSL
• Use with OpenGL or DirectX
• NV20/DX8 and beyond
• NV30 + Cg = You control the graphics pipeline

More Information
• The Cg Tutorial book
• NVIDIA’s “Learn About Cg” page:
  • http://developer.nvidia.com/view.asp?IO=cg_about
• For information, inspiration, and examples, a web forum on writing Cg shaders:
  • http://www.shadertech.com/
• The Cg user’s manual:
  • ftp://download.nvidia.com/developer/cg/Cg_1.2.1/Docs/Cg_Toolkit.pdf
• Cg downloads (compiler, plug-ins, etc.):
  • http://developer.nvidia.com/object/cg_toolkit.html
