GPU Shading and Rendering

GPU Shading and Rendering: Introduction
Marc Olano, UMBC

GPU
• GPU: Graphics Processing Unit
• Designed for real-time graphics
• Present in almost every PC
• Increasing realism and complexity
[Image: America's Army]

GPU computation
[Diagram: CPU, then the GPU stages Vertex, Geometry, Fragment (with Texture / Buffer), then Displayed Pixels]

Low-level code
    !!ARBvp1.0
    # Transform the normal to view space
    TEMP Nv,Np;
    DP3 Nv.x,state.matrix.modelview.invtrans.row[0],vertex.normal;
    DP3 Nv.y,state.matrix.modelview.invtrans.row[1],vertex.normal;
    DP3 Nv.z,state.matrix.modelview.invtrans.row[2],vertex.normal;
    MAD Np,Nv,{.9,.9,.9,0},{0,0,0,1};
    # screen position from vertex
    TEMP Vp;
    DP4 Vp.x, state.matrix.mvp.row[0], vertex.position;
    DP4 Vp.y, state.matrix.mvp.row[1], vertex.position;
    DP4 Vp.z, state.matrix.mvp.row[2], vertex.position;
    DP4 Vp.w, state.matrix.mvp.row[3], vertex.position;
    […]
    # interpolate
    MAD Np, Np, -vertex.color.x, Np;
    MAD result.position, Vp, vertex.color.x, Np;
    END
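The last two MAD instructions above amount to a linear interpolation between Np and Vp, keyed on vertex.color.x. A minimal Python sketch, using scalars as stand-ins for the 4-component registers, checks that algebra. The helper names mad, lerp and arb_interpolate are illustrative only and are not part of any API.

```python
def mad(a, b, c):
    """Multiply-add, mirroring the ARB MAD instruction: a * b + c."""
    return a * b + c


def lerp(a, b, t):
    """Reference linear interpolation: a when t == 0, b when t == 1."""
    return a * (1.0 - t) + b * t


def arb_interpolate(np_, vp, key):
    """Replay the two MADs from the slide on scalar stand-ins."""
    # MAD Np, Np, -key, Np    ->  Np * (1 - key)
    tmp = mad(np_, -key, np_)
    # MAD result, Vp, key, Np ->  Vp * key + Np * (1 - key)
    return mad(vp, key, tmp)
```

With key = 0 the result is Np, with key = 1 it is Vp, and values in between blend the two, which is the same job mix() does in a high-level shading language.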

High-level code
    void main() {
        vec4 Kin = gl_Color; // key input

        // screen position from vertex, texture and normal
        // (nn is not defined on this slide; presumably the view-space normal)
        vec4 Vp = ftransform();
        vec4 Tp = vec4(gl_MultiTexCoord0.xy*1.8-.9, 0,1);
        vec4 Np = vec4(nn*.9,1);

        // interpolate between Vp, Tp and Np
        gl_Position = Vp;
        gl_Position = mix(Tp,gl_Position,pow(1.-Kin.x,8.));
        gl_Position = mix(Np,gl_Position,pow(1.-Kin.y,8.));

        // copy to output
        gl_TexCoord[0] = gl_MultiTexCoord0;
        gl_TexCoord[1] = Vp;
        gl_TexCoord[3] = Kin;
    }

Non-real time vs. Real-time
Not real-time
• Developed from general CPU code
• Seconds to hours per frame
• 1000s of lines
• “Unlimited” computation, texture, memory, …
Real-time
• Developed from fixed-function hardware
• Tens of frames per second
• 1000s of instructions
• Limited computation, texture, memory, …

Non-real time vs. Real-time
[Diagram, non-real time: Application; shader stages Displacement, Surface, Light, Volume, Atmosphere, Imager; Displayed Pixels]
[Diagram, real-time: Application; GPU stages Vertex, Geometry, Fragment, with Texture / Buffer; Displayed Pixels]

History (not real-time)
• Testbed [Whitted and Weimer 1981]
• Shade Trees [Cook 1984]
• Image Synthesizer [Perlin 1985]
• RenderMan [Hanrahan and Lawson 1990]
• Multi-pass RenderMan [Peercy et al. 2000]
• GPU acceleration [Wexler et al. 2005]

History (real-time)
• Custom HW [Olano and Lastra 1998]
• Multi-pass standard HW [Peercy et al. 2000]
• Register combiners [NVIDIA 2000]
• Vertex programs [Lindholm et al. 2001]
• Compiling to mixed HW [Proudfoot et al. 2001]
• Fragment programs
• Standardized languages
• Geometry shaders [Blythe 2006]

Choices
• OS: Windows, Mac, Linux
• API: DirectX, OpenGL
• Language: HLSL, GLSL, Cg, …
• Compiler: DirectX, OpenGL, Cg, ASHLI
• Runtime: CgFX, ASHLI, OSG (& others), sample code

Major Commonalities
• Vertex & Fragment/Pixel
• C-like, if/while/for
• Structs & arrays
• Float + small vector and matrix
• Swizzle & mask (a.xyz = b.xxw)
• Common math & shading functions
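The swizzle and mask notation (a.xyz = b.xxw) can be imitated on the CPU to show what it does: the right-hand side reorders or repeats components, and the left-hand side selects which components are overwritten. The Vec4 class below is a hypothetical stand-in written for this sketch. It is not any shading language's implementation.

```python
class Vec4:
    """Tiny 4-component vector with swizzle reads and masked writes."""

    _INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

    def __init__(self, x=0.0, y=0.0, z=0.0, w=0.0):
        # Bypass __setattr__ so the component list is stored as a plain attribute.
        object.__setattr__(self, "v", [x, y, z, w])

    def __getattr__(self, name):
        # Swizzle read: b.xxw -> (b.x, b.x, b.w); components may repeat.
        if name and all(c in self._INDEX for c in name):
            vals = tuple(self.v[self._INDEX[c]] for c in name)
            return vals[0] if len(vals) == 1 else vals
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Write mask: a.xyz = ... updates only the named components,
        # and a component may appear at most once in the mask.
        if not all(c in self._INDEX for c in name) or len(set(name)) != len(name):
            raise AttributeError(f"invalid write mask: {name}")
        values = [value] if len(name) == 1 else list(value)
        if len(values) != len(name):
            raise ValueError("mask and value sizes differ")
        for c, val in zip(name, values):
            self.v[self._INDEX[c]] = val


a = Vec4(1.0, 2.0, 3.0, 4.0)
b = Vec4(5.0, 6.0, 7.0, 8.0)
a.xyz = b.xxw  # a becomes (5, 5, 8, 4); a.w is left untouched
```

Repeating a component is allowed on the read side (b.xxw) but rejected on the write side, since a mask only says which destination components to update.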

GPU Parallelism
• Pipeline
[Diagram: Vertex, Geometry, Fragment, with Texture / Buffer]

GPU Parallelism
• Pipeline
• SPMD parallel: fragment stream
[Diagram: Vertex, Geometry, Fragment, with Texture / Buffer]

GPU Parallelism
• SPMD parallel: fragment stream
• SIMD parallel: 2x2 block
[Diagram: four Fragment boxes]

GPU Parallelism
• SIMD parallel: 2x2 block
• Pipeline (NVIDIA)
[Diagram: four Fragment boxes; Texture Unit, two Shader Units, L1 Cache, Branch Unit, L2 Cache, Fog]

GPU Parallelism
• Pipeline (NVIDIA)
• Vector parallel
• Limited MIMD
[Diagram: Texture Unit, two Shader Units, ALUs, L1 Cache, Branch Unit, L2 Cache, Fog]

Managing GPU Programming
• Simplified computational model
• Bonus: consistent as hardware changes
• All stages SIMD
• Explicit 4-element SIMD vectors
• Fixed conversion / remapping between each stage
[Diagram: Vertex (stream), Buffer, Geometry (stream), Fragment (array)]

Vertex
• One element in / one out
• NO communication
• Can select fragment address
[Diagram: Vertex (stream), Buffer, Geometry (stream), Fragment (array)]

Geometry
• More next (Blythe talk)
• One element in / 0 to ~100 out
  • Limited by hardware buffer sizes
• Like vertex:
  • NO communication
  • Can select fragment address
[Diagram: Vertex (stream), Buffer, Geometry (stream), Fragment (array)]

Fragment
• Biggest computational resource
• One element in / 0 – 1 out
• Cannot change destination address
  • I am element x,y in an array, what is my value?
• Effectively no communication
• Conditionals expensive
  • Better if block coherence
[Diagram: Vertex (stream), Buffer, Geometry (stream), Fragment (array)]
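The point about conditionals and block coherence can be illustrated with a toy cost model. This is a sketch under stated assumptions, not a description of any particular GPU: it assumes each 2x2 block of fragments runs in lockstep, and that a block whose fragments disagree on a branch pays for both sides. The names block_cost and image_cost and the cost values are hypothetical.

```python
def block_cost(takes_branch, cost_then, cost_else):
    """Cost of an if/else for one SIMD block under a lockstep model.

    takes_branch: list of booleans, one per fragment in the block.
    Assumption: a block whose fragments disagree executes both sides.
    """
    cost = 0
    if any(takes_branch):
        cost += cost_then
    if not all(takes_branch):
        cost += cost_else
    return cost


def image_cost(mask, cost_then, cost_else):
    """Sum block_cost over the 2x2 blocks of a 2D boolean mask.

    Assumes the mask has even width and height.
    """
    total = 0
    for y in range(0, len(mask), 2):
        for x in range(0, len(mask[0]), 2):
            block = [mask[y][x], mask[y][x + 1],
                     mask[y + 1][x], mask[y + 1][x + 1]]
            total += block_cost(block, cost_then, cost_else)
    return total


# Two 4x4 branch masks with the same number of True fragments.
coherent = [[x < 2 for x in range(4)] for y in range(4)]
checker = [[(x + y) % 2 == 0 for x in range(4)] for y in range(4)]
```

With cost_then = 10 and cost_else = 2, the coherent mask costs 24 while the checkerboard costs 48, because every checkerboard block diverges and pays for both branches.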

Program / Multiple Passes
• Communication
  • None in one pass
  • Arbitrary read addresses between passes
• Data layout
  • No persistent per-processor memory
  • No penalty to change
[Diagram: Vertex (stream), Buffer, Geometry (stream), Fragment (array)]
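The communication rules above (none within a pass, arbitrary reads between passes) can be sketched as a CPU-side model. In this sketch each pass calls a kernel once per output element, the kernel may read any address of the previous pass's output, and nothing else carries over. The names run_pass, sum_pairs and reduce_rows are hypothetical, and the row-sum reduction is just one illustrative use.

```python
def run_pass(width, height, kernel, inputs):
    """Run kernel once per output element; kernels only read `inputs`."""
    return [[kernel(x, y, inputs) for x in range(width)]
            for y in range(height)]


def sum_pairs(x, y, inputs):
    """Gather two elements from the previous pass's output and add them."""
    src = inputs["src"]
    return src[y][2 * x] + src[y][2 * x + 1]


def reduce_rows(image):
    """Sum each row by repeated halving passes.

    Assumes the row width is a power of two.
    """
    current = image
    while len(current[0]) > 1:
        # Each pass writes a fresh buffer; no per-element state persists.
        current = run_pass(len(current[0]) // 2, len(current),
                           sum_pairs, {"src": current})
    return [row[0] for row in current]
```

For example, reduce_rows([[1, 2, 3, 4], [5, 6, 7, 8]]) takes two passes and returns [10, 26]. The output layout changes from 4 wide to 2 wide to 1 wide between passes, matching the "no penalty to change" bullet.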

Multiple passes
• GPGPU
• Non-local effects
  • Shadow maps
  • Texture space
• Precomputation
  • Fix some degrees of freedom
  • Factor into functions of 1-3D
  • Project input or output into another space
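One way to read the precomputation bullets is as a lookup table: a function of one variable is evaluated ahead of time into a 1D "texture" and then sampled at run time. The sketch below assumes that reading. It uses the pow(1-k, 8) falloff from the high-level code slide as the baked function, and a simplified linear filter that ignores real texel-center conventions. The names bake_table and sample_linear are hypothetical.

```python
def bake_table(fn, size):
    """Precompute fn at `size` evenly spaced points on [0, 1] (size >= 2)."""
    return [fn(i / (size - 1)) for i in range(size)]


def sample_linear(table, u):
    """Look up the table at u in [0, 1], blending the two nearest entries."""
    u = min(max(u, 0.0), 1.0)          # clamp to the table's domain
    pos = u * (len(table) - 1)
    i = min(int(pos), len(table) - 2)  # keep i + 1 inside the table
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac


# Bake the key falloff once, then sample it instead of calling pow().
falloff = bake_table(lambda k: (1.0 - k) ** 8, 256)
weight = sample_linear(falloff, 0.3)
```

The sampled value approximates the direct evaluation, so the table size trades memory for accuracy.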
