GPU Shading and Rendering: Introduction
Marc Olano, UMBC
GPU
• GPU: Graphics Processing Unit
• Designed for real-time graphics
• Present in almost every PC
• Increasing realism and complexity
[Image: America's Army]
[Diagram: the CPU feeds GPU computation (Vertex, Geometry and Fragment stages, with Texture / Buffer memory), which produces the Displayed Pixels]
Low-level code
!!ARBvp1.0
# Transform the normal to view space
TEMP Nv,Np;
DP3 Nv.x,state.matrix.modelview.invtrans.row[0],vertex.normal;
DP3 Nv.y,state.matrix.modelview.invtrans.row[1],vertex.normal;
DP3 Nv.z,state.matrix.modelview.invtrans.row[2],vertex.normal;
MAD Np,Nv,{.9,.9,.9,0},{0,0,0,1};
# screen position from vertex
TEMP Vp;
DP4 Vp.x, state.matrix.mvp.row[0], vertex.position;
DP4 Vp.y, state.matrix.mvp.row[1], vertex.position;
DP4 Vp.z, state.matrix.mvp.row[2], vertex.position;
DP4 Vp.w, state.matrix.mvp.row[3], vertex.position;
[…]
# interpolate
MAD Np, Np, -vertex.color.x, Np;
MAD result.position, Vp, vertex.color.x, Np;
END
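To make the assembly concrete, here is a small CPU-side Python model of the ARB instructions the slide relies on (DP4 and MAD). It is a sketch under simplifying assumptions: registers are plain 4-tuples of floats, and ARB rules about register limits and operand forms are ignored. The helper names (`splat`, `transform`, `interpolate`) are hypothetical. It shows that the two MAD instructions under "# interpolate" compute a linear blend k * Vp + (1 - k) * Np, with k taken from vertex.color.x.

```python
# CPU-side model of ARB_vertex_program arithmetic (simplified: 4-tuples of floats,
# no register limits or operand-form rules).

def dp4(a, b):
    """DP4: four-component dot product."""
    return sum(x * y for x, y in zip(a, b))

def mad(a, b, c):
    """MAD: component-wise a * b + c."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))

def splat(s):
    """Models a scalar operand such as vertex.color.x replicated to four components."""
    return (s, s, s, s)

def transform(rows, v):
    """Four DP4s, one per matrix row, as in the 'screen position' block."""
    return tuple(dp4(row, v) for row in rows)

def interpolate(vp, np_, k):
    """The two MADs under '# interpolate'."""
    np_scaled = mad(np_, splat(-k), np_)   # Np * (1 - k)
    return mad(vp, splat(k), np_scaled)    # k * Vp + (1 - k) * Np

identity = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))
vp = transform(identity, (1.0, 2.0, 3.0, 1.0))
np_ = (0.5, 0.5, 0.5, 1.0)
print(interpolate(vp, np_, 0.5))  # prints (0.75, 1.25, 1.75, 1.0)
```

Folding the blend into two MADs avoids a separate subtraction for (1 - k), which is one reason hand-written assembly like this tends to look less obvious than the GLSL equivalent.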
High-level code
void main() {
    vec4 Kin = gl_Color; // key input

    // screen position from vertex, texture and normal
    vec4 Vp = ftransform();
    vec4 Tp = vec4(gl_MultiTexCoord0.xy*1.8-.9, 0,1);
    vec3 nn = gl_NormalMatrix * gl_Normal; // assumed definition (not shown on the slide): view-space normal, as in the low-level version
    vec4 Np = vec4(nn*.9,1);

    // interpolate between Vp, Tp and Np
    gl_Position = Vp;
    gl_Position = mix(Tp,gl_Position,pow(1.-Kin.x,8.));
    gl_Position = mix(Np,gl_Position,pow(1.-Kin.y,8.));

    // copy to output
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_TexCoord[1] = Vp;
    gl_TexCoord[3] = Kin;
}
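The GLSL version expresses the same idea with `mix` and a sharpened key. The Python sketch below mirrors its arithmetic on the CPU; `mix` follows the GLSL definition x * (1 - a) + y * a, while `tex_to_clip` and `blend_position` are hypothetical helper names for the Tp computation and the two `mix` lines. It is a model of the math, not of how a driver runs the shader.

```python
# CPU-side model of the GLSL vertex shader's position blend (a sketch).

def mix(x, y, a):
    """GLSL mix(): x * (1 - a) + y * a, component-wise with a scalar weight a."""
    return tuple(xi * (1.0 - a) + yi * a for xi, yi in zip(x, y))

def tex_to_clip(s, t):
    """Tp from the shader: texture coordinates in [0, 1] mapped to [-0.9, 0.9]."""
    return (s * 1.8 - 0.9, t * 1.8 - 0.9, 0.0, 1.0)

def blend_position(vp, tp, np_, key):
    """Same order as the shader: start at Vp, pull toward Tp, then toward Np."""
    pos = vp
    pos = mix(tp, pos, (1.0 - key[0]) ** 8)   # key.x = 0 keeps pos, key.x = 1 gives Tp
    pos = mix(np_, pos, (1.0 - key[1]) ** 8)  # key.y = 0 keeps pos, key.y = 1 gives Np
    return pos

vp = (1.0, 2.0, 3.0, 1.0)
tp = tex_to_clip(1.0, 0.0)
np_ = (0.0, 0.0, 0.9, 1.0)
print(blend_position(vp, tp, np_, (0.5, 0.0)))
```

Because of the eighth power, a key of 0.5 already leaves only 0.5 ** 8 = 1/256 of the weight on Vp, so the vertex moves most of the way to Tp well before the key reaches 1.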
Non-real time vs. Real time
Non-real time
• Developed from general CPU code
• Seconds to hours per frame
• 1000s of lines
• "Unlimited" computation, texture, memory, …
Real-time
• Developed from fixed-function hardware
• Tens of frames per second
• 1000s of instructions
• Limited computation, texture, memory, …
Non-real time vs. Real-time
[Diagram: non-real-time pipeline: Application; Displacement, Surface, Light, Volume, Atmosphere and Imager shaders; Displayed Pixels. Real-time pipeline: Application; Vertex, Geometry and Fragment stages with Texture / Buffer memory; Displayed Pixels]
History (not real-time)
• Testbed [Whitted and Weimer 1981]
• Shade Trees [Cook 1984]
• Image Synthesizer [Perlin 1985]
• RenderMan [Hanrahan and Lawson 1990]
• Multi-pass RenderMan [Peercy et al. 2000]
• GPU acceleration [Wexler et al. 2005]
History (real-time)
• Custom HW [Olano and Lastra 1998]
• Multi-pass standard HW [Peercy et al. 2000]
• Register combiners [NVIDIA 2000]
• Vertex programs [Lindholm et al. 2001]
• Compiling to mixed HW [Proudfoot et al. 2001]
• Fragment programs
• Standardized languages
• Geometry shaders [Blythe 2006]
Choices
• OS: Windows, Mac, Linux
• API: DirectX, OpenGL
• Language: HLSL, GLSL, Cg, …
• Compiler: DirectX, OpenGL, Cg, ASHLI
• Runtime: CgFX, ASHLI, OSG (& others), sample code
Major Commonalities
• Vertex & Fragment/Pixel
• C-like, if/while/for
• Structs & arrays
• Float + small vector and matrix
• Swizzle & mask (a.xyz = b.xxw)
• Common math & shading functions
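Swizzles and write masks are easy to misread, so here is a small Python model of `a.xyz = b.xxw`. On the read side (swizzle) components may repeat and appear in any order; on the write side (mask) each component may appear at most once. The sketch also rejects patterns that mix the xyzw, rgba and stpq name sets, as GLSL does. The function names are hypothetical, and vectors are plain 4-tuples.

```python
_NAME_SETS = ("xyzw", "rgba", "stpq")

def _indices(pattern):
    """Map swizzle letters to component indices; mixing name sets is an error."""
    for names in _NAME_SETS:
        if all(c in names for c in pattern):
            return [names.index(c) for c in pattern]
    raise ValueError(f"invalid swizzle {pattern!r}")

def swizzle(v, pattern):
    """Read side: components may repeat and appear in any order (b.xxw)."""
    return tuple(v[i] for i in _indices(pattern))

def write_masked(dst, mask, src):
    """Write side: each component at most once (a.xyz = ...); others are untouched."""
    idx = _indices(mask)
    if len(set(idx)) != len(idx):
        raise ValueError(f"repeated component in write mask {mask!r}")
    if len(idx) != len(src):
        raise ValueError("size mismatch between mask and value")
    out = list(dst)
    for i, value in zip(idx, src):
        out[i] = value
    return tuple(out)

a = (0.0, 0.0, 0.0, 9.0)
b = (1.0, 2.0, 3.0, 4.0)
a = write_masked(a, "xyz", swizzle(b, "xxw"))  # a.xyz = b.xxw
print(a)  # (1.0, 1.0, 4.0, 9.0)
```

Note that a.w keeps its old value: the mask limits which components the assignment writes.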
GPU Parallelism
[Diagram: Vertex, Geometry and Fragment stages sharing Texture / Buffer memory]
• Pipeline
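Pipeline parallelism means different stages work on different elements at the same time. The toy schedule below assumes every stage takes exactly one cycle per element, which real stages do not, and shows which element each stage holds on each cycle. `pipeline_schedule` is a hypothetical helper.

```python
def pipeline_schedule(num_items, stages):
    """Return, for each cycle, a dict mapping stage name -> element index.
    Assumes one cycle per stage per element (a simplification)."""
    schedule = []
    for cycle in range(num_items + len(stages) - 1):
        busy = {}
        for depth, stage in enumerate(stages):
            item = cycle - depth          # stage k lags the first stage by k cycles
            if 0 <= item < num_items:
                busy[stage] = item
        schedule.append(busy)
    return schedule

for cycle, busy in enumerate(pipeline_schedule(4, ["vertex", "geometry", "fragment"])):
    print(cycle, busy)
```

With three stages, four elements finish in 6 cycles rather than the 12 stage-steps a one-element-at-a-time design would take, and from cycle 2 onward all three stages are busy at once.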
GPU Parallelism
[Diagram: Vertex, Geometry and Fragment stages sharing Texture / Buffer memory; the Fragment stage shown as a stream of fragments]
• Pipeline
• SPMD parallel: fragment stream
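SPMD (single program, multiple data) means the same program runs on every fragment, each with its own inputs and, in principle, its own control flow. Because no fragment reads another's result, the processing order cannot change the output. The sketch below checks that property; the shader and `run_spmd` are hypothetical, and the `order` argument stands in for whatever scheduling the hardware uses.

```python
import random

def shader(frag):
    """Hypothetical fragment program: each fragment may take its own branch."""
    x, y = frag
    if (x + y) % 2 == 0:
        return x * y
    return x - y

def run_spmd(fragments, program, order):
    """Run one program on every fragment, visiting them in the given order."""
    results = {}
    for i in order:
        frag = fragments[i]
        results[frag] = program(frag)  # reads only its own input, writes only its own slot
    return results

fragments = [(x, y) for y in range(4) for x in range(4)]
in_order = run_spmd(fragments, shader, range(len(fragments)))
shuffled = list(range(len(fragments)))
random.Random(1).shuffle(shuffled)
print(run_spmd(fragments, shader, shuffled) == in_order)  # True
```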
GPU Parallelism
[Diagram: four fragments grouped as a 2x2 block]
• SIMD parallel: 2x2 block
• SPMD parallel: fragment stream
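Within a 2x2 block the four fragments share one instruction stream (SIMD), which is why divergent branches are costly. In the simplified model below, when the four fragments disagree on a condition the block executes both sides and masks off the results each fragment should not keep. The cycle costs are made-up parameters, and `run_quad` is a hypothetical name.

```python
def run_quad(quad, cond, then_fn, else_fn, then_cost, else_cost):
    """Execute an if/else for a 2x2 block of fragments in lockstep.
    Returns (per-fragment results, cycles spent by the whole block)."""
    assert len(quad) == 4, "a 2x2 block holds four fragments"
    mask = [bool(cond(v)) for v in quad]
    results = [None] * 4
    cycles = 0
    if any(mask):                  # at least one fragment takes the 'then' side
        cycles += then_cost
        for i, v in enumerate(quad):
            r = then_fn(v)         # every lane runs the instructions...
            if mask[i]:
                results[i] = r     # ...but only active lanes keep the result
    if not all(mask):              # at least one fragment takes the 'else' side
        cycles += else_cost
        for i, v in enumerate(quad):
            r = else_fn(v)
            if not mask[i]:
                results[i] = r
    return results, cycles

print(run_quad([1, 2, 3, 4], lambda v: v > 2, lambda v: v * 10, lambda v: -v, 10, 3))
# ([-1, -2, 30, 40], 13): a divergent block pays for both sides
print(run_quad([3, 4, 5, 6], lambda v: v > 2, lambda v: v * 10, lambda v: -v, 10, 3))
# ([30, 40, 50, 60], 10): a coherent block pays for one side
```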
GPU Parallelism
[Diagram (NVIDIA): a 2x2 block of fragments entering a shader pipeline with a Texture Unit, two Shader Units, a Branch Unit and Fog, backed by L1 and L2 caches]
• SIMD parallel: 2x2 block
• Pipeline (NVIDIA)
GPU Parallelism
[Diagram (NVIDIA): shader pipeline with a Texture Unit, two Shader Units each containing two ALUs, a Branch Unit and Fog, backed by L1 and L2 caches]
• Vector parallel
• Limited MIMD
• Pipeline (NVIDIA)
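"Vector parallel" means one instruction updates up to four components at once; "limited MIMD" refers to a shader unit issuing more than one instruction at a time, under restrictions. The sketch below counts instruction slots for a list of write masks under three models. The co-issue rule is a deliberately simplified, hypothetical one (two adjacent, independent instructions with disjoint write masks may share a slot); real pairing rules differ between GPUs.

```python
def slots_scalar(masks):
    """One slot per component written, as on a purely scalar ALU."""
    return sum(len(m) for m in masks)

def slots_vec4(masks):
    """One slot per instruction on a 4-wide vector ALU."""
    return len(masks)

def slots_coissue(masks):
    """Greedily pair adjacent instructions whose write masks do not overlap.
    Hypothetical rule: assumes the paired instructions are independent."""
    slots, i = 0, 0
    while i < len(masks):
        if i + 1 < len(masks) and not set(masks[i]) & set(masks[i + 1]):
            i += 2
        else:
            i += 1
        slots += 1
    return slots

program = ["xyz", "w", "xyzw", "xy", "zw"]   # write mask of each instruction
print(slots_scalar(program), slots_vec4(program), slots_coissue(program))  # 12 5 3
```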
Managing GPU Programming
[Diagram: Vertex (stream), Geometry (stream) and Fragment (array) stages with a Buffer]
• Simplified computational model
• Bonus: consistent as hardware changes
• All stages SIMD
• Explicit 4-element SIMD vectors
• Fixed conversion / remapping between each stage
Vertex
• One element in / one out
• NO communication
• Can select fragment address
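A vertex program sees one vertex and produces one vertex, yet the position it writes decides where the result lands, so it can scatter data. The sketch below reduces point rendering to a single row of pixels; all names are hypothetical, and the NDC-to-pixel mapping is the usual [-1, 1] to [0, width) viewport transform.

```python
def make_scatter_shader(width):
    """Vertex program: move each vertex to the center of the pixel it requests."""
    def shader(vertex):
        value, target_pixel = vertex
        ndc_x = (target_pixel + 0.5) / width * 2.0 - 1.0   # pixel center -> [-1, 1]
        return ndc_x, value
    return shader

def vertex_stage(vertices, shader):
    """One element in, one out; no vertex can see any other."""
    return [shader(v) for v in vertices]

def draw_points(outputs, width):
    """Stand-in for rasterizing points into a one-row framebuffer."""
    row = [None] * width
    for ndc_x, value in outputs:
        px = int((ndc_x + 1.0) * 0.5 * width)   # [-1, 1] -> pixel index
        if 0 <= px < width:
            row[px] = value
    return row

shader = make_scatter_shader(4)
print(draw_points(vertex_stage([("a", 3), ("b", 0), ("c", 2)], shader), 4))
# ['b', None, 'c', 'a']
```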
Geometry
• More in the next talk (Blythe)
• One element in / 0 to ~100 out
  • Limited by hardware buffer sizes
• Like vertex:
  • NO communication
  • Can select fragment address
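A geometry program takes one primitive and emits anywhere from zero to a bounded number of primitives. The sketch below models that as a flat-map with a hypothetical output cap standing in for the hardware buffer limit; the sprite-expanding program is an illustrative example, not one from the slides.

```python
MAX_OUTPUTS = 100   # hypothetical cap standing in for the hardware buffer limit

def sprite_shader(point):
    """Hypothetical geometry program: a point (x, y, size) becomes a square made of
    two triangles; a point with zero size emits nothing (it is culled)."""
    x, y, size = point
    if size <= 0:
        return []
    h = size / 2
    a, b, c, d = (x - h, y - h), (x + h, y - h), (x + h, y + h), (x - h, y + h)
    return [(a, b, c), (a, c, d)]

def geometry_stage(primitives, shader, max_outputs=MAX_OUTPUTS):
    """One primitive in, 0..max_outputs out; primitives never see each other."""
    out = []
    for prim in primitives:
        emitted = shader(prim)
        if len(emitted) > max_outputs:
            raise ValueError("geometry program exceeded its output limit")
        out.extend(emitted)
    return out

tris = geometry_stage([(0.0, 0.0, 2.0), (5.0, 5.0, 0.0)], sprite_shader)
print(len(tris))  # 2
```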
Fragment
• Biggest computational resource
• One element in / 0 to 1 out
• Cannot change destination address
  • "I am element x,y in an array; what is my value?"
• Effectively no communication
• Conditionals expensive
  • Better with block coherence
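The fragment stage inverts the vertex stage's freedom: each invocation is told its own (x, y) and can only decide what value, if any, to write there. The sketch below models a pass over a 2D array in which the shader returns either a value or `None` for a discard, leaving the destination unchanged. `fragment_pass` and `checker` are hypothetical names.

```python
def fragment_pass(dest, shader):
    """Run 'shader' once per element of the 2D array 'dest'.
    The shader learns only its own (x, y) and returns a value or None (discard);
    it cannot pick a different destination."""
    out = [row[:] for row in dest]
    for y, row in enumerate(dest):
        for x in range(len(row)):
            value = shader(x, y)
            if value is not None:   # 0 or 1 outputs per fragment
                out[y][x] = value
    return out

def checker(x, y):
    """Hypothetical shader: write x + y on even squares, discard on odd ones."""
    return x + y if (x + y) % 2 == 0 else None

print(fragment_pass([[-1] * 3 for _ in range(2)], checker))
# [[0, -1, 2], [-1, 2, -1]]
```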
Program / Multiple Passes
• Communication
  • None in one pass
  • Arbitrary read addresses between passes
• Data layout
  • No persistent per-processor memory
  • No penalty to change
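A standard way to get communication out of a model with none inside a pass is to split the work into passes, each reading the previous pass's output at arbitrary addresses. The sketch below sums an array this way (a parallel reduction): every output element of a pass gathers two inputs from the previous buffer, and the number of passes grows as log2 of the input size. The function names are hypothetical.

```python
def reduce_pass(src):
    """One pass: output i gathers two elements of the previous pass's buffer.
    Outputs never read each other, so there is no communication within the pass."""
    n = (len(src) + 1) // 2
    return [src[2 * i] + (src[2 * i + 1] if 2 * i + 1 < len(src) else 0)
            for i in range(n)]

def multipass_sum(data):
    """Repeat passes until one element remains; returns (sum, passes)."""
    buf, passes = list(data), 0
    if not buf:
        return 0, 0
    while len(buf) > 1:
        buf = reduce_pass(buf)   # fresh output buffer; the old one is read-only input
        passes += 1
    return buf[0], passes

print(multipass_sum(range(1, 9)))  # (36, 3)
```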
Multiple passes
• GPGPU
• Non-local effects
  • Shadow maps
  • Texture space
• Precomputation
  • Fix some degrees of freedom
  • Factor into functions of 1-3D
  • Project input or output into another space
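One common form of precomputation is to fix a parameter and tabulate the remaining 1D function so a shader can look it up like a texture. The sketch below assumes a Blinn-Phong-style specular term (n·h)^s with the exponent s fixed at 32, an illustrative choice rather than one from the slides, and samples the table with linear interpolation and clamp-to-edge addressing, the way a filtered 1D texture would be read. The helper names are hypothetical, and the table needs at least two entries.

```python
def build_table(f, size):
    """Precompute f at 'size' (>= 2) evenly spaced points in [0, 1]: the 1D texture."""
    return [f(i / (size - 1)) for i in range(size)]

def sample_linear(table, u):
    """Linearly filtered lookup with clamp-to-edge addressing."""
    u = min(max(u, 0.0), 1.0)
    pos = u * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    t = pos - i
    return table[i] * (1.0 - t) + table[i + 1] * t

SHININESS = 32   # the degree of freedom fixed before precomputation
spec_table = build_table(lambda n_dot_h: n_dot_h ** SHININESS, 256)
print(sample_linear(spec_table, 0.9), 0.9 ** SHININESS)
```

The trade is a little accuracy (linear filtering between 256 samples) for replacing a per-fragment `pow` with one lookup; the same pattern extends to 2D and 3D tables when fewer parameters can be fixed.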