Computer Graphics 3, Lecture 4: GPU Programming
Dr. Benjamin Mora, University of Wales Swansea
Contents
• Introduction.
• Vertex and Fragment Programs.
• Programming the GPU.
• Assembly Code.
• High Level Languages.
• Examples of applications.
• Conclusion.
Introduction
Introduction
• OpenGL (SGI) shaped the early design of current graphics processors (GPUs).
• Fixed pipeline:
  • Once the different tests are passed, the fragment color is replaced by the new (textured and interpolated) one.
  • Not realistic enough.
• The graphics pipeline is fed with primitives (triangles, points, etc.) that are rasterized.
• Two main stages:
  • Vertex processing.
  • Fragment (rasterized pixel) processing.
• These two stages have been extended (made programmable) for more realism.
Introduction
• Latest evolutions:
  • Unified shaders.
  • Automatic load balancing of the processing units between vertex and fragment programs:
    • The smaller the image, the more CPU- and vertex-bound the program is.
    • The larger the image, the more fragment/pixel-bound the program is.
    • Anti-aliasing and texture filtering settings also affect this balance.
  • Geometry shaders are discussed separately.
Vertex and Fragment Programs
Vertex and Fragment Programs (figure from Daniel Weiskopf, Basics of GPU-Based Programming, http://www.vis.uni-stuttgart.de/vis04_tutorial/vis04_weiskopf_intro_gpu.pdf)
Vertex and Fragment Programs (figure: the programmable pipeline)
Vertices → Transform and Lighting → Setup → Rasterization → Texture Fetch, Fragment Shading → Tests (z, stencil…) → Blending → Frame Buffer
• Vertex programs: user-defined vertex processing (replace Transform and Lighting).
• Fragment programs: user-defined per-pixel processing (replace Texture Fetch, Fragment Shading).
Programming the GPU
Programming the GPU
• Low-level languages (pseudo-assembler):
  • Help to understand what is possible on the GPU.
  • Large programs are a pain to maintain and optimize.
  • May be specific to the graphics card generation or vendor.
• High-level languages:
  • Easier to write.
  • Early compilers were not very good.
  • Code is likely to be more portable across cards.
  • Loops are supported.
Current Low-Level Languages (APIs)
• DirectX 9:
  • Vertex shader 2.0.
  • Pixel shader 2.0.
• OpenGL extensions:
  • GL_ARB_vertex_program.
  • GL_ARB_fragment_program.
• Vendor APIs:
  • NVIDIA vertex and fragment programs (NV_vertex_program, NV_fragment_program).
Current High-Level Languages (APIs)
• Microsoft, ATI:
  • High Level Shading Language (HLSL).
• NVIDIA:
  • Cg.
• OpenGL Shading Language (GLSL).
How to use them?
• Assembly programs:
  • Can be loaded (and compiled) at run time (OpenGL).
  • Several programs can be loaded at once, so that the suitable rendering style (i.e., program) can be applied to each scene primitive.
  • Loading them up front avoids the latency of compiling the pseudo-assembly during rendering.
• High-level programs:
  • Must be compiled first (offline, or at run time through the Cg runtime or the OpenGL driver).
  • The resulting (pseudo-)assembly code can then be used.
Vertex Programs
• Vertex program:
  • Bypasses the T&L (transform and lighting) unit.
  • GPU instruction set to perform all vertex math.
  • Input: arbitrary vertex attributes.
  • Output: transformed vertex attributes:
    • homogeneous clip-space position (required);
    • colors (front/back, primary/secondary);
    • fog coordinate;
    • texture coordinates;
    • point size.
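To make the required output concrete, here is a CPU-side sketch in C (not shader code) of how a vertex program typically produces the homogeneous clip-space position: one 4-component dot product (DP4) per output component, each taken against one row of the composite modelview-projection matrix. The `vec4` type and the function names are hypothetical, chosen only for this illustration.

```c
/* A 4-component register (x, y, z, w), as in the vertex program register file. */
typedef struct { float x, y, z, w; } vec4;

/* DP4: 4-component dot product (the ISA replicates the scalar result). */
static float dp4(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Object-space position -> homogeneous clip space.
   rows[0..3] hold the rows of the composite modelview-projection matrix,
   the way constant registers c[0]..c[3] would. */
static vec4 to_clip_space(const vec4 rows[4], vec4 opos) {
    vec4 hpos;
    hpos.x = dp4(rows[0], opos);  /* DP4 o[HPOS].x, c[0], v[OPOS]; */
    hpos.y = dp4(rows[1], opos);  /* DP4 o[HPOS].y, c[1], v[OPOS]; */
    hpos.z = dp4(rows[2], opos);  /* DP4 o[HPOS].z, c[2], v[OPOS]; */
    hpos.w = dp4(rows[3], opos);  /* DP4 o[HPOS].w, c[3], v[OPOS]; */
    return hpos;
}
```

Storing the matrix as rows makes each output component a single DP4, which is why the lighting program later in this lecture reads c[0] to c[3] one row at a time.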
Vertex Programs
• Customized computation of vertex attributes:
  • Anything that can be interpolated linearly between vertices can be computed.
• Limitations:
  • Vertices can be neither generated nor destroyed (geometry shaders are needed for that).
  • No information about the topology or ordering of vertices is available.
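The interpolation of vertex outputs across a triangle can be sketched in C with barycentric weights. This is a simplification for illustration: real rasterizers also apply perspective correction, which is ignored here, and the `color3` and `interpolate` names are hypothetical.

```c
/* An RGB attribute written by the vertex program (e.g., o[COL0]). */
typedef struct { float r, g, b; } color3;

/* Barycentric interpolation of a per-vertex attribute at one fragment.
   The weights w0 + w1 + w2 are assumed to sum to 1; no perspective correction. */
static color3 interpolate(color3 c0, color3 c1, color3 c2,
                          float w0, float w1, float w2) {
    color3 c;
    c.r = w0 * c0.r + w1 * c1.r + w2 * c2.r;
    c.g = w0 * c0.g + w1 * c1.g + w2 * c2.g;
    c.b = w0 * c0.b + w1 * c1.b + w2 * c2.b;
    return c;
}
```

Because each fragment only ever sees such a weighted blend, a vertex program can output anything whose linear blend is meaningful, but it cannot know which other vertices share its triangle.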
Vertex Programs
• Vertex programs bypass the following OpenGL functionalities:
  • Vertex transformations:
    • the modelview and projection matrix transformations;
    • normal transformations and normalizations.
  • Color material.
  • Per-vertex lighting.
  • Texture coordinate generation.
  • Texture matrix transformations.
  • Raster position transformation.
  • Client-defined clip planes.
  • Per-vertex processing in EXT_point_parameters.
  • Per-vertex processing in NV_fog_distance.
  • Per-vertex point size computations.
Vertex Programs
• What is not replaced?
  • The view-frustum clipping.
  • Perspective divide (division by w).
  • The viewport transformation.
  • The depth range transformation.
  • Clamping of the primary and secondary colors to [0,1].
  • Primitive assembly and per-fragment operations.
  • Evaluators (except the AUTO_NORMAL normalization).
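A CPU-side sketch in C of three of the fixed-function steps listed above, which still run after the vertex program: perspective divide, viewport transform and depth range transform, using the standard OpenGL formulas. Clipping is omitted. The parameters mirror `glViewport(vx, vy, width, height)` and `glDepthRange(n, f)`; the function and type names are hypothetical.

```c
typedef struct { float x, y, z, w; } vec4;
typedef struct { float x, y, z; } vec3;

/* Clip-space position (as written to o[HPOS]) -> window coordinates. */
static vec3 clip_to_window(vec4 clip, float vx, float vy,
                           float width, float height, float n, float f) {
    /* Perspective divide: clip space -> normalized device coordinates in [-1, 1]. */
    float xn = clip.x / clip.w;
    float yn = clip.y / clip.w;
    float zn = clip.z / clip.w;

    vec3 win;
    /* Viewport transform: [-1, 1] -> [vx, vx + width] and [vy, vy + height]. */
    win.x = vx + (xn + 1.0f) * 0.5f * width;
    win.y = vy + (yn + 1.0f) * 0.5f * height;
    /* Depth range transform: [-1, 1] -> [n, f]. */
    win.z = zn * 0.5f * (f - n) + 0.5f * (f + n);
    return win;
}
```

Since these steps are fixed, a vertex program only has to deliver a correct clip-space position; dividing by w itself would break the hardware's clipping and perspective-correct interpolation.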
NV Vertex Programs
• Different versions: 1.0, 1.1, 2.0, 3.0.
• Version 1.0:
  • 12 temporary vector registers (xyzw): R0 to R11.
  • 96 read-only vector registers (xyzw):
    • specified outside of glBegin/glEnd.
  • 8 matrices.
  • 17 different vertex program instructions (at most 128 instructions per program).
    • 27 instructions in the shader model 3.0 version.
NV Vertex Programs
• Input parameters for the vertices (v[]):
    Mnemonic  Number  Typical meaning
    OPOS      0       object position
    WGHT      1       vertex weight
    NRML      2       normal
    COL0      3       primary color
    COL1      4       secondary color
    FOGC      5       fog coordinate
    TEX0      8       texture coordinate 0
    TEX1      9       texture coordinate 1
    TEX2      10      texture coordinate 2
    TEX3      11      texture coordinate 3
    TEX4      12      texture coordinate 4
    TEX5      13      texture coordinate 5
    TEX6      14      texture coordinate 6
    TEX7      15      texture coordinate 7
NV Vertex Programs
• New output values for the vertices (o[]):
    Mnemonic  Typical meaning
    HPOS      Homogeneous clip-space position (x,y,z,w)
    COL0      Primary color (front-facing) (r,g,b,a)
    COL1      Secondary color (front-facing) (r,g,b,a)
    BFC0      Back-facing primary color (r,g,b,a)
    BFC1      Back-facing secondary color (r,g,b,a)
    FOGC      Fog coordinate (f,*,*,*)
    PSIZ      Point size (p,*,*,*)
    TEX0      Texture coordinate set 0 (s,t,r,q)
    TEX1      Texture coordinate set 1 (s,t,r,q)
    TEX2      Texture coordinate set 2 (s,t,r,q)
    TEX3      Texture coordinate set 3 (s,t,r,q)
    TEX4      Texture coordinate set 4 (s,t,r,q)
    TEX5      Texture coordinate set 5 (s,t,r,q)
    TEX6      Texture coordinate set 6 (s,t,r,q)
    TEX7      Texture coordinate set 7 (s,t,r,q)
NV Vertex Programs
• Vertex program instructions (inputs: s = scalar, v = vector; outputs: v = vector, ssss = scalar replicated to all four components):
    OpCode  Inputs  Output            Operation
    ARL     s       address register  address register load
    MOV     v       v                 move
    MUL     v,v     v                 multiply
    ADD     v,v     v                 add
    MAD     v,v,v   v                 multiply and add
    RCP     s       ssss              reciprocal
    RSQ     s       ssss              reciprocal square root
    DP3     v,v     ssss              3-component dot product
    DP4     v,v     ssss              4-component dot product
    DST     v,v     v                 distance vector
    MIN     v,v     v                 minimum
    MAX     v,v     v                 maximum
    SLT     v,v     v                 set on less than
    SGE     v,v     v                 set on greater than or equal
    EXP     s       v                 exponential base 2 (approximate)
    LOG     s       v                 logarithm base 2 (approximate)
    LIT     v       v                 light coefficients
NV Vertex Programs
• Special instruction manipulation:
  • Use of negated values:
    MOV R0, -R1;
    ADD R0, R1, -R2;    # R0 <= R1 - R2 (vector operation)
  • Registers can be swizzled:
    MOV R1, R1.wzyx;
    ADD R1, R1, R1.xzxy;
  • Effect of the ADD above:
              x   y   z   w
      Old R1: 1   3   7   11
      New R1: 2   10  8   14
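A small C emulation of source negation and swizzling (hypothetical names, not an NVIDIA API), which reproduces the old and new R1 values shown for `ADD R1, R1, R1.xzxy;`.

```c
/* One 4-component register; v[0..3] hold the x, y, z, w components. */
typedef struct { float v[4]; } reg4;

/* Swizzle: pattern is four characters from "xyzw", e.g. "xzxy".
   The pattern is assumed to be valid; any other character selects w. */
static reg4 swizzle(reg4 r, const char *pattern) {
    reg4 out;
    for (int i = 0; i < 4; i++) {
        int src;
        switch (pattern[i]) {
            case 'x': src = 0; break;
            case 'y': src = 1; break;
            case 'z': src = 2; break;
            default:  src = 3; break;  /* 'w' */
        }
        out.v[i] = r.v[src];
    }
    return out;
}

/* Source negation, as in "-R1". */
static reg4 negate(reg4 r) {
    reg4 out;
    for (int i = 0; i < 4; i++) out.v[i] = -r.v[i];
    return out;
}

/* ADD: component-wise addition. Both sources are read before the result is
   written, so "ADD R1, R1, R1.xzxy" uses the old R1 on both sides. */
static reg4 add(reg4 a, reg4 b) {
    reg4 out;
    for (int i = 0; i < 4; i++) out.v[i] = a.v[i] + b.v[i];
    return out;
}
```

On the GPU, swizzles and negation are free modifiers on the source operands, which is why the assembly never needs separate instructions for them.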
NV Vertex Programs
• Example: normal normalization.
    !!VP1.0
    # v[NRML] = (nx,ny,nz)
    #
    # R0.xyz = normalize(v[NRML])
    # R0.w   = 1/sqrt(nx*nx + ny*ny + nz*nz)
    #
    MOV R1, v[NRML];
    DP3 R0.w, R1, R1;
    RSQ R0.w, R0.w;
    MUL R0.xyz, R1, R0.wwww;
    # Then use R0 to compute shading...
    MOV o[COL0], ...;
    END
NV Vertex Programs
• Example: simple specular and diffuse lighting computation with an eye-space normal.
    !!VP1.0
    #
    # c[0-3]  = modelview projection (composite) matrix
    # c[4-7]  = modelview inverse transpose
    # c[32]   = normalized eye-space light direction (infinite light)
    # c[33]   = normalized constant eye-space half-angle vector (infinite viewer)
    # c[35].x = pre-multiplied monochromatic diffuse light color & diffuse material
    # c[35].y = pre-multiplied monochromatic ambient light color & diffuse material
    # c[36]   = specular color
    # c[38].x = specular power
    #
    # outputs homogeneous position and color
    #
    DP4 o[HPOS].x, c[0], v[OPOS];
    DP4 o[HPOS].y, c[1], v[OPOS];
    DP4 o[HPOS].z, c[2], v[OPOS];
    DP4 o[HPOS].w, c[3], v[OPOS];
    DP3 R0.x, c[4], v[NRML];
    DP3 R0.y, c[5], v[NRML];
    DP3 R0.z, c[6], v[NRML];          # R0 = n' = transformed normal
    DP3 R1.x, c[32], R0;              # R1.x = Lpos DOT n'
    DP3 R1.y, c[33], R0;              # R1.y = hHat DOT n'
    MOV R1.w, c[38].x;                # R1.w = specular power
    LIT R2, R1;                       # compute lighting values
    MAD R3, c[35].x, R2.y, c[35].y;   # diffuse + ambient
    MAD o[COL0].xyz, c[36], R2.z, R3; # + specular
    END
NV Fragment Programs
• Similar to vertex programs:
  • Programs are loaded the same way.
• Inputs and outputs are different.
• Different set of instructions:
  • More instructions, but mostly similar ones.
• Versions available: 1.0, 2.0 and 4.0.
• 64 constant vector registers.
• 32 registers with 32-bit floating-point precision, or 64 registers with 16-bit floating-point precision.
NV Fragment Programs
• Fragment program inputs:
    Register  Description                             Components
    f[WPOS]   Position of the fragment center         (x,y,z,1/w)
    f[COL0]   Interpolated primary color              (r,g,b,a)
    f[COL1]   Interpolated secondary color            (r,g,b,a)
    f[FOGC]   Interpolated fog distance/coordinate    (z,0,0,0)
    f[TEX0]   Texture coordinate (unit 0)             (s,t,r,q)
    f[TEX1]   Texture coordinate (unit 1)             (s,t,r,q)
    f[TEX2]   Texture coordinate (unit 2)             (s,t,r,q)
    f[TEX3]   Texture coordinate (unit 3)             (s,t,r,q)
    f[TEX4]   Texture coordinate (unit 4)             (s,t,r,q)
    f[TEX5]   Texture coordinate (unit 5)             (s,t,r,q)
    f[TEX6]   Texture coordinate (unit 6)             (s,t,r,q)
    f[TEX7]   Texture coordinate (unit 7)             (s,t,r,q)
NV Fragment Programs
• Fragment program outputs:
    Register  Description
    o[COLR]   Final RGBA fragment color, fp32 format (color programs)
    o[COLH]   Final RGBA fragment color, fp16 format (color programs)
    o[DEPR]   Final fragment depth value, fp32 format
    o[TEX0]   TEXTURE0 output, fp16 format (combiner programs)
    o[TEX1]   TEXTURE1 output, fp16 format (combiner programs)
    o[TEX2]   TEXTURE2 output, fp16 format (combiner programs)
    o[TEX3]   TEXTURE3 output, fp16 format (combiner programs)
• Output registers are write-only.
NV Fragment Programs
• Fragment program instruction set (V2.0). Suffixes: [RHX] = precision (R: fp32, H: fp16, X: fixed point); [C] = update the condition codes; [_SAT] = clamp the result to [0,1].
    Instruction          Inputs  Output  Description
    ADD[RHX][C][_SAT]    v,v     v       add
    COS[RH][C][_SAT]     s       ssss    cosine
    DDX[RH][C][_SAT]     v       v       derivative relative to x
    DDY[RH][C][_SAT]     v       v       derivative relative to y
    DP3[RHX][C][_SAT]    v,v     ssss    3-component dot product
    DP4[RHX][C][_SAT]    v,v     ssss    4-component dot product
    DST[RH][C][_SAT]     v,v     v       distance vector
    EX2[RH][C][_SAT]     s       ssss    exponential base 2
    FLR[RHX][C][_SAT]    v       v       floor
    FRC[RHX][C][_SAT]    v       v       fraction
    KIL                  none    none    conditionally discard fragment
    LG2[RH][C][_SAT]     s       ssss    logarithm base 2
    LIT[RH][C][_SAT]     v       v       compute light coefficients
    LRP[RHX][C][_SAT]    v,v,v   v       linear interpolation
    MAD[RHX][C][_SAT]    v,v,v   v       multiply and add
    MAX[RHX][C][_SAT]    v,v     v       maximum
    MIN[RHX][C][_SAT]    v,v     v       minimum
    MOV[RHX][C][_SAT]    v       v       move
    MUL[RHX][C][_SAT]    v,v     v       multiply
    PK2H                 v       ssss    pack two 16-bit floats
    PK2US                v       ssss    pack two unsigned 16-bit scalars
    PK4B                 v       ssss    pack four signed 8-bit scalars
    PK4UB                v       ssss    pack four unsigned 8-bit scalars
    POW[RH][C][_SAT]     s,s     ssss    exponentiation (x^y)
NV Fragment Programs
• Fragment program instruction set (V2.0), continued:
    Instruction          Inputs  Output  Description
    RCP[RH][C][_SAT]     s       ssss    reciprocal
    RFL[RH][C][_SAT]     v,v     v       reflection vector
    RSQ[RH][C][_SAT]     s       ssss    reciprocal square root
    SEQ[RHX][C][_SAT]    v,v     v       set on equal
    SFL[RHX][C][_SAT]    v,v     v       set on false
    SGE[RHX][C][_SAT]    v,v     v       set on greater than or equal
    SGT[RHX][C][_SAT]    v,v     v       set on greater than
    SIN[RH][C][_SAT]     s       ssss    sine
    SLE[RHX][C][_SAT]    v,v     v       set on less than or equal
    SLT[RHX][C][_SAT]    v,v     v       set on less than
    SNE[RHX][C][_SAT]    v,v     v       set on not equal
    STR[RHX][C][_SAT]    v,v     v       set on true
    SUB[RHX][C][_SAT]    v,v     v       subtract
    TEX[C][_SAT]         v       v       texture lookup
    TXD[C][_SAT]         v,v,v   v       texture lookup with partials
    TXP[C][_SAT]         v       v       projective texture lookup
    UP2H[C][_SAT]        s       v       unpack two 16-bit floats
    UP2US[C][_SAT]       s       v       unpack two unsigned 16-bit scalars
    UP4B[C][_SAT]        s       v       unpack four signed 8-bit scalars
    UP4UB[C][_SAT]       s       v       unpack four unsigned 8-bit scalars
    X2D[RH][C][_SAT]     v,v,v   v       2D coordinate transformation
NV Fragment Programs
• Simple example: red coloring of the fragments (i.e., rasterized pixels):
    !!FP1.0
    DEFINE red = {1.0, 0, 0, 0};
    MOV o[COLR], red;
    END
• Simple example: applying a single texture:
    !!FP1.0
    TEX R0, f[TEX0], TEX0, 2D;   # last parameter can be 1D, 2D, 3D, CUBE or RECT
    MOV o[COLR], R0;
    END
NV Fragment Programs
• Useful instructions:
  • LRP: linear interpolation.
  • SIN, COS, …
  • SGE, SLT, …: set each component to 1.0 or 0.0 according to the comparison.
  • KIL: conditionally discard the fragment (stop its computation).
  • Pack and unpack instructions.
• Most instructions execute in one cycle (not counting texture accesses).
• Most instructions can update the condition codes (e.g., MOV => MOVC); later writes can then be made conditional on these codes.
• Most instructions can clamp their results to [0,1] (e.g., MOV => MOV_SAT).
• Loops are possible with the latest generation.
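Scalar C versions of LRP, the _SAT clamp and the SGE/SLT comparisons make their semantics explicit. LRP takes the interpolation weight as its first operand (result = a·b + (1 − a)·c), as in the NV and ARB fragment program specifications. The function names are hypothetical, and real instructions operate on all four components at once.

```c
/* LRP: a * b + (1 - a) * c, with the weight a as the first operand. */
static float lrp(float a, float b, float c) { return a * b + (1.0f - a) * c; }

/* _SAT suffix: clamp the result to [0, 1]. */
static float sat(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* SGE / SLT: 1.0 where the comparison holds, 0.0 otherwise. */
static float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }
static float slt(float a, float b) { return a < b ? 1.0f : 0.0f; }
```

Combining them gives a branch-free select: `lrp(sge(x, 0.5f), a, b)` returns `a` when x ≥ 0.5 and `b` otherwise, a common idiom on hardware without (or with slow) branching.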
(Silly) Limitations
• Most of the limitations exist for performance reasons.
• At the fragment level, the frame buffer cannot really be accessed in read-write mode:
  • the new pixel value cannot be computed from the old one in the program.
• 32-bit floating-point filtering and blending are only available on recent graphics cards (NVIDIA GeForce 8x00 generation); previous cards (e.g., GeForce 7800 series) could only filter and blend at FP16 precision.
• The actual (physical) number of registers may be smaller than the number of logical registers:
  • programs that use many registers run slower.
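Since the fragment program cannot read the old pixel, combining old and new values is left to the fixed-function blending stage that runs afterwards. The C sketch below mirrors one common configuration, `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` with the default additive blend equation; the choice of this mode and the function name are illustrative assumptions.

```c
typedef struct { float r, g, b, a; } rgba;

/* Fixed-function blending after the fragment program:
   result = src * src.a + dst * (1 - src.a), applied to all four channels
   (glBlendFunc uses the same factors for alpha). */
static rgba blend_over(rgba src, rgba dst) {
    float k = src.a;
    rgba out;
    out.r = src.r * k + dst.r * (1.0f - k);
    out.g = src.g * k + dst.g * (1.0f - k);
    out.b = src.b * k + dst.b * (1.0f - k);
    out.a = src.a * k + dst.a * (1.0f - k);
    return out;
}
```

Only a fixed menu of such formulas is available, which is exactly the limitation above: any blend that needs arbitrary math on the old pixel value has to be emulated, for example by rendering to a texture and reading it back in a later pass.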
High Level Languages
• Why?
  • Assembly programming is tedious for long shaders.
  • Programming and debugging are inefficient or difficult.
  • High-level languages are more portable.
• But:
  • The final code may be slower than hand-written assembly.
High Level Languages: Cg Overview
• C for Graphics.
• Syntax similar to C, for easy shader writing.
• See the Cg manual: http://developer.nvidia.com/object/cg_toolkit.html
• Vertex and fragment programs take specific input vectors and values, and must return specific outputs:
  • data structures must be declared for the input and output parameters of a function.
Cg: Inputs
• Two kinds of shader inputs:
  • Varying inputs: inputs specific to each entity processed.
    • Vertex: position, normal, etc.
    • Fragment: interpolated values such as colors, texture coordinates, etc.
  • Uniform inputs: values that do not change while vertices are streamed.
    • Vertex level: transformation matrix.
    • Fragment level: constant parameters, …
Cg: Vertex Program Inputs
• Supported inputs to a Cg vertex program (binding semantics):
  • POSITION.
  • BLENDWEIGHT.
  • NORMAL.
  • TANGENT.
  • BINORMAL.
  • PSIZE.
  • BLENDINDICES.
  • TEXCOORD0 to TEXCOORD7.
• Every parameter can be declared as a float vector of 1 to 4 components (float, float2, float3, float4), e.g.:
    float3 myPosition : POSITION;
Cg: Vertex Program Inputs
• Example from the Cg User's Manual:
    struct myinputs {
        float3 myPosition : POSITION;
        float3 myNormal : NORMAL;
        float3 myTangent : TANGENT;
        float refractive_index : TEXCOORD3;
    };

    outdata foo(myinputs indata)
    {
        /* ... */
        // Within the program, the parameters are referred to as
        // "indata.myPosition", "indata.myNormal", and so on.
        /* ... */
    }
Cg: Vertex Program Inputs
• Inputs can also be specified directly as function parameters (rather than through a struct).
• Example from the Cg User's Manual:
    outdata foo(float3 myPosition : POSITION,
                float3 myNormal : NORMAL,
                float3 myTangent : TANGENT,
                float refractive_index : TEXCOORD3)
    {
        /* ... */
    }
Cg: Vertex Program Varying Output
• The vertex program output type should match the fragment program input type.
• The binding semantics help the compiler associate each vertex output with the corresponding fragment input (interoperability).
• The semantics do not actually impose a specific use for those channels:
  • for example, texture coordinates can be used to pass colors or positions.
Cg: Vertex Program Varying Output
• Supported outputs of a vertex program:
  • POSITION.
  • PSIZE.
  • FOG.
  • COLOR0 and COLOR1.
  • TEXCOORD0 to TEXCOORD7.
Cg: Vertex Program Varying Output
• Example from the Cg User's Manual:
    // Vertex program (inside a Cg file)
    struct myvf {
        float4 pout : POSITION;   // used for rasterization
        float4 diffusecolor : COLOR0;
        float4 uv0 : TEXCOORD0;
        float4 uv1 : TEXCOORD1;
    };

    myvf foo(/* ... */)
    {
        myvf outstuff;
        /* ... */
        return outstuff;
    }
Cg: Input/Output Interoperability
• Example from the Cg User's Manual:
    struct myvert2frag {
        float4 pos : POSITION;
        float4 uv0 : TEXCOORD0;
        float4 uv1 : TEXCOORD1;
    };

    // Vertex program
    myvert2frag vertmain(...)
    {
        myvert2frag outdata;
        /* ... */
        return outdata;
    }

    // Fragment program
    void fragmain(myvert2frag indata)
    {
        float4 tcoord = indata.uv0;
        /* ... */
    }
Cg: Fragment Program Varying Output
• Two supported outputs: COLOR and DEPTH.
• Examples:
    void main(/* ... */,
              out float4 color : COLOR,
              out float depth : DEPTH)
    {
        /* ... */
        color = diffuseColor * /* ... */;
        depth = /* ... */;
    }

    float4 main(/* ... */) : COLOR
    {
        /* ... */
        return diffuseColor * /* ... */;
    }
Cg: General Coding
• Several variable types are supported:
  • float, half (16-bit float), fixed (12-bit fixed point).
  • int, bool.
  • Vector types: float1, float4, bool1, bool4, …
  • Matrix types: float1x1, float2x2, …
  • Arrays.
• Auxiliary functions can be declared.
• A wide set of functions and operators is also available.
Cg: General Coding
• Control flow:
  • if, else, while, for.
• Function definitions and function overloading.
• Arithmetic operators from C.
• Multiplication function mul():
  • matrix × vector, vector × matrix, matrix × matrix.
• Vector constructors.
• Boolean and comparison operators.
• Swizzle operator:
    float4 a;  => a.xxxx
• Write mask operator:
    float4 color = float4(1.0, 1.0, 0.0, 0.0);
    color.a = 2.0;
• Conditional operator (?:).
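The two argument orders of Cg's mul() are not interchangeable: mul(M, v) treats v as a column vector, while mul(v, M) treats it as a row vector, which amounts to multiplying by the transpose of M. The C sketch below (hypothetical function names, row-major `m[row][col]` storage) shows the difference on a translation matrix.

```c
/* mul(M, v): v as a column vector, out[r] = sum_c M[r][c] * v[c]. */
static void mul_mat_vec(const float m[4][4], const float v[4], float out[4]) {
    for (int r = 0; r < 4; r++) {
        out[r] = 0.0f;
        for (int c = 0; c < 4; c++) out[r] += m[r][c] * v[c];
    }
}

/* mul(v, M): v as a row vector, out[c] = sum_r v[r] * M[r][c]. */
static void mul_vec_mat(const float v[4], const float m[4][4], float out[4]) {
    for (int c = 0; c < 4; c++) {
        out[c] = 0.0f;
        for (int r = 0; r < 4; r++) out[c] += v[r] * m[r][c];
    }
}
```

With the translation stored in the last column, mul(M, v) moves the point by (2, 3, 4), whereas mul(v, M) leaves x, y, z unchanged and corrupts w; matrices uploaded with the opposite convention must therefore use the other argument order.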
Cg: General Coding
• Standard nonprojective texture lookups:
    tex2D(sampler2D tex, float2 s);
    texRECT(samplerRECT tex, float2 s);
    texCUBE(samplerCUBE tex, float3 s);
• Standard projective texture lookups:
    tex2Dproj(sampler2D tex, float3 sq);
    texRECTproj(samplerRECT tex, float3 sq);
    texCUBEproj(samplerCUBE tex, float4 sq);
• Math functions:
  • abs, cos, sin, tan, acos, asin, atan, clamp, determinant, exp, log, floor, lerp, min, max, pow, sqrt, normalize, …
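A projective lookup such as tex2Dproj(tex, sq) samples tex2D at sq.xy / sq.z. The C sketch below shows that divide, plus one typical (assumed, not part of the Cg API) way the (s, t, q) triple is produced for projective texture mapping: the projector's clip-space (x, y, w) is biased and scaled so that, after the divide, [-1, 1] maps to [0, 1]. Both function names are hypothetical.

```c
typedef struct { float s, t; } vec2;

/* The divide performed by tex2Dproj for a float3 coordinate (s, t, q). */
static vec2 proj_coords(float s, float t, float q) {
    vec2 st = { s / q, t / q };
    return st;
}

/* Projective texture mapping (illustrative): map the projector's clip-space
   position (x, y, w) to (s, t, q) = (0.5x + 0.5w, 0.5y + 0.5w, w).
   These values are interpolated linearly; the divide by q happens per fragment. */
static void projector_coords(float x, float y, float w, float out[3]) {
    out[0] = 0.5f * x + 0.5f * w;
    out[1] = 0.5f * y + 0.5f * w;
    out[2] = w;
}
```

Deferring the divide to the fragment is what keeps the projected image undistorted: s/q is not linear across a triangle, but s and q separately are.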
Applications
Application: Procedural Texturing
• Application of textures that are not image-based.
• Combination of noise (Perlin noise) and various math expressions.
• Representation of wood, marble, stone, clouds, waves, bumps…
• Can be computed at the fragment level:
  • adds computations, but reduces bandwidth;
  • avoids the problem of mapping textures onto curved surfaces.
Ref.: New York University Media Research Lab, http://mrl.nyu.edu/projects/texture/
Application: Phong Shading
• The traditional OpenGL pipeline implements Gouraud shading:
  • colors and lighting are computed at the vertices, then linearly interpolated;
  • specular highlights that occur in the middle of a triangle can be missed.
• Phong interpolation is better:
  • first, linearly interpolate the normal across the triangle;
  • then compute Phong shading from the interpolated (renormalized) normal.
Ref.: New York University Media Research Lab, http://mrl.nyu.edu/projects/texture/
Application: Phong Shading (figure from Ian Fergusson, https://www.cis.strath.ac.uk/teaching/ug/classes/52.359/lect13.pdf)