Status – Week 281 Victor Moya
Objectives • Research in future GPUs for 3D graphics. • Simulate current and future 3D graphics hardware. • Finish (someday) the PhD ;).
Problems • Information. • Choice of the simulation target: • Current GPUs. • Near future GPUs. • Absolutely new GPU designs. • Future is hard to predict. • But GPUs change very fast. • Fierce competition between ATI and NVidia. Matrox and 3DLabs follow (3DLabs can rule workstation market). SIS and VIA as OEM.
Status • Designing a hardware 3D graphics pipeline: • Command processors. • Vertex Shader. • Divide by w, Clip, Culling and Triangle Setup. • Rasterization. • Pixel shaders. • Antialiasing. • Designing the simulator.
Geometry • Vertex operations: • (1) Transform coordinates and normal: • Model => World. • World => Eye. • (2) Normalize the length of the normal. • (3) Compute vertex lighting. • (4) Transform texture coordinates. • (5) Transform coordinates to clip coordinates (projection). • (8) Divide coordinates by w. • (9) Apply affine viewport transform (x, y, z).
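As a rough illustration of steps (1), (5), (8) and (9), the position path for one vertex can be sketched as follows. This is a minimal sketch, not the simulator's code: the function names, the row-major 4x4 matrix layout, and the OpenGL-style viewport mapping (x and y from [-1, 1] to window pixels, depth to [0, 1]) are assumptions made for the example.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def transform_vertex(position, model_view, projection, viewport):
    """Take a model-space position (x, y, z, w) to window coordinates.

    viewport is (x0, y0, width, height); this layout is an assumption.
    """
    eye = mat_vec(model_view, position)      # (1) model => world => eye
    clip = mat_vec(projection, eye)          # (5) eye => clip coordinates
    w = clip[3]
    ndc = [clip[0] / w, clip[1] / w, clip[2] / w]   # (8) divide by w
    x0, y0, width, height = viewport
    return [                                 # (9) affine viewport transform
        x0 + (ndc[0] + 1.0) * 0.5 * width,
        y0 + (ndc[1] + 1.0) * 0.5 * height,
        (ndc[2] + 1.0) * 0.5,
    ]


IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

With identity matrices and a 640x480 viewport, the origin lands at the centre of the window. Lighting, normal and texture-coordinate steps are left out.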
Geometry • Primitive operations: • (6) Primitive assembly • (7) Clipping: • (10) Backface cull: eliminate back-facing triangles. • Primitive generation: new pipeline stage (ATI TruForm).
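The backface cull in step (10) is commonly done with the sign of the triangle's screen-space area. The sketch below assumes 2D window coordinates with y pointing up and counter-clockwise front faces by default; both conventions, and the function names, are assumptions for illustration.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle (a, b, c); positive when counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_back_facing(a, b, c, front_is_ccw=True):
    """Return True when the triangle winds opposite to the front-face convention."""
    area = signed_area2(a, b, c)
    return area < 0 if front_is_ccw else area > 0
```

Zero-area (degenerate) triangles are reported as not back-facing here; a real pipeline would usually discard them separately.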
Vertex Shader • VS 1.0, 1.1 and 1.2 (current technology) for Direct3D 8 and 8.1. OpenGL extensions: ARB_vertex_program (finally in OpenGL v1.4), NV_vertex_program1_1 (NVidia), EXT_vertex_shader (ATI). • No branching. • Single cycle execution latency (?). • Single issue instruction each cycle. • Simple in order pipeline (?).
Vertex Shader • 16 input registers (read only). • 15 output registers (write only). • 12 temporary registers (read/write). • 96 constant registers (read only or read/write?). • 256 instructions max
Vertex Shader
Opcode  Inputs               Output                          Operation
        (scalar or vector)   (vector or replicated scalar)
------  ------------------   -----------------------------   --------------------------
ARL     s                    address register                address register load
MOV     v                    v                               move
MUL     v,v                  v                               multiply
ADD     v,v                  v                               add
MAD     v,v,v                v                               multiply and add
RCP     s                    ssss                            reciprocal
RSQ     s                    ssss                            reciprocal square root
DP3     v,v                  ssss                            3-component dot product
DP4     v,v                  ssss                            4-component dot product
DST     v,v                  v                               distance vector
MIN     v,v                  v                               minimum
MAX     v,v                  v                               maximum
SLT     v,v                  v                               set on less than
SGE     v,v                  v                               set on greater than or equal
EXP     s                    v                               exponential base 2
LOG     s                    v                               logarithm base 2
LIT     v                    v                               light coefficients
DPH     v,v                  ssss                            homogeneous dot product
RCC     s                    ssss                            reciprocal clamped
SUB     v,v                  v                               subtract
ABS     v                    v                               absolute value
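To make the "vector" versus "replicated scalar" outputs concrete, here is a small Python sketch of a few opcodes from the table, used to normalize a vector (DP3, RSQ, MUL). The Python function names are hypothetical stand-ins for the opcodes, registers are modelled as 4-element lists, and `rsq` assumes a strictly positive input.

```python
import math


def dp3(a, b):
    """DP3: 3-component dot product, replicated to all four components."""
    s = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return [s, s, s, s]


def dp4(a, b):
    """DP4: 4-component dot product, replicated to all four components."""
    s = sum(a[i] * b[i] for i in range(4))
    return [s, s, s, s]


def rsq(s):
    """RSQ: reciprocal square root of a scalar (assumed > 0), replicated."""
    r = 1.0 / math.sqrt(s)
    return [r, r, r, r]


def mul(a, b):
    """MUL: component-wise multiply."""
    return [a[i] * b[i] for i in range(4)]


def mad(a, b, c):
    """MAD: component-wise multiply and add."""
    return [a[i] * b[i] + c[i] for i in range(4)]


def normalize3(v):
    """Rough equivalent of: DP3 R0, v, v; RSQ R0, R0.w; MUL result, v, R0."""
    r0 = dp3(v, v)
    r0 = rsq(r0[3])
    return mul(v, r0)
```

For example, `normalize3([3.0, 0.0, 4.0, 0.0])` gives approximately `[0.6, 0.0, 0.8, 0.0]`. The w component is scaled as well; a real program would restrict the write with an `.xyz` mask.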
Clipping • Clip geometry primitives with the view frustum (6 planes). • Clip geometry primitives with the user clip planes. • Techniques used: • Guard-Band Clipping. • Homogeneous rasterization avoids clipping in the geometry stage.
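Before any actual clipping, a triangle can be classified against the six frustum planes with per-vertex outcodes. The sketch below assumes OpenGL-style clip space (-w <= x, y, z <= w); the bit assignments and function names are made up for the example, and user clip planes and guard bands are not modelled.

```python
def outcode(v):
    """Return a 6-bit code with one bit set per frustum plane the vertex is outside of."""
    x, y, z, w = v
    code = 0
    if x < -w:
        code |= 1    # left
    if x > w:
        code |= 2    # right
    if y < -w:
        code |= 4    # bottom
    if y > w:
        code |= 8    # top
    if z < -w:
        code |= 16   # near
    if z > w:
        code |= 32   # far
    return code


def classify_triangle(v0, v1, v2):
    """Trivially accept, trivially reject, or flag the triangle for real clipping."""
    c0, c1, c2 = outcode(v0), outcode(v1), outcode(v2)
    if (c0 | c1 | c2) == 0:
        return "accept"   # every vertex is inside all six planes
    if (c0 & c1 & c2) != 0:
        return "reject"   # all vertices are outside the same plane
    return "clip"         # straddles at least one plane
```

A guard band would enlarge the x and y tests so that more triangles fall into "accept" and are left to the rasterizer's scissor instead of being clipped.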
Homogeneous coordinates • “Triangle Scan Conversion using 2D Homogeneous Coordinates”, Olano and Greer.
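The core idea of that approach can be sketched as follows: the triangle's vertices are kept as 2D homogeneous points (x, y, w), the 3x3 vertex matrix is inverted, and each column of the inverse gives the coefficients of one edge function evaluated directly at pixel positions. This is a simplified reading of the technique, not code from the paper; the function names and the plain "all edge functions non-negative" coverage test are assumptions.

```python
def inverse3(m):
    """Return (inverse, determinant) of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("degenerate triangle (zero determinant)")
    inv = [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]
    return inv, det


def edge_functions(v0, v1, v2):
    """Build the three edge functions (A, B, C) from vertices given as (x, y, w)."""
    inv, det = inverse3([list(v0), list(v1), list(v2)])
    # Column k of the inverse is 1 at vertex k and 0 at the other two vertices.
    edges = [(inv[0][k], inv[1][k], inv[2][k]) for k in range(3)]
    return edges, det


def covers(edges, px, py):
    """A pixel is covered when every edge function A*px + B*py + C is non-negative."""
    return all(a * px + b * py + c >= 0.0 for a, b, c in edges)
```

Because the test works on (x, y, w) before the divide by w, a triangle with a vertex behind the eye (w < 0) needs no geometric clipping in this sketch. The sign of the determinant can also serve as a facing test.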
Rasterization • Setup (per-triangle). • Sampling (triangle = {fragments}). • Interpolation (interpolate colors and coordinates).
Rasterization • Converts primitives to fragments. • Primitive: point, line, polygon, … • Fragment: transient data structure short x, y; long depth; short r, g, b, a; • Fragment selection. • Parameter Assignment (color, depth ...).
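The three stages (setup, sampling, interpolation) can be illustrated with a small bounding-box rasterizer. This is a sketch under several assumptions: vertices are already in window coordinates as (x, y, depth, (r, g, b, a)), samples are taken at pixel centres, interpolation is linear in screen space (no perspective correction), and there is no fill rule for shared edges. The `Fragment` class mirrors the slide's structure but uses Python floats rather than the short/long fields.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    depth: float
    color: tuple  # (r, g, b, a)


def edge(a, b, p):
    """Signed area term: cross product of (b - a) and (p - a)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2):
    """Convert one triangle into a list of fragments."""
    # Setup (per triangle): total area and bounding box.
    area = edge(v0, v1, v2)
    if area == 0:
        return []
    min_x = math.floor(min(v0[0], v1[0], v2[0]))
    max_x = math.ceil(max(v0[0], v1[0], v2[0]))
    min_y = math.floor(min(v0[1], v1[1], v2[1]))
    max_y = math.ceil(max(v0[1], v1[1], v2[1]))
    fragments = []
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            # Sampling: test the pixel centre against the three edges.
            p = (x + 0.5, y + 0.5)
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue
            # Interpolation: barycentric weights blend depth and color.
            depth = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]
            color = tuple(
                w0 * c0 + w1 * c1 + w2 * c2
                for c0, c1, c2 in zip(v0[3], v1[3], v2[3])
            )
            fragments.append(Fragment(x, y, depth, color))
    return fragments
```

Dividing by the signed area makes the weights positive inside the triangle for either winding order, so no separate orientation check is needed here.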
NV_vertex_program2 • ARL (new support for four-component A0 and A1 instead of just A0.x) • ARR (similar to ARL, but rounds instead of truncating before storing the integer result in an address register) • BRA, CAL, RET (branching instructions) • COS, SIN (high-precision trigonometric functions) • FLR, FRC (floor and fraction of floating-point values) • EX2, LG2 (high-precision exponentiation and logarithm functions) • ARA (adds pairs of components of an address register; useful for looping and other operations) • SEQ, SFL, SGT, SLE, SNE, STR (“set on” instructions similar to SLT, SGE) • SSG (“set sign” operation; generates a vector holding –1.0 for negative operand components, 0 for zero-value components, and +1.0 for positive components)
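A few of these instructions are easy to pin down with scalar or per-component Python equivalents. The sketch below is only illustrative: the function names are hypothetical, ARL is assumed to take the floor of its operand (which differs from truncation toward zero for negative values), and ARR uses Python's `round`, whose behaviour on exact .5 ties may not match the hardware.

```python
import math


def arl(x):
    """ARL: address register load, assumed here to take the floor."""
    return math.floor(x)


def arr(x):
    """ARR: like ARL but rounds to the nearest integer (tie handling is an assumption)."""
    return round(x)


def flr(v):
    """FLR: per-component floor."""
    return [float(math.floor(c)) for c in v]


def frc(v):
    """FRC: per-component fraction, x - floor(x), always in [0, 1)."""
    return [c - math.floor(c) for c in v]


def ssg(v):
    """SSG: -1.0 for negative components, 0.0 for zero, +1.0 for positive."""
    return [float((c > 0) - (c < 0)) for c in v]
```

For instance, `arl(2.7)` is 2 while `arr(2.7)` is 3, which is the difference that matters when an address register indexes the constant bank.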
NV_vertex_program2 Overview • 1. Condition codes • 2. Branching & subroutines • 3. Even faster performance • 4. Nineteen new instructions • 5. New source modifiers • 6. Clip plane support • 7. More registers & instructions
NV_vertex_program2 Resource Limits • 256 vertex program parameters • Up from 96 • 16 temporary registers • Up from 12 • Two 4-component address registers • Up from one single-component address register • 256 static instructions per program • Up from 128 • Given branching, 65536 dynamic instructions can execute before termination to avoid infinite loops
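For a simulator, the dynamic instruction limit amounts to a counter in the interpreter loop. The sketch below shows one way to model it; the representation of instructions as Python callables that return the next program counter is purely hypothetical.

```python
MAX_DYNAMIC_INSTRUCTIONS = 65536  # limit quoted on the slide


def run(program, state):
    """Execute until the program ends or the dynamic instruction budget is spent.

    Each instruction is a callable (state, pc) -> next pc (hypothetical encoding).
    Returns (instructions executed, finished normally?).
    """
    pc = 0
    executed = 0
    while pc < len(program):
        if executed == MAX_DYNAMIC_INSTRUCTIONS:
            return executed, False   # forced termination
        pc = program[pc](state, pc)
        executed += 1
    return executed, True


def increment(state, pc):
    """Add one to the loop counter and fall through."""
    state["i"] += 1
    return pc + 1


def branch_if_less_than_10(state, pc):
    """Branch back to instruction 0 while the counter is below 10."""
    return 0 if state["i"] < 10 else pc + 1


COUNTING_LOOP = [increment, branch_if_less_than_10]
INFINITE_LOOP = [lambda state, pc: 0]
```

Running `COUNTING_LOOP` from `{"i": 0}` executes 20 instructions and finishes normally, while `INFINITE_LOOP` is cut off after 65536.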
NV_vertex_program2 Source Modifiers • Source operand absolute value • Example: MOV R0, |R1|; • In addition to source negation & swizzling • Example: MAD R0, -|R1|.yzwy, |R2|, -R3.w; • Swizzle, negate, & absolute value operations are “free” source modifiers
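The effect of the source modifiers can be sketched as a small operand-fetch function applied before the instruction itself. The function name and keyword arguments are made up for the example; a single-letter swizzle such as `.w` is treated as replicating that component.

```python
COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}


def source(reg, swizzle="xyzw", negate=False, absolute=False):
    """Fetch a 4-component operand with swizzle, then absolute value, then negation."""
    if len(swizzle) == 1:
        swizzle = swizzle * 4          # scalar swizzle replicates one component
    values = [reg[COMPONENT[c]] for c in swizzle]
    if absolute:
        values = [abs(v) for v in values]
    if negate:
        values = [-v for v in values]
    return values


def mad(a, b, c):
    """MAD: component-wise multiply and add."""
    return [a[i] * b[i] + c[i] for i in range(4)]


# MAD R0, -|R1|.yzwy, |R2|, -R3.w;
R1 = [1.0, -2.0, 3.0, -4.0]
R2 = [-1.0, 1.0, -1.0, 1.0]
R3 = [0.0, 0.0, 0.0, 5.0]
R0 = mad(
    source(R1, "yzwy", negate=True, absolute=True),
    source(R2, absolute=True),
    source(R3, "w", negate=True),
)
```

Here `R0` ends up as `[-7.0, -8.0, -9.0, -7.0]`. Taking the absolute value before negating is what makes `-|R1|` always non-positive.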
NV_vertex_program2 Condition Codes (1) • Condition code state • 4-component register stores condition code values • Four possible values • LT – less than zero • EQ – equal to zero • GT – greater than zero • UN – unordered, for comparisons involving NaN • Most instructions optionally update condition code state • Indicated with “C” suffix: DP4C, MOVC, etc. • “CC” pseudo-register used to just update condition codes
NV_vertex_program2 Condition Codes (2) • Optional condition code based destination masking • Example: MOV R1.xy(NE.z), R0; • Copy R0 components to R1’s X & Y components except when condition code’s Z component is EQ • Condition code rules: EQ, equal; GE, greater or equal; GT, greater than; LE, less or equal; LT, less than; NE, not equal; FL, false; and TR, true • Note that condition code masking rule can swizzle condition code components
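Condition codes and condition-code write masking can be modelled in a few lines. This is a sketch, not the extension's definition: the function names are hypothetical, and treating the unordered value UN as satisfying only NE and TR is an assumption.

```python
import math

COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}

# Which stored condition code values satisfy each rule (UN handling is an assumption).
RULES = {
    "EQ": {"EQ"},
    "GE": {"GT", "EQ"},
    "GT": {"GT"},
    "LE": {"LT", "EQ"},
    "LT": {"LT"},
    "NE": {"LT", "GT", "UN"},
    "FL": set(),
    "TR": {"LT", "EQ", "GT", "UN"},
}


def condition_code(value):
    """Classify one component as LT, EQ, GT or UN (NaN)."""
    if math.isnan(value):
        return "UN"
    if value < 0:
        return "LT"
    if value > 0:
        return "GT"
    return "EQ"


def update_cc(result):
    """What a "C"-suffixed instruction does: derive a 4-component condition code."""
    return [condition_code(c) for c in result]


def masked_mov(dest, src, cc, write_mask="xyzw", rule="TR", cc_swizzle="xyzw"):
    """MOV with a write mask and a condition code rule, e.g. MOV R1.xy(NE.z), R0."""
    if len(cc_swizzle) == 1:
        cc_swizzle = cc_swizzle * 4    # test the same CC component for every write
    out = list(dest)
    for i, name in enumerate("xyzw"):
        if name not in write_mask:
            continue
        if cc[COMPONENT[cc_swizzle[i]]] in RULES[rule]:
            out[i] = src[i]
    return out
```

With `cc = update_cc([1.0, 1.0, 0.0, 1.0])`, the call `masked_mov([0, 0, 0, 0], [1, 2, 3, 4], cc, "xy", "NE", "z")` leaves the destination untouched because the Z condition code is EQ.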