NVIDIA Hardware. Karl Hillesland. November 2, 2000.
Cards discussed
• Major release in fall, improvement in spring
• NV10: GeForce 256 (Fall 1999)
• NV15: GeForce2 GTS (Spring 2000)
• NV11: GeForce2 MX (Summer 2000)
• NV16: GeForce2 Ultra (Fall 2000)
• NV20: ??? (Anandtech: Dec 2000 - April 2001)
• NV25?: X-Box (Fall 2001)
GeForce 256
• 0.22 µm, 23 M transistors
• 120 MHz core
• 128-bit memory, 166 MHz SDR or 150 MHz DDR, up to 128 MB (64 MB is the biggest I’ve ever heard of)
• AGP 4x with fast writes
• 350 MHz RAMDAC
• DVD
• TV-out
GeForce 256 Triangles
• 15 MTris/s (BenMark5 gives 13M; I have seen other references to 14.5M)
• Up to 6 triangles “in-flight” at a time
• 2-matrix vertex skinning
• Texture coordinate generation (+ emboss, reflection, cube map)
• 8 lights
QuadEngine™ Architecture (from summer 99 notes)
• Four Independent Pipelined Engines
  • Transform Engine
  • Lighting Engine
  • Setup Engine
  • Rendering Engine
• Industry-leading 3D performance
• 15-25M triangles/second
• Sustained DMA, transform/clip/light, setup, rasterize and render rate
• Extremely efficient
• >70% of the chip active at all times
• Up to 6 triangles “in flight” at a time
• Super-pipelined design
• Very low latency between engines
GeForce 256 pixels/texels
• 4 pixel pipes, one texture each
• Can do 2-texture multi-texturing by coupling pipes
• 24/8-bit Z/stencil, 32-bit color (note: 4 * (24 + 8 + 32) = 256)
• Register Combiners
• Texture Compression
• 8-tap anisotropic filtering
• Range-based fog
• Anti-aliasing(?)
GeForce 256 -> GeForce2 GTS
• 2 textures per pipe
• 25 M transistors
• 0.18 micron technology
• 200 MHz core clock, 166 MHz DDR (“333” MHz) memory
• 25 MTris/s (BenMark5 gives 24 MTris/s)
• Flat panel
GeForce2 GTS -> GeForce2 MX
• Remove two pixel pipes (left with 2, 2 textures each)
• Dual head support
• “Digital Vibrance Control”
• Low power and heat
• Slower core clock (175 MHz)
• Either 64- or 128-bit memory possible
• Cheaper (intended for the ~$100 range)
GeForce2 GTS -> GeForce2 Ultra
• Faster core clock: 250 MHz
• Faster memory: 225 MHz DDR (“450” MHz)
• Expensive: ~$500
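As a rough cross-check of the four GeForce parts above, peak fill rate follows from pixel pipes × core clock, and peak texel rate from that × textures per pipe. The sketch below uses only the pipe counts and clocks quoted on these slides; the function name is illustrative, the Ultra is assumed to keep the GTS pipe layout, and sustained rates will be lower once memory bandwidth is taken into account.

```python
# Peak fill rates from the slide figures: (pixel pipes, textures per pipe, core MHz).
CARDS = {
    "GeForce 256": (4, 1, 120),
    "GeForce2 GTS": (4, 2, 200),
    "GeForce2 MX": (2, 2, 175),
    "GeForce2 Ultra": (4, 2, 250),  # assumption: same pipe layout as the GTS
}


def peak_fill_rates(pipes, textures_per_pipe, core_mhz):
    """Return (Mpixels/s, Mtexels/s), assuming one pixel per pipe per clock."""
    mpixels = pipes * core_mhz
    return mpixels, mpixels * textures_per_pipe


for name, spec in CARDS.items():
    mpix, mtex = peak_fill_rates(*spec)
    print(f"{name}: {mpix} Mpixels/s, {mtex} Mtexels/s")
```

These are upper bounds only; they say nothing about how often the pipes stall waiting on memory.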
GeForce -> Quadro
• Increased clock rates
• Acceleration of some common CAD-oriented features (e.g., anti-aliased lines)
Bandwidths
• AGP 4x: 1.2 GB/s
• Video memory: 333 MHz * 128 bits = 5.3 GB/s
• PCI: 132 MB/s
• Host: PC100 with SDRAM = 1.6 GB/s
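The video memory and PCI figures can be reproduced from clock × bus width. This is a minimal sketch: the helper name is made up for illustration, and the 33 MHz / 32-bit PCI parameters are the standard values for that bus rather than something stated on the slide.

```python
def bus_bandwidth_gb_s(effective_mhz, bus_bits):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) for a given effective clock and width."""
    bytes_per_transfer = bus_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


video = bus_bandwidth_gb_s(333, 128)  # 166 MHz DDR ("333" MHz), 128-bit
pci = bus_bandwidth_gb_s(33, 32)      # assumption: 33 MHz, 32-bit PCI

print(f"video memory: {video:.2f} GB/s")
print(f"PCI: {pci * 1000:.0f} MB/s")
```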
Vertex Bandwidth
• Q3 -> 18 bytes per vertex
  • position: 2 * 3 = 6 bytes
  • texture coords, 2 textures: 2 * 2 * 2 = 8 bytes
  • color: 4 bytes
• The double eagle: 10/16 bytes per vertex
  • position: 2 * 3 = 6 bytes
  • color: 4 bytes
Vertex Bandwidth, Q3
• AGP 4x: 1.2 GB/s / 18 = 67 M Verts/s
• Video memory: 5.3 GB/s / 18 = 294 M Verts/s
• PCI: 132 MB/s / 18 = 7.3 M Verts/s
• Host: PC100 with SDRAM: 1.6 GB/s / 18 = 89 M Verts/s
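The per-bus vertex rates are just bandwidth divided by vertex size. A small sketch, using the 18-byte Q3 vertex and the bandwidth figures quoted in these slides (names are illustrative):

```python
# Q3 vertex: 3 two-byte position components, 2 texture-coordinate sets of
# 2 two-byte components each, and a 4-byte color.
Q3_VERTEX_BYTES = 2 * 3 + 2 * 2 * 2 + 4

BUS_BYTES_PER_S = {
    "AGP 4x": 1.2e9,
    "Video memory": 5.3e9,
    "PCI": 132e6,
    "Host (PC100 SDRAM)": 1.6e9,
}


def vertices_per_second(bytes_per_s, vertex_bytes=Q3_VERTEX_BYTES):
    """Upper bound on vertices/s if the bus carried nothing but vertex data."""
    return bytes_per_s / vertex_bytes


for bus, bandwidth in BUS_BYTES_PER_S.items():
    print(f"{bus}: {vertices_per_second(bandwidth) / 1e6:.1f} M verts/s")
```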
Add indices
• Assume “perfect strips” (one new vertex for each triangle)
• Each triangle -> 3 indices, 1 new vertex
• 18 + 2 bytes/index * 3 indices/tri = 24 bytes/tri
• Indices and vertices may come across different buses
• Vertex cache can save some bandwidth
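The per-triangle cost with indices can be sketched as below. The function and its defaults are illustrative; the byte sizes are the ones used in the vertex bandwidth slides. With three 2-byte indices and one new 18-byte vertex the formula gives 24 bytes per triangle; if only one index per triangle is sent (as in an indexed strip), it gives 20.

```python
def bytes_per_triangle(vertex_bytes=18, index_bytes=2, new_vertices=1, indices=3):
    """Bytes moved per triangle: newly fetched vertex data plus index data."""
    return new_vertices * vertex_bytes + indices * index_bytes


indexed_list = bytes_per_triangle()            # 3 indices per triangle
indexed_strip = bytes_per_triangle(indices=1)  # 1 index per triangle
no_reuse = bytes_per_triangle(new_vertices=3)  # no vertex sharing at all

print(indexed_list, indexed_strip, no_reuse)
print(f"AGP 4x at 1.2 GB/s: {1.2e9 / indexed_list / 1e6:.0f} M tris/s")
```

The last case shows why vertex reuse matters: without it, each triangle costs three full vertices plus its indices.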
[Figure: conventional texture pipeline. Texture fetching supplies Tex0 and Tex1; the fragment color passes through Texture Environment 0 and Texture Environment 1 (texture compositing), then color sum (specular color), then fog application (fog color/factor).]
Register Combiners
• Replace blending of fragment, texture, fog, and secondary colors
• Provide configurable 8-bit, signed-math per-pixel operations
• Cascading of register combiners for more sophisticated computations (hardware limit on levels; currently 2)
Register Combiners
[Figure: register combiner data flow. Texture fetching loads Texture 0 and Texture 1 into the register set alongside fragment color, specular color, fog color/factor and Spare 0. General Combiner 0 and General Combiner 1 each take 4 RGB inputs and 4 alpha inputs and produce 3 RGB outputs and 3 alpha outputs. The Final Combiner takes 6 RGB inputs and 1 alpha input.]
Input/Output mappings
• Input mappings
  • Invert
  • Negate
  • Bias by 1/2
  • Expand by 2
• Output mappings
  • Bias by 1/2
  • Scale by 1/2, 2 or 4
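The mappings above can be sketched numerically as follows. This is an illustration, not the hardware definition: the mode names are made up here, and the clamping ranges and the order "bias, then scale" on output follow my reading of the NV_register_combiners extension and should be treated as assumptions.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def input_map(x, mapping):
    """Apply one input mapping to a register component (mode names are hypothetical)."""
    u = clamp(x, 0.0, 1.0)
    if mapping == "identity":
        return u
    if mapping == "invert":          # 1 - x
        return 1.0 - u
    if mapping == "half_bias":       # bias by 1/2: [0, 1] -> [-0.5, 0.5]
        return u - 0.5
    if mapping == "expand":          # expand by 2: [0, 1] -> [-1, 1]
        return 2.0 * u - 1.0
    if mapping == "negate":          # signed negate
        return -clamp(x, -1.0, 1.0)
    raise ValueError(f"unknown mapping: {mapping}")


def output_map(x, scale=1.0, bias=False):
    """Optionally subtract 1/2, then scale by 1/2, 1, 2 or 4, then clamp to [-1, 1]."""
    if bias:
        x -= 0.5
    return clamp(x * scale, -1.0, 1.0)


print(input_map(0.75, "expand"))              # 0.5
print(output_map(0.75, scale=2.0, bias=True)) # 0.5
```

Expand is what lets an unsigned texture store a signed vector such as a normal; bias plus scale on output is the inverse trip.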
General Combiner, RGB
• Inputs A, B, C, D: each is read from an input register through an input map
• Input registers: primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero
• Output registers: the same register set (the diagram marks some registers as not writeable and some as not readable)
• Computations, followed by scale and bias:
  • A B + C D -or- A B mux C D
  • A B -or- A · B (dot product)
  • C D -or- C · D (dot product)
General Combiner, Alpha
• Inputs A, B, C, D: each is read from an input register through an input map
• Input registers: primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero
• Output registers: the same register set (the diagram marks some registers as not writeable and some as not readable)
• Computations, followed by scale and bias:
  • A B + C D -or- A B mux C D
  • A B
  • C D
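To make the general combiner's three RGB outputs concrete, here is a minimal software sketch. The function names and keyword flags are hypothetical. The mux rule (pick C D when spare 0 alpha is at least 0.5, otherwise A B) and the clamp to [-1, 1] are assumptions based on my recollection of the NV_register_combiners extension, and the sketch ignores any hardware restrictions on combining dot products with the sum/mux output.

```python
def clamp_signed(x):
    return max(-1.0, min(1.0, x))


def mul(a, b):
    """Component-wise product of two RGB triples."""
    return tuple(x * y for x, y in zip(a, b))


def dot(a, b):
    """Dot product, replicated across R, G and B."""
    s = sum(x * y for x, y in zip(a, b))
    return (s, s, s)


def general_combiner_rgb(a, b, c, d, ab_dot=False, cd_dot=False,
                         mux=False, spare0_alpha=0.0):
    """Return the three RGB outputs: (AB, CD, AB+CD or AB mux CD)."""
    ab = dot(a, b) if ab_dot else mul(a, b)
    cd = dot(c, d) if cd_dot else mul(c, d)
    if mux:
        third = cd if spare0_alpha >= 0.5 else ab  # assumed mux rule
    else:
        third = tuple(x + y for x, y in zip(ab, cd))
    return tuple(tuple(clamp_signed(v) for v in out) for out in (ab, cd, third))


a, b = (0.5, 0.5, 0.5), (1.0, 0.0, 0.5)
c, d = (0.5, 1.0, 0.0), (0.5, 0.5, 1.0)
print(general_combiner_rgb(a, b, c, d))
```

The alpha portion works the same way on scalars, without the dot-product options.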
Final Combiner
• Inputs A, B, C, D, E, F, G: each is read from an input register through an input map
• Input registers: primary color, secondary color, texture 0, texture 1, spare 0, spare 1, fog, constant color 0, constant color 1, zero, plus the product E F and the sum spare 0 + secondary color
• fragment RGB out = A B + (1 - A) C + D
• fragment Alpha out = G
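The final combiner formula is a linear interpolation between B and C controlled by A, plus an additive term D. A minimal sketch follows; the function name is illustrative, and the clamp of the result to [0, 1] is an assumption. The example uses A as a blend factor between a fragment color and a fog color, which is one common way to configure it.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))


def final_combiner(a, b, c, d, g):
    """RGB out = A*B + (1 - A)*C + D per component; alpha out = G."""
    rgb = tuple(clamp01(ai * bi + (1.0 - ai) * ci + di)
                for ai, bi, ci, di in zip(a, b, c, d))
    return rgb, clamp01(g)


blend = (0.25, 0.25, 0.25)   # A: weight given to the fragment color
color = (1.0, 0.5, 0.0)      # B: fragment color
fog = (0.0, 0.0, 1.0)        # C: fog color
extra = (0.0, 0.0, 0.0)      # D: additive term, unused here

print(final_combiner(blend, color, fog, extra, 0.5))
```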
X-Box (Abrash in Dr. Dobb's)
• Intel PIII/733 with 238 KB cache
• 250-300 MHz core
• DVD, hard disk
• Custom sound with 64 3D-audio channels
X-Box Transform/lighting
• 125 MTris/s: Gouraud, transformed, shaded, two textures
  • + one infinite light: 62.45 MTris/s
  • 8 local lights: 8 MTris/s
• 125 M particles/s (single-color, front-facing squares)
• Vertex Programs
• Surface engine “works with CPU” for Catmull-Clark, Bezier, Loop, and uniform B-splines at 50 MTris/s
Vertex Programs
• Replace transformation and lighting
• Custom vertex lighting
• Custom skinning and blending
• Custom texture coordinate generation
• Custom matrix operations
• Custom vertex computations of your choice
Vertex Programs
• Input is an untransformed, unlit vertex
• Create a transformed vertex
• Optionally compute
  • lighting
  • texture coordinates
  • fog coordinates
  • point sizes
Vertex Programs cont.
• Does 4-vector floating-point math
• 17 instructions: ARL, MOV, MUL, ADD, MAD, RCP, RSQ, DP3, DP4, DST, MIN, MAX, SLT, SGE, EXP, LOG, LIT
Vertex Program Registers
• 16x4 Vertex Attribute Registers (input)
• 96x4 Program Parameters (e.g., modelview-projection matrix)
• Vertex Program: 128 instructions
• 12x4 Temporary Registers
• 15x4 Vertex Result Registers (output)
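To show how the register files fit together, here is a small software emulation of the most basic vertex program: four DP4 instructions that multiply the input position by a matrix held in program parameters 0-3. It is only a sketch. The Python names are invented for illustration, and the assembly in the comments is written from memory of the NV_vertex_program syntax, so treat it as an assumption.

```python
def dp4(a, b):
    """DP4: 4-component dot product, with the scalar result replicated to all lanes."""
    s = sum(x * y for x, y in zip(a, b))
    return (s, s, s, s)


def transform_position(params, position):
    """Emulates (assumed syntax):
         DP4 o[HPOS].x, c[0], v[OPOS];
         DP4 o[HPOS].y, c[1], v[OPOS];
         DP4 o[HPOS].z, c[2], v[OPOS];
         DP4 o[HPOS].w, c[3], v[OPOS];
    """
    return tuple(dp4(params[row], position)[0] for row in range(4))


# 96 program parameter registers of 4 components each; rows 0-3 hold the matrix.
params = [(0.0, 0.0, 0.0, 0.0)] * 96
params[0] = (1.0, 0.0, 0.0, 2.0)  # a simple translation by (2, 3, 4)
params[1] = (0.0, 1.0, 0.0, 3.0)
params[2] = (0.0, 0.0, 1.0, 4.0)
params[3] = (0.0, 0.0, 0.0, 1.0)

print(transform_position(params, (1.0, 1.0, 1.0, 1.0)))
```

Four instructions out of 128 and four parameter registers out of 96 leave plenty of room for lighting, skinning, or texture coordinate generation in the same program.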
Using Vertex Programs (OpenGL)
• Programs are arrays of GLubytes (“strings”)
• Created/managed similarly to texture objects
• No penalty for switching in and out of vertex program mode
• Execution time ~proportional to length of program
X-Box memory bandwidth
• UMA with GPU in control
• 64 MB, 128-bit, 200 MHz DDR RAM
• 1 GPix/s fill rate
• + “occlusion circuitry”
• “automatic z compression”
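The memory figures above imply a peak bandwidth that can be worked out the same way as for the GeForce cards. A short sketch, with an illustrative helper name; the per-pixel budget at the end simply divides that bandwidth by the quoted fill rate and ignores the CPU's share of the unified memory.

```python
def ddr_bandwidth_gb_s(clock_mhz, bus_bits, transfers_per_clock=2):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes); DDR moves data twice per clock."""
    return clock_mhz * 1e6 * transfers_per_clock * (bus_bits / 8) / 1e9


xbox_bw = ddr_bandwidth_gb_s(200, 128)
bytes_per_pixel_budget = xbox_bw * 1e9 / 1e9  # at 1 GPix/s fill rate

print(f"peak memory bandwidth: {xbox_bw:.1f} GB/s")
print(f"budget at 1 GPix/s: {bytes_per_pixel_budget:.1f} bytes per pixel")
```

A budget of a few bytes per pixel is tight for 32-bit color plus Z traffic, which is presumably where the occlusion circuitry and Z compression come in.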
X-Box Textures
• 4 textures per pixel (but takes two clocks for >2)
• One texture can be used as a lookup into the next texture
• 8 general register combiners + final combiner
• 3D textures
• Cube maps, compression, etc.
• 2- or 4-sample anti-aliasing
Texture compression (OpenGL)
• DXTC/S3TC
• Pre-compressed (DDS file)
• Compressed by driver
• DXT1/S3TC, DXT3, DXT5 (not DXT2, DXT4)
• Ugly (be careful of trickery though)
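For a sense of what the formats above save, the storage cost of a DXT texture follows from its block layout: every 4x4 texel block takes 8 bytes in DXT1 and 16 bytes in DXT3 or DXT5. The helper below is an illustrative sketch for a single mip level, compared against uncompressed 32-bit RGBA.

```python
BLOCK_BYTES = {"DXT1": 8, "DXT3": 16, "DXT5": 16}  # bytes per 4x4 texel block


def compressed_size(width, height, fmt):
    """Bytes for one mip level; dimensions are rounded up to whole 4x4 blocks."""
    blocks_wide = (width + 3) // 4
    blocks_high = (height + 3) // 4
    return blocks_wide * blocks_high * BLOCK_BYTES[fmt]


width = height = 256
uncompressed = width * height * 4  # 32-bit RGBA

for fmt in ("DXT1", "DXT3", "DXT5"):
    size = compressed_size(width, height, fmt)
    print(f"{fmt}: {size} bytes, {uncompressed // size}:1 vs 32-bit RGBA")
```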