Introduction to Graphics Hardware and GPUs Yannick Francken Tom Mertens
Overview • Definition • Motivation • History of Graphics Hardware • Graphics Pipeline • Vertex and Fragment Shaders • Modern Graphics Hardware • Stream Programming • GPU Stream Programming • Languages • Exercise • More Information
Definition • Logical representation of visual information → Output signal
Motivation • Real Time: 15 – 60 fps • High Resolution
Motivation • High CPU load • Physics, AI, sound, network, … • Graphics demand: • Fast memory access • Many lookups [ vertices, normals, textures, … ] • High bandwidth usage • A few GB/s needed in regular cases! • Large number of flops • Flops = Floating Point Operations [ ADD, MUL, SUB, … ] • Illustration: matrix-vector products (16 MUL + 12 ADD) × (#vertices + #normals) × fps = (28 flops) × (6,000,000) × 30 ≈ 5 GFlops per second • Conclusion: Real-time graphics needs supporting hardware!
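The illustration above can be checked with a few lines of arithmetic. This is only a sketch of the slide's estimate: the function name `required_flops_per_second` is made up for this example, and the 6,000,000 vectors and 30 fps are the slide's own figures.

```python
# A 4x4 matrix times a 4-vector: each of the 4 output components
# needs 4 multiplications and 3 additions.
MULS_PER_PRODUCT = 4 * 4                                   # 16 MUL
ADDS_PER_PRODUCT = 4 * 3                                   # 12 ADD
FLOPS_PER_PRODUCT = MULS_PER_PRODUCT + ADDS_PER_PRODUCT    # 28 flops


def required_flops_per_second(num_vectors, fps):
    """Flops per second to transform num_vectors vectors in every frame.

    Hypothetical helper, named for this example only.
    """
    return FLOPS_PER_PRODUCT * num_vectors * fps


# Figures from the slide: 6,000,000 vertices + normals at 30 frames per second.
estimate = required_flops_per_second(6_000_000, 30)
print(f"{estimate / 1e9:.2f} GFlops per second")  # prints "5.04 GFlops per second"
```

The estimate covers the transform step only; lighting, texturing and rasterization come on top of it.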
History of Graphics Hardware • … - mid ’90s • SGI mainframes and workstations • PC: only 2D graphics hardware • mid ’90s • Consumer 3D graphics hardware (PC) • 3dfx, nVidia, Matrox, ATI, … • Triangle rasterization (only) • Cheap: pushed by game industry • 1999 • PC-card with TnL [ Transform and Lighting ] • nVidia GeForce: Graphics Processing Unit (GPU) • PC-card more powerful than specialized workstations • Modern graphics hardware • Graphics pipeline partly programmable • Leaders: ATI and nVidia • “ATI Radeon X1950” and “nVidia GeForce 8800” • Game consoles similar to GPUs (Xbox)
Graphics Pipeline • Application: LOD selection, frustum culling, portal culling, … • Geometry Processing: modelview/projection transform, clipping, lighting, division by w, primitive assembly, viewport transform, backface culling • Rasterization: scan conversion, fragment shading [ color and texture interpolation ], frame buffer ops [ Z-buffer, alpha blending, … ] • Output: output to device
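Two of the geometry processing steps named above, division by w and the viewport transform, can be sketched as follows. The function names are hypothetical, and the conventions are assumptions in the OpenGL style: normalized device coordinates in [-1, 1], a viewport with its origin at (0, 0), and a depth range of [0, 1].

```python
def perspective_divide(clip):
    """Division by w: clip coordinates (x, y, z, w) -> normalized device coordinates."""
    x, y, z, w = clip
    if w == 0:
        raise ValueError("w must be non-zero for the perspective divide")
    return (x / w, y / w, z / w)


def viewport_transform(ndc, width, height):
    """Map NDC in [-1, 1] to window coordinates.

    Assumes a viewport at origin (0, 0) and a depth range of [0, 1].
    """
    x, y, z = ndc
    screen_x = (x + 1.0) * 0.5 * width
    screen_y = (y + 1.0) * 0.5 * height
    depth = (z + 1.0) * 0.5
    return (screen_x, screen_y, depth)


# A vertex after the modelview/projection transform (made-up values).
clip_position = (2.0, -1.0, 0.0, 4.0)
ndc = perspective_divide(clip_position)
print(ndc)                                # (0.5, -0.25, 0.0)
print(viewport_transform(ndc, 640, 480))  # (480.0, 180.0, 0.5)
```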
Graphics Pipeline [ programmable ] • Application: LOD selection, frustum culling, portal culling, … • Geometry Processing: VERTEX SHADER [ programmable ], clipping, division by w, primitive assembly, viewport transform, backface culling • Rasterization: scan conversion, FRAGMENT SHADER [ programmable ] • Output: output to device
Vertex and Fragment Shaders • VERTEX SHADER: ( x, y, z, w ), ( nx, ny, nz ), ( s, t, r, q ), ( r, g, b, a ) → ( x’, y’, z’, w’ ), ( nx’, ny’, nz’ ), ( s’, t’, r’, q’ ), ( r’, g’, b’, a’ ) • FRAGMENT SHADER: ( x, y ), ( r, g, b, a ), ( depth ) → ( x, y ), ( r’, g’, b’, a’ ), ( depth’ )
Modern Graphics Hardware • GPU = Graphics Processing Unit • Vector processor • Operates on 4-tuples • Position ( x, y, z, w ) • Color ( red, green, blue, alpha ) • Texture coordinates ( s, t, r, q ) • One 4-tuple operation per clock cycle • SIMD [ Single Instruction Multiple Data ] • ADD, MUL, SUB, DIV, MADD, …
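What a single 4-tuple instruction such as MADD computes can be sketched as below. The helper `madd` and the sample values are made up for illustration. Python evaluates the four components one after another, whereas the slide describes hardware that handles all four in one clock cycle.

```python
def madd(a, b, c):
    """Component-wise multiply-add on 4-tuples: a * b + c.

    Emulates one SIMD instruction applied to all four components.
    """
    if not (len(a) == len(b) == len(c) == 4):
        raise ValueError("madd expects three 4-tuples")
    return tuple(ai * bi + ci for ai, bi, ci in zip(a, b, c))


# Scale and bias an RGBA color (made-up values).
color = (0.5, 0.25, 1.0, 1.0)
scale = (2.0, 2.0, 0.5, 1.0)
bias = (0.0, 0.5, 0.0, 0.0)
print(madd(color, scale, bias))  # (1.0, 1.0, 0.5, 1.0)
```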
Modern Graphics Hardware • Pipelining • Number of stages • Parallelism • Number of parallel processes • Parallelism + pipelining • Number of parallel pipelines • [ Figure: parallel pipelines, each with stages 1, 2, 3 ]
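The benefit of combining pipelining with parallelism can be estimated with an idealized cycle count. The model below is an assumption for illustration and not a description of any particular GPU: every stage takes one cycle, there are no stalls, and items are spread evenly over the pipelines. Both function names are hypothetical.

```python
import math


def sequential_cycles(num_items, num_stages):
    """No pipelining: each item passes through all stages before the next one starts."""
    return num_items * num_stages


def pipelined_cycles(num_items, num_stages, num_pipelines=1):
    """Idealized pipelining across num_pipelines identical pipelines.

    The first item needs num_stages cycles to fill the pipeline;
    after that, one item completes per cycle in each pipeline.
    """
    if num_items == 0:
        return 0
    items_per_pipeline = math.ceil(num_items / num_pipelines)
    return num_stages + items_per_pipeline - 1


print(sequential_cycles(1000, 3))    # 3000
print(pipelined_cycles(1000, 3))     # 1002
print(pipelined_cycles(1000, 3, 8))  # 127
```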
Modern Graphics Hardware • Parallelism + pipelining: ATI Radeon 9700 • 4 vertex pipelines • 8 pixel pipelines
Modern Graphics Hardware • Features of ATI Radeon X1900 XTX • Core speed 650 MHz • 48 pixel shader processors • 8 vertex shader processors • 51 GB/s memory bandwidth • 512 MB memory
Modern Graphics Hardware • High Memory Bandwidth • Graphics card [ parallel processes ]: GPU 650 MHz, graphics memory ½ GB, high bandwidth 51 GB/s, output • AGP bus: 2 GB/s • Processor chip: CPU 3 GHz, cache ½ MB, high bandwidth 77 GB/s • Main memory 1 GB, AGP memory ½ GB: 3 GB/s
Stream Programming • Input: stream of data records • Output: stream(s) of data records • Kernel: operates sequentially on the data records, accessing one record at a time! • Read-only memory: record-independent, read-only memory
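The four ingredients above (input stream, output stream, kernel, read-only memory) can be sketched on the CPU as follows. All names (`run_kernel`, `scale_and_offset`, `constants`) are hypothetical and chosen for this example; the sketch shows the programming model only, not how a GPU executes it.

```python
from types import MappingProxyType


def run_kernel(kernel, input_stream, read_only):
    """Apply the kernel to one record at a time and yield the output stream.

    The kernel sees only the current record and the shared read-only memory,
    never neighbouring records.
    """
    for record in input_stream:
        yield kernel(record, read_only)


def scale_and_offset(record, read_only):
    """Example kernel: one output record per input record."""
    return record * read_only["scale"] + read_only["offset"]


# Record-independent memory; MappingProxyType makes the mapping read-only.
constants = MappingProxyType({"scale": 2.0, "offset": 1.0})

output_stream = list(run_kernel(scale_and_offset, [0.0, 1.0, 2.5], constants))
print(output_stream)  # [1.0, 3.0, 6.0]
```

Because each record is processed independently, the loop could in principle be run on many records at once, which is what makes the model a good fit for parallel hardware.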
GPU Stream Programming • Vertex Shader • Input and output streams • Vertices, normals, colors, texture coordinates • Read-only memory • Uniform variables • Uniform = constant per stream • Textures, floats, ints, arrays, … • Fragment Shader • Input and output streams • Pixels • Z-values • Read-only memory • See above
Languages • Assembly • Cg [ C for Graphics ] • HLSL [ High Level Shading Language ] • GLSL [ OpenGL Shading Language ] • Sh • BrookGPU
Assembly • Specialized instruction set • DP4: 4-tuple dot product • RSQ: reciprocal square root • MAD: multiply and add • DPH: homogeneous dot product • SCS: sine and cosine • LRP: linear interpolate • TEX: texture map • … • Nowadays rarely written by hand anymore • Generated by high-level language compilers

    !!ARBvp1.0
    ATTRIB pos = vertex.position;
    PARAM mat[4] = { state.matrix.mvp };
    # Transform by concatenation of the
    # MODELVIEW and PROJECTION matrices.
    DP4 result.position.x, mat[0], pos;
    DP4 result.position.y, mat[1], pos;
    DP4 result.position.z, mat[2], pos;
    DP4 result.position.w, mat[3], pos;
    # Pass the primary color through w/o lighting.
    MOV result.color, vertex.color;
    END
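The four DP4 instructions in the program above amount to one matrix-vector product: each output component is the dot product of one matrix row with the position. A CPU-side sketch of that computation follows. The names `dp4` and `transform_position` and the translation matrix are made up for this example and are not part of the ARB program.

```python
def dp4(a, b):
    """Emulates DP4: dot product of two 4-tuples."""
    return sum(x * y for x, y in zip(a, b))


def transform_position(mat_rows, pos):
    """Four DP4s, one per matrix row, as in the vertex program above."""
    return tuple(dp4(row, pos) for row in mat_rows)


# Stand-in for state.matrix.mvp: a translation by (1, 2, 3), given as rows.
mvp = [
    (1.0, 0.0, 0.0, 1.0),
    (0.0, 1.0, 0.0, 2.0),
    (0.0, 0.0, 1.0, 3.0),
    (0.0, 0.0, 0.0, 1.0),
]
print(transform_position(mvp, (1.0, 1.0, 1.0, 1.0)))  # (2.0, 3.0, 4.0, 1.0)
```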
Cg / HLSL / GLSL • High-level programming languages • Static conditional jumps • if, while, for, … • Data-dependent conditional jumps • SIMD • Fragment shader: only efficient in case of coherent program flow! • No pointers!

    // Per-vertex input supplied by the application
    struct appdata {
        float4 position    : POSITION;
        float3 normal      : NORMAL;
        float3 color       : DIFFUSE;
        float3 VertexColor : SPECULAR;
    };

    // Vertex-to-fragment connector
    struct vfconn {
        float4 HPOS : POSITION;
        float4 COL0 : COLOR0;
    };

    vfconn main( appdata IN,
                 uniform float4 Kd,
                 uniform float4x4 mvp )
    {
        vfconn OUT;
        OUT.HPOS = mul( mvp, IN.position );
        OUT.COL0.xyz = Kd.xyz * IN.VertexColor.xyz;
        OUT.COL0.w = 1.0;
        return OUT;
    }
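Why data-dependent branches are only efficient under coherent program flow can be illustrated with a deliberately simplified cost model. The model is an assumption for this sketch: fragments in a SIMD group run in lockstep, so the group pays for every branch that at least one of its fragments takes. Actual hardware behaviour varies, and the function name `group_cost` and the cost figures are hypothetical.

```python
def group_cost(conditions, cost_if, cost_else):
    """Cost of an if/else for one group of fragments executing in lockstep.

    conditions holds the branch condition evaluated per fragment.
    Simplified model: a branch is paid for by the whole group as soon as
    any fragment in the group takes it.
    """
    cost = 0
    if any(conditions):
        cost += cost_if
    if not all(conditions):
        cost += cost_else
    return cost


coherent = [True, True, True, True]    # all fragments take the same branch
divergent = [True, False, True, True]  # one fragment takes the other branch
print(group_cost(coherent, cost_if=10, cost_else=30))   # 10
print(group_cost(divergent, cost_if=10, cost_else=30))  # 40
```

Under this model a single divergent fragment makes the whole group pay for both branches.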
Sh • Shader code embedded in C++

    // C++ Code
    …
    vsh = SH_BEGIN_VERTEX_PROGRAM {
        ShInputNormal3f normal;
        ShInputPosition4f p;
        ShOutputPoint4f ov;
        ShOutputNormal3f on;
        ShOutputVector3f lvv;
        ShOutputPosition4f opd;
        opd = Globals::mvp | p;
        on = normalize( Globals::mv | normal );
        ov = -normalize( Globals::mv | p );
        lvv = normalize( Globals::lightPos - ( Globals::mv | p )( 0,1,2 ) );
    } SH_END_PROGRAM;

    fsh = SH_BEGIN_FRAGMENT_PROGRAM {
        ShInputVector4f v;
        ShInputNormal3f n;
        ShInputVector3f lvv;
        ShInputPosition4f p;
        ShOutputColor3f out;
        out( 0,1,2 ) = Globals::color * dot( normalize( n ), normalize( lvv ) );
    } SH_END_PROGRAM;
BrookGPU • GPGPU language • General Purpose GPU language • Brook: streaming extension of C • BrookGPU: GPU port of Brook • No computer graphics knowledge required!

    kernel void k( float s<>, float3 f, float a[10][10], out float o<> );

    float a<100>;
    float b<100>;
    float c<10,10>;

    streamRead( a, data1 );
    streamRead( b, data2 );
    streamRead( c, data3 );

    // Call kernel "k"
    k( a, 3.2f, c, b );

    streamWrite( b, result );
Screenshots • nVidia Toolkit [ Reflection-Bump Mapping ]
Screenshots • Far Cry [ UBISOFT ]
Screenshots • NPR [ ATI Research Group ]
Exercise • Vertex Shader in Cg • Free-Form Deformation • Framework available on website
More Information • nVidia • http://developer.nvidia.com/ • ATI • http://www.ati.com/developer/ • General Purpose GPU Programming • http://www.gpgpu.org • GPU Programming and Architecture • http://www.cis.upenn.edu/~suvenkat/700/ • Hardware • http://www.beyond3d.com • http://www.tomshardware.com