GPU Architecture & Cg

Mark Colbert, PhD Candidate, UCF Graphics Group / MCL, colbert@cs.ucf.edu. © 2006 University of Central Florida.


Presentation Transcript


  1. GPU Architecture & Cg • Mark Colbert, PhD Candidate, UCF Graphics Group / MCL • colbert@cs.ucf.edu • © 2006 University of Central Florida

  2. welcome • Assumptions • Experienced in C/C++ • Basic OpenGL/DirectX Knowledge • Some Graphics Knowledge • Familiarity with Geometric Transformations • Linear Algebra • Purpose • Introduction to the Programmable GPU • Extremely Fast Paced • For the Geeky at Heart

  3. overview • GPU Architecture • GPU Pipeline • Introduction to Cg • Implications for GPGPU

  4. GPU • Graphics Processing Unit • Parallelized SIMD Architecture • Denoted as Pipes • 24 fragment pipes on nVidia 7800 • Each Pipe Handles 4 Vector Operations

  5. rules of the game • Not a Generalized Vector Processor • Cannot read and write to same areas of memory • Limited output capability • Currently, very expensive to output to arbitrary locations in memory

  6. notation • Vertex • A data structure for a point in a mesh, containing position, normal, texture coordinates and more… • Fragment • A pixel, possibly sub-pixel, of a rasterized image • Shaders • Small programs run on the GPU at specific stages of the GPU pipeline

  7. memory constructs • Buffered Objects • Uniform Registers/State Table • Interpolated Registers • Temporary Registers • Textures

  8. memory constructs • Buffered Objects • CPU Generated Streams of Data • Limited Modifiability • Example • Vertex Data of a Mesh

  9. memory constructs • Uniform Registers/State Table • Constant Data through the Pipeline • Only Necessarily Constant for 1 Polygon • 32 general purpose registers • State Table Specific Registers • Projection/Model View Matrices • Lights • … and more

  10. memory constructs • Interpolated Registers • Per Vertex Data of a Polygon • Stores Information Interpolated Across Polygon • 10 General Purpose Interpolated Registers

  11. memory constructs • Temporary Registers • Standard Notion of Registers • Temporary Registers for In Shader Calculations

  12. memory constructs • Textures • Closest to Random Access Memory • Expensive to Access • Multiple Dependent Accesses Extremely Expensive

  13. GPU pipeline • CPU: Program/API → Driver • Bus • GPU: GPU Front End → Vertex Processing → Primitive Assembly → Rasterization & Interpolation → Fragment Processing → Raster Operations → Framebuffer

  14. GPU pipeline • Program • Your Program • API • Either OpenGL or DirectX Interface

  15. GPU pipeline • Driver • Black-box • Implementations are Company Secrets • Largest Bottleneck in many GPU programs

  16. GPU pipeline • GPU Front End • Receives commands & data from driver • PCI Express helps at this stage

  17. GPU pipeline • Vertex Processing • Normally performs transformations • Programmable • Inputs: vertex (POSITION, NORMAL, BINORMAL*, TANGENT*, TEXCOORD[0-7], COLOR[0-1], PSIZE), shader, textures • Outputs: data for rasterization (POSITION, PSIZE), data for interpolation (FOG, TEXCOORD[0-7], COLOR[0-1])

  18. GPU pipeline • Primitive Assembly • Compiles Vertices into Points, Lines and/or Polygons • Links elements and sets up the rasterizer

  19. GPU pipeline • Rasterization • For each fragment, determine its respective areas within the triangle (Barycentric Coordinates), or the equivalent for other primitives • Interpolation • [Diagram: the Primitive Assembler passes the primitive type and the data for rasterization (POSITION, PSIZE) to the Rasterizer, which produces rasterized data (DEPTH) and Barycentric Coordinates; the Interpolator combines the Barycentric Coordinates with the data for interpolation (FOG, TEXCOORD[0-7], COLOR[0-1]) to produce interpolated data (TEXCOORD[0-7], COLOR[0-1])]
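
The rasterization and interpolation step on this slide can be sketched on the CPU. This is a minimal illustration, not how any particular GPU implements it: the names `Vec2`, `Bary`, `barycentric`, `insideTriangle` and `interpolate` are hypothetical, and the sketch assumes a 2D, non-degenerate triangle and ignores perspective correction.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Barycentric weights of a point, one per triangle vertex.
struct Bary { float w0, w1, w2; };

// Twice the signed area of triangle (a, b, c).
static float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Each weight is the area of the sub-triangle opposite a vertex divided by
// the area of the whole triangle, so the three weights always sum to 1.
Bary barycentric(Vec2 a, Vec2 b, Vec2 c, Vec2 p) {
    const float area = signedArea2(a, b, c);
    assert(area != 0.0f);  // assumption: the triangle is not degenerate
    Bary w;
    w.w0 = signedArea2(p, b, c) / area;
    w.w1 = signedArea2(a, p, c) / area;
    w.w2 = signedArea2(a, b, p) / area;
    return w;
}

// A fragment is covered by the triangle when no weight is negative.
bool insideTriangle(Bary w) {
    return w.w0 >= 0.0f && w.w1 >= 0.0f && w.w2 >= 0.0f;
}

// Interpolate one per-vertex value (e.g. one texture coordinate component).
float interpolate(Bary w, float v0, float v1, float v2) {
    return w.w0 * v0 + w.w1 * v1 + w.w2 * v2;
}
```

Running `barycentric` for every fragment center inside a triangle's bounding box and calling `interpolate` once per interpolated register reproduces, in spirit, what the Rasterizer and Interpolator hand to the fragment processor.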

  20. GPU pipeline • Fragment Processing • Programmable • Inputs: rasterized data (DEPTH), interpolated data (TEXCOORD[0-7], COLOR[0-1]), shader, textures • Outputs: data for raster ops (DEPTH, COLOR[0-3])

  21. GPU pipeline • Raster Operations • Depth Checking • Check the framebuffer to see whether a fragment with lesser depth already exists (Z-Buffer) • Limited Programmability • Blending • Use the alpha channel to combine the new color with colors already in the framebuffer • Limited Programmability
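
The two raster operations above can be sketched for a single framebuffer location. This is only an illustration under stated assumptions: `Color`, `Pixel` and `rasterOp` are hypothetical names, the depth test is "nearer wins" (comparable to `GL_LESS`), and the blend is the common source-alpha / one-minus-source-alpha combination; real hardware exposes other comparison and blend modes.

```cpp
#include <cassert>

struct Color { float r, g, b, a; };

// One framebuffer location: the stored color plus its Z-buffer entry.
struct Pixel { Color color; float depth; };

// Depth check followed by alpha blending for one incoming fragment.
// Returns true if the fragment was written, false if it was rejected.
bool rasterOp(Pixel& dst, const Color& src, float srcDepth) {
    assert(src.a >= 0.0f && src.a <= 1.0f);

    // Depth checking: reject when an equal or lesser depth is already stored.
    if (dst.depth <= srcDepth) return false;

    // Blending: weight the new color by its alpha, the stored color by 1 - alpha.
    const float a = src.a;
    dst.color.r = a * src.r + (1.0f - a) * dst.color.r;
    dst.color.g = a * src.g + (1.0f - a) * dst.color.g;
    dst.color.b = a * src.b + (1.0f - a) * dst.color.b;
    dst.color.a = a * src.a + (1.0f - a) * dst.color.a;
    dst.depth = srcDepth;
    return true;
}
```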

  22. example • Program/API Code Snippet: … glBegin(GL_TRIANGLES); glTexCoord2f(1,0); glVertex3f(0,1,0); glTexCoord2f(0,1); glVertex3f(-1,-1,0); glTexCoord2f(0,0); glVertex3f(1,-1,0); glEnd(); … • [Pipeline diagram: Program/API → Driver → Bus → GPU Front End → Vertex Processing → Primitive Assembly → Rasterization & Interpolation → Fragment Processing → Raster Operations → Framebuffer(s)]

  23. example • [Pipeline diagram; illustration at the GPU Front End: bit stream 01001001100…]

  24. example • [Pipeline diagram; illustration label: viewing frustum]

  25. example • [Pipeline diagram; illustration label: screen space]

  26. example • [Pipeline diagram; illustration label: framebuffer]

  27. example • [Pipeline diagram; illustration label: framebuffer]

  28. quick architecture notes • Limits in Shader Size • Shader Model 3.0 Spec • Vertex Program – 65535 asm instructions • Fragment Program – 65535+ asm instructions • MIMD • Branches are supported with a large overhead • Rasterizer & Interpolator • Programmable in DirectX 10 • Geometry Shaders • Unified Shading Architecture • Xbox 360 – ATI • Pool of processors with load balancing

  29. higher level shading languages • Vectorized languages for designing shader programs • Easy way out of tedious assembly coding • Not Perfect • Results Are Sometimes Clearly Not Optimized • Examples • Cg • GLSL • HLSL

  30. Cg • nVidia’s Solution • Nearly Identical to HLSL • C++ Based • New Intrinsic Classes • New Intrinsic Functions • Semantics

  31. Cg • Intrinsic Classes • Vectorized Primitives • e.g. float2, float3, float4 • 16-bit Floating Point Constructs • half, half2, half3, half4 • Not enabled in ARB shaders • Fixed Precision Decimals • fixed, fixed2, fixed3, fixed4 • Not enabled in ARB shaders

  32. Cg • Intrinsic Classes (cont’d) • Membership Access • Constructor • e.g. float4 v = float4(a,b,c,d); • Array Operator • e.g. v[0], v[1], v[2], or v[3] • Swizzle Operator • Re-order/Build Vectors • e.g. v.xyz, v.xxxz, v.yyx, v.yx, v.xyzw • rgba can be used in place of xyzw

  33. Cg • Intrinsic Classes (cont’d) • Matrices • Compounded Vector Classes • e.g. float4x4 • Constructed with multiple vectors • float4 v = float4(a,b,c,d); float4x4 m = float4x4(v,v,v,v); • Samplers • Texture Data Type • sampler1D, sampler2D, samplerRECT, sampler3D • samplerRECT – Same as sampler2D but uses pixel locations as texture coordinates instead of normalized coordinates in [0,1]
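
The difference between sampler2D and samplerRECT addressing comes down to a scale by the texture size. The sketch below shows that conversion on the CPU; `TexCoord`, `normalizedToRect` and `rectToNormalized` are hypothetical helper names, not part of Cg.

```cpp
#include <cassert>

struct TexCoord { float u, v; };

// sampler2D-style coordinates in [0,1] -> samplerRECT-style pixel locations.
TexCoord normalizedToRect(TexCoord n, int width, int height) {
    assert(width > 0 && height > 0);
    return TexCoord{n.u * static_cast<float>(width),
                    n.v * static_cast<float>(height)};
}

// samplerRECT-style pixel locations -> sampler2D-style coordinates in [0,1].
TexCoord rectToNormalized(TexCoord p, int width, int height) {
    assert(width > 0 && height > 0);
    return TexCoord{p.u / static_cast<float>(width),
                    p.v / static_cast<float>(height)};
}
```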

  34. Cg • Intrinsic Functions • Many have direct correspondence to assembly instructions or good approximations • Linear Algebra Functions • dot(a,b) – Dot Product • mul – Matrix-Matrix, Vector-Matrix, or Matrix-Vector multiplication • Texture Lookup Functions • tex*(sampler* texture, float* texCoord) • * - The dimensionality of the texture

  35. Cg • Intrinsic Functions (cont’d) • Geometric Intrinsic • distance, faceforward, length, normalize, reflect, refract • A good chunk of math.h • Most Taylor series expansions for two coefficients
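
For readers coming from C/C++, a few of the intrinsics named on the last two slides can be approximated on the CPU as follows. This is a sketch of the usual mathematical definitions, not Cg's implementation; the `3`-suffixed names and the `Float3` struct are hypothetical stand-ins for Cg's `float3` overloads.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

// dot(a, b): sum of component-wise products.
float dot3(Float3 a, Float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// length(v): Euclidean norm.
float length3(Float3 v) { return std::sqrt(dot3(v, v)); }

// distance(a, b): length of the difference vector.
float distance3(Float3 a, Float3 b) {
    return length3(Float3{a.x - b.x, a.y - b.y, a.z - b.z});
}

// normalize(v): v scaled to unit length; assumes v is not the zero vector.
Float3 normalize3(Float3 v) {
    const float len = length3(v);
    assert(len > 0.0f);
    return Float3{v.x / len, v.y / len, v.z / len};
}

// reflect(i, n): incident vector i mirrored about the unit normal n,
// i.e. i - 2 * dot(n, i) * n.
Float3 reflect3(Float3 i, Float3 n) {
    const float d = 2.0f * dot3(n, i);
    return Float3{i.x - d * n.x, i.y - d * n.y, i.z - d * n.z};
}
```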

  36. Cg • Semantics • Binds variables to GPU Memory Constructs • Uniform Registers • In declaration, use keyword uniform in front of variable type • Vertex Data/Interpolated Registers • float* varName : SEMANTIC • Only used as main function parameter or global variable • Textures • Same as uniform variable

  37. FX Composer • Program for quick shader design • Uses Cg as underlying shading language • Additional Semantic Bindings • NOTE: Uses DirectX as its base, so it uses vector-matrix (row-vector) multiplication notation
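
The note about multiplication notation matters when porting shaders between conventions: a row vector times a matrix gives the same result as the transposed matrix times a column vector. The sketch below demonstrates this on the CPU; `Vec4`, `Mat4`, `identity`, `transpose`, `mulMatVec` and `mulVecMat` are hypothetical names, with the two multiply functions playing the roles of Cg's `mul(M, v)` and `mul(v, M)`.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // indexed as m[row][column]

Mat4 identity() {
    Mat4 m{};  // zero-initialized
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) t[c][r] = m[r][c];
    return t;
}

// Matrix-vector: v is treated as a column vector, result = M * v.
Vec4 mulMatVec(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) out[r] += m[r][c] * v[c];
    return out;
}

// Vector-matrix: v is treated as a row vector, result = v * M.
Vec4 mulVecMat(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) out[c] += v[r] * m[r][c];
    return out;
}
```

In practice this means a matrix set up for one convention has to be transposed (or the argument order of the multiply swapped) before it is used under the other.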

  38. FX Composer • Walkthrough Example

  39. GPGPU • General Purpose GPU Processing • Key Notes • Goal is to exploit the fragment processor • Each pixel represents a compacted 4-component element of data • Best suited to gathering algorithms • Vertex shader needed to re-order output • Possibly Optimal in Unified Shading Architecture
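
The gathering pattern can be sketched on the CPU as follows. This is an illustration only, with hypothetical names (`Float4`, `pack`, `gatherKernel`, `runPass`): data is packed four values per texel, each output texel is computed at a fixed location by reading several input texels, and input and output are separate buffers, mirroring the rule that a shader cannot read and write the same memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float4 { float x, y, z, w; };

// Pack a scalar stream four values per texel, padding the last texel with zeros.
std::vector<Float4> pack(const std::vector<float>& data) {
    std::vector<Float4> texels((data.size() + 3) / 4, Float4{0.0f, 0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < data.size(); ++i) {
        Float4& t = texels[i / 4];
        switch (i % 4) {
            case 0: t.x = data[i]; break;
            case 1: t.y = data[i]; break;
            case 2: t.z = data[i]; break;
            default: t.w = data[i]; break;
        }
    }
    return texels;
}

// Stand-in for a fragment program: the output location i is fixed, and the
// kernel gathers from its left and right neighbor texels (clamped at the
// edges), summing them component-wise with its own texel.
Float4 gatherKernel(const std::vector<Float4>& tex, std::size_t i) {
    assert(!tex.empty() && i < tex.size());
    const Float4& l = tex[i == 0 ? 0 : i - 1];
    const Float4& c = tex[i];
    const Float4& r = tex[i + 1 < tex.size() ? i + 1 : i];
    return Float4{l.x + c.x + r.x, l.y + c.y + r.y,
                  l.z + c.z + r.z, l.w + c.w + r.w};
}

// One rendering pass: run the kernel once per output texel.
std::vector<Float4> runPass(const std::vector<Float4>& input) {
    std::vector<Float4> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
        output[i] = gatherKernel(input, i);
    return output;
}
```

A scattering algorithm, where each input decides which output location to write, does not fit this shape, which is why the slide notes that output re-ordering has to go through the vertex shader.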
