Digital Image Processing With GPU
By: Aniruddha Marathe
Agenda
• What should you expect from this presentation?
• What’s the motivation?
• What’s a GPU?
• The GPU Pipeline
• Programming the GPU
• Performance
• Applications
What Should You Expect From This Presentation?
A talk centered on the architecture of the underlying hardware rather than the algorithms that run on it.
What’s the motivation?
Image processing algorithms:
• Work on large volumes of specific types of data
• Need high computational power (possibly parallel)
• Have real-time processing requirements (in most applications)
A general-purpose CPU alone often cannot meet these needs.
What’s a GPU?
• GPU: Graphics Processing Unit
• A specialized co-processor
• Very efficient for:
  • Fast parallel floating-point processing
  • Single Instruction Multiple Data (SIMD) operations
  • High computation per memory access
• Not as efficient for:
  • Double precision
  • Logical operations on integer data
  • Branching-intensive operations
  • Random-access, memory-intensive operations
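To make the SIMD point concrete, here is a minimal CPU-side sketch in C of a per-pixel operation. The function names `clamp01` and `scale_pixels` are hypothetical and not from the talk. It only illustrates the access pattern: every pixel gets the same arithmetic and no pixel depends on another, which is the kind of work a GPU can spread across its parallel processors.

```c
#include <stddef.h>

/* Hypothetical helper: clamp a value to the [0, 1] intensity range. */
static float clamp01(float v)
{
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Apply out[i] = clamp01(gain * in[i] + bias) to every pixel.
 * Each iteration reads one pixel and writes one pixel, and no iteration
 * depends on another, so all of them could run in parallel. */
void scale_pixels(const float *in, float *out, size_t n, float gain, float bias)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = clamp01(gain * in[i] + bias);
    }
}
```

On a GPU the loop body would become the per-element program and the loop itself would disappear into the hardware scheduling.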
What’s a GPU?
• Dedicated graphics rendering device for:
  • Personal computers, servers, game consoles, mobile devices
• GPU chips:
  • 90%: integrated on the motherboard (low end)
  • 10%: add-on video card (low to high end)
• Memory:
  • Dedicated video RAM
  • Shared system RAM
GPU: Designed for?
• As an image rendering device:
  • Highly parallel processor
  • High-bandwidth memory
• Advanced rendering capabilities:
  • Multi-texturing effects
  • Realistic lighting and shadow effects
  • Post-processing visual effects
Originally found in consumer PCs for gaming.
Some Definitions
• Vertex
  • A data structure for a point in a mesh, containing position, normal and texture coordinates
• Fragment
  • A pixel, possibly a sub-pixel, of a rasterized image
• Shaders
  • Small programs that run on the GPU at specific stages of the GPU pipeline
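The two data definitions above could be written as C structs along the following lines. This is only an illustrative sketch: the field names, sizes and the `depth` and `color` members of `Fragment` are assumptions, and real graphics APIs let the application choose which attributes a vertex carries.

```c
/* Hypothetical layouts, for illustration only. */
typedef struct {
    float position[3];  /* x, y, z of the point in the mesh */
    float normal[3];    /* surface normal at that point */
    float texcoord[2];  /* texture coordinates (u, v) */
} Vertex;

typedef struct {
    int   x, y;         /* location in the rasterized image */
    float depth;        /* depth value (assumed field) */
    float color[4];     /* red, green, blue, alpha (assumed field) */
} Fragment;
```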
GPU pipeline
[Diagram: (CPU) Program/API → Driver → Bus → (GPU) GPU Front End → Vertex Processing → Primitive Assembly → Rasterization & Interpolation → Fragment Processing → Raster Operations → Framebuffer]
GPU pipeline: Program/API
• Program
  • Your program
• API
  • Either the OpenGL or the DirectX interface
GPU pipeline: Driver
• Black box
• Implementations are company secrets
• Largest bottleneck in many GPU programs
GPU pipeline: GPU Front End
• Receives commands and data from the driver
• Communication bridge between the CPU and the GPU
• Pulls geometry information from system memory
• Outputs a stream of vertices in object space with all their associated information (normals, texture coordinates, per-vertex color, etc.)
• The PCI Express bus helps at this stage
GPU pipeline: Vertex Processing
• Receives vertices from the GPU Front End in object space and outputs them in screen space
• No new vertices are created in this stage, and no vertices are discarded (input/output has a 1:1 mapping)
• Normals, texcoords, etc. are also transformed
• Programmable
[Diagram labels: Vertex Processor; inputs: vertex (POSITION, NORMAL, BINORMAL*, TANGENT*, TEXCOORD[0-7], COLOR[0-1], PSIZE), shader, textures; outputs: data for rasterization (POSITION, PSIZE), data for interpolation (FOG, TEXCOORD[0-7], COLOR[0-1])]
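A minimal CPU-side sketch of what this stage does for positions, assuming a single combined row-major 4x4 matrix and a simple viewport mapping. The names `transform_position`, `to_screen` and `process_vertices` are hypothetical, and a real vertex program would also transform normals and other attributes.

```c
/* Multiply a row-major 4x4 matrix by the point (x, y, z, 1). */
void transform_position(const float m[16], const float in[3], float out[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = m[4 * row + 0] * in[0]
                 + m[4 * row + 1] * in[1]
                 + m[4 * row + 2] * in[2]
                 + m[4 * row + 3];
    }
}

/* Perspective divide, then map [-1, 1] to pixel coordinates.
 * Assumes clip[3] is not zero. */
void to_screen(const float clip[4], int width, int height, float screen[2])
{
    float ndc_x = clip[0] / clip[3];
    float ndc_y = clip[1] / clip[3];
    screen[0] = (ndc_x * 0.5f + 0.5f) * (float)width;
    screen[1] = (ndc_y * 0.5f + 0.5f) * (float)height;
}

/* One output per input vertex: nothing is created or discarded.
 * in holds 3 floats per vertex, out receives 2 floats per vertex. */
void process_vertices(const float m[16], const float *in, float *out,
                      int count, int width, int height)
{
    for (int i = 0; i < count; ++i) {
        float clip[4];
        transform_position(m, &in[3 * i], clip);
        to_screen(clip, width, height, &out[2 * i]);
    }
}
```

The loop in `process_vertices` reflects the 1:1 mapping: each input vertex yields exactly one output, independently of the others.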
GPU pipeline: Primitive Assembly
• Assembles vertices into points, lines and/or polygons
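As a rough sketch of primitive assembly for the independent-triangle case (the `GL_TRIANGLES` mode), the function below groups a vertex stream three vertices at a time. The `Triangle` type and `assemble_triangles` name are hypothetical, and other primitive types such as points, lines and strips are not covered.

```c
typedef struct {
    int v[3];   /* indices of the triangle's three vertices */
} Triangle;

/* Group a vertex stream into independent triangles, three vertices at a
 * time. Leftover vertices that do not complete a triangle are ignored.
 * Returns the number of triangles written to out. */
int assemble_triangles(int vertex_count, Triangle *out, int max_triangles)
{
    int n = vertex_count / 3;
    if (n > max_triangles) n = max_triangles;
    for (int t = 0; t < n; ++t) {
        out[t].v[0] = 3 * t;
        out[t].v[1] = 3 * t + 1;
        out[t].v[2] = 3 * t + 2;
    }
    return n;
}
```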
GPU pipeline: Rasterization & Interpolation
• Rasterization
  • Determines which fragments each triangle or other primitive covers
• Interpolation
  • Interpolates the per-vertex data across the primitive for each fragment
[Diagram labels: Primitive Assembler; primitive type; data for rasterization (POSITION, PSIZE); Rasterizer; rasterized data (DEPTH); barycentric coordinates; data for interpolation (FOG, TEXCOORD[0-7], COLOR[0-1]); Interpolator; interpolated data (TEXCOORD[0-7], COLOR[0-1])]
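The barycentric coordinates named in the diagram can be sketched on the CPU as follows. This is one common formulation (signed-area "edge functions"), not necessarily what any particular GPU implements, and the names `edge`, `barycentric` and `interpolate` are hypothetical.

```c
/* Twice the signed area of the 2D triangle (a, b, c). */
static float edge(const float a[2], const float b[2], const float c[2])
{
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
}

/* Compute the barycentric weights w[0..2] of point p with respect to
 * triangle (a, b, c). Returns 1 if p lies inside or on the boundary,
 * 0 otherwise (including for a degenerate, zero-area triangle). */
int barycentric(const float a[2], const float b[2], const float c[2],
                const float p[2], float w[3])
{
    float area = edge(a, b, c);
    if (area == 0.0f) return 0;
    w[0] = edge(b, c, p) / area;   /* weight of vertex a */
    w[1] = edge(c, a, p) / area;   /* weight of vertex b */
    w[2] = edge(a, b, p) / area;   /* weight of vertex c */
    return w[0] >= 0.0f && w[1] >= 0.0f && w[2] >= 0.0f;
}

/* Blend one per-vertex attribute (a color channel, a texcoord, ...). */
float interpolate(const float w[3], float va, float vb, float vc)
{
    return w[0] * va + w[1] * vb + w[2] * vc;
}
```

A rasterizer would run the inside test for each candidate pixel, and the interpolator would reuse the same weights for every attribute of the resulting fragment.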
GPU pipeline: Fragment Processing
• Programmable
• Produces the data for raster operations with texture and lighting information
[Diagram labels: Fragment Processor; inputs: rasterized data (DEPTH), interpolated data (TEXCOORD[0-7], COLOR[0-1]), shader, textures; outputs: data for raster operations (DEPTH, COLOR[0-3])]
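For image processing, the fragment stage is where the per-pixel work typically goes. Below is a hedged CPU-side sketch of a fragment program that fetches a texel and converts it to a gray level. The `Texture` struct, `sample_nearest` and `shade_fragment` are hypothetical names, nearest-neighbour sampling is an assumption (GPUs also offer filtered fetches), and the weights are the common Rec. 601 luma coefficients.

```c
typedef struct {
    int width, height;
    const float *rgb;   /* width * height * 3 floats, row-major */
} Texture;

/* Nearest-neighbour fetch at normalized coordinates (u, v) in [0, 1].
 * Coordinates outside the image are clamped to the nearest edge texel. */
static void sample_nearest(const Texture *tex, float u, float v, float out[3])
{
    int x = (int)(u * (float)tex->width);
    int y = (int)(v * (float)tex->height);
    if (x < 0) x = 0;
    if (x >= tex->width) x = tex->width - 1;
    if (y < 0) y = 0;
    if (y >= tex->height) y = tex->height - 1;
    const float *texel = &tex->rgb[3 * (y * tex->width + x)];
    out[0] = texel[0];
    out[1] = texel[1];
    out[2] = texel[2];
}

/* Hypothetical "fragment program": runs once per fragment, reads the
 * interpolated texture coordinate and returns a gray level. */
float shade_fragment(const Texture *tex, float u, float v)
{
    float c[3];
    sample_nearest(tex, u, v, c);
    return 0.299f * c[0] + 0.587f * c[1] + 0.114f * c[2];
}
```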
GPU pipeline: Raster Operations
• Depth checking
  • Check the framebuffer to see if a lesser depth already exists (Z-buffer)
  • Limited programmability
• Blending
  • Use the alpha channel to combine the incoming color with the color already in the framebuffer
  • Limited programmability
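The two raster operations above can be sketched together in C. This assumes a "less than" depth test and the common source-alpha blend; the `Pixel` struct and `raster_op` name are hypothetical, and updating the depth after a translucent write is a simplification.

```c
typedef struct {
    float depth;    /* depth currently stored in the Z-buffer */
    float rgb[3];   /* color currently stored in the framebuffer */
} Pixel;

/* Depth-check an incoming fragment against the stored pixel and, if it
 * passes, alpha-blend its color over the stored color.
 * Returns 1 if the pixel was updated, 0 if the fragment was rejected. */
int raster_op(Pixel *dst, float src_depth, const float src_rgb[3],
              float src_alpha)
{
    /* Reject when an equal or lesser depth is already stored. */
    if (src_depth >= dst->depth) return 0;

    for (int i = 0; i < 3; ++i) {
        dst->rgb[i] = src_alpha * src_rgb[i]
                    + (1.0f - src_alpha) * dst->rgb[i];
    }
    dst->depth = src_depth;
    return 1;
}
```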
Example: Program/API
Code snippet (OpenGL):
    ….
    glBegin(GL_TRIANGLES);
    glTexCoord2f(1,0); glVertex3f(0,1,0);
    glTexCoord2f(0,1); glVertex3f(-1,-1,0);
    glTexCoord2f(0,0); glVertex3f(1,-1,0);
    glEnd();
    …
[Diagram: Program/API → Driver → Bus → GPU Front End → Vertex Processing → Primitive Assembly → Rasterization & Interpolation → Fragment Processing → Raster Operations → Framebuffer(s)]
Example (continued)
[Diagram sequence: the same pipeline, followed step by step on the GPU. Annotations on successive slides: 01001001100…., viewing frustum, screen space, framebuffer.]
Broader View
[Diagram: NVIDIA GeForce 8800 OpenGL Pipeline. Logical stages: Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization (fragment assembly), Fragment operations, Framebuffer. Hardware blocks: Data Assembler, Setup / Rstr / ZCull, Vtx Thread Issue, Prim Thread Issue, Frag Thread Issue, Thread Processor, SP, TF, L1, L2, FB.]
Correspondence (By Color)
• Fixed-function assembly processors
• Application-programmable parallel processor
• Fixed-function framebuffer operations
NVIDIA G80 GPU Architecture Overview
• 16 multiprocessor blocks
• Each block has:
  • 8 streaming processors (16 × 8 = 128 in total)
  • 16 KB shared memory
  • 64 KB constant cache
  • 8 KB texture cache
• Shared memory: 2-cycle latency
• Device memory: 300-cycle latency
Programmability in the GPU
• In a simplified view, there are three programmable stages:
  • Vertex engine
  • Fragment engine
  • Texture load/filter engine
Programmability in the GPU
• For non-graphics applications, there are two programmable blocks running serially:
  • Vertex processor
  • Fragment processor
Programmability in the GPU
• Both the vertex and fragment processors:
  • Support FP32 operands and intermediate values
  • Use the texture unit as a random-access data fetch unit at 35 GB/sec
• The programmer can write programs that are executed for every vertex as well as for every fragment
• This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications
NVIDIA CUDA
• CUDA (‘Compute Unified Device Architecture’): a parallel computing architecture developed by NVIDIA
• NVIDIA provides a GPU processing library for programming the GeForce 8800 GPUs
• C-style programming
Fast Border Recognition (From GPU4Vision)
The NVIDIA G80 GPU
• 128 streaming floating-point processors @ 1.5 GHz
• 1.5 GB shared RAM with 86 GB/s bandwidth
• 320 GFLOPS on one chip (single precision)
NVIDIA G80 GPU vs. Intel Core 2 Duo