GPU Programming Overview, Spring 2011, 류승택
What is a GPU? • GPU stands for Graphics Processing Unit • Simply put, it is the processor that resides on your graphics card • GPUs enable the unprecedented graphics capabilities now available in games (Demo: NVIDIA GTX 400)
Introduction • GPGPU (General-Purpose Computation on GPUs) • The first commodity, programmable parallel architecture • GPU evolution driven by the computer game market • Advantage of data parallelism • GPUs are >10x faster than CPUs for suitable problems • Advantage of commodity hardware • GPUs are inexpensive • GPUs are ubiquitous • Desktops, laptops, PDAs, cell phones • Achieving this speedup • Requires a large amount of GPU-specific knowledge
Motivation • Challenge Statement • GPGPU signifies the dawn of the desktop parallel computing age
Why Program on the GPU? Graph from: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf
Why Program on the GPU? • Compute • Intel Core i7: 4 cores, 100 GFLOP/s • NVIDIA GTX280: 240 cores, 1 TFLOP/s • Memory Bandwidth • System memory: 60 GB/s • NVIDIA GT200: 150 GB/s • Install Base • Over 200 million NVIDIA G80s shipped
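The following is a rough, roofline-style calculation. It uses only the peak figures quoted on this slide, treats them as exact, and ignores caches and real-world efficiency. It shows how the two gaps differ. The compute gap is 10x, but the bandwidth gap is only 2.5x. As a result, a kernel must perform several floating-point operations per byte it reads before the GPU's arithmetic, rather than its memory, becomes the limit. The constant and function names are illustrative.

```python
# Peak figures as quoted on the slide (GFLOP/s and GB/s).
CORE_I7_GFLOPS = 100.0
GTX280_GFLOPS = 1000.0   # 1 TFLOP/s
SYSTEM_MEM_GBPS = 60.0
GT200_MEM_GBPS = 150.0


def flops_per_byte(gflops: float, gbps: float) -> float:
    """Arithmetic intensity (FLOPs per byte moved) at which a device is
    equally limited by compute and by memory bandwidth."""
    return gflops / gbps


compute_speedup = GTX280_GFLOPS / CORE_I7_GFLOPS      # GPU vs CPU, compute
bandwidth_speedup = GT200_MEM_GBPS / SYSTEM_MEM_GBPS  # GPU vs CPU, memory

gpu_balance = flops_per_byte(GTX280_GFLOPS, GT200_MEM_GBPS)
cpu_balance = flops_per_byte(CORE_I7_GFLOPS, SYSTEM_MEM_GBPS)

print(f"compute speedup:   {compute_speedup:.1f}x")    # 10.0x
print(f"bandwidth speedup: {bandwidth_speedup:.1f}x")  # 2.5x
print(f"GPU balance point: {gpu_balance:.2f} FLOP/byte")  # 6.67
print(f"CPU balance point: {cpu_balance:.2f} FLOP/byte")  # 1.67
```

On these numbers, a kernel doing fewer than about 6.7 FLOPs per byte would be limited by the GT200's memory bandwidth. Such a kernel would see something closer to the 2.5x bandwidth advantage than the 10x compute advantage.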
How did this happen? • Games demand advanced shading • Fast GPUs = better shading • Need for speed = continued innovation • The gaming industry has overtaken the defense, finance, oil, and healthcare industries as the main driving force behind high-performance processors.
NVIDIA GPU Evolution Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Real-time Rendering • Graphics hardware enables real-time rendering • Real-time means a display rate of more than 10 images per second • 3D scene = collection of 3D primitives (triangles, lines, points) • Image = array of pixels
Graphics Review • Modeling • Rendering • Animation
Graphics Review: Modeling • Modeling • Polygons vs Triangles • How do you store a triangle mesh? • Implicit Surfaces • Height maps • …
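One common answer to "How do you store a triangle mesh?" is an indexed mesh. It consists of a shared vertex array plus an index array with three entries per triangle, so a vertex used by several triangles is stored only once. The sketch below assumes this representation; other options, such as a flat triangle list, also exist. `IndexedMesh` and its methods are illustrative names.

```python
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


@dataclass
class IndexedMesh:
    vertices: list[Vec3] = field(default_factory=list)
    indices: list[int] = field(default_factory=list)  # 3 per triangle

    def triangle_count(self) -> int:
        return len(self.indices) // 3

    def triangle(self, t: int) -> tuple[Vec3, Vec3, Vec3]:
        """Return the three vertex positions of triangle t."""
        i0, i1, i2 = self.indices[3 * t : 3 * t + 3]
        return self.vertices[i0], self.vertices[i1], self.vertices[i2]

    def to_triangle_list(self) -> list[Vec3]:
        """Expand to a non-indexed list: 3 vertices per triangle, shared
        vertices duplicated."""
        return [self.vertices[i] for i in self.indices]


# A unit square made of two triangles that share the diagonal 0-2.
square = IndexedMesh(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
    indices=[0, 1, 2, 0, 2, 3],
)
print(square.triangle_count())         # 2
print(len(square.vertices))            # 4 unique vertices
print(len(square.to_triangle_list()))  # 6 vertices without indexing
```

The saving grows with mesh size. In a large closed triangle mesh, each vertex is typically shared by about six triangles.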
Triangles Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
Triangles Image courtesy of A K Peters, Ltd. www.virtualglobebook.com. Imagery from NASA Visible Earth: visibleearth.nasa.gov.
Implicit Surfaces Images from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch01.html
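An implicit surface is the zero set of a function f(x, y, z). Points with f < 0 are inside, points with f > 0 are outside, and the normalized gradient of f gives the surface normal. The following is a minimal sketch that uses a sphere as the function, chosen only for illustration, and estimates the gradient with central differences. The function names and the step `h` are illustrative.

```python
import math

Vec3 = tuple[float, float, float]


def sphere(p: Vec3, radius: float = 1.0) -> float:
    """Implicit function of a sphere centered at the origin:
    negative inside, zero on the surface, positive outside."""
    x, y, z = p
    return x * x + y * y + z * z - radius * radius


def normal(f, p: Vec3, h: float = 1e-4) -> Vec3:
    """Estimate the surface normal as the normalized gradient of f using
    central differences (the 1/(2h) factor cancels when normalizing)."""
    x, y, z = p
    gx = f((x + h, y, z)) - f((x - h, y, z))
    gy = f((x, y + h, z)) - f((x, y - h, z))
    gz = f((x, y, z + h)) - f((x, y, z - h))
    length = math.sqrt(gx * gx + gy * gy + gz * gz)
    return (gx / length, gy / length, gz / length)


print(sphere((0.0, 0.0, 0.0)))  # -1.0 (inside)
print(sphere((1.0, 0.0, 0.0)))  # 0.0 (on the surface)
print(sphere((2.0, 0.0, 0.0)))  # 3.0 (outside)
print(normal(sphere, (0.0, 1.0, 0.0)))  # approximately (0.0, 1.0, 0.0)
```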
Height Maps Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
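A height map stores one elevation per grid sample. Turning it into renderable triangles means emitting one vertex per sample and two triangles per grid square. The following is a minimal sketch that assumes unit grid spacing, y as the "up" axis, and an indexed-mesh output. `heightmap_to_mesh` is an illustrative name.

```python
Vec3 = tuple[float, float, float]


def heightmap_to_mesh(heights: list[list[float]]) -> tuple[list[Vec3], list[int]]:
    """Convert a rows x cols grid of heights into vertices and triangle indices.

    Sample (r, c) becomes vertex (c, heights[r][c], r), stored row-major.
    Each grid square is split into two triangles.
    """
    rows, cols = len(heights), len(heights[0])
    vertices = [(float(c), float(heights[r][c]), float(r))
                for r in range(rows) for c in range(cols)]
    indices: list[int] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c                            # top-left corner of the square
            indices += [i, i + cols, i + 1]             # first triangle
            indices += [i + 1, i + cols, i + cols + 1]  # second triangle
    return vertices, indices


grid = [
    [0.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [0.0, 1.0, 0.0],
]
verts, idx = heightmap_to_mesh(grid)
print(len(verts))     # 9 vertices
print(len(idx) // 3)  # 8 triangles (2 per square, 4 squares)
print(verts[4])       # (1.0, 2.0, 1.0): the center peak
```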
Graphics Review: Rendering • Rendering • Goal: Assign color to pixels • Two Parts • Visible surfaces • What is in front of what for a given view • Shading • Simulate the interaction of material and light to produce a pixel color
Rasterization • What about ray tracing?
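Rasterization decides which pixels a projected triangle covers. One common formulation, assumed here, uses edge functions. A pixel center lies inside the triangle when it is on the same side of all three edges. The normalized edge values are also the barycentric weights used to interpolate vertex attributes across the triangle. The sketch ignores the tie-breaking (fill) rules that real hardware applies to pixels exactly on an edge. `edge` and `rasterize` are illustrative names.

```python
Vec2 = tuple[float, float]


def edge(a: Vec2, b: Vec2, p: Vec2) -> float:
    """Twice the signed area of triangle (a, b, p); its sign tells which
    side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0: Vec2, v1: Vec2, v2: Vec2, width: int, height: int):
    """Yield (x, y, (w0, w1, w2)) for each pixel whose center is covered.

    w0, w1, w2 are barycentric weights (they sum to 1) that can be used to
    interpolate per-vertex attributes such as color or depth.
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield x, y, (w0, w1, w2)


# Interpolate red/green/blue vertex colors across a right triangle.
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
covered = {}
for x, y, (w0, w1, w2) in rasterize((0, 0), (8, 0), (0, 8), 8, 8):
    covered[(x, y)] = tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(*colors))
print(len(covered))     # 36 covered pixels
print(covered[(0, 0)])  # (0.875, 0.0625, 0.0625): mostly the red vertex
```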
Visible Surfaces Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
Visible Surfaces • Z-Buffer / Depth Buffer • Fragment vs Pixel Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
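The z-buffer resolves visibility per pixel. Every fragment carries a depth, and it overwrites the color buffer only if it is closer than the value already stored. Several fragments can map to the same pixel, which is the fragment vs. pixel distinction on this slide. The following is a minimal sketch that assumes smaller depth means closer and represents fragments as `(x, y, depth, color)` tuples. `resolve` is an illustrative name.

```python
import math


def resolve(fragments, width: int, height: int, clear_color="black"):
    """Apply a z-buffer (depth buffer) test to a sequence of fragments.

    Each fragment is (x, y, depth, color); smaller depth = closer.
    Returns the final color buffer as a list of rows.
    """
    depth = [[math.inf] * width for _ in range(height)]
    color = [[clear_color] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:  # depth test: keep the nearest fragment
            depth[y][x] = z  # depth write
            color[y][x] = c  # color write
    return color


# Two fragments land on pixel (1, 0); the nearer one (depth 0.2) wins
# regardless of the order in which they arrive.
frags = [(1, 0, 0.7, "red"), (1, 0, 0.2, "blue"), (0, 0, 0.5, "green")]
print(resolve(frags, 2, 1))  # [['green', 'blue']]
```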
Shading Images courtesy of A K Peters, Ltd. www.virtualglobebook.com
Shading Image from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch14.html
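Shading simulates how light interacts with a material. The slide does not name a specific model. Lambertian diffuse reflection, one of the simplest widely used models, is assumed here purely as an example. In this model, brightness is proportional to the cosine of the angle between the surface normal N and the direction to the light L, clamped at zero: color = albedo × light × max(0, N·L). `lambert` is an illustrative name.

```python
import math

Vec3 = tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal: Vec3, to_light: Vec3, albedo: Vec3, light: Vec3) -> Vec3:
    """Lambertian diffuse shading: albedo * light * max(0, N . L)."""
    n = normalize(normal)
    light_dir = normalize(to_light)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, light_dir)))
    return tuple(a * c * n_dot_l for a, c in zip(albedo, light))


white = (1.0, 1.0, 1.0)
red = (1.0, 0.0, 0.0)
up = (0.0, 1.0, 0.0)
print(lambert(up, (0.0, 1.0, 0.0), red, white))   # (1.0, 0.0, 0.0): light overhead
print(lambert(up, (1.0, 1.0, 0.0), red, white))   # about (0.707, 0.0, 0.0): 45 degrees
print(lambert(up, (0.0, -1.0, 0.0), red, white))  # (0.0, 0.0, 0.0): light behind
```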
Graphics Pipeline • Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer • Raster operations: Scissor Test, Stencil Test, Depth Test, Blending
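Raster operations run once per fragment, after shading and before the frame buffer is updated, and each test can discard the fragment. The sketch below chains the four operations listed on this slide in that order. It assumes simplified versions of each:

- a rectangular scissor box;
- a stencil test that passes when the stored value equals a reference;
- a less-than depth test;
- standard "source over" alpha blending.

The function and parameter names are illustrative, not any graphics API's.

```python
from dataclasses import dataclass

RGBA = tuple[float, float, float, float]


@dataclass
class Fragment:
    x: int
    y: int
    depth: float
    color: RGBA  # straight (non-premultiplied) alpha


def raster_ops(frag, scissor, stencil, stencil_ref, depth_buf, color_buf):
    """Apply scissor, stencil, depth, and blending to one fragment.

    scissor is (x0, y0, x1, y1) with exclusive upper bounds.
    Returns True if the fragment updated the color buffer.
    """
    x0, y0, x1, y1 = scissor
    if not (x0 <= frag.x < x1 and y0 <= frag.y < y1):
        return False  # scissor test failed
    if stencil[frag.y][frag.x] != stencil_ref:
        return False  # stencil test failed
    if not frag.depth < depth_buf[frag.y][frag.x]:
        return False  # depth test failed
    depth_buf[frag.y][frag.x] = frag.depth
    # Blending: result = src * src_alpha + dst * (1 - src_alpha)
    sr, sg, sb, sa = frag.color
    dr, dg, db, da = color_buf[frag.y][frag.x]
    color_buf[frag.y][frag.x] = (
        sr * sa + dr * (1 - sa),
        sg * sa + dg * (1 - sa),
        sb * sa + db * (1 - sa),
        sa + da * (1 - sa),
    )
    return True


W, H = 2, 1
depth = [[1.0] * W for _ in range(H)]
color = [[(0.0, 0.0, 1.0, 1.0)] * W for _ in range(H)]  # opaque blue background
stencil = [[1, 0]]  # only pixel (0, 0) passes for reference value 1

half_red = Fragment(0, 0, 0.5, (1.0, 0.0, 0.0, 0.5))
print(raster_ops(half_red, (0, 0, W, H), stencil, 1, depth, color))  # True
print(color[0][0])  # (0.5, 0.0, 0.5, 1.0): red blended over blue
masked = Fragment(1, 0, 0.5, (1.0, 0.0, 0.0, 1.0))
print(raster_ops(masked, (0, 0, W, H), stencil, 1, depth, color))    # False
```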
Graphics Pipeline Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/
Graphics Review: Animation • Move the camera and/or agents, and re-render the scene • Each frame in less than about 16.7 ms (60 fps)
Evolution of the Programmable Graphics Pipeline • Pre GPU • Fixed function GPU • Programmable GPU • Unified Shader Processors
Early 90s: Pre-GPU • Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf
GPU Shader • Fixed functionalities • Programmable functionalities • Flexible memory access
Stream Program => GPU • A stream is a sequence of data (could be numbers, colors, RGBA vectors,…)
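In the stream model, a small function (a kernel) is applied independently to every element of an input stream to produce an output stream. Because no element depends on another, the GPU can process many elements in parallel. The following is a minimal sketch that uses RGBA colors as the stream and a grayscale conversion as the kernel. The luminance weights are the common Rec. 601 values, chosen only for illustration. `run_kernel` is a hypothetical stand-in for the hardware. In CPython, threads only demonstrate the order-independence, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

RGBA = tuple[float, float, float, float]


def grayscale_kernel(c: RGBA) -> RGBA:
    """Kernel: reads one element, writes one element, no shared state."""
    r, g, b, a = c
    y = 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights
    return (y, y, y, a)


def run_kernel(kernel, stream):
    """Apply kernel to every element of stream. Because elements are
    independent, workers may process them in any order; map() still
    returns the results in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(kernel, stream))


pixels = [(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0), (1.0, 1.0, 1.0, 0.5)]
print(run_kernel(grayscale_kernel, pixels))
```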
Vertex Shader • Vertex transformation • Once per vertex • Input attributes • Normal • Texture coordinates • Colors
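A vertex shader runs once per vertex. It transforms the vertex position, typically by a combined model-view-projection (MVP) matrix into clip space, and passes the other attributes (normal, texture coordinates, colors) on to later stages. The sketch below models that contract in Python. It assumes a row-major 4x4 matrix and a dictionary of attributes, and for brevity it uses a simple translation in place of a full MVP matrix. The names are illustrative, not GLSL or Cg.

```python
Mat4 = list[list[float]]
Vec4 = tuple[float, float, float, float]


def mat_vec(m: Mat4, v: Vec4) -> Vec4:
    """Multiply a row-major 4x4 matrix by a column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def vertex_shader(vertex: dict, mvp: Mat4) -> dict:
    """Runs once per vertex: transform the position to clip space and
    pass the remaining attributes through unchanged."""
    x, y, z = vertex["position"]
    out = dict(vertex)  # normal, texcoord, color, ... copied as-is
    out["clip_position"] = mat_vec(mvp, (x, y, z, 1.0))  # w = 1 for points
    return out


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


v = {"position": (1.0, 2.0, 3.0), "normal": (0.0, 0.0, 1.0),
     "texcoord": (0.5, 0.5), "color": (1.0, 1.0, 1.0)}
out = vertex_shader(v, translation(10.0, 0.0, 0.0))
print(out["clip_position"])  # (11.0, 2.0, 3.0, 1.0)
print(out["texcoord"])       # (0.5, 0.5), passed through
```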
Geometry Shader • Geometry composition • Once per primitive • Input primitives • Points, lines, triangles • Lines and triangles with adjacency • Output primitives • Points, line strips, or triangle strips • Emits 0 to n primitives
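A geometry shader runs once per input primitive and may emit zero or more output primitives. A classic use, chosen here only as an example, is expanding each input point into a screen-aligned quad for drawing particles. The quad is emitted as a 4-vertex triangle strip, which forms two triangles. Returning nothing culls the primitive. The function names and the `size` parameter are illustrative.

```python
Vec2 = tuple[float, float]


def point_to_quad(center: Vec2, size: float) -> list[list[Vec2]]:
    """Geometry-shader-style function: one point in, zero or one triangle
    strip out. Points with non-positive size are culled."""
    if size <= 0:
        return []  # emit nothing: the point is discarded
    cx, cy = center
    h = size / 2
    # Strip order: bottom-left, bottom-right, top-left, top-right.
    strip = [(cx - h, cy - h), (cx + h, cy - h),
             (cx - h, cy + h), (cx + h, cy + h)]
    return [strip]


def strip_to_triangles(strip: list[Vec2]) -> list[tuple[Vec2, Vec2, Vec2]]:
    """Decode a triangle strip: n vertices give n - 2 triangles
    (winding alternation is not tracked here)."""
    return [(strip[i], strip[i + 1], strip[i + 2]) for i in range(len(strip) - 2)]


out = point_to_quad((0.0, 0.0), 2.0)
print(len(out))                         # 1 strip
print(len(strip_to_triangles(out[0])))  # 2 triangles
print(point_to_quad((5.0, 5.0), 0.0))   # [] (culled)
```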
Fragment Shader • Per-pixel (or per-fragment) composition • Once per fragment • Operations on interpolated values • Vertex attributes • User-defined varying variables
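A fragment shader runs once per fragment. Its inputs are values interpolated from the vertices, such as texture coordinates, and its output is that fragment's color. It cannot see neighboring fragments. The sketch below uses a procedural checkerboard as the shading function, chosen only for illustration. The function name and the `checks` parameter are illustrative.

```python
import math

RGB = tuple[float, float, float]


def checkerboard_fragment(u: float, v: float, checks: int = 8,
                          a: RGB = (1.0, 1.0, 1.0),
                          b: RGB = (0.0, 0.0, 0.0)) -> RGB:
    """Fragment-shader-style function: interpolated texture coordinates
    (u, v) in [0, 1] in, one color out."""
    cell = math.floor(u * checks) + math.floor(v * checks)
    return a if cell % 2 == 0 else b


# Shade a 4x4 "screen", passing each fragment's (u, v) at its pixel center.
size = 4
image = [[checkerboard_fragment((x + 0.5) / size, (y + 0.5) / size, checks=2)
          for x in range(size)] for y in range(size)]
for row in image:
    print("".join("#" if c == (0.0, 0.0, 0.0) else "." for c in row))
```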
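Bus Interface • ISA (Industry Standard Architecture) • Bus interface • In use since the days of the XT and AT (1980s) • Theoretical maximum of 16 MB/s • Causes a serious bottleneck for peripherals • Now used only for devices where throughput matters little, such as sound cards and modems • PCI (Peripheral Component Interconnect) • Parallel connection • Successor to ISA for connecting peripherals • Smaller slots than ISA; supports IRQ sharing • Standard 32-bit, 33 MHz PCI: 133 MB/s; 64-bit, 66 MHz PCI: 533 MB/s • Most peripherals use the PCI interface • (Figure: PCI, AGP, and ISA slots)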
Bus Interface • AGP (Accelerated Graphics Port) • Parallel connection • Developed by Intel • Based on PCI, but more than twice as fast • Runs at a base clock of 66 MHz • AGP = 2 x PCI (AGP 2x = 2 x AGP) • AGP 1x: up to 266 MB/s • AGP 2x: up to 533 MB/s • Designed for 3D graphics cards • PCIe (PCI Express) • Serial connection (cheap, scalable) • Up to 8.0 GB/s of bandwidth (PCIe = 2 x AGP x 8) • Intel, ATI, and NVIDIA, which lead the worldwide graphics market, have firmly endorsed this new standard as the next-generation graphics interface • AGP, created because of the limitations of PCI and once the dominant interface for graphics processing units (GPUs), is being replaced by PCI Express • (Figure: PCIe x1 and PCIe x16 slots, a PCI slot, and a GeForce 7800 GTX (PCIe x16))
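For a parallel bus, peak bandwidth is bus width × clock × transfers per clock. The sketch below reproduces the figures above up to rounding. It assumes the nominal 33.33 MHz (PCI) and 66.67 MHz (64-bit PCI, AGP) clocks. For PCIe it assumes first-generation links: 2.5 GT/s per lane with 8b/10b encoding. Under one reading of the slide, its 8.0 GB/s figure for x16 counts both directions together. The helper names are illustrative.

```python
def parallel_bus_mb_s(width_bits: int, clock_mhz: float,
                      transfers_per_clock: int = 1) -> float:
    """Peak bandwidth of a parallel bus in MB/s (1 MB = 10**6 bytes):
    bytes per transfer x clock (MHz) x transfers per clock."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


def pcie1_mb_s(lanes: int, bidirectional: bool = False) -> float:
    """Peak PCIe 1.x bandwidth in MB/s. Each lane signals at 2.5 GT/s and
    8b/10b encoding spends 10 bits per data byte: 250 MB/s per direction."""
    per_lane = 2500 / 10
    return per_lane * lanes * (2 if bidirectional else 1)


CLOCK_33_MHZ = 100 / 3  # nominal 33.33 MHz
CLOCK_66_MHZ = 200 / 3  # nominal 66.67 MHz

rates = {
    "PCI 32-bit, 33 MHz": parallel_bus_mb_s(32, CLOCK_33_MHZ),
    "PCI 64-bit, 66 MHz": parallel_bus_mb_s(64, CLOCK_66_MHZ),
    "AGP 1x": parallel_bus_mb_s(32, CLOCK_66_MHZ, 1),
    "AGP 2x": parallel_bus_mb_s(32, CLOCK_66_MHZ, 2),
    "AGP 8x": parallel_bus_mb_s(32, CLOCK_66_MHZ, 8),
    "PCIe 1.x x16, one direction": pcie1_mb_s(16),
    "PCIe 1.x x16, both directions": pcie1_mb_s(16, bidirectional=True),
}
for name, mb_s in rates.items():
    print(f"{name:30s} {mb_s:8.1f} MB/s")
```

Per direction, PCIe 1.x x16 (4000 MB/s) is roughly twice AGP 8x (about 2133 MB/s). This is one way to read the slide's "PCIe = 2 x AGP x 8".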
Generation I: 3dfx Voodoo (1996) • One of the first true 3D game cards • Worked by supplementing a standard 2D video card • Did not do vertex transformations: these were done on the CPU • Did do texture mapping and z-buffering • (Figure from “7 years of Graphics”: pipeline stages Vertex Transforms, Primitive Assembly, Rasterization and Interpolation, Raster Operations, and Frame Buffer, split between the CPU and the GPU across the PCI bus)
1995-1998: Texture Mapping and Z-Buffer • PCI: Peripheral Component Interconnect • 3dfx’s Voodoo
Aside: Mario Kart 64 • High fragment load / low vertex load Image from: http://www.gamespot.com/users/my_shoe/
Aside: Mario Kart Wii • High fragment load / low vertex load? Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/