GPU Programming Overview

Summer 2005, 류승택

Presentation Transcript


  1. GPU Programming Overview. Summer 2005, 류승택

  2. Introduction
     • GPGPU (General-Purpose Computation on GPUs)
       • The first commodity, programmable parallel architecture
       • GPU evolution driven by the computer game market
     • Advantage of data parallelism
       • GPUs are >10x faster than CPUs for appropriate problems
     • Advantage of commodity
       • GPUs are inexpensive
       • GPUs are ubiquitous
         • Desktops, laptops, PDAs, cell phones
     • Achieving this speedup
       • Requires a large amount of GPU-specific knowledge

  3. Motivation • Challenge Statement • GPGPU signifies the dawn of the desktop parallel computing age

  4. Real-time Rendering
     • Real-time Rendering
       • Graphics hardware enables real-time rendering
       • Real-time means a display rate of more than 10 images per second
     • 3D scene = collection of 3D primitives (triangles, lines, points)
     • Image = array of pixels

  5. PC Architecture

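  6. Bus Interface
     • ISA (Industry Standard Architecture)
       • Bus interface
       • In use since the XT and AT era of the 1980s
       • Theoretical maximum speed of 16 MB/s
       • A severe bottleneck for peripherals
       • Now used only to connect devices where throughput matters little, such as sound cards and modems
     • PCI (Peripheral Component Interconnect)
       • Parallel connection
       • Successor to ISA, used to connect peripheral devices
       • Slots are smaller than ISA slots, and IRQs can be shared
       • The common 32-bit, 33 MHz version delivers 133 MB/s; the 64-bit, 66 MHz version delivers 533 MB/s
       • Most peripherals use the PCI interface
     [Figure labels: PCI, AGP, ISA]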

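  7. Bus Interface
     • AGP (Accelerated Graphics Port)
       • Developed by Intel
       • Based on PCI, but with more than twice the transfer rate
       • Runs at a base clock of 66 MHz
       • AGP 1x = 2 × PCI; AGP 2x = 2 × AGP 1x
       • AGP 1x peaks at 266 MB/s
       • AGP 2x peaks at 533 MB/s
       • Intended for 3D graphics cards
     • PCIe (PCI Express)
       • Serial connection (cheap, scalable)
       • Up to 8.0 GB/s of aggregate bandwidth for an x16 link (4 GB/s in each direction, about 2 × AGP 8x)
       • Intel, ATI, and NVIDIA, which together dominate the worldwide graphics market, have firmly adopted this new standard as the next-generation graphics interface
       • AGP, which was created because of the limitations of PCI and had been the unrivaled interface for graphics processing units (GPUs), is now being replaced by PCI Express
     [Figure labels: PCIe x1, PCIe x16, PCI, GeForce 7800 GTX (PCIe x16)]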
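The PCI and AGP figures on these two slides follow from bus width times clock rate times transfers per clock. The sketch below reproduces that arithmetic. The function name and the nominal clock values (33.33 MHz and 66.67 MHz) are assumptions made for illustration; rounding the clock differently shifts the results by a few MB/s.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth of a parallel bus in MB/s (hypothetical helper)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# Nominal clock rates (assumed): PCI runs at 33.33 MHz, AGP at 66.67 MHz.
PCI_CLOCK_MHZ = 100 / 3
AGP_CLOCK_MHZ = 200 / 3

pci_32bit = peak_bandwidth_mb_s(32, PCI_CLOCK_MHZ)       # 32-bit PCI at 33 MHz
pci_64bit = peak_bandwidth_mb_s(64, AGP_CLOCK_MHZ)       # 64-bit PCI at 66 MHz
agp_1x = peak_bandwidth_mb_s(32, AGP_CLOCK_MHZ)          # one transfer per clock
agp_2x = peak_bandwidth_mb_s(32, AGP_CLOCK_MHZ, 2)       # two transfers per clock

for name, value in [("PCI 32-bit", pci_32bit), ("PCI 64-bit", pci_64bit),
                    ("AGP 1x", agp_1x), ("AGP 2x", agp_2x)]:
    print(f"{name}: about {value:.0f} MB/s")
```

The same formula explains the "AGP 1x = 2 × PCI" rule of thumb: the bus width is unchanged and the clock doubles.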

  8. PC Graphics Software Architecture
     • The application, 3D API, and driver are written in C or C++
     • The vertex and pixel programs are written in a high-level shading language (Cg, DirectX HLSL, OpenGL Shading Language)
     • Pushbuffer: contains the commands to be executed on the GPU

  9. Hardware Graphics Pipelines

  10. GPU Fundamentals: The Graphics Pipeline
      • A simplified graphics pipeline
        • Note that pipe widths vary
        • Many caches, FIFOs, and so on not shown
      [Diagram: Application (CPU) → vertices (3D) → Transform → transformed, lit vertices (2D) → Rasterizer → fragments (pre-pixels) → Shade → final pixels (color, depth) → video memory (textures). Also labeled: graphics state, render-to-texture.]

  11. Stream Program => GPU • A stream is a sequence of data (could be numbers, colors, RGBA vectors,…)
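A minimal CPU sketch of the stream idea: a small function (a "kernel") is applied to every element of the stream independently, which is what lets a GPU process many elements in parallel. The names `run_kernel` and `brighten` are hypothetical, and the list comprehension only stands in for the parallel hardware.

```python
def run_kernel(kernel, stream):
    """Apply a kernel to every stream element; no element depends on another."""
    return [kernel(element) for element in stream]


def brighten(rgba):
    """Example kernel: double the RGB channels, clamp to 1.0, keep alpha."""
    r, g, b, a = rgba
    return (min(r * 2.0, 1.0), min(g * 2.0, 1.0), min(b * 2.0, 1.0), a)


# A stream of RGBA vectors, as mentioned on the slide.
pixels = [(0.2, 0.4, 0.8, 1.0), (0.0, 0.5, 0.25, 0.5)]
result = run_kernel(brighten, pixels)
```

Because the kernel reads only its own input element, the order of evaluation does not matter, which is the property data-parallel hardware exploits.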

  12. GPU Fundamentals: The Modern Graphics Pipeline
      • Programmable vertex processor!
      • Programmable pixel processor!
      [Diagram: Application (CPU) → vertices (3D) → Vertex Processor → transformed, lit vertices (2D) → Rasterizer → fragments (pre-pixels) → Fragment (Pixel) Processor → final pixels (color, depth) → video memory (textures). Also labeled: graphics state, render-to-texture.]

  13. GPU Pipeline: Transform
      • Vertex Processor (multiple operate in parallel)
        • Transform from “world space” to “image space”
        • Compute per-vertex lighting
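The two vertex-processor tasks can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the hardware's actual implementation: the 4x4 matrix `SIMPLE_PROJECTION`, the viewport mapping, and the Lambert diffuse term are all choices made for this example.

```python
def mat_vec(matrix, vector):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(matrix[r][c] * vector[c] for c in range(4)) for r in range(4)]


def to_image_space(vertex, matrix, width, height):
    """Transform a 3D vertex to pixel coordinates plus a depth value."""
    x, y, z, w = mat_vec(matrix, [vertex[0], vertex[1], vertex[2], 1.0])
    ndc_x, ndc_y = x / w, y / w                 # perspective divide
    pixel_x = (ndc_x + 1.0) * 0.5 * width       # map [-1, 1] to [0, width]
    pixel_y = (1.0 - ndc_y) * 0.5 * height      # flip y: image rows grow downward
    return pixel_x, pixel_y, z / w


def diffuse(normal, light_dir):
    """Per-vertex Lambert term: clamped dot product of unit vectors."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))


# Hypothetical projection for a camera looking down -z: it sets w = -z.
SIMPLE_PROJECTION = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

screen_position = to_image_space((1.0, 1.0, -2.0), SIMPLE_PROJECTION, 640, 480)
```

Each vertex is processed without reference to any other vertex, which is why several vertex processors can operate in parallel.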

  14. GPU Pipeline: Rasterizer
      • Rasterizer
        • Convert geometric representation (vertex) to image representation (fragment)
          • Fragment = image fragment
          • Pixel + associated data: color, depth, stencil, etc.
        • Interpolate per-vertex quantities across pixels
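One common way to do both rasterizer jobs is with edge functions and barycentric weights; the sketch below assumes that approach (the slide does not say which method the hardware uses). The function names and the choice to sample at pixel centers are assumptions for illustration.

```python
def edge(a, b, p):
    """Signed area term: positive or negative depending on which side of a->b p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, c0, c1, c2, width, height):
    """Return ((x, y), color) fragments for pixels whose centers lie in the triangle."""
    area = edge(v0, v1, v2)
    fragments = []
    if area == 0:                      # degenerate triangle covers no pixels
        return fragments
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)     # sample at the pixel center
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Interpolate the per-vertex colors with the barycentric weights.
                color = tuple(w0 * a + w1 * b + w2 * c
                              for a, b, c in zip(c0, c1, c2))
                fragments.append(((x, y), color))
    return fragments


fragments = rasterize((0, 0), (4, 0), (0, 4),
                      (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
                      4, 4)
```

Any per-vertex quantity (depth, texture coordinates, normals) can be interpolated with the same three weights.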

  15. GPU Pipeline: Shade
      • Fragment Processors (multiple in parallel)
        • Compute a color for each pixel
        • Optionally read colors from textures (images)
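A minimal sketch of what a fragment program does: take the interpolated color of one fragment, optionally look up a texel, and combine them. Nearest-neighbor sampling, the multiply combine, and the assumption that `u` and `v` lie in [0, 1] are choices made for this example only.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbor lookup; texture is a list of rows of RGB tuples."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)     # clamp so u == 1.0 stays in range
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade(fragment_color, uv, texture):
    """Modulate the interpolated fragment color by the sampled texel."""
    texel = sample_nearest(texture, uv[0], uv[1])
    return tuple(c * t for c, t in zip(fragment_color, texel))


# A 2x2 checkerboard texture (hypothetical test data).
CHECKER = [
    [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]

lit = shade((0.5, 0.5, 0.5), (0.25, 0.25), CHECKER)
dark = shade((0.5, 0.5, 0.5), (0.75, 0.25), CHECKER)
```

As with vertices, each fragment is shaded independently, so many fragment processors can run the same program at once.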

  16. Programming Graphics Hardware

  17. 1995-1998: Texture Mapping and Z-Buffer • PCI: Peripheral Component Interconnect • 3dfx’s Voodoo

  18. Texture Mapping

  19. Texture Mapping: Perspective-Correct Interpolation

  20. Texture Mapping: Perspective-Correct Interpolation
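The standard formulation of perspective-correct interpolation, sketched here for a single texture coordinate along one screen-space span, interpolates u/w and 1/w linearly and divides at each pixel. The function names are hypothetical, and `w0` and `w1` stand for the clip-space w values of the two endpoints.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def affine_u(u0, u1, t):
    """Naive screen-space interpolation: wrong under perspective."""
    return lerp(u0, u1, t)


def perspective_correct_u(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w linearly in screen space, then divide."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w


# Endpoint 1 is three times farther from the viewer than endpoint 0.
naive = affine_u(0.0, 1.0, 0.5)
correct = perspective_correct_u(0.0, 1.0, 1.0, 3.0, 0.5)
```

At the screen-space midpoint the naive result is 0.5, while the corrected value is smaller because the far half of the span is compressed on screen.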

  21. 1998: Multitexturing • AGP: Accelerated Graphics Port • NVIDIA’s TNT, ATI’s Rage

  22. Multitexturing Light Mapping

  23. 1999-2000: Transform and Lighting • Register Combiner: Offers many more texture/color combinations • NVIDIA’s GeForce 256 and GeForce2, ATI’s Radeon 7500

  24. Bump Mapping

  25. Environment Mapping

  26. Projective Texture Mapping

  27. 2001: Programmable Vertex Shader • A programmable processor for any per-vertex computation • Z-Cull: Predicts which fragments will fail the Z test and discards them • Texture Shader: Offers more texture addressing and operations • NVIDIA’s GeForce3 and GeForce4 Ti, ATI’s Radeon 8500

  28. Volume Texture Mapping

  29. 2002-2003: Programmable Pixel Shader • A programmable processor for any per-pixel computation • MRT: Multiple Render Targets • NVIDIA’s GeForce FX, ATI’s Radeon 9600 to 9800

  30. Shader: Static vs. Dynamic Flow Control
      • Static flow control
        • Condition varies per batch of triangles
      • Dynamic flow control
        • Condition varies per vertex or pixel
      • Full flow control
        • Static and dynamic flow control
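The distinction can be illustrated on the CPU with two hypothetical shading functions. In the static case the branch is decided once for the whole batch; in the dynamic case every pixel evaluates its own condition. The fog and threshold effects are invented for this example.

```python
def shade_static(pixels, fog_enabled):
    """Static flow control: the condition is constant for the whole batch."""
    if fog_enabled:                      # decided once, outside the per-pixel work
        return [0.5 * p for p in pixels]
    return list(pixels)


def shade_dynamic(pixels, threshold):
    """Dynamic flow control: each pixel takes its own branch."""
    return [0.5 * p if p > threshold else p for p in pixels]


batch = [0.2, 0.8]
static_on = shade_static(batch, True)
static_off = shade_static(batch, False)
dynamic = shade_dynamic(batch, 0.5)
```

In the dynamic case neighboring pixels may follow different paths, which is harder for parallel hardware to execute efficiently than a single batch-wide decision.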

  31. 2004: Shader Model 3.0 and 64-bit Color Support • PCIe: Peripheral Component Interconnect Express • NVIDIA’s GeForce 6800

  32. Fixed-function pipeline
      [Diagram. Stages: 3D Application or Game; 3D API (OpenGL or Direct3D); CPU-GPU Boundary (AGP/PCIe); GPU Front End; Primitive Assembly; Rasterization and Interpolation; Raster Operations; Frame Buffer; Programmable Vertex Processor; Programmable Fragment Processor. Data streams: 3D API Commands; GPU Command & Data Stream; Vertex Index Stream; Assembled Primitives; Pixel Location Stream; Pixel Updates; Pre-transformed Vertices; Transformed Vertices; Pre-transformed Fragments; Transformed Fragments.]

  33. Programmable pipeline
      [Diagram. Stages: 3D Application or Game; 3D API (OpenGL or Direct3D); CPU-GPU Boundary (AGP/PCIe); GPU Front End; Primitive Assembly; Rasterization and Interpolation; Raster Operations; Frame Buffer; Programmable Vertex Processor; Programmable Fragment Processor. Data streams: 3D API Commands; GPU Command & Data Stream; Vertex Index Stream; Assembled Primitives; Pixel Location Stream; Pixel Updates; Pre-transformed Vertices; Transformed Vertices; Pre-transformed Fragments; Transformed Fragments.]

  34. Real-time Tone Mapping
      • The image is entirely computed in 64-bit color and tone-mapped for display
        • 64-bit color → 16-bit floating-point value per channel (R, G, B, A)
      • Tone Mapping
        • HDRI (High Dynamic Range Image) → low dynamic range device
      [Figure: low- to high-exposure images of the same scene]
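A small sketch of both ideas on this slide: storing a pixel as four 16-bit floats (64 bits in total) and compressing a high-dynamic-range value into an 8-bit display value. The slide does not name a tone-mapping operator; the simple Reinhard curve L / (1 + L) used here is an assumption chosen for illustration, and the function names are hypothetical.

```python
import struct


def pack_rgba16f(rgba):
    """Pack four channels as 16-bit floats: 4 x 16 bits = 64 bits per pixel."""
    return struct.pack("<4e", *rgba)


def reinhard(value):
    """Map a non-negative HDR value into [0, 1). Assumed operator."""
    return value / (1.0 + value)


def tone_map_rgb(hdr_rgb):
    """Tone-map each HDR channel and quantize it to an 8-bit display value."""
    return tuple(int(reinhard(c) * 255 + 0.5) for c in hdr_rgb)


packed = pack_rgba16f((1.0, 2.0, 4.0, 1.0))
display = tone_map_rgb((0.0, 1.0, 3.0))
```

Values far above 1.0 still map below 255, so bright regions keep some detail instead of clipping to white.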

  35. 2005: NVIDIA GeForce 7800
      • NVIDIA GeForce 7800
        • NVIDIA SLI (Scalable Link Interface) technology
          • Dramatically scales performance by allowing two graphics cards to run in parallel
        • 64-bit floating-point texture filtering and blending
        • Designed for PCI Express x16
      • API support
        • Complete DirectX support, including the latest version of Microsoft DirectX 9.0 Shader Model 3.0
        • Full OpenGL support, including OpenGL 2.0

  36. Radiosity
      • A visual effect that shows how light bounces off some objects and contributes to the final lighting of other objects
      • NVIDIA demo: Mad Mod Mike

  37. The Future
      • Unified general programming model at primitive, vertex, and pixel levels
      • Scary amount of:
        • Floating-point horsepower
        • Video memory
        • Bandwidth between system and video memory
      • Lower chip costs and power requirements to make 3D graphics hardware ubiquitous
        • Automotive (gaming, navigation, head-up displays)
        • Home (remotes, media center, automation)
        • Mobile (PDAs, cell phones)

  38. Programming the GPU

  39. The Evolution of GPU Programming Languages

  40. Programmable Pipeline

  41. Programmable Pipeline

  42. GPU Programming
      • GPU Programming
        • Low-level language
          • Assembler-like
          • Best performance
          • Platform-dependent
          • Vertex programming, fragment programming
          • Ex) OpenGL extensions, DirectX 9
        • High-level shading language
          • Easier programming
          • Easier code reuse
          • Easier debugging
          • Easy to read
          • Ex) Cg, HLSL, GLSL

  43. Assembly vs. High-Level Language

  44. Data Flow through Pipeline

  45. GPU Programming
      • GPU Programming
        • Low-level language
          • OpenGL extensions
            • GL_ARB_vertex_program, GL_ARB_fragment_program
          • DirectX 9
            • Vertex Shader 2.0, Pixel Shader 2.0
        • High-level shading language
          • Cg
            • “C for Graphics”, by NVIDIA
          • HLSL
            • “High-Level Shading Language”, part of DirectX 9 (Microsoft)
          • GLSL
            • “OpenGL 2.0 Shading Language”, proposal by 3Dlabs
      • HLSL and Cg are much more similar to each other than they are to GLSL

  46. Workflow in Cg

  47. Reference
      • Course notes
        • EG2004
        • SIGGRAPH2004
        • VIS2004
      • David Luebke, General-Purpose Computation on Graphics Hardware
      • Daniel Weiskopf, Basics of GPU-Based Programming
      • Cyril Zeller, Introduction to the Hardware Graphics Pipeline
      • Randy Fernando, Programming the GPU
      • Suresh Venkatasubramanian, GPU Programming and Architecture
      • GPGPU (http://www.gpgpu.org/)
      • GPU Programming (http://euclid.uits.iupui.edu/wiki/index.php/GPU_Programming)
      • Shader::Tech (http://www.shadertech.com/)
      • NVIDIA Developer (http://developer.nvidia.com/object/gpu_programming_guide.html)
      • GPGPU developer resources
