
GPU Programming Overview

GPU Programming Overview. Spring 2011, 류승택. What is a GPU? GPU stands for Graphics Processing Unit. Simply put, it is the processor that resides on your graphics card. GPUs allow us to achieve the unprecedented graphics capabilities now available in games (Demo: NVIDIA GTX 400).


Presentation Transcript


  1. GPU Programming Overview Spring 2011 류승택

  2. What is a GPU? GPU stands for Graphics Processing Unit. Simply put, it is the processor that resides on your graphics card. GPUs allow us to achieve the unprecedented graphics capabilities now available in games (Demo: NVIDIA GTX 400).

  3. Introduction • GPGPU (General-Purpose Computation on GPUs) • The first commodity, programmable parallel architecture • GPU evolution driven by the computer game market • Advantage of data-parallelism • GPUs are >10x faster than CPUs for appropriate problems • Advantage of commodity • GPUs are inexpensive • GPUs are ubiquitous • Desktops, laptops, PDAs, cell phones • Achieving this speedup • Requires a large amount of GPU-specific knowledge

  4. Motivation • Challenge Statement • GPGPU signifies the dawn of the desktop parallel computing age

  5. Why Program on the GPU? Graph from: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf

  6. Why Program on the GPU? • Compute • Intel Core i7 – 4 cores – 100 GFLOPS • NVIDIA GTX280 – 240 cores – 1 TFLOPS • Memory Bandwidth • System Memory – 60 GB/s • NVIDIA GT200 – 150 GB/s • Install Base • Over 200 million NVIDIA G80s shipped
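As a quick check on the figures quoted on this slide, the ratios can be worked out directly. This is only arithmetic on the slide's peak numbers, not a benchmark; the variable names are made up for illustration.

```python
# Peak figures as quoted on the slide (vendor peak numbers, not measurements).
cpu_gflops, cpu_cores = 100.0, 4       # Intel Core i7
gpu_gflops, gpu_cores = 1000.0, 240    # NVIDIA GTX280 (1 TFLOP = 1000 GFLOP)
sys_mem_gb_s = 60.0                    # system memory bandwidth
gpu_mem_gb_s = 150.0                   # NVIDIA GT200 memory bandwidth

compute_ratio = gpu_gflops / cpu_gflops
bandwidth_ratio = gpu_mem_gb_s / sys_mem_gb_s
cpu_per_core = cpu_gflops / cpu_cores
gpu_per_core = gpu_gflops / gpu_cores

print(f"peak compute ratio:   {compute_ratio:.1f}x")
print(f"peak bandwidth ratio: {bandwidth_ratio:.1f}x")
print(f"per-core peak: CPU {cpu_per_core:.1f} vs GPU {gpu_per_core:.1f} GFLOP/s")
```

On these numbers each GPU core is individually slower than a CPU core; the advantage comes from having many of them, which is why it only shows up on data-parallel problems.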

  7. How did this happen? • Games demand advanced shading • Fast GPUs = better shading • Need for speed = continued innovation • The gaming industry has overtaken the defense, finance, oil and healthcare industries as the main driving factor for high performance processors.

  8. NVIDIA GPU Evolution Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

  9. Real-time Rendering • Graphics hardware enables real-time rendering • Real-time means a display rate of more than 10 images per second • 3D Scene = Collection of 3D primitives (triangles, lines, points) • Image = Array of pixels

  10. Graphics Review • Modeling • Rendering • Animation

  11. Graphics Review: Modeling • Modeling • Polygons vs Triangles • How do you store a triangle mesh? • Implicit Surfaces • Height maps • …

  12. Triangles Image courtesy of A K Peters, Ltd. www.virtualglobebook.com

  13. Triangles Image courtesy of A K Peters, Ltd. www.virtualglobebook.com. Imagery from NASA Visible Earth: visibleearth.nasa.gov.

  14. Triangles

  15. Triangles

  16. Implicit Surfaces Images from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch01.html

  17. Height Maps Image courtesy of A K Peters, Ltd. www.virtualglobebook.com

  18. Graphics Review: Rendering • Rendering • Goal: Assign color to pixels • Two Parts • Visible surfaces • What is in front of what for a given view • Shading • Simulate the interaction of material and light to produce a pixel color

  19. Rasterization • What about ray tracing?

  20. Visible Surfaces Image courtesy of A K Peters, Ltd. www.virtualglobebook.com

  21. Visible Surfaces • Z-Buffer / Depth Buffer • Fragment vs Pixel Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
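A minimal sketch of the depth-buffer idea, assuming that a smaller depth value means closer to the camera and that fragments arrive as `(x, y, depth, color)` tuples (both are conventions chosen here, not taken from the slides). It also illustrates the fragment/pixel distinction: several fragments may land on one pixel, and only the closest one ends up as that pixel's color.

```python
def render_depth_buffered(fragments, width, height):
    """Resolve visibility per pixel: keep the closest fragment seen so far."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        inside = 0 <= x < width and 0 <= y < height
        if inside and z < depth[y][x]:   # depth test: closer than stored value?
            depth[y][x] = z              # update the z-buffer
            color[y][x] = c              # the fragment becomes the pixel color
    return color, depth


# Three fragments compete for pixel (0, 0); one fragment covers pixel (1, 0).
frags = [(0, 0, 5.0, "red"), (0, 0, 2.0, "blue"),
         (0, 0, 9.0, "green"), (1, 0, 1.0, "white")]
color, depth = render_depth_buffered(frags, width=2, height=1)
print(color)
```

The result does not depend on the order in which opaque fragments arrive, which is what lets the hardware draw triangles in any order.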

  22. Shading Images courtesy of A K Peters, Ltd. www.virtualglobebook.com

  23. Shading Image from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch14.html

  24. Graphics Pipeline • Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer • Raster Operations: • Scissor Test • Stencil Test • Depth Test • Blending
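The raster operations listed on this slide can be sketched as a chain of per-fragment tests. This is a simplified illustration, not how any particular GPU implements it: the stencil function ("equal to a reference value"), the depth function ("less than") and the blend mode (source-over alpha) are assumptions picked for the example, and all of them are configurable in a real pipeline.

```python
def make_framebuffer(width, height):
    """Hypothetical framebuffer: black color, far depth, stencil value 1."""
    return {
        "color": [[(0.0, 0.0, 0.0)] * width for _ in range(height)],
        "depth": [[float("inf")] * width for _ in range(height)],
        "stencil": [[1] * width for _ in range(height)],
    }


def raster_ops(frag, fb, scissor, stencil_ref=1):
    """Run one fragment through scissor, stencil, depth and blend stages.

    Returns the name of the stage that rejected it, or "written".
    """
    x, y, z, (r, g, b, a) = frag
    x0, y0, x1, y1 = scissor
    if not (x0 <= x < x1 and y0 <= y < y1):      # 1. scissor test
        return "scissor"
    if fb["stencil"][y][x] != stencil_ref:       # 2. stencil test
        return "stencil"
    if z >= fb["depth"][y][x]:                   # 3. depth test
        return "depth"
    fb["depth"][y][x] = z
    dst = fb["color"][y][x]                      # 4. blending (source-over)
    fb["color"][y][x] = tuple(a * s + (1.0 - a) * d
                              for s, d in zip((r, g, b), dst))
    return "written"


fb = make_framebuffer(2, 2)
half_red = (0, 0, 0.5, (1.0, 0.0, 0.0, 0.5))
print(raster_ops(half_red, fb, scissor=(0, 0, 2, 2)), fb["color"][0][0])
```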

  25. Graphics Pipeline Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

  26. Graphics Pipeline Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

  27. Graphics Pipeline Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

  28. Graphics Pipeline Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

  29. Graphics Review: Animation • Move the camera and/or agents, and re-render the scene • In less than 16.6 ms (60 fps)
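The 16.6 ms figure is simply the time available per frame at 60 frames per second. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def frame_budget_ms(fps):
    """Time available to update and re-render one frame, in milliseconds."""
    return 1000.0 / fps


for fps in (10, 30, 60):
    print(f"{fps:>2} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

At the 10 images per second mentioned earlier as the threshold for real-time rendering, the budget is 100 ms; at 60 fps it shrinks to roughly 16.7 ms, which the slide truncates to 16.6.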

  30. Evolution of the Programmable Graphics Pipeline • Pre GPU • Fixed function GPU • Programmable GPU • Unified Shader Processors

  31. Early 90s – Pre GPU Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf

  32. OpenGL Pipeline

  33. OpenGL Pipeline

  34. GPU Shader • Fixed functionalities • Programmable functionalities • Flexible memory access

  35. Stream Program => GPU • A stream is a sequence of data (could be numbers, colors, RGBA vectors,…)
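The stream model can be sketched as a kernel applied independently to every element of a stream. The names `run_kernel` and `invert` below are hypothetical, and the list comprehension stands in for what a GPU would execute in parallel.

```python
def run_kernel(kernel, stream):
    """Apply the same kernel to each stream element, independently of the rest."""
    return [kernel(element) for element in stream]


def invert(rgba):
    """Example kernel: invert the color channels, keep alpha."""
    r, g, b, a = rgba
    return (1.0 - r, 1.0 - g, 1.0 - b, a)


pixels = [(0.0, 1.0, 0.25, 1.0), (1.0, 1.0, 1.0, 0.5)]
print(run_kernel(invert, pixels))
```

Because no element's result depends on another element, the elements can be processed in any order or all at once, which is the property that maps well onto GPU hardware.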

  36. Vertex Shader • Vertex transformation • Once per vertex • Input attributes • Normal • Texture coordinates • Colors
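A rough sketch of what "once per vertex" means: the function below is called for each vertex, transforms its position by a 4x4 matrix, and passes the other attributes through. The dictionary layout and the names `vertex_shader` and `mvp` are assumptions for illustration; real shaders are written in a shading language such as GLSL or HLSL.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector."""
    return tuple(sum(m[row][col] * v[col] for col in range(4))
                 for row in range(4))


def vertex_shader(vertex, mvp):
    """Runs once per vertex: transform the position, forward the attributes."""
    x, y, z = vertex["position"]
    return {
        "clip_position": mat_vec(mvp, (x, y, z, 1.0)),
        "normal": vertex["normal"],
        "uv": vertex["uv"],
        "color": vertex["color"],
    }


# Example matrix: translate by (2, 0, -5).
translate = [[1.0, 0.0, 0.0, 2.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, -5.0],
             [0.0, 0.0, 0.0, 1.0]]
v = {"position": (1.0, 1.0, 0.0), "normal": (0.0, 0.0, 1.0),
     "uv": (0.5, 0.5), "color": (1.0, 0.0, 0.0)}
shaded = vertex_shader(v, translate)
print(shaded["clip_position"])
```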

  37. Geometry Shader • Geometry composition • Once per geometry • Input primitives • Points, lines, triangles • Lines and triangles with adjacency • Output primitives • Points, line strips or triangle strips • [0, n] primitives output
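To illustrate the "[0, n] primitives" point, here is a hedged sketch of a geometry-shader-like function: it runs once per input triangle and emits either nothing (for a degenerate triangle) or four smaller triangles. The 2D coordinates and the subdivision scheme are choices made for this example only; each emitted triangle would correspond to a three-vertex triangle strip.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc; zero means degenerate."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def geometry_shader(triangle):
    """Runs once per primitive and may emit zero or more output primitives."""
    a, b, c = triangle
    if signed_area2(a, b, c) == 0:
        return []                        # emit nothing: the primitive is dropped
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


print(len(geometry_shader(((0, 0), (2, 0), (0, 2)))))   # subdivided
print(len(geometry_shader(((0, 0), (1, 1), (2, 2)))))   # collinear input
```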

  38. Fragment Shader • Per-pixel (or fragment) composition • Once per fragment • Operations on interpolated values • Vertex attributes • User-defined varying variables
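The "interpolated values" a fragment shader receives can be sketched with barycentric weights: each fragment inside a triangle gets a weighted mix of the three vertices' attributes. This is a simplified 2D illustration without perspective correction; the function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area term for edge a->b and point p."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(p, a, b, c):
    """Weights (w_a, w_b, w_c) of point p with respect to triangle abc."""
    area = edge(a, b, c)
    return (edge(b, c, p) / area, edge(c, a, p) / area, edge(a, b, p) / area)


def interpolate(weights, attr_a, attr_b, attr_c):
    """Blend a per-vertex attribute (any tuple of numbers) for one fragment."""
    wa, wb, wc = weights
    return tuple(wa * x + wb * y + wc * z
                 for x, y, z in zip(attr_a, attr_b, attr_c))


def fragment_shader(interpolated_color):
    """Runs once per fragment; here it just outputs the interpolated color."""
    return interpolated_color


a, b, c = (0, 0), (4, 0), (0, 4)
red, green, blue = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
w = barycentric((1, 1), a, b, c)
print(w, fragment_shader(interpolate(w, red, green, blue)))
```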

  39. GPU Shader

  40. Programming Graphics Hardware

  41. PC Architecture

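  42. Bus Interface • ISA (Industry Standard Architecture) • Bus interface • In use since the XT and AT era of the 1980s • Theoretical maximum speed of 16 MB/s • A severe bottleneck for peripherals • Now used only to connect devices where throughput matters little, such as sound cards and modems • PCI (Peripheral Component Interconnect) • Parallel connection • The successor to ISA for connecting peripherals • Smaller slots than ISA, with IRQ sharing • The common 32-bit, 33 MHz version runs at 133 MB/s; the 64-bit, 66 MHz version at 533 MB/s • Most peripherals use the PCI interface (Image labels: PCI, AGP, ISA)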

  43. Bus Interface • AGP (Accelerated Graphics Port) • Parallel connection • Developed by Intel • Based on PCI, but more than twice as fast • Runs at 66 MHz by default • AGP = 2 x PCI (AGP 2x = 2 x AGP) • AGP 1x peaks at 264 MB/s • AGP 2x peaks at 533 MB/s • For 3D graphics cards • PCIe (PCI Express) • Serial connection (cheap, scalable) • Up to 8.0 GB/s of bandwidth (PCIe = 2 x AGP 8x) • Intel, ATI, and NVIDIA, which lead the worldwide graphics market, have firmly endorsed this new standard as the next-generation graphics interface • AGP, created because of the limits of PCI and long unrivaled as the interface for graphics processing units (GPUs), is being replaced by PCI Express (Image labels: PCIe x1, PCIe x16, PCI; GeForce 7800 GTX (PCIe x16))
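The peak figures on the two bus-interface slides follow from simple arithmetic, sketched below in megabytes per second. For the parallel buses (PCI, AGP) the peak is bus width times clock times transfers per clock. For PCIe the sketch assumes first-generation links (2.5 GT/s per lane with 8b/10b encoding) and assumes the slide's 8.0 GB/s counts both directions of an x16 link; neither assumption is stated on the slides.

```python
def parallel_bus_mb_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak rate of a parallel bus: bytes per transfer x transfers per second."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


def pcie_gen1_mb_per_s(lanes):
    """Peak rate per direction, assuming PCIe 1.x: 2500 Mbit/s raw per lane,
    8 payload bits per 10 line bits, 8 bits per byte."""
    line_rate_mbit = 2500
    return line_rate_mbit * lanes * 8 / 10 / 8


print("PCI 32-bit/33 MHz:", round(parallel_bus_mb_per_s(32, 33.33)))
print("PCI 64-bit/66 MHz:", round(parallel_bus_mb_per_s(64, 66.66)))
print("AGP 1x (66 MHz):  ", round(parallel_bus_mb_per_s(32, 66)))
print("AGP 2x:           ", round(parallel_bus_mb_per_s(32, 66.66, 2)))
print("AGP 8x:           ", round(parallel_bus_mb_per_s(32, 66.66, 8)))
print("PCIe x16, one way:", pcie_gen1_mb_per_s(16))
print("PCIe x16, both:   ", 2 * pcie_gen1_mb_per_s(16))
```

Small differences between quoted figures (for example 264 versus 266 MB/s for AGP 1x) come from rounding the clock to 66 MHz rather than 66.66 MHz. Under the stated assumptions, a PCIe x16 link carries roughly twice the AGP 8x rate in each direction.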

  44. Generation I: 3dfx Voodoo (1996) • One of the first true 3D game cards • Worked by supplementing a standard 2D video card • Did not do vertex transformations: these were done on the CPU • Did do texture mapping and z-buffering • Image from “7 years of Graphics” (Diagram labels: Vertex Transforms, Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer; CPU, PCI, GPU)

  45. 1995-1998: Texture Mapping and Z-Buffer • PCI: Peripheral Component Interconnect • 3dfx’s Voodoo

  46. Texture Mapping

  47. Texture Mapping: Perspective-Correct Interpolation

  48. Texture Mapping: Perspective-Correct Interpolation
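Since the two slides above carry only a title, here is a small sketch of the standard formulation of perspective-correct interpolation, for one texture coordinate `u` along a span between two projected vertices. Instead of interpolating `u` linearly in screen space, `u/w` and `1/w` are interpolated linearly and then divided. The parameter names (`t` for the screen-space fraction, `w0` and `w1` for the vertices' clip-space w) are chosen for this example.

```python
def affine_interp(u0, u1, t):
    """Plain screen-space linear interpolation (wrong under perspective)."""
    return (1.0 - t) * u0 + t * u1


def perspective_correct_interp(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w linearly in screen space, then divide."""
    u_over_w = (1.0 - t) * (u0 / w0) + t * (u1 / w1)
    one_over_w = (1.0 - t) * (1.0 / w0) + t * (1.0 / w1)
    return u_over_w / one_over_w


# Near vertex: u = 0 at w = 1. Far vertex: u = 1 at w = 4.
t = 0.5  # halfway across the span on screen
print("affine:             ", affine_interp(0.0, 1.0, t))
print("perspective-correct:", perspective_correct_interp(0.0, 1.0, 1.0, 4.0, t))
```

The screen-space midpoint maps to a texture coordinate well short of 0.5 because the far half of the surface is compressed on screen; affine interpolation ignores this, which produces the texture "swimming" seen on hardware without perspective correction.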

  49. Aside: Mario Kart 64 • High fragment load / low vertex load Image from: http://www.gamespot.com/users/my_shoe/

  50. Aside: Mario Kart Wii • High fragment load / low vertex load? Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/
