Image Synthesis GP-GPU
Graphics hardware
• Current performance: PlayStation 3
• CPU: Cell processor (3.2 GHz)
  • 512 kB L2 cache
  • ~200 GFLOP/s
• GPU (Graphics Processing Unit): Nvidia RSX Reality Synthesizer (550 MHz, ~300 M transistors)
  • ~1.8 TFLOP/s
  • ~20 GPixels/s
  • ~2 GTriangles/s
Graphics hardware - history
• 1980s: simple rasterization
  • Windows, lines, polygons, text fonts
• 1990-95: "geometry engines" only on high-end workstations
  • e.g. SGI O2 vs. Indigo2
• 1995: new rasterization functionality
  • Realism by texturing, e.g. SGI Infinite Reality
• 1998: geometry processor (T&L) on PC graphics
• 2000: PC graphics achieves performance similar to high-end workstations
  • 3D is becoming standard in Aldi PCs
• 2001: PC graphics offers new functionality
  • Multitextures, vertex and pixel shaders
• 2002: DirectX Level 9.0 hardware
  • High-level shader languages
• 2006: DirectX Level 10.0 hardware
  • Geometry shader
Trends in graphics hardware
• Number of transistors doubles every 6 months
• Advances in performance and functionality
[Chart: transistors (millions) over time (month/year), 9/97 to 9/02. Labeled chips: Riva 128 (3M), GeForce3 (57M), R200 (60M), GeForceFX / ATI Radeon 9800, ATI R520.]
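The doubling claim can be checked against two of the chart labels. The sketch below assumes pure exponential growth between two samples and assumes that Riva 128 and GeForce3 sit at 9/97 and 3/01 on the time axis (42 months apart); those dates are read off the axis ticks and are not stated explicitly on the slide. The function name is illustrative.

```python
import math

def doubling_time_months(count_start, count_end, months_elapsed):
    """Months per doubling, assuming exponential growth between two samples."""
    doublings = math.log2(count_end / count_start)
    return months_elapsed / doublings

# Chart labels: Riva 128 (3M transistors), GeForce3 (57M transistors).
# Assumption: the two chips are 42 months apart (9/97 to 3/01).
months = doubling_time_months(3e6, 57e6, 42)
print(f"doubling time: {months:.1f} months")
```

Under these assumed dates the two samples give a doubling time of roughly ten months, so the result depends strongly on which chips and dates are picked from the chart.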
Trends in graphics hardware
• Graphics performance grows faster than Moore's law predicts
[Chart: performance over time for graphics, CPU, and network.]
Parallel graphics hardware
• Graphics hardware has always been parallel
• Internal, on chip or board
• Multiple rasterizers serve one frame buffer
• Multi-pipe
• Multiple graphics cards in one system for one or multiple displays
• Multiple geometry engines
• Distributed graphics
• Multiple nodes in a connected cluster, with one or multiple cards each, serve one or multiple displays driven by one application
Graphics architectures
• State-of-the-art GPUs
• Highly parallel stream architecture
• A stream of vertices/fragments is processed
• Pipelined and SIMD parallel processing
• SIMD: a single set of instructions is applied to multiple stream elements
• Specifies a new rendering pipeline
• Additional stages that a vertex or a fragment passes through
• Specifies new (vendor-specific) OpenGL extensions
• Allows for new classes of algorithms
• May make programs platform-dependent
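The stream model can be illustrated on the CPU: one kernel is applied to every stream element, and no element sees another. This is only a sequential sketch of the programming model, not of how a GPU schedules work; `run_kernel` and `scale_vertex` are hypothetical names.

```python
def run_kernel(kernel, stream):
    """Apply one kernel to every stream element; elements are independent."""
    return [kernel(element) for element in stream]

def scale_vertex(vertex):
    # Hypothetical kernel: uniformly scale a 3D position by 2.
    x, y, z = vertex
    return (2.0 * x, 2.0 * y, 2.0 * z)

vertices = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.25), (-1.0, 2.0, 3.0)]
print(run_kernel(scale_vertex, vertices))
```

Because the kernel has no access to neighboring elements or shared state, the loop iterations could run in any order or all at once, which is what makes the model suitable for SIMD hardware.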
Graphics architectures State-of-the-Art GPUs (G80)
Graphics architectures
• State-of-the-art GPUs
• Multiple (texture) render targets
• Up to 2 GB video memory
• Floating-point textures (4 x 32 bit)
• Internal computations in float/double precision
• Z-cull: discards fragments that will fail the depth test before they enter the pixel pipelines
• Dynamic flow control: per-vertex/geometry/fragment specific operations (if-then-else)
• PCIe: serial, point-to-point protocol with dual channels to allow for bandwidth in both directions (upload/download)
• Fixed fragment-to-pixel binding, i.e. a fragment at (x, y) cannot be written to a different pixel (x', y')
• No scattering (at least not in DX/GL), only gathering
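The difference between gathering and scattering can be shown with plain lists. In this sketch, `gather` corresponds to what a fragment program can do (read from a computed address, write to its own fixed output position), while `scatter` writes to a computed address, which the fixed fragment-to-pixel binding rules out. The function names are illustrative only.

```python
def gather(source, indices):
    """Output i reads from a computed address: out[i] = source[indices[i]]."""
    return [source[j] for j in indices]

def scatter(source, indices, size):
    """Input i writes to a computed address: out[indices[i]] = source[i]."""
    out = [None] * size
    for i, j in enumerate(indices):
        out[j] = source[i]
    return out

data = [10, 20, 30, 40]
idx = [2, 0, 3, 1]
print(gather(data, idx))      # each output slot chooses where to read
print(scatter(data, idx, 4))  # each input element chooses where to write
```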
Graphics architectures State-of-the-Art programmable GPUs
Programmable graphics hardware
• Displacement mapping
[Diagram labels: Simulation (generates height field texture), static grid, Displacer, water surface, Rendering.]
Programmable graphics hardware
• GPU memory objects
• Semantics can be specified for a chunk of memory
• A memory object can be a texture, a vertex array, or a frame buffer object
• What was a texture render target in the current pass becomes a vertex array in the upcoming pass
• Texture elements can be interpreted as vertex attributes without any copy operations (not in OpenGL)
• The same effect can be achieved with vertex texture fetch, but this fetch slows down performance
Programmable graphics hardware
• Example
• Computation of height values u at the vertices of a 2D grid
• Starting with an initial distribution, compute the evolution over time t
[Figure: 3 x 3 neighborhood of grid point P(i,j), from P(i-1,j-1) to P(i+1,j+1), with grid spacing h along the x and y axes.]
Programmable graphics hardware
Algorithm:
• Load initial height values (Nx x Ny) as 2D textures (sGridPrev, sGrid)
• Upload fragment shader (render to sGridNew):

    // f stands for the update rule, which the slide leaves unspecified
    void PerPixelSim(float2 fragpos : TEXCOORD0, out float4 height : COLOR0)
    {
        float4 centerPrev = tex2D(sGridPrev, fragpos);
        float2 leftIndex = float2(-1.0 / TexSize, 0.0);  // one texel to the left
        float4 left = tex2D(sGrid, fragpos + leftIndex);
        // same for right, upper, lower, center
        height = f(left, right, upper, lower, center, centerPrev);
    }
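A CPU reference of one simulation pass can make the data flow concrete. The slide does not say what `f` is; the sketch below assumes an explicit finite-difference step of the 2D wave equation, with `c2` standing for (c * dt / h)^2, and assumes clamp-to-edge texture addressing. Both are illustrative choices, not taken from the slides.

```python
def f(left, right, upper, lower, center, center_prev, c2=0.25):
    """Assumed update rule: explicit wave-equation step.
    c2 = (c * dt / h)**2 should stay <= 0.5 for stability in 2D."""
    laplacian = left + right + upper + lower - 4.0 * center
    return 2.0 * center - center_prev + c2 * laplacian

def simulate_step(grid_prev, grid):
    """Compute the new grid; each cell is independent, like one fragment."""
    ny, nx = len(grid), len(grid[0])

    def texel(g, i, j):
        # Clamp-to-edge addressing (an assumption about the wrap mode).
        return g[min(max(j, 0), ny - 1)][min(max(i, 0), nx - 1)]

    return [[f(texel(grid, i - 1, j), texel(grid, i + 1, j),
               texel(grid, i, j + 1), texel(grid, i, j - 1),
               texel(grid, i, j), texel(grid_prev, i, j))
             for i in range(nx)] for j in range(ny)]

# A single bump in the middle of a 3 x 3 grid spreads to its neighbors.
bump = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
for row in simulate_step(bump, bump):
    print(row)
```

The nested list comprehension plays the role of the rasterized quad: every (i, j) pair corresponds to one fragment, and each fragment only reads from the two input grids.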
Programmable graphics hardware
Algorithm contd.:
• Simulation:
  • Render a quad that covers Nx x Ny pixels with appropriate texture coordinates
  • Nx x Ny fragments will be generated
  • Data-parallel execution of fragments
• Swizzle texture identifiers (three-way rotation):
  • tmp = sGridPrev; sGridPrev = sGrid; sGrid = sGridNew; sGridNew = tmp
• Display the height field in texture sGrid
[Figure: quad with texture coordinates (0,0), (1,0), (1,1), (0,1) at its corners.]
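The identifier swap is a three-way rotation: the oldest buffer is recycled as the next render target, so no texture data is copied. A minimal sketch, using strings as stand-ins for texture handles (the handle names and the `rotate` helper are hypothetical):

```python
def rotate(prev, cur, new):
    """Return the (prev, cur, new) roles for the next pass.
    The old 'prev' buffer is recycled as the next render target."""
    return cur, new, prev

s_grid_prev, s_grid, s_grid_new = "texA", "texB", "texC"
for step in range(3):
    # A render pass would read s_grid_prev and s_grid and write s_grid_new here.
    s_grid_prev, s_grid, s_grid_new = rotate(s_grid_prev, s_grid, s_grid_new)
    print(step, s_grid_prev, s_grid, s_grid_new)
```

Assigning the three identifiers one after another without a temporary would overwrite `sGridPrev` before it is reused, which is why the rotation is done as a single tuple assignment here.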
Programmable graphics hardware
Algorithm contd.:
• Display:
• Upload fragment shader (render to color buffer):

    // rightIndex, upperIndex: one-texel offsets, defined like leftIndex in PerPixelSim
    // f stands for the refraction offset computation, which the slide leaves unspecified
    void PerPixelRefract(float2 fragpos : TEXCOORD0, out float4 color : COLOR0)
    {
        float3 tangent = float3(1.0, 0.0, tex2D(sGrid, fragpos + rightIndex).r - tex2D(sGrid, fragpos).r);
        float3 binormal = float3(0.0, 1.0, tex2D(sGrid, fragpos + upperIndex).r - tex2D(sGrid, fragpos).r);
        float3 normal = normalize(cross(tangent, binormal));
        float2 offset = f(normal, refractionIndex);
        color = tex2D(sBackground, fragpos + offset);
    }
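The normal computation in the display shader can be reproduced on the CPU. The sketch below uses the same forward differences and cross product as the shader; the refraction step is left out because the slides do not define it. The function name and the nested-list height field are illustrative assumptions.

```python
import math

def height_field_normal(h, i, j):
    """Unit normal at cell (i, j) from forward differences, as in the shader."""
    dx = h[j][i + 1] - h[j][i]  # height change toward the right neighbor
    dy = h[j + 1][i] - h[j][i]  # height change toward the upper neighbor
    tangent = (1.0, 0.0, dx)
    binormal = (0.0, 1.0, dy)
    # cross(tangent, binormal) works out to (-dx, -dy, 1)
    n = (tangent[1] * binormal[2] - tangent[2] * binormal[1],
         tangent[2] * binormal[0] - tangent[0] * binormal[2],
         tangent[0] * binormal[1] - tangent[1] * binormal[0])
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

# A surface rising toward +x tilts the normal toward -x.
slope = [[0.0, 1.0], [0.0, 1.0]]
print(height_field_normal(slope, 0, 0))
```

Since the cross product always has a z component of 1 before normalization, the normal never flips below the surface, regardless of how steep the height differences are.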
GPU particle tracing
[Pipeline diagram labels: input stream, Input Assembler, Vertex Shader, Rasterizer, Pixel Shader, Output Merger, output stream.]
Programmable graphics hardware Demonstration