Some Things Jeremy Sugerman 22 February 2005
Topics
• Quick GPU Topics
• Conditional Execution
• GPU Ray Tracing
Jeremy Sugerman, FLASHG 22 February 2005
PCI-Express
• PCI-Express solves data transfer problems…
3DLabs Realizm 100 AGP
• Mediocre Fill Rate (About half a 9800XT)
• Reasonable Texture Bandwidth
• Variable Cost Instructions
• 6 GFLOPS ADD – 0.5 GFLOPS LG2
• Remarkable Readback
• But, No GL_TEXTURE_RECTANGLE_EXT
Conditional Execution
• Depth and Stencil are classic tools
• Only effective early
• All shaders support predication and KIL
• No savings in execution time
• KIL does gruesome things to the pipeline
• Pixel Shader 3.0 has true branching
• If-Then-Else, Data dependent loops
• NV4x currently, no ATI until R500
Compute Mask – Z Buffer
• Clear Z to 1.0
• Draw Depth-Only at Z = 0.3
• KIL where computation will happen
• Draw Color at Z = 0.7
• Very Effective When it Works
• Fragile, Easily Disabled
• Stays Disabled Until glClear!
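
The Z-buffer compute mask steps can be walked through on the CPU. This is a minimal Python simulation, not GPU code: `run_masked_pass`, `live`, and `kernel` are hypothetical names, and it assumes a "less than" depth test with depth writes off during the color pass, which the slide does not state.

```python
# CPU-side simulation of the Z-buffer compute mask; not real GPU code.
CLEAR_Z, MASK_Z, COLOR_Z = 1.0, 0.3, 0.7


def run_masked_pass(live, kernel):
    """live[i] is True where computation should happen (hypothetical input)."""
    n = len(live)

    # Step 1: clear the depth buffer to the far value.
    depth = [CLEAR_Z] * n

    # Step 2: depth-only pass at Z = 0.3. Live pixels are KILled, so they
    # keep the cleared depth; every other pixel gets the nearer mask depth.
    for i in range(n):
        if not live[i]:
            depth[i] = MASK_Z

    # Step 3: color pass at Z = 0.7 with an assumed "less than" depth test.
    # Only pixels still at 1.0 pass; the rest are rejected before shading.
    color = [None] * n
    shaded = 0
    for i in range(n):
        if COLOR_Z < depth[i]:
            color[i] = kernel(i)
            shaded += 1
    return color, shaded
```

Calling `run_masked_pass([True, False, True, False], lambda i: i * 10)` runs the kernel only on pixels 0 and 2; the masked pixels never reach it, which is the saving early-Z rejection is meant to provide.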
Compute Mask – Early-Z
[Chart: NV41 and X800; series Random, 2x2 Blocks, 3x3 Blocks, 4x4 Blocks, Wavefront]
Compute Mask – PS3.0
• Rasterize normally with a shader like:
    if (pixel is live) {
        …
        MOV result.color, <output>
    } else {
        MOV result.color, <placeholder>  // Or KIL
    }
• Easy to Write
• Must shade all fragments
• Must write a value or KIL for all fragments
Compute Mask – PS 3.0
[Chart: series Random, 64x64 Blocks, 32x32 Blocks, 16x16 Blocks, Wavefront]
Pixel Shader 3.0
• Not (yet?) a replacement for Early-Z
• What about loops?
• What about state machines?
    if (fragment is in state a) {
        // Computation 1
    } else {  // state b
        // Computation 2
    }
• Will execution time be MAX(a, b) or a + b (the costs of the two branches)?
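
The closing question can be made concrete with two toy cost models for one batch of fragments. This is only a sketch of the two hypotheses, not a description of how any particular GPU behaves: the function names, the `states` list, and the costs `cost_a` and `cost_b` are all illustrative assumptions, and `states` is assumed non-empty.

```python
def batch_time_lockstep(states, cost_a, cost_b):
    """Hypothesis "a + b": the batch runs in lockstep and pays for every
    branch that at least one of its fragments takes."""
    time = 0
    if any(s == "a" for s in states):
        time += cost_a
    if any(s != "a" for s in states):
        time += cost_b
    return time


def batch_time_independent(states, cost_a, cost_b):
    """Hypothesis "MAX(a, b)": fragments branch independently, so the batch
    finishes when its slowest fragment does."""
    return max(cost_a if s == "a" else cost_b for s in states)
```

With `cost_a = 10` and `cost_b = 30`, a mixed batch such as `["a", "b", "a", "b"]` costs 40 under the lockstep model and 30 under the independent one, while a batch in which every fragment is in state a costs 10 under both. The two models only disagree when a batch mixes states.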
GPU Ray Tracing
• Tim Purcell left us a Brook raycaster
• Tim (Foley) et al. beat on it for DARPA Line-of-Sight
• Early-Z, 2D Addressing
• Tim and I have forked it again
• Explore new hardware features
• Explore new algorithm options
• Mature, maintainable source base
Demo
• Break for demo…
GPU Ray Tracing – Brute Force
• Initialize Scene Parameters, Geometry (CPU)
• Generate Eye Rays
• Foreach( triangle in the scene )
• Intersect with all rays
• Record if it hits closer than any prior triangle
• Shade Hits
• Ray-Triangle kernel is 39 instructions
• Over 100 million intersections per second
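
The brute-force loop can be sketched on the CPU as follows. The slides do not say which ray-triangle test the 39-instruction kernel uses; this sketch swaps in the well-known Möller-Trumbore test as a stand-in, and every name in it (`intersect`, `brute_force`, the tuple-based vectors) is an illustrative assumption rather than the raytracer's actual code.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test (assumed stand-in); returns distance t or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None


def brute_force(rays, triangles):
    """For each triangle, test all rays and keep the closest hit per ray."""
    hits = [(math.inf, None)] * len(rays)
    for tri_id, tri in enumerate(triangles):  # one pass per triangle
        for ray_id, (origin, direction) in enumerate(rays):
            t = intersect(origin, direction, tri)
            if t is not None and t < hits[ray_id][0]:
                hits[ray_id] = (t, tri_id)
    return hits
```

The inner loop over rays is the part that runs in parallel on the GPU, one fragment per ray; the outer loop over triangles corresponds to one rendering pass each.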
GPU Ray Tracing – Uniform Grid
• Initialize Scene Parameters, Geometry (CPU)
• Generate Eye Rays
• While (Any Rays Are Live)
• Traverse the traversing rays
• Intersect the intersecting rays
• Shade Hits
• Equivalent to ~14 million ray-triangles per second on our scenes.
“Any Live Rays?”
• Fundamentally a reduction
• Sum across all rays
• Readback to CPU
• Many passes to do a GPU reduction
• Could try occlusion query
• Kernel that just KILs on dead rays
• Still an extra pass
• GPU global counter registers would be cool
• Equivalent to 24 million ray-triangles per second when skipped.
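
The reason a GPU reduction takes many passes can be seen in a small CPU model. This sketch assumes each pass gathers a 2x2 block of the previous buffer into one output element, which is one common scheme but not necessarily the one used here; `reduce_pass` and `count_live` are hypothetical names.

```python
def reduce_pass(buf):
    """One pass: each output element gathers and sums a 2x2 input block."""
    h, w = len(buf), len(buf[0])
    out_h, out_w = (h + 1) // 2, (w + 1) // 2
    out = [[0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            total = 0
            for y in (2 * oy, 2 * oy + 1):
                for x in (2 * ox, 2 * ox + 1):
                    if y < h and x < w:  # skip reads past the buffer edge
                        total += buf[y][x]
            out[oy][ox] = total
    return out


def count_live(live):
    """Sum a 2D grid of live flags; returns (live count, passes used)."""
    buf = [[1 if cell else 0 for cell in row] for row in live]
    passes = 0
    while len(buf) > 1 or len(buf[0]) > 1:
        buf = reduce_pass(buf)
        passes += 1
    return buf[0][0], passes
```

Under this scheme the pass count grows with the logarithm of the buffer dimension, and only the final single-element buffer needs to be read back to the CPU.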
Ping Ponging Buffers
• No read-modify-write causes copies:
    intersectTriangle(in ray, in oldHit, in tri, out hit) {
        if (ray hits tri closer than oldHit) {
            hit = <where ray hits tri>;
        } else {
            hit = oldHit;  // No RMW
        }
    }
• Memory and Bandwidth Hungry
• Add conditionals / predication to kernels
• Complicates Early-Z compute masking
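
The ping-pong pattern can be modeled on the CPU with two buffers that swap roles after every pass. This is a simplified sketch: hit records are reduced to a single distance per ray, the per-triangle distances are supplied as ready-made lists, and `intersect_triangle`, `closest_hits`, and the `copies` counter are hypothetical names added for illustration.

```python
import math


def intersect_triangle(old_hit, new_t):
    """Kernel stand-in: the output must always be written, so a miss
    copies the old value through instead of leaving it in place."""
    return new_t if new_t < old_hit else old_hit


def closest_hits(per_triangle_t, num_rays):
    """per_triangle_t[k][r] is ray r's hit distance on triangle k (inf = miss)."""
    read_buf = [math.inf] * num_rays
    write_buf = [math.inf] * num_rays
    copies = 0  # writes that only carried an unchanged value forward
    for tri_t in per_triangle_t:
        for ray in range(num_rays):
            write_buf[ray] = intersect_triangle(read_buf[ray], tri_t[ray])
            if write_buf[ray] == read_buf[ray]:
                copies += 1
        # Swap roles: this pass's output is the next pass's input.
        read_buf, write_buf = write_buf, read_buf
    return read_buf, copies
```

The `copies` count shows where the bandwidth goes: every pass rewrites the whole buffer, including entries for rays that did not hit anything closer.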
Render to Texture
• DirectX has it, OpenGL does not
• DirectX raytracer bluescreens NV4x drivers
• Every shader draws its results to a pbuffer
• Copied back to a texture each time
• Superbuffers offered a fix
• ATI supported them (broken now)
• ARB killed them
• Framebuffer Objects made it through the ARB
• Only drivers are preliminary NV4x drivers
GPU Ray Tracer Enhancements
• 2D Addressing (duh)
• kD-Tree Accelerator
• Early-Z and/or PS3.0 for the Accelerators
• Tuning Traverse vs. Intersect vs. Shade
• Occlusion Queries / Fast Reductions
• Shadows
• Tuning Bandwidth
• Shading…