Ray Tracing on Programmable GPUs
Graphics Pipeline • Traditional Pipeline: Application → Command → Geometry → Rasterization → Texture → Fragment → Display • Programmable Fragment Pipeline: Fragment Input → Fragment Program (reads Textures, uses Registers) → Fragment Output
Fragment Processing Features • Rich instruction set • No branching yet (see PS 3.0 spec) • Floating point • Arithmetic • Texture memory • Dependent texturing • Multipass rendering flow control • NV_OCCLUSION_QUERY
Ray Engine – Main Idea • Ray-triangle intersection done by GPU • CPU-based renderer does everything else
Ray Engine Algorithm • Renderer sends ray textures to GPU • Ray origin and direction • Renderer sends ‘triangles’ down pipeline • Vertex interpolants of a screen aligned quad • GPU performs ray-triangle intersection tests • Short fragment program • Framebuffer stores closest hit point • Renderer reads back closest hit
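The algorithm above can be sketched on the CPU as follows. This is a minimal illustration, not the Ray Engine's fragment program: the slides do not say which intersection test is used, so the sketch swaps in the well-known Möller–Trumbore test. The names `intersect` and `ray_engine` and the list-based "ray textures" are hypothetical. Each triangle is tested against every ray, and a per-ray "framebuffer" entry keeps the closest hit.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray-triangle test (an assumed choice).
    Returns the hit distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def ray_engine(origins, directions, triangles):
    """Test every triangle against every ray; keep the nearest hit per ray."""
    # One (distance, triangle id) entry per ray, playing the framebuffer's role.
    closest = [(float("inf"), -1)] * len(origins)
    for tri_id, (v0, v1, v2) in enumerate(triangles):   # one "quad" per triangle
        for i, (o, d) in enumerate(zip(origins, directions)):
            t = intersect(o, d, v0, v1, v2)
            if t is not None and t < closest[i][0]:     # depth-test analogue
                closest[i] = (t, tri_id)
    return closest
```

The inner loop over rays corresponds to the work the GPU would do in parallel across fragments; the final `closest` list corresponds to what the renderer reads back.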
Pixel Shader 1.4 Implementation • Fixed point precision problems
Ray Engine Results • Radeon 8500 fixed point implementation • 114M ray-triangle intersections/s • Full precision simulator • 115K – 200K rays/s
Ray Engine Summary • GPU performs ray-triangle intersection • CPU-based renderer does everything else • Raw ray-triangle intersection rate is faster than a CPU-based approach • Total rays processed per second is lower than on the CPU • Readback limited
Streaming Ray Tracer – Main Ideas • Entire ray tracing computation can be done efficiently on the GPU • Minimal host interaction • Stream processor abstraction for programmable fragment processor
Streaming Ray Tracer • Generate Eye Rays (input: Camera) • Traverse Acceleration Structure (input: Grid) • Intersect Triangles (input: Triangles) • Shade Hits and Generate Shading Rays (input: Materials)
GPU Abstraction • Texture memory is memory • Think of dependent texture fetches as pointer dereferencing • Programmable fragment processor is a programmable stream processor • Think of multipass rendering as stream and kernel programming
Texture Memory Organization • Uniform Grid: 3D luminance texture, one entry per voxel (vox0 … voxM), e.g. 0 3 11 38 … 564, pointing into the triangle list • Triangle List: 1D luminance texture, per-voxel runs of triangle indices (runs for vox0 and vox2 shown), e.g. 0 3 1 3 7 21 216 … • Triangles: 3x 1D RGB textures (v0, v1, v2), each holding one xyz per triangle (tri0 … triN)
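The idea of treating dependent texture fetches as pointer dereferences can be sketched with plain lists standing in for textures. This is an illustrative layout only: the grid is flattened to 1D for brevity, and the `END` sentinel that terminates each voxel's run in the triangle list is an assumption, since the slides do not show how a run ends. The name `triangles_in_voxel` is hypothetical.

```python
END = -1  # assumed sentinel marking the end of a voxel's triangle run

def triangles_in_voxel(voxel, grid, tri_list, v0, v1, v2):
    """Follow the chain grid -> triangle list -> vertex textures."""
    out = []
    i = grid[voxel]                 # fetch 1: voxel -> offset into triangle list
    while tri_list[i] != END:
        t = tri_list[i]             # fetch 2 (dependent): offset -> triangle index
        out.append((v0[t], v1[t], v2[t]))  # fetch 3 (dependent): index -> vertices
        i += 1
    return out

# Hypothetical two-voxel scene with two triangles.
grid = [0, 3]                       # voxel 0 starts at offset 0, voxel 1 at offset 3
tri_list = [0, 1, END, 1, END]      # voxel 0 holds triangles 0 and 1; voxel 1 holds 1
v0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
v1 = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
v2 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
```

Each list index here plays the role of a texture coordinate, so a value read from one texture is used as the address for the next read.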
Stream Programming Model • Programmable fragment processor is essentially a stream processor • Kernels and streams • Stream is a set of data records • Kernels operate on records • Streams connect kernels together • Kernels can read global memory • Diagram: input record stream → kernel → kernel → output record stream, with each kernel reading globals
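A minimal sketch of the kernel-and-stream model, assuming a hypothetical helper `run_kernel` and two toy kernels (`scale` and `offset`) that are not from the slides. Each kernel sees one record at a time plus read-only globals, and the output stream of one kernel is the input stream of the next.

```python
def run_kernel(kernel, stream, globals_):
    """Apply `kernel` independently to every record of `stream`.
    `globals_` is read-only data shared by all records (the texture analogue)."""
    return [kernel(record, globals_) for record in stream]

def scale(record, g):
    return record * g["scale"]

def offset(record, g):
    return record + g["offset"]

globals_ = {"scale": 2, "offset": 1}
stage1 = run_kernel(scale, [1, 2, 3], globals_)   # first kernel
stage2 = run_kernel(offset, stage1, globals_)     # stream connects the kernels
```

Because a kernel never looks at neighbouring records, every record can be processed in parallel, which is the property the fragment processor exploits.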
Streaming Flow Control • Application and Geometry Stages • Rasterization • Fragments (Input Stream) • Fragment Program (Kernel) • Texture (Globals) • Fragment Program Output (Output Stream)
Multiple Rendering Passes • Pass 1: Generate Eye Rays • Draw quad • Rasterize • Run fragment program • Save to offscreen buffer (rays)
Multiple Rendering Passes • Pass 2: Traverse • Draw quad • Rasterize • Restore (rays) • Run fragment program • Save to offscreen buffer (ray voxel pr)
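The pass structure can be sketched as a host loop that repeatedly runs one kernel over every pixel and saves the result as the input of the next pass. Everything here is a simplified assumption: the per-pixel 1D `occupied` grid, the state names, and the functions `run_pass`, `generate_eye_rays`, `traverse`, and `render` are hypothetical. The host-side `any(...)` check stands in for a query such as the NV_OCCLUSION_QUERY mechanism listed earlier under multipass flow control; the slides do not state how passes are actually scheduled.

```python
def run_pass(kernel, buffer, g):
    """One rendering pass: 'draw a quad' so `kernel` runs once per pixel,
    then return the new offscreen buffer."""
    return [kernel(item, g) for item in buffer]

def generate_eye_rays(pixel, g):
    # Pass 1: create one ray record per pixel, starting in voxel 0.
    return {"pixel": pixel, "voxel": 0, "state": "traverse"}

def traverse(ray, g):
    # Later passes: advance each active ray by one voxel.
    if ray["state"] != "traverse":
        return ray                              # finished rays pass through unchanged
    cells = g["occupied"][ray["pixel"]]
    v = ray["voxel"]
    if v >= len(cells):
        return {**ray, "state": "miss"}         # left the grid
    if cells[v]:
        return {**ray, "state": "intersect"}    # voxel has geometry to test
    return {**ray, "voxel": v + 1}

def render(num_pixels, g):
    rays = run_pass(generate_eye_rays, range(num_pixels), g)
    passes = 1
    # Assumed stand-in for an occlusion-query count of still-active rays.
    while any(r["state"] == "traverse" for r in rays):
        rays = run_pass(traverse, rays, g)      # restore rays, run kernel, save
        passes += 1
    return rays, passes
```

For example, `render(2, {"occupied": [[False, False, True], [False, False, False]]})` runs one pass to generate rays and then traversal passes until no ray is still in the `traverse` state.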
Demos Rendered using a Radeon 9700 Pro
Streaming Ray Tracer Results • Simulations • 50M – 200M ray-triangle intersections/s • Radeon 9700 Pro Implementation • 100M ray-triangle intersections/s • 300K – 4.0M rays/s
Streaming Ray Tracer Summary • Entire ray tracing computation can be mapped efficiently to the GPU • Stream processor is a good abstraction for a programmable fragment processor
Ray Tracing in Hardware • Volume Rendering • [Meissner98],[Pfister99] • Offline Rendering • [ART01],[ART02] • Interactive Rendering • [Schmittler02]
SaarCOR – Main Idea • Scalable and efficient real time hardware ray tracer • Implementation based on Saarland RTRT
SaarCOR Implementation • Packet based ray tracer • Several custom cores • Computational units • Traversal, intersection, ray generation and shading • Memory units • Memory controller, caches, routers • Multithreaded • Standard DRAM memory on board • Virtual memory support for large scenes • Support for programmable shading
Simulated Performance • Standard 4-pipeline SaarCOR • 100M – 400M rays/s • Frame rates shown: 137 fps, 59 fps, 44 fps, 170 fps
Simulated Bandwidth Usage
    No VMA    With VMA    PCI
    1.9MB     2.5MB       0.03MB
    26.6MB    34.1MB      0.91MB
    2.1MB     2.6MB       0.02MB
    6.1MB     7.7MB       0.14MB
SaarCOR Summary • Scalable and efficient • Requires fewer FP units than GeForce3 • Low bandwidth requirements • Hides latency through multithreading • Fast frame rates
Conclusions • Real time ray tracing advantages • Physically correct renderings • High geometric complexity • Shading flexibility • Several options for real time ray tracing • Software, GPU, Hardware
Acknowledgments • Ian Buck, Bill Mark, Pat Hanrahan • James ‘RTD’ Percy, Pradeep Sen, Eric Chan • Matt Papakipos, Kurt Akeley - NVIDIA • Bob Drebin, Mark Peercy – ATI • Sponsors • ATI, MERL, NVIDIA, Sony, Sun • DARPA