350 likes | 494 Views
Ray Tracing in CUDA. Andrei Monteiro Marcelo Gattass Assignment 3 June 2010. Topics. Motivation Related Work Grid Construction Ray Tracing in CUDA Results Conclusion References. Motivation.
E N D
Ray Tracing in CUDA Andrei Monteiro Marcelo Gattass Assignment 3 June 2010
Topics • Motivation • Related Work • Grid Construction • Ray Tracing in CUDA • Results • Conclusion • References
Motivation • Ray Tracing is a technique for generating an image by launching rays for each pixel and calculating its intersections with the scene objects. • Simulates several effects naturally, such as reflection and refraction, producing a very high degree of virtual realism. • Computationally expensive. • Can use different acceleration structures. (Kd-trees, Uniform grid, BVH) • Why CUDA? • Designed for General-Purpose Computing. • Construction of Grid is faster than other structures (e.g. Kd-trees, BVH). • Provides natural compactness, avoiding memory waste in contrast with the stencil routing algorithm using GLSL. • Use of shared memory speed up construction. • Fast data transfers. • Atomic operations
Related Work • Uniform Grid • A Parallel Algorithm for Construction of Uniform Grids, Kalojanov, J. • GPU-Accelerated Uniform Grid Construction for Ray Tracing Dynamic Scenes, Ivson, P., Duarte, L., Celes, W. • Ray Tracing • Understanding the Efficiency of Ray Traversal on GPUs, Aila, T. NVIDIA. • Ray Tracing on Programmable Graphics Hardware, Purcell, T. • Ray Tracing Animated Scenes using Coherent Grid Traversal, Wald et al.
Grid Construction • Uniform Grid • Speed Up simulation, avoids going through every scene object to test intersection. • Supports Dynamic Scenes. • Each voxel contains a list of primitives • The ray traverses the grid. • Grid Resolution
Grid Construction • Algorithm in CUDA: • Insert triangles in voxels • Calculate grid hash table • Sort the pairs • Write cell start and end • Reorder particles
Grid Construction • Insert triangles in voxels • Bounding Box (corners) • Check triangle plane Intersection • Avoids more than the same reference of the triangle inside the voxel. Contained in same voxel 4 times Contained in more than one voxel
Grid Construction 9 3 5 • Grid Hash Table • Pair Cell Index – Particle Index. • E.g. Cell Dimension = 3, Grid resolution = 3x3 6 7 8 6 0 2 4 3 4 5 3 7 1 6 8 0 1 2 0 0 3 6 9 PARTICLES: 1 2 0 3 4 5 6 7 8 4 4 0 1 2 0 8 4 0 5 7 0 0 3 0 8 8 0 3 1 0 8 1 0 7 1 0 HASH: 4 0 5 7 3 8 1 2 2 Cell Index 0 1 2 3 4 5 6 7 8 Particle Index
Grid Construction • Sorting the Pairs • In order to calculate the cells´start and end, it is necessary to order particles in respect to cell indices which they belong. • Actually, the application sorts the previous hash table with respect to their keys, or cell indices. • Use of Radix Sort from CUDA SDK.
Grid Construction • Sorting the Pairs • Sort Hash table by key values (cell indices). HASH: 4 0 5 7 3 8 1 2 2 Cell Index 0 1 2 3 4 5 6 7 8 Particle Index Sorted HASH: 0 1 2 2 3 4 5 7 8 Cell Index 6 1 7 8 4 0 2 3 5 Particle Index
Grid Construction • Finding Cell Start/End and Reordering Particles. Sorted HASH: 0 1 2 3 4 5 6 7 8 ... Current Thread 0 0 1 1 1 2 2 2 1 Cell Index CellStart [CellIndex[thread_id]] = 2 ≠ Cell 0: end = 2 CellIndex [thread_id - 1] CellIndex [thread_id] CellEnd [Cell Index [thread_id -1]] = 2 Cell 1: start = 2 Cell Start/End: 0 1 2 3 4 5 6 7 8 ... Cell Index 0/2 2/6 6/... Cell Start / End
Ray Tracing in CUDA • Can be easily parallelized • Each thread is responsible for one pixel / ray intersection. • Problems that slow performance: • Internal Loops • Cause threads to diverge. • Random memory access • Causes bank conflicts, non-coalesce reading
Ray Tracing in CUDA • Kernels • Build grid (if scene changes) • Setup rays (if camera moved) • Lauch Rays • Get Hits • Get Shadow Hits • Get Reflection Hits (repeat) • Shade
Ray Tracing in CUDA • Setup Rays • Calculate the ray equation for each pixel. • One thread per pixel
Ray Tracing in CUDA • Lauch Rays • Calculates the ray-grid intersection, if any. For rays that do not intersect the grid, they are discarded for the next steps. • Returns the first cell intersection and the parameters for traversing the grid. p(t)
Ray Tracing in CUDA • Get Hits • The most expensive steps of the simulation. • Typical algorithm: • Problem: Causes too much thread divergency. • Solution: Use while-while algorithm • Causes less divergency While (not hit or ray inside grid) { Traverse cell; if (! Cell empty) { for each triangle in cell { get hit (); } } }
Ray Tracing in CUDA • while- while algorithm while-while trace(): while ray not terminated while node does not contain primitives traverse to the next node while node contains untested primitives perform a ray-primitive intersection test
Ray Tracing in CUDA • Get Hits • Traversal Algorithm: 3D DDA if (nextx < nexty) nextx += deltax X += 1; else nexty += deltay Y += 1; process_grid(X, Y);
Ray Tracing in CUDA • Get Hits • Triangle Intersection • Möller • Enables face culling. • Greatly increased performance • Careful: • Triangles can be in more than one voxel, so it´s necessary to check if the intersection point is in the current voxel.
Ray Tracing in CUDA • Increase efficiency • The internal loops make threads diverge and thus lower performance. • To contour this problem, NVIDIA researcher T. Aila included a method called Persistent Threads in CUDA. • The idea is to keep threads busy while at least one of them is not done. • Increased performance depends on the GPU. • 9800 GX2: 2.2x increase • GTX 480: 3.0x increase
Ray Tracing in CUDA • Persistent threads implementation code
Ray Tracing in CUDA • Shade • Linear Interpolation using baricentric coordinates • Normal • Texture • Texture • Used CUDA 3D Texture to support variable number of scene textures. • Phong Shading
Results • Real-Time Ray Tracing Performance • Depends on: • Grid resolution • Number of primitives • Camera in/outside grid • Shadow Pass • Reflection Passes (1 or more times) • Scenes with reflections and many primitives vary about 20~30 fps
Conclusion • The user was able to replicate physical effects. • CUDA is slower compared to other languages (e.g. GLSL) if not optimizing and use its maximum optimization resources. • There are still several optimizations pending in this work. • Math • CUDA threads and kernels • Too much memory used
References • Kalojanov, J. A Parallel Algorithm for Construction of Uniform Grids. High Performance Graphics, 2009. Retrieved in Apr 21 2010. • Ivson, P., Duarte, L., Celes, W.,GPU-Accelerated Uniform Grid Construction for ray Tracing Dynamic Scenes. • Understanding the Efficiency of Ray Traversal on GPUs, Aila, T. NVIDIA Research. Retrieved May 23, 2010. • Ray Tracing on Programmable Graphics Hardware, Purcell, T. Stanford University. Retrieved May 28, 2010. • Ray Tracing Animated Scenes using Coherent Grid Traversal, Wald et al. SCI Institute, University of Utah. Retrieved May 25, 2010 • NVIDIA CUDA Programming Guide. V. 2.0, 2008. Retrieved Mar 29, 2010.