On a Few Ray Tracing like Algorithms and Structures. Ravi Prakash Kammaje Swansea University
Ray Tracing • Naïve method • Intersect every ray against every triangle • O(rays × triangles) • Need better methods
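To make the naïve cost concrete, here is a minimal C++ sketch of a brute-force closest-hit query. The slides do not say which ray/triangle test is used; the Möller-Trumbore test below is an assumption, and the names `Vec3`, `Ray`, `Triangle`, `intersect` and `closestHit` are illustrative only. Calling `closestHit` once per ray gives the O(rays × triangles) behaviour the slide describes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Ray { Vec3 origin, dir; };
struct Triangle { Vec3 a, b, c; };

// Moller-Trumbore ray/triangle test (assumed here, not taken from the slides).
// Returns the hit distance t along the ray, or -1.0 on a miss.
double intersect(const Ray& r, const Triangle& tri) {
    const double eps = 1e-12;
    Vec3 e1 = sub(tri.b, tri.a), e2 = sub(tri.c, tri.a);
    Vec3 p = cross(r.dir, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < eps) return -1.0;      // ray parallel to the triangle plane
    double inv = 1.0 / det;
    Vec3 s = sub(r.origin, tri.a);
    double u = dot(s, p) * inv;                 // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return -1.0;
    Vec3 q = cross(s, e1);
    double v = dot(r.dir, q) * inv;             // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return -1.0;
    double t = dot(e2, q) * inv;
    return t > eps ? t : -1.0;                  // hits behind the origin count as misses
}

// Naive closest hit: tests the ray against every triangle.
// Returns the index of the nearest triangle hit, or -1 if nothing is hit.
int closestHit(const Ray& r, const std::vector<Triangle>& tris) {
    assert(dot(r.dir, r.dir) > 0.0);            // direction must be non-zero
    int best = -1;
    double bestT = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < tris.size(); ++i) {
        double t = intersect(r, tris[i]);
        if (t > 0.0 && t < bestT) { bestT = t; best = static_cast<int>(i); }
    }
    return best;
}
```

The acceleration structures on the following slides exist to avoid most of the iterations of the loop in `closestHit`.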
Data Structures • BSP Trees • Uniform Grid • Octree • Bounding Volume (Box) Hierarchy
Kd-trees • A specialised BSP Tree • Split planes restricted to the X, Y and Z axes • Among the most widely used structures for ray tracing • Surface Area Heuristic (SAH) • Heuristic to build trees suitable for Ray Tracing • Cheap Traversal
RBSP Trees • Restricted form of BSP tree • Space partitioning • Binary – 2 children at each node • Predetermined axes • Number of axes, m • Axes • Construction and Traversal • Similar to kd-trees • Heuristics borrowed from kd-trees
RBSP Trees - Example • Figure: a kd-tree compared with an RBSP tree using 24 axes
RBSP Trees - Construction • Predetermine axes • Methods to predetermine m axes • Evenly spaced points on Sphere • Find evenly spaced points on unit sphere • Use vectors from the centre to the points as axes • Advantage • Has an even distribution of axes • Disadvantage • Axes are not customised to the scene
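The slide does not say how the evenly spaced points are found. One common approximation is a Fibonacci (golden-angle) lattice, sketched below in C++ as an assumption rather than as the method used in this work. Because an axis and its negation describe the same family of planes, the sketch only samples the upper hemisphere; that choice, and the name `fibonacciAxes`, are also assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

// Returns m approximately evenly spaced unit vectors on the hemisphere z > 0,
// generated with a golden-angle (Fibonacci) spiral.
std::vector<Vec3> fibonacciAxes(int m) {
    assert(m > 0);
    const double pi = std::acos(-1.0);
    const double goldenAngle = pi * (3.0 - std::sqrt(5.0));
    std::vector<Vec3> axes;
    axes.reserve(m);
    for (int i = 0; i < m; ++i) {
        double z = 1.0 - (i + 0.5) / m;        // heights evenly spaced in (0, 1)
        double r = std::sqrt(1.0 - z * z);     // radius of the circle at height z
        double phi = goldenAngle * i;          // rotate by the golden angle each step
        axes.push_back({r * std::cos(phi), r * std::sin(phi), z});
    }
    return axes;
}
```

For example, `fibonacciAxes(24)` would produce a candidate set for a 24-axis tree. The result is fixed for a given m, which matches the stated disadvantage: the axes are not adapted to the scene.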
Construction • Recursive process • Find bounding volume • At each node • Find a split plane • Use a heuristic • Classify triangles • Continue until • very few triangles are in node • A maximum depth is reached • Split Plane Selection • Use SAH over all axes • Select plane with minimum cost
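As an illustration of the split-plane selection step, the sketch below evaluates the usual SAH cost for one candidate plane. It assumes an axis-aligned node box, as in a kd-tree; in an RBSP tree the node is bounded by planes along all m axes, so the surface areas would have to be computed for that polytope instead. The cost constants `cTrav` and `cIsect` and all names are hypothetical.

```cpp
#include <cassert>

struct Box { double lo[3], hi[3]; };

double surfaceArea(const Box& b) {
    double dx = b.hi[0] - b.lo[0], dy = b.hi[1] - b.lo[1], dz = b.hi[2] - b.lo[2];
    return 2.0 * (dx * dy + dy * dz + dz * dx);
}

// SAH cost of splitting `node` at position `pos` along `axis`:
//   cTrav + cIsect * (SA(left)/SA(node) * nLeft + SA(right)/SA(node) * nRight)
double sahCost(const Box& node, int axis, double pos, int nLeft, int nRight,
               double cTrav = 1.0, double cIsect = 1.0) {
    assert(axis >= 0 && axis < 3);
    assert(pos >= node.lo[axis] && pos <= node.hi[axis]);
    double area = surfaceArea(node);
    assert(area > 0.0);
    Box left = node, right = node;
    left.hi[axis] = pos;     // left child ends at the split plane
    right.lo[axis] = pos;    // right child starts at the split plane
    return cTrav + cIsect * (surfaceArea(left) / area * nLeft +
                             surfaceArea(right) / area * nRight);
}
```

A builder would evaluate this cost for candidate planes on every axis and keep the cheapest. For a unit cube holding 8 triangles, cutting off an empty quarter (`sahCost(cube, 0, 0.25, 8, 0)`, with the 8 triangles in the thin child) is cheaper than an even 4/4 split down the middle.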
RBSP Trees - Traversal • Standard slabs method • Over m planes • Find intersection of ray and plane • Precompute the divides • Number of divide operations = m • If m is large, divide operations cause slowdowns • Use SSE to perform 4 divides at once • Accelerates ray tracing
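A minimal scalar sketch of the slabs method over m axes follows. It assumes each axis stores a unit normal and an interval `[dmin, dmax]` of signed distances; the `Slab` layout and the function name are hypothetical. The one reciprocal per axis corresponds to the m divides mentioned above. This sketch computes them inline, whereas the slide precomputes them and uses SSE to do four at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };
static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One slab: the region dmin <= dot(normal, p) <= dmax.
struct Slab { Vec3 normal; double dmin, dmax; };

// Clips the ray origin + t * dir (t >= 0) against every slab.
// On a hit, [tEnter, tExit] is the parametric range inside the volume.
bool intersectSlabs(const Vec3& origin, const Vec3& dir, const std::vector<Slab>& slabs,
                    double& tEnter, double& tExit) {
    assert(!slabs.empty());
    tEnter = 0.0;
    tExit = std::numeric_limits<double>::infinity();
    for (const Slab& s : slabs) {
        double dn = dot(s.normal, dir);
        double on = dot(s.normal, origin);
        if (dn == 0.0) {                        // ray parallel to this pair of planes
            if (on < s.dmin || on > s.dmax) return false;
            continue;
        }
        double inv = 1.0 / dn;                  // the single divide for this axis
        double tNear = (s.dmin - on) * inv;
        double tFar = (s.dmax - on) * inv;
        if (tNear > tFar) std::swap(tNear, tFar);
        tEnter = std::max(tEnter, tNear);
        tExit = std::min(tExit, tFar);
        if (tEnter > tExit) return false;       // intervals no longer overlap
    }
    return true;
}
```

With three slabs along X, Y and Z this reduces to the familiar ray/box test used for kd-trees.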
RBSP Trees - Results • Makes RBSP trees faster than kd-trees • A structure that shows potential for Ray Tracing • Better than kd-trees for scenes with non-axis-aligned geometry • Needs better heuristics to predetermine axes
Row Tracing • Combines rasterization and ray tracing concepts • A form of Packet ray tracing – Packets of rays spanning an entire row • Row can be • A 2D plane • Simpler traversal • Easy row / triangle intersection – per-pixel cost less than ray / triangle intersections • A 1D line – Simplifies clipping, occlusion testing and frustum testing
Row Tracing - Algorithm • High level algorithm • Traverse row-plane through kd-tree or octree • Rasterize leaf node triangles with scanline algorithm • Very similar to Ray tracing • Early ray termination not directly possible • Use 1D Hierarchical Occlusion Maps to achieve the same effect
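The rasterization step can be illustrated with a small sketch that finds the horizontal span a triangle covers on one row. It assumes the triangle has already been projected to 2D screen coordinates and ignores depth interpolation and pixel rounding; `Vec2`, `Tri2D` and `rowSpan` are illustrative names, not part of the original work.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

struct Vec2 { double x, y; };
struct Tri2D { Vec2 a, b, c; };

// Finds the x-range [xMin, xMax] covered by `tri` on the row at height y.
// Returns false if the row does not touch the triangle.
bool rowSpan(const Tri2D& tri, double y, double& xMin, double& xMax) {
    assert(!std::isnan(y));
    const Vec2 v[3] = {tri.a, tri.b, tri.c};
    xMin = std::numeric_limits<double>::infinity();
    xMax = -std::numeric_limits<double>::infinity();
    for (int i = 0; i < 3; ++i) {
        const Vec2& p = v[i];
        const Vec2& q = v[(i + 1) % 3];
        if ((p.y < y && q.y < y) || (p.y > y && q.y > y)) continue;  // edge misses the row
        if (p.y == q.y) {                       // horizontal edge lying on the row
            xMin = std::min(xMin, std::min(p.x, q.x));
            xMax = std::max(xMax, std::max(p.x, q.x));
            continue;
        }
        double x = p.x + (y - p.y) / (q.y - p.y) * (q.x - p.x);      // edge/row crossing
        xMin = std::min(xMin, x);
        xMax = std::max(xMax, x);
    }
    return xMin <= xMax;
}
```

Every pixel between `xMin` and `xMax` is then shaded in one pass, which is where the lower per-pixel cost compared with individual ray/triangle tests comes from.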
Row Tracing – Hierarchical Occlusion Maps • Important optimization • Indicates already occluded parts of a Row • 1D version of HOM by Zhang, et al. (1997) • Lowest level – 1 bit per pixel • Each upper-level bit – summarises 2 bits of the level below • For a row with 1024 pixels, lowest level – 128 chars • Entire HOM – 256 chars
Row Tracing – Hierarchical Occlusion Maps • Initialize prior to traversal • Set all bits to zero • The entire row is unoccluded • Updating the HOM • Triangle rasterization • Corresponding lowest level bits are set to 1 • Upper levels updated if necessary • Testing for Occlusion • Skip occluded nodes • Optimize rasterization
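The initialize, update and test steps can be sketched as a small class. This is a simplified illustration under stated assumptions: the row width is a power of two, an upper-level bit is set only when both of its children are set, and each level is stored as a `std::vector<bool>` rather than the packed chars mentioned above. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class OcclusionMap1D {
public:
    // All bits start at zero: the entire row is unoccluded.
    explicit OcclusionMap1D(std::size_t width) {
        assert(width > 0 && (width & (width - 1)) == 0);  // power of two assumed
        for (std::size_t n = width; n >= 1; n /= 2) levels_.emplace_back(n, false);
    }

    // Marks pixels [first, last] as covered and refreshes the upper levels.
    void markOccluded(std::size_t first, std::size_t last) {
        assert(first <= last && last < levels_[0].size());
        for (std::size_t x = first; x <= last; ++x) levels_[0][x] = true;
        for (std::size_t l = 1; l < levels_.size(); ++l) {
            first /= 2;
            last /= 2;
            for (std::size_t x = first; x <= last; ++x)
                levels_[l][x] = levels_[l - 1][2 * x] && levels_[l - 1][2 * x + 1];
        }
    }

    // True if every pixel in [first, last] is already covered.
    bool isOccluded(std::size_t first, std::size_t last) const {
        assert(first <= last && last < levels_[0].size());
        return query(levels_.size() - 1, 0, first, last);
    }

private:
    bool query(std::size_t level, std::size_t index,
               std::size_t first, std::size_t last) const {
        std::size_t span = std::size_t(1) << level;        // pixels under this bit
        std::size_t lo = index * span, hi = lo + span - 1;
        if (hi < first || lo > last) return true;           // outside the queried range
        if (levels_[level][index]) return true;             // whole span is occluded
        if (level == 0 || (lo >= first && hi <= last)) return false;  // a gap inside the range
        return query(level - 1, 2 * index, first, last) &&
               query(level - 1, 2 * index + 1, first, last);
    }

    std::vector<std::vector<bool>> levels_;
};
```

For a 1024-pixel row the levels hold 1024 + 512 + ... + 1 = 2047 bits, which is consistent with the roughly 256 chars quoted for the entire HOM. During traversal, a node whose projected extent on the row passes `isOccluded` can be skipped.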
Packet Row Tracing • Row-Packet / Node intersection • Case 1 – All rows in packet hit the node • Case 2 – Row packet misses node • Case 3 – Divergence nodes – Trace individual rows from these nodes • Occlusion testing – Test each row individually • Leaf node – All rows are rasterized with leaf node’s triangles • Easily multi threadable
GPUs • Very Powerful • Highly Parallel • Example • NVIDIA GeForce GTX 285 • 240 cores • 648 MHz Graphics Clock • 1476 MHz Processor Clock • 1 GB GDDR3 SDRAM • General-purpose computing on graphics hardware (GPGPU) is getting popular
GPU based Algorithms • GPUs are much faster at doing parallel tasks • However, even simple tasks require special algorithms to utilise this effectively • Example • Scan of an array – Each output element is the sum of the corresponding input element and all previous elements • Input : {3,7,1,5,8,2,8,1,8,6,2,8} • Output : {3,10,11,16,24,26,34,35,43,49,51,59}
GPU based Algorithms • On CPU for(i=1; i < num; ++i) arr[i] = arr[i]+arr[i-1]; • On GPU • Use parallel scanning algorithm • Make use of several threads • Each element finds sum of itself and element at an offset
GPU Algorithm – Parallel sum • Same number of threads as number of elements in array • Offset = 1 => Each thread finds • sum of itself and its neighbouring element • Double the offset • Iterate while offset < number of elements • Can be optimised further by using blocks of threads and intermediate results
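The offset-doubling scheme can be checked on the CPU with a sequential simulation, sketched below in C++. The inner loop stands in for the threads of one pass, and the two buffers play the role of the synchronisation a GPU kernel would need between passes. The function names are hypothetical, and this is not the optimised block-based variant mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by one "thread" i in one pass: add the element `offset` places to the left.
int scanStep(const std::vector<int>& in, std::size_t i, std::size_t offset) {
    assert(i < in.size());
    return i >= offset ? in[i] + in[i - offset] : in[i];
}

// Inclusive scan by offset doubling; each pass reads `data` and writes `next`.
std::vector<int> inclusiveScan(std::vector<int> data) {
    std::vector<int> next(data.size());
    for (std::size_t offset = 1; offset < data.size(); offset *= 2) {
        for (std::size_t i = 0; i < data.size(); ++i)   // one iteration per simulated thread
            next[i] = scanStep(data, i, offset);
        data.swap(next);                                // output of this pass feeds the next
    }
    return data;
}
```

For the 12-element input on the earlier slide this takes four passes (offsets 1, 2, 4 and 8) and reproduces the output listed there, at the price of more additions in total than the sequential loop.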
Fast ray sorting and breadth-first Packet Traversal for GPU ray tracing - Garanzha and Loop • Sort rays on the GPU • Generate a hash code for each ray based on • Direction of ray • Origin of ray • If rays have same hash code • Considered coherent • Sorted into bins • Each bin has < maxSize rays • Compression, Sorting, Decompression scheme • Utilises GPU efficiently • Create frustum for each bin • Breadth first traverse a BVH of triangles
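The hashing step can be sketched as follows. The slide only states that the hash depends on ray origin and direction; the quantisation below (5 bits per origin coordinate inside the scene bounds, 4 bits per direction component) and every name in it are assumptions made for illustration, not the bit layout used by Garanzha and Loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Vec3 { double x, y, z; };

// Maps v in [lo, hi] to one of 2^bits cells; values outside the range are clamped.
std::uint32_t quantize(double v, double lo, double hi, unsigned bits) {
    assert(hi > lo && bits > 0 && bits < 16);
    const std::uint32_t cells = 1u << bits;
    double t = (v - lo) / (hi - lo);
    t = std::min(std::max(t, 0.0), 1.0);
    std::uint32_t cell = static_cast<std::uint32_t>(t * cells);
    return std::min(cell, cells - 1);
}

// Packs a quantised origin and a quantised unit direction into a 27-bit code.
// Rays with the same code are treated as coherent.
std::uint32_t rayHash(const Vec3& origin, const Vec3& dir,
                      const Vec3& sceneMin, const Vec3& sceneMax) {
    const unsigned originBits = 5, dirBits = 4;   // hypothetical bit budget
    std::uint32_t h = 0;
    h = (h << originBits) | quantize(origin.x, sceneMin.x, sceneMax.x, originBits);
    h = (h << originBits) | quantize(origin.y, sceneMin.y, sceneMax.y, originBits);
    h = (h << originBits) | quantize(origin.z, sceneMin.z, sceneMax.z, originBits);
    h = (h << dirBits) | quantize(dir.x, -1.0, 1.0, dirBits);
    h = (h << dirBits) | quantize(dir.y, -1.0, 1.0, dirBits);
    h = (h << dirBits) | quantize(dir.z, -1.0, 1.0, dirBits);
    return h;
}
```

Sorting ray indices by this code groups coherent rays into contiguous runs, which can then be cut into bins of at most maxSize rays before a frustum is built for each bin.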
OpenCL • Based on C • Framework for developing heterogeneous applications • In theory • Some parts can be run on GPU • Some on CPU • Initially developed by Apple
OpenCL – early impressions • Still very early • Complex code • Runs on both CPUs and GPUs • Potentially easier to debug on CPUs prior to porting to GPUs • Can allocate work based on suitability • Runs on NVIDIA and AMD / ATI cards • CUDA • much easier to program • Much cleaner code • Not cross platform • Only on NVIDIA GPUs
Conclusion • A few ray tracing like structures and algorithms • RBSP Trees • Row Tracing • Brief summary of GPU Algorithms • Parallel scan • Ray tracing by ray sorting – Garanzha and Loop • OpenCL