KD-Tree Acceleration Structures for a GPU Raytracer Tim Foley, Jeremy Sugerman Stanford University
Motivation • Accelerated raytracing on commodity HW • Production rendering • Real-time applications? • Performance trend • 9800 XT: 170M ray-triangle intersects/s • X800 XT PE: 350M ray-triangle intersects/s
GPU Raytracing • Promising early results • Simple scenes • Uniform grid • Problems with complex scenes • Hierarchical accelerator (kd-tree) • Improve scalability
Outline • Background • GPU Raytracing • KD-Tree Algorithm • KD-Restart, KD-Backtrack • Results • Future Work
Background • RayEngine [Carr et al. 2002] • Parallel ray-triangle intersection • Host controls culling • [Purcell et al. 2002] • Entire raytracing pipeline • Many rays required for efficiency • Uniform Grid
Why not KD-Tree? • Uniform grid acceleration structure • Regular structure = efficient traversal • Regular structure = poor partitioning • KD-Trees • Adapt to scene complexity • Compact storage, efficient traversal • “Best” for CPU raytracing [Havran 2000]
KD-Tree • Figure: an example kd-tree with splitting planes X, Y, Z and leaf cells A, B, C, D; the ray enters the tree at tmin and exits at tmax.
KD-Tree Traversal • Figure: the example kd-tree traversed along the ray, visiting the leaf cells it crosses in front-to-back order.
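A minimal C++ sketch of this structure and the standard stack-based traversal may make the later stackless variants easier to follow. The KdNode layout, Ray struct, and intersectTriangles stub are illustrative assumptions, not the paper's actual data layout; the per-ray stack in traverse() is exactly the operation the next slide shows a fragment program cannot express.

```cpp
// Minimal sketch of a kd-tree node and the standard stack-based traversal.
// KdNode, Ray, and intersectTriangles are illustrative assumptions, not the
// paper's data layout. Zero direction components are ignored for brevity.
#include <utility>
#include <vector>

struct Ray { float origin[3], dir[3]; };

struct KdNode {
    int   axis;            // 0/1/2 for x/y/z split; -1 marks a leaf
    float split;           // split-plane position along 'axis'
    int   left, right;     // child indices (internal nodes only)
    std::vector<int> tris; // triangle indices (leaves only)
};

// Placeholder for the ray-triangle tests performed in a leaf.
bool intersectTriangles(const std::vector<int>&, const Ray&, float, float)
{ return false; }

// Standard CPU traversal: descend into the near child first and push the far
// child with its clipped interval. The per-ray stack is what a 2004-era
// fragment program cannot maintain (no indexed writes).
bool traverse(const std::vector<KdNode>& nodes, const Ray& ray,
              float tmin, float tmax)
{
    struct Entry { int node; float tmin, tmax; };
    std::vector<Entry> stack;
    int node = 0;                               // root

    while (true) {
        const KdNode& n = nodes[node];
        if (n.axis < 0) {                       // leaf: test its triangles
            if (intersectTriangles(n.tris, ray, tmin, tmax)) return true;
            if (stack.empty()) return false;    // ray leaves the tree
            node = stack.back().node;           // pop the far subtree
            tmin = stack.back().tmin;
            tmax = stack.back().tmax;
            stack.pop_back();
            continue;
        }
        // Parametric distance along the ray to the split plane.
        float t = (n.split - ray.origin[n.axis]) / ray.dir[n.axis];
        int near = n.left, far = n.right;
        if (ray.dir[n.axis] < 0.0f) std::swap(near, far);

        if (t >= tmax)      node = near;        // only the near child is hit
        else if (t <= tmin) node = far;         // only the far child is hit
        else {                                  // both: push far, visit near
            stack.push_back({far, t, tmax});
            node = near;
            tmax = t;
        }
    }
}
```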
Per-Fragment Stacks • Parallel (per-ray) push: no indexed write in a fragment program • Per-ray stack storage • [Ernst et al. 2004]: emulate push with extra passes • Impractical, slow
Our Contribution • Stackless kd-tree traversal algorithms • KD-Restart • KD-Backtrack
Observation • Figure: the example kd-tree and ray • The current leaf's tmax equals the next leaf's tmin.
KD-Restart • Standard traversal, but omit the stack operations • Proceed to the 1st leaf • If no intersection: advance (tmin, tmax), restart from the root, proceed to the next leaf
KD-Restart • Restart traversal after each leaf • m leaves, average depth d: cost O(m·d) • Balanced tree of n nodes • Upper bound: O(n log n) (standard algorithm: O(n)) • Expected: O(log n)
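A sketch of KD-Restart under the same assumed node layout as the earlier listing (not the authors' Brook kernels): the far child is never pushed; when a leaf fails, the ray interval is advanced past that leaf and traversal restarts at the root.

```cpp
// KD-Restart sketch, reusing KdNode, Ray, and intersectTriangles from the
// previous listing. No per-ray stack: when both children are hit, the far
// child is simply forgotten and the leaf interval is clipped instead.
bool traverseRestart(const std::vector<KdNode>& nodes, const Ray& ray,
                     float sceneTmin, float sceneTmax)
{
    float tmin = sceneTmin;

    while (tmin < sceneTmax) {
        int   node    = 0;              // restart from the root
        float segTmax = sceneTmax;      // becomes the current leaf's tmax

        // Descend to the first leaf overlapping [tmin, sceneTmax].
        while (nodes[node].axis >= 0) {
            const KdNode& n = nodes[node];
            float t = (n.split - ray.origin[n.axis]) / ray.dir[n.axis];
            int near = n.left, far = n.right;
            if (ray.dir[n.axis] < 0.0f) std::swap(near, far);

            if      (t >= segTmax) node = near;
            else if (t <= tmin)    node = far;
            else { node = near; segTmax = t; }   // clip instead of pushing far
        }

        if (intersectTriangles(nodes[node].tris, ray, tmin, segTmax))
            return true;

        // The observation above: this leaf's tmax is the next leaf's tmin,
        // so advancing the interval and restarting visits leaves in order.
        tmin = segTmax;
    }
    return false;
}
```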
Observation • Figure: the example kd-tree and ray • The relevant ancestor of leaf A (the node at which traversal must resume) is the parent of Z.
KD-Backtrack • If no intersection: advance (tmin, tmax) and start backtracking • If a node intersects (tmin, tmax): resume traversal, proceed to the next leaf
KD-Backtrack • Backtrack after each leaf • Revisits previous nodes at most twice (once from each child) • Within a constant factor of standard traversal: upper bound O(n), expected O(log n) • Requires additional storage: parent pointers and bounding boxes for internal nodes
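A KD-Backtrack sketch in the same spirit (an assumed layout, not the paper's exact one): nodes additionally store a parent index and a bounding box, and after a failed leaf the ray climbs parent pointers until it finds an ancestor whose box still overlaps the advanced interval, then resumes downward traversal from there instead of restarting at the root.

```cpp
// KD-Backtrack sketch. Compared with KD-Restart, each node also stores a
// parent pointer and (for internal nodes) a bounding box; this is the extra
// storage mentioned above. Reuses Ray and intersectTriangles.
#include <algorithm>

struct KdNodeB {
    int   axis;                    // -1 marks a leaf
    float split;
    int   left, right;
    int   parent;                  // -1 at the root
    float boxMin[3], boxMax[3];    // node bounds (used while backtracking)
    std::vector<int> tris;
};

// Clip the segment [tmin, tmax] to the node's box; false if they miss.
bool clipToBox(const KdNodeB& n, const Ray& ray, float tmin, float& tmax)
{
    for (int a = 0; a < 3; ++a) {
        float inv = 1.0f / ray.dir[a];
        float t0 = (n.boxMin[a] - ray.origin[a]) * inv;
        float t1 = (n.boxMax[a] - ray.origin[a]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmin >= tmax) return false;   // empty overlap: keep climbing
    }
    return true;
}

bool traverseBacktrack(const std::vector<KdNodeB>& nodes, const Ray& ray,
                       float sceneTmin, float sceneTmax)
{
    float tmin = sceneTmin;
    float tmax = sceneTmax;   // exit of the subtree we are descending into
    int   node = 0;

    while (tmin < sceneTmax) {
        float segTmax = tmax;

        // Downward phase: identical to KD-Restart, but it starts at 'node'
        // (the root on the first pass, a backtracked ancestor afterwards).
        while (nodes[node].axis >= 0) {
            const KdNodeB& n = nodes[node];
            float t = (n.split - ray.origin[n.axis]) / ray.dir[n.axis];
            int near = n.left, far = n.right;
            if (ray.dir[n.axis] < 0.0f) std::swap(near, far);
            if      (t >= segTmax) node = near;
            else if (t <= tmin)    node = far;
            else { node = near; segTmax = t; }
        }

        if (intersectTriangles(nodes[node].tris, ray, tmin, segTmax))
            return true;

        // Advance past this leaf, then climb parent pointers until an
        // ancestor's box still overlaps the remaining ray segment; each
        // node is revisited at most twice, once from each child.
        tmin = segTmax;
        do {
            node = nodes[node].parent;
            if (node < 0) return false;          // climbed past the root
            tmax = sceneTmax;
        } while (!clipToBox(nodes[node], ray, tmin, tmax));
    }
    return false;
}
```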
Implementation • Built GPU raytracer in Brook [Buck et al.] • 4 intersection schemes: • Brute Force • Uniform Grid • KD-Restart • KD-Backtrack
Scenes • Stanford Bunny: 69,451 triangles • Cornell Box: 32 triangles • BART Robots: 71,708 triangles • BART Kitchen: 110,561 triangles
Results • Chart: relative speedup over brute-force intersection for the Box, Bunny, Robots, and Kitchen scenes (up to 12.9x).
Results • Chart: rays in each state throughout traversal.
Discussion • Absolute performance trails the best CPU implementations by 5-6x • Sources of inefficiency: load balancing, data reuse
Load Balancing • Only a subset of rays is intersecting or traversing at any time • Occlusion queries to select the next kernel • Early-Z to cull inactive rays • Approximately 5x overhead from queries and kernel switches • Worse with fewer rays
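A hypothetical host-side sketch of this pass-selection loop; countActiveRays, runKernel, and the State enum are placeholders invented for illustration, not the actual Brook implementation. In the real renderer, occlusion queries supply the counts and early-Z culling masks out fragments whose rays are in a different state.

```cpp
// Illustrative multipass control loop. countActiveRays() stands in for an
// occlusion-query readback and runKernel() for one fragment-program pass;
// both are stubs here. Every query and kernel switch adds a fixed cost,
// which hurts more as the number of active rays shrinks.
#include <cstddef>

enum class State { Traversing, Intersecting };

std::size_t countActiveRays(State) { return 0; }   // stub: occlusion query
void        runKernel(State)       {}              // stub: one GPU pass

void renderLoop()
{
    for (;;) {
        std::size_t traversing   = countActiveRays(State::Traversing);
        std::size_t intersecting = countActiveRays(State::Intersecting);
        if (traversing == 0 && intersecting == 0)
            break;                                  // every ray has terminated

        // Issue the kernel with the most pending work; early-Z culls the
        // fragments whose rays are currently in the other state.
        runKernel(traversing >= intersecting ? State::Traversing
                                             : State::Intersecting);
    }
}
```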
Data Reuse • Every kernel loads the ray origin/direction and loads/stores traversal state • This consumes streaming bandwidth, and we are bandwidth-limited • A CPU implementation keeps these values in registers
Branching • Merge multiple passes into a larger kernel: fragment branches handle load balancing and avoid load/store of reused data • Current branching has high overhead • Shifts the efficiency burden to the HW
Conclusion • Stackless traversal allows an efficient GPU kd-tree that scales to larger, more complex scenes • Future Work • Changes in HW • Alternative acceleration structures • “Out-of-core” scenes • Dynamic scenes
Acknowledgements • Tim Purcell (NVIDIA): streaming raytracer • Mark Segal (ATI): demo machine • NVIDIA, ATI: HW • DARPA, Rambus: funding