
Kd-Jump

Kd-Jump: A Path-Preserving Stackless Traversal for Faster Isosurface Ray Tracing on GPUs. David Meirion Hughes. Ik Soo Lim. Bangor University, UK.


Presentation Transcript


  1. Kd-Jump: A Path-Preserving Stackless Traversal for Faster Isosurface Ray Tracing on GPUs. David Meirion Hughes. Ik Soo Lim. Bangor University, UK.

  2. Problem Setting and Previous Work

  3. Problem Setting: Ray Tracing • Tracing rays from camera • Find the intersections • Avoid uninteresting areas • Acceleration structure • Division of space • Requires ray traversal

  4. Problem Setting: Traversal of Kd-Trees • Downward Traversal • Two branch choices • Remember furthest • Traverse nearest • Test for intersections • If branch had no hit? • Traversal Restore • Go back to other branch [Figure: a ray and its stack; each stack entry holds tnear, tfar and node*, e.g. 0.5, 0.75, 0x1...]
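The stack-based traversal outlined on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' GPU code: the `Node` layout, the function name `traverse_with_stack`, and the use of leaf names in place of real intersection tests are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Hypothetical node layout, used only for this sketch.
    axis: int = -1                   # split axis; -1 marks a leaf
    split: float = 0.0               # split position along `axis`
    left: Optional["Node"] = None    # child below the split plane
    right: Optional["Node"] = None   # child above the split plane
    name: str = ""                   # leaf label, stands in for leaf contents

def traverse_with_stack(root, origin, direction, tnear, tfar):
    """Return leaf names in the order the ray segment [tnear, tfar] visits them."""
    stack = []      # each entry is (far node, tnear, tfar)
    visited = []
    node = root
    while True:
        while node.axis >= 0:                        # downward traversal
            a = node.axis
            d = direction[a]
            t_split = (node.split - origin[a]) / d if d != 0 else math.inf
            below_first = origin[a] < node.split or (origin[a] == node.split and d <= 0)
            near, far = (node.left, node.right) if below_first else (node.right, node.left)
            if t_split >= tfar or t_split <= 0:
                node = near                          # segment stays in the near child
            elif t_split <= tnear:
                node = far                           # segment lies entirely in the far child
            else:
                stack.append((far, t_split, tfar))   # remember furthest
                node, tfar = near, t_split           # traverse nearest
        visited.append(node.name)   # a real tracer would test for intersections here
        if not stack:
            return visited
        node, tnear, tfar = stack.pop()              # traversal restore
```

For a tree that splits x at 0.5 into leaves "A" and "B", a ray travelling in +x from the origin visits "A" first and then restores "B" from the stack. The sketch assumes no leaf reports a hit, so every stacked branch is eventually visited.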

  5. Problem Setting: GPUs • Several MPUs • Parallel execution • Kernels • Thousands of threads • Lightweight code • On-chip memory very fast • On-board memory slow

  6. Problem Setting: Ray Tracing on GPUs • Stack – still a problem? • Memory Size • Coalesced Access • One Stack element: • Ray Segment • Node address/look-up • times depth-of-tree • times ray-count • One kernel call [Figure: stack entries (tnear, tfar, node*) for Ray 1, Ray 2, Ray 3, ...; panels labelled One, Two and Three Memory Transactions]
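The memory-size bullets multiply out as follows. This is a back-of-the-envelope sketch: the 12-byte element (two 4-byte floats for the ray segment plus a 4-byte node reference), the 1024 x 1024 ray count and the tree depth of 24 are illustrative assumptions, not figures from the slides.

```python
def stack_bytes(ray_count, tree_depth, element_bytes=12):
    """Total stack storage when every ray gets a full-depth stack.

    element_bytes=12 assumes tnear and tfar as 4-byte floats plus a
    4-byte node reference; this is an assumption for the example.
    """
    return ray_count * tree_depth * element_bytes

total = stack_bytes(ray_count=1024 * 1024, tree_depth=24)
print(total / 2**20, "MiB")   # prints "288.0 MiB"
```

Under these assumptions a single kernel call that allocates a stack for every ray needs 288 MiB of on-board memory, which is why the slide asks whether the stack is still a problem.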

  7. Previous Work: Stackless Traversal • Avoid using stack • Current thinking • Less memory • No global memory use • Faster [Figure: stack entries (tnear, tfar, node*) for Ray 1, Ray 2, Ray 3, ...]

  8. Previous Work: Stackless Traversal • Avoid using stack • Kd-Restart • Restart From Root • + Very little memory • - Revisits previous nodes • - Longer thread life • - Exacerbates incoherence [Figure label: Tested Twice]
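A rough sketch of the Kd-Restart idea follows, to make the "revisits previous nodes" cost concrete. The `Node` layout, the function name `kd_restart` and the step counter are assumptions for illustration, and leaves are treated as never reporting a hit so the whole ray segment is walked.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Hypothetical node layout, used only for this sketch.
    axis: int = -1                   # split axis; -1 marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None    # child below the split plane
    right: Optional["Node"] = None   # child above the split plane
    name: str = ""

def kd_restart(root, origin, direction, tmin, tmax):
    """Return (leaf names in visit order, number of inner-node tests)."""
    visited, inner_tests = [], 0
    tnear = tmin
    while tnear < tmax:
        node, tfar = root, tmax                  # restart from the root
        while node.axis >= 0:
            inner_tests += 1
            a = node.axis
            d = direction[a]
            t_split = (node.split - origin[a]) / d if d != 0 else math.inf
            below_first = origin[a] < node.split or (origin[a] == node.split and d <= 0)
            near, far = (node.left, node.right) if below_first else (node.right, node.left)
            if t_split >= tfar or t_split <= 0:
                node = near
            elif t_split <= tnear:
                node = far                       # this part of the ray is already done
            else:
                node, tfar = near, t_split       # no stack: the far child is not stored
        visited.append(node.name)
        tnear = tfar                             # advance past the leaf just visited
    return visited, inner_tests
```

With a root that splits x at 0.5, a left child that splits y at 0.5, and a ray travelling in +x at y = 0.25, the sketch visits two leaves but tests three inner nodes, because the root is tested again after the restart.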

  9. Previous Work: Stackless Traversal • Avoid using stack • Kd-Restart • Kd-Backtrack • Backtrack up tree • + Very little memory • + Better than Kd-Restart • - Revisits previous nodes • - Longer thread life • - Exacerbates incoherence [Figure label: Tested Twice]

  10. Previous Work: Stackless Traversal • Avoid using stack • Kd-Restart • Kd-Backtrack • Ropes • Nodes have neighbour links • + Shorter ray life • - Lots of extra memory [Figure label: Additional Pointers (per node)]

  11. Kd-Jump: Motivation and Description

  12. Kd-Jump: Motivation • Goal: • Same path as Stack method • Least amount of memory • How: • Indices rather than pointers • Traverse down with an equation • Return using its inverse • Binary bits for return markers

  13. Kd-Jump: Index Reference • Each node referenced by index • x, y, z, etc... • depth [Figure: tree levels mapped to memory blocks; [x,y,z] memory map]

  14. Kd-Jump: Method Description • Traversal into children • Update an index element • Determined by the split dimension • Multiply by 2 • Add child offset f • $C = 2x + f$, with $f = 0$ or $f = 1$ [Figure: x-dimension split taking [x,y,z] to [2x+f,y,z]]
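The downward step can be written out directly. In this small sketch the function name `step_down` and the tuple representation of the index are assumptions for illustration; only the rule C = 2x + f comes from the slide.

```python
def step_down(index, axis, f):
    """Index of a child node.

    index : tuple of per-dimension indices, e.g. (x, y, z)
    axis  : dimension split at the current node
    f     : child offset, 0 or 1
    """
    child = list(index)
    child[axis] = 2 * child[axis] + f   # C = 2x + f on the split dimension only
    return tuple(child)

print(step_down((3, 1, 0), axis=0, f=1))   # prints (7, 1, 0)
```

Only the element for the split dimension changes; the other elements are carried over unchanged.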

  15. Kd-Jump: Method Description • Traversal back to parent • Apply inverse of downward step • Can replace f with floor function • Do not need to consider what f was • f = 0 or 1 only, so $(C-f)/2 = x$ becomes $\lfloor C/2 \rfloor = x$ [Figure: x-dimension split; the parent of [x,y,z] is [floor(x/2),y,z]]
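Because f is 0 or 1, floor((2x + f)/2) equals x for either child, so the upward step needs no record of which child was taken. A minimal sketch, with the function name `step_up` and the tuple index as illustrative assumptions:

```python
def step_up(index, axis):
    """Index of the parent node, given the dimension the parent split on."""
    parent = list(index)
    parent[axis] = parent[axis] >> 1   # floor(C / 2); the child offset f drops out
    return tuple(parent)

# Both children of (3, 1, 0) under an x-split map back to the same parent.
print(step_up((6, 1, 0), axis=0))   # prints (3, 1, 0)
print(step_up((7, 1, 0), axis=0))   # prints (3, 1, 0)
```

The right shift is the integer form of the floor division, which is cheap in a GPU register.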

  16. Kd-Jump: Method Description • Traversal to common parent • Apply inverse on all indices • Divide elements by power of 2 • Number of splits • Matrix of Split information • Store in constant memory • (cached) • (alternative) Store on the fly • $[x,y,z] \to [\lfloor x/2^1 \rfloor, \lfloor y/2^2 \rfloor, \ldots]$ [Figure: matrix of per-dimension split counts]
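Jumping several levels at once applies the inverse step once per split on each dimension. The sketch below assumes that every node at a given depth splits on the same axis, recorded in a hypothetical `split_axis` list; it counts the splits on the fly rather than reading a precomputed matrix, and the function name `jump_to_ancestor` is also an assumption.

```python
def jump_to_ancestor(index, depth, target_depth, split_axis):
    """Index of the ancestor at target_depth of the node `index` at `depth`.

    split_axis[d] is the dimension split when stepping from depth d to d + 1
    (assumed to be the same for all nodes at depth d).
    """
    counts = [0] * len(index)
    for level in range(target_depth, depth):
        counts[split_axis[level]] += 1           # splits per dimension being undone
    # Divide each element by 2**count, i.e. shift right by the split count.
    return tuple(i >> c for i, c in zip(index, counts))

split_axis = [0, 1, 2, 0, 1, 2]
# Node (3, 1, 1) at depth 5 was reached via (1, 0, 0) at depth 2.
print(jump_to_ancestor((3, 1, 1), 5, 2, split_axis))   # prints (1, 0, 0)
```

A precomputed table of split counts per pair of depths would replace the loop with a lookup, at the cost of constant memory.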

  17. Kd-Jump: Method Description • Determine jump amount • Mark common parents • 1 bit • Store in MSB order • On return • Count trailing zero bits • This is the return depth • Subtract from current depth • Jump amount [Figure: 32-bit register holding bits 0 1 0 0 0]
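One way to realise the return markers is sketched below. The exact bit layout is an assumption: here the flag for depth d is stored at bit 31 - d, so the deepest marked node is the lowest set bit and a trailing-zero count recovers its depth. The names `mark` and `pop_return_depth` are illustrative.

```python
WORD_BITS = 32

def mark(flags, depth):
    """Flag the node at `depth` as one whose other branch must still be visited."""
    return flags | (1 << (WORD_BITS - 1 - depth))

def pop_return_depth(flags):
    """Return (depth of the deepest marked node, flags with that mark cleared).

    `flags` must be nonzero.
    """
    trailing_zeros = (flags & -flags).bit_length() - 1   # count trailing zero bits
    return_depth = WORD_BITS - 1 - trailing_zeros
    return return_depth, flags & (flags - 1)             # clear the lowest set bit

flags = mark(mark(0, 1), 3)          # nodes at depth 1 and depth 3 are marked
current_depth = 6
return_depth, flags = pop_return_depth(flags)
print(return_depth, current_depth - return_depth)   # prints "3 3"
```

The second value printed is the jump amount: the number of levels to undo in order to reach the marked common parent. On a GPU the trailing-zero count would typically be a single hardware instruction.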

  18. Kd-Jump: Method Description • Re-clip Ray • Bounds stored or computed [Figure: Bound X, Bound Y]

  19. Kd-Jump: Scope • Nodes referenced with indices • Traversal equations invertible • Forget route choices in inverse • Index-to-memory map • Limit wasted memory • Balanced kd-tree • Implicit kd-tree • Requires node bounds • Re-computed with implicit kd-tree

  20. Kd-Jump: Isosurfacing with Implicit Kd-tree • Wald’s implicit Kd-tree • min/max of node branch • left-balanced • No-waste memory map • Bounds/splits computed

  21. Implementation: Isosurfacing with Implicit Kd-tree • Minor differences • Node test • Test prior to traversing • Reduces number of returns • Stack, kd-jump, kd-restart

  22. Results: Isosurfacing with Implicit Kd-tree • Kd-Jump faster • Ray time-active important (kd-restart) • Stack only slightly slower • High occupancy (75%) • High ray coherence = automatic coalesced access [Chart: Frames Per Second, average across multiple iso-values and views]

  23. Results: Isosurfacing with Implicit Kd-tree • All use one 32-bit register • stack_size, tfar_max, depth_flags • Stack memory allocated for all rays • Single kernel • Constant memory as fast as registers • Once data cached [Figure: Memory Use]

  24. Analysis: Kd-Jump • Theoretical performance • Memory access not hidden • However, perfectly coalesced

  25. Analysis: Kd-Jump • Bottlenecks • Stack memory bound • Kd-Jump computation bound

  26. Hybrid Kd-tree: Exploiting Texture Caching • Build implicit tree • Depth threshold • Volume stepping • Texture cache • Very fast • Threshold depth? • Intersection method • Iso-surface • View direction

  27. Hybrid Kd-tree: Results [Chart: Frames Per Second, average across multiple views]

  28. Conclusions • Kd-Jump • Stackless • Index based • Immediate backtrack to common parent • No dependency on ray coherency • At least if bounds can be computed • Hybrid Kd-Jump • Texture cache over Acceleration Structure • Variable depth threshold of branches • View, intersection method, iso-value.

  29. Conclusions • Future prediction • Memory access and speed improving • Current trend • Usefulness of Stackless • Reduced memory cost • Reduce dependency on coherency • Fewer iterations (Ropes) than stack • Big question: One kernel versus many? • Stack favours one kernel • i.e., no reorganising (can break coalesced access) • Can organise into groups of same depth though? • Many kernels = better device occupancy • Memory access better hidden

  30. Future Work • Kd-Jump with General Kd-Trees? • Real-time explicit from implicit • CUDA 3.0 • Dynamic Warps? • Ideal for ray tracing? • Inter-device communication?

  31. Addendum: Indices for General-Case Kd-trees? • Nodes need bounds for re-clip • Accept the cost? • Compute them somehow? • BVH stores them anyway • Memory Map • Very difficult to remove wasted space • Feasible to minimise waste?

  32. Questions
