
Experiences with Streaming Construction of SAH KD Trees



Presentation Transcript


  1. Experiences with Streaming Construction of SAH KD Trees Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek

  2. Motivation • Large speed-up of ray tracing lately • Better algorithms (packet tracing [Wald04, Reshetov05]) • Optimized spatial index structures • Best known: KD trees [Havran00] • Faster hardware • Research concentrated mainly on static scenes • Dynamic scenes • Building – slow for SAH based KD trees • Done in a pre-processing step

  3. Dynamic Scenes Approaches • Embed dynamics in the index structure • Use a two level approach [Wald03] • Fuzzy KD trees [Günther06] • Update index structure • Grids, BVHs and KD tree hybrids • Faster build/update • Lower traversal performance • No efficient approach for KD trees • Rebuild entire KD tree • Need to make it fast • Lazy build

  4. SAH Algorithm • Extract & sort events in advance • Abstract objects with AABBs • Events given by AABB boundaries • Recursive top-down construction • Find split plane using SAH • Compute minimum cost • Distribute objects to children • By distributing the events • Keep them sorted

  5. SAH Cost Function • Piecewise linear • Discontinuities at object boundaries • Evaluate only before opening and after closing event
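
The conventional sweep from slides 4 and 5 can be sketched as below. This is a minimal illustration, not the authors' implementation: the names `surface_area`, `sah_cost` and `best_split_exact`, the cost constants `c_trav` and `c_isect`, and the representation of a box as a `(lo, hi)` pair of 3-tuples are all assumptions. The sketch also assumes every box has non-zero extent along the split axis. Events that share a position are grouped and counted, so the cost is evaluated after the closing events and before the opening events at that position.

```python
import math

def surface_area(lo, hi):
    """Surface area of an axis-aligned box given by its min and max corners."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(lo, hi, axis, pos, n_left, n_right, c_trav=1.0, c_isect=1.0):
    """SAH cost of splitting the node [lo, hi] at `pos` along `axis`.
    c_trav and c_isect are assumed cost constants."""
    left_hi = list(hi)
    left_hi[axis] = pos
    right_lo = list(lo)
    right_lo[axis] = pos
    weighted = (surface_area(lo, left_hi) * n_left
                + surface_area(right_lo, hi) * n_right)
    return c_trav + c_isect * weighted / surface_area(lo, hi)

def best_split_exact(boxes, lo, hi, axis):
    """Sweep sorted AABB boundary events; return (best cost, best position)."""
    events = []
    for b_lo, b_hi in boxes:
        events.append((b_lo[axis], 1))  # 1 = opening event
        events.append((b_hi[axis], 0))  # 0 = closing event
    events.sort()

    n_left, n_right = 0, len(boxes)
    best_cost, best_pos = math.inf, None
    i = 0
    while i < len(events):
        pos = events[i][0]
        opens = closes = 0
        # Count all events at this position instead of ordering them.
        while i < len(events) and events[i][0] == pos:
            if events[i][1] == 1:
                opens += 1
            else:
                closes += 1
            i += 1
        n_right -= closes  # objects ending here are not on the right
        if lo[axis] < pos < hi[axis]:
            cost = sah_cost(lo, hi, axis, pos, n_left, n_right)
            if cost < best_cost:
                best_cost, best_pos = cost, pos
        n_left += opens    # objects starting here count as left from now on
    return best_cost, best_pos
```

A full builder would also compare the best split cost against the cost of making the node a leaf; that step is omitted here.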

  6. Distribution Along the Split Axis • Given: event list & split position • Sweep event list and classify • Open event • Before split → label object “both” • After split → label object “right” • Close event • Before split → re-label object “left” • Copy event to corresponding child’s list • Might have to insert new events • Random memory access

  7. Distribution Along the Other Axes • Sweep event lists. Copy event to • Left, if corresponding object labeled “left” or “both” • Right, if corresponding object labeled “right” or “both” • Look up in object array → Random memory access
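
The two distribution steps on slides 6 and 7 can be sketched as follows. The event layout `(position, kind, object_id)`, the constants `OPEN`, `CLOSE`, `LEFT`, `RIGHT`, `BOTH`, and the function names are assumptions made for illustration. Events exactly at the split are treated as "right" when opening and "left" when closing, which is one possible tie rule. The insertion of new events for objects that straddle the split is left out.

```python
OPEN, CLOSE = 1, 0
LEFT, RIGHT, BOTH = "left", "right", "both"

def classify(split_axis_events, split, n_objects):
    """Label every object by sweeping the sorted event list of the split axis."""
    labels = [None] * n_objects
    for pos, kind, obj in split_axis_events:
        if kind == OPEN:
            # Opens before the split may still straddle it.
            labels[obj] = BOTH if pos < split else RIGHT
        elif pos <= split:
            # Closes before the split: the object lies entirely on the left.
            labels[obj] = LEFT
    return labels

def distribute(other_axis_events, labels):
    """Copy events of another axis to the children, keeping them sorted.
    The lookup labels[obj] is the random memory access noted on the slide."""
    left, right = [], []
    for event in other_axis_events:
        label = labels[event[2]]
        if label != RIGHT:
            left.append(event)
        if label != LEFT:
            right.append(event)
    return left, right
```

Because both functions append in sweep order, each child list stays sorted without a further sort.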

  8. Problems of KD Tree Construction • Random memory accesses • Expensive cost function evaluation • Initial sorting – inefficient for lazy builds

  9. Streaming Algorithm Overview • Work with unsorted lists of AABBs • Avoid initial sorting • Sweep list once to locate initial split plane • In a single sweep • Distribute objects (straightforward) • Determine split positions of children • Once data fits in caches, switch to conventional build
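
The single sweep described on this slide might look like the sketch below, reduced to one axis for brevity. The names `stream_partition` and `add_to_bins`, the use of `(start, end)` intervals in place of full AABBs, and the choice of `k` regularly spaced samples per child are assumptions; a real builder would bin along all three axes. While each object is copied to its child list, it is also counted in that child's opening/closing bins, so the children's split positions can be estimated without another pass.

```python
import math

def add_to_bins(bins, start, end, lo, width, k):
    """Count one interval in the (opens, closes) histograms of a child."""
    opens, closes = bins
    opens[min(max(math.floor((start - lo) / width), 0), k)] += 1
    closes[min(max(math.ceil((end - lo) / width) - 1, 0), k)] += 1

def stream_partition(intervals, lo, split, hi, k):
    """One pass over an unsorted list: distribute and bin for both children."""
    width_l = (split - lo) / (k + 1)
    width_r = (hi - split) / (k + 1)
    left, right = [], []
    bins_l = ([0] * (k + 1), [0] * (k + 1))
    bins_r = ([0] * (k + 1), [0] * (k + 1))
    for start, end in intervals:
        goes_left = start < split
        goes_right = end > split
        if not (goes_left or goes_right):
            goes_left = True  # flat object lying exactly in the split plane
        if goes_left:
            left.append((start, end))
            add_to_bins(bins_l, start, end, lo, width_l, k)
        if goes_right:
            right.append((start, end))
            add_to_bins(bins_r, start, end, split, width_r, k)
    return left, right, bins_l, bins_r
```

Both the input and the two output lists are accessed strictly front to back, which is the streaming access pattern the slide refers to.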

  10. SAH Cost Estimation • Cost function typically varies only slowly • No need to evaluate SAH at every event → Use sampling! • Naïve approach • For every event: check all samples → O(kN) • How to sample efficiently?

  11. Efficient Sampling • Two step approach • #Objects to left of sample = #Opening events to its left • #Objects to right of sample = #Closing events to its right • Count opening/closing events between samples • Regular sampling → index computation in O(1) • Reconstruct left/right object counts at samples • Using two partial sums from left and right → O(k+N)
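
A minimal sketch of the two-step approach, assuming one axis, `(start, end)` intervals in place of AABBs, and `k` regularly spaced sample planes that divide the node `[lo, hi]` into `k + 1` bins. The function name `sample_counts` is hypothetical. Opening events are binned with a floor and closing events with a ceiling, so that an event lying exactly on a sample plane is counted on the correct side.

```python
import math

def sample_counts(intervals, lo, hi, k):
    """Return sample positions and the object counts left/right of each sample."""
    width = (hi - lo) / (k + 1)
    opens = [0] * (k + 1)
    closes = [0] * (k + 1)

    # Step 1: one O(N) pass; the bin index is computed in O(1) per event.
    for start, end in intervals:
        opens[min(max(math.floor((start - lo) / width), 0), k)] += 1
        closes[min(max(math.ceil((end - lo) / width) - 1, 0), k)] += 1

    samples = [lo + (j + 1) * width for j in range(k)]

    # Step 2: O(k) partial sums, from the left for opens ...
    n_left, running = [], 0
    for j in range(k):
        running += opens[j]
        n_left.append(running)

    # ... and from the right for closes.
    n_right, running = [0] * k, 0
    for j in range(k - 1, -1, -1):
        running += closes[j + 1]
        n_right[j] = running
    return samples, n_left, n_right
```

For example, `sample_counts([(0, 1), (0.5, 2.5), (2, 3)], 0, 4, 3)` places samples at 1, 2 and 3 and reports two objects on each side of the first two samples.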

  12. Refining of Samples • SAH – sum of two monotone functions – Cl and Cr • Cost between two samples a < b is bounded from below • C ≥ Cmin = min(Cl) + min(Cr) = Cl(a) + Cr(b) • Resample areas where Cmin < current minimum • Typically only few intervals need to be re-sampled (< 5%)
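
The refinement test can be sketched as below, assuming the left and right cost terms have already been evaluated at consecutive sample positions and are passed in as the lists `cl` (non-decreasing) and `cr` (non-increasing). The function name `intervals_to_refine` is hypothetical.

```python
def intervals_to_refine(cl, cr):
    """Return the best sampled cost and the indices i of intervals
    [sample i, sample i+1] whose lower bound undercuts that cost."""
    costs = [left + right for left, right in zip(cl, cr)]
    current_min = min(costs)
    refine = []
    for i in range(len(costs) - 1):
        # Cl is smallest at the left end, Cr at the right end of the interval.
        lower_bound = cl[i] + cr[i + 1]
        if lower_bound < current_min:
            refine.append(i)
    return current_min, refine
```

Intervals whose lower bound is not below the current minimum cannot contain a better split, so they are skipped without further evaluation.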

  13. Algorithm properties • Streaming memory accesses • SAH cost function estimated by sampling • No initial sorting required • Refining of Samples

  14. Improvements • Conventional Algorithm • Use radix sort – O(N) • Fastest algorithm if data set fits into caches • No need to order events at same position • Count opening/closing events instead • Removes one radix sort pass • Multiple cores → parallelize build • Most time spent in the lower tree levels • One sub-tree → one core
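
One way to radix sort event positions in linear time is sketched below. The slides do not say how the keys are formed; this sketch assumes 32-bit floating point positions and uses the well-known bit transform that makes unsigned integer order match float order. The names `float_key` and `radix_sort_events` and the `(position, payload)` event layout are assumptions. Only the position is sorted on, so events at the same position keep their input order and no extra pass over the event type is needed.

```python
import struct

def float_key(x):
    """Map a float32 value to an unsigned 32-bit key with the same ordering."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        return ~bits & 0xFFFFFFFF  # negative: flip all bits
    return bits | 0x80000000       # non-negative: set the sign bit

def radix_sort_events(events):
    """Stable LSD radix sort of (position, payload) pairs, 8 bits per pass."""
    keyed = [(float_key(pos), pos, payload) for pos, payload in events]
    for shift in (0, 8, 16, 24):
        counts = [0] * 256
        for item in keyed:
            counts[(item[0] >> shift) & 0xFF] += 1
        # Exclusive prefix sum turns digit counts into output offsets.
        offsets, total = [0] * 256, 0
        for digit in range(256):
            offsets[digit] = total
            total += counts[digit]
        out = [None] * len(keyed)
        for item in keyed:
            digit = (item[0] >> shift) & 0xFF
            out[offsets[digit]] = item
            offsets[digit] += 1
        keyed = out
    return [(pos, payload) for _, pos, payload in keyed]
```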

  15. Results • Speed-up up to 50% • Only effective in the upper levels • Limited by copying of object/events • The larger the scene, the higher the speedup • Performance independent of triangle order • Small decrease in traversal performance (< 2%) • With 1024 samples • Multi-threading • 2.43x @ 4 cores (no local memory management)

  16. Future Work • Fully multi-threaded implementation • Careful memory management on NUMA architectures • Extend to other spatial index structures • BVHs, BKD trees, SKD trees, …

  17. Conclusion • Streaming construction algorithm • 50% speedup • Cost function sampling • Very low quality degradation • Refining of samples

  18. Thank you!

  19. Advantages • Sequential memory access in the upper levels • Small data footprint in conventional build • Fits in caches • Radix sort is efficient • Fewer computations needed for split plane position estimation • But, what about the tree cost?

  20. Memory Management • Use two arrays and alternate them

  21. SAH tree cost • Optimal KD tree for ray tracing • SAH based • Minimize average expected traversal cost of an arbitrary ray
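
For reference, the expected traversal cost that the SAH minimizes is commonly written in the following form. The symbols are the conventional ones and are not taken from the slides: $V$ is the node's bounding box, $V_L(x)$ and $V_R(x)$ are the child boxes for a split at $x$, $N_L(x)$ and $N_R(x)$ are the object counts on each side, $SA(\cdot)$ is the surface area, and $C_{trav}$, $C_{isect}$ are the assumed costs of one traversal step and one intersection test.

```latex
C(x) = C_{trav} + C_{isect}
       \left( \frac{SA(V_L(x))}{SA(V)}\, N_L(x)
            + \frac{SA(V_R(x))}{SA(V)}\, N_R(x) \right)
```

Each surface-area ratio is the conditional probability that a ray hitting the node also hits that child, assuming uniformly distributed rays.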

  22. SAH computation • Efficient computation – extract & sort events in advance • Compute incrementally. Keep track of objects on left/right • Evaluate after a close event and before an open event

  23. Alternative Multi-Threading • required on NUMA architectures) • Sub-tree → core not suitable for the first log(#cores) levels • Also unsuitable for some architectures (Cell) • Alternative • Bring data to cores from sequential pages • Gather event counts in bins at each core • Merge counts before actual cost evaluation
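
The alternative scheme on this slide can be sketched as below, reduced to one axis. Everything in it is an assumption for illustration: the names `bin_counts` and `merged_bin_counts`, the `(start, end)` interval representation, and the use of a thread pool to stand in for cores. In CPython the pool only shows the structure of the computation and does not give a real parallel speed-up. Each worker bins a contiguous chunk of the input, and the per-worker histograms are summed before any cost is evaluated.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def bin_counts(chunk, lo, hi, k):
    """Opening/closing event histograms for one contiguous chunk of objects."""
    width = (hi - lo) / (k + 1)
    opens = [0] * (k + 1)
    closes = [0] * (k + 1)
    for start, end in chunk:
        opens[min(max(math.floor((start - lo) / width), 0), k)] += 1
        closes[min(max(math.ceil((end - lo) / width) - 1, 0), k)] += 1
    return opens, closes

def merged_bin_counts(intervals, lo, hi, k, n_workers=4):
    """Bin each chunk independently, then merge the counts by summation."""
    size = max(1, math.ceil(len(intervals) / n_workers))
    chunks = [intervals[i:i + size] for i in range(0, len(intervals), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: bin_counts(c, lo, hi, k), chunks))
    opens = [0] * (k + 1)
    closes = [0] * (k + 1)
    for part_opens, part_closes in partials:
        for i in range(k + 1):
            opens[i] += part_opens[i]
            closes[i] += part_closes[i]
    return opens, closes
```

Since the merge is a plain element-wise sum, the result does not depend on how the input is divided among workers.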

  24. Extension: Multi-Threading • Multiple cores → parallelize build • Most time spent in the lower tree levels • One sub-tree → one core
