Experiences with Streaming Construction of SAH KD Trees Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek
Motivation • Large speed-up of ray tracing lately • Better algorithms (packet tracing [Wald04, Reshetov05]) • Optimized spatial index structures • Best known: KD trees [Havran00] • Faster hardware • Research concentrated mainly on static scenes • Dynamic scenes • Building – slow for SAH based KD trees • Done in a pre-processing step
Dynamic Scenes Approaches • Embed dynamics in the index structure • Use a two level approach [Wald03] • Fuzzy KD trees [Günther06] • Update index structure • Grids, BVHs and KD tree hybrids • Faster build/update • Lower traversal performance • No efficient approach for KD trees • Rebuild entire KD tree • Need to make it fast • Lazy build
SAH Algorithm • Extract & sort events in advance • Abstract objects with AABBs • Events given by AABB boundaries • Recursive top-down construction • Find split plane using SAH • Compute minimum cost • Distribute objects to children • By distributing the events • Keep them sorted
SAH Cost Function • Piecewise linear • Discontinuities at object boundaries • Evaluate only before opening and after closing event
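A minimal sketch of evaluating the cost for one candidate plane, assuming the commonly used form of the SAH (traversal cost plus intersection cost weighted by each child's surface-area ratio and object count). The constants `c_trav` and `c_isect` and the function names are illustrative assumptions; the slides do not give them.

```python
def surface_area(lo, hi):
    """Surface area of an axis-aligned box given its min and max corners."""
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(lo, hi, axis, split, n_left, n_right, c_trav=1.0, c_isect=1.0):
    """SAH cost of splitting box [lo, hi] at `split` along `axis`.

    c_trav and c_isect are assumed constants, not values from the slides.
    """
    left_hi = list(hi)
    left_hi[axis] = split      # left child ends at the split plane
    right_lo = list(lo)
    right_lo[axis] = split     # right child starts at the split plane
    area = surface_area(lo, hi)
    p_left = surface_area(lo, left_hi) / area
    p_right = surface_area(right_lo, hi) / area
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)
```

Between two events the counts `n_left` and `n_right` stay fixed while the child areas change linearly with `split`, which is why the cost is piecewise linear.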
Distribution Along the Split Axis • Given: event list & split position • Sweep event list and classify • Open event • Before split: label object “both” • After split: label object “right” • Close event • Before split: re-label object “left” • Copy event to corresponding child’s list • Might have to insert new events • Random memory access
Distribution Along the Other Axes • Sweep event lists. Copy event to • Left, if corresponding object labeled “left” or “both” • Right, if corresponding object labeled “right” or “both” • Look up in object array • Random memory access
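The two distribution steps can be sketched as follows. This is an illustrative sketch, not the authors' code: the event layout `(position, kind, object_id)`, the function names, and the tie-breaking for events lying exactly on the split plane are all assumptions. Splitting of straddling events (the "insert new events" case) is left out.

```python
OPEN, CLOSE = 0, 1  # event kinds (assumed encoding)


def label_objects(split_axis_events, split):
    """Sweep the sorted split-axis events and label every object.

    Assumes each object's open event precedes its close event in the list.
    """
    labels = {}
    for pos, kind, obj in split_axis_events:
        if kind == OPEN:
            # Opens before the plane may straddle it; opens after lie right.
            labels[obj] = "both" if pos < split else "right"
        elif pos <= split and labels[obj] == "both":
            # Closed before (or on) the plane: the object is entirely left.
            labels[obj] = "left"
    return labels


def distribute(events, labels):
    """Copy events of another axis to the children via a label lookup."""
    left, right = [], []
    for ev in events:
        label = labels[ev[2]]  # per-object lookup: the random memory access
        if label in ("left", "both"):
            left.append(ev)
        if label in ("right", "both"):
            right.append(ev)
    return left, right
```

Because events are appended in sweep order, both child lists stay sorted without a second sort.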
Problems of KD Tree Construction • Random memory accesses • Expensive cost function evaluation • Initial sorting – inefficient for lazy builds
Streaming Algorithm Overview • Work with unsorted lists of AABBs • Avoid initial sorting • Sweep list once to locate initial split plane • In a single sweep • Distribute objects (straightforward) • Determine split positions of children • Once data fits in caches, switch to conventional build
SAH Cost Estimation • Cost function typically varies only slowly • No need to evaluate SAH at every event • Use sampling! • Naïve approach • For every event: check all samples O(kN) • How to sample efficiently?
Efficient Sampling • Two step approach • #Objects to left of sample = #Opening events to its left • #Objects to right of sample = #Closing events to its right • Count opening/closing events between samples • Regular sampling: index computation in O(1) • Reconstruct left/right object counts at samples • Using two partial sums from left and right • O(k+N)
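The two-step approach can be sketched as follows, assuming `k` regularly spaced samples strictly inside the node extent `[lo, hi]` and objects given as `(min, max)` intervals along the current axis. The function name and the handling of events that fall exactly on a sample are assumptions made for illustration.

```python
def sample_counts(intervals, lo, hi, k):
    """Object counts left/right of k regular samples in O(k + N)."""
    width = (hi - lo) / (k + 1)      # k samples delimit k + 1 bins
    opens = [0] * (k + 1)
    closes = [0] * (k + 1)

    def bin_of(x):
        # Regular sampling: the bin index is a single division, O(1).
        return min(k, max(0, int((x - lo) / width)))

    # Step 1: one streaming pass counting events per bin.
    for a, b in intervals:
        opens[bin_of(a)] += 1
        closes[bin_of(b)] += 1

    # Step 2: partial sums. Opens accumulate from the left ...
    n_left, running = [0] * k, 0
    for j in range(k):
        running += opens[j]
        n_left[j] = running
    # ... and closes accumulate from the right.
    n_right, running = [0] * k, 0
    for j in range(k - 1, -1, -1):
        running += closes[j + 1]
        n_right[j] = running

    samples = [lo + (j + 1) * width for j in range(k)]
    return samples, n_left, n_right
```

The input list is read once and in order, with no sorting, which matches the streaming access pattern the algorithm is built around.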
Refining of Samples • SAH – sum of two monotone functions – Cl and Cr • Cost between two samples a < b is bounded from below • C ≥ Cmin = min(Cl) + min(Cr) = Cl(a) + Cr(b) • Resample areas where Cmin < current minimum • Typically only few intervals need to be re-sampled (< 5%)
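A small sketch of the interval test, assuming `cl` and `cr` hold the values of the left and right cost terms at consecutive samples, with `cl` non-decreasing and `cr` non-increasing. The function name is hypothetical.

```python
def intervals_to_refine(cl, cr):
    """Indices i of sample intervals (i, i + 1) that may hide a lower cost."""
    # Best cost actually observed at the samples so far.
    best = min(l + r for l, r in zip(cl, cr))
    # Lower bound inside interval (a, b): Cl(a) + Cr(b).
    return [i for i in range(len(cl) - 1) if cl[i] + cr[i + 1] < best]
```

For example, with `cl = [1, 2, 8, 9]` and `cr = [9, 2, 1.5, 1]` the sampled minimum is 4; the first two intervals have bounds 3 and 3.5 and are re-sampled, while the last has bound 9 and is skipped.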
Algorithm properties • Streaming memory accesses • SAH cost function estimated by sampling • No initial sorting required • Refining of Samples
Improvements • Conventional Algorithm • Use radix sort – O(N) • Fastest algorithm if data set fits into caches • No need to order events at same position • Count opening/closing events instead • Removes one radix sort pass • Multiple cores parallelize build • Most time spent in the lower tree levels • One sub-tree per core
Results • Speed-up up to 50% • Only effective in the upper levels • Limited by copying of object/events • The larger the scene, the higher the speedup • Performance independent of triangle order • Small decrease in traversal performance (< 2%) • With 1024 samples • Multi-threading • 2.43x @ 4 cores (no local memory management)
Future Work • Fully multi-threaded implementation • Careful memory management on NUMA architectures • Extend to other spatial index structures • BVHs, BKD trees, SKD trees, …
Conclusion • Streaming construction algorithm • 50% speedup • Cost function sampling • Very low quality degradation • Refining of samples
Advantages • Sequential memory access in the upper levels • Small data footprint in conventional build • Fits in caches • Radix sort is efficient • Fewer computations needed for split plane position estimation • But, what about the tree cost?
Memory Management • Use two arrays and alternate them
SAH tree cost • Optimal KD tree for ray tracing • SAH based • Minimize average expected traversal cost of an arbitrary ray
SAH computation • Efficient computation – extract & sort events in advance • Compute incrementally. Keep track of objects on left/right • Evaluate after a close event and before an open event
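The incremental bookkeeping can be sketched as a single sweep over the sorted events. The event layout `(position, kind)` and the function name are assumptions; the resulting counts would be fed into the SAH cost at each candidate position.

```python
OPEN, CLOSE = 0, 1  # event kinds (assumed encoding)


def sweep_counts(events, n_objects):
    """Return (position, n_left, n_right) for every candidate plane.

    `events` must be sorted by position.
    """
    n_left, n_right = 0, n_objects
    candidates = []
    for pos, kind in events:
        if kind == CLOSE:
            n_right -= 1                                 # object ends here
            candidates.append((pos, n_left, n_right))    # evaluate after close
        else:
            candidates.append((pos, n_left, n_right))    # evaluate before open
            n_left += 1                                  # object starts here
    return candidates
```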
Alternative Multi-Threading • required on NUMA architectures) • One sub-tree per core not suitable for the first log(#cores) levels • Also unsuitable for some architectures (Cell) • Alternative • Bring data to cores from sequential pages • Gather event counts in bins at each core • Merge counts before actual cost evaluation
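A rough sketch of the gather-and-merge idea, with Python worker threads standing in for cores. The chunking scheme, function names, and bin layout are assumptions made for illustration, and this shows only the data flow, not the performance behaviour of a native implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bins(chunk, lo, width, n_bins):
    """Per-worker pass: count open/close events of one contiguous chunk."""
    opens, closes = [0] * n_bins, [0] * n_bins
    for a, b in chunk:
        opens[min(n_bins - 1, max(0, int((a - lo) / width)))] += 1
        closes[min(n_bins - 1, max(0, int((b - lo) / width)))] += 1
    return opens, closes


def merged_counts(intervals, lo, hi, n_bins, n_workers=4):
    """Count events per bin in parallel, then merge before cost evaluation."""
    width = (hi - lo) / n_bins
    size = max(1, -(-len(intervals) // n_workers))  # ceiling division
    # Each worker receives a sequential slice of the input.
    chunks = [intervals[i:i + size] for i in range(0, len(intervals), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: count_bins(c, lo, width, n_bins), chunks))
    # Merge step: element-wise sum of the per-worker bins.
    opens = [sum(p[0][i] for p in partials) for i in range(n_bins)]
    closes = [sum(p[1][i] for p in partials) for i in range(n_bins)]
    return opens, closes
```

Each worker writes only to its own bins, so no synchronisation is needed until the final merge, whose cost depends on the number of bins and workers rather than on the number of objects.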
Extension: Multi-Threading • Multiple cores parallelize build • Most time spent in the lower tree levels • One sub-tree per core