Cache-Efficient Layouts of Bounding Volume Hierarchies (BVHs). Sung-Eui Yoon Lawrence Livermore National Laboratory Dinesh Manocha Univ. of North Carolina at Chapel Hill. Goal. Compute cache-coherent layouts of bounding volume hierarchies (BVHs) For various geometric applications
Cache-Efficient Layouts of Bounding Volume Hierarchies (BVHs) Sung-Eui Yoon Lawrence Livermore National Laboratory Dinesh Manocha Univ. of North Carolina at Chapel Hill
Goal • Compute cache-coherent layouts of bounding volume hierarchies (BVHs) • For various geometric applications • Handles any kind of BVHs and spatial partitioning hierarchies (e.g., kd-tree)
Bounding Volume Hierarchies (BVHs) • Widely used data structures in: • Ray tracing • Collision detection • Visibility culling [Figures: ray tracing, dynamic simulation]
Bounding Volumes (BVs) • Axis-aligned bounding boxes (AABBs) • Oriented bounding boxes (OBBs) [Gottschalk et al. 96] • Spheres [Hubbard 93] • Discrete orientation polytopes (k-DOPs) [Klosowski et al. 98] [Figure: triangles of a mesh]
Layout of BVHs • Nodes (and triangles) of BVHs are stored in arrays • What is a good layout? • How to compute cache-efficient layouts? [Figure: a layout method maps the nodes A, B, C, D, E of a BVH to a 1D layout of nodes]
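To make the idea of a 1D layout concrete, here is a minimal sketch of two commonly used orderings, depth-first and breadth-first, applied to a five-node tree. The tree shape (D and E as children of B) and the function names are assumptions for illustration; the slide only names the nodes.

```python
from collections import deque

# Hypothetical toy hierarchy: A is the root; the exact shape is assumed.
CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}

def depth_first_layout(root, children):
    """Return the nodes in depth-first (pre-order) array order."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(children[node]))
    return order

def breadth_first_layout(root, children):
    """Return the nodes in breadth-first (level-by-level) array order."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node])
    return order

dfs = depth_first_layout("A", CHILDREN)
bfs = breadth_first_layout("A", CHILDREN)
print(dfs)
print(bfs)
```

Either list is a valid 1D layout: the index of a node in the list is its position in the node array. The two orderings place parents and children at different distances from each other, which is what a layout method controls.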
Motivation • Lower growth rate of data access speed • Growth rates during 1993 – 2004 (figure labels): 46X, 20X, 1.5X • Courtesy: http://www.hcibook.com/e3/online/moores-law/
Memory Hierarchies and Caches • CPU, fast memory or cache, slow memory, and disk, connected by block transfers • Access times (figure labels): 1 sec, 10^-6 sec, 10^-4 sec
Main Contributions • An algorithm computing cache-efficient layouts of BVHs • Probabilistic model • Simple layout construction method • Applicable to spatial partitioning hierarchies
Related Work • Mesh layouts • Layouts of search trees • Layouts of BVHs
Related Work • Mesh layouts • Cache-coherent layouts of meshes and graphs [Yoon et al. 05, Yoon and Lindstrom 06] • Layouts of search trees • Layouts of BVHs Require an input graph that represents access patterns on a BVH
Related Work • Mesh layouts • Layouts of search trees • [Gil and Itai 99, Alstrup et al. 03] • Layouts of BVHs Require a probability function that each node will be accessed
Related Work • Mesh layouts • Layouts of search trees • Layouts of BVHs • Studied in collision detection [Ericson 04] and ray tracing [Havran 97] • Blocking-based layouts [Terdiman 03, van Emde Boas 77]
Outline • Probabilistic model • Layout computation • Results
Traversals of Collision Queries on BVHs • Takes two objects • Two 3D objects for collision detection • One 3D object and one ray for ray tracing [Figure: BVH1 and BVH2]
Two Localities • Parent-child locality • Spatial locality
Parent-child Locality [Figure: nodes A and B in BVH1 and BVH2]
Spatial Locality [Figure: nodes C, D, and E in BVH1 and BVH2]
Probabilistic Model • Quantify localities in a uniform way • Measure the probability for localities • Based on geometric relationships between bounding volumes
Probabilistic Model • Pr(n): probability that a node, n, will be accessed during runtime traversal • Two major factors • Prob. that p is accessed • Conditional prob. that p is also intersected given g is intersected • Xp (or Xg) is a boolean random variable indicating collision between p (or g) and b [Figure: nodes g, p, and n with object b; g is intersected, p is accessed and intersected]
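The slide's equation did not survive extraction. The two factors listed above suggest a product form; one plausible reading, assuming p is the parent of n and g is the parent of p, is sketched below. This is a reconstruction from the bullet text, not necessarily the authors' exact formula.

```latex
\Pr(n) \;=\; \Pr(p)\,\cdot\,\Pr(X_p = 1 \mid X_g = 1)
```

Under this reading, the probability of the root is 1 and the probabilities of all other nodes follow by a single top-down pass over the hierarchy.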
Probability Computation • Pr(Xp | Xg): Conditional prob. that p is also intersected given g is intersected • Do not know any information about b [Figure: nodes g, p, and n with object b; g and p are intersected]
Contact Space • Contact space of b against p and g • Denoted as Sp and Sg [Figure: object b, nodes g, p, and n, and the region Sp∩Sg; labels Sp = p, Sg = g]
Contact Space • Assume b is a sphere • Computed from Minkowski sum • Configuration space, in general • Too expensive to compute [Figure: contact spaces Sp and Sg of a sphere b and their intersection Sp∩Sg]
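For one special case the contact space has a closed form. The sketch below assumes axis-aligned boxes for p and g, a sphere of a given radius for b, and that the conditional probability is taken as the volume ratio of the two contact spaces; all three are assumptions for illustration, not statements from the slides. The volume of a box Minkowski-summed with a sphere follows from the Steiner formula, and because p lies inside g, Sp∩Sg equals Sp.

```python
import math

def minkowski_box_sphere_volume(extents, radius):
    """Volume of an axis-aligned box (side lengths `extents`) dilated by a sphere."""
    a, b, c = extents
    box = a * b * c
    faces = 2.0 * (a * b + b * c + c * a) * radius    # slabs over the six faces
    edges = math.pi * radius ** 2 * (a + b + c)       # quarter cylinders along the 12 edges
    corners = (4.0 / 3.0) * math.pi * radius ** 3     # eight sphere octants at the corners
    return box + faces + edges + corners

def contact_space_ratio(child_extents, parent_extents, radius):
    """Vol(Sp) / Vol(Sg), assuming the child box is contained in the parent box."""
    return (minkowski_box_sphere_volume(child_extents, radius)
            / minkowski_box_sphere_volume(parent_extents, radius))

print(contact_space_ratio((1.0, 2.0, 2.0), (2.0, 2.0, 2.0), 0.5))
```

With radius 0 the ratio reduces to the plain volume ratio of the two boxes, which corresponds to treating b as a point.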
Approximate Probability Computation • Assumes “b” to be a point, a degenerate case • Exact value is not required • Only 5% incorrect decisions compared to considering many other cases • Surface area heuristic (SAH) [MacDonald and Booth 90, Havran 00] • Equivalent to our approximation
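A minimal sketch of the point approximation for axis-aligned boxes follows. The `AABB` class and `conditional_prob` function are hypothetical names. The slide does not state which geometric measure is used, so the measure is left as a parameter: a volume ratio models a uniformly distributed point, while a surface-area ratio is the form used by the SAH for rays.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AABB:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

    def extents(self):
        return tuple(h - l for l, h in zip(self.lo, self.hi))

    def volume(self):
        dx, dy, dz = self.extents()
        return dx * dy * dz

    def surface_area(self):
        dx, dy, dz = self.extents()
        return 2.0 * (dx * dy + dy * dz + dz * dx)

def conditional_prob(child, parent, measure=AABB.volume):
    """Estimate Pr(Xp | Xg) as measure(p) / measure(g), assuming p lies inside g."""
    denom = measure(parent)
    if denom == 0:
        return 1.0  # degenerate parent: treat the child as always reached
    return min(1.0, measure(child) / denom)

g = AABB((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
p = AABB((0.0, 0.0, 0.0), (1.0, 2.0, 2.0))
by_volume = conditional_prob(p, g)
by_area = conditional_prob(p, g, measure=AABB.surface_area)
print(by_volume, by_area)
```

Both estimates need only the two boxes, so they can be evaluated for every parent-child pair in a single pass over the hierarchy.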
Outline • Probabilistic model • Layout computation • Results
Overview of Layout Algorithm • Cache-oblivious layout computation • Do not assume any particular cache block sizes • Designed to work well with various (geometric) block sizes [Yoon and Lindstrom 06] • Two main steps in recursion • Cluster construction w/ parent-child locality • Layout clusters w/ spatial locality
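To illustrate why a layout should behave well across block sizes, the sketch below counts how many distinct blocks a root-to-leaf traversal touches under three hand-written layouts of a 15-node complete binary tree (nodes numbered 1 to 15, children of i at 2i and 2i+1). The helper name, the toy tree, and the block sizes are assumptions for illustration; the layouts shown are standard orderings, not the authors' layout.

```python
def blocks_touched(layout, accessed, block_size):
    """Number of distinct blocks of `block_size` nodes that hold the accessed nodes."""
    position = {node: i for i, node in enumerate(layout)}
    return len({position[node] // block_size for node in accessed})

LAYOUTS = {
    "breadth-first": list(range(1, 16)),
    "depth-first": [1, 2, 4, 8, 9, 5, 10, 11, 3, 6, 12, 13, 7, 14, 15],
    # Top subtree (1, 2, 3) first, then the four bottom subtrees of height 2.
    "van Emde Boas": [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15],
}
PATH = [1, 3, 7, 15]  # root to the rightmost leaf

for block_size in (2, 3, 4):
    counts = {name: blocks_touched(layout, PATH, block_size)
              for name, layout in LAYOUTS.items()}
    print(block_size, counts)
```

A cache-aware layout would tune the ordering for one known block size; a cache-oblivious layout aims to keep such counts low without knowing it.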
Clustering • Minimize the working set size during collision queries • Maximize the sum of probabilities of nodes in a cluster • NP-complete even for cache-aware layout given a search query [Gil and Itai 99]
Greedy Clustering • Employ top-down greedy clustering • Compute balanced-size clusters • Maintain convexity [Gil and Itai 99] [Figure: a cluster grown over nodes with probabilities 0.5, 0.9, 0.8, 0.1]
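One way to read the top-down greedy step is sketched below: starting from a root, repeatedly add the candidate node with the highest access probability, where a node becomes a candidate only after its parent has joined the cluster. The function name, the `max_size` parameter, and the toy tree are assumptions; the slides do not say how the cluster size is chosen.

```python
import heapq

def greedy_cluster(root, children, prob, max_size):
    """Grow one cluster from `root`; return (cluster, roots of the remaining subtrees)."""
    cluster = []
    counter = 0  # tie-breaker so the heap never compares node objects directly
    # heapq is a min-heap, so probabilities are stored negated.
    heap = [(-prob[root], counter, root)]
    while heap and len(cluster) < max_size:
        _, _, node = heapq.heappop(heap)
        cluster.append(node)
        for child in children.get(node, []):
            counter += 1
            heapq.heappush(heap, (-prob[child], counter, child))
    remaining = [node for _, _, node in sorted(heap)]
    return cluster, remaining

# Hypothetical probabilities, reusing the values visible in the slide's figure.
children = {"r": ["a", "b"], "a": ["c", "d"]}
prob = {"r": 1.0, "a": 0.9, "b": 0.5, "c": 0.8, "d": 0.1}
cluster, remaining = greedy_cluster("r", children, prob, max_size=3)
print(cluster, remaining)
```

Because a node is eligible only once its parent is in the cluster, the cluster is always a connected subtree containing the root. Applying the same function to each node in `remaining` gives the recursive step.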
Layout of Clusters • Uses cache-oblivious layouts of meshes • [Yoon et al. 05] Spatial locality
Outline • Probabilistic model • Layout computation • Results
Results • Collision detection • Use oriented bounding box (OBB) [Gottschalk et al. 96] • Breadth-first tree traversal • Ray tracing • Use kd-tree [Wald 04] • Depth-first tree traversal
Collision Detection – Robot and Power Plant Models • Robot: 20k triangles • Power plant: 1M triangles
Collision Detection – Performance Comparison I • 41% ~ 500% performance improvement [Plots: collision time (ms/100) and working set size (KB) for different layouts: van Emde Boas layout, our cache-oblivious layout, breadth-first layout, cache-oblivious mesh layout, depth-first layout]
Collision Detection – Performance Comparison II • 35% ~ 2600% performance improvement [Plots: collision time (ms/100) and working set size (KB) for different layouts: van Emde Boas layout, our layout, breadth-first layout, cache-oblivious mesh layout, depth-first layout]
Cache-Oblivious Layout vs Cache-Aware Layout • Cache-aware layouts • Take advantage of block size information (4KB) • Minor performance degradation • 8% compared to cache-aware layouts
Ray Tracing – Lucy Model • 28 million triangles • Pentium IV with 1GB
Ray Tracing – Performance Comparison • 77% ~ 180% performance improvement [Plots: working set size (MB) and render time (sec) for different layouts: our layout, van Emde Boas layout, depth-first layout, breadth-first layout]
Major Differences over Other Layouts • Commonly used layouts • Consider connectivity of trees • Two improvements of our layouts • Probabilistic model based on geometry • Layout method considering two different localities
Limitations • No guarantee that our layout always improves the performance • May not improve the performance of computationally intensive queries (e.g., exact penetration depth computation) • Assumes that collision algorithm does not use front tracking
Advantages • Generality • Works with any geometric hierarchies • Does not require cache parameters • Usability • Can gain performance improvement without modifying code • Replaces only data layouts
Conclusion • Cache-efficient layouts of BVHs • Probabilistic model • Simple layout construction method • Applied to collision detection and ray tracing
Ongoing and Future Work • Extend to other proximity and LOD queries [Yoon et al. 06] • Investigate other geometric hierarchies • Improve the quality of hierarchies • Apply to deforming models [Lauterbach et al. 06]
Acknowledgements • Model contributors • Funding agencies • Army Research Office • DARPA • Intel • Lawrence Livermore National Laboratory • Microsoft • National Science Foundation • Office of Naval Research • RDECOM
Acknowledgements • Russ Gayle • Ted Kim • Ming Lin • Peter Lindstrom • Brandon Lloyd • Valerio Pascucci • Stephane Redon • LLNL data analysis group members • Anonymous reviewers
Questions? Thanks!
UCRL-PRES-223220 This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48. Note: this talk is not supported or sanctioned by DoE, UC, LLNL, CASC
BVHs of Massive Models • Complex and massive models • High memory requirement • Can have gigabyte data size [Figures: Double eagle tanker (82M triangles), Isosurface (472M), St. Matthew (372M)]
Memory Hierarchies • Register: speed 10^0 ns, size 1KB • Caches: speed 10^1 ns, size 1MB • Main memory: speed 10^2 ns, size 1GB • Disk storage: speed 10^4 ns, size > 1GB