Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful. Jiri Bittner¹, Michael Wimmer¹, Harald Piringer², Werner Purgathofer¹. ¹Vienna University of Technology, ²VRVis Vienna
Motivation • Coherent Hierarchical Culling • Typical hardware occlusion culling scenario [Figure: CPU and GPU timelines of render (R), occlusion query (Q), and cull (C) operations, showing the waiting time until each query result is returned]
Occlusion Culling: Offline vs. Online • Offline • Global information about visibility (from region) - Difficult to implement - Accuracy and maintenance problems + No runtime overhead • Online • Local information about visibility (from point) + Easier to implement + Greater accuracy, easy maintenance - Runtime overhead
Online Occlusion Culling • Object-space methods - Need complex geometric calculations (hard to handle detailed scenes) + Do not require rasterization • Image-space methods + No geometric calculations (easier to handle detailed scenes) - Require rasterization
Hardware Occlusion Culling • Hardware is good at rasterization! • Hardware counts rasterized fragments that pass the depth test • But does not need to update the frame buffer • NV/ARB_occlusion_query • Asynchronous • Allows multiple simultaneous occlusion queries • General algorithm idea: • Render a simple approximation first (bounding box) • Invisible: cull the object • Visible: render the object
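The general idea can be sketched as follows. This is a minimal sketch, assuming a precomputed per-object fragment count stands in for the real hardware query; `SceneObject`, `bbox_visible_pixels`, and `occlusion_query` are hypothetical names, not part of any OpenGL API. A real renderer would rasterize the bounding box with color and depth writes disabled and read the count back through NV/ARB_occlusion_query.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneObject:
    name: str
    # Hypothetical: number of bounding-box fragments that would pass the depth test.
    bbox_visible_pixels: int


def occlusion_query(obj: SceneObject) -> int:
    """Stand-in for a hardware occlusion query on obj's bounding box.

    A real renderer would draw the box with color and depth writes disabled
    and read the sample count back; here the count is simply precomputed.
    """
    return obj.bbox_visible_pixels


def cull_or_render(objects: List[SceneObject]) -> List[str]:
    """Return the names of objects whose full geometry would be rendered."""
    rendered = []
    for obj in objects:
        if occlusion_query(obj) > 0:
            rendered.append(obj.name)  # visible: render the real object
        # otherwise the object is invisible and is culled
    return rendered
```

For example, `cull_or_render([SceneObject("teapot", 120), SceneObject("hidden", 0)])` returns `["teapot"]`.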
Hardware Occlusion Culling • Advantages • Pixel-exact • No explicit occluder rendering • Exploits the rasterization power of the GPU • Easy to use (API calls) • Problems • Delay in availability of the results • Time to execute queries • If fill-bound: only pays off if several objects are culled
Hierarchical Stop&Wait (S&W) Front-to-back hierarchy traversal 1. Issue a visibility query for the node 2. Stop and wait for the result • Invisible: cull the subtree • Visible: render or continue with step 1 recursively • Advantage: • Hierarchy can cull huge subtrees • Problems: • Waiting causes CPU stalls and GPU starvation • Huge rasterization costs (especially for large interior nodes)
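Hierarchical S&W can be sketched as a plain recursive traversal. This is an illustrative sketch only: `Node`, `visible_pixels`, and `query_and_wait` are hypothetical, children are assumed to be stored in front-to-back order already, and the blocking query is simulated by counting one stall per query, including the queries for interior nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    # Hypothetical query result for this node's bounding volume.
    visible_pixels: int
    # Children are assumed to be stored in front-to-back order already.
    children: List["Node"] = field(default_factory=list)


def query_and_wait(node: Node, stats: Dict[str, int]) -> int:
    """Issue a query and block until its result is known (the 'stop and wait')."""
    stats["queries"] += 1
    stats["stalls"] += 1  # every query stalls the CPU until the GPU answers
    return node.visible_pixels


def stop_and_wait(node: Node, stats: Dict[str, int], rendered: List[str]) -> None:
    if query_and_wait(node, stats) == 0:
        return  # invisible: cull the whole subtree
    if not node.children:
        rendered.append(node.name)  # visible leaf: render it
        return
    for child in node.children:  # visible interior node: recurse
        stop_and_wait(child, stats, rendered)
```

On a tree `root → [A → [a1], B → [b1, b2]]` where only `root`, `B`, `a1`, and `b1` have visible pixels, the traversal renders `b1`, never tests `a1` (its parent `A` is culled), and pays five stalls for five queries.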
CPU Stalls and GPU Starvation [Figure: CPU and GPU timelines for hierarchical S&W over objects 1 to 4; after each query the CPU waits for the result while the GPU idles. Rx: render object x; Qx: query object x; Cx: cull object x]
Solution: Coherent Hierarchical Culling • Scheduling based on temporal coherence • Skipping certain visibility tests • Immediate rendering of certain geometry • Clever interleaving of queries and rendering • Maintaining a queue of running occlusion queries • Design goal: easy implementation
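Coherent Hierarchical Culling (CHC) [Figure: the same CPU and GPU timelines under CHC, annotated "visible in previous frame" and "assume independent occlusion"; queries are interleaved with rendering so the CPU does not wait for each result. Rx: render object x; Qx: query object x; Cx: cull object x]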
CHC Algorithm Outline • Front-to-back hierarchy traversal 1. Node handling • Interior node • Previously invisible: issue visibility query • Previously visible: continue with step 1 recursively • Leaf • Issue visibility query • Previously visible: render immediately 2. Check availability of query results • Invisible: propagate the visibility change • Visible: render or continue with step 1 recursively
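The outline can be turned into a small, runnable simulation. This is a sketch under stated assumptions, not the authors' implementation: `Node`, its `distance` field, the `pixels` table standing in for query results, and the fixed `latency` model of query availability are all hypothetical, and view-frustum culling is omitted. The outer loop first handles finished queries (blocking only when there is nothing left to traverse), then processes the next node in front-to-back order.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    name: str
    distance: float  # hypothetical viewer distance, used for front-to-back order
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    visible: bool = False  # classification from the last frame this node was visited
    last_visited: int = -1

    def is_leaf(self) -> bool:
        return not self.children


def link(node: Node) -> Node:
    """Set parent pointers below node and return it."""
    for child in node.children:
        child.parent = node
        link(child)
    return node


def chc_frame(root: Node, frame: int, pixels: Dict[str, int],
              latency: int = 2) -> Tuple[List[str], int]:
    """Render one frame with CHC; return (rendered leaves, number of queries).

    pixels[name] stands in for an occlusion query result, and a query issued
    at traversal step t is assumed to become available at step t + latency.
    """
    traversal: list = []  # priority queue keyed by distance (front to back)
    queries: deque = deque()  # running queries: (node, was_visible, ready_step)
    rendered: List[str] = []
    step = counter = issued = 0

    def push(node: Node) -> None:
        nonlocal counter
        heapq.heappush(traversal, (node.distance, counter, node))
        counter += 1

    def traverse(node: Node) -> None:
        if node.is_leaf():
            rendered.append(node.name)
        else:
            for child in node.children:
                push(child)

    def pull_up(node: Optional[Node]) -> None:
        # Mark node and its ancestors visible, stopping at one already marked.
        while node is not None and not node.visible:
            node.visible = True
            node = node.parent

    push(root)
    while traversal or queries:
        # Step 2: handle finished queries; block only when nothing is left to traverse.
        while queries and (queries[0][2] <= step or not traversal):
            node, was_visible, _ = queries.popleft()
            if pixels[node.name] > 0:
                pull_up(node)
                if not was_visible:  # previously visible leaves were already rendered
                    traverse(node)
        # Step 1: continue the front-to-back traversal.
        if traversal:
            _, _, node = heapq.heappop(traversal)
            was_visible = node.visible and node.last_visited == frame - 1
            node.visible = False  # restored by pull_up if the node is still visible
            node.last_visited = frame
            if node.is_leaf() or not was_visible:
                queries.append((node, was_visible, step + latency))
                issued += 1
            if was_visible:
                traverse(node)  # render or descend without waiting for a result
        step += 1
    return rendered, issued
```

Storing `was_visible` with each running query keeps a previously visible leaf from being rendered a second time when its result comes back visible. On a tree `root → [A → [a1, a2], B → [b1]]` where only `a1` (and its ancestors) has visible pixels, the first frame queries all five visited nodes, while the second frame skips the queries for the previously visible interior nodes `root` and `A` and issues only three.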
Why Interleaving Works • Processing a node depends only on: 1. Front-to-back order 2. Results of queries for already processed nodes
CHC: Hierarchy Traversal [Figure: hierarchy with nodes numbered 1 to 13 in front-to-back order, each marked previously visible or previously invisible. Annotations: no queries for previously visible interior nodes; assume no query dependencies; hidden regions: queries depend on parents]
CHC Features • Reduction of CPU stalls and GPU starvation • Interleaving queries with rendering previously visible geometry • Reduction of the number of queries • Avoids expensive redundant queries for interior nodes • Size of tested regions adapts to visibility • pull-up: occluded region growing • pull-down: visible region growing
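The way tested regions adapt can be illustrated with a deliberately simplified, static model. The names `classify` and `nodes_to_query` are hypothetical, and the model ignores query latency, the visit time stamps used by CHC, and the fact that CHC never descends into culled subtrees. Interior visibility is pulled up from the leaves, and the nodes tested next frame are the topmost invisible nodes plus the visible leaves.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    visible: bool = False


def classify(node: Node, leaf_pixels: Dict[str, int]) -> bool:
    """Pull-up: an interior node is visible iff at least one child is visible."""
    if not node.children:
        node.visible = leaf_pixels.get(node.name, 0) > 0
    else:
        # Build the full list first so every child's flag is updated (no short-circuit).
        child_flags = [classify(child, leaf_pixels) for child in node.children]
        node.visible = any(child_flags)
    return node.visible


def nodes_to_query(node: Node) -> List[str]:
    """Nodes tested next frame: topmost invisible nodes plus visible leaves.
    Visible interior nodes are descended without a query."""
    if not node.visible or not node.children:
        return [node.name]
    result: List[str] = []
    for child in node.children:
        result.extend(nodes_to_query(child))
    return result
```

With `root → [A → [a1, a2], B → [b1]]`, if only `a1` is visible the next frame tests `a1`, `a2`, and `B`. If instead only `b1` is visible, both leaves under `A` are occluded and their two queries merge into one query for `A` (occluded region growing); pull-down is the reverse, when a query reports `A` visible again and its children are tested individually.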
Implementation Issues • Front-to-back traversal • Priority queue: supports various hierarchical data structures • Checking query results • glGetOcclusionQueryivNV with GL_PIXEL_COUNT_AVAILABLE_NV • Very cheap operation • Queries for previously visible nodes • Use the actual geometry as the occludee (instead of the bounding box)
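A front-to-back order over an arbitrary hierarchy can be produced with a priority queue keyed by the distance from the viewpoint to each node's bounding box. The sketch below assumes axis-aligned boxes, a hypothetical `BoxNode` type, and that every child box lies inside its parent's box (as in a kD-tree). Under that assumption a child is never closer than its parent, so nodes leave the queue in non-decreasing distance.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BoxNode:
    name: str
    lo: Vec3  # minimum corner of the axis-aligned bounding box
    hi: Vec3  # maximum corner
    children: List["BoxNode"] = field(default_factory=list)


def box_distance(p: Vec3, lo: Vec3, hi: Vec3) -> float:
    """Euclidean distance from point p to the box [lo, hi] (0 if p is inside)."""
    d2 = 0.0
    for pc, l, h in zip(p, lo, hi):
        if pc < l:
            d2 += (l - pc) ** 2
        elif pc > h:
            d2 += (pc - h) ** 2
    return math.sqrt(d2)


def front_to_back_leaves(root: BoxNode, eye: Vec3) -> List[str]:
    """Visit leaves in order of increasing box distance from eye.
    Works for any hierarchy that exposes boxes and children."""
    heap = [(box_distance(eye, root.lo, root.hi), 0, root)]
    counter = 1  # tie-breaker so nodes themselves are never compared
    order: List[str] = []
    while heap:
        _, _, node = heapq.heappop(heap)
        if not node.children:
            order.append(node.name)
        for child in node.children:
            heapq.heappush(heap, (box_distance(eye, child.lo, child.hi), counter, child))
            counter += 1
    return order
```

In CHC this queue replaces the simple recursion: interior nodes push their children when they are traversed, and the loop interleaves popping nodes with polling the running queries.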
Further Optimizations • Conservative visibility testing • Assume a visible node remains visible for n frames + Saves additional occlusion queries • Approximate visibility • #visible pixels < threshold: node treated as invisible + Saves rendered geometry - Produces image errors
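Both optimizations reduce to small per-node decisions. A minimal sketch with hypothetical names (`VisibilityState`, `needs_query`, `apply_result`) and a simple frame-counter scheme chosen for illustration; the example values (25-pixel threshold, 2 frames assumed visible) mirror the settings reported in the optimization results.

```python
from dataclasses import dataclass


@dataclass
class VisibilityState:
    visible: bool = False
    trusted_until: int = -1  # last frame for which a 'visible' verdict is reused


def needs_query(state: VisibilityState, frame: int) -> bool:
    """Conservative visibility testing: skip the query while a recent
    'visible' result is still trusted (the node is rendered regardless)."""
    return not (state.visible and frame <= state.trusted_until)


def apply_result(state: VisibilityState, frame: int, visible_pixels: int,
                 threshold: int = 1, assume_visible_frames: int = 0) -> bool:
    """Approximate visibility: fewer than `threshold` visible pixels counts as
    invisible (threshold=1 gives exact culling). A visible node is then
    assumed to stay visible for `assume_visible_frames` further frames."""
    state.visible = visible_pixels >= threshold
    if state.visible:
        state.trusted_until = frame + assume_visible_frames
    return state.visible
```

With `threshold=25` and `assume_visible_frames=2`, a node found visible with 30 pixels at frame 0 is not queried again until frame 3; a later result of 10 pixels classifies it as invisible, trading a possible image error for less rendered geometry.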
Results – Test Scenes • Teapots: 11.5M triangles, 21k kD-tree nodes • City: 1M triangles, 33k kD-tree nodes • Power plant: 12.7M triangles, 18.7k kD-tree nodes
Results – Speedup • Ideal: zero overhead, render only visible geometry
Results – Summary • Comparison to hierarchical S&W • #queries reduced by a factor of almost 2 • Stall times reduced 20–60x (to 0.18–1.31 ms) • Close to the ideal algorithm! • Only 2–9 ms slower • Overhead due to query time
Optimization Results • Conservative culling, 2 frames assumed visible • Good for deep hierarchies with simple leaf geometry • Further speedup up to 21% • Approximate culling, 25 pixels threshold • Good for scenes with complex visible geometry • Further speedup up to 33%
Conclusion • Efficient scheduling of hardware occlusion queries • Greatly reduces CPU stalls and GPU starvation • Reduces the number of required queries • Simple to implement • Works with arbitrary hierarchical data structures • Speedup of ~4x over view-frustum culling (VFC) • Close to the ideal solution for the tested scenes • Look out for GPU Gems II
CHC: Example [Figure: step-by-step CHC traversal of a hierarchy with nodes numbered 1 to 11, showing the query queue and the commands sent to the GPU. Annotations: previously visible: continue with step 1 recursively; previously visible: render; previously visible: issue query + render; previously invisible: query; query result available: continue with step 1 recursively, render, mark visible, or cull; pull-up invisibility. Legend: issued queries, final classification]