
Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful

Jiri Bittner (1), Michael Wimmer (1), Harald Piringer (2), Werner Purgathofer (1)
(1) Vienna University of Technology, (2) VRVis Vienna


Presentation Transcript


  1. Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful
     Jiri Bittner (1), Michael Wimmer (1), Harald Piringer (2), Werner Purgathofer (1)
     (1) Vienna University of Technology, (2) VRVis Vienna

  2. Motivation
     • Coherent Hierarchical Culling
     • Typical hardware occlusion culling scenario
     [Figure: CPU and GPU timelines of render (R), occlusion query (Q), and cull (C) operations, with the waiting time marked.]

  3. Occlusion Culling: Offline vs. Online
     • Offline
       • Global information about visibility (from region)
       - Difficult to implement
       - Accuracy and maintenance problems
       + No runtime overhead
     • Online
       • Local information about visibility (from point)
       + Easier to implement
       + Greater accuracy, easy maintenance
       - Runtime overhead

  4. Online Occlusion Culling
     • Object space methods
       - Need complex geometric calculations (hard to handle detailed scenes)
       + Do not require rasterization
     • Image space methods
       + No geometric calculations (easier to handle detailed scenes)
       - Require rasterization

  5. Hardware Occlusion Culling
     • Hardware is good at rasterization!
     • Hardware counts rasterized fragments
       • But need not update frame buffer
     • NV/ARB_occlusion_query
       • Asynchronous
       • Allows multiple simultaneous occlusion queries
     • General algorithm idea:
       • Render simple approximation first (bbox)
       • Invisible: cull object
       • Visible: render object
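
The general algorithm idea on this slide can be sketched in a few lines. This is a minimal Python sketch, not the presenters' code: a 1D depth buffer stands in for the GPU, `count_visible` plays the role of a hardware occlusion query (it counts fragments that pass the depth test without writing anything), and each object's interval doubles as its own bounding box. All names, and the front-to-back sort, are assumptions made for illustration.

```python
def count_visible(depth, span, z):
    """Mock occlusion query: count pixels that would pass the depth test.

    Nothing is written to the depth buffer, mirroring the slide's point that
    a query need not update the frame buffer.
    """
    lo, hi = span
    return sum(1 for x in range(lo, hi) if z < depth[x])


def render(depth, span, z):
    """Mock rendering: write the object's depth where it is closest so far."""
    lo, hi = span
    for x in range(lo, hi):
        if z < depth[x]:
            depth[x] = z


def cull_and_render(objects, width=16):
    """Test each object's bounding interval first; render only if visible."""
    depth = [float("inf")] * width
    rendered = []
    # Assumption: objects are processed front to back (sorted by depth z).
    for name, span, z in sorted(objects, key=lambda o: o[2]):
        if count_visible(depth, span, z) > 0:
            render(depth, span, z)       # visible: render object
            rendered.append(name)
        # otherwise: invisible, cull object
    return rendered


scene = [("wall", (0, 10), 1.0), ("hidden", (2, 6), 5.0), ("side", (8, 14), 3.0)]
print(cull_and_render(scene))
```

In this toy scene the object named `hidden` lies entirely behind `wall`, so its query reports zero visible pixels and it is never rendered.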

  6. Hardware Occlusion Culling
     • Advantages
       • Pixel-exact
       • No explicit occluder rendering
       • Exploit rasterization power of GPU
       • Easy to use (API calls)
     • Problems
       • Delay in availability of the results
       • Time to execute queries
       • If fill-bound: only useful if several objects culled

  7. Hierarchical Stop&Wait (S&W)
     Front-to-back hierarchy traversal:
     1. Issue visibility query for node
     2. Stop and wait for result
        • Invisible: cull the subtree
        • Visible: render or continue with step 1 recursively
     • Advantage:
       • Hierarchy can cull huge subtrees
     • Problems:
       • Waiting causes CPU stalls and GPU starvation
       • Huge rasterization costs (especially for large interior nodes)
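
A minimal Python sketch of the stop-and-wait traversal described on this slide. It is illustrative only: the `Node` class and its `visible` field are hypothetical stand-ins for the result a hardware query would return, and children are assumed to be stored in front-to-back order. The query counter makes the cost visible: every visited node costs one query, and in a real implementation each of those would block the CPU until the GPU answers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    visible: bool                      # mock result a hardware query would report
    children: list = field(default_factory=list)


def stop_and_wait(root):
    """Return (rendered leaf names, number of queries issued)."""
    rendered, queries = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        queries += 1                   # step 1: issue query; step 2: wait for it
        if not node.visible:
            continue                   # invisible: cull the whole subtree
        if node.children:
            # Reverse so the nearest child is popped (and queried) first.
            stack.extend(reversed(node.children))
        else:
            rendered.append(node.name) # visible leaf: render
    return rendered, queries


tree = Node("root", True, [
    Node("A", True, [Node("a1", True), Node("a2", False)]),
    Node("B", False, [Node("b1", False), Node("b2", False)]),
])
print(stop_and_wait(tree))
```

Subtree `B` is culled with a single query, which is the advantage the slide names; the five blocking queries are the problem.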

  8. CPU Stalls and GPU Starvation
     [Figure: timeline of hierarchical stop-and-wait, with the waiting time marked. CPU: R1 Q2 R2 Q3 C3 Q4 R4. GPU: R1 Q2 R2 Q3 Q4 R4. Legend: Rx = render object x, Qx = query object x, Cx = cull object x.]

  9. Solution: Coherent Hierarchical Culling
     • Scheduling based on temporal coherence
     • Skipping certain visibility tests
     • Immediate rendering of certain geometry
     • Clever interleaving of queries and rendering
     • Maintaining a queue of running occlusion queries
     • Design goal: easy implementation

  10. Coherent Hierarchical Culling (CHC)
      [Figure: CPU and GPU timelines with the annotations "visible in previous frame" and "assume independent occlusion". CPU: R1 Q2 R2 Q3 C3 Q4 R4. GPU: R1 Q2 R2 Q3 Q4 R4. Legend: Rx = render object x, Qx = query object x, Cx = cull object x.]

  11. CHC Algorithm Outline
      • Front-to-back hierarchy traversal
      1. Node handling
         • Interior node
           • Previously invisible: issue visibility query
           • Previously visible: continue with step 1 recursively
         • Leaf
           • Issue visibility query
           • Previously visible: render immediately
      2. Check availability of query results
         • Invisible: propagate visibility change
         • Visible: render or continue with step 1 recursively
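
The outline above can be made concrete with a small simulation. This is a hedged Python sketch under stated assumptions, not the authors' implementation: the GPU is replaced by a step counter (a query result becomes available `latency` traversal steps after it is issued), the query result itself is a mock that checks a given set `visible_leaves`, and the guard against rendering a leaf twice is an illustrative choice. `Node`, `chc_frame`, and `pull_up_visibility` are names introduced here.

```python
from collections import deque


class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)   # assumed stored front to back
        self.parent = None
        self.visible = False             # classification from the last visit
        self.last_visited = -1
        for child in self.children:
            child.parent = self

    def leaves(self):
        if not self.children:
            return [self.name]
        return [n for c in self.children for n in c.leaves()]


def pull_up_visibility(node):
    """Mark the node and its ancestors visible, stopping at a visible one."""
    while node is not None and not node.visible:
        node.visible = True
        node = node.parent


def chc_frame(root, frame_id, visible_leaves, latency=2):
    """Traverse one frame; return (rendered leaf names, number of queries)."""
    rendered, num_queries, step = [], 0, 0
    stack = [root]
    queue = deque()                      # (node, step at which result is ready)

    def traverse(node):
        if node.children:
            stack.extend(reversed(node.children))
        elif node.name not in rendered:  # guard: render each leaf once
            rendered.append(node.name)

    while stack or queue:
        # Step 2 of the outline: consume results that are ready; wait for
        # unfinished ones only when there is no other work left.
        while queue and (queue[0][1] <= step or not stack):
            node, _ = queue.popleft()
            if any(leaf in visible_leaves for leaf in node.leaves()):  # mock
                pull_up_visibility(node)
                traverse(node)
        # Step 1 of the outline: node handling.
        if stack:
            node = stack.pop()
            was_visible = node.visible and node.last_visited == frame_id - 1
            node.visible = False
            node.last_visited = frame_id
            if not was_visible or not node.children:
                num_queries += 1         # leaf, or previously invisible node
                queue.append((node, step + latency))
            if was_visible:
                traverse(node)           # render leaf or descend, no waiting
        step += 1
    return rendered, num_queries


root = Node("root", [Node("A", [Node("a1"), Node("a2")]),
                     Node("B", [Node("b1"), Node("b2")])])
print(chc_frame(root, 0, {"a1"}))
print(chc_frame(root, 1, {"a1"}))
```

Running two consecutive frames on the same tree shows the effect of temporal coherence: in the second frame the previously visible interior nodes are not queried at all, and the previously visible leaf is rendered before its query result arrives.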

  12. Why Interleaving Works…
      • Processing a node only depends on…
        1. Front-to-back order
        2. Results of queries for processed nodes where:

  13. CHC: Hierarchy Traversal
      [Figure: numbered hierarchy nodes (1 to 13) and the corresponding scene regions, traversed in front-to-back order, with a legend distinguishing previously visible from previously invisible nodes. Annotations: "no queries for previously visible interior nodes", "assume no query dependencies", "hidden regions: queries depend on parents".]

  14. CHC Features
      • Reduction of CPU stalls and GPU starvation
        • Interleaving queries with rendering previously visible geometry
      • Reduction of the number of queries
        • Avoids expensive redundant queries for interior nodes
      • Size of tested regions adapts to visibility
        • pull-up: occluded region growing
        • pull-down: visible region growing

  15. Implementation Issues
      • Front-to-back traversal
        • Priority queue: allows various hierarchical data structures
      • Checking query results
        • glGetOcclusionQueryivNV with GL_PIXEL_COUNT_AVAILABLE_NV
        • Very cheap operation
      • Queries for previously visible nodes
        • Use actual geometry as occludee (instead of bounding box)
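
The priority-queue traversal mentioned on this slide might look like the following Python sketch. It is an illustration under simplifying assumptions: nodes are plain dictionaries with a 1D bounding interval in `"box"`, and `interval_distance` and `front_to_back` are hypothetical helpers. The point is that the queue is ordered only by the distance from the viewpoint to each node's bounding volume, so the same loop works for any hierarchy that can enumerate a node's children.

```python
import heapq
import itertools


def interval_distance(node, viewpoint):
    """Distance from a 1D viewpoint to the node's [lo, hi] bounding interval."""
    lo, hi = node["box"]
    if viewpoint < lo:
        return lo - viewpoint
    if viewpoint > hi:
        return viewpoint - hi
    return 0.0


def front_to_back(root, viewpoint):
    """Return node names ordered by increasing distance from the viewpoint."""
    tie = itertools.count()   # tie-breaker so the heap never compares dicts
    heap = [(interval_distance(root, viewpoint), next(tie), root)]
    order = []
    while heap:
        _, _, node = heapq.heappop(heap)
        order.append(node["name"])
        for child in node.get("children", []):
            heapq.heappush(
                heap, (interval_distance(child, viewpoint), next(tie), child))
    return order


tree = {"name": "root", "box": (0, 10), "children": [
    {"name": "L", "box": (0, 5), "children": [
        {"name": "l1", "box": (0, 2)},
        {"name": "l2", "box": (3, 5)}]},
    {"name": "R", "box": (6, 10)}]}
print(front_to_back(tree, 12))
```

With the viewpoint to the right of the scene, the right child `R` is visited before anything in the left subtree, regardless of the order in which the children are stored.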

  16. Further Optimizations
      • Conservative visibility testing
        • Assume visible node remains visible n frames
        + Saves additional occlusion queries
      • Approximate visibility
        • #visible pixels < threshold → node invisible
        + Saves rendered geometry
        - Produces image errors
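
Both optimizations reduce to small decision rules, sketched below in Python. The function names and the exact frame counting are assumptions: here a node found visible is trusted for the next n frames and queried again in the frame after that, which is one plausible reading of "remains visible n frames".

```python
def needs_query(was_visible, frames_since_query, assumed_visible_frames):
    """Conservative visibility testing.

    A node that was invisible is always queried. A node that was visible is
    trusted for `assumed_visible_frames` frames and queried again afterwards.
    """
    if not was_visible:
        return True
    return frames_since_query > assumed_visible_frames


def is_visible(visible_pixels, threshold=1):
    """Approximate visibility: fewer pixels than the threshold means invisible.

    threshold=1 corresponds to exact testing (any visible pixel counts).
    """
    return visible_pixels >= threshold


# With n = 2, a node found visible in frame 0 is queried again in frame 3.
print([needs_query(True, k, 2) for k in (1, 2, 3)])
# With a 25-pixel threshold, a node covering 24 pixels is treated as invisible.
print(is_visible(24, threshold=25), is_visible(25, threshold=25))
```

The first rule only skips queries, so it cannot hide visible geometry; the second can, which is the source of the image errors the slide mentions.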

  17. Results – Test Scenes
      • Teapots: 11.5M triangles, 21k kD-tree nodes
      • City: 1M triangles, 33k kD-tree nodes
      • Power plant: 12.7M triangles, 18.7k kD-tree nodes

  18. Results – Speedup
      • Ideal: zero overhead – render only visible geometry

  19. Results – Summary
      • Comparison to hierarchical S&W
        • #queries reduced by a factor of almost 2
        • Times for stalls reduced by 20–60x (to 0.18–1.31 ms)
      • Close to ideal algorithm!
        • Only 2–9 ms slower
        • Overhead due to query time

  20. Results – Teapot

  21. Results – City

  22. Results – Powerplant

  23. Optimization Results
      • Conservative culling, 2 frames assumed visible
        • Good for deep hierarchies with simple leaf geometry
        • Further speedup up to 21%
      • Approximate culling, 25 pixels threshold
        • Good for scenes with complex visible geometry
        • Further speedup up to 33%

  24. Conclusion
      • Efficient scheduling of hardware occlusion queries
      • Greatly reduces CPU stalls and GPU starvation
      • Reduces number of required queries
      • Simple to implement
      • Arbitrary hierarchical data structure
      • Speedup ~4x over view frustum culling (VFC)
      • Close to ideal solution for tested scenes
      • Watch out for GPU Gems II

  25. Thanks for Your Attention

  26. CHC: Example
      [Figure: animated walk-through of CHC on a numbered hierarchy (nodes 1 to 11), showing the query queue and the GPU command sequence "Q6/R6 Q10 R4 Q5 Q6/R6 Q7 Q8 R7 Q10/R10 Q11". Step annotations:]
      • previously visible: continue with step 1 recursively
      • previously visible: render
      • previously visible: issue query + render
      • previously invisible: query
      • query result available: continue with step 1 recursively
      • query result available: render
      • query result available: mark visible
      • query result available: cull
      • issued queries
      • pull-up invisibility
      • final classification
