
Granular Visibility Queries on the GPU

Thomas Engelhardt & Carsten Dachsbacher, Visualization Research Center, University of Stuttgart

Presentation Transcript


  1. Granular Visibility Queries on the GPU Thomas Engelhardt & Carsten Dachsbacher, Visualization Research Center, University of Stuttgart

  2. Motivation: Culling

  3. Motivation: Culling • Remove rendering workload from the pipeline • Prevent draw calls from execution • Frustum Culling • Hardware Occlusion Queries (HOQ) • Occlusion Predicates • Prevent shaders from execution • Backface Culling • Early-Z

  4. Motivation: Culling • Control of shader execution based on visibility • Geometry Shader • Pixel Shader when early-z is disabled • Visibility not only per object / draw call but per • Primitive / primitive cluster • Screen space region • Evaluate and use visibility on GPU, no application feedback

  5. Image Space Visibility • How to determine image space visibility? • Take some objects • Rasterize • Count pixels that passed the depth test • But how to count? [Figure labels: 6, 8, 11]

  6. Contribution • Two output-sensitive pixel counting methods for from-point visibility • Pixel Counting Summed Area Tables (PiC-SAT) • Hierarchical Item Buffer (HIB) • Can also be done with HOQs. Why not use them? • Granularity limitation & synchronization • Application to • Culling of individual instances • Control of GS and PS execution for per-pixel displacement mapping

  7. Pixel Counting using Summed Area Tables

  8. Pixel Counting using SATs • SAT stores sum of pixel values • Pixel sum of any rectangular region with just 4 lookups • Screen space bounding box [Figure: pixel mask and its summed area table; the example query evaluates to S = 1 + 17 – 1 – 6 = 11]
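The four-lookup query on slide 8 can be sketched on the CPU as follows. This is a minimal illustration, not the authors' GPU implementation: the function names `build_sat` and `count_pixels`, the inclusive rectangle convention, and the small example mask are all assumptions made for this sketch.

```python
def build_sat(mask):
    """Build an inclusive summed area table from a 2D list of 0/1 pixel values."""
    h, w = len(mask), len(mask[0])
    sat = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += mask[y][x]
            # Entry = sum of this row up to x, plus everything above it.
            sat[y][x] = row_sum + (sat[y - 1][x] if y > 0 else 0)
    return sat


def count_pixels(sat, x0, y0, x1, y1):
    """Sum over the inclusive rectangle [x0..x1] x [y0..y1] with four lookups."""
    def at(x, y):
        # Lookups left of or above the table contribute zero.
        return sat[y][x] if x >= 0 and y >= 0 else 0

    return at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1)


# Hypothetical 4x4 mask: 1 marks a pixel that passed the depth test.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
sat = build_sat(mask)
visible = count_pixels(sat, 1, 1, 2, 2)  # bounding box of a 2x2 object
```

The query cost is independent of the rectangle size, which is why a screen space bounding box is enough to describe each query region.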

  9. Pixel Counting using SATs • Crucial: Query regions must not overlap! • Otherwise it is impossible to tell which object the pixels in the overlap belong to

  10. Conflict Objects • Objects whose bounding rectangles overlap • How to resolve the conflict? • Distribute objects among color channels so that no two objects in the same channel overlap • 4 parallel SATs per RGBA texture • What is the distribution strategy?

  11. Graph Coloring Algorithms • Assign colors to vertices in a graph • Vertices connected by an edge must not share the same color • Difficult problem (NP-hard in general) • Requires heuristic approaches like Chaitin's algorithm • What if a vertex has more edges than there are colors available? [Figure labels: correct coloring, false coloring]

  12. Object Distribution by Graph Coloring • Construct a conflict graph • Each object's bounding rectangle is one vertex • Each overlap is one edge • How to color the graph? [Figure labels: Graph Construction, OVERLAP]
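A conflict graph as described on slide 12 might be built like this. The sketch assumes rectangles are given as `(x0, y0, x1, y1)` tuples with inclusive pixel bounds and uses a brute-force pairwise test; the names `overlaps` and `build_conflict_graph` and the sample rectangles are hypothetical.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two inclusive rectangles (x0, y0, x1, y1) share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_conflict_graph(rects):
    """One vertex per bounding rectangle, one edge per overlapping pair."""
    graph = {i: set() for i in range(len(rects))}
    for i, j in combinations(range(len(rects)), 2):
        if overlaps(rects[i], rects[j]):
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical screen space bounding rectangles of four objects.
rects = [(0, 0, 4, 4), (3, 3, 8, 8), (10, 0, 12, 2), (7, 7, 11, 11)]
graph = build_conflict_graph(rects)
```

Here object 1 conflicts with objects 0 and 3, while object 2 has no conflicts and can share a color channel with any other object.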

  13. Chaitin's Algorithm • Heuristic algorithm designed for register allocation • Input • Conflict graph • Set of colors • Output • Color-coded graph • Some vertices may remain uncolored • Complexity: O(N²) [Figure labels: color 1, color 2]

  14. Chaitin's Algorithm: Deconstruction • Find any vertex with the least number of incident edges • Remove the vertex and put it onto a stack • Repeat until the graph is deconstructed [Figure: example graph with vertices 1 to 5 and the vertex stack]

  15. Chaitin's Algorithm: Reconstruction • Reinsert the top vertex on the stack into the graph • Find a color not used by any reconstructed neighbor • Repeat until the entire graph is reconstructed • What to do about uncolored objects? [Figure: example graph with vertices 1 to 5 and the vertex stack; labels: color 1, color 2, no color available]
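The deconstruction and reconstruction phases from slides 14 and 15 can be sketched as below. This follows the slide description only (remove a vertex of least degree, push it, then pop and color); it is not a full register-allocation variant of Chaitin's algorithm, and the function name `chaitin_color`, the tie-breaking by vertex index, and the use of `None` for uncolored vertices are assumptions of this sketch.

```python
def chaitin_color(graph, colors):
    """Color a conflict graph given as {vertex: set(neighbors)}.

    Returns {vertex: color}, with None for vertices that stay uncolored.
    """
    # Deconstruction: repeatedly remove a vertex with the fewest incident edges.
    remaining = {v: set(n) for v, n in graph.items()}
    stack = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        for n in remaining[v]:
            remaining[n].discard(v)
        del remaining[v]
        stack.append(v)

    # Reconstruction: reinsert vertices in reverse order and pick a free color.
    coloring = {}
    while stack:
        v = stack.pop()
        used = {coloring[n] for n in graph[v] if n in coloring}
        free = [c for c in colors if c not in used]
        coloring[v] = free[0] if free else None  # None = needs extra treatment
    return coloring


# Hypothetical example: three mutually overlapping objects, two channels only.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
two_channels = chaitin_color(triangle, ["R", "G"])
four_channels = chaitin_color(triangle, ["R", "G", "B", "A"])
```

With only two channels one of the three objects stays uncolored, which is the case slide 16 addresses; with the four channels of an RGBA texture every object receives its own channel.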

  16. About Uncolored Objects • Uncolored objects need additional treatment • Split bounding rectangle of uncolored object • Attempt to color sub rectangles • Assign any color if no unique color can be found • Visibility overestimation • Attempt to merge sub regions

  17. The Pixel Counting SAT Pipeline [Diagram labels: Objects; CPU (Application); GPU; Construct Conflict Graph; Graph Coloring; Render to texture and compute SAT [Hensley05]; Count Pixels by SAT Look Up; Treat Uncolored Objects; Calculate Look Up Coordinates; color information; Shader Constants; lookup information] [Hensley05: Fast Summed Area Table Generation and its Applications]

  18. Pixel Counting using the Hierarchical Item Buffer

  19. The Hierarchical Item Buffer (HIB) • Exploits histogram computation algorithm • GPU implementation demonstrated by Scheuermann [Scheuermann07: Efficient histogram generation using scattering on GPUs] • Render unique IDs to texture [Figure: objects rendered with unique ID values]

  20. GPU Item Buffer • Reinterpret ID texture as point list • Vertex or Geometry Shader for scattering • Blending operations for counting • VS/GS: maps to histogram bin • Rasterizer: renders point primitive • Blending: increments bin [Figure: ID point list scattered into the histogram / item buffer]
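The scatter-and-increment idea on slide 20 corresponds to a plain histogram over the ID texture. The following CPU sketch only mirrors the logic; on the GPU the inner loop body would be a point primitive routed to a bin by the VS/GS and accumulated by additive blending. The name `item_buffer`, the use of `-1` for background pixels, and the tiny texture are assumptions for illustration.

```python
def item_buffer(id_texture, num_ids):
    """Count how many pixels carry each ID (one histogram bin per ID)."""
    bins = [0] * num_ids
    for row in id_texture:
        for pixel_id in row:
            # Assumption: IDs outside [0, num_ids) mark background pixels.
            if 0 <= pixel_id < num_ids:
                bins[pixel_id] += 1  # stands in for additive blending
    return bins


# Hypothetical 3x4 ID texture; -1 is background.
id_texture = [
    [-1, 0, 0, -1],
    [2, 2, 0, -1],
    [2, 2, -1, 3],
]
counts = item_buffer(id_texture, 4)
```

An ID whose bin stays at zero, such as ID 1 here, belongs to an object or primitive that produced no visible pixels.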

  21. Hierarchical Queries • Intelligently distributing IDs enables hierarchical queries by mipmapping [Figure: item buffer and its mip map levels]
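One way to picture the hierarchy on slide 21 is a pyramid in which each coarser level sums adjacent bins. The sketch below assumes a 1D item buffer whose length is a power of two and an ID layout in which each object owns a contiguous, aligned block of sub-IDs; both are assumptions, as is the name `build_pyramid`. Hardware mip map generation typically averages texels rather than summing them, so on the GPU a count would be recovered by scaling the averaged value by the number of texels it covers.

```python
def build_pyramid(bins):
    """Return all levels of a sum pyramid; level 0 is the item buffer itself."""
    assert len(bins) > 0 and len(bins) & (len(bins) - 1) == 0, "power of two"
    levels = [list(bins)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each coarser bin covers two adjacent bins of the finer level.
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


# Hypothetical layout: object k owns sub-IDs 4k .. 4k+3 (e.g. four clusters).
bins = [1, 3, 0, 2, 0, 0, 4, 1]
levels = build_pyramid(bins)
per_object = levels[2]  # one entry per object: total visible pixels
```

A single lookup at a coarse level answers the per-object query, while finer levels still give per-cluster counts.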

  22. Applications and Results

  23. Culling of Instances • Shadow volumes with instanced rendering • Volumes entirely contained in others have no effect [Lloyd04: CC Shadow Volumes] • Test caster visibility from light • HOQ / Occlusion Predicates cannot be applied directly • Granularity: a single draw call, not individual instances [Figure labels: instances of the same object; shadow volumes; cull volume with no contribution]

  24. Culling of Instances • Granularity: Per Instance (Sub-ID: Instance ID) • 500 shadow casters (606 triangles each) • ID texture / SAT resolution: 512x512 pixels

  25. Culling of Individual Primitives • Displacement Mapping • Setup costs in GS (mesh extrusion, tetrahedra, texture gradients) • Ray-Casting in PS • Cannot exploit early-z due to depth write in PS • Don't output triangles if extruded prism is not visible • Exact visibility requires ray-casting in HIB/SAT pass • Conservative visibility estimation by mesh extrusion

  26. Culling of Individual Primitives • Granularity: Per prism • Lizard: 7132 triangles • ID texture / SAT resolution: 512x512 • NVIDIA: GTX280 • ATI: HD3780

  27. Discussion
      Pixel Counting SAT • Not enough colors if there are many objects • Visibility overestimation • Difficult implementation • Not entirely transparent to application (overlap, coloring, …) • Performance • Dominated by treatment of uncolored objects (rectangle split/merge, texture access) • Can handle arbitrary screen regions for query
      Hierarchical Item Buffer • No penalty for many objects • Easy implementation • Transparent to application, GPU handles everything • Performance • Dominated by overdraw in item buffer caused by choice of IDs • Usage of many IDs better exploits parallelism; the mip map does the rest (memory consumption) • Query regions defined by ID assignment

  28. Questions?
