Granular Visibility Queries on the GPU
Thomas Engelhardt & Carsten Dachsbacher
Visualization Research Center, University of Stuttgart
Motivation: Culling
• Remove rendering workload from the pipeline
• Prevent draw calls from being executed
  • Frustum culling
  • Hardware occlusion queries (HOQ)
  • Occlusion predicates
• Prevent shaders from being executed
  • Backface culling
  • Early-Z
Motivation: Culling
• Control shader execution based on visibility
  • Geometry shader
  • Pixel shader when early-Z is disabled
• Visibility not only per object / draw call, but per
  • Primitive / primitive cluster
  • Screen-space region
• Evaluate and use visibility on the GPU, without application feedback
Image Space Visibility
• How to determine image-space visibility?
  • Take some objects
  • Rasterize them
  • Count the pixels that passed the depth test
• But how to count?
[Figure: rasterized example objects labeled 6, 8, and 11]
Contribution
• Two output-sensitive pixel counting methods for from-point visibility
  • Pixel Counting Summed Area Tables (PiC-SAT)
  • Hierarchical Item Buffer (HIB)
• This can also be done with HOQs. Why not use them?
  • Granularity limitation and synchronization
• Application to
  • Culling of individual instances
  • Control of GS and PS execution for per-pixel displacement mapping
Pixel Counting using SATs
• A summed area table (SAT) stores sums of pixel values
• Pixel sum of any rectangular region with just 4 lookups
• Screen-space bounding box
[Figure: binary pixel grid and its SAT; for the example query rectangle, S = 1 + 17 - 1 - 6 = 11]
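The four-lookup query can be illustrated with a small CPU-side sketch. It is not the GPU implementation from the talk; the function names `build_sat` and `count_pixels`, the inclusive rectangle convention, and the example mask are assumptions made for illustration only.

```python
def build_sat(mask):
    """Build an inclusive summed area table from a 2D list of 0/1 pixel values."""
    height, width = len(mask), len(mask[0])
    sat = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += mask[y][x]
            # Sum of the current row so far plus everything above it.
            sat[y][x] = row_sum + (sat[y - 1][x] if y > 0 else 0)
    return sat


def count_pixels(sat, x0, y0, x1, y1):
    """Sum of the inclusive rectangle [x0..x1] x [y0..y1] using four lookups."""
    def at(x, y):
        # Lookups left of or above the table contribute zero.
        return sat[y][x] if x >= 0 and y >= 0 else 0

    return at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1)


# Hypothetical visibility mask: 1 marks a pixel that passed the depth test.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
sat = build_sat(mask)
visible = count_pixels(sat, 1, 1, 3, 2)  # query an object's bounding box
```

The cost of a query is independent of the size of the rectangle, which is what makes the bounding box a convenient query region.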
Pixel Counting using SATs
• Crucial: query regions must not overlap!
• Otherwise, it is impossible to tell which object the pixels in the overlap belong to
Conflict Objects
• Conflict objects
  • Objects whose bounding rectangles overlap
• How to resolve a conflict?
  • Distribute objects among color channels without overlap
  • 4 parallel SATs per RGBA texture
• What is the distribution strategy?
Graph Coloring Algorithms
• Graph coloring algorithms
  • Assign colors to the vertices of a graph
  • Vertices connected by an edge must not share the same color
• Difficult problem
  • Requires heuristic approaches like Chaitin's algorithm
• What if there are more edges than colors available?
[Figure: a correct coloring and a false coloring]
Object Distribution by Graph Coloring
• Construct a conflict graph
  • Each object's bounding rectangle becomes one vertex
  • Each overlap becomes one edge
• How to color the graph?
[Figure: graph construction from overlapping bounding rectangles]
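A minimal sketch of the graph construction, assuming axis-aligned bounding rectangles given as `(x0, y0, x1, y1)` tuples with inclusive pixel bounds. The names `rects_overlap` and `build_conflict_graph`, the adjacency-set representation, and the example rectangles are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations


def rects_overlap(a, b):
    """True if two inclusive rectangles (x0, y0, x1, y1) share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_conflict_graph(rects):
    """One vertex per rectangle index, one undirected edge per overlapping pair."""
    graph = {i: set() for i in range(len(rects))}
    for i, j in combinations(range(len(rects)), 2):
        if rects_overlap(rects[i], rects[j]):
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical screen-space bounding rectangles of four objects.
rects = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 0, 30, 10), (12, 12, 25, 25)]
graph = build_conflict_graph(rects)
```

The pairwise test is quadratic in the number of objects, which matches the O(N²) complexity the slides quote for the coloring step; a spatial acceleration structure could reduce this but is not part of the sketch.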
Chaitin's Algorithm
• Heuristic algorithm designed for register allocation
• Input
  • Conflict graph
  • Set of colors
• Output
  • Color-coded graph
  • Some vertices may remain uncolored
• Complexity: O(N²)
[Figure: example graph colored with color 1 and color 2]
Chaitin's Algorithm: Deconstruction
• Find any vertex with the least number of incident edges
• Remove the vertex and push it onto a stack
• Repeat until the graph is deconstructed
[Figure: example graph with vertices 1 to 5 and the resulting stack]
Chaitin's Algorithm: Reconstruction
• Reinsert the top vertex on the stack into the graph
• Find a color not used by any reconstructed neighbor
• Repeat until the entire graph is reconstructed
• What to do about uncolored objects?
[Figure: vertices 1 to 5 reinserted from the stack and assigned color 1 or color 2; for one vertex no color is available]
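The deconstruction and reconstruction steps can be sketched as follows. This is a simplified reading of the two slides, not a full register allocator: the function name `chaitin_color`, the adjacency-set graph format, the tie-breaking by vertex id, and the use of `None` for uncolored vertices are all assumptions made for illustration.

```python
def chaitin_color(graph, num_colors):
    """Color a conflict graph; vertices that cannot be colored map to None.

    graph: dict mapping each vertex to the set of its neighbors (undirected).
    """
    # Deconstruction: repeatedly remove a vertex with the fewest remaining edges.
    remaining = {v: set(neighbors) for v, neighbors in graph.items()}
    stack = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        for n in remaining[v]:
            remaining[n].discard(v)
        del remaining[v]
        stack.append(v)

    # Reconstruction: pop vertices and pick a color unused by colored neighbors.
    colors = {}
    while stack:
        v = stack.pop()
        used = {colors[n] for n in graph[v] if colors.get(n) is not None}
        free = [c for c in range(num_colors) if c not in used]
        colors[v] = free[0] if free else None
    return colors


# Three mutually overlapping objects cannot be separated with two channels.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
two_channels = chaitin_color(triangle, 2)   # one vertex stays uncolored
four_channels = chaitin_color(triangle, 4)  # RGBA: every vertex gets a channel
```

With four colors, each color index would correspond to one channel of the RGBA texture; vertices mapped to `None` are the uncolored objects that need the additional treatment discussed next.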
About Uncolored Objects
• Uncolored objects need additional treatment
  • Split the bounding rectangle of the uncolored object
  • Attempt to color the sub-rectangles
  • Assign any color if no unique color can be found
    • Visibility overestimation
  • Attempt to merge sub-regions
The Pixel Counting SAT Pipeline
• Input: objects
• Pipeline diagram, split between CPU (application) and GPU, with the stages
  • Construct conflict graph
  • Graph coloring
  • Render to texture and compute SAT [Hensley05]
  • Count pixels by SAT lookup
  • Treat uncolored objects
  • Calculate lookup coordinates
• Color information and lookup information: shader constants
[Hensley05: Fast Summed Area Table Generation and its Applications]
The Hierarchical Item Buffer (HIB)
• Exploits a histogram computation algorithm
• GPU implementation demonstrated by Scheuermann
• Render unique IDs to a texture
[Scheuermann07: Efficient histogram generation using scattering on GPUs]
[Figure: texture filled with unique IDs]
GPU Item Buffer
• Reinterpret the ID texture as a point list
• Vertex or geometry shader for scattering
• Blending operations for counting
• Stages
  • VS/GS: maps each ID to its histogram bin
  • Rasterizer: renders a point primitive
  • Blending: increments the bin
[Figure: ID point list scattered into the histogram / item buffer]
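As a CPU reference for what the scatter-and-blend pass computes, the loop below treats every texel of the ID texture as one point and increments the bin it maps to. The name `item_buffer_histogram`, the `-1` background marker, and the identity mapping from ID to bin index are assumptions for this sketch; on the GPU the increment would be performed by additive blending rather than a loop.

```python
BACKGROUND = -1  # assumed marker for texels not covered by any object


def item_buffer_histogram(id_texture, num_bins):
    """Count how many texels carry each ID (one histogram bin per ID)."""
    bins = [0] * num_bins
    for row in id_texture:
        for item_id in row:
            if item_id == BACKGROUND:
                continue  # nothing visible at this pixel
            # Scatter step: the ID selects the bin. Blend step: add one to it.
            bins[item_id] += 1
    return bins


# Hypothetical 3x3 ID texture with three objects (IDs 0, 1, 2).
id_texture = [
    [0, 0, 1],
    [-1, 1, 1],
    [2, -1, 1],
]
bins = item_buffer_histogram(id_texture, 3)
```

A bin that stays at zero means no pixel of that item survived the depth test, which is the visibility answer the later culling applications consume.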
Hierarchical Queries
• Intelligently distributing IDs enables hierarchical queries by mip-mapping
[Figure: item buffer and its mip map levels, with bin counts combined level by level]
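One way to picture the hierarchy is a sum pyramid over a square item buffer: if the sub-IDs of an object are laid out so that they fill an aligned 2x2 (or 4x4, ...) block of bins, a single lookup in a coarser level returns the count for the whole group. The sketch below assumes that layout and a power-of-two buffer size; the names `reduce_level` and `build_pyramid` are illustrative. It sums the four child bins, whereas hardware mip-map generation typically averages them, so a GPU version would need to rescale accordingly.

```python
def reduce_level(level):
    """Sum each 2x2 block of a square 2D list with even side length."""
    n = len(level) // 2
    return [
        [
            level[2 * y][2 * x] + level[2 * y][2 * x + 1]
            + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]
            for x in range(n)
        ]
        for y in range(n)
    ]


def build_pyramid(bins2d):
    """Return all levels, from the full-resolution item buffer down to 1x1."""
    levels = [bins2d]
    while len(levels[-1]) > 1:
        levels.append(reduce_level(levels[-1]))
    return levels


# Hypothetical 4x4 item buffer; each aligned 2x2 block holds one object's sub-IDs.
bins2d = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 1, 0, 3],
]
levels = build_pyramid(bins2d)
per_object = levels[1]  # one count per 2x2 group of sub-IDs
total = levels[2][0][0]  # count over all IDs
```

Querying level 0 gives per-sub-ID counts, level 1 gives per-object counts, and the top level gives the overall count, so the granularity of a query is chosen by picking the mip level.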
Culling of Instances
• Shadow volumes with instanced rendering
• Volumes entirely contained in others have no effect [Lloyd04: CC Shadow Volumes]
• Test caster visibility from the light
• HOQs / occlusion predicates cannot be applied directly
  • Granularity: a single draw call, not individual instances
[Figure: shadow volumes of instances of the same object; a volume with no contribution is culled]
Culling of Instances
• Granularity: per instance (sub-ID: instance ID)
• 500 shadow casters (606 triangles each)
• ID texture / SAT resolution: 512x512 pixels
Culling of Individual Primitives
• Displacement mapping
  • Setup costs in the GS (mesh extrusion, tetrahedra, texture gradients)
  • Ray casting in the PS
  • Cannot exploit early-Z due to depth writes in the PS
• Do not output triangles if the extruded prism is not visible
  • Exact visibility requires ray casting in the HIB/SAT pass
  • Conservative visibility estimation by mesh extrusion
Culling of Individual Primitives
• Granularity: per prism
• Lizard: 7132 triangles
• ID texture / SAT resolution: 512x512
• GPUs: NVIDIA GTX280, ATI HD3780
Discussion
Pixel Counting SAT
• Not enough colors if there are many objects
  • Visibility overestimation
• Difficult implementation
  • Not entirely transparent to the application (overlap, coloring, …)
• Performance
  • Dominated by the treatment of uncolored objects (rectangle split/merge, texture access)
• Can handle arbitrary screen regions for queries
Hierarchical Item Buffer
• No penalty for many objects
• Easy implementation
  • Transparent to the application; the GPU handles everything
• Performance
  • Dominated by overdraw in the item buffer caused by the choice of IDs
  • Using many IDs better exploits parallelism; the mip map does the rest (memory consumption)
• Query regions are defined by the ID assignment