N-Buffers for efficient depth map query • Xavier Décoret • Artis, GRAVIR/IMAG, INRIA
Context • Real-time rendering • Visibility culling • quickly reject what’s not visible, i.e. what won’t affect any pixel in the final image • Many methods available [COCSD02,PT02]
Occlusion maps • Select potential occluders [LG95,KCCO00] • project and rasterize them • store distance to closest one at each pixel • Z buffer / occlusion map / depth map • Traverse potential occludees • project and rasterize them • test visibility of each fragment • depth comparison against depth map - use bounding volumes - do it hierarchically
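For reference, the per-fragment test above can be sketched on the CPU in Python. This is a minimal sketch under our own assumptions: the depth map is a list of rows indexed `depth[y][x]`, larger values are farther, the occludee is rasterized as its screen-space rectangle `[x0, x1) × [y0, y1)` at its nearest depth `zmin`, and the depth test is a strict less-than. Both function names are hypothetical. The cost is one comparison per covered pixel, which is exactly what the optimizations that follow try to reduce.

```python
def count_passing_pixels(depth, x0, y0, x1, y1, zmin):
    """Count pixels of the rectangle [x0, x1) x [y0, y1) where a fragment at
    depth zmin would pass a strict less-than depth test against the map."""
    count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if zmin < depth[y][x]:  # fragment is in front of the stored occluder depth
                count += 1
    return count


def is_occluded_per_pixel(depth, x0, y0, x1, y1, zmin):
    """Conservative test: the occludee is hidden if no pixel of its bounding
    rectangle, taken at its nearest depth, passes the depth test."""
    return count_passing_pixels(depth, x0, y0, x1, y1, zmin) == 0


if __name__ == "__main__":
    # 3x3 map: occluder at depth 0.5 in the two left columns, far plane (1.0) elsewhere.
    depth = [[0.5, 0.5, 1.0],
             [0.5, 0.5, 1.0],
             [0.5, 0.5, 1.0]]
    print(is_occluded_per_pixel(depth, 0, 0, 2, 3, 0.7))  # True: behind the occluder
    print(is_occluded_per_pixel(depth, 0, 0, 3, 3, 0.7))  # False: third column is open
```

The count returned by `count_passing_pixels` plays the same role as the number of pixels passing the z test that an occlusion query reports.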
Optimizations • Reduce number of pixels tested • Hierarchical Z Buffer [ZMHH97] • Lazy Occlusion Grid [HTP01] • Summed Area Tables [HW99] • Use hardware Z buffer • implemented for hidden surface removal • with optimizations [Mor00, AMN03] • exposed through Occlusion Queries
Occlusion queries • # of pixels that would pass the z test if some geometry were rendered into the current framebuffer • Hardware-assisted culling [HSLM02,BWPP04] • Other applications [TPK01] • culling & clamping of shadow volumes [LWGM04] • LOD selection [ASVNB00]
Motivation for N-Buffers • Query depth map within the GPU • Advantages • reduce communication with the CPU • allow discarding/optimizing geometry on the GPU • Constraints • limited # of operations • complex data structures unavailable • no pointers and lists • “complex” algorithms prohibited • branching and indirections costly
Task at hand • For a given object, find the maximum depth covered by its projection • Depth map accessed as a texture • Lookups give information at one pixel • We need information over a region • Use texture to encode depth over a region • proximity grids
The datastructure • Sequence of depth maps (levels) • At level i a texel stores the maximum depth in a neighborhood whose size depends on i • various neighborhood shapes/sizes possible • we choose squares • with lower left corner on the texel • of size 2^i × 2^i
Figure: depth map (level 0), then levels 1, 2 and 3; at each level, the marked texel stores the maximum depth within the marked region
The datastructure • Like an image pyramid but... • all levels have the same resolution • level 0 (depth map) can have any dimensions • not limited to powers of 2 • # of levels is log2 of the largest dimension • but we might build only the first levels
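The definition can be written down directly, before any construction trick. This Python sketch assumes the depth map is a list of rows `depth[y][x]`, that "lower left corner" means the smallest `x` and `y`, and that texels falling past the border are simply ignored (our assumption; for a maximum this is equivalent to clamp-to-edge sampling). It illustrates two properties from the slide: every level keeps the resolution of level 0, and nothing requires power-of-two dimensions. The function name is hypothetical.

```python
def nbuffer_level_by_definition(depth, i):
    """Level i of an N-Buffer computed straight from its definition: texel
    (x, y) holds the maximum depth over the 2^i x 2^i square whose lower-left
    corner is (x, y), truncated at the border of the map."""
    height, width = len(depth), len(depth[0])
    size = 1 << i  # side of the square neighborhood at level i
    return [
        [
            max(depth[yy][xx]
                for yy in range(y, min(y + size, height))
                for xx in range(x, min(x + size, width)))
            for x in range(width)
        ]
        for y in range(height)
    ]


if __name__ == "__main__":
    depth = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]  # 3x3: not a power of two
    for i in range(3):
        print(i, nbuffer_level_by_definition(depth, i))
```

On this 3x3 map, level 1 is `[[5, 6, 6], [8, 9, 9], [8, 9, 9]]` and every texel of level 2 holds 9. Computed this way, a texel at level i costs up to 4^i comparisons, which is why the construction derives level i+1 from level i instead.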
Construction • Level i+1 obtained from level i
Figure: levels 0, 1 and 2
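One way to write the step from level i to level i+1, consistent with the square neighborhoods defined above (our reading of the slide, not a formula quoted from it): a 2^(i+1) × 2^(i+1) square anchored at (x, y) is the union of four 2^i × 2^i squares anchored at offsets 0 and 2^i along each axis, so four lookups into level i suffice. Here d denotes the depth map. Lookups past the border can be clamped to the edge without changing the maximum, because the clamped texel already lies inside the region being reduced.

```latex
\begin{aligned}
L_0(x, y)     &= d(x, y) \\
L_{i+1}(x, y) &= \max\bigl\{ L_i(x, y),\; L_i(x + 2^i, y),\; L_i(x, y + 2^i),\; L_i(x + 2^i, y + 2^i) \bigr\}
\end{aligned}
```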
Construction • Can be done on the GPU • render scene offscreen (standard z-buffer) • copy depth to texture L[0] • for i = 1 to n • set up fragment program • render a quad • covering the viewport • with unit texcoords • with the fragment program • copy depth to texture L[i]
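A CPU-side sketch of the same loop in Python, where each iteration stands in for one "render a quad with a fragment program" pass: every texel of the new level performs four lookups into the previous level only, so the cost per level depends on resolution and not on the scene. The list-of-rows layout, clamp-to-edge border handling, the stopping rule (stop at the first level whose square covers the largest dimension) and the optional `n_levels` argument (to build only the first levels) are assumptions for illustration; `build_nbuffers` is a hypothetical name.

```python
def build_nbuffers(depth, n_levels=None):
    """Return [L0, L1, ...] where L0 is a copy of the depth map and each further
    level is computed from the previous one with 4 clamped lookups per texel."""
    height, width = len(depth), len(depth[0])
    if n_levels is None:
        # Smallest k with 2^k >= max(width, height); levels 0..k are built.
        n_levels = (max(width, height) - 1).bit_length() + 1
    levels = [[row[:] for row in depth]]
    for i in range(n_levels - 1):
        prev, offset = levels[-1], 1 << i
        # One "full-screen pass": texel (x, y) reads level i at offsets 0 and 2^i
        # along each axis, clamped to the last row/column (clamp-to-edge).
        levels.append([
            [
                max(prev[y][x],
                    prev[y][min(x + offset, width - 1)],
                    prev[min(y + offset, height - 1)][x],
                    prev[min(y + offset, height - 1)][min(x + offset, width - 1)])
                for x in range(width)
            ]
            for y in range(height)
        ])
    return levels


if __name__ == "__main__":
    levels = build_nbuffers([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(len(levels))  # 3 levels: 0, 1, 2 (2^2 = 4 >= 3)
    print(levels[1])    # [[5, 6, 6], [8, 9, 9], [8, 9, 9]]
```

Unlike a matrix reduction, no pass halves the resolution: every level stays the size of the depth map, so any texel of any level can be looked up directly.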
Construction • Similar to matrix reduction... • Buck and Purcell, GPU Gems, p 626 • ...but we keep full resolution • gives us locality
Construction • Complexity • first step depends on scene complexity • other steps depend only on resolution • Computation cost • ~10 ms for 640x480 • no readback • GeForce FX 6800
Query • Naive approach • project occludee • get screen space bbox • extents + zmin • get bounding neighborhood • do one lookup • in matching level • at lower left corner • compare zmin and zmax
Figure: top view of the viewport with levels 0 to 5; a 2^5 × 2^5 bounding neighborhood and its zmax
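The naive query can be sketched in Python as below; the block includes a compact N-Buffer construction so it runs on its own. Assumptions of ours: the occludee's screen-space bbox `[x0, x1) × [y0, y1)` is already clipped to the viewport and non-empty, `zmin` is its nearest depth, larger depth means farther, and the occludee is reported hidden only when `zmin` lies strictly behind `zmax`. The matching level is taken as the smallest i with 2^i ≥ the larger bbox extent. All function names are hypothetical.

```python
def build_nbuffers(depth):
    """N-Buffer levels 0..k with 2^k >= max(width, height) (clamp-to-edge)."""
    h, w = len(depth), len(depth[0])
    levels = [[row[:] for row in depth]]
    for i in range((max(w, h) - 1).bit_length()):
        p, o = levels[-1], 1 << i
        levels.append([[max(p[y][x], p[y][min(x + o, w - 1)],
                            p[min(y + o, h - 1)][x],
                            p[min(y + o, h - 1)][min(x + o, w - 1)])
                        for x in range(w)] for y in range(h)])
    return levels


def naive_zmax(levels, x0, y0, x1, y1):
    """zmax over the bounding neighborhood of the bbox: one lookup, in the
    smallest level whose 2^i x 2^i square covers the bbox, at its lower-left corner."""
    extent = max(x1 - x0, y1 - y0)
    i = (extent - 1).bit_length()  # smallest i with 2^i >= extent
    return levels[i][y0][x0]


def is_occluded_naive(levels, x0, y0, x1, y1, zmin):
    # Hidden if even the nearest point of the occludee is behind zmax.
    return zmin > naive_zmax(levels, x0, y0, x1, y1)


if __name__ == "__main__":
    levels = build_nbuffers([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(naive_zmax(levels, 0, 0, 2, 2))  # 5: level 1, texel (0, 0)
    print(naive_zmax(levels, 0, 0, 3, 1))  # 9: level 2 square overshoots the 3x1 bbox
```

The second call shows the over-conservativeness of this approach: the true maximum over that 3x1 bbox is 3, but its 4x4 bounding neighborhood also covers the texel holding 9.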
Query • Naive approach • Overly conservative, through successive approximations • (bounding volume of occludee) • screen-space bbox • bounding neighborhood (2^5 × 2^5) • Need a tighter coverage
Figure: top view of the viewport with levels 0 to 5
4 tiles coverage • depthmax in region ≥ depthmax in sub-region • cover the screen-space bbox with four 2^4 × 2^4 tiles instead of the 2^5 × 2^5 bounding neighborhood • z = max(z1, z2, z3, z4) ≤ zmax
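One possible 4-tiles query, sketched in Python. With 2^i set to half the side of the bounding neighborhood, four 2^i × 2^i tiles anchored at the bbox corners (pulled back to the bbox's lower-left along an axis where a tile is longer than the bbox) cover the whole bbox and all lie inside the bounding neighborhood. Their combined maximum is therefore never below the true bbox maximum and never above the single-lookup zmax. This is only one of the possible coverings, chosen by us for illustration. Assumed conventions: depth map as a list of rows `depth[y][x]`, larger is farther, bbox `[x0, x1) × [y0, y1)` clipped to the viewport and non-empty; all names are hypothetical, and the construction is repeated so the block runs alone.

```python
def build_nbuffers(depth):
    """N-Buffer levels 0..k with 2^k >= max(width, height) (clamp-to-edge)."""
    h, w = len(depth), len(depth[0])
    levels = [[row[:] for row in depth]]
    for i in range((max(w, h) - 1).bit_length()):
        p, o = levels[-1], 1 << i
        levels.append([[max(p[y][x], p[y][min(x + o, w - 1)],
                            p[min(y + o, h - 1)][x],
                            p[min(y + o, h - 1)][min(x + o, w - 1)])
                        for x in range(w)] for y in range(h)])
    return levels


def naive_zmax(levels, x0, y0, x1, y1):
    """Single lookup over the bounding 2^i x 2^i neighborhood (for comparison)."""
    i = (max(x1 - x0, y1 - y0) - 1).bit_length()
    return levels[i][y0][x0]


def four_tiles_zmax(levels, x0, y0, x1, y1):
    """max(z1, z2, z3, z4) over four 2^i x 2^i tiles covering the bbox."""
    extent = max(x1 - x0, y1 - y0)
    i = max(0, (extent - 1).bit_length() - 1)  # 2 * 2^i >= extent
    size = 1 << i
    xs = (x0, max(x0, x1 - size))  # left and right tile anchors
    ys = (y0, max(y0, y1 - size))  # bottom and top tile anchors
    return max(levels[i][y][x] for y in ys for x in xs)


if __name__ == "__main__":
    levels = build_nbuffers([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    # 3x1 bbox whose true maximum is 3: naive gives 9, 4 tiles give 6.
    print(naive_zmax(levels, 0, 0, 3, 1), four_tiles_zmax(levels, 0, 0, 3, 1))
```

The query still costs a fixed four lookups whatever the size of the bbox.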
4 tiles coverage • 5 ways of covering with 4 squares • Measure of the reduction in over-conservativeness
Applications Occlusion culling Particles Shadow volume clamping
Occlusion Culling • N-Buffer vs. Occlusion Queries • walkthrough in a city-like scene • occluders at frame n = objects visible at frame n-1 • Measured the number of depth tests • testing each building • using a hierarchy of bounding volumes
Occlusion Culling • Occlusion queries are faster • hardware implementation, available API • N-Buffers penalized • computation of 4 tiles coverage on CPU • use of glReadPixels to query levels • Occlusion queries can be interleaved with rendering [BWPP04]
Occlusion Culling • # of depth tests smaller with N-Buffers • 4 tests/occludee << # of pixels rasterized • N-Buffers always benefit from hierarchy • testing A cheaper than testing children(A) • not the case for OQ: the n pixels rasterized for A can exceed n1 + n2 for its children (n > n1 + n2)
Hardware implementation? • Extra memory to store levels • Dedicated component for level updates • not all levels? • lazy updates? • Faster than OQ for large objects • Fixed (4) number of operations • simple implementation • good for parallelism
Applications Occlusion culling Particles Shadow volume clamping
Particles • Particles rendered using ARB_point_sprite • no need to compute quads on the CPU • Particles animated within the GPU • up to a million particles in real-time • How to cull unseen particles? • cannot use OQ!
Particles • Using N-Buffers • for 16x16 point sprites • compute only the first 4 levels • do one texture lookup in the vertex program • Not implementable yet • vertex program lookups require LUMINANCE_FLOAT32_ATI • N-Buffers require DEPTH_COMPONENT
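What the per-particle test would compute can still be simulated on the CPU. In this Python sketch (assumptions ours): each particle is a screen-aligned square sprite of side 2^k centred at pixel (cx, cy) with a single depth z, only levels 0..k are built, and one lookup at the sprite's lower-left corner decides whether the particle may be visible. For the 16x16 sprites on the slide k would be 4; the demo uses k = 2 on a small map. Corners falling off the map are clamped onto it, so the test never culls a particle that has a visible pixel. Names are hypothetical.

```python
def build_first_levels(depth, n_levels):
    """Only levels 0..n_levels-1 of the N-Buffer (clamp-to-edge lookups)."""
    h, w = len(depth), len(depth[0])
    levels = [[row[:] for row in depth]]
    for i in range(n_levels - 1):
        p, o = levels[-1], 1 << i
        levels.append([[max(p[y][x], p[y][min(x + o, w - 1)],
                            p[min(y + o, h - 1)][x],
                            p[min(y + o, h - 1)][min(x + o, w - 1)])
                        for x in range(w)] for y in range(h)])
    return levels


def particle_may_be_visible(levels, cx, cy, z, k):
    """One lookup in level k at the lower-left corner of the 2^k x 2^k sprite."""
    h, w = len(levels[0]), len(levels[0][0])
    half = (1 << k) // 2
    x = min(max(cx - half, 0), w - 1)  # clamp the corner onto the map
    y = min(max(cy - half, 0), h - 1)
    # Visible only if some pixel of the looked-up square is farther than the sprite.
    return z < levels[k][y][x]


if __name__ == "__main__":
    # 8x8 map: occluder at depth 0.2 over the left half, far plane (1.0) elsewhere.
    depth = [[0.2] * 4 + [1.0] * 4 for _ in range(8)]
    levels = build_first_levels(depth, n_levels=3)  # levels 0, 1, 2
    particles = [(2, 2, 0.5), (2, 2, 0.1), (6, 4, 0.5)]
    print([particle_may_be_visible(levels, cx, cy, z, k=2) for cx, cy, z in particles])
    # [False, True, True]
```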
Applications Occlusion culling Particles Shadow volume clamping
Shadow volumes clamping • Ignore unseen or fully shadowed casters • Clamp shadow volume to shadowed area [LWGM04]
Shadow volumes clamping • From the light’s view, what part of the (visible) scene does a shadow volume encompass?
Figure: light, camera, scene
The litmap • Light’s view of what is seen by the viewer
Figure: light’s view, camera’s view
Shadow volumes clamping • From the light’s view, what part of the (visible) scene does a shadow volume encompass? • Minimum/maximum depth covered by a shadow caster
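If the litmap is stored as a depth map seen from the light, one way to obtain both bounds is to build an N-Buffer with max, as defined earlier, and a second one with min in place of max; the recurrence still holds because min, like max, is idempotent, so overlapping clamped lookups do no harm. The Python sketch below is our illustration of that idea, not the method from the slides: it queries the caster's light-space bbox `[x0, x1) × [y0, y1)` (assumed clipped and non-empty) with a 4-tiles covering in both N-Buffers and returns a conservative (min, max) depth range. The covering and all names are hypothetical.

```python
def build_nbuffers(depth, op):
    """N-Buffer levels for op = max or op = min (both are idempotent, so the
    overlapping clamped lookups near the border do no harm)."""
    h, w = len(depth), len(depth[0])
    levels = [[row[:] for row in depth]]
    for i in range((max(w, h) - 1).bit_length()):
        p, o = levels[-1], 1 << i
        levels.append([[op(p[y][x], p[y][min(x + o, w - 1)],
                           p[min(y + o, h - 1)][x],
                           p[min(y + o, h - 1)][min(x + o, w - 1)])
                        for x in range(w)] for y in range(h)])
    return levels


def four_tiles(levels, op, x0, y0, x1, y1):
    """Combine with op the 4 lookups whose 2^i x 2^i tiles cover the bbox."""
    i = max(0, (max(x1 - x0, y1 - y0) - 1).bit_length() - 1)
    size = 1 << i
    xs, ys = (x0, max(x0, x1 - size)), (y0, max(y0, y1 - size))
    return op(levels[i][y][x] for y in ys for x in xs)


def caster_depth_range(max_levels, min_levels, x0, y0, x1, y1):
    """Conservative (min, max) litmap depth under the caster's light-space bbox."""
    return (four_tiles(min_levels, min, x0, y0, x1, y1),
            four_tiles(max_levels, max, x0, y0, x1, y1))


if __name__ == "__main__":
    litmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy light-view depths
    max_lv, min_lv = build_nbuffers(litmap, max), build_nbuffers(litmap, min)
    print(caster_depth_range(max_lv, min_lv, 0, 0, 3, 3))  # (1, 9)
```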