Stencil Routed A-Buffer Kevin Myers and Louis Bavoil NVIDIA
What is it? • A-Buffer • Simply a list of fragments per-pixel • “The A-buffer, an antialiased hidden surface method” [Carpenter 84] • Related Work • Depth Peeling [Mammen 89] [Everitt 01] • k-Buffer [Bavoil et al. 07]
Why do I need this? • Often want more than nearest • Alpha blending • Volume rendering • Collision detection • Refraction and caustics • Global illumination
Why is it hard? • GPUs are optimized to capture the nearest layer • Z buffering and early z test • Fine for most real-time lighting models • Wasteful if not rendering front to back
Things that don’t work • Blending • Can’t just turn off z-buffering • Most operations non-commutative • MRT • Can’t direct output • Reading what you’re writing • Hazardous • “Multi-Layer Depth Peeling via Fragment Sort” [Liu et al. 06] • k-Buffer [Bavoil et al. 07]
A-Buffer • “A list of fragments per-pixel” • Anything on the GPU that resembles this? • MSAA • “A list of samples per-pixel” • Samples store coverage
MSAA in review • Multisampled Antialiasing • Fragments are rasterized at a higher res • 8xMSAA == 8 x aliased resolution • Pixel shader is run once per-pixel • Frame buffer storage is at sample resolution
Say What? • MSAA samples == A-Buffer pixels?? • MSAA sample patterns don’t help • Need all MSAA samples at pixel center
Line up your Sub-samples • Turn off multisampling • Still render to an MSAA buffer • Pixel shader output bloats to all sub-samples • BOOL D3D10_RASTERIZER_DESC::MultisampleEnable • Now writing 8 samples per pixel • All have the same value!!
Bloating Your Pixel • Applause? • Meets the definition • “List of fragments per-pixel” • Not exactly what we want • Each item contains same value • Next fragment will clobber the entire list • Need to update one entry in the list • Once and only once
Stencil Routing • Stencil always increments • Stencil passes when 4
Stencil Routing • First introduced by Purcell et al. 2003 • Did not work for general rasterization • Tile aligned points • Fat point is spread across four pixels • Four pixels get same value • Stencil allows one pixel to update
Stencil Routing and MSAA • Stencil always operates at sample res • Regardless of MultisampleEnable state • DX10 Spec • Use sub-samples to route • Allows any pixel shader output to be routed • Arbitrary primitives
A Stencil Test That Works • StencilFunc • D3D10_COMPARISON_EQUAL • StencilRef • 2 • More on this later • StencilPassOp and StencilFailOp • D3D10_STENCIL_OP_DECR_SAT
Initializing Stencil • Clear stencil buffer to pass value ( 2 ) • Initializes sample 0 to 2 • Use SampleMask to selectively update the remaining samples • Stencil set to replace with reference value
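The stencil setup on the last two slides can be sketched as a CPU-side simulation of one pixel. This is only an illustration, not D3D10 code: `Pixel`, `make_pixel` and `route_fragment` are hypothetical names, and the sketch assumes sample i is initialized to 2 + i, which is one reading of the initialization slide that is consistent with the stencil values described on the "Why start at 2?" slide.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;       // assumes 8xMSAA: eight routing slots per pixel
constexpr std::uint8_t kRef = 2;  // StencilRef from the slides

// Hypothetical CPU model of one pixel in the multisampled buffer.
struct Pixel {
    std::array<std::uint8_t, kSamples> stencil{};
    std::array<float, kSamples> value{};
};

// Assumed initialization: sample i starts at kRef + i (2, 3, ..., 9).
Pixel make_pixel() {
    Pixel p;
    for (int i = 0; i < kSamples; ++i) {
        p.stencil[i] = static_cast<std::uint8_t>(kRef + i);
        p.value[i] = 0.0f;
    }
    return p;
}

// Route one fragment to at most one sample.
// Models StencilFunc == EQUAL against kRef, with a saturating decrement
// applied on both pass and fail. Returns the sample index that was
// written, or -1 if no sample passed.
int route_fragment(Pixel& p, float fragment) {
    int written = -1;
    for (int i = 0; i < kSamples; ++i) {
        if (p.stencil[i] == kRef) {
            assert(written == -1);  // distinct start values: only one sample can pass
            p.value[i] = fragment;
            written = i;
        }
        if (p.stencil[i] > 0) {
            --p.stencil[i];         // DECR_SAT: never wraps below 0
        }
    }
    return written;
}
```

Calling `route_fragment` repeatedly on the same pixel fills samples 0, 1, 2, ... in arrival order, so each incoming fragment lands in exactly one slot instead of clobbering the whole list.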
Why start at 2? • When all sub-samples are written • Most stencil values will be 0 • Except the last one written • Last sample written stencil == 1 • When overflow occurs • All stencil values will be 0
Occlusion Query Test • Pixel did not overflow • Pixel overflowed
Handling Overflow • Set sample mask to last sample updated • Draw full screen quad • Issue an occlusion query • Set stencil to pass if stencil == 0 • Check occlusion query • Sample pass count == overflow count
Handling Overflow • Occlusion query • Good • Very fast • Allows for dynamic A-Buffer sizing • Bad • Requires some CPU intervention • Ideally A-Buffer size is fixed
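The overflow check from the slides above can be modeled on the CPU as follows. This is a sketch under stated assumptions: `stencil_after` and `count_overflowed` are hypothetical helpers, sample i is assumed to start at 2 + i, and the loop over pixels stands in for the full screen quad plus occlusion query (there is no real query object here).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kSamples = 8;  // assumes 8xMSAA
constexpr int kRef = 2;      // StencilRef from the slides

using Stencil = std::array<std::uint8_t, kSamples>;

// Stencil values of one pixel after `fragment_count` fragments have been
// rasterized over it: every fragment applies one saturating decrement
// to every sample, so sample i holds max(kRef + i - fragment_count, 0).
Stencil stencil_after(int fragment_count) {
    assert(fragment_count >= 0);
    Stencil s{};
    for (int i = 0; i < kSamples; ++i) {
        s[i] = static_cast<std::uint8_t>(std::max(kRef + i - fragment_count, 0));
    }
    return s;
}

// Stand-in for the overflow pass: with the sample mask restricted to the
// last sample and the stencil test set to "pass if stencil == 0", the
// number of passing samples equals the number of overflowed pixels.
int count_overflowed(const std::vector<int>& fragments_per_pixel) {
    int passes = 0;
    for (int n : fragments_per_pixel) {
        if (stencil_after(n)[kSamples - 1] == 0) {
            ++passes;
        }
    }
    return passes;
}
```

A pixel that received exactly eight fragments leaves the last sample at 1 and is not counted; a ninth fragment drives it to 0. A nonzero count is the signal to allocate more layers or render another pass.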
Demo Time!
Secrets of the Dragon • Single A-Buffer • RG32F • R is packed color • G is depth • Saves on texture loads • Post process sort • 8 fragment per-pixel bitonic sort • Additional fragments, insertion sort
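A minimal CPU sketch of the post-process step for the 8-fragment case. The names `Fragment`, `pack_rgba8` and `bitonic_sort8` are hypothetical, the byte layout of the packed color is an assumption (the slides only say R holds a packed color and G holds depth), and the nearest-first sort order is also an assumption. The demo performs this step on the GPU, and the insertion sort used for additional fragments is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// One A-Buffer entry, mirroring the RG32F layout on the slide:
// a packed color plus a depth value.
struct Fragment {
    std::uint32_t packed_color;  // assumed layout: R | G << 8 | B << 16 | A << 24
    float depth;
};

// Pack four 8-bit channels into one 32-bit word (assumed layout).
std::uint32_t pack_rgba8(std::uint8_t r, std::uint8_t g,
                         std::uint8_t b, std::uint8_t a) {
    return static_cast<std::uint32_t>(r) |
           (static_cast<std::uint32_t>(g) << 8) |
           (static_cast<std::uint32_t>(b) << 16) |
           (static_cast<std::uint32_t>(a) << 24);
}

// Bitonic sorting network for exactly 8 fragments, ascending by depth.
// The compare-exchange pattern is fixed and independent of the data,
// which is what makes it a good fit for a shader.
void bitonic_sort8(std::array<Fragment, 8>& f) {
    constexpr std::size_t n = 8;
    for (std::size_t k = 2; k <= n; k *= 2) {          // size of bitonic runs
        for (std::size_t j = k / 2; j > 0; j /= 2) {   // compare distance
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t l = i ^ j;           // partner index
                if (l <= i) continue;                  // visit each pair once
                const bool ascending = (i & k) == 0;
                const bool out_of_order = ascending
                    ? f[i].depth > f[l].depth
                    : f[i].depth < f[l].depth;
                if (out_of_order) std::swap(f[i], f[l]);
            }
        }
    }
}
```

After sorting, the fragments can be composited in depth order; fragments at equal depth keep an arbitrary relative order, which matches the caveat on the limits slide.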
8800 GTX Performance • Alpha Blended Stanford Dragon
Limits…DOH! • 254 layers of depth max • 8-bit stencil ( 255 – 1 for overflow bit ) • If you do this, call us, ’cause that’s crazy • Fragments at same depth • Must be handled in post-process • MSAA
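One way to arrive at the 254-layer figure, assuming the routing slots are initialized to 2, 3, 4, ... as in the stencil initialization slide: the largest value an 8-bit stencil can hold is 255, so the slots run from 2 to 255. The helper `max_layers` is hypothetical and only restates that arithmetic.

```cpp
#include <cassert>

constexpr int kRef = 2;  // StencilRef from the slides

// Number of distinct routing slots when slot i starts at kRef + i and
// the start value must fit in a stencil of `stencil_bits` bits.
constexpr int max_layers(int stencil_bits) {
    const int max_value = (1 << stencil_bits) - 1;  // 255 for 8 bits
    return max_value - kRef + 1;                    // slots kRef..max_value
}

static_assert(max_layers(8) == 254, "8-bit stencil gives 254 layers");
```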
Summary • Stencil Routed A-Buffer • Ideally suited for complex geometries • Much faster than depth peeling • A-buffer can be dynamically resized • Use an occlusion query • Best to pre-determine size
Future Work • Render target arrays • Each target has its own stencil buffer • Target replaces sub-sample • Or augments sub-sample • #arrays * MSAA level in one “CPU pass” • With DX10, saturates at 254 layers • Use instancing for additional “GPU passes”
Thanks for all the fish • Claudio Silva, Steven Callahan, Joao Comba, Aaron Lefohn, Cass Everitt, Peach Myers
The last slide… • ? • kmyers@nvidia.com • lbavoil@nvidia.com