
Stencil Routed A-Buffer

Stencil Routed A-Buffer. Kevin Myers and Louis Bavoil NVIDIA. Our Cool Thing. What is it?. A-Buffer Simply a list of fragments per-pixel “The A-buffer, an antialiased hidden surface method” [Carpenter 84] Related Work Depth Peeling [Mammen 89] [Everitt 01] k-Buffer [Bavoil et al. 07].



  1. Stencil Routed A-Buffer Kevin Myers and Louis Bavoil NVIDIA

  2. Our Cool Thing

  3. What is it? • A-Buffer • Simply a list of fragments per-pixel • “The A-buffer, an antialiased hidden surface method” [Carpenter 84] • Related Work • Depth Peeling [Mammen 89] [Everitt 01] • k-Buffer [Bavoil et al. 07]

  4. Why do I need this? • Often want more than nearest • Alpha blending • Volume rendering • Collision detection • Refraction and caustics • Global illumination

  5. Why is it hard? • GPUs are optimized to capture the nearest layer • Z buffering and early z test • Fine for most real-time lighting models • Wasteful if not rendering front to back

  6. Things that don’t work • Blending: can’t just turn off z-buffering • Most operations non-commutative • MRT • Can’t direct output • Reading what you’re writing • Hazardous • “Multi-Layer Depth Peeling via Fragment Sort” [Liu et al. 06] • k-Buffer [Bavoil et al. 07]

  7. A-Buffer • “A list of fragments per-pixel” • Anything on the GPU that resembles this? • MSAA • “A list of samples per-pixel” • Samples store coverage

  8. MSAA in review • Multisampled Antialiasing • Fragments are rasterized at a higher res • 8xMSAA == 8 x aliased resolution • Pixel shader is run once per-pixel • Frame buffer storage is at sample resolution

  9. Say What? • MSAA samples == A-Buffer pixels?? • MSAA sample patterns don’t help • Need all MSAA samples at pixel center

  10. Line up your Sub-samples • Turn off multisampling • Still render to an MSAA buffer • Pixel shader output bloats to all sub-samples • BOOL D3D10_RASTERIZER_DESC::MultisampleEnable • Now writing 8 samples per pixel • All have the same value!!

  11. Bloating Your Pixel • Applause? • Meets the definition • “List of fragments per-pixel” • Not exactly what we want • Each item contains same value • Next fragment will clobber the entire list • Need to update one entry in the list • Once and only once
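
The clobbering problem from slides 10 and 11 can be shown with a small CPU model. This is a hedged sketch, not D3D10 code: `SampleList` and `write_bloated` are hypothetical names, and the only behavior modeled is that, with `MultisampleEnable` off, one pixel shader result is copied to every sub-sample.

```cpp
#include <array>
#include <cassert>

constexpr int kSamples = 8;                      // 8xMSAA, as on the slides
using SampleList = std::array<float, kSamples>;  // one pixel's sub-samples

// With multisampling disabled, a single shader output is replicated to
// all sub-samples of the pixel (no stencil routing yet).
void write_bloated(SampleList& samples, float shader_output) {
    samples.fill(shader_output);
}
```

Writing a second fragment with `write_bloated` overwrites all eight entries, so the list never holds more than one distinct fragment. That is the behavior stencil routing is meant to fix.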

  12. Stencil Routing • Stencil always increments • Stencil passes when 4

  13. Stencil Routing • First introduced by [Purcell et al. 03] • Did not work for general rasterization • Tile aligned points • Fat point is spread across four pixels • Four pixels get same value • Stencil allows one pixel to update

  14. Stencil Routing and MSAA • Stencil always operates at sample res • Regardless of MultisampleEnable state • DX10 Spec • Use sub-samples to route • Allows any pixel shader output to be routed • Arbitrary primitives

  15. Stencil Routing and MSAA

  16. A Stencil Test That Works • StencilFunc • D3D10_COMPARISON_EQUAL • StencilRef • 2 • More on this later • StencilPassOp and StencilFailOp • D3D10_STENCIL_OP_DECR_SAT
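
The stencil test on slide 16 can be modeled on the CPU to see why each fragment lands in a different sub-sample. This is a minimal sketch, not D3D10 code. It assumes the depth test is disabled and that sub-sample k starts with stencil value 2 + k (an inference from slides 17 and 18). The name `route_fragment` and the data layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kStencilRef = 2;  // StencilRef from slide 16

// Offer one fragment to every sub-sample of a pixel, as happens when
// multisampling is disabled. Only the sub-sample whose stencil equals the
// reference keeps the value; every sub-sample is then decremented.
void route_fragment(std::vector<float>& color,
                    std::vector<std::uint8_t>& stencil,
                    float value) {
    assert(color.size() == stencil.size());
    for (std::size_t k = 0; k < stencil.size(); ++k) {
        if (stencil[k] == kStencilRef)  // StencilFunc: EQUAL
            color[k] = value;
        if (stencil[k] > 0)             // pass and fail op: DECR_SAT
            --stencil[k];
    }
}
```

Starting from stencil values {2, 3, 4, 5}, the first fragment is stored in sub-sample 0 and the second in sub-sample 1, because the decrement moves the value 2 one slot along the list after each fragment.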

  17. Initializing Stencil • Clear stencil buffer to pass value ( 2 ) • Initializes sample 0 to 2 • Use SampleMask to selectively update • Stencil set to replace with reference value
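
The initialization steps can be sketched as a CPU model: one clear, then one masked pass per remaining sub-sample. This is an illustration under assumptions, not the authors' code. The slide does not state the per-sample values, so the choice of 2 + k for sub-sample k is inferred from slide 18, and `init_stencil` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build one pixel's per-sample stencil values the way slide 17 outlines.
std::vector<std::uint8_t> init_stencil(int sample_count,
                                       std::uint8_t pass_value = 2) {
    // A sample mask has 32 bits, and the stencil buffer is 8 bits wide.
    assert(sample_count > 0 && sample_count <= 32);
    assert(pass_value + sample_count - 1 <= 255);

    // Clear: every sub-sample gets the pass value, which is final for sample 0.
    std::vector<std::uint8_t> stencil(sample_count, pass_value);

    for (int k = 1; k < sample_count; ++k) {
        const std::uint32_t sample_mask = 1u << k;  // enable only sub-sample k
        const std::uint8_t ref = static_cast<std::uint8_t>(pass_value + k);
        for (int s = 0; s < sample_count; ++s)
            if ((sample_mask >> s) & 1u)
                stencil[s] = ref;                   // stencil op: REPLACE with ref
    }
    return stencil;
}
```

For 8 sub-samples this yields the values 2 through 9, so exactly one sub-sample matches the reference value 2 for each of the first 8 fragments.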

  18. Why start at 2? • When all sub-samples are written • Most stencil values will be 0 • Except the last one written • Last sample written stencil == 1 • When overflow occurs • All stencil values will be 0
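
The argument for starting at 2 can be checked with a few lines of arithmetic. The sketch below assumes sub-sample k is initialized to `start + k` and that every fragment applies a saturating decrement. `last_sample_stencil` is a hypothetical helper that tracks only the last sub-sample of a pixel.

```cpp
#include <cassert>
#include <cstdint>

// Stencil value of the last sub-sample after `fragments` fragments have hit
// a pixel with `samples` sub-samples.
std::uint8_t last_sample_stencil(int samples, int fragments, std::uint8_t start) {
    assert(samples > 0 && fragments >= 0);
    assert(start + samples - 1 <= 255);
    int value = start + samples - 1;  // initial value of the last sub-sample
    for (int i = 0; i < fragments; ++i)
        if (value > 0)                // DECR_SAT clamps at 0
            --value;
    return static_cast<std::uint8_t>(value);
}
```

With `start` equal to 2 and 8 sub-samples, exactly 8 fragments leave the last sub-sample at 1 and a ninth fragment drives it to 0, so the two cases are distinguishable. With `start` equal to 1, both cases end at 0 and an overflow could not be told apart from a pixel that is exactly full.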

  19. Occlusion Query Test • Pixel did not overflow • Pixel overflowed

  20. Handling Overflow • Set sample mask to last sample updated • Draw full screen quad • Issue an occlusion query • Set stencil to pass if stencil == 0 • Check occlusion query • Sample pass count == overflow count
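
What the occlusion query reports can be modeled as a simple count. This is a hedged CPU stand-in, not the D3D10 query API: `count_overflowed_pixels` is a hypothetical function, and its input is assumed to hold, for each pixel, the stencil value of the last sub-sample after rendering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The full-screen quad is masked to the last sub-sample and passes the
// stencil test only where the stencil value is 0, so the number of passing
// samples equals the number of pixels that overflowed.
std::size_t count_overflowed_pixels(
        const std::vector<std::uint8_t>& last_sample_stencil) {
    return static_cast<std::size_t>(
        std::count(last_sample_stencil.begin(), last_sample_stencil.end(),
                   std::uint8_t{0}));
}
```

A nonzero result would tell the application that at least one pixel received more fragments than the A-buffer has sub-samples, which is the condition that triggers resizing.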

  21. Handling Overflow • Occlusion query • Good • Very fast • Allows for dynamic A-Buffer sizing • Bad • Requires some CPU intervention • Ideally A-Buffer size is fixed

  22. Demo Time! Demo

  23. Secrets of the Dragon • Single A-Buffer • RG32F • R is packed color • G is depth • Saves on texture loads • Post process sort • 8 fragment per-pixel bitonic sort • Additional fragments, insertion sort
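
The post-process sort can be illustrated on the CPU. The sketch below is an assumption-laden stand-in for the shader: the `Fragment` layout, the RGBA8 packing order in `pack_rgba8`, and the nearest-first ordering are all hypothetical choices, since the slide only says that R holds packed color, G holds depth, and 8 fragments are sorted with a bitonic sort.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

struct Fragment {
    std::uint32_t packed_color;  // stands in for the R channel
    float depth;                 // stands in for the G channel
};

// Pack four 8-bit channels into one 32-bit word (byte order is an assumption).
std::uint32_t pack_rgba8(std::uint8_t r, std::uint8_t g,
                         std::uint8_t b, std::uint8_t a) {
    return static_cast<std::uint32_t>(r) |
           (static_cast<std::uint32_t>(g) << 8) |
           (static_cast<std::uint32_t>(b) << 16) |
           (static_cast<std::uint32_t>(a) << 24);
}

// Bitonic sorting network for exactly 8 fragments, ascending by depth.
void bitonic_sort8(std::array<Fragment, 8>& f) {
    for (int k = 2; k <= 8; k *= 2) {
        for (int j = k / 2; j > 0; j /= 2) {
            for (int i = 0; i < 8; ++i) {
                const int l = i ^ j;  // partner index for this stage
                if (l <= i) continue;
                const bool ascending = (i & k) == 0;
                const bool out_of_order = ascending ? f[i].depth > f[l].depth
                                                    : f[i].depth < f[l].depth;
                if (out_of_order) std::swap(f[i], f[l]);
            }
        }
    }
}
```

A bitonic network suits this case because its compare-and-swap pattern is fixed for a given input size, so it does not depend on the data being sorted.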

  24. 8800 GTX Performance Alpha Blended Stanford Dragon

  25. Limits…DOH! • 254 layers of depth max • 8-bit stencil ( 255 – 1 for overflow bit ) • If you do this call us cause that’s crazy • Fragments at same depth • Must be handled in post-process • MSAA

  26. Summary • Stencil Routed A-Buffer • Ideally suited for complex geometries • Much faster than depth peeling • A-buffer can be dynamically resized • Use an occlusion query • Best to pre-determine size

  27. Future Work • Render target arrays • Each target has its own stencil buffer • Target replaces sub-sample • Or augments sub-sample • #arrays * MSAA level in one “CPU pass” • With DX10 saturates 254 layers • Use instancing for additional “GPU passes”

  28. Thanks for all the fish • Claudio Silva, Steven Callahan, Joao Comba, Aaron Lefohn, Cass Everitt, Peach Myers

  29. The last slide… • ? • kmyers@nvidia.com • lbavoil@nvidia.com
