A Hierarchical Shadow Volume Algorithm. Timo Aila (1,2), Tomas Akenine-Möller (3). (1) Helsinki University of Technology, (2) Hybrid Graphics, (3) Lund University.
Outline • Brief intro to shadow volumes • fillrate problem, existing solutions • Our solution • idea • implementation • Results • Q&A
Shadow volumes [Crow77] • Shadow volumes define closed volumes of space that are in shadow [Figure labels: infinitesimal light source; shadow caster = light cap; dark cap; extruded side quads]
Is point inside shadow volume? • Pick a reference point R outside the shadow volume • any such point is OK • Span a line from R to the point to be classified • Compute the sum of enter (+1) and exit (-1) events • a nonzero sum means the point is in shadow [2D illustration: reference point R, a shadow volume, and sample points P1, P2, P3]
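The counting rule above can be sketched in 2D, matching the slide's illustration. This is a minimal sketch, not the authors' code: it assumes each shadow volume is a closed polygon with counter-clockwise vertices, that R lies outside all volumes, and that the line from R to the point does not run along an edge. The names `signed_crossings` and `in_shadow` are hypothetical.

```python
def _cross(ax, ay, bx, by):
    """2D cross product (z component)."""
    return ax * by - ay * bx


def signed_crossings(r, p, polygon):
    """Sum enter (+1) and exit (-1) events along the segment r -> p.

    polygon: list of (x, y) vertices of one closed 2D shadow volume,
    assumed to be in counter-clockwise order.
    """
    dx, dy = p[0] - r[0], p[1] - r[1]
    total = 0
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        ex, ey = b[0] - a[0], b[1] - a[1]
        denom = _cross(dx, dy, ex, ey)
        if denom == 0:
            continue  # segment is parallel to this edge
        # Solve r + t*d = a + u*e for t (along r->p) and u (along the edge).
        t = _cross(a[0] - r[0], a[1] - r[1], ex, ey) / denom
        u = _cross(a[0] - r[0], a[1] - r[1], dx, dy) / denom
        if 0 < t < 1 and 0 <= u < 1:
            # For a CCW polygon the interior lies left of each edge, so
            # denom < 0 means the segment crosses into the volume.
            total += 1 if denom < 0 else -1
    return total


def in_shadow(r, p, volumes):
    """A point is in shadow when the summed events are nonzero."""
    return sum(signed_crossings(r, p, v) for v in volumes) != 0
```

With a square volume `[(0, 0), (2, 0), (2, 2), (0, 2)]` and `R = (-5, 0.5)`, a point inside the square sees one enter event, while a point beyond it sees an enter and an exit that cancel.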
Using graphics hardware • R at ∞ behind pixel (z-fail) [Bilodeau&Songy, Carmack] • infinity always outside SVs – robust • must not clip to far plane of view frustum • sum hidden events to stencil buffer, sign from backface culling [2D illustration: camera, view frustum, shadow volume, visible samples (or pixels), R, and the + and - events]
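The z-fail rule for a single pixel can be sketched as follows. This is an illustrative model rather than a hardware description: it assumes larger depth values are farther from the camera, and that each shadow volume fragment is given as a hypothetical `(depth, is_front_facing)` pair. Only fragments hidden behind the visible surface contribute; back faces count as enter events and front faces as exit events when walking from R at infinity toward the pixel.

```python
def zfail_stencil(visible_depth, sv_fragments):
    """Return the stencil count for one pixel using z-fail counting.

    visible_depth: depth of the visible surface at this pixel.
    sv_fragments:  iterable of (depth, is_front_facing) pairs, one per
                   shadow volume polygon covering the pixel (assumed layout).
    """
    stencil = 0
    for depth, is_front_facing in sv_fragments:
        if depth > visible_depth:  # fragment is hidden: depth test fails
            # Back face = entering the volume (+1), front face = leaving (-1).
            stencil += -1 if is_front_facing else 1
    return stencil


def pixel_in_shadow(visible_depth, sv_fragments):
    """Nonzero stencil count means the visible point is inside a volume."""
    return zfail_stencil(visible_depth, sv_fragments) != 0
```

For a volume with its front face at depth 3 and back face at depth 8, a surface at depth 5 sees only the hidden back face and is shadowed; surfaces at depth 2 or 10 end up with a count of zero.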
Amount of pixel processing Adapted from [Chan and Durand 2004]
Fillrate problem • 50+ fps without shadows on ATI Radeon 9800XT at 1280x1024, 1 sample/pixel • 1 fps when shadow volumes rasterized • 2.2 billion pixels per frame
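To put the 2.2 billion figure in perspective, it can be divided by the number of screen pixels. The short calculation below assumes the figure counts shadow volume pixels rasterized per frame at the stated 1280x1024 resolution with 1 sample per pixel; the variable names are illustrative.

```python
# Figures taken from the slide above.
width, height = 1280, 1024
sv_pixels_per_frame = 2.2e9  # assumed to be rasterized shadow volume pixels

screen_pixels = width * height
# Average number of shadow volume pixels processed per screen pixel.
overdraw = sv_pixels_per_frame / screen_pixels

print(f"screen pixels: {screen_pixels}")
print(f"average shadow volume overdraw: {overdraw:.0f}x")
```

Under that assumption, each screen pixel is touched by shadow volume geometry well over a thousand times per frame on average, which is why fillrate dominates the cost.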
Existing solutions (1/2) • CC shadow volumes [Lloyd et al. 2004] • draw SVs only where receivers exist • good when lots of empty space • Hybrid shadow maps and volumes [Chan&Durand 2004] • use SVs only at shadow boundaries • boundary pixels determined using shadow map • artifacts due to limited shadow map resolution
Existing solutions (2/2) • Depth bounds [Nvidia 2003] • application supplies min & max depth values separately for each shadow volume • rasterize shadow volume only when visible geometry between [min,max] • optimal bounds hard to compute [2D illustration: camera, shadow volume, visible pixels, and the min and max depth bounds]
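The depth bounds idea reduces to a per-pixel interval check on the depth of the visible geometry. The sketch below models only that check in software; the function names are hypothetical and it does not represent any specific graphics API.

```python
def passes_depth_bounds(visible_depth, zmin, zmax):
    """True when the visible geometry at a pixel lies within [zmin, zmax]."""
    return zmin <= visible_depth <= zmax


def pixels_to_rasterize(visible_depths, zmin, zmax):
    """Indices of pixels where the shadow volume still has to be rasterized.

    visible_depths: depth of the visible geometry at each covered pixel.
    zmin, zmax:     application-supplied bounds for one shadow volume.
    """
    return [
        i for i, z in enumerate(visible_depths)
        if passes_depth_bounds(z, zmin, zmax)
    ]
```

Pixels whose visible depth falls outside the bounds are skipped entirely, so the savings depend on how tightly the application can choose `zmin` and `zmax` for each shadow volume.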
Green tiles may contain shadow boundary - other tiles were correct
How to detect shadow boundaries? • Two facts about shadow volumes • always closed • SV triangles mark potential shadow boundaries • If 3D volume in scene not intersected by shadow volume triangles • fully lit or fully in shadow • single sample classifies entire volume
Detecting boundary tiles • Bound tile with axis-aligned bounding box • 8x8 pixel region • Zmin, Zmax • Triangle vs. AA Box intersection test • low-resolution rasterization • Zmin and Zmax tests [Figure: tile bounding box, 8 x 8 pixels in screen space, spanning Zmin to Zmax in depth]
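A simplified version of the boundary test can be sketched as below. Note the substitution: instead of an exact triangle vs. axis-aligned box test, this sketch compares the triangle's own bounding box against the tile's box. That is conservative (it may flag extra tiles as potential boundaries but never misses an intersecting triangle), and it is an assumption made for brevity, not the test described in the talk. The `Tile` type and function name are hypothetical.

```python
from dataclasses import dataclass

TILE_SIZE = 8  # tiles are 8x8 pixels, as on the slide


@dataclass
class Tile:
    x0: int      # left pixel column of the tile
    y0: int      # top pixel row of the tile
    zmin: float  # minimum depth of visible geometry in the tile
    zmax: float  # maximum depth of visible geometry in the tile


def may_contain_boundary(tile, triangle):
    """Conservatively test a shadow volume triangle against a tile's box.

    triangle: three (x, y, z) vertices in screen space with depth.
    Returns True when the triangle's bounding box overlaps the tile's
    axis-aligned box in x, y, and depth.
    """
    xs, ys, zs = zip(*triangle)
    overlaps_x = min(xs) <= tile.x0 + TILE_SIZE and max(xs) >= tile.x0
    overlaps_y = min(ys) <= tile.y0 + TILE_SIZE and max(ys) >= tile.y0
    overlaps_z = min(zs) <= tile.zmax and max(zs) >= tile.zmin
    return overlaps_x and overlaps_y and overlaps_z
```

A tile that no shadow volume triangle passes this test for is entirely lit or entirely shadowed with respect to that volume, so one sample is enough to classify it.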
Fast update of non-boundary tiles • Copy low-res shadows to stencil buffer • writing 64 per-pixel values would be slow • Two-level stencil buffer saves the day • maintain [Smin, Smax] per tile • always test the higher level first • often no need to validate per-pixel values • stencil values of non-boundary tiles are constant
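One way to picture the two-level stencil buffer is a per-tile record that stores [Smin, Smax] and only keeps per-pixel values while they differ. The class below is a software sketch under that assumption; the class name, the lazy per-pixel storage, and the method names are all hypothetical and not taken from the talk.

```python
class TwoLevelStencilTile:
    """Stencil state for one tile: [smin, smax] plus optional pixel values."""

    def __init__(self, size=8):
        self.size = size
        self.smin = 0
        self.smax = 0
        self.pixels = None  # only allocated while values differ in the tile

    def add_constant(self, delta):
        """Non-boundary tile: every pixel receives the same update."""
        self.smin += delta
        self.smax += delta
        if self.pixels is not None:  # slow path, values already diverged
            self.pixels = [v + delta for v in self.pixels]

    def add_per_pixel(self, deltas):
        """Boundary tile: apply one stencil update per pixel."""
        if self.pixels is None:
            self.pixels = [self.smin] * (self.size * self.size)
        self.pixels = [v + d for v, d in zip(self.pixels, deltas)]
        self.smin, self.smax = min(self.pixels), max(self.pixels)
        if self.smin == self.smax:
            self.pixels = None  # tile is uniform again

    def in_shadow(self, index):
        """Test the tile level first, fall back to the pixel value."""
        if self.smin > 0 or self.smax < 0:
            return True   # every pixel in the tile is nonzero
        if self.smin == 0 and self.smax == 0:
            return False  # every pixel in the tile is zero
        return self.pixels[index] != 0
```

In this model a non-boundary update on a uniform tile touches only two integers instead of 64 per-pixel values, and a shadow query needs the per-pixel level only when zero lies inside [Smin, Smax] and the tile is not uniform.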
Implementation – Stage 1 • Buffers built separately for each shadow volume • Classifications ready when entire SV processed • application marks begin/end of shadow volumes [Diagram: SV triangles → Low-resolution rasterizer → Per-tile operations → "Boundary?" and "Low-res shadows" buffers]
Implementation – Stage 2 [Diagram: SV triangles → Low-resolution rasterizer → boundary tile? (looked up in the "Boundary?" buffer) • No: copy low-res shadows to 2-level stencil • Yes: Per-pixel rasterizer → Stencil ops → Update 2-level stencil]
Alternative implementations • Two pass • Pass 1 = Stage 1 • Pass 2 = Stage 2 • How to keep pixel units busy during Stage 1? • maybe assign per-tile operations to pixel shaders? • Single pass • Separate stages using delay stream [Aila et al. 2003] • Stage 2 of current SV executes simultaneously with next SV’s Stage 1
Hardware resources • Two-level stencil buffer • Per-tile operations • Optionally • delay stream (*) • duplicate low-res rasterizer & Zmin/Zmax units (*) • cache for per shadow volume buffers • multiple buffers for pipelined operation • allocate from external memory • (*) if not already there for occlusion culling purposes
Summary • Hierarchical rendering method for shadow volumes • significant fillrate savings compared to other hardware methods • also works for soft shadow volumes • Future work • would it make sense to extend programmability to per-tile operations? • how many pipeline bubbles are created? • requires chip-level simulations
Thank you! • Questions? • Acknowledgements • Ville Miettinen, Jacob Ström, Eric Haines, Ulf Assarsson, Lauri Savioja, Jonas Svensson, Ulf Borgenstam, Karl Schultz, 3DR group at Helsinki University of Technology • The National Technology Agency of Finland, Hybrid Graphics, Bitboys, Nokia and Remedy Entertainment • ATI for granting fellowship to Timo (2004-2005)