Photon Mapping on Programmable Graphics Hardware Timothy J. Purcell Mike Cammarano Pat Hanrahan Stanford University Craig Donner Henrik Wann Jensen University of California, San Diego
Motivation • Interactive global illumination on the GPU • Nearly have sufficient compute power and flexibility • Explore GPU-based computation algorithms
Related Work • CPU-based interactive global illumination • Supercomputers [Parker et al.] • Clusters [Tole et al., Wald et al.] • Global illumination on programmable GPUs • Ray tracing [Carr et al., Purcell et al.] • Photon mapping [Ma et al.] • Radiosity [Carr et al., Coombe et al.] • Translucency [Carr et al., Stamminger et al.]
Photon Mapping Algorithm Review • Photon tracing • Emission, scattering, storing into kd-tree • Similar to ray tracing • Rendering • Ray tracing for direct illumination • Photon map visualization • Indirect bounce
Computational Challenge for GPUs #1 • Constructing an irregular or sparse data structure
Computational Challenge for GPUs #2 • Adaptive nearest neighbor search • Noise vs. blur
Photon Mapping on the CPU • Balanced kd-tree • Compact storage of photons • Efficient • O(log n) search • Priority queue • Nearest neighbor search • Incremental insertion and removal of photons
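The priority-queue search named in the last bullets can be sketched as below. This is a minimal illustration, not the CPU implementation the slide refers to: it scans a plain list instead of traversing a balanced kd-tree, and the name `k_nearest` is hypothetical. What it shows is the incremental insertion and removal that a bounded max-heap provides, which is the state the later slides describe as hard to keep on the GPU.

```python
import heapq

def k_nearest(photons, sample, k):
    """Return the k photons closest to `sample`, nearest first.

    A max-heap of size k holds the current candidates; the farthest
    candidate sits at the root and is evicted when a closer photon arrives.
    """
    heap = []  # entries are (-squared_distance, index); heapq is a min-heap
    for i, p in enumerate(photons):
        d2 = sum((a - b) ** 2 for a, b in zip(p, sample))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, i))        # incremental insertion
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, i))     # remove farthest, insert closer
    return [photons[i] for _, i in sorted(heap, reverse=True)]
```

Each insertion or replacement costs O(log k), but the heap contents differ per sample point and change on every photon visited.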
Algorithmic Changes for the GPU • Direct visualization of photon map • Keeps rendering costs low • Use grid instead of kd-tree • Tried kd-tree… • Kd-tree construction is difficult • Radiance estimate • Fixed radius search works fine • Adaptive search needs priority queue • No priority queue • Can’t build on GPU • Too much state
Contributions • Mapped complete grid-based photon mapping algorithm onto the GPU • Including photon tracing, ray tracing, etc. • Implemented an adaptive k-nearest neighbor search • kNN-grid • Show how to construct a sparse data structure on the GPU • Bitonic merge sort with binary search • Stencil routing
Configuring the GPU for Computing • GPU as data parallel compute engine • Fragment programs execute compute kernels • Screen sized quad initializes computation • SIMD execution • Floating point texture memory • Render-to-texture for intermediate results • Data structure storage • Pointer dereferencing via dependent fetches
Computational Challenge #1 Building a Sparse Data Structure
Building a Sparse Data Structure • Requires scatter • Dependent texture write • Why don’t we have fragment scatter? • Fragment processing has highly coherent blocked memory writes • Extra hardware support would be needed • Write hazards • Memory latencies
Scatter on the GPU • Sort photons into grid cells • Grid cell is sort key • Simulate scatter with fragment programs • Bitonic merge sort followed by binary search • Compact grid • O(log² n) rendering passes
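Bitonic Merge Sort • O(log² n) rendering passes
[Figure: 8-element example; each row is the array after one rendering pass]
Input:  3 7 4 8 6 2 1 5
Pass 1: 3 7 8 4 2 6 5 1
Pass 2: 3 4 8 7 5 6 2 1
Pass 3: 3 4 7 8 6 5 2 1
Pass 4: 3 4 2 1 6 5 7 8
Pass 5: 2 1 3 4 6 5 7 8
Pass 6: 1 2 3 4 5 6 7 8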
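The pass structure of the sort can be simulated on the CPU as follows. This is a sketch, not the authors' fragment program: the function name `bitonic_passes` is hypothetical, and a Python list stands in for the texture that each pass reads and the one it renders into. The example input is the eight values the slide's diagram starts from (3 7 4 8 6 2 1 5).

```python
def bitonic_passes(values):
    """Yield the array after each compare-exchange pass of a bitonic sort.

    Every output element is computed independently from two input
    elements, which is what lets one pass map onto one rendering pass.
    Assumes len(values) is a power of two.
    """
    n = len(values)
    src = list(values)
    block = 2                       # size of the bitonic sequences being merged
    while block <= n:
        stride = block // 2         # distance between compared elements
        while stride >= 1:
            dst = []
            for i in range(n):      # one "fragment" per element
                partner = i ^ stride
                ascending = (i & block) == 0
                lo = min(src[i], src[partner])
                hi = max(src[i], src[partner])
                keep_low = (i < partner) == ascending
                dst.append(lo if keep_low else hi)
            src = dst
            yield list(src)
            stride //= 2
        block *= 2

passes = list(bitonic_passes([3, 7, 4, 8, 6, 2, 1, 5]))
```

For n = 2^k elements the loops run k(k + 1)/2 passes, so the eight values here take 6 passes, which is where the O(log² n) bound comes from.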
Binary Search • Grid cell searches for self in photon list • If none, find first element in next cell • Empty grid cells waste compute • log(n) + 1 steps [Figure: searching for the first v5 photon in the sorted photon list v0 v0 v0 v2 v2 v2 v5 v5; initialization followed by steps 1 to 4]
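One way to write the per-cell search is sketched below, using the slide's sorted list (cell ids 0 0 0 2 2 2 5 5). The names `first_photon_index` and `build_cell_starts` are hypothetical, and the loop is a CPU stand-in for a fragment program. It runs a fixed log2(n) + 1 comparisons with no early exit, on the assumption that every grid cell executes the same number of steps.

```python
def first_photon_index(sorted_cells, cell):
    """Index of the first photon whose cell id is >= `cell`.

    If the cell is empty, this is the first photon of the next non-empty
    cell. Returns len(sorted_cells) when every photon has a smaller id.
    Assumes len(sorted_cells) is a power of two.
    """
    n = len(sorted_cells)
    pos = 0
    step = n // 2
    while step >= 1:                          # log2(n) halving steps
        if sorted_cells[pos + step - 1] < cell:
            pos += step
        step //= 2
    if sorted_cells[pos] < cell:              # one final step
        pos += 1
    return pos

def build_cell_starts(sorted_cells, num_cells):
    """Start index for every cell, plus an end sentinel at index num_cells."""
    return [first_photon_index(sorted_cells, c) for c in range(num_cells + 1)]

starts = build_cell_starts([0, 0, 0, 2, 2, 2, 5, 5], 6)
```

The photon count of cell c is `starts[c + 1] - starts[c]`, so empty cells such as 1, 3 and 4 come out with a count of zero while still paying for the full search.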
Scatter on the GPU • Vertex programs can scatter • Draw point to buffer • Collisions? • Stencil routing • Limit photon count per grid cell • Pre-allocate grid cell space • Draw photons as points • Vertex program computes grid cell • Stencil buffer controls location within cell • Single rendering pass
Stencil Routing • Fix each grid cell size to n² pixels • Draw fat points to cover each fat cell • glPointSize(n) [Figure: Vertex(photon_pos) → Vertex Program → Flattened Grid; label: 4 pixels]
Stencil Routing • Control location written to with stencil • Pass when stencil is n² - 1 • Stencil always increments • Location written depends on draw order [Figure: Vertex(photon_pos) → Vertex Program → Flattened Grid with 4-pixel cells; stencil values per 1-pixel slot read 2 3 / 0 1 in three cells and 3 4 / 1 2 in one cell]
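The routing rule can be simulated on the CPU as below. This is a sketch under stated assumptions, not actual OpenGL state setup: the name `stencil_route` is hypothetical, each cell's pixels are assumed to start with stencil values 0 to n² - 1 (as the 2×2 values 0 1 2 3 in the figure suggest), and photons are identified by their draw order.

```python
def stencil_route(photon_cells, num_cells, slots_per_cell):
    """Route photons into fixed-size cells using only a stencil test.

    photon_cells[i] is the grid cell computed for photon i (the vertex
    program's job). Returns the per-cell pixel contents and the ids of
    photons that were dropped because their cell was already full.
    """
    m = slots_per_cell                                   # n^2 pixels per cell
    stencil = [list(range(m)) for _ in range(num_cells)]  # assumed initial values
    color = [[None] * m for _ in range(num_cells)]
    dropped = []
    for photon_id, cell in enumerate(photon_cells):      # draw order matters
        written = False
        for pixel in range(m):       # the fat point covers every pixel of the cell
            if stencil[cell][pixel] == m - 1:            # test: equal to n^2 - 1
                color[cell][pixel] = photon_id
                written = True
            stencil[cell][pixel] += 1                    # op: always increment
        if not written:
            dropped.append(photon_id)
    return color, dropped

color, dropped = stencil_route([0, 1, 0, 0, 0, 0], num_cells=2, slots_per_cell=4)
```

With four slots per cell, the fifth photon sent to cell 0 finds no pixel at value 3 and is discarded, which is the cost of limiting the photon count per grid cell.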
Computational Challenge #2 Adaptive Nearest Neighbor Search
Adaptive Nearest Neighbor Search • Iterative algorithm • Accept or reject photons in cell visit order
kNN-grid Algorithm (example: want a 4 photon estimate) • Candidate photons must be within max search radius • Visit voxels in order of distance to sample point • If current number of photons in estimate is less than number requested, grow search radius (estimate reaches 1, then 2 photons) • Don’t add photons outside maximum search radius • Don’t grow search radius when photon is outside maximum radius • Add photons within search radius (3, then 4 photons) • Don’t expand search radius if enough photons already found • Add photons within search radius (5 photons) • Visit all other voxels accessible within determined search radius, adding photons within search radius (6 photons) • Finds all photons within a sphere centered about sample point • May locate more than requested k-nearest neighbors [Figure legend: sample point, candidate photon, photons in estimate]
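The accept/reject rules above can be sketched on the CPU as follows. This is an illustration of the rules as the slides state them, not the authors' Cg code: `build_grid`, `knn_grid` and the helper names are hypothetical, a dictionary stands in for the grid texture, and the voxel visit order is produced by sorting, which is an assumption about how "order of distance" is obtained.

```python
import itertools
import math

def voxel_of(p, cell_size):
    return tuple(math.floor(c / cell_size) for c in p)

def build_grid(photons, cell_size):
    grid = {}
    for p in photons:
        grid.setdefault(voxel_of(p, cell_size), []).append(p)
    return grid

def distance_to_voxel(sample, voxel, cell_size):
    """Distance from `sample` to the nearest point of an axis-aligned voxel."""
    d2 = 0.0
    for s, v in zip(sample, voxel):
        nearest = min(max(s, v * cell_size), (v + 1) * cell_size)
        d2 += (s - nearest) ** 2
    return math.sqrt(d2)

def knn_grid(grid, sample, k, max_radius, cell_size):
    reach = math.ceil(max_radius / cell_size)
    home = voxel_of(sample, cell_size)
    offsets = itertools.product(range(-reach, reach + 1), repeat=len(sample))
    voxels = [tuple(h + o for h, o in zip(home, off)) for off in offsets]
    voxels.sort(key=lambda v: distance_to_voxel(sample, v, cell_size))

    estimate, radius = [], 0.0
    for voxel in voxels:                      # nearest voxels first
        voxel_dist = distance_to_voxel(sample, voxel, cell_size)
        if voxel_dist > max_radius:
            break
        if len(estimate) >= k and voxel_dist > radius:
            break                             # outside the determined radius
        for photon in grid.get(voxel, []):
            d = math.dist(photon, sample)
            if d > max_radius:
                continue                      # never add, never grow past the max
            if len(estimate) < k:
                radius = max(radius, d)       # too few photons: grow the radius
                estimate.append(photon)
            elif d <= radius:
                estimate.append(photon)       # enough photons: radius stays fixed
    return estimate, radius

photons = [(0.9, 0.9), (0.6, 0.5), (0.5, 0.8), (1.4, 0.5), (2.9, 0.5)]
estimate, radius = knn_grid(build_grid(photons, 1.0), (0.5, 0.5), 2, 2.0, 1.0)
```

In this 2D example a 2 photon estimate is requested but 3 photons are returned: the first photon visited fixes a radius of about 0.57, and a later photon at distance 0.3 falls inside it. The only per-sample state is the photon count and the current radius, in contrast to a priority queue.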
System Implementation • NVIDIA GeForce FX 5900 Ultra (NV35) • Cg compiler 1.1 [Figure: pipeline with two stages; Compute Lighting: Trace Photons → Build Photon Map; Render Image: Ray Trace Scene → Compute Radiance Estimate]
Glass Ball – Bitonic Sort 18s @ 512x384, 5K photons
Glass Ball – Stencil Routing 11s @ 512x384, 5K photons
Ring – Bitonic Sort 9s @ 512x384, 16K photons
Ring – Stencil Routing 8s @ 512x384, 16K photons
Cornell Box – Bitonic Sort 64s @ 512x512, 65K photons
Cornell Box – Stencil Routing 47s @ 512x512, 65K photons
Open Issues (1) • How to prevent program execution over a subset of pixels? • Non-uniform pixel computation distribution • Radiance estimate • KILL is only a write mask • Early-z occlusion culling • No pixel level control • Compute mask, branching, or stream buffer? • Improve radiance estimate speed by 30-70% over tiling
Open Issues (2) • Scatter • Makes (a programmer’s) life easier • Is it worth implementing? • Gain factor of log² n by avoiding sort