GPU-Assisted Path Tracing Matthias Boindl Christian Machacek Institute of Computer Graphics and Algorithms Vienna University of Technology
Motivation: Why Path Tracing? • Physically based • Nature provides the reference image • Parallelizable • Sublinear in #objects • Conceptually simple • Can lead to a clean implementation • But: fast implementation on GPUs not trivial
Outline • Path tracing intro • Main steps of the algorithm • Mapping the algorithm to the GPU • How to organize code into kernels • When to launch kernels • How to pass data between kernels • Acceleration structures • Focus on bounding volume hierarchies Christian Machacek
Path Tracing Intro • Like ray tracing, except it… • …supports arbitrary BRDFs • …is stochastic: at each bounce, the new direction is decided randomly • Convergence video From Pharr, Humphreys: PBRT, 2nd ed. (2010)
Path Tracing Pseudocode
    while image not converged
        r = new ray from eye through next pixel
        do
            i = closest intersection of r with scene
            if no i: break
            if i is on a light source: c = c + throughput * emission
            randomly pick new direction and create reflected ray r
            evaluate BRDF at i
            update throughput
        while path throughput high enough
From Pharr, Humphreys: PBRT, 2nd ed. (2010)
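The inner do/while loop of the pseudocode can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: radiance and throughput are single scalars instead of RGB, and `intersect`, `sample_bounce`, `min_throughput` and `max_bounces` are hypothetical names introduced here. `sample_bounce` is assumed to return the next ray together with the throughput factor of the bounce (BRDF times cosine divided by the sampling density).

```python
def trace_path(ray, intersect, sample_bounce, min_throughput=1e-3, max_bounces=64):
    """Follow one path and return the radiance it collects (scalar for brevity)."""
    radiance = 0.0
    throughput = 1.0
    for _ in range(max_bounces):            # safety cap, not part of the slide
        hit = intersect(ray)                # closest intersection of r with the scene
        if hit is None:                     # "if no i: break"
            break
        if hit["emission"] > 0.0:           # hit point lies on a light source
            radiance += throughput * hit["emission"]
        ray, weight = sample_bounce(hit)    # random new direction and BRDF weight
        throughput *= weight                # "update throughput"
        if throughput < min_throughput:     # "while path throughput high enough"
            break
    return radiance
```

Cutting paths at a fixed throughput threshold mirrors the pseudocode; a renderer that needs an unbiased estimate would typically terminate paths with Russian roulette instead.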
Megakernel Execution Divergence From Bikker (2013)
Solution: Wavefront Path Tracing • Separate, specialized kernels • Keep a pool of ~1 million paths alive • Work for next stage goes into kernel-specific, compact queues (=4MB index arrays) https://mediatech.aalto.fi/~samuli/
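The queueing idea can be sketched in a few lines of Python. This is a sequential stand-in under stated assumptions: `build_queues`, `queue_bytes` and the stage names are hypothetical, and on a GPU each append would be an atomic increment of a queue counter rather than a list append. The 4 MB figure on the slide is consistent with one 32-bit index per path in a pool of about one million paths, which is what `queue_bytes` computes.

```python
def build_queues(next_stage, stage_names):
    """Group path indices by the kernel that should process them next.

    next_stage[i] is the stage name requested by path i, or None if the
    path needs no further work in this iteration.
    """
    queues = {name: [] for name in stage_names}
    for path_index, stage in enumerate(next_stage):
        if stage is not None:
            queues[stage].append(path_index)   # compact: no gaps inside a queue
    return queues


def queue_bytes(pool_size, bytes_per_index=4):
    """Memory for one queue that can hold every path in the pool."""
    return pool_size * bytes_per_index
```

Each specialized kernel then iterates over its own compact queue only, so neighbouring threads execute the same code instead of diverging.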
Results • Performance • Execution times • (ms / 1M path segments) Christian Machacek
Limitations and Possible Improvements • Higher memory requirements (+200 MB) • Kernel launch overhead • Dynamic parallelism on GK110 • Use an outer scheduling kernel • No CPU round trip • Launch independent stages side-by-side • CUDA streams • So kernels with little work don’t hog the GPU Christian Machacek
Acceleration Structures • Find nearest intersection in O(log N) • Space partitioning vs. object partitioning • Hybrid methods exist Matthias Boindl
Performance • For interactive rendering, compromise • Traversal performance (build quality) • Construction/Update time • Update or rebuild from scratch • Adapt to GPU environment • Memory architecture • Parallel execution Matthias Boindl
State of the Art • Tero Karras and Timo Aila. 2013. Fast parallel construction of high-quality bounding volume hierarchies. In Proceedings of the 5th High-Performance Graphics Conference (HPG '13). ACM, New York, NY, USA, 89-99. Matthias Boindl
Close the Performance Gap Matthias Boindl
Basic Idea • Fast construction of simple BVH • Generate leaf for each triangle • Reduce SAH cost by modifying tree Matthias Boindl
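To make "SAH cost" concrete, the sketch below evaluates the surface area heuristic for a small binary BVH. It is an illustration under assumptions not stated on the slide: nodes are plain dictionaries (a leaf has `"box"` and `"tris"`, an internal node has `"left"` and `"right"`), boxes are `((x0, y0, z0), (x1, y1, z1))` tuples, and the cost constants `c_inner` and `c_tri` are hypothetical defaults.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as (min_corner, max_corner)."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union(a, b):
    """Smallest box enclosing boxes a and b."""
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)


def node_box(node):
    """Bounding box of a node; internal nodes enclose both children."""
    if "box" in node:
        return node["box"]
    return union(node_box(node["left"]), node_box(node["right"]))


def sah_cost(root, c_inner=1.2, c_tri=1.0):
    """Expected traversal cost: each node is weighted by area / root area."""
    root_area = surface_area(node_box(root))

    def visit(node):
        weight = surface_area(node_box(node)) / root_area
        if "box" in node:                       # leaf: pay per triangle test
            return c_tri * weight * node["tris"]
        return c_inner * weight + visit(node["left"]) + visit(node["right"])

    return visit(root)
```

For two adjacent unit cubes with one triangle each and both constants set to 1, the root contributes 1 and each leaf 6/10, giving a cost of 2.2. Tree modifications that shrink the boxes of internal nodes lower this number.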
Treelets • Allow local tree modification • A, B, C, F are leaves; D, E, G are internal nodes Matthias Boindl
Treelet Construction • Find root: parallel bottom-up traversal • Start with leaves • Use an atomic counter at each internal node (where two child paths join) • Ensures all children have been processed • Build treelet • Add both children • Pick children with highest surface area • Fixed size: 7 leaf nodes Matthias Boindl
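Both steps can be sketched sequentially in Python. The assumptions here are not from the slides: the BVH is binary and stored in hypothetical `parent` and `children` dictionaries, the loop over leaves stands in for one GPU thread per leaf, and the counter increment stands in for an atomic add. "Pick children with highest surface area" is read as: repeatedly expand the treelet leaf with the largest surface area until the treelet has 7 leaves.

```python
def bottom_up_order(parent, leaves):
    """Visit internal nodes so that each is processed after both children."""
    counter = {}
    order = []
    for leaf in leaves:                      # one thread per leaf on the GPU
        node = parent.get(leaf)
        while node is not None:
            counter[node] = counter.get(node, 0) + 1   # atomic add on the GPU
            if counter[node] < 2:
                break                        # first arrival: sibling not done yet
            order.append(node)               # second arrival: safe to process
            node = parent.get(node)
    return order


def form_treelet(root, children, area, size=7):
    """Grow a treelet below root; returns (internal_nodes, treelet_leaves).

    children maps each internal BVH node to its (left, right) pair;
    real BVH leaves do not appear as keys.
    """
    internal = [root]
    leaves = list(children[root])            # add both children of the root
    while len(leaves) < size:
        candidates = [n for n in leaves if n in children]
        if not candidates:                   # subtree too small to fill the treelet
            break
        best = max(candidates, key=lambda n: area[n])
        leaves.remove(best)                  # the expanded node becomes internal
        internal.append(best)
        leaves.extend(children[best])
    return internal, leaves
```

Expanding the largest node first concentrates the restructuring effort where a poor split costs the most under the surface area heuristic.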
Rearrange Treelet • Minimize the total surface area of the internal nodes of the treelet (its SAH cost); the root box itself does not change • Naive implementation: test each permutation • Better: dynamic programming • Caching of best intermediate results • Start with leaves, then pairs, then triplets, … • Suboptimal subtree construction avoided • Parallelizable as well Matthias Boindl
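The dynamic programming step can be sketched as a search over subsets of treelet leaves, encoded as bitmasks. This is a sequential illustration under assumptions: `optimize_treelet`, `leaf_boxes`, `leaf_costs` and `c_inner` are hypothetical names, boxes are `((x0, y0, z0), (x1, y1, z1))` tuples, and areas are left unnormalized. The best cost of every subset is cached, so a larger subset only has to try each way of splitting itself into two already solved parts.

```python
def surface_area(box):
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union(a, b):
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)


def optimize_treelet(leaf_boxes, leaf_costs, c_inner=1.2):
    """Return (best_cost, tree); tree is nested pairs of leaf indices."""
    n = len(leaf_boxes)
    full = (1 << n) - 1
    box = [None] * (full + 1)
    cost = [0.0] * (full + 1)
    split = [0] * (full + 1)
    # Every proper sub-mask of s is numerically smaller than s, so increasing
    # order guarantees that all smaller subsets are solved before s.
    for s in range(1, full + 1):
        low = s & -s                          # lowest set bit of s
        i = low.bit_length() - 1
        rest = s ^ low
        if rest == 0:                         # single leaf: cost is given
            box[s] = leaf_boxes[i]
            cost[s] = leaf_costs[i]
            continue
        box[s] = union(leaf_boxes[i], box[rest])
        best = float("inf")
        p = (s - 1) & s                       # enumerate proper sub-masks of s
        while p:
            candidate = cost[p] + cost[s ^ p]
            if candidate < best:
                best, split[s] = candidate, p
            p = (p - 1) & s
        cost[s] = c_inner * surface_area(box[s]) + best

    def build(s):
        if s & (s - 1) == 0:                  # one bit set: a leaf
            return s.bit_length() - 1
        return (build(split[s]), build(s ^ split[s]))

    return cost[full], build(full)
```

For 7 leaves there are only 127 subsets, so the table is small; the subsets of one size are independent of each other, which is what makes the step parallelizable.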
Results • Gap closed Matthias Boindl
Results • Speed/Quality tradeoff Matthias Boindl
Conclusion • Use specialized kernels • Lower execution divergence • (Better use of instruction cache) • (Fewer registers used simultaneously) • Construct acceleration structures quickly • But not too quickly Matthias Boindl
Thanks for your attention! Institute of Computer Graphics and Algorithms Vienna University of Technology
Logic Kernel • Does not need a queue, operates on all paths • If shadow ray was unblocked, add light contribution • Find material or light source the ray hits • Place path into proper material queue • Russian roulette • If path terminated, accumulate to image • Place path into new path queue • Sample light sources (aka next event estimation) Christian Machacek
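The Russian roulette step can be sketched as below. The slide does not say how the survival probability is chosen; this sketch assumes a scalar throughput and uses the throughput itself, clamped to a hypothetical `max_survival`, as the probability. `u` is a uniform random number in [0, 1).

```python
def russian_roulette(throughput, u, max_survival=0.95):
    """Return the reweighted throughput, or None if the path is terminated."""
    survival = min(max_survival, throughput)
    if u >= survival:
        return None                     # terminated: path goes to the new path queue
    return throughput / survival        # survivors are boosted to keep the mean
```

Dividing by the survival probability keeps the expected throughput unchanged: a path survives with probability `survival` and then carries `throughput / survival`.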
New Path Kernel • Generate a new image-space sample • Generate camera ray • Place it into extension ray cast queue • Initialize path state • Throughput • Pixel position • etc. Christian Machacek
Material Kernels • Generate incoming direction • Evaluate light contribution based on light sample generated in the logic kernel • We haven’t cast the shadow ray yet! • For MIS: p(light sample) from the BSDF • Discard BSDF stack • Queue • extension ray • (shadow ray) Christian Machacek
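The reason the material kernel evaluates the BSDF density of the light sample is that a multiple importance sampling weight needs the density of both strategies for the same direction. The slides do not state which weighting is used; the sketch below assumes the balance heuristic, and `balance_heuristic` is a name introduced here.

```python
def balance_heuristic(pdf_used, pdf_other):
    """MIS weight for a sample drawn with pdf_used when the competing
    strategy would have produced the same direction with pdf_other."""
    total = pdf_used + pdf_other
    return pdf_used / total if total > 0.0 else 0.0
```

For a light sample, `pdf_used` is the light sampling density and `pdf_other` is the BSDF density evaluated in the material kernel; the two weights for one direction sum to 1.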
Ray Cast Kernels • Extension rays • Find first intersection against scene geometry • Store hit data into path state • Shadow rays • Blocked or not? Christian Machacek