GPU-Assisted Path Tracing

Matthias Boindl, Christian Machacek, Institute of Computer Graphics and Algorithms, Vienna University of Technology

Motivation: Why Path Tracing? • Physically based • Nature provides the reference image • Parallelizable • Sublinear in #objects • Conceptually simple • Can lead to a clean implementation • But: fast implementation on GPUs not trivial

Outline • Path tracing intro • Main steps of the algorithm • Mapping the algorithm to the GPU • How to organize code into kernels • When to launch kernels • How to pass data between kernels • Acceleration structures • Focus on bounding volume hierarchies Christian Machacek

Path Tracing Intro • Like ray tracing, except it… • …supports arbitrary BRDFs • …is stochastic: at each bounce, the new direction is decided randomly • Convergence video From Pharr, Humphreys: PBRT, 2nd ed. (2010)

Path Tracing Pseudocode
    while image not converged
        r = new ray from eye through next pixel
        do
            i = closest intersection of r with scene
            if no i: break
            if i is on a light source: c = c + throughput * emission
            randomly pick new direction and create reflected ray r
            evaluate BRDF at i
            update throughput
        while path throughput high enough
From Pharr, Humphreys: PBRT, 2nd ed. (2010)
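As a minimal sketch of the loop above, the following Python function traces one path in an assumed "furnace" scene: a closed enclosure whose walls all share one diffuse albedo and one emission value, so every ray hits an emitting surface. The scene, the names trace_path and sample_cosine_hemisphere, and the throughput cutoff are illustrative assumptions, not part of the slides.

```python
import math
import random

def sample_cosine_hemisphere(rng):
    """Cosine-weighted direction in the local frame whose normal is +z."""
    u1, u2 = rng.random(), rng.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1))

def trace_path(albedo, emission, rng, min_throughput=1e-3):
    """One path in the assumed furnace scene; returns its radiance estimate c."""
    c, throughput = 0.0, 1.0
    while True:
        # Closest intersection: the enclosure is closed, so there is always
        # a hit, and every wall is a light source.
        c += throughput * emission
        # Randomly pick the new direction (only its cosine matters here).
        cos_theta = sample_cosine_hemisphere(rng)[2]
        # Evaluate the diffuse BRDF and the pdf of the sampled direction.
        brdf = albedo / math.pi
        pdf = cos_theta / math.pi
        # Update throughput: brdf * cos / pdf equals the albedo here.
        throughput *= brdf * cos_theta / pdf
        # "while path throughput high enough"
        if throughput < min_throughput:
            break
    return c

estimate = trace_path(albedo=0.5, emission=1.0, rng=random.Random(0))
```

In this scene the estimate approaches emission / (1 - albedo) as the cutoff shrinks, which makes it a convenient sanity check. A hard throughput cutoff introduces a small bias; Russian roulette avoids that.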

Megakernel Execution Divergence From Bikker (2013)

Solution: Wavefront Path Tracing • Separate, specialized kernels • Keep a pool of ~1 million paths alive • Work for next stage goes into kernel-specific, compact queues (=4MB index arrays) https://mediatech.aalto.fi/~samuli/
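The queue idea can be sketched sequentially in Python. This is only an illustration under simplifying assumptions: there are two stages instead of the full kernel set, a path is a small dictionary with a hypothetical bounces_left counter, and plain lists stand in for the GPU index arrays. All function names are made up for this sketch.

```python
def new_path_kernel(state, queue, next_id):
    """Start a fresh path in every pool slot listed in the queue."""
    for slot in queue:
        # Hypothetical path state: an id and a fixed number of bounces to go.
        state[slot] = {"id": next_id, "bounces_left": 1 + next_id % 3}
        next_id += 1
    return next_id

def material_kernel(state, queue):
    """Advance every path listed in the queue by one bounce."""
    for slot in queue:
        state[slot]["bounces_left"] -= 1

def render(pool_size, total_paths):
    state = [None] * pool_size  # fixed-size pool of path slots
    next_id = finished = launches = 0
    while finished < total_paths:
        # Logic stage: visits the whole pool and fills compact index queues.
        new_path_queue, material_queue = [], []
        for slot, path in enumerate(state):
            if path is not None and path["bounces_left"] > 0:
                material_queue.append(slot)
                continue
            if path is not None:  # path terminated in the last iteration
                finished += 1
                state[slot] = None
            if next_id + len(new_path_queue) < total_paths:
                new_path_queue.append(slot)
        # Each stage touches only the slots listed in its own queue.
        next_id = new_path_kernel(state, new_path_queue, next_id)
        material_kernel(state, material_queue)
        launches += 1
    return finished, launches

finished, launches = render(pool_size=4, total_paths=10)
```

Because each stage receives a dense list of slot indices, neighbouring threads on a GPU would all run the same code, which is the point of the compaction.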

Results • Performance • Execution times • (ms / 1M path segments) Christian Machacek

Limitations and Possible Improvements • Higher memory requirements (+200 MB) • Kernel launch overhead • Dynamic parallelism on GK110 • Use an outer scheduling kernel • No CPU round trip • Launch independent stages side-by-side • CUDA streams • So kernels with little work don’t hog the GPU Christian Machacek

Acceleration Structures • Find nearest intersection in O(log N) • Space partitioning vs. object partitioning • Hybrid methods exist Matthias Boindl
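A minimal sketch of nearest-hit search in an object-partitioning structure (a BVH), written in Python. To stay short it assumes that the primitives are axis-aligned boxes themselves, so a leaf's bounds are also its geometry; the node layout (dictionaries with lo, hi, prim, children) and all function names are illustrative assumptions.

```python
def ray_aabb(origin, direction, lo, hi):
    """Slab test: entry distance t >= 0 along the ray, or None on a miss."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < l or o > h:
                return None
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near

def leaf(lo, hi, prim):
    return {"lo": lo, "hi": hi, "prim": prim}

def inner(a, b):
    """Internal node whose box encloses both children."""
    lo = tuple(map(min, a["lo"], b["lo"]))
    hi = tuple(map(max, a["hi"], b["hi"]))
    return {"lo": lo, "hi": hi, "children": (a, b)}

def nearest_hit(root, origin, direction):
    best_t, best_prim = float("inf"), None
    stack = [root]
    while stack:
        node = stack.pop()
        t = ray_aabb(origin, direction, node["lo"], node["hi"])
        if t is None or t >= best_t:
            continue  # missed, or the whole subtree is behind the best hit
        if "prim" in node:
            best_t, best_prim = t, node["prim"]
        else:
            stack.extend(node["children"])
    return best_prim, best_t

boxes = [leaf((x, -1.0, -1.0), (x + 1.0, 1.0, 1.0), i)
         for i, x in enumerate((2.0, 5.0, 8.0))]
bvh = inner(inner(boxes[0], boxes[1]), boxes[2])
hit = nearest_hit(bvh, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Skipping every subtree whose box is missed or lies behind the current best hit is what gives the roughly logarithmic behaviour on well-built trees.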

Performance • For interactive rendering, compromise • Traversal performance (build quality) • Construction/Update time • Update or rebuild from scratch • Adapt to GPU environment • Memory architecture • Parallel execution Matthias Boindl

State of the Art • Tero Karras and Timo Aila. 2013. Fast parallel construction of high-quality bounding volume hierarchies. In Proceedings of the 5th High-Performance Graphics Conference (HPG '13). ACM, New York, NY, USA, 89-99. Matthias Boindl

Close the Performance Gap Matthias Boindl

Basic Idea • Fast construction of simple BVH • Generate leaf for each triangle • Reduce SAH cost by modifying tree Matthias Boindl
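To make "SAH cost" concrete, here is a small Python sketch that evaluates a common form of the surface area heuristic for a whole tree: each node contributes its surface area relative to the root, weighted by a traversal cost for internal nodes and by a per-triangle cost for leaves. The tuple-based node layout and the constants c_inner and c_tri are assumptions for illustration.

```python
def surface_area(lo, hi):
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def bounds(node):
    """Leaf: ("leaf", lo, hi, n_tris). Internal: ("node", left, right)."""
    if node[0] == "leaf":
        return node[1], node[2]
    (lo_a, hi_a), (lo_b, hi_b) = bounds(node[1]), bounds(node[2])
    return tuple(map(min, lo_a, lo_b)), tuple(map(max, hi_a, hi_b))

def sah_cost(root, c_inner=1.2, c_tri=1.0):
    """Expected cost of one random ray, up to the assumed constants."""
    root_area = surface_area(*bounds(root))

    def visit(node):
        # Probability that a ray hitting the root also hits this node.
        p = surface_area(*bounds(node)) / root_area
        if node[0] == "leaf":
            return c_tri * p * node[3]
        return c_inner * p + visit(node[1]) + visit(node[2])

    return visit(root)

a = ("leaf", (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 1)
b = ("leaf", (1.0, 0.0, 0.0), (2.0, 1.0, 1.0), 1)
cost = sah_cost(("node", a, b))
```

Recomputing bounds recursively is wasteful; a real builder would store them per node. The sketch only shows what quantity the tree modifications try to reduce.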

Treelets • Allow local tree modification • A, B, C, F are leaves; D, E, G are internal nodes Matthias Boindl

Treelet Construction • Find root: parallel bottom-up traversal • Start with leaves • Use atomic counter at junctions (internal nodes) • Ensures all children have been processed • Build treelet • Add both children • Pick children with highest surface area • Fixed size: 7 leaf nodes Matthias Boindl
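Both steps can be sketched sequentially in Python. The dictionary stands in for the per-node atomic counters, the loop over leaves stands in for one GPU thread per leaf, and the tree encoding (parent and children dictionaries, an area table) is an assumption made for this sketch.

```python
def bottom_up(parent, leaf_ids, visit):
    """Call visit(node) on each internal node after both its subtrees."""
    arrivals = {}  # stands in for one atomic counter per internal node
    for leaf in leaf_ids:  # on the GPU: one thread per leaf
        node = parent.get(leaf)
        while node is not None:
            arrivals[node] = arrivals.get(node, 0) + 1  # atomic add on a GPU
            if arrivals[node] == 1:
                break  # first arrival: the sibling subtree is not done yet
            visit(node)  # second arrival: both children have been processed
            node = parent.get(node)

def form_treelet(root, children, area, n_leaves=7):
    """Grow a treelet from root by expanding the largest treelet leaf."""
    internal, leaves = [root], list(children[root])
    while len(leaves) < n_leaves:
        # Only treelet leaves that are internal nodes of the BVH can expand.
        candidates = [n for n in leaves if n in children]
        if not candidates:
            break
        best = max(candidates, key=lambda n: area[n])
        leaves.remove(best)
        internal.append(best)
        leaves.extend(children[best])
    return internal, leaves

# Balanced example tree: internal nodes 0..6, BVH leaves "l0".."l7".
children = {0: (1, 2), 1: (3, 4), 2: (5, 6), 3: ("l0", "l1"),
            4: ("l2", "l3"), 5: ("l4", "l5"), 6: ("l6", "l7")}
parent = {c: p for p, pair in children.items() for c in pair}
area = {1: 10.0, 2: 8.0, 3: 5.0, 4: 6.0, 5: 4.0, 6: 3.0}
area.update({"l%d" % i: 1.0 for i in range(8)})

order = []
bottom_up(parent, ["l%d" % i for i in range(8)], order.append)
treelet_internal, treelet_leaves = form_treelet(0, children, area)
```

In the example, node 6 has the smallest area among the internal nodes, so it stays a treelet leaf and its subtree is left untouched by the restructuring of this treelet.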

Rearrange Treelet • Minimize treelet SAH cost (surface area of its internal nodes) • Naive implementation: test each permutation • Better: dynamic programming • Caching of best intermediate results • Start with leaves, then pairs, then triplets, … • Suboptimal subtree construction avoided • Parallelizable as well Matthias Boindl
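The dynamic programming idea can be sketched in Python with bitmasks over the treelet leaves. The cost model here is a deliberate simplification: the cost of a subtree is the surface area of its bounds plus the cost of its two children, and a leaf costs its own surface area. The function names and the box representation are assumptions for this sketch.

```python
def surface_area(lo, hi):
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def optimize_treelet(boxes):
    """boxes: list of (lo, hi) treelet leaves. Returns (topology, cost)."""
    n = len(boxes)
    full = (1 << n) - 1
    # Surface area of the union box for every subset of leaves.
    area = [0.0] * (full + 1)
    for s in range(1, full + 1):
        members = [boxes[i] for i in range(n) if (s >> i) & 1]
        lo = tuple(min(b[0][k] for b in members) for k in range(3))
        hi = tuple(max(b[1][k] for b in members) for k in range(3))
        area[s] = surface_area(lo, hi)
    cost = [0.0] * (full + 1)   # cached best cost per subset
    split = [0] * (full + 1)    # cached best left child per subset
    # Every proper subset of s is numerically smaller than s, so increasing
    # order guarantees that smaller subsets are solved first.
    for s in range(1, full + 1):
        if (s & (s - 1)) == 0:  # single leaf
            cost[s] = area[s]
            continue
        low = s & -s            # pin the lowest leaf to the left child
        rest = s ^ low
        best = float("inf")
        q = rest
        while True:
            q = (q - 1) & rest  # next proper subset of rest
            p = low | q
            c = cost[p] + cost[s ^ p]
            if c < best:
                best, split[s] = c, p
            if q == 0:
                break
        cost[s] = area[s] + best

    def build(s):
        if (s & (s - 1)) == 0:
            return s.bit_length() - 1   # leaf index
        return (build(split[s]), build(s ^ split[s]))

    return build(full), cost[full]

# Four unit cubes: two near x = 0 and two near x = 10.
cubes = [((x, 0.0, 0.0), (x + 1.0, 1.0, 1.0)) for x in (0.0, 1.0, 10.0, 11.0)]
topology, total = optimize_treelet(cubes)
```

With 7 leaves there are only 127 non-empty subsets, which is why caching the best result per subset is so much cheaper than testing every possible tree.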

Results • Gap closed Matthias Boindl

Results • Speed/Quality tradeoff Matthias Boindl

Conclusion • Use specialized kernels • Lower execution divergence • (Better use of instruction cache) • (Fewer registers used simultaneously) • Construct acceleration structures quickly • But not too quickly Matthias Boindl

Thanks for your attention!

Logic Kernel • Does not need a queue, operates on all paths • If shadow ray was unblocked, add light contribution • Find material or light source the ray hits • Place path into proper material queue • Russian roulette • If path terminated, accumulate to image • Place path into new path queue • Sample light sources (aka next event estimation) Christian Machacek
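The Russian roulette step can be sketched as follows. The survival probability is taken from the largest throughput channel and capped; both choices, like the function name and the cap value, are common conventions assumed here rather than details given on the slide.

```python
import random

def russian_roulette(throughput, rng, max_survival=0.95):
    """Return the reweighted throughput, or None if the path is terminated."""
    p = min(max_survival, max(throughput))  # survival probability
    if rng.random() >= p:
        return None  # terminated: accumulate to image, slot gets a new path
    # Survivors are scaled by 1/p so the expected throughput is unchanged.
    return tuple(t / p for t in throughput)

rng = random.Random(7)
trials = 20000
total = 0.0
for _ in range(trials):
    survivor = russian_roulette((0.5, 0.25, 0.1), rng)
    if survivor is not None:
        total += survivor[0]
mean_first_channel = total / trials  # stays close to the original 0.5
```

Dividing by the survival probability is what keeps the estimator unbiased while dim paths are removed from the pool early.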

New Path Kernel • Generate a new image-space sample • Generate camera ray • Place it into extension ray cast queue • Initialize path state • Throughput • Pixel position • etc. Christian Machacek
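A sketch of what one invocation of this kernel might do for a single pool slot, in Python. It assumes a pinhole camera at the origin looking along -z, a path state stored as a dictionary, and a plain list as the extension ray queue; none of these details come from the slides.

```python
import math
import random

def new_path(slot, pixel, width, height, fov_y_deg, rng, extension_queue):
    px, py = pixel
    # Generate a new image-space sample: jitter inside the pixel.
    sx = (px + rng.random()) / width
    sy = (py + rng.random()) / height
    # Generate the camera ray through that sample (assumed pinhole camera).
    half_h = math.tan(math.radians(fov_y_deg) / 2.0)
    half_w = half_h * width / height
    d = ((2.0 * sx - 1.0) * half_w, (1.0 - 2.0 * sy) * half_h, -1.0)
    norm = math.sqrt(sum(c * c for c in d))
    ray = {"origin": (0.0, 0.0, 0.0),
           "direction": tuple(c / norm for c in d)}
    # Place the slot into the extension ray cast queue.
    extension_queue.append(slot)
    # Initialize path state: throughput, pixel position, etc.
    return {"pixel": pixel, "throughput": (1.0, 1.0, 1.0),
            "radiance": (0.0, 0.0, 0.0), "ray": ray}

queue = []
state = new_path(slot=0, pixel=(3, 2), width=8, height=6,
                 fov_y_deg=60.0, rng=random.Random(1), extension_queue=queue)
```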

Material Kernels • Generate incoming direction • Evaluate light contribution based on light sample generated in the logic kernel • We haven’t cast the shadow ray yet! • For MIS: p(light sample) from the BSDF • Discard BSDF stack • Queue • extension ray • (shadow ray) Christian Machacek
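The MIS bullet can be illustrated with a short Python sketch: the material stage knows the BSDF, so it can evaluate the BSDF's pdf for the direction of the light sample and weight the light contribution accordingly. The power heuristic with exponent 2 is one common weighting and is an assumption here, as are the scalar (single-channel) quantities and the function names.

```python
def power_heuristic(pdf_a, pdf_b, beta=2.0):
    """MIS weight for a sample drawn from strategy a, given both pdfs."""
    a, b = pdf_a ** beta, pdf_b ** beta
    return a / (a + b) if a + b > 0.0 else 0.0

def light_sample_contribution(throughput, bsdf_value, cos_theta,
                              light_radiance, pdf_light, pdf_bsdf):
    """Contribution to add later, if the shadow ray turns out unblocked."""
    if pdf_light <= 0.0:
        return 0.0
    # pdf_bsdf is p(light sample) evaluated from the BSDF, as on the slide.
    weight = power_heuristic(pdf_light, pdf_bsdf)
    return (throughput * bsdf_value * cos_theta * light_radiance
            * weight / pdf_light)

w_light = power_heuristic(2.0, 1.0)
w_bsdf = power_heuristic(1.0, 2.0)
pending = light_sample_contribution(1.0, 0.5, 0.5, 4.0, 2.0, 0.0)
```

The value is only stored at this point; it is added to the pixel after the shadow ray cast reports that the light is visible.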

Ray Cast Kernels • Extension rays • Find first intersection against scene geometry • Store hit data into path state • Shadow rays • Blocked or not? Christian Machacek
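The two ray types differ only in what they report, which a brute-force Python sketch can show. It uses the Möller-Trumbore ray/triangle test and loops over all triangles instead of traversing a BVH; that simplification, the tuple-based vectors, and the function names are assumptions for illustration.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def cast_extension_ray(origin, direction, triangles):
    """Closest hit as (t, triangle index), or None."""
    best = None
    for i, tri in enumerate(triangles):
        t = intersect_triangle(origin, direction, *tri)
        if t is not None and (best is None or t < best[0]):
            best = (t, i)
    return best

def cast_shadow_ray(origin, direction, triangles, max_t):
    """Blocked or not: any hit closer than the light is enough."""
    return any((t := intersect_triangle(origin, direction, *tri)) is not None
               and t < max_t for tri in triangles)

tris = [((0.0, 0.0, z), (1.0, 0.0, z), (0.0, 1.0, z)) for z in (1.0, 2.0)]
hit = cast_extension_ray((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), tris)
```

A shadow ray can stop at the first hit it finds, while an extension ray must keep searching for the closest one and store the hit data in the path state.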
