Afrigraph Tutorial B: Interactive Ray-Tracing. Ingo Wald Philipp Slusallek Saarland University Computer Graphics Group http://graphics.cs.uni-sb.de. For almost 20 years, researchers have argued that eventually, Ray-Tracing will become faster than rasterization.
And nothing happened... Well, almost...
Tutorial on Interactive Raytracing
UNC Powerplant (12.5 Mtris, >10 fps)
Four Power Plants (50 Mtris)
Tutorial Overview
• Introduction
  • Introduction to Ray-Tracing
  • Discussion: Ray-Tracing versus Rasterization
• Previous Work
  • Approximating Ray-Tracing
  • Accelerated Ray-Tracing
• Interactive Ray-Tracing on PCs
  • Coherent Ray-Tracing Implementation
  • Comparisons (SW / HW)
  • Distributed RT of Massive Models
• Outlook: Hardware-Architectures for Ray-Tracing
• Future Research and Conclusions
Introduction to Ray-Tracing
• In principle: Very simple algorithm
• For each pixel
  • Create ray through that pixel
  • Cast ray into scene and find closest intersection
  • “Shade” ray at intersection point
• Can also shoot new rays during shading:
  • Determine visibility of point lights by “shadow rays”
  • Compute reflected/refracted light by recursively tracing reflection-/refraction-rays
• Basically, that's all…
Ray-Tracing Algorithm
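The per-pixel loop from the introduction slide can be sketched as follows. This is only a minimal illustration, not the tutorial's implementation: the sphere-only scene, the pinhole camera at the origin, the single point light, and all names (`closestHit`, `shade`, `render`) are assumptions made for this example, and the closest intersection is found by exhaustive search rather than with an index structure.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 normalize(Vec3 v) { return v * (1.0f / std::sqrt(dot(v, v))); }

struct Ray { Vec3 org, dir; };            // dir is assumed to be normalized
struct Sphere { Vec3 center; float radius; };
struct Hit { float t; int id; };          // id < 0 means "no hit"

// Closest intersection by exhaustive search (hypothetical helper).
// Rays that start inside a sphere are ignored in this sketch.
Hit closestHit(const std::vector<Sphere>& scene, const Ray& ray) {
    Hit best{std::numeric_limits<float>::infinity(), -1};
    for (int i = 0; i < static_cast<int>(scene.size()); ++i) {
        Vec3 oc = ray.org - scene[i].center;
        float b = dot(oc, ray.dir);
        float c = dot(oc, oc) - scene[i].radius * scene[i].radius;
        float disc = b * b - c;
        if (disc < 0.0f) continue;                 // ray misses this sphere
        float t = -b - std::sqrt(disc);            // nearer of the two roots
        if (t > 1e-4f && t < best.t) best = {t, i};
    }
    return best;
}

// "Shade" the hit point: diffuse term for one point light, with a shadow ray.
float shade(const std::vector<Sphere>& scene, const Ray& ray, const Hit& hit, Vec3 light) {
    if (hit.id < 0) return 0.0f;                   // background
    Vec3 p = ray.org + ray.dir * hit.t;
    Vec3 n = normalize(p - scene[hit.id].center);
    Vec3 toLight = light - p;
    float dist = std::sqrt(dot(toLight, toLight));
    Ray shadow{p + n * 1e-3f, toLight * (1.0f / dist)};
    Hit blocker = closestHit(scene, shadow);
    if (blocker.id >= 0 && blocker.t < dist) return 0.0f;   // light is occluded
    float cosTheta = dot(n, shadow.dir);
    return cosTheta > 0.0f ? cosTheta : 0.0f;
}

// For each pixel: create a ray, find the closest intersection, shade it.
std::vector<float> render(const std::vector<Sphere>& scene, Vec3 light, int width, int height) {
    std::vector<float> image(width * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float u = (x + 0.5f) / width * 2.0f - 1.0f;
            float v = 1.0f - (y + 0.5f) / height * 2.0f;
            Ray ray{{0.0f, 0.0f, 0.0f}, normalize({u, v, -1.0f})};
            image[y * width + x] = shade(scene, ray, closestHit(scene, ray), light);
        }
    return image;
}
```

Reflection and refraction rays would be added by calling the same trace step recursively from `shade`; they are left out here to keep the sketch short.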
Introduction to Ray-Tracing
• Only three main components:
  • Generating rays
  • Finding the closest intersection of a ray
    • Ray traversal
    • Ray-object intersection
  • Shading
Ray-Generation
• Generate initial ray for each pixel
• Other camera models are trivial…
  • Fisheye lens
  • Non-linear distortions/Lens effects
  • Motion blur, depth of field
  • …
• Options
  • More samples for anti-aliasing
  • Adaptive Sampling
  • Combine with IBR
    • E.g. “RenderCache”: Reuse samples by reprojection
Ray-Traversal
• Need to find objects quickly
  • “Exhaustive” search infeasible
• Build spatial index structure
  • Grid, octree, BSP-tree, BVH, ...
• Advantages
  • Logarithmic complexity
  • Occlusion culling
    • “Early ray termination”
• Problems
  • Multiple intersection computations (objects often in multiple voxels)
  • Dynamic scenes?
[Figures: Grid (2D), Octree (2D)]
Ray-Object-Intersection
• Need to compute intersections fast
  • Requires many floating point operations
  • But typically dominated by traversal (2:1)
• Plenty of algorithms
  • Plenty of primitives
  • Even for triangles
• Optimizations
  • Use SIMD CPU-extensions (SSE, AltiVec, 3D-Now!)
    • Data parallel execution
  • Proper caching of data
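As one example of the "plenty of algorithms" for triangles, here is a sketch of the widely used Möller–Trumbore ray-triangle test. It is chosen here only for illustration and is not necessarily the method used in the tutorial's system; the type and function names are assumptions of this example.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns true and writes the hit distance t if the ray org + t * dir (t > 0)
// hits the triangle (a, b, c). Both triangle orientations are accepted.
bool intersectTriangle(Vec3 org, Vec3 dir, Vec3 a, Vec3 b, Vec3 c, float& t) {
    const float eps = 1e-7f;
    Vec3 e1 = sub(b, a), e2 = sub(c, a);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;        // ray parallel to triangle plane
    float inv = 1.0f / det;
    Vec3 s = sub(org, a);
    float u = dot(s, p) * inv;                      // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;                    // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;
    return t > eps;                                 // reject hits behind the origin
}
```

The early-out branches keep the scalar version cheap, but they are also what makes a data-parallel (SIMD) version harder to write, which is the trade-off the optimization bullets point at.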
Shading
• Lots of reflection models possible
  • Phong, Cook-Torrance, Ward, …
  • Direct use of Shading Languages (RenderMan)
• Shading after visibility has been computed
  • No overhead due to overdraw
  • Every ray is shaded exactly once
• Can generate new rays
  • Shadow, reflection, transmission, ...
  • Need to deal with recursion
• Rendering cost linear in #rays traced
Introduction to Ray-Tracing
• Only three main components:
  • Generating rays
  • Finding the closest intersection of a ray
    • Ray traversal
    • Ray-object intersection
  • Shading
• Problem:
  • “Find closest intersection” is very expensive
  • And: Lots of rays per image…
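To get a feel for "lots of rays per image", here is a rough count of primary rays only. It assumes a 1024² image at 10 frames per second (figures that appear elsewhere in these slides) and exactly one primary ray per pixel; shadow and secondary rays, which are ignored in this estimate, would multiply the number.

```latex
\[
1024^2 \ \text{pixels} \times 1 \ \tfrac{\text{ray}}{\text{pixel}} \times 10 \ \tfrac{\text{frames}}{\text{s}}
= 1{,}048{,}576 \times 10 \approx 1.05 \times 10^{7} \ \tfrac{\text{rays}}{\text{s}}
\]
```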
In Contrast: Rasterization
• Efficient HW implementation
  • Use of object coherence
  • Many new features
• Rendering is driven by App.
  • Application submits geometry
• Visibility determined at end
  • Z-buffer fragment test
[Figure: Rasterization Pipeline: Application, T&L, Vertex Ops, Rasterization, Texturing, Fragment Ops, Fragment Tests, Framebuffer]
Rasterization Drawbacks
Drawbacks of this approach
• Use of object coherence
  • Only if triangle is large
• Rendering is driven by App.
  • Application has to know what is visible…
  • Efficient occlusion culling is hard
• Visibility determined at end
  • Overdraw: Discard all but one fragment
  • High depth complexity: very inefficient
Ray-Tracing versus Rasterization
• Flexibility
  • Handling unstructured groups of rays
  • Image-based rendering, reflections, shadows…
• Generality
  • Ray-Tracing is the basis for many algorithms
    • Global illumination, visibility, …
  • Used in many disciplines
    • Physics, Biology, Chemistry, Telecom, …
Ray-Tracing versus Rasterization
• Simple and Efficient Shading
  • Shading happens after visibility computation
  • Direct use of Shading Languages
• Correctness & Image Quality
  • Rasterization inherently relies on approximations
    • Environment maps, shadow maps, ...
  • Ray-traced images are “correct” by default
    • “True” reflections and shadows…
    • Use of approximations is optional
Ray-Tracing versus Rasterization
• Parallel Scalability
  • Ray-Tracing is “embarrassingly parallel” (e.g. each pixel is independent of all others)
  • Scales well with the available hardware
  • Needs fast access to scene database
Ray-Tracing versus Rasterization
• Scalability with Scene Size: Occlusion Culling & Logarithmic Complexity
  • RT never even looks at invisible geometry
  • RT traversal allows for efficient searching: O(log N)
  • Rasterization shows linear behavior: O(N)
  • RT wins for complex scenes
  • But rasterization is improving
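A back-of-the-envelope illustration of the logarithmic term, using the two power-plant triangle counts quoted earlier. It assumes an idealized balanced binary tree with one triangle per leaf, which real index structures only approximate:

```latex
\[
\log_2\!\left(12.5 \times 10^{6}\right) \approx 23.6,
\qquad
\log_2\!\left(50 \times 10^{6}\right) = \log_2\!\left(12.5 \times 10^{6}\right) + 2 \approx 25.6
\]
```

Under that assumption, quadrupling the scene adds only about two traversal levels per ray, whereas an O(N) pass over the geometry has to touch four times as many triangles.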
Ray-Tracing versus Rasterization
• Coherence
  • Key to efficient rendering
• Rasterization: Object coherence
  • Allows for efficient HW implementation
  • But only really efficient for large triangles
• Ray-Tracing: Ray coherence
  • Improved caching & reduced bandwidth
  • Allows for data parallel computation
  • RT has much more coherence than assumed
  • But harder to exploit…
Ray-Tracing versus Rasterization
• Conclusion of that Comparison
  • Ray Tracing has many advantages
  • These advantages become ever more pronounced
  • Not only quality, also efficiency…
• But: Ray-Tracing is (still) costly
  • Have to make it faster!
Previous and Related Work
Two ways to achieve ray-tracing-like quality interactively:
• Trace fewer rays per frame: “Approximative ray-tracing”
  • Rasterization hardware
  • Image-based techniques
  • Interpolation of ray-traced results
• Trace more rays/sec: “Accelerated ray-tracing”
  • Better data structures
  • Better algorithms
  • Better implementations
  • Parallel processing
Approximated Ray-Tracing: Rasterization Hardware
• “HW-Accelerated” vista/shadow buffers
  • Compute visible geometry in HW
  • Lookup of geometry in frame buffer
  • Only works for primary rays and point lights
  • Creates artifacts (e.g. shadow buffer resolution)
• Augmenting hardware with RT effects
  • Selective ray-tracing
  • Integrate ray-tracing with OpenGL rendering
    • Rasterization for diffuse objects
    • Textures or splatting [Stamminger/Haber 00/01] for ray-traced samples
Approximated Ray-Tracing: Corrective Textures
Approximated Ray-Tracing: Image-Based Techniques
• RenderCache [Walter et al. 99]
  • Store ray samples per pixel (color, depth, ...)
  • Reproject samples for next frame
  • Detect and fill holes by sending a few new rays
    • Heuristic algorithms based on neighborhood
  • Locate and correct errors (shadow, etc.)
    • Pseudo-randomly sample a few other pixels
    • Adaptively sample near error regions
• But: Reprojection and heuristics are expensive
  • Pays off (only) when pixels are very expensive to compute directly (e.g. global illumination)
  • Scales badly with #CPUs
Approximated Ray-Tracing: Image-Based Techniques
• Holodeck [Ward 98]
  • Similar to RenderCache, but
    • Long term storage of ray samples on disk
    • Fast access to samples based on grid structure
    • Builds light-field-like data representation
Approximated Ray-Tracing: Image-Based Techniques
• Interpolation in the image plane
  • Pixel-selected ray-tracing [Akimoto, 89]
    • Coarse sampling grid
    • Adaptive refinement based on error criteria
    • Linear interpolation between samples
  • General ray interpolation [Bala, 99]
    • Object-/Ray-/Image-Space
    • Time
    • Error bounded
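The "coarse sampling grid plus linear interpolation" idea can be sketched as below. This is a deliberately simplified illustration and not the algorithm of either cited paper: there is no error-driven adaptive refinement, the callback `trace` stands in for tracing and shading one pixel, and the function name and the restriction on image size are assumptions of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Trace only every `step`-th pixel in x and y, then fill in the remaining
// pixels by bilinear interpolation between the four surrounding samples.
// Assumption: (width - 1) and (height - 1) are multiples of `step`, so the
// last row and column are themselves sample positions.
std::vector<float> renderInterpolated(int width, int height, int step,
                                      const std::function<float(int, int)>& trace) {
    assert(step > 0 && (width - 1) % step == 0 && (height - 1) % step == 0);
    std::vector<float> img(width * height, 0.0f);

    // Pass 1: expensive ray-traced samples on the coarse grid.
    for (int y = 0; y < height; y += step)
        for (int x = 0; x < width; x += step)
            img[y * width + x] = trace(x, y);

    // Pass 2: cheap interpolation everywhere else (reads grid samples only).
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (x % step == 0 && y % step == 0) continue;   // already traced
            int x0 = x / step * step, y0 = y / step * step;
            int x1 = std::min(x0 + step, width - 1);
            int y1 = std::min(y0 + step, height - 1);
            float fx = float(x - x0) / step, fy = float(y - y0) / step;
            float top = img[y0 * width + x0] + (img[y0 * width + x1] - img[y0 * width + x0]) * fx;
            float bot = img[y1 * width + x0] + (img[y1 * width + x1] - img[y1 * width + x0]) * fx;
            img[y * width + x] = top + (bot - top) * fy;
        }
    return img;
}
```

With `step = 4` on a 9×9 image, only 9 of the 81 pixels are traced. An adaptive scheme would additionally compare neighboring samples and trace more pixels wherever they differ by more than some error threshold.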
Accelerated Ray-Tracing: Better Data Structures/Algorithms
• “Best” data structure (Grid vs BSP vs…)?
  • Always scene and implementation dependent
  • In practice, most do about equally well…
• Well-researched topic
  • “New” data structures are unlikely to be found
• But: Potential for better algorithms:
  • Can we better exploit coherence?
  • Can we build data structures faster?
  • Can we build data structures fully automatically?
• Also: Need for dynamic data structures
Accelerated Ray-Tracing: Parallelization on Supercomputers
• RT of large CSG models [Muuss 95]
  • Motivation: Interactively render complex data sets
  • Idea: Use ray-tracing
    • Flexibility: Avoid tessellation of CSG models
    • Take advantage of logarithmic complexity of RT
    • Exploit parallelism
  • Implementation
    • Optimized, general RT algorithm
    • 96 CPUs, SGI PowerChallenge, shared memory
  • Results
    • 1-2 frames per second @ video resolution (in '95!)
Accelerated Ray-Tracing: Parallelization on Supercomputers
• Utah Parallel RT System [Parker 99]
  • Similar approach to Muuss
  • Parallelization on shared memory machine
  • Supports general primitives and volume data sets
• Results
  • Has shown scalability up to 128 CPUs
  • Importance of caching analysis
  • New goal: interactive visual cues for visualization (same information at less cost)
IRT on PCs: What to keep in mind
• PC hardware has changed dramatically
  • Processors have become much faster
    • But increase in ray-tracing speed is gradual
  • Increasing gap between speed of CPU and memory
    • But ray-tracing algorithm did not change
  • SIMD extensions
    • Flops become increasingly cheap
    • But difficult to take advantage of in ray-tracing
  • Fast (and cheap) networking & networks of PCs
    • But good performance on non-shared-memory is hard
    • Small clusters are around everywhere…
IRT on PCs: What to keep in mind
• PC hardware has changed dramatically
  • Have to adapt our algorithms!
• Special emphasis on
  • Keeping the CPU busy
  • Memory & Caching (1 cache miss can cost several triangle intersections)
  • SIMD
• Not so important any more:
  • Instruction count, avoiding float ops
General Optimizations: Cache
Main memory is too slow for CPU (1:10) (bandwidth and latency)
• Keep relevant data in caches
  • Design algorithms for cache reuse (coherence)
  • Align data to cache lines (32 bytes)
• Separate data according to usage
  • Separate volatile from non-volatile data
  • Store intersection data separate from shading data (e.g. shading normals not needed for intersection)
• Prefetch data
  • Design algorithms to enable data access prediction
General Optimizations: Cache
Cache Reuse Example: Triangle Data Structure
• Variant 1: struct Triangle { Vec3f *a, *b, *c; };
  • Intersect() routine works on this structure
  • Prefetching hard (2 levels of indirection)
  • Data stored in 4 different memory regions (1 struct + 3 vectors)
  • Worst case: 8 cache misses (if each of the 4 data items straddles a cache-line border)
General Optimizations: Cache
Cache Reuse Example: Triangle Data Structure
• Variant 2: With preprocessed intersection data
  • All necessary data packed into 48 aligned bytes (see paper)
  • Con: Additional data to store (48 bytes/triangle)
  • But several advantages:
    • At most 2 cache misses
    • 1 contiguous memory region, so trivial to prefetch
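The two variants can be contrasted in code. The packed layout below is a hypothetical stand-in, not the layout from the paper the slide refers to: it simply stores one vertex and two edge vectors in three padded 16-byte rows so that the record is 48 bytes and 16-byte aligned. The names `TriangleIndirect`, `TrianglePacked`, `pack`, and `cacheLinesTouched` are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Vec3f { float x, y, z; };

// Variant 1 from the slide: three pointers into vertex storage elsewhere.
struct TriangleIndirect { Vec3f *a, *b, *c; };

// Variant 2 (hypothetical layout): everything the intersection test needs,
// copied into one contiguous, aligned 48-byte record.
struct alignas(16) TrianglePacked {
    float a[4];    // vertex a        (4th float is padding)
    float e1[4];   // edge b - a      (4th float is padding)
    float e2[4];   // edge c - a      (4th float is padding)
};
static_assert(sizeof(TrianglePacked) == 48, "three 16-byte rows");
static_assert(alignof(TrianglePacked) == 16, "record is 16-byte aligned");

// Preprocessing step, done once when the scene is loaded.
TrianglePacked pack(const TriangleIndirect& t) {
    assert(t.a && t.b && t.c);
    TrianglePacked p = {};
    p.a[0] = t.a->x;  p.a[1] = t.a->y;  p.a[2] = t.a->z;
    p.e1[0] = t.b->x - t.a->x;  p.e1[1] = t.b->y - t.a->y;  p.e1[2] = t.b->z - t.a->z;
    p.e2[0] = t.c->x - t.a->x;  p.e2[1] = t.c->y - t.a->y;  p.e2[2] = t.c->z - t.a->z;
    return p;
}

// Number of cache lines touched by `size` bytes starting at byte offset `addr`,
// for the 32-byte line size quoted on the earlier slide.
std::size_t cacheLinesTouched(std::size_t addr, std::size_t size, std::size_t line = 32) {
    return (addr + size - 1) / line - addr / line + 1;
}
```

A 48-byte record at any 16-byte-aligned offset touches exactly two 32-byte lines, which matches the "at most 2 cache misses" bullet. An unaligned 12-byte `Vec3f` can itself straddle a line border (for example at offset 24), which is where the worst case of Variant 1 comes from.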
General Optimizations: Cache
• This was only one example: Similarly for
  • BSP Nodes (even more important)
  • Triangle lists
  • Materials
  • Shading Data
  • …
General Optimizations: Simplification
Today's CPUs have very long pipelines
• Simplify the code to avoid pipeline stalls
• Choose simple algorithms
  • “KISS” wins… (KISS = keep it simple and stupid)
  • E.g. BSP-tree traversal simpler than grids
  • Easier to maintain and optimize (e.g. prefetching)
• Write tight inner loops
  • E.g. better caching and handling of branches
• Avoid conditionals/relative jumps in inner loops
  • E.g. support only triangles
• Avoid memory-access stalls
  • Caching, caching, caching!
Optimization: SIMD Extensions
Most CPUs provide SIMD extensions
Intel: SSE (Others: 3D-Now!, AltiVec, ...)
• Use SIMD: higher speed & lower bandwidth
  • Up to four parallel floating point operations
    • For the cost of 1!
  • Fetch data once to reduce bandwidth to cache
    • Amortize loading cost over 4 operations
    • Factor 4 in bandwidth reduction
• Overhead due to restricted instruction set
  • E.g. no “SSE dot product”
• Con: Programming in assembly language
Optimization: SIMD Extensions
How to use SIMD Extensions?
• Either: Instruction-parallel
  • Combine 4 computations in “normal” algorithm
  • E.g. the 4 mults in a dot product
• Or: Data-parallel
  • Run algorithm on 4 different data in parallel
  • E.g. 4 independent dot products
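The data-parallel variant ("4 independent dot products") can be sketched with compiler intrinsics instead of hand-written assembly. Using intrinsics from `<xmmintrin.h>` is a choice made for this example and is not something the slides prescribe; the structure-of-arrays type `Vec3x4`, the function `dot4`, and the scalar fallback for non-SSE targets are likewise assumptions of this sketch.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   // SSE intrinsics (x86 only)
#define SKETCH_HAVE_SSE 1
#endif

// Four 3D vectors in structure-of-arrays form: lane i holds vector i.
struct Vec3x4 { float x[4], y[4], z[4]; };

// Computes out[i] = dot(a_i, b_i) for i = 0..3.
// With SSE this takes 3 multiplies and 2 adds in total, each on 4 lanes.
void dot4(const Vec3x4& a, const Vec3x4& b, float out[4]) {
    assert(out != nullptr);
#ifdef SKETCH_HAVE_SSE
    __m128 r = _mm_mul_ps(_mm_loadu_ps(a.x), _mm_loadu_ps(b.x));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(a.y), _mm_loadu_ps(b.y)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(a.z), _mm_loadu_ps(b.z)));
    _mm_storeu_ps(out, r);
#else
    // Portable fallback with the same lane-wise structure.
    for (int i = 0; i < 4; ++i)
        out[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
#endif
}
```

No horizontal add across lanes is needed here, which is why this form sidesteps the missing "SSE dot product" instruction. The instruction-parallel variant would instead have to sum the lanes of a single register after the multiply.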
SIMD: Intersection
• SIMD best used in data parallel fashion
  • Little instruction-level parallelism (in RT)
  • Just doesn't work…
• Data parallel: 1 ray, 4 triangles
  • Hard to always have four triangles ready
  • Data parallel traversal for 1 ray?
• Data parallel: 4 rays, 1 triangle
  • Must traverse rays in parallel (ray packets)
  • Standard intersection code
  • Overhead for terminated rays (e.g. 1 ray hits, 3 rays miss)
SIMD: Intersection
• Performance Results
  • Comparison against already optimized C code
  • Amortized cost for SSE code: 20-36 million intersections/sec! (P-III, 800 MHz)
SIMD: BSP-Traversal
• Recursive Traversal Algorithm
SIMD: BSP-Traversal
• SIMD-Traversal
  • Traverse four rays in parallel
    • Intersection with split plane & traversal decision
    • Combine decision flags
  • All rays must perform the same traversal
    • Make sure order is consistent
    • Easy to guarantee: Same ray origin or same signs of direction vector
• Avoid recursive function calls
  • Maintain stack manually
• Worst case: as bad as before…
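The packet traversal described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the tutorial's code: the lanes are processed with plain loops instead of SSE instructions, leaves are only recorded rather than intersected, there is no early exit once all rays have found a hit, and the node layout and all names (`RayPacket4`, `Node`, `classify`, `traversePacket`) are made up for this example. As on the slide, it assumes that along every axis all four ray directions are nonzero and share the same sign.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Four rays in structure-of-arrays form, with each ray's current interval.
struct RayPacket4 {
    float org[3][4], dir[3][4];
    float tNear[4], tFar[4];
};

struct Node { int axis; float split; int left, right; };   // axis < 0 marks a leaf

enum class Traverse { NearOnly, FarOnly, Both };

// Per-ray distance to the split plane, then ONE combined decision for the packet.
Traverse classify(const RayPacket4& p, const Node& n, float d[4]) {
    bool allNear = true, allFar = true;
    for (int i = 0; i < 4; ++i) {
        d[i] = (n.split - p.org[n.axis][i]) / p.dir[n.axis][i];
        allNear = allNear && d[i] >= p.tFar[i];    // plane lies beyond the interval
        allFar = allFar && d[i] <= p.tNear[i];     // plane lies before the interval
    }
    if (allNear) return Traverse::NearOnly;
    if (allFar) return Traverse::FarOnly;
    return Traverse::Both;                         // at least one ray needs both sides
}

struct StackEntry { int node; float tNear[4], tFar[4]; };

// Returns the leaf indices in the order the packet visits them.
std::vector<int> traversePacket(const std::vector<Node>& tree, RayPacket4 p) {
    assert(!tree.empty());
    std::vector<int> visited;
    std::vector<StackEntry> stack;                 // manual stack instead of recursion
    int node = 0;                                  // root is assumed to be tree[0]
    for (;;) {
        while (tree[node].axis >= 0) {
            const Node& n = tree[node];
            // Same direction sign for all rays, so ray 0 decides the child order.
            bool negative = p.dir[n.axis][0] < 0.0f;
            int nearChild = negative ? n.right : n.left;
            int farChild = negative ? n.left : n.right;
            float d[4];
            Traverse t = classify(p, n, d);
            if (t == Traverse::NearOnly) { node = nearChild; continue; }
            if (t == Traverse::FarOnly) { node = farChild; continue; }
            StackEntry e;
            e.node = farChild;
            for (int i = 0; i < 4; ++i) {          // split each ray's interval at the plane
                e.tNear[i] = std::max(d[i], p.tNear[i]);
                e.tFar[i] = p.tFar[i];
                p.tFar[i] = std::min(d[i], p.tFar[i]);
            }
            stack.push_back(e);
            node = nearChild;
        }
        visited.push_back(node);                   // a real tracer intersects triangles here
        if (stack.empty()) return visited;
        StackEntry e = stack.back();
        stack.pop_back();
        node = e.node;
        std::copy(e.tNear, e.tNear + 4, p.tNear);
        std::copy(e.tFar, e.tFar + 4, p.tFar);
    }
}
```

A ray whose interval becomes empty (`tNear > tFar`) is still carried along with the packet. That is the overhead the slides mention: the packet visits the union of the nodes its rays need, so in the worst case nothing is saved over tracing the four rays one by one.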
SIMD: BSP-Traversal
• Overhead of SIMD-Traversal (in %)
  • Fixed resolution at 1024² (left), fixed 2x2 packet (right)
• Traversal still dominates rendering cost
• Overall speedup factor: 2 to 2.3