Interactive Distributed Ray Tracing of Highly Complex Models • Ingo Wald, University of Saarbrücken • http://graphics.cs.uni-sb.de/~wald • http://graphics.cs.uni-sb.de/rtrt
Reference Model • Power plant (12.5 million triangles)
Previous Work • Interactive Rendering of Massive Models (UNC) • Framework of algorithms • Textured depth meshes (96% reduction in #tris) • View-frustum culling & LOD (50% each) • Hierarchical occlusion maps (10%) • Extensive preprocessing required • Entire model: ~3 weeks (estimated) • Framerate (Onyx): 5 to 15 fps • Needs a shared-memory supercomputer
Previous Work II • Memory-Coherent Ray Tracing, Pharr et al. (Stanford) • Explicit cache management for rays and geometry • Extensive reordering and scheduling • Too slow for interactive rendering • Provides global illumination • Parallel Ray Tracing, Parker et al. (Utah) & Muuss (ARL) • Needs shared-memory supercomputer • Interactive Rendering with Coherent Ray Tracing (Saarbrücken, EG 2001) • IRT on (cheap) PC systems • Avoiding CPU stalls is crucial
Previous Work: Lessons Learned • Rasterization is possible for massive models… but not 'straightforward' (UNC) • Interactive ray tracing is possible (Utah, Saarbrücken) • Easy to parallelize • Cost is only logarithmic in scene size • Conclusion: parallel, interactive ray tracing should work great for massive models
Parallel IRT • Parallel interactive ray tracing • Supercomputer: more threads… • PCs: distributed IRT on a CoW (cluster of workstations) • Distributed CoW: needs fast access to the scene data • Simplistic access to scene data: mmap + caching, all done automatically by the OS • Either: replicate the scene → extremely inflexible • Or: access a single copy of the scene over NFS (mmap) → network issues: latencies and bandwidth
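The 'simplistic' route is easy to picture: map the whole model file and let the OS page triangles in on demand. The fragment below is only an illustration of that idea under assumed names and an assumed flat-triangle file layout; it is not code from the system.

```cpp
// Illustrative sketch (hypothetical layout): mmap the scene file and let the OS
// demand-page it. Every access to scene.tris[i] may fault a page in over NFS,
// stalling the rendering thread; on a 32-bit machine the mapping also cannot
// exceed the ~2 GB address space.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

struct Triangle { float v0[3], v1[3], v2[3]; };

struct MappedScene {
    const Triangle* tris = nullptr;
    std::size_t count = 0;
};

MappedScene mapScene(const char* path) {
    MappedScene scene;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return scene;
    struct stat st;
    if (fstat(fd, &st) == 0) {
        void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (base != MAP_FAILED) {
            scene.tris  = static_cast<const Triangle*>(base);
            scene.count = st.st_size / sizeof(Triangle);
        }
    }
    close(fd);
    return scene;
}
```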
Simplistic Approach • Caching via OS support won't work: • The OS can't even address more than 2 GB of data… • Massive models >> 2 GB! • Also an issue when replicating the scene… • Process stalls due to demand paging • Stalls are very expensive! • Dual 1 GHz PIII: 1 ms stall = 1 million cycles = about 1000 rays! • The OS stalls the process automatically → reordering impossible…
Distributed Scene Access • Simplistic approach doesn’t work… • Need ‘manual’ caching and memory management
Caching Scene Data • 2-level hierarchy of BSP trees • Caching based on self-contained 'voxels' • Clients need only the top-level BSP (a few KB) • Straightforward implementation…
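As a rough picture of the client side, a minimal sketch of the voxel-cache lookup is given below. The names (VoxelID, VoxelCache, requestAsyncFetch) are hypothetical and the voxel payload is left opaque; only the idea of self-contained voxels behind a top-level BSP is taken from the slides.

```cpp
// Minimal sketch (assumed names): top-level BSP leaves reference voxels; the
// client resolves a voxel through its cache and only descends into the voxel's
// own BSP when the data is resident.
#include <cstdint>
#include <unordered_map>
#include <vector>

using VoxelID = uint32_t;

// A voxel is self-contained: it carries its own BSP plus the triangles it references.
struct Voxel {
    std::vector<uint8_t> bspAndTriangles;
};

class VoxelCache {
public:
    // Returns the voxel if resident; otherwise requests an asynchronous fetch
    // and returns nullptr so the caller can suspend the ray instead of stalling.
    const Voxel* lookup(VoxelID id) {
        auto it = cache_.find(id);
        if (it != cache_.end()) return &it->second;
        requestAsyncFetch(id);
        return nullptr;
    }

private:
    void requestAsyncFetch(VoxelID id) {
        (void)id;   // stand-in: enqueue 'id' for a fetcher thread that talks to the model server
    }
    std::unordered_map<VoxelID, Voxel> cache_;
};
```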
Caching Scene Data • Preprocessing: splitting into voxels • Simple spatial sorting (BSP-tree construction) • Out-of-core algorithm due to model size • File-size limit and address space (2 GB) • Simplistic implementation: 2.5 hours • Model server • One machine serves the entire model → a single server is a potential bottleneck! • Could easily be distributed
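A minimal sketch of the splitting idea follows, assuming an in-core triangle array, a hypothetical per-voxel budget, and a stubbed-out voxel writer. The real preprocessing must work out of core and respect the 2 GB file-size and address-space limits, which this sketch ignores.

```cpp
// Sketch of the spatial sort (assumed names): recursively split the triangle set
// at the spatial median until each chunk fits the voxel budget, then write the
// chunk out as one self-contained voxel.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Tri { float centroid[3]; /* vertex data omitted for brevity */ };

const std::size_t kMaxTrisPerVoxel = 100000;   // assumed budget, not the paper's value

// Stand-in for the out-of-core write: stream the chunk to its own voxel file.
void writeVoxelFile(const std::vector<Tri>& tris, int voxelId) {
    std::printf("voxel %d: %zu triangles\n", voxelId, tris.size());
}

void splitIntoVoxels(std::vector<Tri>& tris, int axis, int& nextId) {
    if (tris.size() <= kMaxTrisPerVoxel) {
        writeVoxelFile(tris, nextId++);
        return;
    }
    // Spatial median split along the current axis (BSP-style sorting).
    std::size_t mid = tris.size() / 2;
    std::nth_element(tris.begin(), tris.begin() + mid, tris.end(),
                     [axis](const Tri& a, const Tri& b) {
                         return a.centroid[axis] < b.centroid[axis];
                     });
    std::vector<Tri> right(tris.begin() + mid, tris.end());
    tris.resize(mid);
    splitIntoVoxels(tris,  (axis + 1) % 3, nextId);
    splitIntoVoxels(right, (axis + 1) % 3, nextId);
}
```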
Hiding CPU Stalls • Caching alone does not prevent stalls! • Avoiding stalls → reordering • Suspend rays that would stall on missing data • Fetch missing data asynchronously! • Immediately continue with other rays • Potentially no CPU stall at all! • Resume stalled rays after the data is available • Can only hide some latency → minimize voxel-fetching latencies
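The suspend/resume mechanism might look roughly like the sketch below. RayScheduler, suspend, onVoxelArrived and the Ray contents are hypothetical names used only to illustrate parking rays per missing voxel and resuming them when the fetch completes.

```cpp
// Sketch of the reordering idea (assumed types): rays that would stall on a
// missing voxel are parked per voxel and moved to the ready queue once the
// asynchronous fetch has delivered the data.
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

struct Ray { /* origin, direction, pixel id, traversal state ... */ };
using VoxelID = uint32_t;

class RayScheduler {
public:
    // Called when traversal reaches a voxel that is not in the cache.
    void suspend(VoxelID missing, const Ray& ray) {
        suspended_[missing].push_back(ray);   // park the ray; the caller picks another ready ray
    }

    // Called by the fetcher thread once the voxel data has arrived.
    void onVoxelArrived(VoxelID id) {
        auto it = suspended_.find(id);
        if (it == suspended_.end()) return;
        for (const Ray& r : it->second) ready_.push_back(r);
        suspended_.erase(it);
    }

    // The work loop pulls resumable rays from here instead of stalling.
    bool nextReady(Ray& out) {
        if (ready_.empty()) return false;
        out = ready_.front();
        ready_.pop_front();
        return true;
    }

private:
    std::unordered_map<VoxelID, std::vector<Ray>> suspended_;
    std::deque<Ray> ready_;
};
```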
Reducing Latencies • Reduce network latencies • Prefetching? Hard to predict data accesses several ms in advance! • Latency is dominated by transmission time (100 Mbit/s: 1 MB = 80 ms = 160 million cycles!) • → Reduce the transmitted data volume
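The quoted figures follow from a short back-of-the-envelope calculation, assuming the cycle count refers to both CPUs of a dual 1 GHz node:

```latex
\[
  t_{\mathrm{xfer}} = \frac{1\,\mathrm{MB}}{100\,\mathrm{Mbit/s}}
                    = \frac{8\,\mathrm{Mbit}}{100\,\mathrm{Mbit/s}} = 80\,\mathrm{ms},
  \qquad
  80\,\mathrm{ms} \times 2 \times 10^{9}\,\tfrac{\mathrm{cycles}}{\mathrm{s}}
  = 1.6 \times 10^{8}\ \mathrm{cycles}.
\]
```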
Reducing Bandwidth • Compression of voxel data • The LZO library provides roughly 3:1 compression • Compared to the original transmission time, the decompression cost is negligible! • Dual-CPU system: sharing of the voxel cache • Amortizes bandwidth, storage and decompression effort over both CPUs… → even better for more CPUs
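One possible shape of a voxel cache shared by the two CPUs of a node is sketched below. The decompression and network calls are stand-ins (the system uses the LZO library and asynchronous fetches; here a blocking call keeps the sketch short), and all names are assumptions.

```cpp
// Sketch of a node-local cache shared by both CPUs (assumed names): whichever
// thread misses first fetches and decompresses the voxel once; the other thread
// finds it resident and pays neither bandwidth nor decompression cost.
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

using VoxelID = uint32_t;
struct Voxel { std::vector<uint8_t> data; };

// Stand-in for the LZO decompression call used by the system.
Voxel decompress(const std::vector<uint8_t>& compressed) {
    return Voxel{compressed};
}

// Stand-in for the (really asynchronous) request to the model server.
std::vector<uint8_t> fetchCompressedVoxel(VoxelID /*id*/) {
    return {};
}

class SharedVoxelCache {
public:
    std::shared_ptr<const Voxel> get(VoxelID id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(id);
        if (it != cache_.end()) return it->second;   // hit: free for the second CPU
        auto voxel = std::make_shared<const Voxel>(decompress(fetchCompressedVoxel(id)));
        cache_[id] = voxel;
        return voxel;
    }

private:
    std::mutex mutex_;
    std::unordered_map<VoxelID, std::shared_ptr<const Voxel>> cache_;
};
```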
Load Balancing • Demand-driven distribution of image tiles (32x32) • Buffering of work tiles on the client → avoids communication latency • Frame-to-frame coherence improves caching • Keep rays on the same client • Simple: keep tiles on the same client (implemented) • Better: assign tiles based on reprojected pixels (future work)
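A server-side sketch of demand-driven tile assignment with simple tile-to-client affinity is shown below. The names and the affinity policy are assumptions, and client-side tile buffering is omitted; the intent is only to show how keeping tiles on the same client keeps that client's voxel cache warm.

```cpp
// Sketch (assumed names): the display/model server hands out 32x32 tiles on
// demand, preferring to give a client the tiles it rendered last frame.
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

struct Tile { int x, y; };   // top-left corner of a 32x32 tile

class TileScheduler {
public:
    void beginFrame(const std::vector<Tile>& allTiles) {
        pending_.assign(allTiles.begin(), allTiles.end());
    }

    // Called when a client asks for work; returns false when the frame is done.
    bool requestTile(int clientId, Tile& out) {
        // Prefer a tile this client rendered last frame (cache coherence).
        for (auto it = pending_.begin(); it != pending_.end(); ++it) {
            auto owner = lastOwner_.find(key(*it));
            if (owner != lastOwner_.end() && owner->second == clientId) {
                out = *it;
                pending_.erase(it);
                return true;
            }
        }
        if (pending_.empty()) return false;
        out = pending_.front();
        pending_.pop_front();
        lastOwner_[key(out)] = clientId;   // remember the assignment for the next frame
        return true;
    }

private:
    static uint64_t key(const Tile& t) { return (uint64_t(t.x) << 32) | uint32_t(t.y); }
    std::deque<Tile> pending_;
    std::unordered_map<uint64_t, int> lastOwner_;
};
```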
Results • Setup • Seven dual Pentium-III 800–866 MHz machines as rendering clients • 100 Mbit FastEthernet • One display & model server (same machine) • GigabitEthernet (already necessary for the pixel data alone) • Power plant performance • 3–6 fps in the pure C implementation • 6–12 fps with SSE support
Animation: Framerate vs. Bandwidth • Latency hiding works!
Scalability • The server becomes a bottleneck beyond 12 CPUs → distribute the model server!
Performance: Detail Views • Framerate (640x480): 3.9–4.7 fps (seven dual P-III 800–866 MHz machines, no SSE)
Shadows and Reflections • Framerate: 1.4–2.2 fps (no SSE)
Conclusions • IRT works great for highly complex models! • Distribution issues can be solved • At least as fast as sophisticated hardware techniques • Less preprocessing • Cheap • Simple & easy to extend (shadows, reflections, shading, …)
Future Work • Smaller cache granularity • Distributed scene server • Cache-coherent load balancing • Dynamic scenes & instances • Hardware support for ray-tracing
Acknowledgments • Anselmo Lastra, UNC • Power plant reference model … other complex models are welcome…
Questions ? For further information visit http://graphics.cs.uni-sb.de/rtrt
Detailed View of Power Plant • Framerate: 4.7 fps (seven dual P-III 800–866 MHz machines, no SSE)
Detail View: Furnace • Framerate: 3.9 fps (no SSE)
Overview • Reference Model • Previous Work • Distribution Issues • Massive Model Issues • Images & Demo