An Interactive Out-of-Core Framework for VisualizingMassively Complex Models Ingo Wald MPI Informatik Andreas Dietrich, Philipp Slusallek Saarland University
Outline • Motivation • Rendering complex models • Our Challenge: The „Boeing 777“ model • Our Approach • Out-of-core ray tracing for massive models • Memory management scheme • Proxy mechanism for representing not-yet-loaded data • Results • Conclusion and Future Work
Motivation – Are there „complex models" anymore? • Today: Steeply rising graphics performance • Faster GPUs (100+ million tris/s) • „Moore's Law": Performance doubles every 1.5 years… • But: Model complexity rising (at least) as fast • Higher performance is used up as soon as it is available • Best example: Games... • CAD&VR used for ever larger engineering projects • Collaboration of more and more designers • Each of whom models „his part" at full accuracy… • Immensely complex models
Complex Models: Previous Work • Brute force rasterization • Use fastest available graphics hardware • PC GFX cards today: ~100 MTri/s • Even at theoretical peak performance: several seconds per frame • Usually try to reduce #triangles to be rendered • Mesh simplification • Edge collapse, vertex removal, remeshing, etc. • Often requires „useful" input meshes
Complex Models: Previous Work • Occlusion Culling • Visibility preprocessing (from region, from point, etc.) • Hierarchical Z-Buffer • Can only help if there is enough occlusion • System solutions (MMR, GigaWalk, iWalk) • Build on combination of ideas • Visibility precomputation + occlusion culling + LODs + … • Problem: Individual techniques already problematic • Complex precomputation and data structures • Often suffer from artifacts (popping, etc.)
Complex Models: Previous Work • QSplat • Hierarchical point-sampled representation • Best for locally smooth meshes • Problematic for high depth complexity • Randomized Z-Buffer • Randomly selects subset of triangles to be rendered • Best for almost-random data (tree leaves etc.) • Several Others • Impostors, Textured depth meshes, …
Today‘s Challenge: The „Boeing 777“ • 350 million triangles • 12 GB just for vertex positions, 35-70GB incl BSPs • Complex geometrical structure • Unstructured „soup“ of triangles • Often self-intersecting, coplanar, and badly shaped • Complex interwoven parts like pipes, cables, … • Low degree of occlusion • Goal: Render interactively on single PC • Dual-Opteron 246 (1.8GHz) w/ 6GB RAM (or less)
Today‘s Challenge: The „Boeing 777“ Same complexity all over the model…
Problems in the 777 • Unorganized „soup“ of triangles
Problems in the 777 • Complex, interwoven geometry • Problematic for simplification-style algorithms • High depth complexity
Problems in the 777 • Low degree of occlusion • Visibility Preprocessing / Occlusion Culling ? • Even perfect occlusion culling generates millions of tris…
Complex Models: Previous Work • Conclusion • Most previous approaches problematic for 777-style models • Note: Same problem with most real-world CAD models • Need another approach…
Complex Models: Previous Work • Conclusion • Most previous approaches problematic for 777-style models • Note: Same problem with most real-world CAD models • Need another approach… • Idea: Ray Tracing logarithmic in #triangles • Ideal for complex models
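The logarithmic claim above can be illustrated with a minimal sketch (not the actual OpenRT data structure): locating the cell containing a point in a balanced 1-D BSP visits a number of nodes that grows like log2 of the cell count, not like the count itself, which is why ray tracing scales to hundreds of millions of triangles.

```python
import math

def build_bsp(lo, hi):
    """Balanced 1-D BSP over the integer cells [lo, hi):
    split at the midpoint until a single cell remains."""
    if hi - lo <= 1:
        return ("leaf", lo)
    mid = (lo + hi) // 2
    return ("node", mid, build_bsp(lo, mid), build_bsp(mid, hi))

def locate(tree, x):
    """Walk from the root to the leaf containing x,
    counting the inner nodes visited on the way."""
    steps = 0
    while tree[0] == "node":
        steps += 1
        _, mid, left, right = tree
        tree = left if x < mid else right
    return tree[1], steps

n = 1 << 16                      # 65536 cells
tree = build_bsp(0, n)
cell, steps = locate(tree, 12345)
# steps equals log2(n) = 16 here: doubling n adds only one step
```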
Basic Idea: Ray Tracing Ideal for Complex Models… • Proof by example: Sunflowers model … • 1 billion triangles (3x the 777) • Interactive performance on OpenRT engine [Wald03] • Even including shadows, transparency, textures, etc… • Are 350 million still a problem?
Basic Idea: Ray Tracing Ideal for Complex Models… • Caveat • Sunflowers uses instantiation → easily fits into <1GB RAM • 777: individual triangles → 35-70GB of data • First test: On SUN SunFire 12k w/ 96 GB RAM • Not a problem – it just works… • On desktop PC: • Typically 2 to (at most) 8 GB RAM • Need out-of-core (OOC) mechanism
OOC Ray Tracing • Pharr 1997: Memory Coherent Ray Tracing • Manual caching of scene geometry • Extensive reordering of rays to minimize disk I/O • But: Only for offline rendering • Wald 2001: Interactive OOC Ray Tracing • Same idea as MCRT, but interactive • Caching on „chunks" of ~1500 triangles • Minimal reordering: Only to hide loading latency • Assumed that all missing data can be loaded every frame • Only a few cache misses tolerable
OOC Ray Tracing • Loading all missing data every frame? • Lots of data required even for small camera movements • Loading all missing data in the same frame not possible • Must tolerate that some rays are cancelled (due to lack of data) → need to cancel „faulting" rays • Need to detect which scene accesses will stall → OOC memory management • Need to find a replacement color for cancelled rays → geometry proxies
OOC Memory Management • Lessons learned from [Wald01] • Streaming precomputation good • Object-based caching on 1500-triangle blocks not good • Extensive replication when generating 1500-triangle blocks • Fragmentation (both internal and external) • Bad cache granularity • Memory management and data handling quite costly • Better: Use tile-based caching à la Linux • Build large BSP on disk (streaming preprocess) • „mmap" into 64-bit address space • Let CPU and OS do I/O and address translation • But: Need to avoid page faults
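The mmap idea above can be sketched in a few lines: the on-disk BSP is mapped into the address space and node reads become plain memory accesses, with the OS paging the file in on demand. The 4-byte node records and file layout below are illustrative assumptions, not the paper's actual format.

```python
import mmap
import os
import struct
import tempfile

NODE_SIZE = 4  # hypothetical: one 4-byte record per BSP node

# Streaming preprocess stand-in: write a flat array of node records to disk.
path = os.path.join(tempfile.mkdtemp(), "bsp.bin")
with open(path, "wb") as out:
    for i in range(1024):
        out.write(struct.pack("<i", i * 2))  # dummy payload

# Map the whole file read-only into our address space.
f = open(path, "rb")
view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def read_node(idx):
    """Read node idx straight out of the mapping; the OS performs
    the disk I/O and address translation transparently."""
    (payload,) = struct.unpack_from("<i", view, idx * NODE_SIZE)
    return payload
```

The catch, as the slide notes, is that a touch of a non-resident page stalls the rendering thread inside the OS page-fault handler, which is what the tile table below is designed to prevent.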
OOC Memory Management • Tile table: Stores which tiles are in memory • Organized as hash-table for efficient access • Requires only a few kilobytes • Each lookup costs only a few bit operations and compares • On „cache" miss • Cancel faulting ray before access → avoid OS page fault • Put tile ID into request queue • Page in tile asynchronously in „tile fetcher thread" • If memory is fully used • Asynchronously evict tiles using „second chance" • Control what is paged in and out at what time • Avoid any stalls of the rendering threads
OOC Memory Management • So far • Can efficiently detect and avoid page faults • And asynchronously load missing data • Render at full accuracy once all data is available • Performance • 2-3 fps @ 1280x1024 • Single PC
OOC Memory Management • So far • Can efficiently detect and avoid page faults • And asynchronously load missing data • Render at full accuracy once all data is available • Performance • 2-3 fps @ 1280x1024 • Single PC • Question: What to do with cancelled rays? (marked red here)
Cancelling Rays • Our approach: Shade ray using „proxies" • Build a proxy for each subtree referenced by a pointer crossing tile boundaries • Precomputation: Sample subtree's volume with rays • Record shading information (normal and material properties) • Store information for several discretized directions • Similar to a light field • For a faulting ray during rendering: Fetch corresponding proxy • Interpolate shading information from the closest 3 directions • Only little memory affordable for proxies • Usually only 28 directions per proxy • With discretized normal and color: 66-344 MB for the entire model
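The direction-discretized lookup above can be sketched as follows. Assumptions not in the slides: the 28 directions are generated with a Fibonacci spiral (the paper does not specify the distribution), and the three closest directions are blended with dot-product weights.

```python
import math

def fibonacci_dirs(n):
    """n roughly uniform unit directions on the sphere (Fibonacci spiral);
    a stand-in for the paper's 28-direction discretization."""
    golden = math.pi * (3 - math.sqrt(5))
    dirs = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1 - z * z))
        t = golden * i
        dirs.append((r * math.cos(t), r * math.sin(t), z))
    return dirs

def shade_proxy(dirs, colors, d):
    """Blend the stored colors of the 3 directions closest to ray
    direction d, weighted by alignment (dot product)."""
    dots = [(dx * d[0] + dy * d[1] + dz * d[2], i)
            for i, (dx, dy, dz) in enumerate(dirs)]
    top3 = sorted(dots, reverse=True)[:3]
    wsum = sum(max(w, 1e-6) for w, _ in top3)
    return tuple(sum(max(w, 1e-6) * colors[i][c] for w, i in top3) / wsum
                 for c in range(3))

dirs = fibonacci_dirs(28)
colors = [(1.0, 0.0, 0.0)] * 28   # a proxy that looks red from everywhere
```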
Proxy Quality – Overview: Outside View (2GB footprint) Immediately after startup (tiny fraction of data loaded) no proxies
Proxy Quality – Overview: Outside View (2GB footprint) Immediately after startup (tiny fraction of data loaded) no proxies with proxies
Proxy Quality – Overview: Outside View (2GB footprint) After loading for several seconds (roughly equal amount of geometry loaded) without proxies with proxies
Results: Proxies • Proxy Quality • Not as good as expected (sampling too coarse) • Still: Sufficient for navigating… • Immediate visual feedback after loading • … and at any time during interaction • Artifacts quickly disappear while loading • Only use proxies while data still missing
Results: Shadows • So far: Only concentrated on simple shading • Ray Tracing: Can easily add shadows • OOC memory management scheme and proxies completely transparent to secondary rays… • No details here… • Just show effect and importance of using shadows…
Results: Shadows • Ray Tracing: Can easily add shadows • Cost rather small (coherence: data already in cache) • Significantly improved „sense of depth“
Summary • Proposed OOC RT for Complex Models • Clever memory management • Plus proxies as replacements for missing data • Results • Fast visual feedback already during loading • Render full-res model once loaded • Achieve interactive fullscreen performance • 2-3fps @ 1280x1024 on single desktop PC • Including support for shadows
Future Work • Future work • Improve proxy quality • Cache-aware parallelization • Interactive lighting simulation in 777 • Acknowledgements • Boeing Corp • Our SysAdmin group
Rendering Performance • Use single desktop PC • AMD dual Opteron 1.8GHz PC • 6GB RAM • Rendering Performance • Outside view: 2-3 fps @ 1280x1024 • Even faster in closeups • Fullscreen performance on a single PC!
Proxy Quality – Wheel Example • Without Proxies
Proxy Quality – Wheel Example • With Proxies
Proxy Quality – Overview: Outside View (2GB footprint) After loading for several seconds vs. full-scale model • entire model • with proxies • without proxies
Motivation • Practical example: Max. model size at CGUdS • 2000/01: „Soda Hall" (1.5 MTri) • 2001/02: „UNC PowerPlant" (12.5 MTri) • 2002/03: „Sunflowers" (~1,000 MTri, instantiated) • 2003/04: „Boeing 777" (350 MTri, individual triangles) • Today's industrial CAD models (rule of thumb) • One car: 10+ MTri • One plane: 100+ MTri • One cruise ship / factory / nuclear reactor: up to 1+ GTri … • Scientific computing: • LLNL isosurface: 270+ time slices, 470 MTri / slice…
Motivation – Are there „complex models" anymore? • Today: Steeply rising graphics performance • Faster Desktop PCs (3+GHz CPUs, 2+GB RAM) • Faster GPUs (100+ million tris/s) • Better Algorithms • Performance increase still ongoing • „Moore's Law": Performance doubles every 1.5 years… • For GPUs: Even faster growth than for CPUs • „Affordable" model size steeply rising • What was a complex model 3 years ago can today often be rendered on a laptop…
Proxy Results • Proxy memory consumption • 28 directions, 2×2 bytes for normal+color → ~100 bytes per proxy • Full model: • 66-344 MB → quite affordable on a 6GB PC • Proxy Performance • No performance impact at all! • Proxy access is faster than tracing the ray
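A quick back-of-envelope check of the numbers on this slide: 28 directions at 2 bytes of quantized normal plus 2 bytes of color each gives 112 bytes, the „~100 bytes" quoted; the 66-344 MB total then corresponds to roughly 0.6-3.2 million proxies (an inferred count, not stated in the slides).

```python
# Per-proxy footprint: 28 directions x (2-byte normal + 2-byte color).
directions = 28
bytes_per_direction = 2 + 2
per_proxy = directions * bytes_per_direction   # 112 bytes, ~100 as stated

# Implied proxy counts for the quoted 66-344 MB total footprint.
low_proxies = 66 * 2**20 // per_proxy
high_proxies = 344 * 2**20 // per_proxy
```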
Motivation • Problem: Model complexity rises even faster! • Higher performance is used up as soon as it is available • Best example: Games... • More detailed models • Car industry today: 2 MTri for a steering wheel • Faster computers also drive user demands • Higher accuracy (structural analysis, FEM, …) • Finer tessellation • CAD/VR/DMU increasingly important in industry • One model edited by more and more users… • Each of whom models at full accuracy…
OOC Memory Management • Now: Need to detect page faults before they happen • If not, access to data will stall the thread until data is available • Several possible options: • Detect using OS signals [deMarle, PGV04] • Very elegant solution • But: Can't easily cancel rays after the signal was raised • Detect via checking memory availability („mincore") • OS call too costly for every access • Our approach: Keep track of which data is in memory • Control what the OS pages in and out
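The chosen option above, tracking residency in the renderer itself, can be sketched as a guard in front of every scene access: if the tile is not in the tracked resident set, the ray is cancelled and the tile ID is queued for the asynchronous fetcher thread, so the mapped memory is never touched and the OS never gets a chance to stall us. The names below are illustrative, not OpenRT's API.

```python
from queue import Queue

resident_tiles = set()   # mirrors what the renderer has paged in so far
request_queue =Ueue = Queue()  # consumed by an asynchronous tile-fetcher thread

def access(tile):
    """Return the tile's data if resident; otherwise queue a fetch
    request and return None so the caller cancels the ray and falls
    back to the tile's proxy."""
    if tile in resident_tiles:
        return ("data", tile)        # safe: guaranteed not to page-fault
    request_queue.put(tile)          # fetched asynchronously, used next frame
    return None

resident_tiles.add(7)                # pretend tile 7 has been paged in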
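The chosen option above, tracking residency in the renderer itself, can be sketched as a guard in front of every scene access: if the tile is not in the tracked resident set, the ray is cancelled and the tile ID is queued for the asynchronous fetcher thread, so the mapped memory is never touched and the OS never gets a chance to stall. The names below are illustrative, not OpenRT's API.

```python
from queue import Queue

resident_tiles = set()     # mirrors what the renderer has paged in so far
request_queue = Queue()    # consumed by an asynchronous tile-fetcher thread

def access(tile):
    """Return the tile's data if resident; otherwise queue a fetch
    request and return None so the caller cancels the ray and falls
    back to the tile's proxy."""
    if tile in resident_tiles:
        return ("data", tile)   # safe: guaranteed not to page-fault
    request_queue.put(tile)     # fetched asynchronously, used next frame
    return None

resident_tiles.add(7)           # pretend tile 7 has been paged in
```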