An Interactive Out-of-Core Framework for VisualizingMassively Complex Models Ingo Wald MPI Informatik Andreas Dietrich, Philipp Slusallek Saarland University
Outline • Motivation • Rendering complex models • Our Challenge: The „Boeing 777“ model • Our Approach • Out-of-core ray tracing for massive models • Memory management scheme • Proxy mechanism for representing not-yet-loaded data • Results • Conclusion and Future Work
Motivation – Are there „complex models" anymore? • Today: Steeply rising graphics performance • Faster GPUs (100+ million tris/s) • „Moore's Law": Performance doubles every 1.5 years… • But: Model complexity rising (at least) as fast • Higher performance is used up as soon as it is available • Best example: Games... • CAD&VR used for ever larger engineering projects • Collaboration of more and more designers • Each of whom models „his part" at full accuracy… • Immensely complex models
Complex Models: Previous Work • Brute force rasterization • Use fastest available graphics hardware • PC GFX cards today: ~100 MTri/s • Even at theoretical peak performance: several seconds per frame • Usually try to reduce #triangles to be rendered • Mesh simplification • Edge collapse, vertex removal, remeshing, etc. • Often requires „useful" input meshes
Complex Models: Previous Work • Occlusion Culling • Visibility preprocessing (from region, from point, etc.) • Hierarchical Z-Buffer • Can only help if there is enough occlusion • System solutions (MMR, GigaWalk, iWalk) • Build on combination of ideas • Visibility precomputation + occlusion culling + LODs + … • Problem: Individual techniques already problematic • Complex precomputation and data structures • Often suffer from artifacts (popping, etc.)
Complex Models: Previous Work • QSplat • Hierarchical point-sampled representation • Best for locally smooth meshes • Problematic for high depth complexity • Randomized Z-Buffer • Randomly selects subset of triangles to be rendered • Best for almost-random data (tree leaves etc.) • Several Others • Impostors, Textured depth meshes, …
Today‘s Challenge: The „Boeing 777“ • 350 million triangles • 12 GB just for vertex positions, 35-70GB incl BSPs • Complex geometrical structure • Unstructured „soup“ of triangles • Often self-intersecting, coplanar, and badly shaped • Complex interwoven parts like pipes, cables, … • Low degree of occlusion • Goal: Render interactively on single PC • Dual-Opteron 246 (1.8GHz) w/ 6GB RAM (or less)
Today‘s Challenge: The „Boeing 777“ Same complexity all over the model…
Problems in the 777 • Unorganized „soup“ of triangles
Problems in the 777 • Complex, interwoven geometry • Problematic for simplification-style algorithms • High depth complexity
Problems in the 777 • Low degree of occlusion • Visibility Preprocessing / Occlusion Culling ? • Even perfect occlusion culling generates millions of tris…
Complex Models: Previous Work • Conclusion • Most previous approaches problematic for 777-style models • Note: Same problem with most real-world CAD models • Need another approach…
Complex Models: Previous Work • Conclusion • Most previous approaches problematic for 777-style models • Note: Same problem with most real-world CAD models • Need another approach… • Idea: Ray Tracing logarithmic in #triangles • Ideal for complex models
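The logarithmic claim above can be illustrated with a minimal sketch (not the actual OpenRT data structure): locating the cell containing a point in a balanced 1-D BSP visits a number of nodes that grows like log2 of the cell count, not like the count itself, which is why ray tracing scales to hundreds of millions of triangles.

```python
import math

def build_bsp(lo, hi):
    """Balanced 1-D BSP over the integer cells [lo, hi):
    split at the midpoint until a single cell remains."""
    if hi - lo <= 1:
        return ("leaf", lo)
    mid = (lo + hi) // 2
    return ("node", mid, build_bsp(lo, mid), build_bsp(mid, hi))

def locate(tree, x):
    """Walk from the root to the leaf containing x,
    counting the inner nodes visited on the way."""
    steps = 0
    while tree[0] == "node":
        steps += 1
        _, mid, left, right = tree
        tree = left if x < mid else right
    return tree[1], steps

n = 1 << 16                      # 65536 cells
tree = build_bsp(0, n)
cell, steps = locate(tree, 12345)
# steps equals log2(n) = 16 here: doubling n adds only one step
```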
Basic Idea: Ray Tracing Ideal for Complex Models… • Proof by example: Sunflowers model … • 1 billion triangles (3x the 777) • Interactive performance on OpenRT engine [Wald03] • Even including shadows, transparency, textures, etc… • Are 350 million still a problem?
Basic Idea: Ray Tracing Ideal for Complex Models… • Caveat • Sunflowers uses instantiation → easily fits into <1GB RAM • 777: individual triangles → 35-70GB of data • First test: On SUN SunFire 12k w/ 96 GB RAM • Not a problem – it just works… • On desktop PC: • Typically 2 to (at most) 8 GB RAM • Need out-of-core (OOC) mechanism
OOC Ray Tracing • Pharr 1997: Memory Coherent Ray Tracing • Manual caching of scene geometry • Extensive reordering of rays to minimize disk I/O • But: Only for offline rendering • Wald 2001: Interactive OOC Ray Tracing • Same idea as MCRT, but interactive • Caching on „chunks" of ~1500 triangles • Minimal reordering: Only to hide loading latency • Assumed that all missing data can be loaded every frame • Only a few cache misses tolerable
OOC Ray Tracing • Loading all missing data every frame? • Lots of data required even for small camera movements • Loading all missing data in the same frame not possible • Must tolerate that some rays are cancelled (due to lack of data) → need to cancel „faulting" rays • Need to detect which scene accesses will stall → OOC memory management • Need to find a replacement color for cancelled rays → geometry proxies
OOC Memory Management • Lessons learned from [Wald01] • Streaming precomputation good • Object-based caching on 1500-triangle blocks not good • Extensive replication when generating 1500-triangle blocks • Fragmentation (both internal and external) • Bad cache granularity • Memory management and data handling quite costly • Better: Use tile-based caching à la Linux • Build large BSP on disk (streaming preprocess) • „mmap" into 64-bit address space • Let CPU and OS do I/O and address translation • But: Need to avoid page faults
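The mmap idea above can be sketched in a few lines: the on-disk BSP is mapped into the address space and node reads become plain memory accesses, with the OS paging the file in on demand. The 4-byte node records and file layout below are illustrative assumptions, not the paper's actual format.

```python
import mmap
import os
import struct
import tempfile

NODE_SIZE = 4  # hypothetical: one 4-byte record per BSP node

# Streaming preprocess stand-in: write a flat array of node records to disk.
path = os.path.join(tempfile.mkdtemp(), "bsp.bin")
with open(path, "wb") as out:
    for i in range(1024):
        out.write(struct.pack("<i", i * 2))  # dummy payload

# Map the whole file read-only into our address space.
f = open(path, "rb")
view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def read_node(idx):
    """Read node idx straight out of the mapping; the OS performs
    the disk I/O and address translation transparently."""
    (payload,) = struct.unpack_from("<i", view, idx * NODE_SIZE)
    return payload
```

The catch, as the slide notes, is that a touch of a non-resident page stalls the rendering thread inside the OS page-fault handler, which is what the tile table below is designed to prevent.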
OOC Memory Management • Tile table: Stores which tiles are in memory • Organized as hash-table for efficient access • Requires only a few kilobytes • Each lookup costs only a few bit operations and compares • On „cache" miss • Cancel faulting ray before access → avoid OS page fault • Put tile ID into request queue • Page in tile asynchronously in „tile fetcher thread" • If memory is fully used • Asynchronously evict tiles using „second chance" • Control what is paged in and out at what time • Avoid any stalls of the rendering threads
OOC Memory Management • So far • Can efficiently detect and avoid page faults • And asynchronously load missing data • Render at full accuracy once all data is available • Performance • 2-3 fps @ 1280x1024 • Single PC
OOC Memory Management • So far • Can efficiently detect and avoid page faults • And asynchronously load missing data • Render at full accuracy once all data is available • Performance • 2-3 fps @ 1280x1024 • Single PC • Question: What to do with cancelled rays? (marked red here)
Cancelling Rays • Our approach: Shade ray using „proxies" • Build a proxy for each subtree referenced by a pointer crossing tile boundaries • Precomputation: Sample subtree's volume with rays • Record shading information (normal and material properties) • Store information for several discretized directions • Similar to a light field • For a faulting ray during rendering: Fetch corresponding proxy • Interpolate shading information from the closest 3 directions • Only little memory affordable for proxies • Usually only 28 directions per proxy • With discretized normal and color: 66-344 MB for the entire model
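The direction-discretized lookup above can be sketched as follows. Assumptions not in the slides: the 28 directions are generated with a Fibonacci spiral (the paper does not specify the distribution), and the three closest directions are blended with dot-product weights.

```python
import math

def fibonacci_dirs(n):
    """n roughly uniform unit directions on the sphere (Fibonacci spiral);
    a stand-in for the paper's 28-direction discretization."""
    golden = math.pi * (3 - math.sqrt(5))
    dirs = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1 - z * z))
        t = golden * i
        dirs.append((r * math.cos(t), r * math.sin(t), z))
    return dirs

def shade_proxy(dirs, colors, d):
    """Blend the stored colors of the 3 directions closest to ray
    direction d, weighted by alignment (dot product)."""
    dots = [(dx * d[0] + dy * d[1] + dz * d[2], i)
            for i, (dx, dy, dz) in enumerate(dirs)]
    top3 = sorted(dots, reverse=True)[:3]
    wsum = sum(max(w, 1e-6) for w, _ in top3)
    return tuple(sum(max(w, 1e-6) * colors[i][c] for w, i in top3) / wsum
                 for c in range(3))

dirs = fibonacci_dirs(28)
colors = [(1.0, 0.0, 0.0)] * 28   # a proxy that looks red from everywhere
```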
Proxy Quality – Overview: Outside View (2GB footprint) Immediately after startup (tiny fraction of data loaded) no proxies
Proxy Quality – Overview: Outside View (2GB footprint) Immediately after startup (tiny fraction of data loaded) no proxies with proxies
Proxy Quality – Overview: Outside View (2GB footprint) After loading for several seconds (roughly equal amount of geometry loaded) without proxies with proxies
Results: Proxies • Proxy Quality • Not as good as expected (sampling too coarse) • Still: Sufficient for navigating… • Immediate visual feedback after loading • … and at any time during interaction • Artifacts quickly disappear while loading • Only use proxies while data still missing
Results: Shadows • So far: Only concentrated on simple shading • Ray Tracing: Can easily add shadows • OOC memory management scheme and proxies completely transparent to secondary rays… • No details here… • Just show effect and importance of using shadows…
Results: Shadows • Ray Tracing: Can easily add shadows • Cost rather small (coherence: data already in cache) • Significantly improved „sense of depth“
Summary • Proposed OOC RT for Complex Models • Clever memory management • Plus proxies as replacements for missing data • Results • Fast visual feedback already during loading • Render full-res model once loaded • Achieve interactive fullscreen performance • 2-3fps @ 1280x1024 on single desktop PC • Including support for shadows
Future Work • Future work • Improve proxy quality • Cache-aware parallelization • Interactive lighting simulation in 777 • Acknowledgements • Boeing Corp • Our SysAdmin group
Rendering Performance • Use single desktop PC • AMD dual Opteron 1.8GHz PC • 6GB RAM • Rendering Performance • Outside view: 2-3 fps @ 1280x1024 • Even faster in closeups • Fullscreen performance on a single PC!
Proxy Quality – Wheel Example • Without Proxies
Proxy Quality – Wheel Example • With Proxies
Proxy Quality – Overview: Outside View (2GB footprint) After loading for several seconds vs. full-scale model • entire model • with proxies • without proxies
Motivation • Practical example: Max. model size at CGUdS • 2000/01: „Soda Hall" (1.5 MTri) • 2001/02: „UNC PowerPlant" (12.5 MTri) • 2002/03: „Sunflowers" (~1,000 MTri, instantiated) • 2003/04: „Boeing 777" (350 MTri, individual triangles) • Today's industrial CAD models (rule of thumb) • One car: 10+ MTri • One plane: 100+ MTri • One cruise ship / factory / nuclear reactor: up to 1+ GTri … • Scientific computing: • LLNL isosurface: 270+ time slices, 470 MTri / slice…
Motivation – Are there „complex models" anymore? • Today: Steeply rising graphics performance • Faster Desktop PCs (3+GHz CPUs, 2+GB RAM) • Faster GPUs (100+ million tris/s) • Better Algorithms • Performance increase still ongoing • „Moore's Law": Performance doubles every 1.5 years… • For GPUs: Even faster growth than for CPUs • „Affordable" model size steeply rising • What was a complex model 3 years ago can today often be rendered on a laptop…
Proxy Results • Proxy memory consumption • 28 directions, 2×2 bytes for normal+color → ~100 bytes per proxy • Full model: • 66-344 MB → quite affordable on a 6GB PC • Proxy Performance • No performance impact at all! • Proxy access is faster than tracing the ray
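A quick back-of-envelope check of the numbers on this slide: 28 directions at 2 bytes of quantized normal plus 2 bytes of color each gives 112 bytes, the „~100 bytes" quoted; the 66-344 MB total then corresponds to roughly 0.6-3.2 million proxies (an inferred count, not stated in the slides).

```python
# Per-proxy footprint: 28 directions x (2-byte normal + 2-byte color).
directions = 28
bytes_per_direction = 2 + 2
per_proxy = directions * bytes_per_direction   # 112 bytes, ~100 as stated

# Implied proxy counts for the quoted 66-344 MB total footprint.
low_proxies = 66 * 2**20 // per_proxy
high_proxies = 344 * 2**20 // per_proxy
```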
Motivation • Problem: Model complexity rises even faster! • Higher performance is used up as soon as it is available • Best example: Games... • More detailed models • Car industry today: 2 MTri for a steering wheel • Faster computers also drive user demands • Higher accuracy (structural analysis, FEM, …) • Finer tessellation • CAD/VR/DMU increasingly important in industry • One model edited by more and more users… • Each of whom models at full accuracy…
OOC Memory Management • Now: Need to detect page faults before they happen • If not, access to data will stall the thread until data is available • Several possible options: • Detect using OS signals [deMarle, PGV04] • Very elegant solution • But: Can't easily cancel rays after the signal was raised • Detect via checking memory availability („mincore") • OS call too costly for every access • Our approach: Keep track of which data is in memory • Control what the OS pages in and out
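The chosen option above, tracking residency in the renderer itself, can be sketched as a guard in front of every scene access: if the tile is not in the tracked resident set, the ray is cancelled and the tile ID is queued for the asynchronous fetcher thread, so the mapped memory is never touched and the OS never gets a chance to stall us. The names below are illustrative, not OpenRT's API.

```python
from queue import Queue

resident_tiles = set()   # mirrors what the renderer has paged in so far
request_queue =Ueue = Queue()  # consumed by an asynchronous tile-fetcher thread

def access(tile):
    """Return the tile's data if resident; otherwise queue a fetch
    request and return None so the caller cancels the ray and falls
    back to the tile's proxy."""
    if tile in resident_tiles:
        return ("data", tile)        # safe: guaranteed not to page-fault
    request_queue.put(tile)          # fetched asynchronously, used next frame
    return None

resident_tiles.add(7)                # pretend tile 7 has been paged in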
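The chosen option above, tracking residency in the renderer itself, can be sketched as a guard in front of every scene access: if the tile is not in the tracked resident set, the ray is cancelled and the tile ID is queued for the asynchronous fetcher thread, so the mapped memory is never touched and the OS never gets a chance to stall. The names below are illustrative, not OpenRT's API.

```python
from queue import Queue

resident_tiles = set()     # mirrors what the renderer has paged in so far
request_queue = Queue()    # consumed by an asynchronous tile-fetcher thread

def access(tile):
    """Return the tile's data if resident; otherwise queue a fetch
    request and return None so the caller cancels the ray and falls
    back to the tile's proxy."""
    if tile in resident_tiles:
        return ("data", tile)   # safe: guaranteed not to page-fault
    request_queue.put(tile)     # fetched asynchronously, used next frame
    return None

resident_tiles.add(7)           # pretend tile 7 has been paged in
```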