
An Interactive Out-of-Core Framework for Visualizing Massively Complex Models

An Interactive Out-of-Core Framework for Visualizing Massively Complex Models. Ingo Wald MPI Informatik Andreas Dietrich, Philipp Slusallek Saarland University. Outline. Motivation Rendering complex models Our Challenge: The „Boeing 777“ model Our Approach


Presentation Transcript


  1. An Interactive Out-of-Core Framework for Visualizing Massively Complex Models • Ingo Wald, MPI Informatik • Andreas Dietrich, Philipp Slusallek, Saarland University

  2. Outline • Motivation • Rendering complex models • Our Challenge: The „Boeing 777“ model • Our Approach • Out-of-core ray tracing for massive models • Memory management scheme • Proxy mechanism for representing not-yet-loaded data • Results • Conclusion and Future Work

  3. Motivation – Are there „complex models“ any more ? • Today: Steeply rising graphics performance • Faster GPUs (100+ million tris/s) • „Moore‘s Law“ : Performance doubles every 1.5 years… • But: Model complexity rising (at least) as fast • Higher performance spent as soon as available • Best example: Games... • CAD&VR used for ever larger engineering projects • Collaboration of more and more designers • Each of which models „his part“ at full accuracy… • Immensely complex models

  4. Today‘s Challenge: The „Boeing 777“ – 350M Triangles

  5. Complex Models: Previous Work • Brute force rasterization • Use fastest available graphics hardware • PC GFX cards today: ~100MTri/s • Even at theoretical peak performance several sec. per frame • Usually try to reduce #triangles to be rendered • Mesh simplification • Edge collapse, vertex removal, remeshing, etc. • Often requires „useful“ input meshes
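
The "several sec. per frame" estimate above can be checked with a quick calculation. This is only a back-of-the-envelope sketch: it uses the 350M triangle count and the ~100 MTri/s peak rate quoted in the slides, and the function name is made up for illustration.

```python
# Back-of-the-envelope frame time for brute-force rasterization.
TRIANGLES = 350_000_000          # Boeing 777 model size (from the slides)
PEAK_TRIS_PER_SEC = 100_000_000  # ~100 MTri/s theoretical peak (from the slides)

def brute_force_frame_time(triangles: int, tris_per_sec: int) -> float:
    """Seconds per frame if every triangle is rasterized at the peak rate."""
    return triangles / tris_per_sec

seconds_per_frame = brute_force_frame_time(TRIANGLES, PEAK_TRIS_PER_SEC)
print(f"{seconds_per_frame:.1f} s per frame")  # 3.5 s per frame
```

Real throughput is normally below the theoretical peak, so this is a lower bound on the frame time.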

  6. Complex Models: Previous Work • Occlusion Culling • Visibility preprocessing (from region, from point, etc) • Hierarchical Z-Buffer • Can only help if there is enough occlusion • System solutions (MMR, GigaWalk, iWalk) • Build on combination of ideas • Visibility precomputation + occlusion culling + LODs + … • Problem: Individual techniques already problematic • Complex precomputation and data structures • Often suffer from artefacts (popping etc)

  7. Complex Models: Previous Work • QSplat • Hierarchical point-sampled representation • Best for locally smooth meshes • Problematic for high depth complexity • Randomized Z-Buffer • Randomly selects subset of triangles to be rendered • Best for almost-random data (tree leaves etc.) • Several Others • Impostors, Textured depth meshes, …

  8. Today‘s Challenge: The „Boeing 777“ • 350 million triangles • 12 GB just for vertex positions, 35-70GB incl BSPs • Complex geometrical structure • Unstructured „soup“ of triangles • Often self-intersecting, coplanar, and badly shaped • Complex interwoven parts like pipes, cables, … • Low degree of occlusion • Goal: Render interactively on single PC • Dual-Opteron 246 (1.8GHz) w/ 6GB RAM (or less)
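
The "12 GB just for vertex positions" figure is consistent with a simple estimate. The sketch below assumes an unindexed triangle soup (three private vertices per triangle) and 32-bit float coordinates; neither assumption is stated in the slides.

```python
# Rough memory estimate for raw vertex positions of the 777 model.
TRIANGLES = 350_000_000      # from the slides
VERTICES_PER_TRIANGLE = 3    # assumption: no vertex sharing ("soup" of triangles)
COORDS_PER_VERTEX = 3        # x, y, z
BYTES_PER_FLOAT = 4          # assumption: 32-bit floats

def vertex_position_bytes(triangles: int) -> int:
    """Bytes needed to store every triangle's vertex positions explicitly."""
    return triangles * VERTICES_PER_TRIANGLE * COORDS_PER_VERTEX * BYTES_PER_FLOAT

total_bytes = vertex_position_bytes(TRIANGLES)
print(f"{total_bytes / 1e9:.1f} GB")      # 12.6 GB (decimal)
print(f"{total_bytes / 2**30:.1f} GiB")   # 11.7 GiB (binary)
```

Either way the positions alone exceed the 6 GB of RAM on the target PC, before any BSP data is added.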

  9. Today‘s Challenge: The „Boeing 777“

  10. Today‘s Challenge: The „Boeing 777“

  11. Today‘s Challenge: The „Boeing 777“

  12. Today‘s Challenge: The „Boeing 777“ Same complexity all over the model…

  13. Problems in the 777 • Unorganized „soup“ of triangles

  14. Problems in the 777 • Complex, interwoven geometry • Problematic for simplification-style algorithms • High depth complexity

  15. Problems in the 777 • Low degree of occlusion • Visibility Preprocessing / Occlusion Culling ? • Even perfect occlusion culling generates millions of tris…

  16. Complex Models: Previous Work • Conclusion • Most previous approaches problematic for 777-style models • Note: Same problem with most real-world CAD models • Need another approach…

  17. Complex Models: Previous Work • Conclusion • Most previous approaches problematic for 777-style models • Note: Same problem with most real-world CAD models • Need another approach… • Idea: Ray Tracing logarithmic in #triangles • Ideal for complex models
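
To give a feel for what "logarithmic in #triangles" means, the sketch below computes the depth of an idealized balanced binary hierarchy with one triangle per leaf. This is an illustration only: a real BSP tree is not perfectly balanced and stores several triangles per leaf, and the "10x larger" model is a made-up comparison point.

```python
import math

def balanced_tree_depth(num_triangles: int) -> int:
    """Depth of an idealized balanced binary tree with one triangle per leaf."""
    return math.ceil(math.log2(num_triangles))

for name, count in [("Boeing 777", 350_000_000), ("10x larger", 3_500_000_000)]:
    print(f"{name}: {count:,} triangles, depth {balanced_tree_depth(count)}")
```

In this idealized model, ten times more triangles add only about three levels to the path a ray has to descend, whereas brute-force rasterization would have ten times more triangles to process.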

  18. Basic Idea: Ray Tracing Ideal for Complex Models… • Proof by example: Sunflowers model … • 1 billion triangles (3x the 777) • Interactive performance on OpenRT engine [Wald03] • Even including shadows, transparency, textures, etc… • Are 350 million still a problem ?

  19. Basic Idea: Ray Tracing Ideal for Complex Models… • Caveat • Sunflowers uses instantiation → easily fits into <1GB RAM • 777: individual triangles → 35-70GB data • First test: On SUN SunFire 12k w/ 96 GB RAM • Not a problem – it just works… • On desktop PC: • Typically 2 to (at most) 8 GB RAM • Need out-of-core (OOC) mechanism

  20. OOC Ray Tracing • Pharr 1997: Memory Coherent Ray Tracing • Manual caching of scene geometry • Extensive reordering of rays to minimize disk I/O • But: Only for offline rendering • Wald 2001: Interactive OOC Ray Tracing • Same idea as MCRT, but interactive • Caching on „chunks“ of ~1500 triangles • Minimal reordering: Only to hide loading latency • Assumed that all missing data can be loaded every frame • Only few cache misses tolerable

  21. OOC Ray Tracing • Loading all missing data every frame ? • Lots of data required even for small camera movement • Loading all missing data in same frame not possible • Must tolerate that some rays must be cancelled (due to lack of data) → Need to cancel „faulting“ rays • Need to detect which scene access will stall • OOC Memory management • Need to find replacement color for cancelled ray • Geometry Proxies

  22. OOC Memory Management • Lessons learned from [Wald01] • Streaming precomputation good • Object-based caching on 1500-triangle-blocks not good • Extensive replication when generating 1500tri-blocks • Fragmentation (both internal and external) • Bad cache granularity • Memory management and data handling quite costly • Better: Use tile-based caching à la Linux • Build large BSP on disk (streaming preprocess) • „mmap“ into 64bit-address space • Let CPU and OS do I/O and address translation • But: Need to avoid page faults
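
The "mmap the on-disk data and let the OS do the I/O" idea can be sketched as follows. This is not the authors' implementation: the tile size, the demo file, and the helper names are all hypothetical, and a small temporary file stands in for the multi-gigabyte BSP written by the streaming preprocess.

```python
import mmap
import os
import tempfile

TILE_SIZE = 4096  # hypothetical tile size; the slides do not state one

def write_demo_file(path: str, num_tiles: int) -> None:
    """Stand-in for the streaming preprocess: fill each tile with its own ID byte."""
    with open(path, "wb") as f:
        for tile_id in range(num_tiles):
            f.write(bytes([tile_id % 256]) * TILE_SIZE)

def read_tile(mapping: mmap.mmap, tile_id: int) -> bytes:
    """Slice one tile out of the mapping; the OS pages it in on first access."""
    start = tile_id * TILE_SIZE
    return mapping[start:start + TILE_SIZE]

fd, demo_path = tempfile.mkstemp()
os.close(fd)
write_demo_file(demo_path, num_tiles=8)

with open(demo_path, "rb") as f:
    # Length 0 maps the whole file; no data is read until it is touched.
    mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    tile5 = read_tile(mapping, 5)
    mapping.close()
os.remove(demo_path)
```

Note that `read_tile` blocks if the tile is not yet in memory, which is exactly the page fault the slide says must be avoided in the rendering threads.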

  23. OOC Memory Management • Tile table: Stores which tiles are in memory • Organized as hash-table for efficient access • Requires only few kilobytes • Each lookup costs only few bit-operations and compares • On „cache“ miss • Cancel faulting ray before access → avoid OS page fault • Put tile ID into request queue • Page in tile asynchronously in „tile fetcher thread“ • If memory is fully used • Asynchronously evict tiles using „second chance“ • Control what is paged in and out at what time • Avoid any stalls of the rendering threads
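
A minimal sketch of the tile table, request queue, and second-chance eviction described above. It is a simplification under stated assumptions: a Python dict stands in for the compact hash table, the class and method names are invented, and the work of the "tile fetcher thread" runs inline in `fetch_pending` rather than asynchronously.

```python
from collections import deque

class TileCache:
    """Tracks resident tiles and evicts with the second-chance (clock) policy."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.referenced = {}     # resident tile ID -> reference bit
        self.clock = deque()     # resident tile IDs in eviction order
        self.requests = deque()  # tile IDs waiting to be paged in

    def lookup(self, tile_id: int) -> bool:
        """True if the tile is resident; on a miss, queue it and report False
        so the caller can cancel the ray instead of touching the data."""
        if tile_id in self.referenced:
            self.referenced[tile_id] = True
            return True
        if tile_id not in self.requests:
            self.requests.append(tile_id)
        return False

    def _evict_one(self) -> int:
        while True:
            victim = self.clock.popleft()
            if self.referenced[victim]:
                self.referenced[victim] = False  # recently used: second chance
                self.clock.append(victim)
            else:
                del self.referenced[victim]
                return victim

    def fetch_pending(self) -> list:
        """Page in all requested tiles, evicting when full; returns evicted IDs."""
        evicted = []
        while self.requests:
            tile_id = self.requests.popleft()
            if len(self.referenced) >= self.capacity:
                evicted.append(self._evict_one())
            self.referenced[tile_id] = False
            self.clock.append(tile_id)
        return evicted

cache = TileCache(capacity=2)
first_try = cache.lookup(7)      # miss: ray is cancelled, tile 7 requested
cache.lookup(8)                  # miss: tile 8 requested
cache.fetch_pending()            # tiles 7 and 8 become resident
second_try = cache.lookup(7)     # hit: sets the reference bit of tile 7
cache.lookup(9)                  # miss: tile 9 requested
evicted = cache.fetch_pending()  # tile 7 gets a second chance, tile 8 is evicted
```

The key property is that `lookup` never blocks: a rendering thread learns immediately whether it may touch the data.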

  24. OOC Memory Management • So far • Can efficiently detect and avoid page faults • And asynchronously load missing data • Render at full accuracy once all data is available • Performance • 2-3 fps @ 1280x1024 • Single PC

  25. OOC Memory Management • So far • Can efficiently detect and avoid page faults • And asynchronously load missing data • Render at full accuracy once all data is available • Performance • 2-3 fps @ 1280x1024 • Single PC • Question: What to do with cancelled rays ? (marked red here)

  26. Cancelling rays • Our approach: Shade ray using „proxies“ • Build proxy for each subtree addressed by pointer crossing tile boundaries • Precomputation: Sample subtree‘s volume with rays • Record shading information (normal and material properties) • Store information for several discretized directions • Similar to LightField • For faulting ray during rendering: Fetch corresponding proxy • Interpolate shading information from closest 3 directions • Only little memory affordable for proxies • Usually only 28 directions per proxy • With discretized normal and color: 66-344 MB for entire model
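
The proxy lookup for a faulting ray could look roughly like the sketch below. The slides only say that shading information is interpolated from the closest 3 stored directions; the cosine weighting, the use of a plain RGB color instead of normal plus material, and all names here are assumptions made for illustration.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade_from_proxy(proxy, ray_dir):
    """proxy: list of (unit_direction, rgb_color) samples recorded in preprocessing.
    Blends the colors of the 3 stored directions closest to ray_dir."""
    d = normalize(ray_dir)
    closest = sorted(proxy, key=lambda s: -dot(s[0], d))[:3]
    # Assumption: weight each sample by its clamped cosine to the ray direction.
    weights = [max(dot(s[0], d), 0.0) for s in closest]
    total = sum(weights)
    if total == 0.0:  # degenerate case: fall back to a plain average
        weights, total = [1.0] * len(closest), float(len(closest))
    return tuple(
        sum(w * s[1][i] for w, s in zip(weights, closest)) / total
        for i in range(3)
    )

# Toy proxy with 6 axis-aligned directions (a real proxy would store 28).
demo_proxy = [
    ((1, 0, 0), (1.0, 0.0, 0.0)), ((-1, 0, 0), (0.0, 1.0, 1.0)),
    ((0, 1, 0), (0.0, 1.0, 0.0)), ((0, -1, 0), (1.0, 0.0, 1.0)),
    ((0, 0, 1), (0.0, 0.0, 1.0)), ((0, 0, -1), (1.0, 1.0, 0.0)),
]
along_x = shade_from_proxy(demo_proxy, (2, 0, 0))   # matches the +x sample
diagonal = shade_from_proxy(demo_proxy, (1, 1, 0))  # even blend of +x and +y
```

Because the lookup touches only the small proxy record, it never needs the missing tile, which is what makes it usable as a replacement color for a cancelled ray.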

  27. Proxy Quality – Overview: Outside View (2GB footprint) • Immediately after startup (tiny fraction of data loaded) • No proxies

  28. Proxy Quality – Overview: Outside View (2GB footprint) • Immediately after startup (tiny fraction of data loaded) • No proxies • With proxies

  29. Proxy Quality – Overview: Outside View (2GB footprint) • After loading for several seconds (roughly equal amount of geometry loaded) • Without proxies • With proxies

  30. Results: Proxies • Proxy Quality • Not as good as expected (sampling too coarse) • Still: Sufficient for navigating… • Immediate visual feedback after loading • … and at any time during interaction • Artifacts quickly disappear while loading • Only use proxies while data still missing

  31. Results: Shadows • So far: Only concentrated on simple shading • Ray Tracing: Can easily add shadows • OOC memory management scheme and proxies completely transparent to secondary rays… • No details here… • Just show effect and importance of using shadows…

  32. Results: Shadows • Ray Tracing: Can easily add shadows • Cost rather small (coherence: data already in cache) • Significantly improved „sense of depth“

  33. Results: Shadows • Ray Tracing: Can easily add shadows • Cost rather small (coherence: data already in cache) • Significantly improved „sense of depth“

  34. Results: Shadows • Ray Tracing: Can easily add shadows • Cost rather small (coherence: data already in cache) • Significantly improved „sense of depth“

  35. Summary • Proposed OOC RT for Complex Models • Clever memory management • Plus proxies as replacements for missing data • Results • Fast visual feedback already during loading • Render full-res model once loaded • Achieve interactive fullscreen performance • 2-3fps @ 1280x1024 on single desktop PC • Including support for shadows

  36. Future Work • Future work • Improve proxy quality • Cache-aware parallelization • Interactive lighting simulation in 777 • Acknowledgements • Boeing Corp • Our SysAdmin group

  37. Questions ?

  38. Today‘s Challenge: The „Boeing 777“

  39. Rendering Performance • Use single desktop PC • AMD dual Opteron 1.8GHz PC • 6GB RAM • Rendering Performance • Outside view: 2-3 fps @ 1280x1024 • Even faster in closeups • Fullscreen performance on single PC !

  40. Proxy Quality – Wheel Example • Without Proxies

  41. Proxy Quality – Wheel Example • With Proxies

  42. Proxy Quality – Overview: Outside View (2GB footprint) • After loading for several seconds vs. full-scale model • Entire model • With proxies • Without proxies

  43. Motivation • Practical example: Max. model size at CGUdS • 2000/01: „Soda Hall“ (1.5 Mtri) • 2001/02: „UNC PowerPlant“ (12.5 Mtri) • 2002/03: „Sunflowers“ (~1,000 Mtri, instantiated) • 2003/04: „Boeing 777“ (350 Mtri, individual triangles) • Today's industrial CAD models (rule of thumb) • One car: 10+ MTri • One plane: 100+ Mtri • One cruise ship / factory / nuclear reactor: up to 1+GTri … • Scientific computing: • LLNL Isosurface: 270+ time slices, 470MTri / slice…

  44. Motivation – Are there „complex models“ any more ? • Today: Steeply rising graphics performance • Faster Desktop PCs (3+GHz CPUs, 2+GB RAM) • Faster GPUs (100+ million tris/s) • Better Algorithms • Performance increase still ongoing • „Moore‘s Law“ : Performance doubles every 1.5 years… • For GPUs: Even faster growth than for CPUs • „Affordable“ model size steeply rising • What was a complex model 3 years ago can today often be rendered on a laptop…

  45. Proxy Results • Proxy memory consumption • 28 directions, 2x2 bytes for normal+color → ~100 bytes • Full model: • 66-344MB quite affordable on 6GB PC • Proxy Performance • No performance impact at all ! • Proxy access faster than tracing the ray
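
The per-proxy size quoted above follows directly from the numbers on the slide. The sketch below reads "2x2 bytes" as 2 bytes for the normal plus 2 bytes for the color per direction; the proxy counts it derives from the 66-344 MB totals are rough inferences, not figures from the presentation, and ignore any per-proxy header or padding.

```python
DIRECTIONS_PER_PROXY = 28     # from the slides
BYTES_PER_DIRECTION = 2 + 2   # 2 bytes discretized normal + 2 bytes color

bytes_per_proxy = DIRECTIONS_PER_PROXY * BYTES_PER_DIRECTION
print(bytes_per_proxy)  # 112, i.e. the "~100 bytes" on the slide

def proxies_that_fit(budget_megabytes: int) -> int:
    """Rough proxy count for a memory budget, taking 1 MB = 10**6 bytes."""
    return budget_megabytes * 10**6 // bytes_per_proxy

low, high = proxies_that_fit(66), proxies_that_fit(344)
print(f"roughly {low / 1e6:.1f} to {high / 1e6:.1f} million proxies")
```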

  46. Motivation • Problem: Model complexity rises even faster ! • Higher performance spent as soon as available • Best example: Games... • More detailed models • Car industry today: 2MTri for a steering wheel • Faster computers also drive user demands • Higher accuracy (structural analysis, FEM, …) • Finer tessellation • CAD/VR/DMU increasingly important in industry • One model edited by more and more users… • Each of which models at full accuracy…

  47. OOC Memory Management • Now: Need to detect page faults before they happen • If not, access to data will stall thread until data available • Several possible options: • Detect using OS signals [deMarle, PGV04] • Very elegant solution • But: Can‘t easily cancel rays after signal was raised • Detect via checking mem availability („mincore“) • OS call → too costly for every access • Our approach: Keep track of which data is in memory • Control what OS pages in and out
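
The chosen option, tracking residency in user space and checking it before every access, can be sketched as below. The tile size, the set used as residency table, and the function names are hypothetical; the point is only that the check is a shift plus a table lookup rather than an OS call such as `mincore`.

```python
TILE_SHIFT = 12  # hypothetical: 2**12 = 4096-byte tiles

def tile_of(offset: int) -> int:
    """Map a byte offset in the mapped file to its tile ID with one shift."""
    return offset >> TILE_SHIFT

def guarded_read(data: bytes, resident_tiles: set, offset: int):
    """Return the byte at offset, or None if its tile is not known to be in
    memory. None tells the caller to cancel the ray instead of page-faulting."""
    if tile_of(offset) not in resident_tiles:
        return None
    return data[offset]

data = bytes(range(256)) * 48   # 12288 bytes = 3 demo tiles
resident = {0, 2}               # tile 1 has not been loaded yet

hit = guarded_read(data, resident, 10)      # tile 0: safe to read
miss = guarded_read(data, resident, 5000)   # tile 1: would stall, so cancel
```

This only works if the application, not the OS, decides which tiles are paged in and out, which is why the slide pairs it with controlling what the OS pages.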
