A Dynamic Adaptive Multi-resolution GPU Data Structure: Adaptive Shadow Maps, Octree 3D Paint, Adaptive PDE Solver. Aaron Lefohn, University of California, Davis
Problem Statement • Goal • Dynamic, adaptive, multi-resolution GPU data structure • Efficient read, data-write, structure change • Adaptive shadow maps, octree 3D paint, adaptive PDE solver • Challenges • All operations must be data-parallel • Trees are difficult to update and cause incoherent accesses • Solution • Leverage virtual memory research from computer architecture • Page-table based structure • Decouple levels of indirection from resolution levels • Easy implementation with the Glift template library
Collaborators • Joe Kniss, University of Utah • Robert Strzodka, CAESAR Research Institute • Shubhabrata Sengupta, University of California, Davis • John Owens, University of California, Davis
Assumptions • This talk relies heavily on the contents of the “Glift” generic data structure talk
Is This GPGPU Programming? • Yes • Inseparable mix of GPGPU stream programming and traditional graphics • High-quality interactive rendering • Updating complex GPU data structures
Previous Work • Binotto et al. • Carr et al. • Coombe et al. • Ertl et al. • Lefebvre et al. • Purcell et al.
Why A New Structure? • What’s Missing? • Fully GPU-based adaptive multiresolution structure • GPU-based address translator • GPU-based updates of the address translator • Trilinear/quadrilinear mipmap filtering support • Uniform, coherent memory accesses
Application Adaptive Shadow Maps • Fernando et al., ACM SIGGRAPH 2001 • Elegant solution to shadow map aliasing • Quadtree of small shadow maps • Shadow maps need high resolution only at shadow boundaries • Required resolution is determined by the projected area of a screen-space pixel in light space
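As a rough sketch of that criterion, the C++ below estimates the light-space area of one pixel's footprint from the screen-space derivatives of its light-space texture coordinate and converts it into a required shadow-map resolution. The `Deriv2` struct, the function names, and the parallelogram approximation of the footprint are assumptions for illustration; Fernando et al. define their own refinement test, and this is not their code.

```cpp
#include <cmath>

// Hypothetical input: screen-space derivatives of the light-space texture
// coordinate (u, v in [0,1]) for one eye-space pixel (e.g. from ddx/ddy).
struct Deriv2 {
    double dudx, dvdx;  // change in (u, v) per one-pixel step in x
    double dudy, dvdy;  // change in (u, v) per one-pixel step in y
};

// Light-space area covered by the pixel footprint, approximated by the
// parallelogram spanned by the two derivative vectors (assumption).
double projectedArea(const Deriv2& d) {
    return std::fabs(d.dudx * d.dvdy - d.dvdx * d.dudy);
}

// Resolution (texels per side of the whole light domain) at which one
// shadow texel covers about the same area as the pixel footprint.
double requiredResolution(const Deriv2& d) {
    double area = projectedArea(d);
    return area > 0.0 ? 1.0 / std::sqrt(area) : INFINITY;
}

// Simplified refinement test: the page covering this pixel is too coarse.
bool needsRefinement(const Deriv2& d, double currentResolution) {
    return requiredResolution(d) > currentResolution;
}
```

A pixel whose footprint spans 1/1024 of the light domain in each direction asks for a 1024-texel-wide map, so a page stored at 512 texels across would be flagged for refinement while one at 2048 would not.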
Application Adaptive Shadow Maps • Why Adaptive Shadow Maps? • Many recent (2004) shadow papers cite ASMs as a high-quality solution that is not possible on graphics hardware • The algorithm is simple; the data structure is hard.
Application ASM Data Structure Requirements • Adaptive • Multiresolution • Fast, parallel random-access read • 2x2 native Percentage Closer Filtering (PCF) • Trilinear interpolated mipmapped PCF • Fast, parallel write • Fast, parallel insert and erase
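Percentage-closer filtering compares first and filters second: each nearby texel's depth test yields 0 or 1, and those results are bilinearly weighted. A minimal CPU sketch of the 2x2 case, assuming a single non-paged shadow map (the `ShadowMap` type and `pcf2x2` are hypothetical names, and "lit" is taken to mean fragment depth <= stored depth), might look like this:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical single-level shadow map with row-major depths.
struct ShadowMap {
    int width, height;
    std::vector<float> depth;
    float at(int x, int y) const {  // clamp-to-edge addressing
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return depth[y * width + x];
    }
};

// 2x2 PCF: depth-test the four nearest texels, then bilinearly weight the
// 0/1 results. (u, v) are texel coordinates; texel centers are at n + 0.5.
float pcf2x2(const ShadowMap& sm, float u, float v, float fragDepth) {
    float x = u - 0.5f, y = v - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0, fy = y - y0;
    auto lit = [&](int tx, int ty) { return fragDepth <= sm.at(tx, ty) ? 1.0f : 0.0f; };
    float top = (1 - fx) * lit(x0, y0) + fx * lit(x0 + 1, y0);
    float bottom = (1 - fx) * lit(x0, y0 + 1) + fx * lit(x0 + 1, y0 + 1);
    return (1 - fy) * top + fy * bottom;
}
```

For a 2x1 map with depths {0.5, 1.0}, a fragment at depth 0.75 sampled halfway between the two texels is half lit (0.5), because only the second texel passes the test. Filtering the comparison results rather than the depths is what softens the shadow edge.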
Application ASM Data Structure • Start with page table address translator • Coarse, uniform discretization of virtual domain • O(N) memory • O(1) computation • O(1) insert • O(1) erase • Uniform consistency • Partial mapping (sparse)
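Application ASM Data Structure • Page table example (va: virtual address, vpn: virtual page number, off: offset within the page, ppa: physical page address, pa: physical address)
vpn = va / pageSize
off = va % pageSize
ppa = pageTable(vpn)
pa = ppa + off
[Figure: virtual domain → page table → physical memory]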
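The same translation can be written as a small CPU-side C++ class. This is only a sketch of the arithmetic under stated assumptions (1D addresses, `std::optional` marking unmapped pages for the sparse case); on the GPU the page table is a texture read inside a fragment program, and the class and method names are hypothetical rather than Glift's API.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical 1D page-table address translator. Each page-table entry
// (PTE) holds the physical page address of one virtual page, or nothing
// if that page is unmapped (partial / sparse mapping).
class PageTable {
public:
    PageTable(std::size_t numVirtualPages, std::size_t pageSize)
        : pageSize_(pageSize), ptes_(numVirtualPages) {}

    // O(1) insert: map virtual page vpn to the physical page at ppa.
    void insert(std::size_t vpn, std::size_t ppa) { ptes_.at(vpn) = ppa; }

    // O(1) erase: unmap virtual page vpn.
    void erase(std::size_t vpn) { ptes_.at(vpn).reset(); }

    // O(1) translation from virtual address va to physical address pa.
    std::optional<std::size_t> translate(std::size_t va) const {
        std::size_t vpn = va / pageSize_;  // virtual page number
        std::size_t off = va % pageSize_;  // offset within the page
        const auto& ppa = ptes_.at(vpn);   // page-table lookup
        if (!ppa) return std::nullopt;     // unmapped page
        return *ppa + off;                 // pa = ppa + off
    }

private:
    std::size_t pageSize_;
    std::vector<std::optional<std::size_t>> ptes_;  // O(N) entries
};
```

With 16-entry pages and virtual page 2 mapped to physical address 64, virtual address 35 lands at offset 3 of page 2 and translates to 67. Each lookup is one division, one modulo, and one table read, which is where the O(1) cost comes from.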
Application ASM Data Structure • Adaptive Page Table • Map multiple virtual pages to a single physical page (s: per-page scale factor stored in the page table entry)
vpn = va / pageSize
ppa = pageTable(vpn).ppa()
s = pageTable(vpn).s()
off = (va * s) % pageSize
pa = ppa + off
[Figure: virtual domain → page table → physical memory]
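A sketch of the adaptive variant, assuming the per-page scale s is always 1/k for an integer k and that a coarse region of k virtual pages starts at a multiple of k (both assumptions, as are the type and method names), keeps the formula off = (va * s) % pageSize but evaluates it with integer division:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adaptive PTE: physical page address plus a scale factor.
// The scale s = 1 / pagesPerPhys is stored as an integer: pagesPerPhys
// consecutive, aligned virtual pages share one physical page.
struct AdaptivePTE {
    std::size_t ppa = 0;
    std::size_t pagesPerPhys = 1;  // 1 means full resolution
};

class AdaptivePageTable {
public:
    AdaptivePageTable(std::size_t numVirtualPages, std::size_t pageSize)
        : pageSize_(pageSize), ptes_(numVirtualPages) {}

    // Store k aligned virtual pages starting at firstVpn in one physical
    // page of the same size, i.e. at 1/k of the full resolution.
    void mapCoarse(std::size_t firstVpn, std::size_t k, std::size_t ppa) {
        assert(k > 0 && firstVpn % k == 0);
        for (std::size_t i = 0; i < k; ++i) ptes_.at(firstVpn + i) = {ppa, k};
    }

    std::size_t translate(std::size_t va) const {
        std::size_t vpn = va / pageSize_;
        const AdaptivePTE& pte = ptes_.at(vpn);
        // off = (va * s) % pageSize, with s = 1 / pagesPerPhys
        std::size_t off = (va / pte.pagesPerPhys) % pageSize_;
        return pte.ppa + off;
    }

private:
    std::size_t pageSize_;
    std::vector<AdaptivePTE> ptes_;
};
```

With 4-entry pages and k = 2, virtual pages 2 and 3 (addresses 8 to 15) share one physical page at address 100, so addresses 8, 10 and 15 translate to 100, 101 and 103. Unmapped pages are not modeled in this sketch.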
Application ASM Data Structure • Multiresolution Page Table • Mipmap page table [Figure: virtual domain → mipmap page table → physical memory]
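One way to picture the mipmapped page table is one page table per resolution level, all sharing the same page size, so coarser levels need fewer entries. The C++ below is a hypothetical sketch of that layout (1D, power-of-two full resolution assumed, names invented for illustration), not the GPU implementation:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical mipmapped page table: level 0 covers the virtual domain at
// full resolution, level l at 1/2^l of it. All levels share one page size,
// so each level's table has about half the entries of the level above.
class MipPageTable {
public:
    MipPageTable(std::size_t fullRes, std::size_t pageSize, std::size_t numLevels)
        : fullRes_(fullRes), pageSize_(pageSize) {
        for (std::size_t l = 0; l < numLevels; ++l) {
            std::size_t res = fullRes >> l;
            levels_.emplace_back((res + pageSize - 1) / pageSize);
        }
    }

    void insert(std::size_t level, std::size_t vpn, std::size_t ppa) {
        levels_.at(level).at(vpn) = ppa;
    }

    // Translate normalized coordinate u in [0, 1) at the given mip level.
    std::optional<std::size_t> translate(double u, std::size_t level) const {
        std::size_t res = fullRes_ >> level;
        auto va = static_cast<std::size_t>(u * res);  // level-local address
        const auto& ppa = levels_.at(level).at(va / pageSize_);
        if (!ppa) return std::nullopt;  // unmapped at this level
        return *ppa + va % pageSize_;
    }

private:
    std::size_t fullRes_, pageSize_;
    std::vector<std::vector<std::optional<std::size_t>>> levels_;
};
```

Level 1 of a 64-texel domain with 8-texel pages has four page-table entries; mapping its page 2 to physical address 500 makes u = 0.6 translate to 503 at level 1 (level-local address 19), while the same u stays unmapped at level 0.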
Application ASM Data Structure Requirements • How to support bilinear filtering? • Duplicate one column and one row of texels in each page • Mipmapped trilinear filtering? • “By-hand” interpolation between mipmap levels
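The “by-hand” blend can be sketched as follows: the hardware supplies a bilinear (or 2x2 PCF) result within one level, which the duplicated border row and column keep valid up to the page edge, and the program interpolates between the two nearest levels itself. `bilinearAtLevel` is a hypothetical callback standing in for a paged lookup at one level, and clamping the level of detail to [0, maxLevel] is an assumption:

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

// Manual trilinear filtering across two mip levels.
// bilinearAtLevel(l) is a hypothetical stand-in for a hardware bilinear
// (or 2x2 PCF) lookup through the page table of mip level l.
float trilinear(const std::function<float(int)>& bilinearAtLevel,
                float lod, int maxLevel) {
    lod = std::clamp(lod, 0.0f, static_cast<float>(maxLevel));
    int l0 = static_cast<int>(std::floor(lod));  // finer level
    int l1 = std::min(l0 + 1, maxLevel);         // coarser level
    float t = lod - static_cast<float>(l0);      // blend weight
    float a = bilinearAtLevel(l0);
    float b = bilinearAtLevel(l1);
    return a + t * (b - a);  // lerp(a, b, t)
}
```

If the per-level lookups returned 10 at level 1 and 20 at level 2, a level of detail of 1.25 would blend them to 12.5.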
Application How to Define the ASM Structure in Glift? • Start with a generic page table AddrTrans • Use mipmapped PhysMem for the page table • Change a template parameter to add adaptivity • Write a page allocator • alloc_pages, free_pages • Finally…
typedef PageTableAddrTrans<…> PageTable;
typedef PhysMemGPU<vec2f, vec1s> PMem2D;
typedef VirtMemGPU<PageTable, PMem2D> VPageTable;
typedef AdaptiveMem<VPageTable, PageAllocator> ASM;
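The slide leaves the allocator to the user. A minimal sketch of one possible policy, assuming a fixed pool of equally sized physical pages handed out from a free list with no eviction, could look like this. The class is hypothetical: only the method names alloc_pages and free_pages come from the slide, and their signatures here are invented.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical physical-page allocator: hands out fixed-size pages of the
// physical memory texture from a free list (LIFO reuse, no eviction).
class FreeListPageAllocator {
public:
    explicit FreeListPageAllocator(std::size_t numPhysPages) {
        for (std::size_t p = numPhysPages; p > 0; --p) free_.push_back(p - 1);
    }

    // Allocate n physical pages; throws if physical memory is exhausted.
    std::vector<std::size_t> alloc_pages(std::size_t n) {
        if (n > free_.size()) throw std::runtime_error("out of physical pages");
        auto first = free_.end() - static_cast<std::ptrdiff_t>(n);
        std::vector<std::size_t> pages(first, free_.end());
        free_.resize(free_.size() - n);
        return pages;
    }

    // Return pages to the pool so later alloc_pages calls can reuse them.
    void free_pages(const std::vector<std::size_t>& pages) {
        free_.insert(free_.end(), pages.begin(), pages.end());
    }

    std::size_t numFree() const { return free_.size(); }

private:
    std::vector<std::size_t> free_;  // indices of unused physical pages
};
```

A real ASM allocator also needs a replacement policy once the pool runs out; the results slides list a better cache replacement strategy among the open problems.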
Application ASM Data Structure Usage
Cg fragment program:
float4 main( uniform VMem2D asmMap, float3 shadowCoord, float4 litColor ) : COLOR {
  const float4 black = float4( 0, 0, 0, 1 ); // assumed: opaque black for shadowed pixels
  float isInLight = asmMap.vTex2Ds( shadowCoord );
  return lerp( black, litColor, isInLight );
}
C++ host code:
asmMap.bind_for_read( … );
asmMap.bind_for_write( … );
asmMap.alloc_pages( … );
asmMap.free_pages( … );
…
Application Adaptive Shadow Map Algorithm • Faithful to Fernando et al. 2001 • Refinement algorithm • Identify shadow pixels with a resolution mismatch (GPU) • Compact those pixels into a small stream (GPU) • See “Stream Reduction Operations for GPGPU Applications”, Daniel Horn, GPU Gems 2, Ch. 36 • CPU reads back the compacted stream (GPU→CPU) • Allocate pages • Draw new PTEs into the mipmap page tables (CPU→GPU) • Draw depth into the ASM for each new page (GPU)
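The compaction step can be illustrated with the common scan-and-scatter formulation of stream compaction, written serially in C++ below. It is a sketch under stated assumptions, not necessarily the method in Horn's chapter: fragment programs of this generation cannot scatter, so a GPU implementation has to restructure the final write phase. `compact` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Stream compaction in three phases, each data-parallel on a GPU:
// 1) flag each element, 2) exclusive prefix sum (scan) of the flags,
// 3) scatter each flagged element to the slot given by its scan value.
template <typename T, typename Pred>
std::vector<T> compact(const std::vector<T>& in, Pred keep) {
    std::vector<std::size_t> flag(in.size()), pos(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) flag[i] = keep(in[i]) ? 1 : 0;

    std::size_t sum = 0;  // exclusive scan: pos[i] = kept elements before i
    for (std::size_t i = 0; i < in.size(); ++i) { pos[i] = sum; sum += flag[i]; }

    std::vector<T> out(sum);
    for (std::size_t i = 0; i < in.size(); ++i)
        if (flag[i]) out[pos[i]] = in[i];  // scatter
    return out;
}
```

Compacting {5, 0, 3, 0, 7} with a “nonzero” predicate yields {5, 3, 7}. In the ASM loop the input would be per-pixel refinement requests, so the CPU reads back a short stream instead of the whole frame.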
ASM: effective resolution 131,072² (37 MB); SM: 2048² [Thanks to Yong Kil for the tree model]
Application “Octree” 3D Paint • Interactive painting on unparameterized 3D surfaces • 3D version of ASM data structure • Differs from previous work: • Quadrilinear filtering • O(1), uniform access • Interactive with effective resolutions between 64³ and 2048³
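Because the paint structure is the 3D version of the ASM page table, its translation is the 2D computation applied per axis. The sketch below assumes a cube of page-table entries indexed x-fastest and physical page origins given as texel coordinates in a 3D physical-memory texture; `PageTable3D` and `Vec3` are hypothetical names, and filtering is left out.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <vector>

using Vec3 = std::array<std::size_t, 3>;

// Hypothetical 3D page table: the 1D/2D translation applied per axis.
class PageTable3D {
public:
    PageTable3D(std::size_t pagesPerAxis, std::size_t pageSize)
        : n_(pagesPerAxis), pageSize_(pageSize), ptes_(n_ * n_ * n_) {}

    void insert(const Vec3& vpn, const Vec3& ppa) { ptes_.at(index(vpn)) = ppa; }

    std::optional<Vec3> translate(const Vec3& va) const {
        Vec3 vpn{va[0] / pageSize_, va[1] / pageSize_, va[2] / pageSize_};
        const auto& ppa = ptes_.at(index(vpn));
        if (!ppa) return std::nullopt;  // unpainted (unmapped) region
        return Vec3{(*ppa)[0] + va[0] % pageSize_,
                    (*ppa)[1] + va[1] % pageSize_,
                    (*ppa)[2] + va[2] % pageSize_};
    }

private:
    // Flatten a 3D page number into the entry array (x varies fastest).
    std::size_t index(const Vec3& vpn) const {
        return (vpn[2] * n_ + vpn[1]) * n_ + vpn[0];
    }
    std::size_t n_, pageSize_;
    std::vector<std::optional<Vec3>> ptes_;
};
```

With 8³-texel pages, virtual texel (9, 17, 30) falls in page (1, 2, 3) at offset (1, 1, 6); if that page starts at physical texel (16, 0, 8), the lookup returns (17, 1, 14).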
Application ASM Results • Effective shadow map resolution up to 131,072² • Page size: 16² - 64² • Page table: 512² - 2048² • Physical memory: 2048² - 4096² (20 - 80 MB) • Performance (45k polygon model) • 15 fps while moving camera (including refinement) • 5-10 fps while moving light • Lookup time compared to a 2048² shadow map: • Bilinear filtered: 90% of the performance of a traditional shadow map • Trilinear filtered mipmapped: 73%
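The resolution figures fit together if, as in a one-level page table, the effective resolution is the page-table resolution times the page size; that relationship is an inference from the numbers above rather than a formula stated on the slide. The second line compares against storing the full effective resolution densely, assuming 32-bit depth texels:

```latex
\[
R_{\text{eff}} = R_{\text{table}} \times P, \qquad
2048 \times 64 = 131{,}072, \qquad
512 \times 16 = 8{,}192
\]
\[
131{,}072^2 \times 4\ \text{bytes}
  = 2^{17} \cdot 2^{17} \cdot 2^{2}\ \text{bytes}
  = 2^{36}\ \text{bytes}
  = 64\ \text{GiB}
\]
```

The 37 MB reported for the 131,072² ASM is therefore a small fraction of a dense map with the same effective resolution.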
ASM Results • Bottlenecks and Limitations • Stream compaction (50% - 85% of frame time) • Missing frustum culling (1-2 fps with a 1M-polygon model) • Need a frame-rate guarantee • Missing large PCF kernels • Better cache replacement strategy • 2-level page table
Page Table Memory Coherency • 1- and 2-level page tables become bandwidth bound for pages smaller than 8 x 8 • [Figure measured with RGBA8 textures, NVIDIA GeForce 6800 GT, NVIDIA driver 75.22, Cg 1.4a]
Application Static Instruction Counts • Static instruction counts with Cg program specialization (Glift vs. by-hand): • ASM: 9 vs. 9 • Octree: 10 vs. 9 • ASM + offset: 10 vs. 9 • Conclusion: Glift structures are within 1 instruction of hand-coded Cg • Measured with NVShaderPerf, NVIDIA driver 75.22, Cg 1.4a
Overview • Motivation and previous work • Abstraction • Glift template library • Case study • Adaptive shadow maps and octree 3D paint • Conclusions
Conclusions • Dynamic adaptive multiresolution data structure • Coherent accesses if pages are larger than 8 x 8 • Decouple levels of indirection from levels of resolution • Page table literature • Continuum all the way from a 1-level page table to a full tree • Based on the assumption that accesses are coherent within a page
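To illustrate one step along that continuum, here is a hypothetical two-level variant (the ASM results list a 2-level page table as future work). The directory layout, the names, and the choice to allocate second-level tables lazily are assumptions; the point is that each added indirection costs one more dependent read per lookup but lets unmapped regions skip second-level storage.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical two-level page table: a top-level directory whose entries
// point to second-level tables of `entriesPerTable` PTEs each. Second-level
// tables are created only where something is mapped.
class TwoLevelPageTable {
public:
    TwoLevelPageTable(std::size_t numDirEntries, std::size_t entriesPerTable,
                      std::size_t pageSize)
        : dir_(numDirEntries), entries_(entriesPerTable), pageSize_(pageSize) {}

    void insert(std::size_t vpn, std::size_t ppa) {
        auto& table = dir_.at(vpn / entries_);
        if (!table) table = std::make_unique<Table>(entries_);
        table->at(vpn % entries_) = ppa;
    }

    std::optional<std::size_t> translate(std::size_t va) const {
        std::size_t vpn = va / pageSize_;
        const auto& table = dir_.at(vpn / entries_);  // first indirection
        if (!table) return std::nullopt;
        const auto& ppa = table->at(vpn % entries_);  // second indirection
        if (!ppa) return std::nullopt;
        return *ppa + va % pageSize_;
    }

private:
    using Table = std::vector<std::optional<std::size_t>>;
    std::vector<std::unique_ptr<Table>> dir_;
    std::size_t entries_, pageSize_;
};
```

With 8 entries per second-level table and 16-entry pages, mapping virtual page 10 to 1000 creates only the table for directory slot 1, and virtual address 165 translates to 1005.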
Conclusions • Adaptive Shadow Maps • Interactive camera movements at 10 - 20 fps • Can rebuild the entire ASM each frame at 5 - 10 fps • Trilinear mipmap filtering removes “popping” • Supports up to 131,072² with 1 level of indirection • Add levels of indirection to reduce page table size, reduce page size, or get higher effective resolutions • How fast can ASMs get with full optimizations and hardware support for stream compaction?
Conclusions • Octree 3D Paint • Fully interactive 3D painting on unparameterized models • Frame rates bound by geometry, not by the octree texture • Supports up to 2048³ with 1 level of indirection • Add indirections to reduce page table memory or support higher effective resolutions • Quadrilinear filtering on the octree 3D texture • Future: add support for “normal flags”
Acknowledgements • Craig Kolb, Nick Triantos, NVIDIA • Fabio Pellacini, Cornell/Pixar • Adam Moerschell, Yong Kil, Serban Porumbescu, Chris Co, …, UC Davis • Ross Whitaker, Chuck Hansen, Milan Ikits, U. of Utah • National Science Foundation • Department of Energy
More Information • Upcoming ACM Transactions on Graphics paper • “Glift: An Abstraction for Generic, Efficient GPU Data Structures” • Expected October 2005 • ASM and Octree submitted as SIGGRAPH sketches • Google “Lefohn GPU”