1 / 67

Glift: An Abstraction for Generic, Efficient GPU Data Structures

Glift: An Abstraction for Generic, Efficient GPU Data Structures. Aaron Lefohn University of California, Davis. Problem Statement. Goal Simplify creation and use of random-access GPU data structures for graphics and GPGPU programming Challenges GPU memory model different than CPU

anisa
Download Presentation

Glift: An Abstraction for Generic, Efficient GPU Data Structures

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Glift: An Abstraction for Generic, Efficient GPU Data Structures Aaron Lefohn University of California, Davis

  2. Problem Statement • Goal • Simplify creation and use of random-access GPU data structures for graphics and GPGPU programming • Challenges • GPU memory model different than CPU • Memory model spans multiple processors and languages • Efficiency • Solution • Abstraction for GPU data structures • Glift template library

  3. Collaborators • Joe KnissUniversity of Utah • Robert StrzodkaCAESAR Research Institute • Shubhabrata SenguptaUniversity of California, Davis • John OwensUniversity of California, Davis

  4. Abstraction Spoiler… • GPU data structures are not as complex as they appear • A large number of GPU data structures can be easily understood if the following three items are specified • Virtual address domain • Physical address domain • Address translator

  5. Overview • Motivation and Previous Work • Abstraction • Glift template library • Case study • Adaptive shadow maps and octree 3D paint • Conclusions

  6. Motivation CPU Data Structures • Why? • Algorithms expressed in natural data domain • Decouple algorithms and data structures • Reuse Virtual representation of memory: N-D array, stack, hash table, queue, … Abstractions provided by library (STL, Boost, …) Physical representation of memory: 1D array

  7. Motivation GPU State of the Art • GPU code accesses physical memory directly • GPU code is tangled mess of data structure access and algorithms • No reuse • Greatly complicates GPU algorithm design

  8. Abstraction We Want To Transform This… float3 getAddr3D( float2 winPos, float2 winSize, float3 sizeConst3D ) { float3 curAddr3D;float2 winPosInt = floor(winPos);float addr1D = winPosInt.y * winSize.x + winPosInt.x; addr3D.z = floor( addr1D / sizeConst3D.z ); addr1D -= addr3D.z * sizeConst3D.z; addr3D.y = floor( addr1D / sizeConst3D.y ); addr3D.x = addr1D - addr3D.y * sizeConst3D.y;return addr3D; } float2 getAddr2D( float3 addr3D, float2 winSize, float3 sizeConst3D ) { float addr1D = dot( addr3D, sizeConst3D );float normAddr1D = addr1D / winSize.x;returnfloat2(frac(normAddr1D) * winSize.x, normAddr1D); } float3 main( uniform samplerRECT data, uniform float2 winSize, uniformfloat3 sizeConst3D, float2 winPos : WPOS ) : COLOR { float3hereAddr3D = getAddr3D( winPos, winSize, sizeConst3D ); float3 neighborAddr = hereAddr3D - float3(1, 1, 1); returntexRECT(data, getAddr2D(neighborAddr3D, winSize, sizeConst3D) ); }

  9. Abstraction Into This. void main( uniformVMem3D data, Iterator3Diter, out float result ) { float3 va = iter.addr(); return srcData.vTex3D( va – float3(1,1,1) ); }

  10. Previous Work Previous Work • GPU computation • GPU data structures

  11. Previous Work Shader/Kernel Languages • Vertex and fragment assembly 2001- 2002 • Real-Time Shading Language 2001 • Cg, HLSL, GLSL, Ashli 2002-2004

  12. Previous Work Higher-Level Languages • Abstract kernels, streams, and “glue” • Brook 2003 • Sh 2004 • Scout 2004

  13. Previous Work Previous Work • GPU computation • GPU data structures

  14. Previous Work Virtualized GPU Data Structures • Brook • Virtualizes CPU/GPU interface for 1D – 4D arrays • Sh • Virtualizes 1D arrays and CPU/GPU data access

  15. Previous Work Example Brook Data Structure • kernelvoid myKernel( float srcData[], • out float dstData<> ) • { • result = input[ i – 1 ]; } float srcData<5000>, dstData<5000>; streamRead(srcData, dataPtr ); myKernel( srcData, dstData ); streamWrite(dstData, dataPtr );

  16. Previous Work Limitations of Brook Data Virtualization • Difficult to add new, equivalently virtualized structures • Requires adopting entire system • Only exposes GL_NEAREST, GL_TEXTURE_2D

  17. Previous Work What about other Data Structures? • Photon map Purcell • Sparse matrix Boltz, Krueger • Sparse simulation grid Lefohn • Polycube (3D grid, cubeMap, …) Tarini • N-tree Lefebvre • …

  18. Brook Sh STL ??? C++ Cg OpenGL Motivation GPU Data Structures • What’s Missing? • Standalone abstraction for GPU data structures for graphics or GPGPU programming

  19. Overview • Motivation and Previous Work • Abstraction • Glift template library • Case study • Adaptive shadow maps and octree 3D paint • Conclusions

  20. Why a Data Structure Abstraction? • Separate data structures and algorithms • Enable more complex structures • Enable more complex algorithms • Provide perspective on class of GPU-compatible structures • Is random read required? • What is required for stream read/write?

  21. Abstraction Abstraction Design Goals • GPU data structure abstraction that • Enables easy creation of new structures • Virtualizes CPU and GPU memory interfaces • Separates containers from algorithms • Encourages efficiency

  22. Abstraction Abstraction Design Approach • Minimal efficient abstraction of GPU memory model • GPU is different than CPU (1D/2D/3D/Cube/Mip) • Identify common patterns in GPU papers and code • Other inspiration • STL, Boost, STAPL, Stepanov • Brook

  23. Abstraction What is the GPU Memory Model? • Natively multi-dimensional • CPU interface • glTexImage malloc • glDeleteTextures free • glTexSubImage memcpy GPU -> CPU • glGetTexSubImage* memcpy CPU -> GPU • glCopyTexSubImage memcpy GPU -> GPU • glBindTexture read-only parameter bind • glFramebufferTexture write-only parameter bind * Does not exist. Emulate withglReadPixels

  24. Abstraction What is the GPU Memory Model? • GPU Interface (shown in Cg) • uniform samplerND parameter declaration • texND(tex, addr) random-access read • streamND(tex)* stream read * Does not exist, but is a useful construct for efficiency reasons

  25. Abstraction GPU Data Structure Abstraction • Factor GPU data structures into • Physical memory • Virtual memory • Address translator • Iterators

  26. Motivation Virtualized GPU Memory Model • Natively multi-dimensional • CPU interface • glTexImageC++ constructor • glDeleteTexturesC++ destructor • glTexSubImagewrite(orig, size, dataPtr) • glGetTexSubImage* read( orig, size, dataPtr) • glCopyTexSubImagecopy( orig, size, dst ) • glBindTexturebind_for_read( cgParam ) • glFramebufferTexturebind_for_write( attach ) * Does not exist. Emulate withglReadPixels

  27. Motivation Virtualized GPU Memory Model • GPU Interface (shown in Cg) • uniform samplerNDuniform VMemND • texND(tex, addr)vTexND(addr) • streamND(tex)*streamRead(iter) * Does not exist, but is a useful construct

  28. Abstraction Physical Memory • Native GPU textures • Choose based on algorithm efficiency requirements • 1D • Read-write, linear, 4096 max size • 2D • Read-write, bilinear, 40962 max size • 3D • Read-only, trilinear, 5123 max size • Cube • read-write, bilinear, square, array of six 2D textures • Mipmaps • Additional (multiresolution) dimension to address

  29. Translation Translation Translation 3D native mem 2D slices Flat 3D texture Abstraction Virtual Memory • Virtual N-D address space • Choose based on problem space of algorithm • Defined by physical memory and address translator Virtual representation of memory: 3D grid

  30. Abstraction Address Translator • Mapping between physical and virtual addrs • Core of data structure • Small amount of code defines all required C++ and Cg memory interfaces • Select based on virtual and physical domains and memory/compute efficiency requirements of algorithm

  31. Abstraction Address Translator Examples • Examples • ND-to-2D • 3D-to-2D tiled “flat 3D textures” • Page table • Grid of lists • Hash table • Silmap

  32. Abstraction Address Translator Classifications • Representation • Analytic / Discrete • Memory Complexity • O(1), O(log N), O(N), … • Compute Complexity • O(1), O(log N), O(N), … • Compute Consistency • Uniform vs. non-uniform • Total / Partial • Complete vs. sparse • One-to-one / Many-to-one • Uniform vs. adaptive

  33. Abstraction Iterators • Abstraction for virtual and physical addrs • CPU and GPU iterators • Boundary between containers and algorithms • Allows separate definition!

  34. Abstraction Which Iterators? • Forward iterator • Required for GPU execution • Enables stream read and stream write • Random access iterator • Required for indexing (i.e., texture-like structures) • Graphics needs random access. GPGPU often does not. • Others Coming… • Use iterators to explicitly declare access patterns

  35. Abstraction Simple Example • 3D Array with 2D physical memory CPU (C++) typedef boost::multi_array<float, 3> array_type; array_type srcData( boost::extents[10][10][10] ); array_type dstData( boost::extents[10][10][10] ); … initialize data … for (size_t z = 1; z < 10; ++z) { for (size_t y = 1; z < 10; ++y) { for (size_t x = 1; z < 10; ++x) { dstData[z][y][x] = srcData[z–1][y–1][x–1]; } } }

  36. Abstraction Example : GPU Shader w/out Abstraction float3 getAddr3D( float2 winPos, float2 winSize, float3 sizeConst3D ) { float3 curAddr3D;float addr1D = winPosInt.y * winSize.x + winPosInt.x; addr3D.z = floor( addr1D / sizeConst3D.z ); addr1D -= addr3D.z * sizeConst3D.z; addr3D.y = floor( addr1D / sizeConst3D.y ); addr3D.x = addr1D - addr3D.y * sizeConst3D.y;return addr3D; } float2 getAddr2D( float3 addr3D, float2 winSize, float3 sizeConst3D ) { float addr1D = dot( addr3D, sizeConst3D );float normAddr1D = addr1D / winSize.x;float2 neighborAddr2D = float2(frac(normAddr1D) * winSize.x, normAddr1D); } float3 main( uniform samplerRECT data, uniform float2 winSize, uniformfloat3 sizeConst3D, float2 winPos : WPOS ) : COLOR { float3hereAddr3D = getAddr3D( floor(winPos), winSize, sizeConst3D ); float3 neighborAddr = hereAddr3D - float3(1, 1, 1); returntexRECT(data, getAddr2D(neighborAddr3D, winSize, sizeConst3D) ); }

  37. Physical-to-Virtual Address Translation Virtual-to-PhysicalAddress Translation Physical Memory Read Abstraction Example : Rename Variables float3 physToVirt( float2 pa, float2 physSize, float3 virtSizes ) { float3 va;float addr1D = pa.y * physSize.x + pa.x; va.z = floor( addr1D / virtSizes.z ); addr1D -= va.z * sizeConst3D.z; va.y = floor( addr1D / virtSizes.y ); va.x = addr1D - va.y * virtSizes.y;return va; } float2 virtToPhys( float3 va, float2 physSize, float3 virtSizes ) { float addr1D = dot( va, virtSizes );float normAddr1D = addr1D / physSize.x;float2 pa = float2(frac(normAddr1D) * physSize.x, normAddr1D); } float3 main( uniform samplerRECT physMem, uniform float2 physSize, uniformfloat3 virtSizes, float2 pa : WPOS ) : COLOR { float3va = physToVirt( floor(pa), physSize, virtSizes ); float3 neighborAddr = va - float3(1, 1, 1); returntexRECT(data, virtToPhys(neighborAddr3D, physSize, virtSizes) ); }

  38. Iterator3D VMem3D VMem3D Abstraction Example : Glift Components float3 physToVirt( float2 pa, float2 physSize, float3 virtSizes ) { float3 va;float addr1D = pa.y * physSize.x + pa.x; va.z = floor( addr1D / virtSizes.z ); addr1D -= va.z * sizeConst3D.z; va.y = floor( addr1D / virtSizes.y ); va.x = addr1D - va.y * virtSizes.y;return va; } float2 virtToPhys( float3 va, float2 physSize, float3 virtSizes ) { float addr1D = dot( va, virtSizes );float normAddr1D = addr1D / physSize.x;float2 pa = float2(frac(normAddr1D) * physSize.x, normAddr1D); } float3 main( uniform samplerRECT physMem, uniform float2 physSize, uniformfloat3 virtSizes, float2 pa : WPOS ) : COLOR { float3va = physToVirt( floor(pa), physSize, virtSizes ); float3 neighborAddr = va - float3(1, 1, 1); returntexRECT(data, virtToPhys(neighborAddr3D, physSize, virtSizes) ); }

  39. Abstraction Example : GPU Shader with Glift Cg Usage float3 main( uniformVMem3D srcData, Iterator3Diter ) : COLOR { float3 va = iter.addr(); return srcData.vTex3D( va – float3(1,1,1) ); }

  40. Abstraction Example : Glift Data Structures C++ Usage vec3i origin(0,0,0); vec3i size(10,10,10); ArrayGpuND<vec3i,vec1f> srcData( size ); ArrayGpuND<vec3i,vec1f> dstData( size ); … initialize dataPtr … srcData.write( origin, size, dataPtr ); gpu_range_iterator it = dstData.gpu_range(origin, size); it.bind_for_read( iterCgParam ); srcData.bind_for_read( srcCgParam ); dstData.bind_for_write( COLOR0, myFrameBufferObject ); gpuForEach(it);

  41. Abstraction Other Benefits of Abstraction • Multiple PhysMem with same AddrTrans • “Unlimited” amount of data in structures • Multiple AddrTrans with one PhysMem • “reinterpret_cast” physical memory • Continuguous memory layout • Efficient stream processing of PhysMem or AddrTrans

  42. Overview • Motivation and previous work • Abstraction • Glift template library implementation • Case study • Adaptive shadow maps and octree 3D paint • Conclusions

  43. Implementation Glift Design Goals • Generic implementation of abstraction • As efficient as hand-coding • Unified C++ and Cg code base • Easily extensible • Incrementally adoptable • Easy integration with Cg/OpenGL

  44. Implementation Glift Components Application PhysMem VirtMem Container Adaptors AddrTrans C++/Cg/OpenGL

  45. 4D Array Declaration Example • Build 4D array of vec3f values typedef PhysMemGPU<vec2i, vec3f> PMem2D;typedef NdTo2DAddrTrans<vec4i,vec2i> Addr4to2;typedef VirtMemGPU<Addr4to2, PMem2D> VMem4D; vec4i virtSize( 10, 10, 10, 10);vec2i physSize( 100, 100 ); PMem2D pMem2D( physSize );Addr4to2 addrTrans( virtSize, physSizse );VMem4D array4D( addrTrans, pMem2D );

  46. Implementation PhysMem • Templated texture class • Defines all C++ and Cg GPU memory interfaces • GPU, CPU, and CPU-GPU • Template parameters • Address type dimension value type • Value type dimension value type • Example typedef PhysMemGPU<vec2f, vec1f> PMem2D; PMem2D pMem2D( vec2i(100, 100) );

  47. Implementation AddrTrans • Template parameters • Virtual address type • Physical address type • Boundary condition • …Specific parameters… • Example typedef NdTo2DAddrTrans<vec4i,vec2f> Addr4to2; Addr4to2 addrTrans( vec4i(10, 10, 10, 10), vec2i(100, 100) );

  48. Implementation AddrTrans • Core of data structure • Extension point for creating new structures • Must define translate(…)translate_range(…)cpu_range(…)gpu_range(…)

  49. Implementation VirtMem • Composition of an AddrTrans and PhysMem • Defines all C++ and Cg GPU memory interfaces • Parameters • Address translator type • Physical memory type • Example typedef VirtMemGPU<Addr4to2, PMem2D> VMem4D; VMem4D vMem4D( addrTrans, pMem2D );

  50. Implementation Container Adaptors • High-Level Containers • Apply behavior to underlying container • STL stack is adaptor atop deque, vector, … • Wrap up typedefs • Examples ArrayGpuND<vec1i, vec4ub> myArray( 20000 ); StackGPU<vec3f> myGpuStack( 1000 );

More Related