
Brook for GPUs

Brook is a general-purpose streaming language for GPUs. Its stream programming model enforces data-parallel computing through streams and encourages high arithmetic intensity through kernels, with the aim of making GPU programming easier and faster. It hides low-level graphics details such as texture management and rendering passes, virtualizes resources, and offers reductions and gather/scatter operations. The slides also cover the brcc compiler, the brt run-time library, and current GPU limitations.


Presentation Transcript


  1. Brook for GPUs • Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Pat Hanrahan • February 10th, 2003

  2. Brook: general purpose streaming language • developed for PCA Program/Merrimac • compiler: RStream • Reservoir Labs • DARPA PCA Program • Stanford: SmartMemories • UT Austin: TRIPS • MIT: RAW • Brook version 0.2 spec: http://merrimac.stanford.edu • Brook for GPUs: http://brook.sourceforge.net [Figure: block diagram labeled Stream Execution Unit, Scalar Execution Unit, Stream Register File, Memory System, Network, DRDRAM, Network Interface]

  3. Brook: general purpose streaming language • stream programming model • enforce data parallel computing • streams • encourage arithmetic intensity • kernels • C with streams

  4. Brook for GPUs • demonstrate the GPU as a streaming coprocessor • make programming GPUs easier • hide texture/pbuffer data management • hide graphics-based constructs in Cg/HLSL • hide rendering passes • virtualize resources • performance! • … on applications that matter • highlight GPU areas for improvement • features required for general purpose stream computing

  5. System outline • .br: Brook source files • brcc: source-to-source compiler • brt: Brook run-time library

  6. Brook language: streams • streams • collection of records requiring similar computation • particle positions, voxels, FEM cell, …

         float3 positions<200>;
         float3 velocityfield<100,100,100>;

     • encourage data parallelism

  7. Brook language: kernels • kernels • functions applied to streams • similar to for_all construct

         kernel void foo (float a<>, float b<>, out float result<>) {
           result = a + b;
         }

         float a<100>;
         float b<100>;
         float c<100>;

         foo(a,b,c);

         /* equivalent C loop: */
         for (i=0; i<100; i++)
           c[i] = a[i]+b[i];

     • no dependencies between stream elements • encourage high arithmetic intensity
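The kernel semantics on this slide can be emulated on the CPU in plain C. The sketch below is illustrative only and is not Brook run-time code: `foo_element` and `apply_kernel` are hypothetical names. The point it shows is that the kernel body sees exactly one element of each stream, so no iteration can depend on another.

```c
#include <stddef.h>

/* Per-element body of the slide's `foo` kernel: reads one element from each
   input stream and writes one element of the output stream. */
static void foo_element(float a, float b, float *result) {
    *result = a + b;
}

/* Hypothetical stand-in for what a stream run-time does: apply the kernel
   body to every element. No iteration reads another iteration's output, so
   the loop could run in any order or in parallel. */
static void apply_kernel(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        foo_element(a[i], b[i], &out[i]);
    }
}
```

Calling `apply_kernel(a, b, c, 100)` on three 100-element arrays plays the role of `foo(a,b,c)` on the slide.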

  8. Brook language: kernels • Ray Triangle Intersection

         kernel void krnIntersectTriangle(Ray ray<>, Triangle tris[],
                                          RayState oldraystate<>,
                                          GridTrilist trilist[],
                                          out Hit candidatehit<>) {
           float idx, det, inv_det;
           float3 edge1, edge2, pvec, tvec, qvec;
           if(oldraystate.state.y > 0) {
             idx = trilist[oldraystate.state.w].trinum;
             edge1 = tris[idx].v1 - tris[idx].v0;
             edge2 = tris[idx].v2 - tris[idx].v0;
             pvec = cross(ray.d, edge2);
             det = dot(edge1, pvec);
             inv_det = 1.0f/det;
             tvec = ray.o - tris[idx].v0;
             candidatehit.data.y = dot( tvec, pvec ) * inv_det;
             qvec = cross( tvec, edge1 );
             candidatehit.data.z = dot( ray.d, qvec ) * inv_det;
             candidatehit.data.x = dot( edge2, qvec ) * inv_det;
             candidatehit.data.w = idx;
           } else {
             candidatehit.data = float4(0,0,0,-1);
           }
         }
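The arithmetic inside this kernel can be tried out on the CPU. The C sketch below reproduces only the math for one ray and one triangle. The slide does not name the method, but the steps match the widely used Möller-Trumbore formulation, where the three outputs are the ray distance `t` and the barycentric coordinates `u` and `v`. The `Vec3` and `HitParams` types and the function names are hypothetical, introduced for illustration.

```c
typedef struct { float x, y, z; } Vec3;

static Vec3 sub(Vec3 a, Vec3 b) {
    return (Vec3){a.x - b.x, a.y - b.y, a.z - b.z};
}

static Vec3 cross(Vec3 a, Vec3 b) {
    return (Vec3){a.y * b.z - a.z * b.y,
                  a.z * b.x - a.x * b.z,
                  a.x * b.y - a.y * b.x};
}

static float dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* t: distance along the ray; u, v: barycentric coordinates in the triangle. */
typedef struct { float t, u, v; } HitParams;

/* Same sequence of operations as the kernel body. Like the slide, it does not
   guard against det == 0 (ray parallel to the triangle) and does not
   range-check u, v, or t; a caller has to do that separately. */
static HitParams intersect_params(Vec3 o, Vec3 d, Vec3 v0, Vec3 v1, Vec3 v2) {
    Vec3 edge1 = sub(v1, v0);
    Vec3 edge2 = sub(v2, v0);
    Vec3 pvec = cross(d, edge2);
    float det = dot(edge1, pvec);
    float inv_det = 1.0f / det;
    Vec3 tvec = sub(o, v0);
    Vec3 qvec = cross(tvec, edge1);
    HitParams h;
    h.u = dot(tvec, pvec) * inv_det;   /* candidatehit.data.y on the slide */
    h.v = dot(d, qvec) * inv_det;      /* candidatehit.data.z on the slide */
    h.t = dot(edge2, qvec) * inv_det;  /* candidatehit.data.x on the slide */
    return h;
}
```

For a ray starting at (0.25, 0.25, 1) and pointing down the negative z axis at the triangle (0,0,0), (1,0,0), (0,1,0), this yields t = 1 and u = v = 0.25.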

  9. Brook language: additional features • reductions • scalar • stream • stride & repeat • GatherOp & ScatterOp • a[i] += p • p = a[i]++
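The three operations named on this slide can be sketched in sequential C. This is one reading of the slide's shorthand, not Brook's actual API: `reduce_sum`, `scatter_add`, and `gather_increment` are hypothetical names, `a[i] += p` is taken as the scatter form, and `p = a[i]++` as the gather form.

```c
#include <stddef.h>

/* Scalar reduction: collapse a whole stream into a single value (a sum here). */
static float reduce_sum(const float *s, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += s[i];
    }
    return acc;
}

/* Scatter with accumulate, read from the slide's "a[i] += p": each stream
   element p[k] is added into the array slot chosen by idx[k] (indirect write). */
static void scatter_add(float *a, const size_t *idx, const float *p, size_t n) {
    for (size_t k = 0; k < n; k++) {
        a[idx[k]] += p[k];
    }
}

/* Gather with update, read from the slide's "p = a[i]++": each output takes
   the current value of the indexed slot, then the slot is incremented. */
static void gather_increment(float *a, const size_t *idx, float *p, size_t n) {
    for (size_t k = 0; k < n; k++) {
        p[k] = a[idx[k]]++;
    }
}
```

When several elements target the same index, the sequential loop gives a well-defined result; a parallel implementation would have to define its own ordering or combining rule.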

  10. brcc compiler: infrastructure • based on ctool • http://ctool.sourceforge.net • parser • build code tree • extend C grammar to accept Brook • convert • tree transformations • codegen • generate Cg & HLSL code • call cgc, fxc • generate stub function

  11. Applications • Ray-tracer • FFT • Segmentation • Linear Algebra: BLAS, LINPACK, LAPACK

  12. Brook Performance

  13. GPU Gotchas [Plot with axes labeled Time and Registers Used]

  14. GPU Gotchas • NVIDIA NV3x: register usage vs. time [Plot with axes labeled Time and Registers Used]

  15. GPU Gotchas • NVIDIA: • Register Penalty • Render to Texture Limitation • Requires explicit copy or heavy pbuffer solution • Superbuffer extension needed • http://mirror.ati.com/developer/SIGGRAPH03/Percy_OpenGL_Extensions SIG03.pdf

  16. GPU Gotchas • ATI Radeon 9800 Pro • Limited dependent texture lookup • 96 instructions • 24-bit floating point • s16e7: integers up to 131,072 (s23e8: 16,777,216) [Diagram labels: Memory Refs 1, Math Ops, Memory Refs 2, Math Ops, Memory Refs 3, Math Ops, Memory Refs 4, Math Ops]
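The integer limits quoted on this slide follow from mantissa width. Assuming an implicit leading 1 bit (the usual convention, though the slide does not state it), a format with m stored mantissa bits represents every integer up to 2^(m+1) exactly: 2^17 = 131,072 for s16e7 and 2^24 = 16,777,216 for s23e8. The sketch below computes that bound and checks the s23e8 case against C's `float`, which is s23e8 on IEEE 754 platforms. Both function names are hypothetical.

```c
#include <stdint.h>

/* Largest N such that every integer in [0, N] is exactly representable in a
   binary floating-point format with `mantissa_bits` stored mantissa bits plus
   an implicit leading 1 (assumed), given enough exponent range. */
static uint64_t max_contiguous_integer(unsigned mantissa_bits) {
    return (uint64_t)1 << (mantissa_bits + 1);
}

/* Returns 1 if n survives a round trip through float unchanged, else 0.
   Intended for n below 2^31 so the conversion back cannot overflow. */
static int survives_float_round_trip(uint32_t n) {
    float f = (float)n;
    return (uint32_t)f == n;
}
```

On an IEEE 754 machine, 16,777,216 survives the round trip but 16,777,217 does not; on 24-bit hardware the same loss would already appear just above 131,072, which matters when floats are used as array indices.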

  17. GPU Catch-Up! • Integer & Bit Ops & Double Precision • Memory Addressing • CGC/FXC Performance • Hand code performance critical code • No native reduction support • No native scatter support • p[i] = a (indirect write) • No programmable blend • GatherOp / ScatterOp • Limited 4x4 output • Brook virtualized kernel outputs • Readback still slow • NV35 OpenGL: 600 MB/sec Download 170 MB/sec Readback • ATI DirectX: 550 MB/sec Download 50 MB/sec Readback
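The download and readback figures on this slide can be turned into a rough cost estimate. The C sketch below is back-of-envelope arithmetic only: it takes 1 MB as 10^6 bytes, which is an assumption, and it ignores latency and driver overhead. Both function names are hypothetical.

```c
/* Seconds needed to move `bytes` at `mb_per_sec`, taking 1 MB = 10^6 bytes
   (an assumption; the slide does not say which convention it uses). */
static double transfer_seconds(double bytes, double mb_per_sec) {
    return bytes / (mb_per_sec * 1.0e6);
}

/* How many times longer readback takes than download for the same payload. */
static double readback_penalty(double download_mb_s, double readback_mb_s) {
    return download_mb_s / readback_mb_s;
}
```

With the slide's numbers, `readback_penalty(600.0, 170.0)` is about 3.5 for NV35 under OpenGL and `readback_penalty(550.0, 50.0)` is 11 for ATI under DirectX, so a kernel whose results must return to the CPU pays far more for the return trip than for the upload.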

  18. GPUs of the future (we hope) • Complete Instruction Sets • Integers, Bit Ops, Doubles, Mem Access • Integration • Streaming coprocessor not just a rendering device • Streaming architectures [Figure: block diagram labeled SDRAM, ALU Cluster, Stream Register File]

  19. Brook for GPUs • Release v0.3 available on Sourceforge • Project Page • http://graphics.stanford.edu/projects/brook • Source • http://www.sourceforge.net/projects/brook • Over 4K downloads! • Questions? • Fly-fishing fly images from The English Fly Fishing Shop
