
Many-Core Programming with GRAMPS Jeremy Sugerman Stanford University September 12, 2008



Presentation Transcript


  1. Many-Core Programming with GRAMPS. Jeremy Sugerman, Stanford University. September 12, 2008

  2. Background, Outline • Stanford Graphics / Architecture Research • Collaborators: Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan • To appear in ACM Transactions on Graphics • CPU, GPU trends… and collision? • Two research areas: • HW/SW Interface, Programming Model • Future Graphics API

  3. Problem Statement • Drive efficient development and execution in many-/multi-core systems. • Support homogeneous, heterogeneous cores. • Inform future hardware • Status Quo: • GPU Pipeline (Good for GL, otherwise hard) • CPU (No guidance, fast is hard)

  4. GRAMPS • Software defined graphs • Producer-consumer, data-parallelism • Initial focus on rendering [Figure: two example graphs. Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output. Rasterization Pipeline: Rasterize, Shade, FB Blend, Input Fragment Queue, Output Fragment Queue, Frame Buffer. Ray Tracing Graph: Camera, Intersect, Shade, FB Blend, Ray Queue, Ray Hit Queue, Fragment Queue, Frame Buffer.]
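
A software-defined graph of stages connected by queues can be written down as plain data. The following is a minimal sketch in Python, not the GRAMPS API: the edge list, the `downstream_order` helper, and the exact connectivity between the ray tracing stages are assumptions inferred from the figure labels on this slide.

```python
from collections import defaultdict, deque

# Hypothetical encoding: (producer stage, connecting queue, consumer stage).
# The connectivity is assumed from the slide's labels, not taken from GRAMPS.
RAY_TRACING_GRAPH = [
    ("Camera", "Ray Queue", "Intersect"),
    ("Intersect", "Ray Hit Queue", "Shade"),
    ("Shade", "Fragment Queue", "FB Blend"),
]


def downstream_order(edges, source):
    """Return stages in the order data reaches them, starting at `source`."""
    consumers = defaultdict(list)
    for producer, _queue, consumer in edges:
        consumers[producer].append(consumer)

    order = []
    seen = {source}
    frontier = deque([source])
    while frontier:
        stage = frontier.popleft()
        order.append(stage)
        for nxt in consumers[stage]:
            if nxt not in seen:  # guards against graphs that contain loops
                seen.add(nxt)
                frontier.append(nxt)
    return order
```

Calling `downstream_order(RAY_TRACING_GRAPH, "Camera")` walks the producer-consumer chain from the camera to the blend stage. Keeping the graph as data is what lets an application, rather than the hardware, decide what the pipeline looks like.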

  5. As a Graphics Evolution • Not (too) radical for ‘graphics’ • Like fixed → programmable shading • Pipeline undergoing massive shake up • Diversity of new parameters and use cases • Bigger picture than ‘graphics’ • Rendering is more than GL/D3D • Compute is more than rendering • Some ‘GPUs’ are losing their innate pipeline

  6. As a Compute Evolution (1) • Sounds like streaming: Execution graphs, kernels, data-parallelism • Streaming: “squeeze out every FLOP” • Goals: bulk transfer, arithmetic intensity • Intensive static analysis, custom chips (mostly) • Bounded space, data access, execution time

  7. As a Compute Evolution (2) • GRAMPS: “interesting apps are irregular” • Goals: • Dynamic, data-dependent code • Aggregate work at run-time • Heterogeneous commodity platforms • Naturally allows streaming when applicable

  8. GRAMPS’ Role • A ‘graphics pipeline’ is now an app! • GRAMPS models parallel state machines. • Compared to status quo: • More flexible than a GPU pipeline • More guidance than bare metal • Portability in between • Not domain specific

  9. GRAMPS Interfaces • Host/Setup: Create execution graph • Thread: Stateful, singleton • Shader: Data-parallel, auto-instanced
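
The difference between the two stage kinds can be illustrated with a small sketch. This is a serial Python model under stated assumptions, not GRAMPS code: `run_shader_stage`, `RunningSumThread`, and `demo` are hypothetical names, and a real system would run shader instances in parallel rather than in a loop.

```python
from collections import deque


def run_shader_stage(kernel, input_queue, output_queue):
    """Shader-style stage: a stateless kernel, logically one instance per element."""
    while input_queue:
        output_queue.append(kernel(input_queue.popleft()))


class RunningSumThread:
    """Thread-style stage: a single instance that keeps state across elements."""

    def __init__(self):
        self.total = 0

    def run(self, input_queue, output_queue):
        while input_queue:
            self.total += input_queue.popleft()
            output_queue.append(self.total)


def demo():
    """Host/setup role: create the queues, wire the stages, and run them."""
    raw = deque([1, 2, 3, 4])
    squared, sums = deque(), deque()
    run_shader_stage(lambda x: x * x, raw, squared)
    RunningSumThread().run(squared, sums)
    return list(sums)
```

Because the shader kernel carries no state, the runtime is free to create as many instances as there are queued elements. The thread stage cannot be instanced that way, since each output depends on everything it has already consumed.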

  10. GRAMPS Entities (1) • Accessed via windows • Queues: Connect stages, Dynamically sized • Ordered or unordered • Fixed max capacity or spill to memory • Buffers: Random access, Pre-allocated • RO, RW Private, RW Shared (Not Supported)
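
Window-based access to a queue with a fixed maximum capacity might look like the sketch below. It is a single-threaded Python model, not the GRAMPS interface: the class name, the reserve/commit method names, and the choice to return `None` when a reservation does not fit are all assumptions. The spill-to-memory option is not modeled.

```python
class WindowedQueue:
    """Bounded FIFO whose contents are only touched through reserved windows."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.pending = 0  # slots reserved by producers but not yet committed

    def reserve_output(self, count):
        """Return a writable window of `count` slots, or None if it would overflow."""
        if len(self.items) + self.pending + count > self.capacity:
            return None
        self.pending += count
        return [None] * count

    def commit_output(self, window):
        """Publish a filled output window so consumers can see it."""
        self.pending -= len(window)
        self.items.extend(window)

    def reserve_input(self, count):
        """Return a window holding up to `count` of the oldest items."""
        return self.items[:count]

    def commit_input(self, window):
        """Release a consumed input window, freeing its capacity."""
        del self.items[:len(window)]
```

A producer reserves space, fills the window, and commits it; a consumer does the reverse. Splitting access into reserve and commit gives a scheduler a natural point to stall a producer when the queue is full.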

  11. GRAMPS Entities (2) • Queue Sets: Independent sub-queues • Instanced parallelism plus mutual exclusion • Hard to fake with just multiple queues
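
One way to picture a queue set is as a group of sub-queues where different sub-queues can be consumed in parallel, but each sub-queue has at most one consumer at a time. The sketch below is a serial Python model under that assumption; `QueueSet`, `claim`, and `drain` are hypothetical names, not GRAMPS calls.

```python
from collections import deque


class QueueSet:
    """Hypothetical queue set: independent sub-queues, one consumer each at a time."""

    def __init__(self, num_subqueues):
        self.subqueues = [deque() for _ in range(num_subqueues)]
        self.claimed = [False] * num_subqueues

    def push(self, index, item):
        """Append an item to the sub-queue selected by `index`."""
        self.subqueues[index].append(item)

    def claim(self):
        """Return the index of a non-empty, unclaimed sub-queue, or None."""
        for index, queue in enumerate(self.subqueues):
            if queue and not self.claimed[index]:
                self.claimed[index] = True
                return index
        return None

    def drain(self, index):
        """Take every item from a claimed sub-queue, then release the claim."""
        items = list(self.subqueues[index])
        self.subqueues[index].clear()
        self.claimed[index] = False
        return items
```

Each consumer instance calls `claim`, processes the result of `drain`, and repeats. Items that share a sub-queue index, for example updates to the same region of a frame buffer, are then never processed by two instances at once, while unrelated sub-queues still proceed independently.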

  12. What We’ve Built (System)

  13. GRAMPS Scheduler • Tiered Scheduler • ‘Fat’ cores: per-thread, per-core • ‘Micro’ cores: shared hw scheduler • Top level: tier N

  14. What We’ve Built (Apps) [Figure: two application graphs. Legend: Thread Stage, Shader Stage, Fixed-func, Queue, Stage Output, Push Output. Direct3D Pipeline (with Ray-tracing Extension): IA 1 … IA N, VS 1 … VS N, Rast, PS, OM, RO, Trace, PS2, Input Vertex Queue 1 … N, Primitive Queue 1 … N, Primitive Queue, Fragment Queue, Sample Queue Set, Ray Queue, Ray Hit Queue, Vertex Buffers, Frame Buffer. Ray-tracing Graph: Tiler, Sampler, Camera, Intersect, Shade, FB Blend, Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue, Frame Buffer.]

  15. Initial Results • Queues are small, utilization is good

  16. GRAMPS Visualization

  17. GRAMPS Visualization

  18. GRAMPS Portability • Portability really means performance. • Less portable than GL/D3D • GRAMPS graph is (more) hardware sensitive • More portable than bare metal • Enforces modularity • Best case, just works • Worst case, saves boilerplate

  19. High-level Challenges • Is GRAMPS a suitable GPU evolution? • Enable pipeline competitive with bare metal? • Enable innovation: advanced / alternative methods? • Is GRAMPS a good parallel compute model? • Map well to hardware, hardware trends? • Support important apps? • Concepts influence developers?

  20. What’s Next: Implementation • Better scheduling • Less bursty, better slot filling • Dynamic priorities • Handle graphs with loops better • More detailed costs • Bill for scheduling decisions • Bill for (internal) synchronization • More statistics

  21. What’s Next: Programming Model • Yes: Graph modification (state change) • Probably: Data sharing / ref-counting • Maybe: Blocking inter-stage calls (join) • Maybe: Intra/inter-stage synchronization primitives

  22. What’s Next: Possible Workloads • REYES, hybrid graphics pipelines • Image / video processing • Game Physics • Collision detection or particles • Physics and scientific simulation • AI, finance, sort, search or database query, … • Heavy dynamic data manipulation • k-D tree / octree / BVH build • lazy/adaptive/procedural tree or geometry
