1 / 23

GRAMPS: A Programming Model For Graphics Pipelines

Explore the evolution of the graphics pipeline and the rise of programmable shaders with GRAMPS, a programming model that allows for parallel hardware execution and customization. Learn about the structure, scheduling, and optimization of this innovative approach.

Download Presentation

GRAMPS: A Programming Model For Graphics Pipelines

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GRAMPS: A Programming Model For Graphics Pipelines Jeremy Sugerman, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan

  2. “The Graphics Pipeline” Central to the rise of 3D hardware and software. A stable and universal abstraction Shaped the evolution of the field… … while leaving enormous room for innovation. Input Assembler Vertex Shader Rast Pixel Shader Output Merger

  3. The Graphics Pipeline is evolving Vertex Shader Rast Hull Shader Pixel Shader Pixel Shader Tessellator Output Merger Domain Shader Geometry Shader Stream Output Programmable Shading Fixed Function Direct3D 11 Direct3D 10 Input Assembler Vertex Shader

  4. “GPU” is evolving, too Continued drive for algorithmic innovation and advanced rendering techniques First class programming models for compute: OpenCL, compute shaders, vendor specific, … New / different hardware implementations: E.g., Larrabee, CPU-GPU combinations / hybrids Even NVIDIA and AMD GPUs are very different

  5. From fixed to programmable (again) Idea: Evolve the pipeline itself from preset configurations to a programmable entity

  6. Programming model and run-time for parallel hardware Graphs of stages and queues GRAMPS handles scheduling, parallelism, data-flow GRAMPS Example: Simple GRAMPS Graph Thread Shader Fixed Function

  7. The Graphics Pipeline becomes an app! Structure/setup is (application) software Customized or completely novel renderers Reuses current hardware: FIFOs, shader cores, rast, … Analogous to the transition to programmable shading Proliferation of new use cases and parameters Not (unthinkably) radical Frame Buffer Input Vertex Rast Pixel Merge

  8. Writing a GRAMPS application Design the execution graph: Design the stages: Shaders Threads (and Fixed Function stages) Instantiate and launch. Frame Buffer Input Vertex Rast Pixel Merge Merge

  9. More Detail – Queues Queues operate at a “packet” granularity “Large bundles of coherent work” GRAMPS can optionally enforce ordering Required for some workloads, adds overhead Frame Buffer Input Vertex Rast Pixel Merge

  10. More Detail – Shaders Shaders: Like pixel (or compute) shaders, stateless Automatic instancing, pre-reserve/post-commit “Collection” packets: shared header and N elements New: “Push” operation to coalesce variable outputs Frame Buffer Input Vertex Rast Pixel Merge Vertex

  11. More Detail – Thread/Fixed Function Threads: Like POSIX threads, stateful Explicit reserve/commit on queues Fixed Function: Effectively non-programmable Threads Frame Buffer Input Vertex Rast Pixel Merge Rast

  12. More Detail – Queue Sets Queue sets enable binning-style algorithms One logical queue with multiple lanes (or bins) One consumer at a time per lane Many lanes with data allows many parallel consumers Frame Buffer Frame Buffer Pixel Merge Input Vertex Rast Pixel Merge

  13. Quick Comparison to “Streaming” Streaming: “squeeze out every FLOP” Goals: throughput, bulk transfer, arithmetic intensity Intensive static analysis, program transformation Bound space, data access, execution time GRAMPS: “interesting applications are irregular” Goals: throughput, dynamic, data-dependent code Aggregate work at run-time, heterogeneous hardware Streaming apps are GRAMPS apps

  14. Evaluation: Design Goals Broad application scope: preferable to roll-your-own Multi-platform: suits many hardware configurations High performance: competitive with roll-your-own Tunable: expert users can optimize their apps Optimized Implementations: inform, and are informed by, hardware

  15. Broad Application Scope Rasterization Pipeline (with ray tracing extension) Frame Buffer Vertex Buffers IA1 VS1 RO Rast PS OM IAN VSN Trace PS2 Ray Tracing Extension Tiler Sampler Camera Intersect Frame Buffer Shade Shadow Intersect Blend = Thread = Queue = Shader = Stage Output = Fixed-Func = Push Output Ray Tracing Graph

  16. Multi-Platform: Two (Simulated) Machines CPU-Like: 8 Fat Cores, Rast GPU-Like: 1 Fat Core, 4 Micro Cores, Rast, Sched

  17. High Performance – Metrics Priority #1: Show scale out parallelism Can GRAMPS exploit the application parallelism and fill the machine? Priority #2: Show ‘reasonable’ bandwidth / storage requirements for queueing What is the worst case total footprint of all queues? A scheduling problem: trade-off with possible parallelism

  18. High Performance – Scheduling Very simple static prototype scheduler (both platforms): Static stage priorities: Limited pre-emption points No dynamic weighting of current queue depths 1 2 3 4 Frame Buffer 5 6 7 (Lowest) (Highest)

  19. High Performance – Results Three scenes x { Rasterization, Ray Tracer, Hybrid } Parallelism is 95+% for all but rasterized fairy (~80%). Queues are small: < 600KB CPU-like, < 1.5MB GPU-like Order costs footprint

  20. Also: raw counters, statistics, text log of run-time activity Tunability – Understanding Performance • GRAMPSviz:

  21. Tunability – Lessons Learned Execution Graph topology / design: Sizing critical queues: Frame Buffer Rast PS + OM Tiler Sampler Camera Intersect Frame Buffer Shade Shadow Intersect Blend Frame Buffer Sort-Middle Sort-Last Rast PS OM

  22. Summary After a long era of stability, the Graphics Pipeline is undergoing rapid change. GRAMPS enables software-defined custom pipelines. The Graphics Pipeline becomes an app Prototypes show plausible performance, resource needs Handles heterogeneous parallelism well Applicable beyond rendering and beyond GPUs

  23. Thank You Our funding agencies: Stanford Pervasive Parallelism Lab Department of the Army Research Intel Corporation Rambus Stanford Graduate Fellowship Intel PhD Fellowship NSF Graduate Research Fellowship http://graphics.stanford.edu/papers/gramps-tog/

More Related