
Background, Outline

Many-Core Programming with GRAMPS & “Real Time REYES”. Jeremy Sugerman, Kayvon Fatahalian, Stanford University, June 12, 2008.


Presentation Transcript


  1. Many-Core Programming with GRAMPS & “Real Time REYES” • Jeremy Sugerman, Kayvon Fatahalian • Stanford University • June 12, 2008

  2. Background, Outline • Stanford Graphics / Architecture Research • CPU, GPU trends • And collision? • Two research areas: • HW/SW Interface, Programming Model • Future Graphics API

  3. Problem Statement • Drive efficient development and execution in many-/multi-core systems. • Support homogeneous, heterogeneous cores. • Inform future hardware. Status Quo: • GPU Pipeline (Good for GL, otherwise hard) • CPU (No guidance, fast is hard)

  4. GRAMPS • Software defined graphs • Producer-consumer, data-parallelism • Initial focus on rendering [Figure: two example graphs. Rasterization Pipeline: stages Rasterize, Shade, FB Blend; queues Input Fragment Queue, Output Fragment Queue; output Frame Buffer. Ray Tracing Pipeline: stages Camera, Intersect, Shade, FB Blend; queues Ray Queue, Ray Hit Queue, Fragment Queue; output Frame Buffer. Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output.]

  5. As a GPU Evolution • Not (too) radical for ‘graphics’ • Like fixed → programmable shading • Pipeline undergoing massive shake up • Diversity of new parameters and use cases • Bigger picture than ‘graphics’ • Rendering is more than GL/D3D • Compute is more than rendering • Larrabee has no innate pipeline

  6. As a Compute Evolution • Sounds like streaming: Execution graphs, kernels, data-parallelism • Streaming: “squeeze out every FLOP” • Goals: bulk transfer, arithmetic intensity • Intensive static analysis, custom chips (mostly) • Bounded space, data access, execution time • GRAMPS: “interesting apps are irregular” • Goals: Dynamic, data-dependent code • Aggregate work at run-time • Heterogeneous commodity platforms • Naturally supports streaming when applicable

  7. GRAMPS’ Role • A ‘graphics pipeline’ is now an app! • GRAMPS models parallel state machines. • Compared to status quo: • More flexible than a GPU pipeline • More guidance than bare metal • Portability in between • Not domain specific

  8. GRAMPS Interfaces • Host/Setup: Create execution graph • Thread: Stateful, singleton • Shader: Data-parallel, auto-instanced
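The three interfaces on slide 8 can be illustrated with a toy model. The sketch below is not the GRAMPS API: the `Graph` class, its method names, and the serial scheduler are all hypothetical, and the stage ordering and thread/shader assignment for the ray tracing graph from slide 4 are assumptions. It only shows the division of labor: host code declares queues and stages, a thread stage is a single instance that sees its whole input queue, and a shader stage is instanced once per queued element.

```python
from collections import deque


class Graph:
    """Hypothetical host/setup interface: declare queues and stages, then run."""

    def __init__(self):
        self.queues = {}
        self.stages = []

    def add_queue(self, name):
        self.queues[name] = deque()

    def add_thread_stage(self, fn, inp, out):
        # Thread stage: a singleton that consumes its entire input queue.
        self.stages.append(("thread", fn, inp, out))

    def add_shader_stage(self, fn, inp, out):
        # Shader stage: a stateless function, instanced once per element.
        self.stages.append(("shader", fn, inp, out))

    def run(self):
        # Toy serial scheduler: visit stages once, in declaration order.
        for kind, fn, inp, out in self.stages:
            src = self.queues[inp] if inp is not None else deque()
            dst = self.queues[out]
            if kind == "thread":
                dst.extend(fn(src))
                src.clear()
            else:
                while src:
                    dst.append(fn(src.popleft()))


def build_and_run(width):
    """Assumed ordering: Camera -> Intersect -> Shade -> FB Blend."""
    g = Graph()
    for name in ("rays", "hits", "fragments", "framebuffer"):
        g.add_queue(name)

    # Camera (thread, no input queue): emits one ray per pixel column.
    g.add_thread_stage(lambda _: [(x, 1.0) for x in range(width)], None, "rays")
    # Intersect (shader): placeholder hit distance per ray.
    g.add_shader_stage(lambda ray: (ray[0], ray[1] * 2.0), "rays", "hits")
    # Shade (shader): placeholder color from hit distance, clamped to 1.
    g.add_shader_stage(lambda hit: (hit[0], min(hit[1] / 4.0, 1.0)), "hits", "fragments")

    def fb_blend(fragments):
        # FB Blend (thread): stateful, owns the frame buffer.
        frame = [0.0] * width
        for x, color in fragments:
            frame[x] = color
        return [frame]

    g.add_thread_stage(fb_blend, "fragments", "framebuffer")
    g.run()
    return g.queues["framebuffer"][0]
```

Calling `build_and_run(4)` pushes four rays through the graph and returns a four-entry frame buffer. A real scheduler would run stages concurrently and keep queues small rather than draining each one in turn.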

  9. What We’ve Built (System)

  10. GRAMPS Scheduler • Tiered Scheduler • ‘Fat’ cores: per-thread, per-core • ‘Micro’ cores: shared hw scheduler • Top level: tier N

  11. What We’ve Built (Apps) [Figure: two graphs. Direct3D Pipeline (with Ray-tracing Extension): stages IA 1 … IA N, VS 1 … VS N, Rast, PS, OM, RO, plus Trace and PS2 in the Ray-tracing Extension; queues Input Vertex Queue 1 … N, Primitive Queue 1 … N, Primitive Queue, Fragment Queue, Sample Queue Set, Ray Queue, Ray Hit Queue; buffers Vertex Buffers, Frame Buffer. Ray-tracing Pipeline: stages Tiler, Sampler, Camera, Intersect, Shade, FB Blend; queues Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue; output Frame Buffer. Legend: Thread Stage, Shader Stage, Fixed-func, Queue, Stage Output, Push Output.]

  12. Initial Results • Queues are small, utilization is good

  13. GRAMPS Visualization

  14. GRAMPS Visualization

  15. GRAMPS Portability • Portability really means performance. • Less portable than GL/D3D • GRAMPS graph is hardware sensitive • More portable than bare metal • Enforces modularity • Best case, just works • Worst case, saves boilerplate

  16. High-level Challenges • Is GRAMPS a suitable GPU evolution? • Enable pipeline competitive with bare metal? • Enable innovation: advanced / alternative methods? • Is GRAMPS a good parallel compute model? • Map well to hardware, hardware trends? • Support important apps? • Concepts influence developers?

  17. What’s Next for GRAMPS? • Implementation: scheduling, simulation details • Model: graph modification (state change); blocking calls (join); intra/inter-stage synchronization primitives; data sharing / ref-counting • Workloads: REYES, physics, others? • Develop new graphics pipelines…

  18. “Real-Time REYES”

  19. Just Build It • Build a real-time REYES pipeline… • … that is tightly integrated with ray tracing for global effects.

  20. What does real-time REYES mean? (to us) • Smooth surfaces via adaptive tessellation • Everything is a displaced subdivision surface • Shade on surface, prior to rasterization • Stochastic rasterization for motion blur and DOF • Order-independent transparency

  21. Pipeline comparison • OpenGL/Direct3D: Tessellate (xbox) → Vertex Shade → Rasterize → Early Z → Frag Shade → Z Test → Blend/Resolve • REYES: Split → Dice → Displace → Early Z → Shade → Rasterize → Z Test → Blend/Resolve

  22. REYES Tessellation • Split primitive into smaller primitives until a “GOOD” grid can be created.

  23. Grids • Regular parametric sampling of primitive surface (like Xbox 360) • Compact representation for many adjacent polygons • Grids provide SIMD efficiency and bulk processing benefits • GOOD GRID = max polygon area < 1 pixel; all polys about the same size; bounded # polys per grid
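The split-until-good loop from slides 22 and 23 can be sketched as follows. This is an illustration under strong simplifying assumptions, not the authors' implementation: each primitive is a flat, axis-aligned patch measured in screen pixels (real REYES primitives are curved, displaced surfaces), dicing is uniform (so all quads in a grid are trivially the same size), and the bound `MAX_POLYS_PER_GRID = 256` is an invented value, since the slides give no number. The function names are hypothetical.

```python
import math

MAX_POLYS_PER_GRID = 256  # assumed bound; the slides do not give a number


def dice_rate(width, height):
    """Smallest uniform (nu, nv) so every quad has edges, and area, under 1 pixel."""
    nu = math.floor(width) + 1
    nv = math.floor(height) + 1
    return nu, nv


def split_until_good(width, height):
    """Split a width x height pixel patch until every piece yields a good grid.

    Returns a list of (w, h, nu, nv) tuples, one per grid.
    """
    pending = [(width, height)]
    grids = []
    while pending:
        w, h = pending.pop()
        nu, nv = dice_rate(w, h)
        if nu * nv <= MAX_POLYS_PER_GRID:
            # Good grid: sub-pixel quads, uniform size, bounded count.
            grids.append((w, h, nu, nv))
        elif w >= h:
            # Too many polys: split in half along the longer axis.
            pending.extend([(w / 2, h), (w / 2, h)])
        else:
            pending.extend([(w, h / 2), (w, h / 2)])
    return grids
```

For example, `split_until_good(64, 64)` keeps halving the patch until each piece can be diced within the bound. The loop is data dependent, which matches the later remark that splitting is irregular.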

  24. Pipeline comparison (revised) • REYES: Split → Dice → Displace → Early Z → Shade → Rast/Crack Fix → Z Test → Blend/Resolve • OpenGL/Direct3D: Tessellate (xbox) → Vertex Shade → Rast → Early Z → Frag Shade → Z Test → Blend/Resolve

  25. What does real-time REYES mean? (to us) • Smooth surfaces via adaptive tessellation • Splitting is irregular (and serial) • Crack fixing • Shade on surface, prior to rasterization • We feel confident about this • But most “work” done before moving to raster space… hmm • Stochastic rasterization for motion blur and DOF • Many tiny polygons → parallel rasterization • SIMD tricky • Order-independent transparency • Not unique to REYES

  26. Shading in a Hybrid System • Evaluate displacement (due to REYES or on demand for ray tracing) • Shade grids • Shade ray hits • Looking forward… shade quads too? One shading system or two or three?

  27. This Project is Really About • Re-architecting REYES pipeline for real-time performance (for throughput architectures like LRB) • Hybrid rendering: study interoperability of advanced techniques (REYES + ray tracing + maybe Direct3D) • Hybrid shading system • Understand workload balance • Hybrid pipeline interface: real-time, retained mode • Pursuit of more flexible, advanced graphics pipelines

  28. Questions?
