
Getting The Best Out Of D3D12. Evan Hart, Principal Engineer, NVIDIA; Dave Oldcorn, D3D12 Technical Lead, AMD.


Presentation Transcript


  1. Getting The Best Out Of D3D12
     Evan Hart, Principal Engineer, NVIDIA
     Dave Oldcorn, D3D12 Technical Lead, AMD

  2. Prerequisites
     • An interest in D3D12
        • Ideally, already looked at D3D12
     • Experienced Graphics Programmer
     • Console programming experience
        • Beneficial, not required

  3. Brief D3D12 Overview

  4. The ‘What’ of D3D12
     • Broad rethinking of the API
     • Much closer to HW realities
     • Model is more explicit
     • Less driver magic

  5. “With great power comes great responsibility.”
     • D3D12 answers many developer requests
     • Be ready to use it wisely, and it can reward you

  6. Console vs PC
     • D3D12 offers a great porting story
        • More of the explicit control console devs crave
        • Much less driver interference
     • Still a heterogeneous environment
        • Need to test carefully
        • Heed API and tool warnings (exposed corners)
        • Game will run on HW you never tested

  7. Objects Central to D3D12
     • Command Lists
     • Bundles
     • Pipeline State Objects
     • Root Signature and Descriptor Tables
     • Resource Heaps

  8. Using Bundles And Lists
     [Figure: a Frame made up of Command Lists, which contain Draws, Dispatches and Bundles]

  9. Command Lists & Bundles
     • Bundle
        • Small object recording a few commands
        • Great for reuse, but supports only a subset of commands
        • e.g. drawing the 3 meshes that make up an object
     • Command List
        • Used for recording and submitting commands
        • Used to execute bundles and other commands

  10. Pipeline State Object
     • Collates most render state
        • Shaders, raster, blend
     • All packaged and swapped together

  11. Pipeline State Object
     [Figure: the Pipeline State groups the Vertex, Hull, Domain, Geometry, Pixel and Compute Shaders with the Input Layout, Topology, Rasterizer State, Blend State, Depth State and RT Format]

  12. Root Signature & Descriptor Tables
     • New method for resource setting
     • Flexible interface
        • Methods for changing large blocks
        • Methods for changing small bits quickly
     • Indexing and open-ended tables enable “bindless”-like behaviour

  13. Resource Heaps
     • New memory management primitive
     • Tie multiple related resources into one heap
     • App controls residency on the heap
        • Somewhat coarse
     • Enables console-like memory aliasing

  14. New HW Features
     • Conservative Rasterization
     • Raster Ordered Views
     • Typed UAV
     • PS write of stencil reference
     • Volume tiled resources

  15. Advice for the D3D12 Dev

  16. Practical Developer Advice
     • Small nuggets on key issues
     • Advice is from experience
        • Multiple engines have done trial ports
        • Many months of experimentation
        • Driver, API, and app level

  17. Efficient Submission
     • Record commands in parallel
     • Reuse fragments via bundles
     • You are taking over some driver/runtime work
        • Make sure your code is efficient (and parallel)
     • Submit in batches with ExecuteCommandLists
     • Submit throughout the frame

  18. Engine organisation
     • Consider task-oriented engines
        • Divide rendering into tasks
        • Run CPU tasks to build command lists
        • Use dependencies to order GPU submission
     • Also helps with resource barriers
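Slides 17 and 18 describe recording command lists in parallel as CPU tasks, then submitting them in dependency order and in batches. The sketch below models that in portable C++ rather than with real D3D12 objects: `CommandList`, `RecordTask`, `RecordInParallel` and `Queue` are hypothetical stand-ins, and `Queue::Submit` plays the role of a single `ID3D12CommandQueue::ExecuteCommandLists` call receiving a whole batch. It spawns one thread per task purely for brevity; an engine would more likely use its job system.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a D3D12 command list: just the recorded commands.
using CommandList = std::vector<std::string>;

// A rendering task records into its own list, so no two threads share a list.
using RecordTask = std::function<void(CommandList&)>;

// Record every task's list in parallel (one thread per task for brevity), then
// return the lists in task order. The order encodes the GPU dependencies and
// does not depend on which thread happened to finish first.
std::vector<CommandList> RecordInParallel(const std::vector<RecordTask>& tasks) {
    std::vector<CommandList> lists(tasks.size());
    std::vector<std::thread> workers;
    workers.reserve(tasks.size());
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        // Each worker writes only lists[i], so no locking is needed.
        workers.emplace_back([&tasks, &lists, i] { tasks[i](lists[i]); });
    }
    for (auto& worker : workers) worker.join();
    return lists;
}

// Hypothetical queue: one Submit() call takes a whole batch of lists, mirroring
// one ExecuteCommandLists call with several lists instead of one call per list.
struct Queue {
    std::vector<std::string> executed;
    int submitCalls = 0;
    void Submit(const std::vector<CommandList>& batch) {
        ++submitCalls;
        for (const auto& list : batch)
            executed.insert(executed.end(), list.begin(), list.end());
    }
};
```

Because each task owns its list, recording needs no locks, and the submission order comes from the task order rather than from thread timing.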

  19. Threading: Done Badly
     [Figure: a Game Thread and three Aux Threads; a single Render Thread records Command List 0, submits, creates a resource, records Command List 1, submits, then presents]
     App render code, runtime and driver all on one thread!

  20. Threading: Done Well
     [Figure: a Game Thread; an Async Thread creating resources and compiling a PSO; a Worker Thread recording command lists; a Master Render Thread submitting CL0-CL3 in order, then presenting]
     Many solutions; the key is parallelism!

  21. PSO Practicalities
     • Merged state removes driver validation costs
     • Don’t needlessly thrash state
        • Just because it is a PSO doesn’t mean every state needs to flip in HW
        • Avoid toggling compute/graphics
        • Avoid toggling tessellation
     • Use sensible defaults for don’t-care fields

  22. Creating PSOs
     • PSO creation can be costly
        • Probably means a compile
     • Streaming threads should handle PSO creation
        • Gather state and create on async threads
        • Prevents stalls
        • Can handle specializations too

  23. Deferred PSO Update
     • “Quick first compile; better answer later”
        • Simple / generic / free initial shader
        • Start the compile of the better result
        • Substitute PSO when it’s ready
     • Especially useful for generic / specialized shaders
        • Precompile the generic case
        • More optimal path for special cases, compiled on a low-priority thread
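The deferred PSO update above can be sketched with `std::async`: draw with a cheap generic PSO straight away and swap in the specialized one once its background compile finishes. `Pso`, `DeferredPso`, `Get` and `Finish` are hypothetical names; a real version would hold an `ID3D12PipelineState` and compile on a low-priority thread, which `std::async` gives no way to request.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled pipeline state object.
struct Pso {
    std::string name;
};

class DeferredPso {
public:
    // `generic` is usable immediately; `compileSpecialized` runs on a
    // background thread and returns the better PSO.
    template <typename CompileFn>
    DeferredPso(Pso generic, CompileFn compileSpecialized)
        : current_(std::move(generic)),
          pending_(std::async(std::launch::async, std::move(compileSpecialized))) {}

    // Called before drawing: swap in the specialized PSO as soon as its compile
    // has finished, without ever blocking. get() invalidates the future, so the
    // swap happens exactly once.
    const Pso& Get() {
        if (pending_.valid() &&
            pending_.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            current_ = pending_.get();
        }
        return current_;
    }

    // Blocking variant, e.g. for a loading screen where a stall is acceptable.
    void Finish() {
        if (pending_.valid()) current_ = pending_.get();
    }

private:
    Pso current_;
    std::future<Pso> pending_;
};
```

The render thread only ever polls with a zero timeout, so the generic PSO keeps being used until the better one is ready.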

  24. Using Bundles And Lists
     [Figure: a Frame made up of Command Lists, which contain Draws, Dispatches and Bundles]

  25. Bundle Advice
     • Aim for a moderate size (~12 draws)
        • Some potential overhead with setup
     • Limit resource binding inheritance when possible
        • Enables more complete cooking of the bundle

  26. Lists Advice
     • Aim for a decent size
        • Typically hundreds of draw calls
     • Submit together when feasible
     • Don’t expect lots of list reuse
        • Per-frame changes + overlap limitation
        • Post-processing might be an exception
        • Still need 2-3 copies of that list

  27. Using Command Allocators

  28. Allocators and Lists
     • Invisible consumers of GPU memory
        • Hold on to memory until Destroy
     • Reuse on similar data
        • Warm list == no allocation during list creation
     • Destroy on different data
        • Reuse on disparate cases grows all lists to the size of the worst case over time
     [Figure: List / Allocator memory usage. Initial 100 draws, then Reset (guaranteed no new allocations), followed by: the same 100 draws; 5 draws; a different 100 draws; 200 draws]
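The allocator behaviour on slide 28 resembles a linear arena whose reset rewinds without freeing. This toy model is an assumption for illustration (the `LinearAllocator` class and the 64-bytes-per-draw figure are invented, not the driver's real allocator); it shows why a warm allocator makes no new allocations for a repeated workload, and why one 200-draw frame leaves an allocator at worst-case size even when it later records only 5 draws.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a command allocator: a linear arena whose Reset() rewinds the
// write position but keeps all the memory it has grown to.
class LinearAllocator {
public:
    // Record `bytes` of commands; grows the backing store only if it is too small.
    void Record(std::size_t bytes) {
        used_ += bytes;
        if (used_ > storage_.size()) {
            storage_.resize(used_);
            ++growths_;  // counts "new allocations"
        }
    }
    void Reset() { used_ = 0; }  // memory is retained, so the allocator stays warm
    std::size_t Capacity() const { return storage_.size(); }
    int Growths() const { return growths_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_ = 0;
    int growths_ = 0;
};
```

Recording the same 100 draws after a reset never grows the arena, matching the slide's "guaranteed no new allocations" case.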

  29. Allocator Advice
     • Allocators are fastest when warm
        • Keep reusing an allocator for lists of similar size
     • Need 2T + N allocators minimum
        • T: threads creating command lists
        • N: extra pool for bundles
     • All lists/bundles on an allocator are freed together
        • Need to double/triple buffer for reusing the allocators
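One way to read slide 29 is a ring of allocator sets, one set per frame in flight, each holding 2T + N allocators. The transcript does not say whether 2T already covers the double buffering, so treating 2T + N as a per-frame count is an assumption here. `AllocatorRing`, `BeginFrame` and the fence bookkeeping are hypothetical; in real code the check would compare against `ID3D12Fence::GetCompletedValue()` before calling `ID3D12CommandAllocator::Reset()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy allocator: remembers the fence value that must complete before reuse.
struct Allocator {
    std::uint64_t reusableAfterFence = 0;
    int resets = 0;
};

// Hypothetical ring of allocator sets, one set per frame in flight.
class AllocatorRing {
public:
    AllocatorRing(int threads, int bundlePool, int framesInFlight)
        : perFrame_(2 * threads + bundlePool),
          slots_(static_cast<std::size_t>(framesInFlight),
                 std::vector<Allocator>(static_cast<std::size_t>(perFrame_))) {
        assert(threads > 0 && bundlePool >= 0 && framesInFlight >= 2);
    }

    // Start recording frame `frame` (0, 1, 2, ...), which will signal fence
    // value frame + 1 when submitted. Returns false, changing nothing, if the
    // GPU has not yet finished with this slot's allocators.
    bool BeginFrame(std::uint64_t frame, std::uint64_t gpuCompletedFence) {
        std::vector<Allocator>& slot = slots_[frame % slots_.size()];
        for (const Allocator& a : slot)
            if (a.reusableAfterFence > gpuCompletedFence) return false;
        for (Allocator& a : slot) {
            ++a.resets;                        // stands in for resetting the allocator
            a.reusableAfterFence = frame + 1;  // fence this frame will signal
        }
        return true;
    }

    int AllocatorsPerFrame() const { return perFrame_; }
    std::size_t TotalAllocators() const {
        return static_cast<std::size_t>(perFrame_) * slots_.size();
    }

private:
    int perFrame_;
    std::vector<std::vector<Allocator>> slots_;
};
```

With 4 recording threads, 2 bundle allocators and triple buffering this gives 10 allocators per frame and 30 in total, and a slot is refused until the GPU has passed the fence of the frame that last used it.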

  30. Root Signature
     • Carefully lay out the root signature
        • Group tables by frequency of change
        • Most frequent changes early in the signature
     • Standardize slots
        • Signature changes have a cost
     [Figure: example root signature, in order:
        Per-Draw Table Pointer -> Tex, Tex
        Constant Buffer pointer (Modelview matrix, skinning)
        Per-draw constants
        Per-Material Table Pointer -> Const Buf (shader params), Const Buf (shader params), Tex, Tex
        Per-Frame Table Pointer -> Const Buf (camera, eye...), Tex]
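Slide 30's layout rule (most frequently changing parameters first, grouped by frequency) amounts to a stable sort by update frequency. `Frequency`, `RootParam` and `LayoutRootSignature` are hypothetical planning helpers, not D3D12 types; the parameter names follow the slide's example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// How often a root parameter changes; lower values change more often.
enum class Frequency { PerDraw = 0, PerMaterial = 1, PerFrame = 2 };

struct RootParam {
    std::string name;
    Frequency freq;
};

// Put the most frequently changing parameters first. stable_sort keeps the
// relative order within each frequency group, so slot assignments stay
// standardized across shaders that declare parameters in the same order.
std::vector<RootParam> LayoutRootSignature(std::vector<RootParam> params) {
    std::stable_sort(params.begin(), params.end(),
                     [](const RootParam& a, const RootParam& b) { return a.freq < b.freq; });
    return params;
}
```

The resulting order would then drive the parameter array of the real root signature description.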

  31. Root Signature Cont’d
     • Place single items which change per-draw in the root arguments
     • Costs of setting a new table vary across HW
        • Cost varies from nearly 0 to O(N) work, where N is the number of items in the table
     • Avoid changes to individual items in tables
        • Requires the app to instance the table if it is in flight
        • Try to update the whole table atomically
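Slide 31 advises updating a descriptor table as a whole, and instancing it when the GPU may still be reading it. A copy-on-write sketch of that idea follows; `VersionedTable`, `MarkUsed` and the integer "descriptors" are hypothetical, and retired instances are never recycled here, whereas a real implementation would reclaim them once their fence completes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical descriptor table: descriptor handles modelled as ints, plus the
// fence value of the last GPU submission that referenced this instance.
struct Table {
    std::vector<int> descriptors;
    std::uint64_t inFlightUntil = 0;
};

class VersionedTable {
public:
    explicit VersionedTable(std::vector<int> initial) {
        instances_.push_back({std::move(initial), 0});
    }

    // Replace the whole table at once. If the GPU may still read the current
    // instance, write a fresh instance instead of patching it in place.
    void Update(std::vector<int> contents, std::uint64_t gpuCompletedFence) {
        Table& cur = instances_.back();
        if (cur.inFlightUntil > gpuCompletedFence)
            instances_.push_back({std::move(contents), 0});
        else
            cur.descriptors = std::move(contents);
    }

    // Record that a submission signalling `fence` references the current instance.
    void MarkUsed(std::uint64_t fence) { instances_.back().inFlightUntil = fence; }

    const Table& Current() const { return instances_.back(); }
    std::size_t InstanceCount() const { return instances_.size(); }

private:
    std::vector<Table> instances_;
};
```

An in-place update is only taken when the GPU has provably finished with the table, so in-flight draws never see a half-updated table.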

  32. Managing Resources with Heaps
     • Committed
        • Monolithic, D3D11-style
     • Placed
        • Offset in existing heap
     • Reserved
        • Mapped to heaps like tiled resources
     [Figure: Resources (VA ranges) mapped into Heaps; a G-buffer and a Postprocess buffer placed in heaps]
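Placed resources are offsets within a heap, and aliasing lets two resources whose lifetimes do not overlap (such as the G-buffer and postprocess buffer above) share the same offsets. The sub-allocator below is a hypothetical sketch: it aligns to 64 KB, the value of `D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT`, whereas real sizes and alignments should come from `ID3D12Device::GetResourceAllocationInfo`, and switching between aliased resources also needs an aliasing barrier on the GPU.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// 64 KB: D3D12's default placement alignment for most buffers and textures.
constexpr std::uint64_t kPlacementAlignment = 64 * 1024;
constexpr std::uint64_t kNoSpace = std::numeric_limits<std::uint64_t>::max();

std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical linear sub-allocator handing out heap offsets for placed
// resources. Mark()/Rewind() let a later pass reuse (alias) earlier memory.
class HeapSubAllocator {
public:
    explicit HeapSubAllocator(std::uint64_t heapSize) : size_(heapSize) {}

    // Returns the offset for a resource of `bytes`, or kNoSpace if it won't fit.
    std::uint64_t Place(std::uint64_t bytes, std::uint64_t align = kPlacementAlignment) {
        std::uint64_t offset = AlignUp(top_, align);
        if (offset + bytes > size_) return kNoSpace;
        top_ = offset + bytes;
        return offset;
    }
    std::uint64_t Mark() const { return top_; }
    void Rewind(std::uint64_t mark) { top_ = mark; }

private:
    std::uint64_t size_;
    std::uint64_t top_ = 0;
};
```

For example, after rewinding to a mark taken before the G-buffer was placed, a postprocess buffer lands at the G-buffer's offset.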

  33. Choosing a resource type:

  34. Resource tips
     • Committed gives the driver more knowledge
     • Tiled resources have separate caps
        • Need to prepare for HW without them
     • Memory might be segmented
        • Cannot allocate the entire space in a single heap

  35. Residency tips
     • MakeResident:
        • Batch these up
        • Expect CPU and GPU cost for page table updates
     • MakeUnresident:
        • Cost of the move may be deferred; it may show up on a future MakeResident
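Slide 35 recommends batching MakeResident calls. The released API's `ID3D12Device::MakeResident` already accepts an array of objects, so batching mostly means collecting requests during the frame and flushing them in one call. `ResidencyBatcher`, `ResourceId` and the `MakeResidentFn` callback are hypothetical; the callback stands in for the real call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using ResourceId = int;  // hypothetical resource handle
// Hypothetical hook standing in for one real MakeResident call on an array.
using MakeResidentFn = std::function<void(const std::vector<ResourceId>&)>;

class ResidencyBatcher {
public:
    explicit ResidencyBatcher(MakeResidentFn fn) : makeResident_(std::move(fn)) {}

    // Queue a request; nothing is paged in yet.
    void Request(ResourceId id) { pending_.push_back(id); }

    // Issue one call for the whole batch, after dropping duplicate requests.
    void Flush() {
        if (pending_.empty()) return;
        std::sort(pending_.begin(), pending_.end());
        pending_.erase(std::unique(pending_.begin(), pending_.end()), pending_.end());
        makeResident_(pending_);
        pending_.clear();
    }

private:
    MakeResidentFn makeResident_;
    std::vector<ResourceId> pending_;
};
```

Flushing once per frame (or per streaming batch) means the page-table update cost is paid in one place instead of being scattered across many small calls.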

  36. Working Set Management
     • Application has much more control in D3D12
     • Directly tells the video memory manager which resources are required
     • App can be sharper on memory than before
        • On D3D11, the working set per frame is typically much smaller than the registered resources
        • Less likely to end up with an object in slow memory

  37. Working to a budget
     • “Budget” is the memory you can use
     • Get under the budget using residency
        • MakeUnresident makes an object a candidate to be swapped to system memory
     • It is much cheaper to make an object unresident and later resident again than to destroy and re-create it
     • Tiled resources can drop mip levels dynamically
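To get under the budget, the application has to choose what to make unresident. Least-recently-used eviction is one common policy; it is an assumption here, not something the slides prescribe. `BudgetManager`, `Use` and `Trim` are hypothetical. In the shipped API the budget can be read via `IDXGIAdapter3::QueryVideoMemoryInfo` and the unresident operation is named `ID3D12Device::Evict`; a real version must also avoid evicting anything the GPU is still using.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical residency manager keeping resident resources in LRU order.
class BudgetManager {
public:
    explicit BudgetManager(std::uint64_t budget) : budget_(budget) {}

    // Mark a resource as used this frame, making it resident if it was not.
    void Use(int id, std::uint64_t bytes) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            lru_.splice(lru_.end(), lru_, it->second);  // move to most-recent end
            return;
        }
        lru_.push_back({id, bytes});
        index_[id] = std::prev(lru_.end());
        usage_ += bytes;
    }

    // Drop least-recently-used resources until usage fits the budget. The
    // returned ids are the candidates to make unresident (not destroy).
    std::vector<int> Trim() {
        std::vector<int> evicted;
        while (usage_ > budget_ && !lru_.empty()) {
            Entry e = lru_.front();
            lru_.pop_front();
            index_.erase(e.id);
            usage_ -= e.bytes;
            evicted.push_back(e.id);
        }
        return evicted;
    }

    std::uint64_t Usage() const { return usage_; }

private:
    struct Entry {
        int id;
        std::uint64_t bytes;
    };
    std::uint64_t budget_;
    std::uint64_t usage_ = 0;
    std::list<Entry> lru_;
    std::unordered_map<int, std::list<Entry>::iterator> index_;
};
```

An evicted resource that is needed again simply goes through `Use` once more, which fits the slide's point that re-residing is cheaper than destroying and re-creating.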

  38. Barriers & Hazards
     • Most objects stay in one state from creation
        • Don’t insert redundant barriers
     • Always specify the right set of target units
        • Allows for a minimal barrier
     • Group barriers into the same Barrier call
        • Will take the worst case of all, rather than potentially incurring multiple sequential barriers
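Slide 38's advice (skip redundant transitions, issue the rest together) can be modelled with a small state tracker. The `State` values are a simplified, hypothetical subset loosely modelled on `D3D12_RESOURCE_STATES`, and `BarrierBatcher` is not a D3D12 type; the vector returned by `Flush` corresponds to the array you would pass in one `ID3D12GraphicsCommandList::ResourceBarrier` call.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Simplified, hypothetical resource states.
enum class State { Common, RenderTarget, ShaderResource, UnorderedAccess, CopyDest };

struct Barrier {
    int resource;
    State before;
    State after;
};

// Tracks each resource's current state, drops transitions to the state it is
// already in, and collects the rest so they can be issued in one call.
class BarrierBatcher {
public:
    void Require(int resource, State after) {
        auto it = states_.find(resource);
        State before = (it == states_.end()) ? State::Common : it->second;
        if (before == after) return;  // redundant: already in the target state
        pending_.push_back({resource, before, after});
        states_[resource] = after;
    }

    // Hand over everything queued since the last flush as one batch.
    std::vector<Barrier> Flush() {
        std::vector<Barrier> batch;
        batch.swap(pending_);
        return batch;
    }

private:
    std::unordered_map<int, State> states_;
    std::vector<Barrier> pending_;
};
```

Untracked resources are assumed to start in `State::Common`; a real tracker would take the initial state from resource creation.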

  39. Barriers enhance concurrency
     • In D3D11, resources both read and written in a given draw created a dependency between draws
     • Most common case was a UAV used in adjacent dispatches
     [Figure: logical view of Draws 0-3 with a barrier; GPU timeline of the same draws; Dispatches 0-2 (D3D11)]

  40. Barrier enables overlap
     • Explicit barriers eliminate the issue
        • App tells the API when a true dependency exists, rather than it being assumed
     [Figure: logical view of Dispatches 0-2; the same dispatches with explicit barrier control]
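Slides 39 and 40 describe the app inserting a UAV barrier only where a true dependency exists. The sketch below decides that for a sequence of dispatches, assuming each dispatch declares which UAVs it reads and writes (`Dispatch` and `BarriersNeeded` are hypothetical). It tracks all accesses since the last barrier, so a read of data written two dispatches earlier is still caught, while read-after-read needs no barrier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dispatch description: UAV ids it reads and writes.
struct Dispatch {
    std::vector<int> reads;
    std::vector<int> writes;
};

static bool Overlaps(const std::vector<int>& a, const std::vector<int>& b) {
    for (int x : a)
        if (std::find(b.begin(), b.end(), x) != b.end()) return true;
    return false;
}

// need[i] is true if a UAV barrier must precede dispatch i. Hazards are
// read-after-write, write-after-write and write-after-read against any access
// since the last barrier; independent dispatches are left free to overlap.
std::vector<bool> BarriersNeeded(const std::vector<Dispatch>& ds) {
    std::vector<bool> need(ds.size(), false);
    std::vector<int> written, read;  // UAVs accessed since the last barrier
    for (std::size_t i = 0; i < ds.size(); ++i) {
        const Dispatch& d = ds[i];
        bool hazard = Overlaps(d.reads, written) || Overlaps(d.writes, written) ||
                      Overlaps(d.writes, read);
        if (hazard) {
            need[i] = true;
            written.clear();  // the barrier orders everything before it
            read.clear();
        }
        written.insert(written.end(), d.writes.begin(), d.writes.end());
        read.insert(read.end(), d.reads.begin(), d.reads.end());
    }
    return need;
}
```

For three dispatches where only the third reads what the first wrote, a single barrier is placed before the third, and the first two may overlap.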

  41. CPU side
     • D3D12 simplifies the picture
        • Easier to associate driver effort with application actions
        • Less likely that the driver itself is the bottleneck
     • Be aware of your system buses

  42. GPU side
     • Environment is new
        • Less familiar without console experience
        • Interesting new hardware limits are now accessible
     • Use the tools

  43. Wrap up

  44. Get Ready
     • D3D12 done right isn’t just an API port
        • More so when referring to consoles
     • Good engine design offers a lot of opportunity
     • The power you’ve been asking for is here

  45. Questions
