Using Direct3D 10 Peter-Pike Sloan Microsoft Corporation
Outline • Overview of new features in the pipeline • Discuss API changes, how things have evolved • Go over several examples in the SDK that exploit the API • Not a research talk!
Direct3D 10 Goals • Consistency • Common feature set (no/minimal “caps”) • Strict behavior (no undefined behavior) • Performance • API/driver designed to have less overhead • Do more per draw call • Keep data on the GPU • Generality • Integer instruction set • General resource views
Direct3D 10 • All State Commands: • IASetInputLayout • IASetVertexBuffers/IASetIndexBuffer • IASetPrimitiveTopology • {VS|GS|PS}SetShader • {VS|GS|PS}SetShaderResources • {VS|GS|PS}SetConstantBuffers • {VS|GS|PS}SetSamplers • SOSetTargets • RSSetState • RSSetViewports/RSSetScissorRects • OMSetRenderTargets • OMSetBlendState • OMSetDepthStencilState
The Shader Core • New unified shader core • All shader stages use the same cores • Same functionality in every stage • Comparison-sample instruction • Percentage-closer shadow filtering • Immediate texel offset (-8 to +7) on texture sample/load instructions • Custom filter kernels • Resource info • Returns width, height, number of mip levels, and array size for the resource view • More of everything • Inter-stage registers, samplers, textures • Unlimited instruction count
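The comparison-sample instruction compares a reference depth against every texel in the filter footprint and only then filters the 0/1 results, which is exactly what percentage-closer filtering needs. Below is a minimal CPU sketch of a 2×2 PCF lookup, assuming a row-major float shadow map, clamp addressing, a less-or-equal comparison, and bilinear weights. ShadowMap and pcf2x2 are hypothetical names, not part of the D3D10 API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical shadow map: row-major depths, clamp-to-edge addressing.
struct ShadowMap {
    int width, height;
    std::vector<float> depth;
    float texel(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return depth[static_cast<std::size_t>(y) * width + x];
    }
};

// Percentage-closer filtering: compare first, then filter the 0/1 results.
// Assumes a LESS_EQUAL comparison and bilinear weights over a 2x2 footprint.
float pcf2x2(const ShadowMap& map, float u, float v, float refDepth) {
    // Texel-space position, with texel centers at integer + 0.5.
    float x = u * map.width - 0.5f;
    float y = v * map.height - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0, fy = y - y0;

    // 1 if the reference depth passes the comparison (lit), else 0.
    auto lit = [&](int tx, int ty) {
        return refDepth <= map.texel(tx, ty) ? 1.0f : 0.0f;
    };
    float top = lit(x0, y0) * (1 - fx) + lit(x0 + 1, y0) * fx;
    float bottom = lit(x0, y0 + 1) * (1 - fx) + lit(x0 + 1, y0 + 1) * fx;
    return top * (1 - fy) + bottom * fy;
}
```

Filtering the comparison results rather than the depths is the key design point: averaging depths and comparing once would still produce a hard shadow edge.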
The New Pipeline: Direct3D 10 Geometry Shader • Access to the whole primitive • Triangle • Line • Point • With adjacency
Programmable Setup • Generate barycentric coordinates in the GS and interpolate an arbitrary amount of data downstream • Quadratic interpolation over triangles • Data stored/computed at edge midpoints • Basis functions are simple polynomials of the barycentric coordinates • Analytic gradients • [Figure: Geometry Shader; triangle corners labeled (0,0), (1,0), (0,1)]
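One way to realize quadratic interpolation with data at the edge midpoints is the standard six-node quadratic (Lagrange) triangle basis: corner i uses b_i(2b_i − 1) and the midpoint of edge (i, j) uses 4 b_i b_j. In the pipeline the GS would emit the barycentrics and the PS would evaluate the basis. This CPU sketch assumes that basis and uses a hypothetical QuadraticTriangle struct.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Six-node quadratic Lagrange element on a triangle (an assumed, standard
// choice): values at the three corners plus the three edge midpoints.
struct QuadraticTriangle {
    std::array<float, 3> corner;   // f at vertices 0, 1, 2
    std::array<float, 3> midpoint; // f at midpoints of edges (0,1), (1,2), (2,0)

    // Evaluate at barycentric coordinates (b0, b1, b2), with b0 + b1 + b2 = 1.
    float eval(float b0, float b1, float b2) const {
        const float b[3] = {b0, b1, b2};
        float f = 0.0f;
        for (int i = 0; i < 3; ++i) {
            int j = (i + 1) % 3;
            f += corner[i] * b[i] * (2.0f * b[i] - 1.0f); // corner basis
            f += midpoint[i] * 4.0f * b[i] * b[j];         // edge-midpoint basis
        }
        return f;
    }
};
```

Each basis function is 1 at its own node and 0 at the other five, so the element reproduces the stored values exactly and reproduces any quadratic in the barycentrics, which is why the gradients can be computed analytically.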
Geometry Shader: Amplification and De-Amplification • Emits primitives of a specified output type (point, line strip, triangle strip) • Limited geometry amplification/de-amplification: outputs 0 to 1024 32-bit values per invocation • No more 1-in/1-out limit! • Uses: • Shadow volumes • Fur/fins • Procedural geometry/detailing • All-GPU particle systems • Point sprites
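Point sprites are the simplest amplification case: one point in, a four-vertex triangle strip out. The CPU sketch below mimics that expansion and also shows how the per-invocation output budget of 1024 32-bit values caps the vertex count. SpriteVertex, expandPoint, and the strip corner order are hypothetical choices for illustration.

```cpp
#include <array>
#include <cassert>

struct Float2 { float x, y; };
struct SpriteVertex { float px, py, pz, pw; float u, v; }; // 6 x 32-bit values

// Per-invocation output budget from the slide: 1024 32-bit values.
constexpr int kMaxOutputScalars = 1024;

// Largest vertex count a GS could declare for a given vertex size.
constexpr int maxVerticesPerInvocation(int scalarsPerVertex) {
    return kMaxOutputScalars / scalarsPerVertex;
}

// Expand one clip-space point into a 4-vertex triangle strip, as a point-sprite
// GS would. Offsets are scaled by w so the sprite keeps a constant screen size
// after the perspective divide.
std::array<SpriteVertex, 4> expandPoint(float x, float y, float z, float w, Float2 halfSize) {
    // Strip order: bottom-left, top-left, bottom-right, top-right.
    const Float2 offs[4] = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}};
    std::array<SpriteVertex, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = {x + offs[i].x * halfSize.x * w,
                  y + offs[i].y * halfSize.y * w,
                  z, w,
                  0.5f * (offs[i].x + 1.0f),   // u: 0 on the left, 1 on the right
                  0.5f * (1.0f - offs[i].y)};  // v: 0 at the top, 1 at the bottom
    }
    return out;
}
```

With a 6-float vertex the budget allows at most 1024 / 6 = 170 vertices per invocation, so heavier amplification (fur shells, detail geometry) has to trade vertex size against vertex count.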
Stream Out • Output of the GS/VS, including amplified geometry, can be directed into a buffer • Generated geometry is easily redrawn with the DrawAuto() command, with no CPU intervention
Resources & Buffers • Textures aren’t just textures anymore • Normal maps, lookup tables, render targets, etc. • Resources allow for generalized usage of data • Same for buffers
New Resource Types: Texture Arrays • Dynamically indexable in the shader • Whole array can be set as a render target • GS can emit a system-interpreted value, SV_RenderTargetArrayIndex, that selects the array slice each primitive is rendered to
Resource Views • Resources in D3D10 are generally typeless • A resource must be interpreted as a specific type by obtaining a view of it • Allows you to reinterpret data in a different format • Forces type validation earlier, when the view is created • No re-validation on every draw
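A CPU analogy helps: a resource created as DXGI_FORMAT_R32_TYPELESS can be viewed as DXGI_FORMAT_R32_FLOAT or DXGI_FORMAT_R32_UINT, and the bits are never converted, only interpreted. The sketch below models that with two hypothetical view types over one raw buffer. TypelessBuffer32, FloatView, and UintView are illustrative names, not D3D10 interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A "typeless" allocation: raw 32-bit elements with no declared meaning.
struct TypelessBuffer32 {
    std::vector<std::uint32_t> bits;
};

// Two "views" over the same storage. Each view fixes an interpretation once,
// when it is created; the underlying bytes are never converted.
struct FloatView {
    const TypelessBuffer32& res;
    float load(std::size_t i) const {
        float f;
        std::memcpy(&f, &res.bits[i], sizeof f); // reinterpret, do not convert
        return f;
    }
};

struct UintView {
    const TypelessBuffer32& res;
    std::uint32_t load(std::size_t i) const { return res.bits[i]; }
};
```

Because the interpretation is fixed when the view object is built, nothing about the type has to be rechecked when the view is later bound or drawn with.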
Resource Views Example: Cubemap • Views enable interpretation of resources at different bind locations
Primitive Topologies • New primitive topologies include adjacency: • Line List w/ adjacency • Line Strip w/ adjacency • Triangle List w/ adjacency • Triangle Strip w/ adjacency
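Adjacency data is usually produced offline from an ordinary triangle list. The sketch below builds a triangle-list-with-adjacency index buffer, assuming the layout where indices 0, 2, 4 of each six-index group form the triangle and 1, 3, 5 are the vertices across edges (0,2), (2,4), and (4,0). Boundary edges reuse the triangle's own opposite vertex, which is one common convention and an assumption here. buildAdjacency is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Index = std::uint32_t;

// Convert a plain triangle list (3 indices per triangle) into a
// triangle-list-with-adjacency layout (6 indices per triangle):
// slots 0, 2, 4 = the triangle; slots 1, 3, 5 = vertex across each edge.
// Boundary edges fall back to the triangle's own opposite vertex (assumption).
std::vector<Index> buildAdjacency(const std::vector<Index>& tris) {
    // Directed edge (from, to) -> vertex opposite that edge in its triangle.
    std::map<std::pair<Index, Index>, Index> opposite;
    for (std::size_t t = 0; t + 2 < tris.size(); t += 3) {
        Index v[3] = {tris[t], tris[t + 1], tris[t + 2]};
        for (int e = 0; e < 3; ++e)
            opposite[{v[e], v[(e + 1) % 3]}] = v[(e + 2) % 3];
    }

    std::vector<Index> out;
    out.reserve(tris.size() * 2);
    for (std::size_t t = 0; t + 2 < tris.size(); t += 3) {
        Index v[3] = {tris[t], tris[t + 1], tris[t + 2]};
        for (int e = 0; e < 3; ++e) {
            Index a = v[e], b = v[(e + 1) % 3];
            out.push_back(a);
            // A consistently wound neighbor traverses the shared edge as (b, a).
            auto it = opposite.find({b, a});
            out.push_back(it != opposite.end() ? it->second : v[(e + 2) % 3]);
        }
    }
    return out;
}
```

For two triangles sharing an edge (a quad), each triangle picks up the other's far vertex in the slot for the shared edge, which is what silhouette and shadow-volume geometry shaders need.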
HLSL 10 • Direct3D10 Shader Authoring in HLSL • Minimizes invalid state introduced by ASM • Minimizes redundant intermediate representations • HLSL optimizations guaranteed to the driver • Enables fast shader linkage validation with signatures • Maximum asset portability • HLSL 10 shader disassembly will be available for debugging • Author-time compilation still supported and recommended
Constant Buffers • [Figure: Shader A and Shader B each binding a subset of constant buffers A, B, C, D] • Constants are now managed like vertex/texture data • Updated efficiently via Map with discard (D3D10_MAP_WRITE_DISCARD) or UpdateSubresource • Set like any other resource • Up to 4096 4-channel × 32-bit elements (64 KB) per CB • Create as many CBs as you want; 16 can be bound to a shader stage at once
Constant Buffers • Example HLSL syntax:

    cbuffer myObject
    {
        float4x4 matWorld;
        float3   vObjectPosition;
        int      arrayIndex;
    }

    cbuffer myScene
    {
        float3   vSunPosition;
        float4x4 matView;
    }

• Variables still exist in the global namespace: refer to a member as arrayIndex, not myObject.arrayIndex
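Where each member lands inside a constant buffer follows HLSL's packing rules. The sketch below is a simplified model of those rules, not the full specification: members fill 16-byte registers in declaration order, a member may not straddle a register boundary, and matrices start on a fresh register. Member, Layout, and packCBuffer are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of HLSL constant-buffer packing (an approximation):
// members fill 16-byte registers in order; no member straddles a register
// boundary; matrices (and arrays/structs) start on a fresh register.
struct Member {
    std::string name;
    std::size_t size;       // bytes: float3 = 12, int = 4, float4x4 = 64
    bool startsNewRegister; // true for matrices, arrays, structs
};

struct Layout {
    std::vector<std::size_t> offsets; // byte offset of each member
    std::size_t totalSize;            // rounded up to whole 16-byte registers
};

Layout packCBuffer(const std::vector<Member>& members) {
    constexpr std::size_t kReg = 16;
    Layout layout{{}, 0};
    std::size_t cursor = 0;
    for (const Member& m : members) {
        std::size_t used = cursor % kReg;
        bool straddles = used != 0 && used + m.size > kReg;
        if (used != 0 && (m.startsNewRegister || straddles))
            cursor += kReg - used; // pad to the next register
        layout.offsets.push_back(cursor);
        cursor += m.size;
    }
    layout.totalSize = (cursor + kReg - 1) / kReg * kReg;
    return layout;
}
```

Under this model myObject packs tightly (matWorld at 0, vObjectPosition at 64, arrayIndex fills the last 4 bytes of that register at 76), while myScene wastes 4 bytes because matView must start at offset 16. Both come to 80 bytes, well under the 64 KB limit.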
State Objects • Reduce state-change overhead by grouping state into immutable objects • Input Layout • Format, Offset, InstanceDataStepRate, … • Rasterizer • Cull Mode, Multisample Enable, Fill Mode, … • DepthStencil • Depth Enable, Depth Func, Stencil Masks, … • Blend • SrcBlend, DestBlend, BlendOp, … • Sampler (No longer bound to a specific texture) • Filter Mode, MinLOD, MaxLOD,…
Fast Interstage Linkage • D3D10 API Design Imperative: No draw-time fix-up required • Fixed, generic register bank connects stages • Linkage enforced via “Signatures”
Signatures • Correspond to shader parameter declarations • Aggressively packed by HLSL • Like a Struct: Order matters • Put “optional” parameters at the end • Specialization/packing can save you precious interpolators
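The advice to put optional parameters last follows from order-sensitive matching. The sketch below models linkage as a prefix check: the consumer's input signature must match the producer's output element by element, in order, and may only drop trailing elements. This is a simplified model inferred from the slide's advice, not the runtime's exact validation rules. SigElement and canLink are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified signature element: semantic name + component count.
struct SigElement {
    std::string semantic;
    int components; // 1..4
    bool operator==(const SigElement& o) const {
        return semantic == o.semantic && components == o.components;
    }
};
using Signature = std::vector<SigElement>;

// Simplified linkage model: the consumer's inputs must be a prefix of the
// producer's outputs, in the same order. Like a struct, order matters, so
// outputs that some consumers skip belong at the end.
bool canLink(const Signature& producerOut, const Signature& consumerIn) {
    if (consumerIn.size() > producerOut.size()) return false;
    for (std::size_t i = 0; i < consumerIn.size(); ++i)
        if (!(producerOut[i] == consumerIn[i])) return false;
    return true;
}
```

Because the check is a pure comparison of two fixed lists, it can run once when shaders are paired rather than at every draw, which is the "no draw-time fix-up" goal.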
Multi-Sample Anti-Aliasing • Shading and visibility evaluated at different rates • 1×, 4×, 8×, 16× depending on hardware • MSAA surface available in the shader as a resource • Samples indexed by sample number • Allows for deferred or multipass MSAA rendering
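Reading individual samples (in HLSL 10, a Texture2DMS Load with a sample index) makes custom resolves possible. The CPU sketch below models a multisampled surface and a resolve that applies a per-sample function before box-averaging, for example tone mapping each sample first. MSSurface and resolve are hypothetical names, and the single-channel layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical CPU stand-in for a multisampled surface: each pixel stores
// `samples` floats, addressed by sample number like a Texture2DMS load.
struct MSSurface {
    int width, height, samples;
    std::vector<float> data; // index: (y * width + x) * samples + s
    float load(int x, int y, int s) const {
        return data[(static_cast<std::size_t>(y) * width + x) * samples + s];
    }
};

// Custom resolve: apply `perSample` to every sample, then box-average.
// With the identity this is a plain box resolve; with a tone-mapping operator
// it resolves after tone mapping, which a plain averaging resolve does not do.
std::vector<float> resolve(const MSSurface& ms, const std::function<float(float)>& perSample) {
    std::vector<float> out(static_cast<std::size_t>(ms.width) * ms.height);
    for (int y = 0; y < ms.height; ++y)
        for (int x = 0; x < ms.width; ++x) {
            float sum = 0.0f;
            for (int s = 0; s < ms.samples; ++s) sum += perSample(ms.load(x, y, s));
            out[static_cast<std::size_t>(y) * ms.width + x] = sum / ms.samples;
        }
    return out;
}
```

Usage: resolving a 4-sample HDR edge pixel with values {0, 0, 1, 1} gives 0.5 with a box resolve, but 0.25 when each sample is tone-mapped with v / (1 + v) first.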
Queries & Predicates • Many events and stats gathered by runtime • Command completion • Object Occlusion (in samples rendered) • Pipeline Stats • Commands can be queued depending on the result of the query • Called a Predicate
Example: Predicated Rendering • Depending on an occlusion query of a bounding box (D3D10_QUERY_OCCLUSION_PREDICATE), queue the rendering of a more complex object • No CPU involvement required • Use the D3D10_QUERY_MISC_PREDICATEHINT flag to avoid an accidental pipeline stall waiting for the query result
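The decision the GPU makes for a predicated draw can be modeled in a few lines. In this sketch the predicate's value is "any samples of the bounding box passed", the draw is skipped when that value equals a caller-chosen skipWhen, and the hint flag means "draw anyway if the result is not ready yet" instead of waiting. OcclusionPredicate, predicatedDraw, and skipWhen are hypothetical; check the exact polarity of the real API's predicate value against the documentation.

```cpp
#include <cassert>
#include <cstdint>

enum class Decision { Draw, Skip, WaitForResult };

// Hypothetical model of an occlusion predicate.
struct OcclusionPredicate {
    bool resultReady;
    std::uint64_t samplesPassed; // valid only when resultReady
    bool hint;                   // models the PREDICATEHINT flag
};

// Skip the draw when the predicate value ("box was visible") equals skipWhen.
// Without the hint, a pending result forces the pipeline to wait (a stall);
// with the hint, the draw simply proceeds.
Decision predicatedDraw(const OcclusionPredicate& p, bool skipWhen) {
    if (!p.resultReady)
        return p.hint ? Decision::Draw : Decision::WaitForResult;
    bool visible = p.samplesPassed > 0;
    return visible == skipWhen ? Decision::Skip : Decision::Draw;
}
```

The trade-off the hint encodes: occasionally drawing a hidden object costs some fill rate, while waiting on the query serializes the GPU.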
More Examples: Single-Pass Render-to-Cubemap • [Figure: Geometry Shader] • ID3D10Device::GenerateMips(…) on the cube's shader resource view
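Single-pass render-to-cubemap combines texture arrays with the GS: each input triangle is emitted once per cube face, tagged with SV_RenderTargetArrayIndex so it lands in that face's slice. The CPU sketch below models the replication and a conservative per-face cull. It uses the Direct3D cube-face order (+X, -X, +Y, -Y, +Z, -Z) and leaves out the per-face view-projection transform. replicateToFaces and EmittedTriangle are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Cube-face order used by Direct3D 10 texture cubes (and their 2D-array view):
// 0:+X, 1:-X, 2:+Y, 3:-Y, 4:+Z, 5:-Z. Each entry is that face's forward axis.
constexpr Vec3 kFaceAxis[6] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                               {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};

struct EmittedTriangle {
    std::array<Vec3, 3> v;      // a real GS would also project with the face's view-proj
    int renderTargetArrayIndex; // slice (cube face) the primitive is routed to
};

// CPU model of a render-to-cubemap GS. Positions are relative to the cube
// center. A face is skipped only if no vertex lies in front of it, which is
// conservative: such a triangle cannot appear in that face's frustum.
std::vector<EmittedTriangle> replicateToFaces(const std::array<Vec3, 3>& tri) {
    std::vector<EmittedTriangle> out;
    for (int face = 0; face < 6; ++face) {
        const Vec3& a = kFaceAxis[face];
        bool anyInFront = false;
        for (const Vec3& p : tri)
            if (p.x * a.x + p.y * a.y + p.z * a.z > 0.0f) anyInFront = true;
        if (anyInFront) out.push_back({tri, face});
    }
    return out;
}
```

Usage: a triangle in the (+X, +Y, +Z) octant is emitted to faces 0, 2, and 4 only, and GenerateMips can then run on the cube's shader resource view.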
GPU Material System • Using the GS, delegate material properties to the PS per-primitive • Put all similar-sized textures into an array • Use a switch to group material code • Push material management onto the GPU
Material System Flexibility • This may require complex shader execution • Eats up registers • More GPU thread state • Shader specialization is still very powerful • D3D10 allows YOU to choose what is processed on the CPU or on the GPU • Author on Direct3D 10; specialize for performance and cross-platform support
Small Batch Performance • Fewer calls needed • Geometry Shaders/ Constant Buffers/ Texture Arrays… • Remaining calls are fast • Massive reduction in state and validation overhead: • Validation on CREATION, not on binding • Views, State Objects • Avoid CPU intervention • Predicated Draw() • DrawAuto() • Runtime, refactored for performance
Strict Specification • Strictly-defined, consistent behavior throughout the pipeline • IEEE floating-point compliance (almost) • Includes IEEE754R NaN-quashing Min/Max instructions • Precise FP32 sampling/blending/math/conversion rules. Ex: • FP32 shader ops – precise to 1.0 ULP • FP32 to Integer – precise to 0.6 ULP per op • FP16 blending - precise to 0.6 ULP per op • FP32 blending required • Exact line/triangle/AA rasterization rules
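Tolerances like "1.0 ULP" are easiest to test by measuring distance in units in the last place. The sketch below maps float bit patterns onto an ordered integer line so that adjacent representable floats differ by 1. It is a standard technique, not taken from the specification. Note that the spec's fractional bounds (such as 0.6 ULP) are measured against the infinitely precise result, which an integer distance between two floats cannot express exactly.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bit pattern onto a monotonically ordered integer line so that
// adjacent representable floats differ by exactly 1 (+0 and -0 both map to 0).
std::int64_t orderedBits(float f) {
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i >= 0 ? static_cast<std::int64_t>(i)
                  : -static_cast<std::int64_t>(i & 0x7FFFFFFF);
}

// Distance in units in the last place between two finite floats.
std::int64_t ulpDistance(float a, float b) {
    std::int64_t d = orderedBits(a) - orderedBits(b);
    return d < 0 ? -d : d;
}

// Check a result against a reference value with a tolerance in whole ULPs,
// e.g. a 1 ULP bound on FP32 shader arithmetic.
bool withinUlps(float result, float reference, std::int64_t maxUlps) {
    return ulpDistance(result, reference) <= maxUlps;
}
```

Usage: std::nextafter(1.0f, 2.0f) is exactly 1 ULP from 1.0f and passes a 1 ULP check, while the float after it does not.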
Some SDK Examples • Procedural Geometry (Vines) • PipesGS in the DirectX SDK • Volume Particles • SoftParticles in the DirectX SDK • Sparse Morph Targets • SparseMorphTargets in the DirectX SDK
Procedural Geometry: Goals • Create dynamically growing vegetation on the GPU. • Have vegetation grow as if it knows about the underlying geometry. • Do the entire thing in 2 Draw calls.
Procedural Geometry Approach • Use dynamically generated line lists to describe vines. • Lines contain 3 types of vertices: • Start – spawn point for a vine • Grow – defines the actively growing end of the vine • Static – points that have finished growing • Updating of all vines takes one Draw call.
Procedural Geometry Growing Vines • Create a seed buffer of Start vertices • Use the geometry shader to read in a list of vertices, operate on those vertices, and stream out the results • Each vertex type has a timer: • The timer determines when the vertex moves • The timer also determines when the vertex changes type
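The growth pass is a GS that reads vertices, runs their timers, and streams out zero or more vertices each. On the next frame the output buffer becomes the input, and DrawAuto can draw it without the CPU reading back a count. The CPU model below uses hypothetical rules purely to illustrate timer-driven motion and type changes: a Start vertex becomes a Static root plus a Grow tip, and a Grow tip leaves a Static point behind and moves on. It is not the PipesGS sample's actual logic, and the names and the "grow straight up" direction are assumptions.

```cpp
#include <cassert>
#include <vector>

enum class VineType { Start, Grow, Static };

struct VineVertex {
    VineType type;
    float pos[3];
    float timer; // seconds until this vertex acts
    float age;   // could later drive the vine's thickness
};

// One stream-out update pass modeled on the CPU: each input vertex yields
// one or two output vertices, as a geometry shader could emit.
// The rules are hypothetical illustrations, not the SDK sample's logic.
std::vector<VineVertex> updateVines(const std::vector<VineVertex>& in, float dt,
                                    float segmentLength, float growInterval) {
    std::vector<VineVertex> out;
    for (VineVertex v : in) {
        v.timer -= dt;
        v.age += dt;
        if (v.timer > 0.0f || v.type == VineType::Static) {
            out.push_back(v); // timer still running, or finished growing
            continue;
        }
        // Timer expired: leave a Static point at the current position.
        out.push_back({VineType::Static, {v.pos[0], v.pos[1], v.pos[2]}, 0.0f, v.age});
        if (v.type == VineType::Start) {
            // Seed fires: spawn a growing tip at the root.
            out.push_back({VineType::Grow, {v.pos[0], v.pos[1], v.pos[2]}, growInterval, 0.0f});
        } else {
            // Grow: the tip moves on and restarts its timer.
            VineVertex tip = v;
            tip.pos[1] += segmentLength; // hypothetical: grow straight up
            tip.timer = growInterval;
            out.push_back(tip);
        }
    }
    return out;
}
```

Usage: seeding with one Start vertex whose timer is 0.5 s, two 0.25 s steps turn it into a Static root plus a Grow tip, and a further 1 s step (with a 1 s grow interval) adds a Static point and moves the tip up by one segment.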
Procedural Geometry Drawing Vines • Bind the streamed out list of vine points as a vertex buffer • In the geometry shader, treat this list as a line list • For each pair of vertices, create a triangle-striped cylindrical section whose size is related to the age of the vertex
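For each pair of vine vertices the drawing GS wraps a cylinder section around the segment as a triangle strip: two rings of points, interleaved, with the first column repeated to close the tube. The CPU sketch below builds that strip. The radius-from-age rule (radiusForAge) and the vector helpers are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct V3 { float x, y, z; };
V3 operator+(V3 a, V3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
V3 operator-(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
V3 operator*(V3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
V3 cross(V3 a, V3 b) { return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }
V3 normalize(V3 a) { float l = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); return a * (1.0f / l); }

// Hypothetical thickness rule: radius grows with age up to a cap.
float radiusForAge(float age) { return std::min(0.05f, 0.01f * age); }

// Emit a triangle strip forming an open cylinder around segment p0 -> p1,
// as a GS could for each pair of vine vertices in a line list.
std::vector<V3> cylinderStrip(V3 p0, float age0, V3 p1, float age1, int sides) {
    V3 axis = normalize(p1 - p0);
    // Any helper vector not parallel to the axis yields a perpendicular basis.
    V3 helper = std::fabs(axis.y) < 0.9f ? V3{0, 1, 0} : V3{1, 0, 0};
    V3 u = normalize(cross(axis, helper));
    V3 v = cross(axis, u);
    float r0 = radiusForAge(age0), r1 = radiusForAge(age1);

    std::vector<V3> strip;
    for (int k = 0; k <= sides; ++k) { // k == sides repeats column 0 to close the tube
        float a = 2.0f * 3.14159265f * k / sides;
        V3 dir = u * std::cos(a) + v * std::sin(a);
        strip.push_back(p0 + dir * r0); // ring around the first vertex
        strip.push_back(p1 + dir * r1); // ring around the second vertex
    }
    return strip;
}
```

A strip of 2 × (sides + 1) vertices per segment keeps the output small enough to fit the GS output budget even with a few extra vertices for leaves.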
Procedural Geometry Add Some Leaves • Add leaves pseudo-randomly, based upon vertex data • Emit an extra quad from the GS when creating the cylindrical section • Change the texture coordinates to look up a leaf texture in a texture array • This draws all vines and leaves in the scene with one Draw call!
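"Pseudo-randomly, based upon vertex data" can be done with an integer hash, which the Direct3D 10 integer instruction set makes practical in the GS. The sketch below hashes a per-vertex ID to decide whether to emit a leaf quad and which texture-array slice to use. The hash is a well-known avalanche-style integer mix chosen for illustration, not taken from the SDK sample, and hash32, chooseLeaf, and leafChance are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Avalanche-style 32-bit integer hash (illustrative choice of constants).
std::uint32_t hash32(std::uint32_t x) {
    x ^= x >> 16;
    x *= 0x7feb352du;
    x ^= x >> 15;
    x *= 0x846ca68bu;
    x ^= x >> 16;
    return x;
}

struct LeafDecision {
    bool emitLeaf;
    std::uint32_t textureSlice; // index into the leaf texture array
};

// Deterministic per-vertex choice: the same vertex always gets the same answer,
// so leaves do not flicker between frames. leafChance is in [0, 1].
LeafDecision chooseLeaf(std::uint32_t vertexId, float leafChance, std::uint32_t numSlices) {
    std::uint32_t h = hash32(vertexId);
    float r = (h >> 8) * (1.0f / 16777216.0f); // top 24 bits -> [0, 1)
    return {r < leafChance, (h & 0xFFu) % numSlices};
}
```

Deriving the decision from stable vertex data instead of a running random-number state is what lets the whole vine-and-leaf pass stay a single stateless Draw call.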