Geometry Shaders and Stream Out • COMP 5411: ADVANCED COMPUTER GRAPHICS, FALL 2013
Unified Shader Architecture • Recent step in GPU evolution • Complex hardware • All shaders use same hardware • HW scheduler determines what to execute: • Vertices • Triangles • Pixels • General overview and a few highlights • Motivations
Case study: Design Process (DX10) • Collaboration with: • Application developers (ISVs) • Hardware developers (IHVs) • Iterative process • Start: spring 2003 • Spec: fall 2004 • HW implementations: 2006 [Diagram: ISV1, ISV2, ISVn and IHV1, IHV2, IHVm around the DirectX Team]
Constraints & Problems • Preserve: • Data parallelism • Memory system efficiency • Coherence • Determinism • Performance/$$ • Improve: • State change agility • Implementation consistency • Program expressiveness • Resource limitations • CPU offload • Visual complexity
Unified Shader Architecture (from DX10 – OGL 3.2) • Logical pipeline • Programmer’s view [Pipeline diagram: Input Assembler, Vertex Shader, Geometry Shader, Stream Out, Setup/Rasterization, Pixel Shader, Output Merger; memory resources: Vertex Buffer, Index Buffer, Texture, Buffer, Depth, Color]
System Architecture: Input Assembler • Fixed-function • Generates IDs: primitive, vertex, instance
System Architecture: Vertex Shader • Programmable • Vertex transformations • 1 vertex in, 1 out • Read from memory
System Architecture: Geometry Shader • New, programmable • Per-primitive processing • 1 prim in, k prims out • Read from memory
System Architecture: Stream Out • New, fixed-function • Divert primitive data to 1D buffers • 1 in, 1 out • Write to memory
System Architecture: Setup/Rasterization • Fixed-function • Clipping, divide by w • Convert primitives to fragments • 1 prim in, m frags out
System Architecture: Pixel Shader • Programmable • Shade fragments • 1 frag in, 0 or 1 out • Read from memory
System Architecture: Output Merger • Fixed-function • Depth/stencil tests • Color buffer blending • Read/modify/write to memory
Geometry Shader • Entire primitive as input • Adjacency optional • Outputs zero or more primitives • 1024 scalars out max
Geometry Shader • Amplify geometry • Expand point sprites • Extrude silhouettes • Extrude prisms/tets [Hirche04]
Geometry Shader • Render to one of multiple targets • E.g., render to cube map • Treat cube map as 6-element array • Emit primitive multiple times • Per-cube face transform + array index [Figure: GS writing to Render Target Array slices 0 to 5]
Determinism & Parallelism • Allow parallel processing but preserve serial order • Buffer GS outputs (on chip) • Limit output to 1K 32-bit values • Application can specify less • May allow greater parallelism [Figure: primitives 1, 2, … n processed by parallel GS instances; expansion to 2 triangles]
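The output limit above implies a simple budget: the more scalars each emitted vertex carries, the fewer vertices one GS invocation can produce. A minimal arithmetic sketch follows, assuming the 1024-scalar limit quoted on the slides; the function name and the example vertex layout are hypothetical, chosen only for illustration.

```python
# Limit quoted on the slides: at most 1K 32-bit values per GS invocation.
GS_MAX_OUTPUT_SCALARS = 1024


def max_output_vertices(scalars_per_vertex: int) -> int:
    """Largest vertex count that fits in the GS output budget."""
    if scalars_per_vertex <= 0:
        raise ValueError("scalars_per_vertex must be positive")
    return GS_MAX_OUTPUT_SCALARS // scalars_per_vertex


# Hypothetical layout: float4 position + float3 normal + float2 texcoord.
scalars = 4 + 3 + 2
print(max_output_vertices(scalars))  # 113
```

Declaring a smaller maximum than this bound is what "application can specify less" refers to; the slides suggest this may let the hardware run more GS instances in parallel.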
Stream Out • Data from VS/GS can optionally be streamed out to a buffer • 32 bits per component (int or float) • Either single buffer of up to 16 elements • Or up to 4 buffers that have single elements • Always sent to rasterizer if rasterizer is enabled
Stream Out • Generated geometry is easily redrawn using the DrawAuto() command, with no CPU intervention
Multi-Stream Output • Array-of-structures vs. structure-of-arrays • Input Assembler supports both types as vertex buffers • Both styles are useful • Access pattern vs. memory coherency [Figure: Position, Normal, Color and Texture attributes laid out in the two styles]
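To make the two layouts concrete, here is a small CPU-side sketch of the same vertex data stored both ways, with conversions between them. It only illustrates the data layout; the attribute names and helper functions are assumptions and do not correspond to any graphics API.

```python
def aos_to_soa(vertices):
    """Interleaved records -> one list ("stream") per attribute."""
    return {key: [v[key] for v in vertices] for key in vertices[0]}


def soa_to_aos(streams):
    """One list per attribute -> interleaved per-vertex records."""
    keys = list(streams)
    return [dict(zip(keys, values)) for values in zip(*streams.values())]


# Array-of-structures: every vertex keeps all of its attributes together.
aos = [
    {"position": (0.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0), "color": (1.0, 0.0, 0.0)},
    {"position": (1.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0), "color": (0.0, 1.0, 0.0)},
]

# Structure-of-arrays: each attribute lives in its own buffer.
soa = aos_to_soa(aos)
print(soa["position"])  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

A pass that touches only positions reads one contiguous stream in the structure-of-arrays form, while a pass that needs every attribute of a vertex reads one contiguous record in the array-of-structures form, which is the access pattern vs. memory coherency trade-off named above.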
Particle System Example • No CPU intervention • Particle state in 1D buffer • Read buffer and rewrite 2nd buffer each pass • Use GS to add or destroy particles
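The read-one-buffer, write-the-other pattern can be sketched on the CPU as follows. This is a minimal illustration, not the GPU implementation: the `Particle` fields, the emitter convention, and the numeric constants are all assumptions. Each call to `update_pass` plays the role of one GS pass, emitting zero, one, or two particles per input particle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    position: float
    velocity: float
    age: float
    is_emitter: bool = False


def update_pass(src, dt, max_age=1.0, gravity=-9.8):
    """One pass: read every particle in src, append survivors to a new list."""
    dst = []
    for p in src:
        if p.is_emitter:
            dst.append(p)                               # the emitter persists
            dst.append(Particle(p.position, 1.0, 0.0))  # add a particle
            continue
        age = p.age + dt
        if age > max_age:
            continue                                    # destroy: emit nothing
        velocity = p.velocity + gravity * dt
        dst.append(Particle(p.position + velocity * dt, velocity, age))
    return dst


def simulate(frames, dt=0.4):
    """Ping-pong between two buffers, swapping their roles every frame."""
    buffers = [[Particle(0.0, 0.0, 0.0, is_emitter=True)], []]
    for frame in range(frames):
        src = buffers[frame % 2]
        dst = buffers[(frame + 1) % 2]
        dst[:] = update_pass(src, dt)
    return buffers[frames % 2]


print(len(simulate(3)))  # 4
```

With these assumed constants the buffer settles at one emitter plus three live particles, because each particle is destroyed on the pass where its age first exceeds `max_age`.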
Displacement Map Example • GS extrudes prism at each face [Hirche04] • PS ray casts against height field • Shade or discard pixel depending on ray test
Sparse Morph Targets • Stream out updated vertices • Stretch of triangle drives wrinkles
Shells: problems at silhouettes 8 shells + fins
Silhouette detection on the GS • if( dot(eyeVec,N1) > 0 && dot(eyeVec,N2) < 0) [Figure: vertices 0 to 5 with face normals N1 and N2]
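The test on the slide can be tried out on the CPU with a few vector helpers. This is a sketch under assumptions: `eye_vec` is taken to point from the surface towards the viewer, face normals follow counter-clockwise winding, and the helper names are hypothetical. As on the slide, the condition only fires when the first face is front-facing and its neighbor is back-facing.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def face_normal(a, b, c):
    """Unnormalized normal of triangle (a, b, c), counter-clockwise winding."""
    return cross(sub(b, a), sub(c, a))


def is_silhouette_edge(eye_vec, n1, n2):
    """Slide's test: face 1 faces the viewer and the adjacent face 2 does not."""
    return dot(eye_vec, n1) > 0 and dot(eye_vec, n2) < 0


eye_vec = (0.0, 0.0, 1.0)
n1 = face_normal((0, 0, 0), (1, 0, 0), (0, 1, 0))         # faces the viewer
n_folded = face_normal((1, 0, 0), (0, 0, 0), (0.5, 1, -1))  # folded behind
n_flat = face_normal((1, 0, 0), (0, 0, 0), (0.5, -1, 0))    # coplanar neighbor

print(is_silhouette_edge(eye_vec, n1, n_folded))  # True
print(is_silhouette_edge(eye_vec, n1, n_flat))    # False
```

The normals do not need to be normalized here, since only the sign of each dot product matters.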
Cloth as a Set of Particles • Each particle is subject to: • A force (gravity, wind, drag, etc.) • Various constraints: • To maintain overall shape (springs) • To prevent interpenetration with the environment (collision)
Cloth Simulation • Apply force to all particles • For as many times as necessary: • Apply spring constraints • Apply collision constraints • Render mesh
Constraints • The constraints create a system of equations to be solved at each time step • Use explicit integration: constraints are resolved by relaxation, that is, by enforcing them one after the other for a given number of iterations
Spring Constraints • Particles are linked by springs: • Structural springs • Shear springs • A spring is simulated as a distance constraint between two particles
Distance Constraint • A distance constraint DC(P, Q) between two particles P and Q is enforced by moving them away from or towards each other until they are at their rest distance [Figure: P and Q stretched, compressed, and at rest distance]
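A distance constraint of this kind can be sketched in a few lines. The version below assumes both particles have equal mass and are free to move, so each takes half of the correction; the slides do not state how the correction is split, and the function name is hypothetical.

```python
import math


def enforce_distance(p, q, rest_length):
    """Move p and q along the line joining them until they are rest_length apart.

    Assumes equal masses: each particle absorbs half of the error.
    """
    delta = [qi - pi for pi, qi in zip(p, q)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0.0:
        return tuple(p), tuple(q)  # direction undefined; leave unchanged
    correction = 0.5 * (dist - rest_length) / dist
    new_p = tuple(pi + d * correction for pi, d in zip(p, delta))
    new_q = tuple(qi - d * correction for qi, d in zip(q, delta))
    return new_p, new_q


# Stretched spring: the particles move towards each other.
print(enforce_distance((0.0, 0.0), (2.0, 0.0), 1.0))  # ((0.5, 0.0), (1.5, 0.0))
# Compressed spring: the particles move away from each other.
print(enforce_distance((0.0, 0.0), (0.5, 0.0), 1.0))  # ((-0.25, 0.0), (0.75, 0.0))
```

Enforcing one constraint generally violates its neighbors a little, which is why the relaxation scheme repeats the pass for a number of iterations.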
Parallel update Batch 1
Parallel update Batch 2
Parallel update Batch 3
Parallel update Batch 4
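Batching matters because two springs that share a particle cannot be updated in parallel without a write conflict. The sketch below shows one possible partition of the structural springs of a regular grid into four conflict-free batches (horizontal springs starting at even and odd columns, vertical springs starting at even and odd rows). The exact partition used on the slides is only shown in the figures, so this scheme is an assumption.

```python
def structural_batches(width, height):
    """Split the structural springs of a width x height grid into 4 batches."""
    def index(x, y):
        return y * width + x

    batches = [[], [], [], []]
    for y in range(height):
        for x in range(width - 1):       # horizontal springs
            batches[x % 2].append((index(x, y), index(x + 1, y)))
    for y in range(height - 1):
        for x in range(width):           # vertical springs
            batches[2 + y % 2].append((index(x, y), index(x, y + 1)))
    return batches


def is_conflict_free(batch):
    """True if no particle is touched by more than one spring in the batch."""
    seen = set()
    for a, b in batch:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True


batches = structural_batches(4, 4)
print([len(b) for b in batches])                  # [8, 4, 8, 4]
print(all(is_conflict_free(b) for b in batches))  # True
```

Within a batch every spring can be processed independently; a synchronization point is needed only between batches.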
Cloth Simulation • Apply force to all particles • Synchronize • For as many times as necessary: • For all 4 batches: • Apply spring constraints • Synchronize • Apply collision constraints • Synchronize • Render mesh
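The loop above can be written out as a sequential CPU sketch in 2D. Several simplifications are assumptions rather than anything stated on the slides: gravity is applied directly as a position offset, all particles have equal mass, every spring shares one rest length, and the only collision object is a horizontal floor. On the GPU, each synchronization point corresponds to the end of a pass.

```python
import math


def cloth_step(positions, batches, rest_length, dt=0.016,
               gravity=-9.8, iterations=8, floor_y=-1.0):
    """One simulation step: force, then relaxed spring and collision constraints."""
    pos = [list(p) for p in positions]

    # Apply force to all particles (assumed: gravity as a position offset).
    for p in pos:
        p[1] += gravity * dt * dt

    for _ in range(iterations):
        for batch in batches:
            # Springs in one batch share no particle, so they are independent.
            for a, b in batch:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = math.hypot(dx, dy)
                if dist == 0.0:
                    continue
                c = 0.5 * (dist - rest_length) / dist
                pos[a][0] += dx * c
                pos[a][1] += dy * c
                pos[b][0] -= dx * c
                pos[b][1] -= dy * c
            # End of batch: synchronization point.

        # Collision constraint (assumed: keep particles above a floor).
        for p in pos:
            p[1] = max(p[1], floor_y)

    return [tuple(p) for p in pos]


falling = cloth_step([(0.0, 0.0), (1.0, 0.0)], [[(0, 1)]], rest_length=1.0)
resting = cloth_step([(0.0, -1.0), (1.0, -1.0)], [[(0, 1)]], rest_length=1.0)
```

In `falling` both particles drop by the same amount and keep their rest distance; in `resting` the floor constraint clamps them back to `floor_y`.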
Stream-Out (DX10) Implementation • Particles stored in a vertex buffer • (DX9: particles would be stored in a texture) • Computation in Geometry Shader and Vertex Shader • (DX9: computation in pixel shader) • Synchronization (between passes) through Stream Out • (DX9: synchronization with writes to frame buffer and read from texture)
Some more Geometry Shader applications • Silhouette detection and extrusion for: • Shadow volume generation • NPR • Access to topology for calculating curvature • Render to cube map in single pass • In conjunction with Render Target arrays • GPGPU • enables variable number of outputs from shader
Main additions • Geometry shader • Stream out