DX11 TECHNIQUES IN HK2207
Takahiro Harada, AMD
HK2207
• Demo for Radeon HD 6970
• Set in Hong Kong 2207
• Not just a single technique
  • Cinematic with practical effects
• Physics effects
  • Bullet CPU physics
  • CS rigid body
• Procedural adaptive tessellation
• Lighting effects
  • Deferred rendering
  • Post effects
AMD's Favorite Effects
Live Connection
CS Rigid Body Simulation
CS Rigid Body
• For visual effects
• Simulation using Compute Shader (CS)
  • CS 5.0 has all the functionality needed to implement the simulation
• Key features of CS
  • Group shared memory
    • Tree traversal
    • Narrow phase (NP)
  • Atomics
    • Collision
  • Random write
Particle Representation
• Approximate shapes with particles
  • Arbitrary convex mesh input
  • Scan conversion
• Integration
  • One thread per rigid body
• Collision
  • One thread per particle
• Collision with mesh
  • Conversion to particles
  • Collide against triangles
Reference: GPU Gems 3, "Real-time Rigid Body Simulation on GPUs"
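The scan-conversion step can be illustrated with a small CPU sketch. It assumes the convex mesh is supplied as outward-facing planes `(n, d)`, where `n·x <= d` means "inside", and it places one particle at the center of every grid cell of size 2r whose center lies inside all planes. The function name, plane representation, and grid spacing are illustrative assumptions, not the demo's code.

```python
import math
from itertools import product

def scan_convert_convex(planes, bounds_min, bounds_max, radius):
    """Return particle centers approximating a convex shape.

    planes: list of ((nx, ny, nz), d); n.x <= d means the point is inside.
    Particles sit on a regular grid with spacing 2 * radius (assumed layout).
    """
    spacing = 2.0 * radius
    # Number of cells along each axis needed to cover the bounding box.
    counts = [max(1, math.ceil((hi - lo) / spacing))
              for lo, hi in zip(bounds_min, bounds_max)]
    particles = []
    for i, j, k in product(range(counts[0]), range(counts[1]), range(counts[2])):
        # Center of grid cell (i, j, k).
        p = (bounds_min[0] + (i + 0.5) * spacing,
             bounds_min[1] + (j + 0.5) * spacing,
             bounds_min[2] + (k + 0.5) * spacing)
        # Keep the cell only if its center is inside every half-space.
        if all(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] <= d for n, d in planes):
            particles.append(p)
    return particles
```

For a unit cube and radius 0.25 this yields the 8 cell centers of a 2x2x2 grid; adding a cutting plane removes the cells whose centers fall outside it. In the simulation the resulting positions would then be stored relative to the body, which is not shown here.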
Mesh Collision (BVH)
• BVH used for broad-phase collision detection
  • Contains the static scene triangles
• Node: 4 children, 4 volumes
• Pack a few triangles into each leaf
  • Traversal efficiency
• Triangle data kept in a separate buffer (TriData)
Mesh Collision (BVH)
• Tree traversal
  • Traversal stack located in Thread Group Shared Memory (TGSM)
• Traversal and narrow phase (NP) are separated to keep efficiency high on the GPU
  • Less divergence
  • Reduced local resource usage
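The 4-wide node layout and the stack-based traversal can be sketched on the CPU as follows. This is a minimal sketch under stated assumptions: each node stores up to four child AABBs inline, a child reference is either `("node", index)` or `("leaf", index)`, and the Python list used as a stack stands in for the per-thread stack kept in TGSM. Traversal only emits overlapping leaf indices (the HitData); the narrow phase would run as a separate pass, matching the separation described above. The names `Node` and `traverse` are hypothetical.

```python
from dataclasses import dataclass

def overlaps(a, b):
    """AABB overlap test; boxes are ((minx, miny, minz), (maxx, maxy, maxz))."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

@dataclass
class Node:
    """4-wide BVH node: child volumes are stored in the parent."""
    child_aabbs: list  # up to 4 AABBs
    child_refs: list   # ("node", node index) or ("leaf", leaf index)

    def __post_init__(self):
        assert len(self.child_aabbs) == len(self.child_refs) <= 4

def traverse(nodes, root, query):
    """Return indices of leaves whose volumes overlap `query`."""
    hits = []
    stack = [root]  # stand-in for the traversal stack in TGSM
    while stack:
        node = nodes[stack.pop()]
        # All four child boxes are tested from one node fetch.
        for box, (kind, idx) in zip(node.child_aabbs, node.child_refs):
            if not overlaps(box, query):
                continue
            if kind == "node":
                stack.append(idx)   # descend later
            else:
                hits.append(idx)    # leaf hit: triangles are looked up in TriData later
    return hits
```

Testing all four children from a single node fetch is what makes the wide node attractive on a GPU: fewer, larger memory reads per traversal step.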
Narrow Phase
• Output from tree collision
  • HitData: list of triangle indices per body
  • Sparse
• 1 body x 1 leaf collision == n particles x m triangles
• Cache relevant triangles in TGSM
  • Reduce memory traffic
• Use one thread group (TG) per body
[Figure: HitData buffer, one entry per body (Body0 to Body5)]
Narrow Phase: 1 Thread Group

    void NP()
    {
        Bring64ParticlesIntoGPRs();
        if( LOCAL_IDX == 0 )
            LoadAllCollisionInfo();
        BARRIER;
        forAllLeaves(;;)
        {
            forAllTriangles(;;j+=TG_SIZE)
            {
                fillTriangle( ldsVtx, ldsAabb, LOCAL_IDX );
                BARRIER;
                for( k = 0; k < TG_SIZE; k++ )
                {
                    if( overlaps( ldsAabb[k] ) )
                        collide( pData, ldsVtx[k] );
                }
                BARRIER; // do not overwrite TGSM while other threads still read it
            }
        }
    }

• One thread per particle
• Use one thread as the controller of the SIMD
  • Read HitData -> LeafData
  • Share LeafData (TGSM)
• All threads are used to read 64 triangles in parallel
• 64 collisions in parallel
  • AABB overlap test
  • One triangle vs. 64 particles
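The tiled loop above can be emulated on the CPU to make the data flow concrete. In this sketch, which is an assumption-laden illustration rather than the demo's shader, each tile of up to `TG_SIZE` triangles is "loaded into shared memory" (one triangle per thread), and then every particle (one per thread) is tested against every triangle in the tile with an AABB overlap test. The exact particle-triangle contact computation of `collide()` is omitted; surviving pairs are simply recorded. All function names are hypothetical.

```python
TG_SIZE = 64  # threads per thread group; one particle per thread

def overlaps(a, b):
    """AABB overlap test; boxes are ((minx, miny, minz), (maxx, maxy, maxz))."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def tri_aabb(tri):
    xs, ys, zs = zip(*tri)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def sphere_aabb(center, radius):
    return (tuple(c - radius for c in center), tuple(c + radius for c in center))

def narrow_phase(particles, radius, triangles):
    """Emulate one thread group handling one body.

    particles: the body's particle centers, at most TG_SIZE (one per thread).
    triangles: vertex triples gathered from the body's hit leaves.
    Returns (particle index, triangle index) pairs that pass the AABB test.
    """
    assert len(particles) <= TG_SIZE
    pairs = []
    for base in range(0, len(triangles), TG_SIZE):
        # fillTriangle: thread t loads triangle base + t into shared memory.
        lds = [(t, tri_aabb(triangles[t]))
               for t in range(base, min(base + TG_SIZE, len(triangles)))]
        # BARRIER: the whole tile is now visible to every thread.
        for p_idx, p in enumerate(particles):   # each thread...
            box = sphere_aabb(p, radius)
            for t, tri_box in lds:              # ...loops over the shared tile
                if overlaps(tri_box, box):
                    pairs.append((p_idx, t))    # stand-in for collide()
        # BARRIER: nobody overwrites the tile until all threads are done.
    return pairs
```

Each triangle is read from memory once per thread group but tested by every thread, which is the memory-traffic saving the TGSM cache is meant to provide.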
Inefficiencies
• Hit data buffer is sparse
  • We launch too many TGs
  • A TG with 0 hits returns after a memory access
• Controller sections
  • Only the controller is working
  • 63 threads are idle
• Redundant overlap test (particle-triangle)
  • A body-triangle test is enough
• Leaf is not completely filled
  • Several leaves are colliding
  • Can issue more memory requests
Introduce Prepass
• Hit data buffer is sparse: too many TGs are launched, and a TG with 0 hits returns after a memory access
  • Fix: use an append buffer
• Controller sections: only the controller is working while 63 threads are idle
  • Fix: one body per thread; use 64 threads to read; less single-thread work
• Redundant particle-triangle overlap test: a body-triangle test is enough
  • Fix: do the body-triangle test in the prepass
• Leaf is not completely filled: several leaves are colliding, so more memory requests can be issued
  • Fix: pack triangle data, e.g. LeafA(4) + LeafB(4) -> 8
• Result: reduced local resource usage and better HW occupancy
Pre Narrow Phase
• Use one thread per body
• Read HitData -> LeafData -> Triangle
• Body-triangle AABB test instead of 64 particle-triangle tests
• Store colliding triangle indices
• If any collide
  • Write to the append buffer
  • Write triangle indices to contiguous memory
• Sorting by number of hits reduces divergence
  • Local sort
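A CPU sketch of the prepass, assuming HitData holds leaf indices per body, LeafData maps each leaf to triangle indices, and each triangle has a precomputed AABB. One loop iteration stands in for one thread per body. On the GPU, the compaction would go through an append buffer (for example an HLSL `AppendStructuredBuffer`), whose output order is not deterministic, and the slide mentions only a local sort; the full sort here is a simplification. Function and parameter names are hypothetical.

```python
def overlaps(a, b):
    """AABB overlap test; boxes are ((minx, miny, minz), (maxx, maxy, maxz))."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def pre_narrow_phase(body_aabbs, hit_leaves, leaf_tris, tri_aabbs):
    """Compact sparse HitData into a dense work list.

    body_aabbs[b]: AABB of body b
    hit_leaves[b]: leaves reported by tree traversal for body b (HitData)
    leaf_tris[l]:  triangle indices stored in leaf l (LeafData)
    tri_aabbs[t]:  AABB of triangle t
    Returns (work, tri_indices): work holds (body, offset, count) records,
    tri_indices holds all surviving triangle indices contiguously.
    """
    work = []         # stands in for the append buffer
    tri_indices = []  # contiguous triangle index storage
    for body, box in enumerate(body_aabbs):
        # One body-triangle AABB test replaces per-particle overlap tests.
        hits = [t for leaf in hit_leaves[body] for t in leaf_tris[leaf]
                if overlaps(tri_aabbs[t], box)]
        if hits:  # bodies without contacts never launch a narrow-phase TG
            work.append((body, len(tri_indices), len(hits)))
            tri_indices.extend(hits)
    # Group bodies with similar hit counts to reduce divergence.
    work.sort(key=lambda w: w[2], reverse=True)
    return work, tri_indices
```

Bodies with no surviving triangles simply disappear from the work list, so the narrow phase no longer launches thread groups that only read memory and exit.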
Improved Narrow Phase
Before:

    void NP()
    {
        Bring64ParticlesIntoGPRs();
        if( LOCAL_IDX == 0 )
            LoadAllCollisionInfo();
        BARRIER;
        forAllLeaves(;;)
        {
            forAllTriangles(;;j+=TG_SIZE)
            {
                fillTriangle( ldsVtx, ldsAabb, LOCAL_IDX );
                BARRIER;
                for( k = 0; k < TG_SIZE; k++ )
                {
                    if( overlaps( ldsAabb[k] ) )
                        collide( pData, ldsVtx[k] );
                }
                BARRIER;
            }
        }
    }

After:

    void NP()
    {
        Bring64ParticlesIntoGPRs();
        if( LOCAL_IDX == 0 )
            LoadNumHits();
        BARRIER;
        for( i = 0; i < ldsHitTriData.m_n; i += WG_SIZE )
        {
            if( i + LOCAL_IDX < ldsHitTriData.m_n )
                fillTriangle( ldsVtx[LOCAL_IDX], i + LOCAL_IDX );
            BARRIER;
            // No overlap test: the prepass kept only colliding triangles
            for( j = 0; j < min( WG_SIZE, ldsHitTriData.m_n - i ); j++ )
                collide( pData, ldsVtx[j] );
            BARRIER; // keep ldsVtx intact until every thread is done with it
        }
    }
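The "after" loop can be emulated to show how one thread group consumes a compacted work item, including the partial last tile. This sketch assumes a work item is `(body, offset, count)` into a contiguous triangle index array, as a prepass might produce; `collide` is a caller-supplied callback, and the function name is hypothetical. It returns how many fill/BARRIER rounds were needed, which is `ceil(count / WG_SIZE)`.

```python
WG_SIZE = 64  # work-group (thread group) size

def improved_narrow_phase(work_item, tri_indices, particles, collide):
    """Emulate one thread group processing one prepass work item.

    work_item: (body, offset, count) into tri_indices.
    particles: the body's particles, one per thread.
    collide:   callback(body, particle, triangle index).
    Returns the number of tiles processed.
    """
    body, offset, count = work_item
    tiles = 0
    for i in range(0, count, WG_SIZE):
        # fillTriangle: thread t loads entry i + t; the last tile may be partial.
        tile = tri_indices[offset + i : offset + min(i + WG_SIZE, count)]
        # BARRIER: tile visible to all threads.
        for p in particles:        # one thread per particle
            for t in tile:         # no AABB test: the prepass already filtered
                collide(body, p, t)
        # BARRIER before the next tile overwrites shared memory.
        tiles += 1
    return tiles
```

With 70 triangles the group runs two tiles (64 + 6), and every thread does useful work in both instead of waiting on a single controller thread.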
Result
MAKING IT LOOK PRETTY …
Procedural Adaptive Tessellation
• Add surface detail using DX11 tessellation
• Hull shader
  • Calculate the tessellation factor from depth
• Tessellator
• Domain shader
  • Interpolate vertex position and normal
  • Displacement factor from 3D Perlin noise
    • Evaluated in local space
  • Displacement vector
  • Displace
• Pixel shader
  • Normal computed from the gradient
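A minimal CPU sketch of the two per-vertex steps, under loudly stated assumptions: the tessellation factor falls off linearly with view depth between hypothetical `near` and `far` distances and is clamped to D3D11's range of 1 to 64, and `noise(x, y, z)` is a caller-supplied stand-in for 3D Perlin noise, which is not implemented here. The demo's actual falloff and noise parameters are not given on the slide.

```python
MAX_TESS_FACTOR = 64.0  # largest tessellation factor D3D11 accepts
MIN_TESS_FACTOR = 1.0

def tess_factor(view_depth, near, far, max_factor=MAX_TESS_FACTOR):
    """Hull-shader style factor: full detail at `near`, minimum at `far`.
    The linear falloff is an assumption, not the demo's formula."""
    t = (view_depth - near) / (far - near)
    t = min(max(t, 0.0), 1.0)
    return max(MIN_TESS_FACTOR, max_factor * (1.0 - t))

def displace(local_pos, normal, noise, scale):
    """Domain-shader style displacement: evaluate noise at the local-space
    position and move the vertex along its interpolated normal."""
    h = scale * noise(*local_pos)
    return tuple(p + h * n for p, n in zip(local_pos, normal))
```

Evaluating the noise in local space keeps the displacement attached to the object as it moves, rather than letting the pattern slide across the surface.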
Cracks
• Different tessellation factors on a shared edge
  • Objects are small enough
  • Sample depth at the object center
• Discontinuous displacement vector
  • Normal is not continuous
  • Use the convexity of the geometry
  • Interpolate the normal and displacement vector from the center
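Both crack fixes can be illustrated with a small sketch. It assumes a linear depth falloff (hypothetical `near`/`far`, clamped to 1 to 64) applied once to the depth at the object center, so all three edge factors of every patch agree. It also contrasts displacing a shared vertex along two different face normals, which splits it, with displacing along the direction from the object center, which depends only on the vertex position and therefore agrees across patches of a convex object. Function names are hypothetical.

```python
import math

def edge_factors(center_depth, near, far, max_factor=64.0):
    """One factor for the whole object, taken from its center depth, so
    neighboring patches always agree on their shared edges."""
    t = min(max((center_depth - near) / (far - near), 0.0), 1.0)
    f = max(1.0, max_factor * (1.0 - t))
    return (f, f, f)  # all three edge factors of a triangle patch

def displace_along_face(position, face_normal, height):
    """Per-face displacement, shown for contrast: a shared vertex can split."""
    return tuple(p + height * n for p, n in zip(position, face_normal))

def displace_from_center(position, center, height):
    """Displace along the normalized direction from the object center."""
    d = [p - c for p, c in zip(position, center)]
    length = math.sqrt(sum(v * v for v in d))
    return tuple(p + height * v / length for p, v in zip(position, d))
```

A vertex at (1, 1, 0) shared by faces with normals (0, 1, 0) and (1, 0, 0) moves to two different places with `displace_along_face`, opening a crack, while `displace_from_center` moves it to the same point from either face.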
Other Techniques Used
• Deferred shading
• Depth of field
• Emissive materials
• Lens ghosting and flare
• Aerial perspective
• Reflections
• Tone mapping
• LUT color correction
Color
Light
Emissive etc
DOF
End
• Questions?
• Acknowledgements
  • Jay McKee, Jason Yang, Justin Hensley, Lee Howes, Ali Saif, David Hoff, Abe Wiley, Dan Roeger