Technology Behind AMD’s “Leo Demo” Jay McKee MTS Engineer, AMD

Presentation Transcript


  1. Technology Behind AMD’s “Leo Demo” • Jay McKee, MTS Engineer, AMD

  2. Why Forward Rendering? • Complex materials • Multiple light types • Supports hardware anti-aliasing • Efficient memory usage • Supports transparency • BUT, previously could not support a large number of lights

  3. Forward+ Rendering • Modified forward renderer. Add a compute shader for light culling. Modify the main light loop. • Lighting and shading are done in the same place, so all information is preserved.

  4. Forward+ Rendering (continued) • No limits on parameters for lights and materials • Omni • Spot • Cinematic (arbitrary falloffs, barndoor) • BRDF per material instance • Simple design, concentrate on rendering, not engine maintenance.

  5. Important DX11 features • Compute Shaders • UAV support.

  6. Compute Shaders • In the Leo demo we use two compute shaders: • One for culling lights. • Another for spawning Virtual Point Lights (VPLs) for indirect lighting. • Culling 3,072 lights takes 1.7 ms on a high-end GPU.

  7. UAVs • Array(s) of scene light information. • Array of u32 light indices for storing start/end lights per-tile. • Array of material instance data

  8. Algorithm summary • Depth Pre-Pass • Light Culling • Screen divided into tiles. Launch compute shader per tile. • Light info such as position, radius, direction, length passed to light culling compute shader. • Light culling shader projects light bounds to screen-space tiles. Uses scene depth from the z pre-pass for z testing against light volumes. • Outputs to a UAV describing the per-tile light list start/end, along with a large UAV holding a u32 array of light indices. • Output UAVs are passed to main light shaders for looping through lights per-pixel.
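The culling step on slide 8 can be sketched on the CPU as follows. This is a minimal C++ reference, not the demo's compute shader. The names `ScreenLight`, `Tile`, `overlapsTile` and `cullLightsForTile` are hypothetical, and the sketch assumes each light has already been projected to a screen-space circle with a known view-space depth extent, which is a simplification of the frustum test shown later in the deck.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified light: already projected to screen space.
struct ScreenLight {
    float cx, cy;    // projected centre in pixels
    float radiusPx;  // projected radius in pixels
    float viewZ;     // view-space depth of the centre
    float radius;    // view-space radius
};

// One screen tile with the depth bounds found in the depth pre-pass.
struct Tile {
    float x0, y0, x1, y1;  // pixel bounds
    float minZ, maxZ;      // depth bounds of the geometry in this tile
};

// True if the light's screen circle touches the tile rectangle and its
// depth extent overlaps the tile's [minZ, maxZ] range.
bool overlapsTile(const ScreenLight& l, const Tile& t) {
    assert(t.x0 <= t.x1 && t.y0 <= t.y1 && t.minZ <= t.maxZ);
    // Closest point on the rectangle to the circle centre.
    float px = std::max(t.x0, std::min(l.cx, t.x1));
    float py = std::max(t.y0, std::min(l.cy, t.y1));
    float dx = l.cx - px, dy = l.cy - py;
    bool inXY = dx * dx + dy * dy <= l.radiusPx * l.radiusPx;
    bool inZ  = (l.viewZ + l.radius >= t.minZ) && (l.viewZ - l.radius <= t.maxZ);
    return inXY && inZ;
}

// Returns the indices of all lights that may affect pixels in the tile.
std::vector<uint32_t> cullLightsForTile(const std::vector<ScreenLight>& lights,
                                        const Tile& t) {
    std::vector<uint32_t> out;
    for (uint32_t i = 0; i < lights.size(); ++i)
        if (overlapsTile(lights[i], t)) out.push_back(i);
    return out;
}
```

The depth test is what lets the z pre-pass reject lights that cover the tile on screen but sit entirely in front of or behind the visible geometry.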

  9. Algorithm summary continued • Render scene materials • Base light accumulation function • Use screen x, y location to determine tileID • From tileID, get light start and end indices • From start index to end index, loop • Entry is index into light array. • Accumulate light hitting pixel • Returns total direct and indirect light hitting pixel.
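The per-pixel lookup described on slide 9 can be illustrated with a small CPU-side sketch in C++. The struct `TileLightLists` and the functions `tileIdOf` and `accumulateLight` are hypothetical names, and each light is reduced to a single scalar intensity so that only the indexing pattern (pixel to tile, tile to start/end range, range entry to light index) is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of the per-tile light lists produced by culling.
struct TileLightLists {
    uint32_t tileRes;               // tile edge length in pixels
    uint32_t width;                 // render target width in pixels
    std::vector<uint32_t> start;    // per tile: first entry in `indices`
    std::vector<uint32_t> end;      // per tile: one past the last entry
    std::vector<uint32_t> indices;  // entries are indices into the light array
};

// Map a pixel coordinate to its tile ID (row-major tile order).
uint32_t tileIdOf(const TileLightLists& t, uint32_t x, uint32_t y) {
    assert(t.tileRes > 0);
    uint32_t tilesX = (t.width + t.tileRes - 1) / t.tileRes;  // round up
    return (x / t.tileRes) + (y / t.tileRes) * tilesX;
}

// Sum the contribution of every light listed for the pixel's tile.
// A real shader would evaluate falloff, N.L, shadows and the BRDF here.
float accumulateLight(const TileLightLists& t,
                      const std::vector<float>& lightIntensity,
                      uint32_t x, uint32_t y) {
    uint32_t tile = tileIdOf(t, x, y);
    float total = 0.0f;
    for (uint32_t k = t.start[tile]; k < t.end[tile]; ++k) {
        uint32_t lightIdx = t.indices[k];  // entry is an index into the light array
        total += lightIntensity[lightIdx];
    }
    return total;
}
```

Because every pixel in a tile reads the same start/end range, the loop length is uniform across the tile, which keeps divergence between neighbouring pixels low.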

  10. Algorithm summary continued • Material shader • Decides what to do with total incoming light • Passed into material’s BRDF for example • Uses light accumulation building blocks • Env. lighting, base light accumulation, BRDF, etc. are put together for final pixel color.

  11. Light Culling Shader Details (1/3)
      // 1. prepare
      float4 frustum[4];
      float minZ, maxZ;
      {
          ConstructFrustum( frustum );
          minZ = thread_REDUCE(MIN, depth );
          maxZ = thread_REDUCE(MAX, depth );
          ldsMinZ = SIMD_REDUCE(MIN, minZ );
          ldsMaxZ = SIMD_REDUCE(MAX, maxZ );
          minZ = ldsMinZ;
          maxZ = ldsMaxZ;
      }
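The min/max depth reduction in the "prepare" step can be mirrored on the CPU with a plain loop over one tile. This C++ sketch is only a reference for what the reduction computes. The names `DepthRange` and `tileDepthRange` are hypothetical, and the depth buffer is assumed to be a row-major array of floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct DepthRange { float minZ, maxZ; };

// Scan the pixels of one tile and return the smallest and largest depth.
// On the GPU this is done in parallel: each thread reduces its own pixels,
// then the work group combines the per-thread results through local memory.
DepthRange tileDepthRange(const std::vector<float>& depth,
                          uint32_t width, uint32_t height,
                          uint32_t tileX, uint32_t tileY, uint32_t tileRes) {
    assert(depth.size() == static_cast<std::size_t>(width) * height);
    DepthRange r{ std::numeric_limits<float>::max(),
                  std::numeric_limits<float>::lowest() };
    uint32_t x0 = tileX * tileRes, y0 = tileY * tileRes;
    uint32_t x1 = std::min(x0 + tileRes, width);   // clamp partial edge tiles
    uint32_t y1 = std::min(y0 + tileRes, height);
    for (uint32_t y = y0; y < y1; ++y) {
        for (uint32_t x = x0; x < x1; ++x) {
            float z = depth[static_cast<std::size_t>(y) * width + x];
            r.minZ = std::min(r.minZ, z);
            r.maxZ = std::max(r.maxZ, z);
        }
    }
    return r;
}
```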

  12. Light Culling Shader Details (2/3)
      __local u32 ldsNLights = 0;
      __local u32 ldsLightBuffer[MAX];
      // 2. overlap check, accumulate in LDS
      for(int i=threadIdx; i<nLights; i+=WG_SIZE)
      {
          Light light = fetchAndTransform( lightBuffer[ i ] );
          if( overlaps( light, frustum ) && overlaps( light, minZ, maxZ ) )
          {
              AtomicAppend( ldsLightBuffer, i );
          }
      }

  13. Light Culling Shader Details (3/3)
      // 3. export to global
      __local u32 ldsOffset;
      if( threadIdx == 0 )
      {
          ldsOffset = AtomAdd( ldsNLights );
          globalLightStart[tileIdx] = ldsOffset;
          globalLightEnd[tileIdx] = ldsOffset + ldsNLights;
      }
      for(int i=threadIdx; i<ldsNLights; i+=WG_SIZE)
      {
          int dstIdx = ldsOffset + i;
          globalLightIndexBuffer[dstIdx] = ldsLightBuffer[i];
      }
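The export step can be sketched in C++ with `std::atomic`. The slide shows `AtomAdd` with a single argument, so this sketch assumes it reserves `ldsNLights` slots from a shared global counter and returns the previous value. That reading is an assumption, and `GlobalLightLists` and `exportTile` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU stand-in for the global output buffers.
struct GlobalLightLists {
    std::atomic<uint32_t> counter{0};  // next free slot in `indices` (assumed)
    std::vector<uint32_t> start;       // per-tile start offset
    std::vector<uint32_t> end;         // per-tile end offset (exclusive)
    std::vector<uint32_t> indices;     // packed light indices for all tiles
    GlobalLightLists(std::size_t numTiles, std::size_t capacity)
        : start(numTiles), end(numTiles), indices(capacity) {}
};

// Reserve a contiguous range for this tile, record it, and copy the list.
void exportTile(GlobalLightLists& g, uint32_t tileIdx,
                const std::vector<uint32_t>& tileLights) {
    uint32_t n = static_cast<uint32_t>(tileLights.size());
    uint32_t offset = g.counter.fetch_add(n);  // returns the old counter value
    assert(offset + n <= g.indices.size());    // output buffer must be big enough
    g.start[tileIdx] = offset;
    g.end[tileIdx]   = offset + n;
    for (uint32_t i = 0; i < n; ++i)
        g.indices[offset + i] = tileLights[i];
}
```

Reserving the whole range with one atomic add per tile keeps contention low: tiles never write to each other's slots, so the copy itself needs no further synchronization.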

  14. Light Accumulation Pseudo-code
      // BaseLighting.inc
      // THIS INC FILE IS ALL THE COMMON LIGHTING CODE
      StructuredBuffer<float4> LightParams : register(u0);
      StructuredBuffer<uint> LowerBoundLights : register(u1);
      StructuredBuffer<uint> UpperBoundLights : register(u2);
      StructuredBuffer<int2> LightIndexBuffer : register(u3);

      uint GetTileIndex(float2 screenPos)
      {
          float tileRes = (float)m_tileRes;
          // Number of tiles per row, rounded up to cover partial tiles
          uint numCellsX = (m_width + m_tileRes - 1)/m_tileRes;
          uint tileIdx = floor(screenPos.x/tileRes)+floor(screenPos.y/tileRes)*numCellsX;
          return tileIdx;
      }

  15. Light Accumulation (2):
      StartHLSL BaseLightLoopBegin
      // THIS IS A MACRO, INCLUDED IN MATERIAL SHADERS
      uint tileIdx = GetTileIndex( pixelScreenPos );
      uint startIdx = LowerBoundLights[tileIdx];
      uint endIdx = UpperBoundLights[tileIdx];
      [loop]
      for ( uint lightListIdx = startIdx; lightListIdx < endIdx; lightListIdx++ )
      {
          int lightIdx = LightIndexBuffer[lightListIdx];
          // Set common light parameters
          float ndotl = max(0, dot(normal, lightVec));
          float3 directLight = 0;
          float3 indirectLight = 0;

  16. Light Accumulation (3):
          if( lightIdx >= numDirectLightsThisFrame )
          {
              CalculateIndirectLight(lightIdx, indirectLight);
          }
          else
          {
              if( IsConeLight( lightIdx ) )
              {   // <<== Can add more light types here
                  CalculateDirectSpotlight(lightIdx, directLight);
              }
              else
              {
                  CalculateDirectSpherelight(lightIdx, directLight);
              }
          }
          float3 incomingLight = (directLight + indirectLight)*ndotl;
          float shadowTerm = CalcShadow();
      EndHLSL
      StartHLSL BaseLightLoopEnd
      }
      EndHLSL

  17. Material Shader Template:
      #include "BaseLighting.inc"
      float4 PS ( PSInput i ) : SV_TARGET
      {
          float3 totalDiffuse = 0;
          float3 totalSpec = GetEnvLighting();
          $include BaseLightLoopBegin
          // Unique material code goes here!! Light accumulation on the pixel for a given light.
          // We have total incoming light and direct/indirect light components,
          // as well as material params and shadow term.
          // Use these building blocks to integrate lighting terms.
          totalDiffuse += GetDiffuse(incomingLight);
          totalSpec += CalcPhong(incomingLight);
          $include BaseLightLoopEnd
          float3 finalColor = totalDiffuse + totalSpec;
          return float4( finalColor, 1 );
      }

  18. Debug Mode Demo

  19. Benchmark 3k dynamic lights

  20. Compute-based Deferred vs. Forward+ • Takahiro Harada, Jay McKee, Jason C. Yang, “Forward+: Bringing Deferred Lighting to the Next Level”, Eurographics Short Paper (2012)

  21. Depth Pre-Pass Critical • Pixel overdraw cripples this technique, so a depth pre-pass is required. • The depth pre-pass is a good opportunity to use MRT to generate other full-screen data needed for post-processing and other render effects (optional).

  22. Other important points • Xbox 360 has good bandwidth so, given the limitations on forward rendering, deferred makes a lot of sense there. • However, ALU computation is growing at a faster rate than bandwidth, so it is becoming more feasible to just do the calculations than to read and write so much data. • Dynamic branching penalties are not nearly as bad as before. As an optimization, the compute shader can sort by light type, for example, to minimize penalties. • All the CPU-side "light management" code that decides which lights hit each object for setting constant registers can be ditched!
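The sort-by-light-type optimization mentioned on slide 22 can be illustrated with a short C++ sketch. The deck does not show how the demo sorts, so this is only one possible approach, run on the CPU for clarity. The enum `LightType` and the function `sortTileLightsByType` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical light categories, ordered the way the loop should visit them.
enum class LightType : uint8_t { Sphere = 0, Cone = 1, Indirect = 2 };

// Reorder one tile's light index list so that lights of the same type are
// adjacent. Consecutive loop iterations then take the same branch, which
// reduces divergence in the per-pixel light loop.
void sortTileLightsByType(std::vector<uint32_t>& tileLights,
                          const std::vector<LightType>& typeOf) {
    std::stable_sort(tileLights.begin(), tileLights.end(),
                     [&](uint32_t a, uint32_t b) {
                         assert(a < typeOf.size() && b < typeOf.size());
                         return typeOf[a] < typeOf[b];
                     });
}
```

A stable sort keeps the original relative order of lights within each type, so the result is deterministic from frame to frame for an unchanged light list.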

  23. Summary • Modified forward renderer that handles scenes with 1000s of lights. • Hardware anti-aliasing (MSAA) “automatic” • Bandwidth friendly. • Makes the most of the GPU's ALU power (which is growing faster than bandwidth)

  24. Thanks! Contact: Takahiro.Harada@amd.com jay.mckee@amd.com jasonc.yang@amd.com Leo Demo website: http://developer.amd.com/samples/demos/pages/AMDRadeonHD7900SeriesGraphicsReal-TimeDemos.aspx Eurographics 2012: 'Forward+: Bringing Deferred Lighting to the Next Level'