Shader Components: Modular and High Performance Shader Development. Yong He (Carnegie Mellon University), Tim Foley (NVIDIA), Teguh Hofstee (Carnegie Mellon University), Haomin Long (Tsinghua University), Kayvon Fatahalian (Carnegie Mellon University).
Atmosphere Scattering, Geometry / Shader LOD, Sub-surface Scattering, Double-sided Lighting, Pre-baked Lighting, Complex Wear Pattern, Layered Terrain Texturing, Skeletal Animated Character, Dynamic Soft Shadow, Vertex-animated Vegetation (Epic Games, Inc.)
Geometry / Animation: StaticMesh, SkeletalAnim, VertexAnim. Material: Glass, Plastic, Metal. Light: Skylight, PointLight, SpotLight.
On CPU, performing these two tasks is easy. Object-oriented programming is a good fit for this mental model: an IMaterial interface implemented by Metal, Plastic, and Glass, and an ILighting interface implemented by Skylight and Spotlight.
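To make the CPU-side picture concrete, here is a minimal Python sketch of the interface hierarchy. Only the names IMaterial, ILighting, Metal, and Skylight come from the slide; the methods `reflectance` and `intensity`, the `shade` function, and the numeric values are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class IMaterial(ABC):
    """Interface for a surface material (name taken from the slide)."""

    @abstractmethod
    def reflectance(self):
        """Return an RGB reflectance triple (hypothetical method)."""


class ILighting(ABC):
    """Interface for a light source (name taken from the slide)."""

    @abstractmethod
    def intensity(self):
        """Return a scalar light intensity (hypothetical method)."""


class Metal(IMaterial):
    def __init__(self, tint):
        self.tint = tint

    def reflectance(self):
        return self.tint


class Skylight(ILighting):
    def __init__(self, strength):
        self.strength = strength

    def intensity(self):
        return self.strength


def shade(material, light):
    """Combine any material with any light through dynamic dispatch."""
    k = light.intensity()
    return tuple(c * k for c in material.reflectance())


result = shade(Metal((0.5, 0.25, 0.125)), Skylight(2.0))
print(result)
```

Any IMaterial can be paired with any ILighting without either class knowing about the other. That composability comes from dynamic dispatch, which is the kind of dynamic control flow that specialized GPU shader code aims to avoid.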
High-performance GPU-accelerated shading (CPU and GPU) • GPU shader code statically specialized to the effects in use: specialized shader code, no dynamic control flow • Efficient CPU-GPU communication: avoid redundant shader parameter updates, minimize CPU overhead (API calls)
Goals: modularity and performance • Modularity: enable large shader code bases to be authored as composable effects • High performance: specialized GPU shader code for the effects in use; update shader parameters efficiently (only as necessary)
Skylight: lightProbe, strength 2.0, shadowMap. Displacement: displacementMap, normalMap. Metal Material: roughness, tint [0.4 0.4 0.4].
Skylight: lightProbe, strength 2.0, shadowMap. Displacement: displacementMap, normalMap. Metal Material: roughness, tint [0.5 0.5 0.5].
Skylight: lightProbe, strength 2.0, shadowMap. Metal Material: roughness, tint [0.2 0.2 0.2].
Skylight: lightProbe, strength 2.0, shadowMap. Brick Material: diffuse, tiling [0.4], uvOffset [0.0, 0.0].
Efficient parameter communication via parameter blocks in D3D12 / Vulkan. Block 0: lightProbe, strength 2.0, shadowMap. Block 1: displacementMap, normalMap. Block 2: roughness, tint [0.3 0.3 0.3]. Block 3: roughness, tint [0.4 0.5 0.8]. Draw sequence: BindParameterBlock(0, block0); BindParameterBlock(1, block1); BindParameterBlock(2, block2); Draw(); BindParameterBlock(2, block3); Draw();
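The draw sequence on this slide can be replayed with a toy recorder to show why parameter blocks cut API calls. This is a Python sketch, not a real D3D12 or Vulkan binding: the `CommandList` class and its methods are hypothetical stand-ins that only log calls, and the block contents are abbreviated.

```python
class CommandList:
    """Toy stand-in for a D3D12/Vulkan command list; it only records calls."""

    def __init__(self):
        self.calls = []
        self.bound = {}

    def bind_parameter_block(self, slot, block):
        self.bound[slot] = block
        self.calls.append(("bind", slot, block["name"]))

    def draw(self):
        # Snapshot which block is visible in each slot at draw time.
        snapshot = tuple(self.bound[s]["name"] for s in sorted(self.bound))
        self.calls.append(("draw", snapshot))


block0 = {"name": "block0", "strength": 2.0}
block1 = {"name": "block1"}
block2 = {"name": "block2", "tint": (0.3, 0.3, 0.3)}
block3 = {"name": "block3", "tint": (0.4, 0.5, 0.8)}

cmd = CommandList()
cmd.bind_parameter_block(0, block0)
cmd.bind_parameter_block(1, block1)
cmd.bind_parameter_block(2, block2)
cmd.draw()
cmd.bind_parameter_block(2, block3)  # only the material block changes
cmd.draw()

bind_count = sum(1 for c in cmd.calls if c[0] == "bind")
draws = [c[1] for c in cmd.calls if c[0] == "draw"]
print(bind_count, draws)
```

The second draw reuses blocks 0 and 1 untouched and rebinds only slot 2, so two draws cost four bind calls instead of six.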
Problem: shading language lacks features to match API. The engine holds three parameter groups: Skylight Params (sampler2D, float 2.0, sampler2DShadow), Displacement Params (sampler2D, sampler2D), and Metal Params (sampler2D, vec3 [0.3 0.3 0.3]). The shader, which combines the code for Metal, Skylight, and Displacement, declares all parameters in one flat list, so it is unclear which block each parameter belongs to: samplerCube lightProbe; float strength; sampler2DShadow shadowMap; sampler2D displacementMap; sampler2D normalMap; sampler2D roughnessMap; vec3 tint; ... void main() { ... }
Workaround 1: single parameter block (low performance). All Params: sampler2D, float 2.0, sampler2DShadow. Displacement Params: sampler2D, sampler2D. Metal Params: sampler2D, vec3 [0.3 0.3 0.3]. The shader code for Metal, Skylight, and Displacement is unchanged: samplerCube lightProbe; float strength; sampler2DShadow shadowMap; sampler2D displacementMap; sampler2D normalMap; sampler2D roughnessMap; vec3 tint; ... void main() { ... }
Workaround 2: explicit annotations (breaks modularity). Skylight Params: sampler2D, float 2.0, sampler2DShadow. Displacement Params: sampler2D, sampler2D. Metal Params: sampler2D, vec3 [0.3 0.3 0.3]. Every declaration in the shader code for Metal, Skylight, and Displacement carries an explicit set and binding: layout(set=0,binding=0) samplerCube lightProbe; layout(set=0,binding=1) float strength; layout(set=0,binding=2) sampler2DShadow shadowMap; layout(set=1,binding=0) sampler2D displacementMap; layout(set=1,binding=1) sampler2D normalMap; layout(set=2,binding=0) sampler2D roughnessMap; layout(set=2,binding=1) vec3 tint; ... void main() { ... }
Our Contribution: Shader Components. A single abstraction that is used to: • Define a module of shader code • Group parameters into blocks • Compose effects into the final specialized shaders
A shader component defines both parameters and code: component Metal { param sampler2D roughnessMap; param vec3 tint; ... vec3 evalReflectance() { return GGX(roughnessMap.Sample(uv)) * tint; } } The param declarations are the engine-side parameters; evalReflectance() is the code.
A shader library consists of many components
Camera / View Transform:
component Camera
param vec3 pos; param vec3 dir; param mat4 viewTransform;
Material Patterns:
component MetalMaterial : IMaterialPattern
param sampler2D roughnessMap; param vec4 tint; public vec3 color = ggx(...) // ...
component WoodMaterial : IMaterialPattern
Static Mesh / Skeletal Animation:
component StaticMesh : IObjectTransform
component SkeletalAnim : IObjectTransform
param mat4 boneTransforms[]; @MeshVertex vec3 vertPos; @MeshVertex uvec4 boneIds, boneWeights; ... vec3 transformedPos { ... }
Lighting:
component DynamicLighting : ILighting
param vec3 lightPos; param vec3 lightIntensity; public vec3 lightResult = ...;
component StaticLighting : ILighting
param texture2D lightmap; public vec3 lightResult = lightmap.Sample(...);
Other: tessellation, geometry displacement, etc.
Parameters for a component are placed in the same parameter block: component Metal { param sampler2D roughnessMap; param vec3 tint; ... vec3 evalReflectance() { return GGX(roughnessMap.Sample(uv)) * tint; } } Parameter Block Layout: 0 roughnessMap, 1 tint.
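The rule of one parameter block per component, with slots assigned in declaration order, can be sketched as a small layout pass. This Python sketch is an illustration under assumptions, not the authors' compiler: `assign_layout` is a hypothetical helper, and it treats every parameter as one binding slot, whereas a real compiler would likely pack plain values such as `tint` into a uniform buffer.

```python
def assign_layout(components):
    """Map each parameter to (set, binding).

    components: ordered list of (component_name, [parameter names]).
    Each component gets its own set index; bindings count from 0 within it.
    """
    layout = {}
    for set_index, (comp_name, params) in enumerate(components):
        for binding, param in enumerate(params):
            layout[(comp_name, param)] = (set_index, binding)
    return layout


components = [
    ("Skylight", ["lightProbe", "strength", "shadowMap"]),
    ("Displacement", ["displacementMap", "normalMap"]),
    ("Metal", ["roughnessMap", "tint"]),
]
layout = assign_layout(components)
for key, slot in layout.items():
    print(key, slot)
```

Because the layout is derived from the component declarations, shader authors never write set or binding numbers by hand, and the Metal parameters always land at bindings 0 and 1 of their own block.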
Metal Component Instances. Each instance has Type: Metal, a roughness parameter, and its own tint: [0.3 0.3 0.3], [0.4 0.5 0.8], [0.2 0.2 0.2], [1.2 0.8 0.2], [0.4 0.4 0.4], [0.5 0.4 0.4].
Brick Component Instances. Each instance has Type: Brick, a diffuseMap, and its own tiling and uvOffset: tiling [0.4], uvOffset [0.0, 0.0]; tiling [0.3], uvOffset [0.1, -0.1]; tiling [0.5], uvOffset [0.3, 0.3].
Component Instances (Parameter Blocks in GPU memory): inst0: Skylight, inst1: Displac…, inst2: Metal, inst3: Metal. Draw sequence: BindComponentInst(0, inst0); BindComponentInst(1, inst1); BindComponentInst(2, inst2); Draw(); BindComponentInst(2, inst3); Draw(); GPU Pipeline State: Parameter Block Binding and Shader Binding (<Shader 0>, .hlsl). Components in use: Skylight, Displacement, Metal.
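One way to read this slide is that the shader binding depends only on the types of the components in use, while the parameter block binding depends on the instances. The Python sketch below illustrates that split with a cache of specialized shaders keyed by component types. `ShaderCache` and `Instance` are hypothetical names, the "compile" step is a string placeholder, and reading the truncated inst1 label as Displacement is an assumption based on the components listed on the slide.

```python
class ShaderCache:
    """Caches one specialized shader per combination of component types."""

    def __init__(self):
        self.variants = {}
        self.compile_count = 0

    def get(self, component_types):
        key = tuple(component_types)
        if key not in self.variants:
            self.compile_count += 1
            # Placeholder for compiling specialized code; a real engine
            # would invoke the shader compiler here.
            self.variants[key] = "shader<" + "+".join(key) + ">"
        return self.variants[key]


class Instance:
    """A component instance: a type name plus its parameter values."""

    def __init__(self, type_name, **params):
        self.type_name = type_name
        self.params = params


inst0 = Instance("Skylight", strength=2.0)
inst1 = Instance("Displacement")  # assumed expansion of the truncated label
inst2 = Instance("Metal", tint=(0.3, 0.3, 0.3))
inst3 = Instance("Metal", tint=(0.4, 0.5, 0.8))

cache = ShaderCache()
shader_a = cache.get([i.type_name for i in (inst0, inst1, inst2)])
shader_b = cache.get([i.type_name for i in (inst0, inst1, inst3)])
print(shader_a, cache.compile_count)
```

Swapping inst2 for inst3 changes only parameter values, so both draws resolve to the same specialized shader and nothing is recompiled or rebound on the shader side.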
What we did • Added shader component construct to shading language • A shader component instance corresponds to a parameter block • Observation: the frequency of parameter update aligns with modular constructs
Related work • Modular shading constructs organize only code, not parameters • Cg interfaces [Mark 2003] • HLSL classes / interfaces [Microsoft 2011] • Spark [Foley 2011] • Sh [McCool 2004] • Bungie TFX [Tatarchuk 2017]
We implemented an engine with a large, modular shader library • Shader library contains 40 components, 2500 lines of code • Skeletal Animation • Material-defined vertex animation • PN-triangle tessellation [Vlachos et al. 2001] • Parallax occlusion mapping [Tatarchuk 2006] • Transparency and alpha masking • Per-material pattern generation (ported 20 unique patterns from UE4) • Double-sided lighting • Cascaded shadow maps [Engel 2006] • Directional lighting • Physically based environment lighting [Karis 2013] • Atmosphere scattering [Bruneton and Neyret 2008]
Three Renderer Implementations for Performance Evaluation • components: on Vulkan (using a descriptor set as the parameter block) • baseline_vk: on Vulkan with D3D11-style binding • Uses only one parameter block for all parameters, which means the engine dynamically re-allocates parameter block storage every frame • This is what UE4 and Source 2 do, and it is the baseline we compare performance against • baseline_gl: on OpenGL (without bindless texture)
Test scenes
Scene      Draw Calls  Variants  Materials  Shader Changes
BOXES      17,431      2         1          5
FACTORY1   8,755       26        84         130
FACTORY2   10,988      26        84         220
ROME       26,834      24        67         129
BOXES1K    17,431      200       1,000      3,635
CPU Performance Comparison (single core): the components renderer uses half as much CPU time as baseline_vk. [Chart: components CPU time vs. baseline_vk CPU time]
Overall Frame Time: the speedup of CPU work translates to reduced whole-frame time. [Chart: CPU time and whole-frame time for components and baseline_vk]
Both Vulkan renderers have lower CPU overhead than the OpenGL renderer. [Chart: CPU time]
Integration into NVIDIA Falcor rendering framework • Over 30% faster CPU time per frame • Up to 2x faster shading performance from specialized shader kernels • Cleaner engine code, no ad-hoc code preprocessor
Summary • Identified why performance and modularity are at odds in D3D12/Vulkan • Introduced shader component abstraction, and identified necessary compiler guarantees to ensure efficient parameter binding • A simple extension of modern shading systems
Thank you. Support: NVIDIA Research, National Science Foundation, Activision. We had valuable conversations with: Nir Benty (NVIDIA), Natasha Tatarchuk (Unity), Wade Brainerd (Activision), Michael Vance (Activision), Peter-Pike Sloan (Activision), Hao Chen (Amazon).