Practical Implementation of High Dynamic Range Rendering

Masaki Kawase
BUNKASHA GAMES
BUNKASHA PUBLISHING CO.,LTD
http://www.bunkasha-games.com
http://www.daionet.gr.jp/~masa/

Agenda
• What can be done with HDR?
• HDR rendering topics
• Implementation on DX8 hardware
• Implementation on DX9 hardware
• Glare generation
• Exposure control
• HDR in games
• References

What can be done with HDR?
• Dazzling light
• Dazzling reflection
• Fresnel reflection
  • Bright reflection in the normal direction
• Exposure control effects
• Realistic depth-of-field effects
• Realistic motion blur

Dazzling Light

Dazzling Reflection

HDR Fresnel: bright reflection off low-reflectance surfaces

Exposure Control

HDR Depth-of-Field (future perspective)

HDR Motion Blur (future perspective)

HDR Rendering Topics
• Dynamic range
• HDR buffers
• Glare generation
• Exposure control
• Tone mapping

Dynamic Range
• The ratio of the greatest value to the smallest value that can be represented
• Displayable image
  • 2^8: low dynamic range (LDR)
• Frame buffer of absolute luminance
  • Render the scene in absolute luminance space
  • >2^32: represents all luminances directly
• Frame buffer of relative luminance
  • Apply exposure scaling during rendering
  • >2^15 to 2^16: dark regions are not important
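The powers of two quoted on this slide can be read as the base-2 logarithm of the ratio between the greatest and smallest representable values. The following is a minimal sketch of that arithmetic; the function name `dynamic_range_bits` is hypothetical and not part of the original deck.

```python
import math


def dynamic_range_bits(greatest, smallest):
    """Return log2(greatest / smallest): the dynamic range as a power of two."""
    return math.log2(greatest / smallest)


# A displayable 8-bit channel spans a ratio of about 2^8.
ldr_bits = dynamic_range_bits(256.0, 1.0)

# A relative-luminance buffer spanning ratios of 32,768 to 65,536
# corresponds to the 2^15 to 2^16 range.
relative_low_bits = dynamic_range_bits(32768.0, 1.0)
relative_high_bits = dynamic_range_bits(65536.0, 1.0)
```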

HDR Buffers
• Frame buffers
  • For glare generation
• Environment maps
  • To prevent luminances from being clamped when the surface reflectance is very low
  • To achieve dazzling reflection
• Self-emission textures
  • For post-rendering glare generation
• Decal textures don’t need HDR

HDR Frame Buffers
• For glare generation
• When rendering with relative luminances:
  • Ideally, more than 2^15 to 2^16
• In games
  • 2^12 to 2^13 (4,000~10,000) is acceptable

HDR Environment Maps
• Very important for representing:
  • Realistic specular reflection
  • Dazzling specular reflection
• Specular reflectance of nonmetals
  • Reflectance in the normal direction is typically less than 4%
  • Bright light remains bright after such low reflection
• To maintain dazzles after reflection of ~1-4%
  • Dynamic range of more than 10,000 or 20,000 is necessary
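One way to see where a figure of 10,000 to 20,000 could come from is sketched below. This is an interpretation, not the author's derivation: it assumes the reflected highlight should still reach display white (1.0) after a 1-4% reflection, and that the darkest useful step is 1/256 of display white. The function name and both parameters are hypothetical.

```python
def required_dynamic_range(reflectance, ldr_levels=256, reflected_brightness=1.0):
    """Estimate the environment-map dynamic range needed for a dazzling reflection.

    Assumption: the source must be bright enough that
    source * reflectance >= reflected_brightness, while the map still
    resolves steps of 1 / ldr_levels of display white.
    """
    peak_luminance = reflected_brightness / reflectance
    return peak_luminance * ldr_levels


range_at_4_percent = required_dynamic_range(0.04)  # about 6,400
range_at_1_percent = required_dynamic_range(0.01)  # about 25,600
```

Under these assumptions the requirement lands between roughly 6,400 and 25,600, the same order of magnitude as the figure on the slide.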

Implementation on DX8 Hardware
• We have no choice
• Pixel Shader 1.x
  • Integer operations only
• HDR buffer formats
  • Low-precision buffers only
  • Use the alpha channel as luminance information
• Fake it to achieve believable appearance
  • Accurate calculation is not feasible

Fake HDR Pixel Shader: glossy reflection material

// v0.rgb : Diffuse color of primary light
// v1.rgb : Color for other lights/ambient
//          pre-scaled by (exposure * 0.5)
//
// v1.a   : Specular reflectance (Fresnel)
//
// t0.rgb : Decal texture (for diffuse)
// t0.a   : Gloss map (for specular)
// t1.rgb : Envmap color
// t1.a   : Envmap luminance
// t2.rgb : Shadow/light map
//
// c0.rgb : Specular color
// c0.a   : Clamp(gloss * 2, 0, 1)

ps_1_1

tex t0
tex t1
tex t2

// Scale the primary diffuse color by shadow/light map,
// and add the result of other per-vertex lighting
mad r0.rgb, v0, t2, v1
// Scale the specular reflectance by gloss map
+mul t0.a, v1.a, t0.a

// Modulate diffuse color with decal texture
mul r0.rgb, t0, r0
// r0.a = specular reflectance * envmap luminance
+mul r0.a, t0.a, t1.a

// Modulate envmap with specular color
mul t1.rgb, t1, c0
// Envmap brightness parameter
// r1.a = specular reflectance * envmap luminance * gloss map
+mul r1.a, r0.a, t1.a

// Reflect the envmap by specular reflectance
lrp r0.rgb, t0.a, t1, r0

// Envmap brightness parameter
// r1.a = specular reflectance * envmap luminance * gloss map * Clamp(gloss * 2, 0, 1)
mul r1.a, r1.a, c0.a

// Output color
// Interpolate the envmap color and the result of LDR computation,
// based on the envmap brightness parameter (r1.a)
lrp r0.rgb, r1.a, t1, r0
// Output luminance information
+lrp r0.a, r1.a, t1.a, r0.a

Fake HDR Pixel Shader: self-emission material

// v0.rgb : Diffuse color for primary light
// v1.rgb : Color for other lights/ambient
//          pre-scaled by (exposure * 0.5)
//
// t0.rgb : Decal texture (for diffuse)
// t0.a   : Self-emission luminance
// t2.rgb : Shadow/light map
//
// c1.rgb : Self-emission color
//          (to be modulated by t0.rgb)
// c1.a   : Emission intensity (scale for t0.a)
//          pre-scaled by (exposure * 0.5)

ps_1_1

tex t0
tex t1
tex t2

// Scale the primary diffuse color by shadow/light map,
// and add the result of other per-vertex lighting
mad r0.rgb, v0, t2, v1
// Scale the self-emission luminance
+mul t0.a, t0.a, c1.a

// Modulate diffuse color with decal texture
mul r0.rgb, t0, r0

// decal texture * self-emission color
mul r1.rgb, t0, c1

// Output color
// Add self-emission color to diffuse color
// Make it close to the self-emission color
// where the luminance information is high
lrp r0.rgb, t0.a, r1, r0
// Output luminance information
+mov r0.a, t0.a

Generating Displayable Image
• Extract high-luminance regions
  • Threshold: ~0.4-0.5
• Generate glare
  • Reference: Kawase, Masaki, “Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L (Wreckless)”
• Generate a displayable image
  • Calculate the luminance from the frame buffer
  • Add the result of glare generation to the luminance

Bright-Pass Pixel Shader

// Bright-Pass Pixel Shader
// Out.rgb = FrameBuffer.rgb * ( (1 - Threshold) / 16 + FrameBuffer.a )

// t0.rgb : Frame-buffer color
// t0.a   : Frame-buffer extra luminance information
//
// c0.a   : Luminance bias
//          (1 - Threshold) / 16
//          0.5/16 ~ 0.6/16

ps_1_1

tex t0   // Frame buffer

add r0.a, t0.a, c0.a     // r0.a = (1 - Threshold) / 16 + Buffer.a
mul r0.rgb, t0, r0.a     // r0.rgb = Buffer.rgb * ( (1 - Threshold) / 16 + Buffer.a )

Glare Generation • Reference • Kawase, Masaki, “Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L (Wreckless)”

Luminance Calculation and Glare Composition

// Tone Mapping / Glare Composition Pixel Shader
// Out.rgb = (FrameBuffer.rgb + FrameBuffer.rgb * FrameBuffer.a^2 * 16) * 2 + Glare.rgb * Glare.rgb

// t0.rgb : frame buffer color
// t0.a   : frame buffer extra luminance information
// t1.rgb : result of glare generation

ps_1_1

tex t0   // frame buffer
tex t1   // glare

mul_x4 r0.a, t0.a, t0.a    // r0.a = FrameBuffer.a^2 * 4
mul_x4 r1.rgb, t0, r0.a    // r1.rgb = FrameBuffer.rgb * FrameBuffer.a^2 * 16
add_x2 r0.rgb, t0, r1      // (FrameBuffer.rgb + FrameBuffer.rgb * FrameBuffer.a^2 * 16) * 2
mad r0.rgb, t1, t1, r0     // add the glare
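The two formulas stated in the bright-pass and composition shader comments can be checked on the CPU with a small reference. This is only a sketch of the arithmetic: it saturates the final result but ignores the per-instruction register clamping of ps_1_1 hardware, and the function names are hypothetical.

```python
def saturate(x):
    """Clamp a value to the displayable [0, 1] range."""
    return max(0.0, min(1.0, x))


def bright_pass(rgb, alpha, threshold=0.5):
    """Out.rgb = FrameBuffer.rgb * ((1 - Threshold) / 16 + FrameBuffer.a)."""
    bias = (1.0 - threshold) / 16.0
    return tuple(saturate(c * (bias + alpha)) for c in rgb)


def compose(rgb, alpha, glare):
    """Out.rgb = (FB.rgb + FB.rgb * FB.a^2 * 16) * 2 + Glare.rgb * Glare.rgb."""
    return tuple(
        saturate((c + c * alpha * alpha * 16.0) * 2.0 + g * g)
        for c, g in zip(rgb, glare)
    )


# A pixel without extra luminance keeps only the small bias in the bright pass.
dim = bright_pass((0.5, 0.5, 0.5), 0.0)
# The same color with full extra luminance passes almost unchanged.
bright = bright_pass((0.5, 0.5, 0.5), 1.0)
# Composition doubles the base color (undoing the 0.5 exposure pre-scale).
shown = compose((0.25, 0.25, 0.25), 0.0, (0.0, 0.0, 0.0))
```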

Notes on DX8 Implementation
• Accurate calculation is not feasible
• How to make it believable by faking
  • Based on appearance rather than theory

Implementation on DX9 Hardware
• There are currently many limitations
  • Choose implementations accordingly
• Pixel Shader
  • Pixel Shader 2.0 or later
  • Pixel Shader 1.x
• Buffer formats for HDR
  • High-precision integer/float buffers
  • Low-precision integer buffers

Issues with High-Precision Buffers
• Memory usage
  • A16B16G16R16 / A16B16G16R16F
    • 64bpp (bits per pixel)
    • Twice as much as A8R8G8B8
  • A32B32G32R32 / A32B32G32R32F
    • 128bpp
    • Four times as much as A8R8G8B8
• At least twice as much memory as the conventional full-color buffer is needed
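To put the bits-per-pixel figures into bytes, here is a small sketch. The 1024x768 resolution is an assumption chosen for illustration, and `buffer_bytes` is a hypothetical helper.

```python
# Bits per pixel for the formats listed on the slide.
BITS_PER_PIXEL = {
    "A8R8G8B8": 32,
    "A16B16G16R16": 64,
    "A16B16G16R16F": 64,
    "A32B32G32R32F": 128,
}


def buffer_bytes(width, height, fmt):
    """Return the size in bytes of one uncompressed buffer of the given format."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


# Assumed 1024x768 frame buffer.
ldr_size = buffer_bytes(1024, 768, "A8R8G8B8")         # 3 MiB
fp16_size = buffer_bytes(1024, 768, "A16B16G16R16F")   # 6 MiB
fp32_size = buffer_bytes(1024, 768, "A32B32G32R32F")   # 12 MiB
```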

Issues with High-Precision Buffers
• Limitations
  • Alpha blending cannot be used
  • Texture filtering cannot be used with floating-point formats
    • Affects the quality of environment maps and self-emission textures
  • Some systems don’t support them

Practicability of High-Precision Buffers
• Current hardware has many problems
• In order to use high-precision buffers (A16B16G16R16 / A16B16G16R16F):
  • The hardware must support them
  • Plenty of memory is needed
  • You must do without alpha blending
• The situation is not good…

Use Low-Precision Buffers
• Make use of low-precision buffers
  • A8R8G8B8 / A2R10G10B10 etc.
• Low memory consumption
• Alpha blending can be used
  • Operations may be incorrect, but it doesn’t matter very much

Compression with Tone Mapping
• Render directly to displayable format
  • Nonlinear color compression
  • Effectively wide dynamic range
• Reference:
  • Reinhard, Erik, Mike Stark, Peter Shirley, and Jim Ferwerda, “Photographic Tone Reproduction for Digital Images”
• The alpha channel is not used
  • Can be used for any other purpose
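The rendering shaders later in this deck compress luminance with L / (L + 1) before writing to the frame buffer. The sketch below shows why this gives an effectively wide range in an 8-bit channel. The decode function is simply the algebraic inverse and is an addition for illustration; both function names are hypothetical.

```python
def tonemap_encode(lum):
    """Compress an unbounded luminance into [0, 1): L / (L + 1)."""
    return lum / (lum + 1.0)


def tonemap_decode(code):
    """Algebraic inverse of tonemap_encode, valid for code < 1."""
    return code / (1.0 - code)


# Largest and smallest non-trivial 8-bit codes and the luminances they stand for.
brightest = tonemap_decode(254.0 / 255.0)   # about 254
darkest = tonemap_decode(1.0 / 255.0)       # about 1/254
effective_range = brightest / darkest       # about 64,500
```

With a linear 8-bit encoding the same ratio would be only about 255, which is the sense in which the compressed buffer has a wider dynamic range.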

Low-Precision Buffer Formats
• A8R8G8B8 (8 bits per color channel)
  • The alpha channel can be used for any other purpose
  • Works on any system
  • RGB precision may be insufficient
• A2B10G10R10 / A2R10G10B10 (10 bits per color channel)
  • The color channels have nearly ideal precision
  • Few alpha bits
    • Not suited for storing other information
  • Many systems do not support it

Environment Map Formats
• Ideally…
  • Dynamic range of more than 10,000 or 20,000
  • Texture filtering available

Environment Map Formats
• Relatively low resolution
  • Alpha channel/blending is not very important
• Use the 16-bit integer format if enough memory is available
  • Treat it as having an interval of [0, 256] or [0, 512]
  • Fast encoding/decoding
  • Texture filtering can be used
• In the future
  • Do it all with A16B16G16R16F

Low-Precision Environment Maps
• Use them when:
  • High-precision buffers are not supported, or
  • Memory storage is limited
• If the fill-rate of your system is relatively low
  • Use the same format as used in DX8 fake HDR
• If the fill-rate is high enough:
  • Nonlinear color compression
    • Similar to tone mapping
  • Store exponents into the alpha channel
    • More accurate operations are possible
  • Using it just as a scale factor is not enough
    • Even the DX8 fake HDR has a much bigger impact

Color Compression
• Similar to tone mapping
• Encode when rendering to an environment map
  • Offset: luminance curve controlling factor (~2-4)
  • A bigger offset means:
    • High-luminance regions have higher resolution
    • Low-luminance regions have lower resolution
• Decode when rendering to a frame buffer
  • From the environment map fetched
  • d: a small value to avoid divide-by-zero

Color Compression
• Use carefully
  • Mach banding may become noticeable on reflections of large area light sources
    • e.g. a bright sky

E8R8G8B8
• Store a common exponent for RGB into the alpha channel
• Use a base of 1.04 to 1.08
  • offset: ~64-128
  • Base = 1.04 means dynamic range of ~23,000 (1.04^256)
• A bigger base value means:
  • Higher dynamic range
  • Lower resolution (Mach banding becomes noticeable)
• Encode when rendering to an environment map
• Decode when rendering to a frame buffer
  • From the environment map fetched
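The dynamic range figure follows from the 256 exponent steps available in an 8-bit alpha channel: each step multiplies the scale by the base, so the total range is base^256. A quick check of the slide's numbers (the function name is hypothetical):

```python
def exponent_dynamic_range(base, levels=256):
    """Ratio between the largest and smallest scale an 8-bit exponent can express."""
    return base ** levels


range_104 = exponent_dynamic_range(1.04)  # roughly 23,000
range_106 = exponent_dynamic_range(1.06)  # roughly 3 million
range_108 = exponent_dynamic_range(1.08)  # several hundred million
```

The trade-off on the slide is visible here: a larger base widens the range quickly, but each exponent step is then a coarser jump (4%, 6% or 8%), which is what makes banding more noticeable.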

E8R8G8B8 Encoding (HLSL)

// LOG(a, b) returns n such that a^n = b
#define LOG(a, b) ( log((b)) / log((a)) )

// Simple encoding
#define EXP_BASE (1.06)
#define EXP_OFFSET (128.0)

// Pixel Shader (6 instruction slots)
// rgb already exposure-scaled
float4 EncodeHDR_RGB_RGBE8(in float3 rgb)
{
    // Compute a common exponent
    float fLen = dot(rgb.rgb, 1.0) ;
    float fExp = LOG(EXP_BASE, fLen) ;

    float4 ret ;
    ret.a = (fExp + EXP_OFFSET) / 256 ;
    ret.rgb = rgb / fLen ;

    return ret ;
}

// More accurate encoding
// (alternative to the version above; define only one of the two)
#define EXP_BASE (1.04)
#define EXP_OFFSET (64.0)

// Pixel Shader (13 instruction slots)
float4 EncodeHDR_RGB_RGBE8(in float3 rgb)
{
    // Compute a common exponent
    // based on the brightest color channel
    float fLen = max(rgb.r, rgb.g) ;
    fLen = max(fLen, rgb.b) ;
    float fExp = floor( LOG(EXP_BASE, fLen) ) ;

    float4 ret ;
    ret.a = clamp( (fExp + EXP_OFFSET) / 256, 0.0, 1.0 ) ;
    ret.rgb = rgb / pow(EXP_BASE, ret.a * 256 - EXP_OFFSET) ;

    return ret ;
}

E8R8G8B8 Decoding

// Pixel Shader (5 instruction slots)
float3 DecodeHDR_RGBE8_RGB(in float4 rgbe)
{
    float fExp = rgbe.a * 256 - EXP_OFFSET ;
    float fScale = pow(EXP_BASE, fExp) ;

    return (rgbe.rgb * fScale) ;
}

// If R16F texture format is available,
// you can use a texture to convert alpha to the scale factor
float3 DecodeHDR_RGBE8_RGB(in float4 rgbe)
{
    // samp1D_Exp: 1D float texture of 256x1
    // pow(EXP_BASE, uCoord * 256 - EXP_OFFSET)
    float fScale = tex1D(samp1D_Exp, rgbe.a).r ;

    return (rgbe.rgb * fScale) ;
}

• Encoding/decoding should be done using partial-precision instructions
  • Rounding errors inherent in the texture format are much bigger
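To get a feel for the rounding errors mentioned above, the more accurate encoder and the decoder can be mirrored on the CPU with explicit 8-bit quantization of all four channels. This is a sketch under stated assumptions: `quantize8` models storage in an A8R8G8B8 target, the zero-input guard is an addition, and the Python function names are hypothetical.

```python
import math

EXP_BASE = 1.04
EXP_OFFSET = 64.0


def quantize8(x):
    """Model storing a value in an 8-bit unsigned-normalized channel."""
    x = min(1.0, max(0.0, x))
    return round(x * 255.0) / 255.0


def encode_rgbe8(rgb):
    """Mirror of the 'more accurate' HLSL encoder, followed by 8-bit storage."""
    f_len = max(rgb)
    if f_len <= 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # guard added for this sketch only
    f_exp = math.floor(math.log(f_len) / math.log(EXP_BASE))
    a = min(1.0, max(0.0, (f_exp + EXP_OFFSET) / 256.0))
    scale = EXP_BASE ** (a * 256.0 - EXP_OFFSET)
    return tuple(quantize8(c / scale) for c in rgb) + (quantize8(a),)


def decode_rgbe8(rgbe):
    """Mirror of the HLSL decoder: rgb * base^(a * 256 - offset)."""
    scale = EXP_BASE ** (rgbe[3] * 256.0 - EXP_OFFSET)
    return tuple(c * scale for c in rgbe[:3])


def max_relative_error(rgb):
    """Largest channel error after a round trip, relative to the brightest channel."""
    decoded = decode_rgbe8(encode_rgbe8(rgb))
    return max(abs(d - c) for d, c in zip(decoded, rgb)) / max(rgb)


error_example = max_relative_error((4.0, 2.0, 0.5))
```

In this model the round-trip error stays within a few percent of the brightest channel, which is far larger than the error of partial-precision shader arithmetic. One detail visible in the sketch: because the exponent is floored, the brightest channel's mantissa is at least 1 and saturates when stored in the 8-bit target.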

Self-Emission Textures
• Textures for emissive objects
• Use the alpha channel as a common scale factor for RGB
  • scale: ~16-128
  • Bright enough to cause glare
• Encode offline
• Decoding is fast

Rendering with Tone Mapping: glossy reflection material

struct PS_INPUT_GlossReflect
{
    float2 tcDecal    : TEXCOORD0 ;
    float3 tcReflect  : TEXCOORD1 ;
    float2 tcLightMap : TEXCOORD2 ;
    float2 tcFresnel  : TEXCOORD3 ;

    // Exposure-scaled lighting results
    // Use TEXCOORD to avoid clamping
    float3 cPrimaryDiffuse : TEXCOORD6 ;
    float3 cOtherDiffuse   : TEXCOORD7 ;
} ;

// fShininess is a shader constant declared elsewhere
float4 PS_GlossReflect(PS_INPUT_GlossReflect vIn) : COLOR0
{
    float4 vDecalMap = tex2D(samp2D_Decal, vIn.tcDecal) ;
    float3 vLightMap = tex2D(samp2D_LightMap, vIn.tcLightMap).rgb ;

    float3 vDiffuse = vIn.cPrimaryDiffuse * vLightMap + vIn.cOtherDiffuse ;
    vDiffuse *= vDecalMap.rgb ;

    // HDR-decoding of environment map
    float3 vSpecular = DecodeHDR_RGBE8_RGB( texCUBE(sampCUBE_EnvMap, vIn.tcReflect) ) ;
    float3 vRoughSpecular = texCUBE(sampCUBE_DullEnvMap, vIn.tcReflect).rgb ;

    float fReflectance = tex2D( samp2D_Fresnel, vIn.tcFresnel ).a ;
    fReflectance *= vDecalMap.a ;

    vSpecular = lerp(vSpecular, vRoughSpecular, fShininess) ;
    float3 vLum = lerp(vDiffuse, vSpecular, fReflectance) ;

    // HDR tone-mapping encoding
    float4 vOut ;
    vOut.rgb = vLum / (vLum + 1.0) ;
    vOut.a = 0.0 ;

    return vOut ;
}

Rendering with Tone Mapping: self-emission material

struct PS_INPUT_SelfIllum
{
    float2 tcDecal    : TEXCOORD0 ;
    float2 tcLightMap : TEXCOORD2 ;

    // Exposure-scaled lighting results
    // Use TEXCOORD to avoid clamping
    float3 cPrimaryDiffuse : TEXCOORD6 ;
    float3 cOtherDiffuse   : TEXCOORD7 ;
} ;

float4 PS_SelfIllum(PS_INPUT_SelfIllum vIn) : COLOR0
{
    float4 vDecalMap = tex2D(samp2D_Decal, vIn.tcDecal) ;
    float3 vLightMap = tex2D(samp2D_LightMap, vIn.tcLightMap).rgb ;

    float3 vDiffuse = vIn.cPrimaryDiffuse * vLightMap + vIn.cOtherDiffuse ;
    vDiffuse *= vDecalMap.rgb ;

    // HDR-decoding of self-emission texture
    // fEmissiveScale : self-emission luminance * exposure
    float3 vEmissive = vDecalMap.rgb * vDecalMap.a * fEmissiveScale ;

    // Add the self-emission
    float3 vLum = vDiffuse + vEmissive ;

    // HDR tone-mapping encoding
    float4 vOut ;
    vOut.rgb = vLum / (vLum + 1.0) ;
    vOut.a = 0.0 ;

    return vOut ;
}

Generating Displayable Image
• Extract high-luminance regions
  • Threshold: ~0.5-0.8
  • Divide by (1 - Threshold) to normalize
• Generate glare
  • Use an integer buffer to apply texture filtering
  • Hopefully, a float buffer with filtering…
• Generate a displayable image
  • Add the glare to the frame buffer

Glare Composition

struct PS_INPUT_Display
{
    float2 tcFrameBuffer : TEXCOORD0 ;
    float2 tcGlare       : TEXCOORD1 ;
} ;

float4 PS_Display(PS_INPUT_Display vIn) : COLOR0
{
    float3 vFrameBuffer = tex2D(samp2D_FrameBuffer, vIn.tcFrameBuffer).rgb ;
    float3 vGlare = tex2D(samp2D_Glare, vIn.tcGlare).rgb ;

    // Add the glare
    float4 vOut ;
    vOut.rgb = vFrameBuffer + vGlare * vGlare ;
    vOut.a = 0.0 ;

    return vOut ;
}
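The DX9 bright pass and glare composition can be sketched per channel as follows. The slides only state the threshold range and the division by (1 - Threshold); subtracting the threshold and clamping at zero before normalizing is an assumption of this sketch, and the function names are hypothetical.

```python
def bright_pass_normalized(value, threshold=0.5):
    """Keep only the part of a tone-mapped value above the threshold.

    Assumed form: max(value - threshold, 0) / (1 - threshold), so that a
    fully saturated input (1.0) maps to 1.0 after normalization.
    """
    return max(value - threshold, 0.0) / (1.0 - threshold)


def compose_display(frame_buffer, glare):
    """Per-channel version of PS_Display: frame buffer + glare^2, clamped for display."""
    return min(1.0, frame_buffer + glare * glare)


below = bright_pass_normalized(0.4, threshold=0.5)   # nothing extracted
above = bright_pass_normalized(0.9, threshold=0.8)   # half of the remaining range
shown = compose_display(0.5, 0.5)                    # 0.5 + 0.25
```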

Notes on DX9 Implementation
• High-precision buffers
  • Consume a lot of memory
  • No blending capability
  • Will be recommended in the near future…
• Low-precision buffers
  • Pixel shaders are expensive
  • Consider fake techniques like DX8
    • High performance
    • Low memory consumption
    • Very effective

Glare Generation
• Notes on glare generation
• Bloom effects by compositing multiple Gaussian filters
• Image processing and sprites

Notes on Glare Generation
• Requires a lot of rendering passes
• Beware of precision of integer buffers
  • Clamping
  • Rounding errors
  • 16-bit integer may be insufficient
• Always maintain an appropriate range

Multiple Gaussian Filters
• Bloom generation
• A single Gaussian filter does not give very good results
  • Small effective radius
  • Not sharp enough around the light position
• Composite multiple Gaussian filters
  • Use Gaussian filters of different radii
  • Larger but sharper glare becomes possible
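The "larger but sharper" claim can be checked on a 1D falloff profile. The sketch below compares single Gaussians with an equally weighted sum of four; the sigmas (1, 2, 4, 8) and equal weights are assumptions for illustration, not values from the deck.

```python
import math


def gaussian(x, sigma):
    """Unit-peak Gaussian falloff at distance x from the light position."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def composite(x, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Equally weighted average of several Gaussians of different radii."""
    return sum(gaussian(x, s) for s in sigmas) / len(sigmas)


# Near the light, the composite falls off faster than one wide Gaussian (sharper core).
core_composite = composite(2.0)        # about 0.65
core_wide = gaussian(2.0, 8.0)         # about 0.97

# Far from the light, the composite still has energy where a narrow Gaussian has none.
tail_composite = composite(16.0)       # about 0.03
tail_narrow = gaussian(16.0, 2.0)      # effectively zero
```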

Multiple Gaussian Filters

Multiple Gaussian Filters (original image)

Multiple Gaussian Filters
• A filter of large radius is very expensive
• Make use of downscaled buffers
  • A large radius means a strong low-pass filter
  • Apply a blur filter to a low-res version of the image and magnify it by bilinear filtering
    → The error is unnoticeable
• Change the image resolution rather than the filter radius
  • 1/4 x 1/4 (1/16 the cost)
  • 1/8 x 1/8 (1/64 the cost)
  • 1/16 x 1/16 (1/256 the cost)
  • 1/32 x 1/32 (1/1024 the cost)
  • …
• Even a large filter of several hundred pixels square can be applied very quickly

Applying Gaussian Filters to Downscaled Buffers
• 1/4 x 1/4 (256x192 pixels)
• 1/8 x 1/8 (128x96 pixels)
• 1/16 x 1/16 (64x48 pixels)
• 1/32 x 1/32 (32x24 pixels)
• 1/64 x 1/64 (16x12 pixels)
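The cost and size figures for downscaled buffers follow from simple arithmetic, sketched below. A 1024x768 full-resolution frame buffer is assumed because it matches the 256x192 quarter-resolution size quoted above; the 5x5 kernel and the function names are hypothetical.

```python
def downscaled_size(width, height, divisor):
    """Size of a buffer downscaled by `divisor` along each axis."""
    return (width // divisor, height // divisor)


def cost_fraction(divisor):
    """Pixel count (and thus blur cost) relative to the full-resolution buffer."""
    return 1.0 / (divisor * divisor)


def effective_kernel(kernel_size, divisor):
    """Full-resolution footprint of a kernel applied on the downscaled buffer."""
    return kernel_size * divisor


# Assumed 1024x768 source, downscaled to 1/32 along each axis.
small = downscaled_size(1024, 768, 32)       # (32, 24)
fraction = cost_fraction(32)                 # 1/1024 of the full-resolution cost
footprint = effective_kernel(5, 32)          # a 5x5 blur covers 160x160 source pixels
```

This is the sense in which a filter several hundred pixels across becomes cheap: the kernel stays small while the downscaling supplies the radius.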
