Nolan Goodnight. Rui Wang. Cliff Woolley. Greg Humphreys. University of Virginia. Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware. HDR and Tone Mapping. Compressed. Clamped to [0,1]. Advances in graphics hardware. Physically-based rendering on the GPU
Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware • Nolan Goodnight • Rui Wang • Cliff Woolley • Greg Humphreys • University of Virginia
HDR and Tone Mapping • [Image comparison: compressed vs. clamped to [0,1]]
Advances in graphics hardware • Physically-based rendering on the GPU (Purcell et al., 2003) • High dynamic range texture mapping (Debevec et al., 2001)
System Overview • Interactive tone mapping system for an OpenGL application • [Diagram labels: application, callback, tone mapping system, HDR image, LDR image, frame buffer, display]
Interface to the application • tmInitialize(); // Initialize the system • tmEnable(); // Retarget GL calls • Draw geometry • tmCompress(); // Compress output • tmDisable(); // Restore app context
Choosing a tone mapping operator • Photographic Tone Reproduction for High Contrast Images (Reinhard et al., 2002) • Global operator is a simple transfer function • [Plot: transfer curve with output range 0 to 1]
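The formula on this slide did not survive extraction. For reference, the global operator as defined in Reinhard et al. (2002) can be written as below; the symbols are that paper's, not necessarily the ones used on the slide. Here L_w is the world luminance, \bar{L}_w the log-average luminance of the image, a the key value (0.18 is the paper's default), and L_d the display luminance.

```latex
L(x,y) = \frac{a}{\bar{L}_w}\, L_w(x,y),
\qquad
L_d(x,y) = \frac{L(x,y)}{1 + L(x,y)}
```

Because L >= 0, the output L_d always lies in [0, 1), which is why no clamping is needed.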
Choosing a tone mapping operator • Local operator • Digital analog to ‘burning’ and ‘dodging’ • Center-surround
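As a reminder of what "center-surround" refers to, the local operator in Reinhard et al. (2002) can be summarized as follows. This is taken from that paper rather than from the slide: V_1 and V_2 are Gaussian-blurred luminances at scale s and the next larger scale, a is the key value, \phi a sharpening parameter, and \epsilon a threshold.

```latex
V(x,y,s) = \frac{V_1(x,y,s) - V_2(x,y,s)}{2^{\phi} a / s^{2} + V_1(x,y,s)},
\qquad
L_d(x,y) = \frac{L(x,y)}{1 + V_1\bigl(x,y,s_m(x,y)\bigr)}
```

Here s_m(x,y) is the largest scale for which |V(x,y,s_m)| < \epsilon, so each pixel is compressed relative to the largest neighborhood that contains no strong contrast edge.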
Why use this tone mapping operator? • Global operator is simple and fast to compute • Only one global computation • We can dynamically choose the number of zones
Variable number of zones • [Figure sequence: the same image tone mapped with 3, 4, 5, 6, 7, and 8 zones]
Implementation • Target architecture • ATI Radeon 9800 (R350) • Data storage • Floating-point off-screen buffers (pbuffers) • Multiple rendering surfaces (GL_AUXi) • Algorithms • ARB fragment and vertex assembly • Generate fragments with image-sized quads • Data representation • Vector vs. scalar organization
Implementation: global operator • Simple luminance transform • Store luminance and log luminance in separate channels • [Diagram labels: HDR image, luminance / log luminance, mipmap reduction, LDR image, single buffer]
Implementation: global operator • Single rendering surface • Mipmap reduction of the log luminance channel yields the log average luminance • [Diagram labels: HDR image, luminance / log luminance, mipmap reduction, LDR image, single buffer]
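The pipeline on these slides (luminance transform, mipmap reduction of the log luminance, then the transfer function) can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the Rec. 709 luminance weights, the `key` and `delta` parameters, and all function names are hypothetical choices, and the real system runs these steps as fragment programs on the GPU.

```python
import math

def luminance(rgb):
    """Luminance of a linear RGB triple (Rec. 709 weights, an assumption)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def mipmap_reduce(img):
    """Average 2x2 blocks repeatedly until a single value remains.

    Mimics a mipmap chain; assumes a square image with a power-of-two side.
    """
    while len(img) > 1:
        n = len(img) // 2
        img = [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
                 img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
                for x in range(n)]
               for y in range(n)]
    return img[0][0]

def tone_map_global(hdr, key=0.18, delta=1e-4):
    """Return (display luminance image, log-average luminance)."""
    lum = [[luminance(p) for p in row] for row in hdr]
    # delta avoids log(0) on black pixels.
    log_lum = [[math.log(delta + l) for l in row] for row in lum]
    log_avg = math.exp(mipmap_reduce(log_lum))
    scale = key / log_avg
    ldr = [[(scale * l) / (1.0 + scale * l) for l in row] for row in lum]
    return ldr, log_avg
```

Averaging 2x2 blocks level by level gives the same result as a plain mean over the whole image, which is why a hardware mipmap chain can serve as the reduction.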
Implementation: global operator • Operator shader reads texture 0, texture 1, and texture 2 (HDR image, luminance / log luminance, mipmap reduction) • Output: LDR image • Single buffer
Implementation: GPU-based convolutions • Transform n-vector product into multiple 4-vector products
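The decomposition named on this slide can be illustrated on the CPU. The sketch below splits an n-element dot product into 4-element chunks, each corresponding to one 4-component dot product (the DP4 instruction in ARB fragment programs). The function names and the zero-padding of the tail are assumptions made for this example.

```python
def dot4(a, b):
    """One 4-component dot product, the unit of work of a DP4 instruction."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def dot_n(a, b):
    """n-vector product computed as a sum of 4-vector products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    pad = (-len(a)) % 4                 # zero-pad to a multiple of four
    a = list(a) + [0.0] * pad
    b = list(b) + [0.0] * pad
    return sum(dot4(a[i:i + 4], b[i:i + 4]) for i in range(0, len(a), 4))
```

An 11-tap kernel row, for example, becomes three 4-vector products, the last one padded with a zero weight.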
Vectorizing the luminance • Output 4 pixels at the same time • Useful for expensive algorithms • Requires a conversion back to scalar form • Stacked domain
Vectorizing the luminance • A simple method for luminance vectorization: • Preserves spatial locality
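The figure that showed the vectorization method did not survive extraction. One layout consistent with "stacked domain" and "preserves spatial locality" is to place the four image quadrants in the four channels of a quarter-size image, so that neighbors within a channel are still neighbors in the original. The sketch below assumes that layout; it is not confirmed by the slides, and `stack` / `unstack` are hypothetical names.

```python
def stack(lum):
    """Pack an H x W scalar image (H, W even) into an H/2 x W/2 image of
    4-tuples, one quadrant per channel."""
    h, w = len(lum), len(lum[0])
    hh, hw = h // 2, w // 2
    return [[(lum[y][x],             # top-left quadrant
              lum[y][x + hw],        # top-right quadrant
              lum[y + hh][x],        # bottom-left quadrant
              lum[y + hh][x + hw])   # bottom-right quadrant
             for x in range(hw)]
            for y in range(hh)]

def unstack(stacked):
    """Inverse of stack(): the conversion back to scalar form."""
    hh, hw = len(stacked), len(stacked[0])
    out = [[0.0] * (2 * hw) for _ in range(2 * hh)]
    for y in range(hh):
        for x in range(hw):
            c0, c1, c2, c3 = stacked[y][x]
            out[y][x] = c0
            out[y][x + hw] = c1
            out[y + hh][x] = c2
            out[y + hh][x + hw] = c3
    return out
```

With this layout, one 4-component operation on the stacked image processes four scalar pixels at once, at the cost of special handling where quadrants meet.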
GPU-based convolutions • [Figure sequence: the stacked image is processed in Pass 1, Pass 2, and Pass 3, and the partial results are summed]
GPU-based convolutions • Compute multiple 4-vector products per pass • Less shader and texture switching • Single render pass • [Figure: partial products over the stacked image summed within one pass]
GPU-based convolutions • Advantages • Handles large kernels • Efficient memory access • No transform back to scalar values • Timings for a 512 x 512 image • 11 x 11 kernel: ~6 ms • 21 x 21 kernel: ~10 ms • 41 x 41 kernel: ~16 ms
Calculating adaptation zones on the GPU • [Figure sequence: Buffer 0 and Buffer 1, each holding luminance and filtered data, alternate as FRONT and BACK; successive steps show levels 0 and 1, then 2 and 1, then 2 and 3, then 4 and 3]
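The buffer sequence in this figure (0/1, 2/1, 2/3, 4/3) suggests a ping-pong scheme: each new level overwrites the older of the two buffers, so the previous level stays readable while the next is written. The sketch below illustrates only that alternation. The 1D box filter stands in for the Gaussian convolution, and the rule that the radius grows by one per level is an assumption, as are all names.

```python
def box_blur_1d(signal, radius):
    """Box filter with clamped edges (stand-in for a Gaussian convolution)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def adaptation_zones(lum, num_levels):
    """Filter lum at increasing scales, alternating between two buffers.

    Returns the final buffer pair and the buffer index written at each level.
    """
    buffers = [None, None]
    written = []
    for level in range(num_levels):
        target = level % 2              # even levels -> buffer 0, odd -> buffer 1
        buffers[target] = box_blur_1d(lum, radius=level)
        written.append(target)
        # buffers[1 - target] still holds the previous level at this point,
        # so a center-surround comparison can read both.
    return buffers, written
```

After five levels, buffer 0 holds level 4 and buffer 1 holds level 3, matching the last step of the figure.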
Performance: global operator • [Plot: frames per second vs. image size]
Performance: local operator • [Plot: frames per second vs. number of zones]
Results: Accuracy • Comparison with CPU: 512 x 512 image
Images generated at ~30Hz • Clamped [0,1] • Compressed: 2 zones