DAVANet: Stereo Deblurring with View Aggregation Haitong Shi 5.6
DAVANet - Depth Awareness and View Aggregation • Introduction • Motivation • Network Architecture and Losses • Stereo blur dataset • Experiments
Introduction A stereo camera is a type of camera with two or more lenses and a separate image sensor or film frame for each lens. • Stereo image deblurring has rarely been discussed • Dynamic scene deblurring from a single blurry image is a highly ill-posed task • Propose a novel depth-aware and view-aggregated stereo deblurring network named DAVANet • Propose a large-scale multi-scene stereo blurry image dataset
Motivation (i) Depth information can provide helpful prior information for estimating spatially-varying blur kernels (ii) The varying information in corresponding pixels across the two stereo views can help blur removal. Depth-varying and view-varying blur: (a, b) are the stereo blurry images, (c, d) are the motion trajectories in terms of optical flow, which model the blur kernels, and (e, f) are the estimated disparities
Motivation 1. Depth-Varying Blur 2. View-Varying Blur: (a) is the depth-varying blur due to relative translation parallel to the image plane; (b) and (c) are the view-varying blur due to relative translation along the depth direction and rotation
Network Architecture DeblurNet: single-image deblurring; DispBiNet: bidirectional disparity estimation. The overall structure of the stereo deblurring network DAVANet, where the depth and two-view information from DispBiNet and DeblurNet are integrated in FusionNet.
Network Architecture – Context Module • Dilated convolutions: the Context Module fuses richer hierarchical context information that benefits both blur removal and disparity estimation • The four dilation rates are set to 1, 2, 3, 4
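A minimal PyTorch sketch of such a module, assuming 3×3 convolutions, a channel width of 128, LeakyReLU activations, and concatenation followed by a 1×1 fusion convolution (none of these details are given on the slide; this is one common way to combine the parallel dilated branches):

```python
import torch
import torch.nn as nn

class ContextModule(nn.Module):
    """Four parallel 3x3 dilated convolutions (rates 1, 2, 3, 4) whose outputs
    are concatenated and fused, aggregating context at several receptive-field
    sizes into one feature map."""

    def __init__(self, channels=128):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r),
                nn.LeakyReLU(0.1, inplace=True),
            )
            for r in (1, 2, 3, 4)
        ])
        # 1x1 convolution to fuse the concatenated branch outputs
        self.fuse = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))
```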
Network Architecture - Fusion Network • Inputs: features from the DeblurNet encoder, the original stereo images, the estimated disparity of the left view, and features from the second-to-last layer of DispBiNet • The two views are combined using a soft gate map ranging from 0 to 1; ⊙ denotes element-wise multiplication
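The fusion equations on this slide were rendered as images and did not survive extraction. Below is a simplified PyTorch sketch of gated view aggregation consistent with the description: warp the other view's features with the estimated disparity, predict a soft gate G in [0, 1], and blend the two features element-wise. The layer sizes and the reduced gate inputs are assumptions (the slide also lists the original images and DispBiNet features as inputs, which this sketch omits):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_by_disparity(feat_right, disp_left):
    """Warp right-view features to the left view using the left-view disparity.
    disp_left: (B, 1, H, W), horizontal displacement in pixels."""
    b, _, h, w = feat_right.shape
    xs = torch.arange(w, device=feat_right.device).view(1, 1, w).expand(b, h, w).float()
    ys = torch.arange(h, device=feat_right.device).view(1, h, 1).expand(b, h, w).float()
    x_warp = xs - disp_left.squeeze(1)               # shift columns by disparity
    grid = torch.stack([2 * x_warp / (w - 1) - 1,    # normalize to [-1, 1]
                        2 * ys / (h - 1) - 1], dim=-1)
    return F.grid_sample(feat_right, grid, align_corners=True)

class GatedFusion(nn.Module):
    """Predict a soft gate G in [0, 1] from the left and warped-right features,
    then blend them: F_fuse = G * F_warp + (1 - G) * F_left."""

    def __init__(self, channels=128):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_left, feat_right, disp_left):
        feat_warp = warp_by_disparity(feat_right, disp_left)
        g = self.gate(torch.cat([feat_left, feat_warp], dim=1))
        return g * feat_warp + (1 - g) * feat_left
```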
Losses • Deblurring Losses 1. MSE loss 2. Perceptual loss: uses features from the conv3-3 layer (j = 15), where φ_j denotes the features from the j-th convolution layer of the pretrained VGG-19 network
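The loss equations did not survive extraction. A plausible reconstruction of the two terms described above (the per-pixel normalization and the balancing weight λ are assumptions):

\[
\mathcal{L}_{\mathrm{mse}} = \frac{1}{CHW}\,\bigl\| I_{\mathrm{deblurred}} - I_{\mathrm{sharp}} \bigr\|_2^2,
\qquad
\mathcal{L}_{\mathrm{perceptual}} = \frac{1}{C_j H_j W_j}\,\bigl\| \phi_j(I_{\mathrm{deblurred}}) - \phi_j(I_{\mathrm{sharp}}) \bigr\|_2^2,
\]
\[
\mathcal{L}_{\mathrm{deblur}} = \mathcal{L}_{\mathrm{mse}} + \lambda\,\mathcal{L}_{\mathrm{perceptual}},
\]

where φ_j is the feature map from the j-th convolution layer (conv3-3, j = 15) of the pretrained VGG-19 network and λ is a balancing weight.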
Losses • Disparity Estimation Loss: computed between the estimated disparities and the ground truth at every scale of the network, using a mask map to remove the invalid and occluded regions
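The disparity-loss equation was likewise lost; a reconstruction consistent with the quantities listed above (the choice of the L1 norm and the per-scale normalization are assumptions):

\[
\mathcal{L}_{\mathrm{disp}} = \sum_{s=1}^{S} \frac{1}{N_s}\,\bigl\| M_s \odot \bigl( D_s - D_s^{\mathrm{gt}} \bigr) \bigr\|_1,
\]

where S is the number of scales of the network, D_s and D_s^{gt} are the estimated and ground-truth disparities at scale s, M_s is the mask map removing invalid and occluded regions, and N_s is the number of valid pixels at that scale.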
Stereo Blur Dataset • 1. Use the ZED stereo camera (frame rate 60 fps) to capture data • 2. Increase the video frame rate to 480 fps using a fast and high-quality frame interpolation method https://arxiv.org/abs/1708.01692 • 3. Average a varying number (17, 33, 49) of successive frames to generate blurs of different sizes (see the sketch below) • 135 diverse real-world sequences of dynamic scenes • 20,637 blurry–sharp stereo image pairs with the corresponding bidirectional disparities and mask maps • 98 training sequences (17,319 samples) and 37 testing sequences (3,318 samples)
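A minimal sketch of step 3, assuming the blur is synthesized by directly averaging the interpolated 480 fps frames in 8-bit intensity and that the temporally centered frame serves as the sharp ground truth (any gamma/linear-space handling and the exact choice of sharp frame are not covered by the slide):

```python
import numpy as np

def synthesize_blur(frames_480fps, num_avg=17):
    """Average `num_avg` successive interpolated frames (17, 33, or 49 on the
    slide) to synthesize a blurry image; return it with the middle frame,
    used here as the sharp target."""
    assert len(frames_480fps) >= num_avg
    stack = np.stack(frames_480fps[:num_avg]).astype(np.float32)
    blurry = stack.mean(axis=0).astype(np.uint8)
    sharp = frames_480fps[num_avg // 2]
    return blurry, sharp
```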
Experiments - Training • Pretrain DeblurNet on the presented dataset and DispBiNet on a subset (10,806 samples) of the FlyingThings3D dataset • Finetune DispBiNet fully on the Stereo Blur Dataset until convergence • Jointly train the overall network on the Stereo Blur Dataset
Experiments • On the proposed Stereo Blur Dataset
Experiments • On the GoPro dataset
Effectiveness of the disparity • (c) feed two identical images into the proposed network • (d) do not warp the features from the other view in FusionNet
Ablation study • Context Module, depth awareness, and view aggregation • Replace the Context Module of DeblurNet with a one-path convolution block with the same number of layers • Remove the disparity loss of DispBiNet • Substitute … with …
Conclusion • Advantages: depth awareness and view aggregation; accuracy, speed, model size • Disadvantages: ablation study?