Local Stereo Matching Using Motion Cue and Modified Census in Video Disparity Estimation • Zucheul Lee, Ramsin Khoshabeh, Jason Juang, and Truong Q. Nguyen (UCSD) • 20th European Signal Processing Conference (EUSIPCO 2012)
Outline • Introduction • Framework • Proposed Algorithm • Experimental Results • Conclusion
Background • Disparity estimation has been thoroughly studied, but the focus has been strictly on still images • Video disparity estimation faces two problems: • (1) Lack of video datasets with ground-truth disparity maps • (2) Temporal inconsistency: flickering that results from simply applying image-based algorithms to video
Background • Fundamental attributes that group objects together locally: • Proximity and similarity: already used in image disparity estimation • Motion: important for accurate depth estimation near the edges of moving objects • Objects grouped by these attributes are most likely to have the same depth
Objective • Propose a more accurate and noise-tolerant method for video disparity estimation • More accurate than other methods on edges and in flat (textureless) areas • Using: • Motion cues (for edges) • Modified census transform (for flat areas) • Spatio-temporal consistency (for refinement)
Related Work • Adaptive Weight [6] • Cost-volume filtering [7] • Guided filter • Spatio-Temporal Consistency [3] • These methods do not provide a reliable solution for disparity estimation in textureless (flat) areas [6] K.-J. Yoon and I.-S. Kweon, “Adaptive Support-Weight Approach for Correspondence Search,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 650-656, 2006. [7] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, “Fast Cost-Volume Filtering for Visual Correspondence and Beyond,” in Proc. IEEE Intl. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3017-3024, 2011. [3] R. Khoshabeh, S. H. Chan, and T. Q. Nguyen, “Spatio-Temporal Consistency in Video Disparity Estimation,” ICASSP, pp. 885-888, 2011.
Support Weight Using Correlated Color and Motion • The support weight of a neighbor pixel depends on: • the color difference, controlled by the similarity parameter γs • the motion difference, controlled by the motion parameter γm
Support Weight Using Correlated Color and Motion • Color difference: computed between the color coordinates of pixel c and neighbor pixel q in the CIELab color space • Truncated motion difference: computed between the flow vectors [10] of pixel c and neighbor pixel q • τ: truncation value [10] D. Sun, S. Roth, and M. J. Black, “Secrets of Optical Flow Estimation and Their Principles,” CVPR, pp. 2432-2439, 2010.
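The slide formulas did not survive as text, so the following is only a minimal sketch of how a color-and-motion support weight could be computed. It assumes the exponential form of the adaptive support-weight approach [6], a Euclidean color distance in CIELab, and a Euclidean flow distance clipped at τ. The function names are hypothetical, and the default parameter values are the ones listed under Experimental Results.

```python
import math


def color_difference(lab_c, lab_q):
    """Euclidean distance between two CIELab triples (assumed form)."""
    return math.dist(lab_c, lab_q)


def truncated_motion_difference(flow_c, flow_q, tau):
    """Distance between two flow vectors, clipped at tau (assumed form)."""
    return min(math.dist(flow_c, flow_q), tau)


def support_weight(lab_c, lab_q, flow_c, flow_q,
                   gamma_s=17.0, gamma_m=1.0, tau=1.0):
    """Assumed weight: exp(-(color_diff / gamma_s + motion_diff / gamma_m))."""
    d_color = color_difference(lab_c, lab_q)
    d_motion = truncated_motion_difference(flow_c, flow_q, tau)
    return math.exp(-(d_color / gamma_s + d_motion / gamma_m))


# Same color, same motion: the neighbor gets full weight.
same = support_weight((50, 0, 0), (50, 0, 0), (1.0, 0.0), (1.0, 0.0))
# Same color, very different motion: the weight drops to exp(-tau / gamma_m).
moving = support_weight((50, 0, 0), (50, 0, 0), (0.0, 0.0), (5.0, 5.0))
```

Under these assumptions, a neighbor with the same color but different motion is down-weighted, which is the behavior the motion cue is meant to provide near the edges of moving objects.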
Benefits of a Motion Cue • The “car” video frames (480×270, 15 disparity levels) • [Figure panels: Proximity; Proximity + Similarity; Proximity + Similarity + Motion]
Modified Census Transform • Problem: it is difficult to find the correct correspondences in flat areas, because the census matching cost is extremely sensitive to image noise when all pixels in a flat area have similar intensities • Solution: a three-mode census transform with a noise buffer
Modified Census Transform • Using two bits to implement three modes • α: noise buffer threshold • Set 10 if (neighbor pixel intensity) - (center pixel intensity) > α • Set 01 if (neighbor pixel intensity) - (center pixel intensity) < -α • Set 00 otherwise • The threshold α depends on the intensity value: • Intensity value 0~50: α = 0 • Intensity value 50~100: α = 1 • Intensity value 100~150: α = 2 • Intensity value 150~200: α = 3 • Intensity value 200~255: α = 4
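The three-mode rule can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes that the 01 mode applies to differences below -α (so that 00 covers the whole noise buffer), that α is looked up from the center pixel intensity, and that boundary intensities such as 50 or 100 fall into the upper range. All function names are hypothetical.

```python
def noise_buffer(center):
    """Threshold alpha for a center intensity in 0..255.

    Assumption: boundary values (50, 100, ...) belong to the upper range.
    """
    return min(center // 50, 4)


def census_code(neighbor, center):
    """Two-bit code for one neighbor: 10 brighter, 01 darker, 00 within buffer."""
    alpha = noise_buffer(center)
    diff = neighbor - center
    if diff > alpha:
        return 0b10
    if diff < -alpha:
        return 0b01
    return 0b00


def modified_census(patch):
    """Bit string (as an int) for a square patch with odd side length."""
    n = len(patch)
    mid = n // 2
    center = patch[mid][mid]
    bits = 0
    for r in range(n):
        for c in range(n):
            if r == mid and c == mid:
                continue  # the center pixel is not compared with itself
            bits = (bits << 2) | census_code(patch[r][c], center)
    return bits


# A flat patch with small noise (center 120, so alpha = 2) maps to all zeros.
flat_patch = [[120, 121, 119],
              [122, 120, 118],
              [120, 121, 120]]
flat_code = modified_census(flat_patch)
```

With a plain census transform, the same noisy flat patch would produce an essentially random bit pattern, which is the sensitivity the noise buffer is meant to remove.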
Modified Census Transform • The raw matching cost combines two terms: • Intensity difference: compares the two center pixels • Hamming distance: compares the spatial structure; calculated with a bitwise XOR of the census-transform bit strings
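The Hamming distance between two census bit strings is the number of set bits in their XOR. How the slide combines it with the intensity difference is not recoverable from the text, so the combination below is an assumption: each term is passed through a robust function 1 - exp(-x/γ), as is common in AD-Census-style costs, using the γI and γH values listed under Experimental Results. The function names are hypothetical.

```python
import math


def hamming_distance(code_left, code_right):
    """Number of differing bits between two census bit strings."""
    return bin(code_left ^ code_right).count("1")


def raw_matching_cost(i_left, i_right, code_left, code_right,
                      gamma_i=3.0, gamma_h=20.0):
    """Assumed combination of intensity difference and Hamming distance."""
    d_intensity = abs(i_left - i_right)                 # compares center pixels
    d_hamming = hamming_distance(code_left, code_right)  # compares structure
    return ((1.0 - math.exp(-d_intensity / gamma_i))
            + (1.0 - math.exp(-d_hamming / gamma_h)))


# With two-bit codes, "brighter" (10) vs "darker" (01) differs in 2 bits,
# while "brighter" (10) vs "within the noise buffer" (00) differs in 1 bit.
opposite = hamming_distance(0b10, 0b01)
partial = hamming_distance(0b10, 0b00)
```

Under this assumed form, each term is bounded by 1, so neither an intensity outlier nor a structural outlier can dominate the raw cost.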
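Aggregation and Disparity Computation • Aggregated matching cost: the raw matching costs are aggregated over the left and right support windows using the support weights • w(cd, qd): support weight of pixel qd in the right window • Winner-take-all (WTA): select the disparity with the lowest aggregated cost • D: the set of all possible disparities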
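A minimal sketch of aggregation followed by winner-take-all selection. It assumes the normalized form used by the adaptive support-weight approach [6], in which each raw cost is weighted by the product of the left-window and right-window support weights; the function names and the flat-list representation of a window are hypothetical.

```python
def aggregate_cost(weights_left, weights_right, raw_costs):
    """Weighted average of raw costs over a support window (assumed form).

    Each argument is a flat list with one entry per pixel pair (q, qd).
    """
    numerator = 0.0
    denominator = 0.0
    for w_left, w_right, cost in zip(weights_left, weights_right, raw_costs):
        numerator += w_left * w_right * cost
        denominator += w_left * w_right
    return numerator / denominator


def winner_take_all(aggregated):
    """Pick the disparity with the lowest aggregated cost.

    `aggregated` maps each candidate disparity in D to its aggregated cost.
    """
    return min(aggregated, key=aggregated.get)


# A neighbor with zero support weight does not influence the result.
cost = aggregate_cost([1.0, 0.0], [1.0, 1.0], [2.0, 100.0])
best = winner_take_all({0: 0.9, 1: 0.2, 2: 0.5})
```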
Aggregation and Disparity Computation • [Figure panels: Left view; Original census; Modified census (without intensity difference); Modified census]
Spatio-temporal Consistency [3] • Problem: simply applying image-based algorithms to individual frames gives temporally inconsistent results (even with the best methods) • Solution: consider the sequence of disparity maps as a space-time volume • A three-dimensional function f(x,y,t) with • (x,y): spatial coordinates • t: temporal coordinate • Seek a piecewise smooth solution that: • has less temporal noise • preserves the disparity information as much as possible
Spatio-temporal Consistency [3] • ℓ1-minimization problem: • f: unknown disparity map • g: initial disparity map from the previous step • D: forward difference operator • The objective includes a total variation norm, which favors a piecewise smooth solution • Analogous video restoration problem: • g = Hf + η • f: unknown image (MN × 1) • g: observed image (MN × 1) • H: linear transformation representing a convolution operator • η: noise
Spatio-temporal Consistency [3] • ℓ1-minimization problem: • f: unknown disparity map f(x,y,t) • Each frame of the video: M rows, N columns • Total: K frames • Stack the entries of f(x,y,t) into a column vector of size MNK × 1 • Axes: x (M rows), y (N columns), t (K frames)
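The stacking step can be illustrated with a small example. The ordering used here (rows fastest, then columns, then frames) is an assumption; the slide only states the size of the resulting vector. The names `stack_volume` and `stack_index` are hypothetical.

```python
def stack_volume(volume):
    """Flatten volume[t][x][y] (K frames, M rows, N columns) into one list.

    Assumed ordering: x (rows) varies fastest, then y (columns), then t.
    """
    K, M, N = len(volume), len(volume[0]), len(volume[0][0])
    stacked = []
    for t in range(K):
        for y in range(N):
            for x in range(M):
                stacked.append(volume[t][x][y])
    return stacked


def stack_index(x, y, t, M, N):
    """Position of f(x, y, t) in the stacked MNK x 1 vector."""
    return x + M * y + M * N * t


# Toy volume with M = 2 rows, N = 3 columns, K = 2 frames.
M, N, K = 2, 3, 2
volume = [[[100 * t + 10 * x + y for y in range(N)] for x in range(M)]
          for t in range(K)]
vector = stack_volume(volume)
```

Any fixed ordering works, as long as the difference operator D is built with the same convention.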
Spatio-temporal Consistency [3] • ℓ1-minimization problem: • D: forward difference operator • The remaining parameters are constants
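The objective itself is not recoverable from the slide text. The sketch below assumes a common TV-L1 form, μ‖f − g‖₁ + ‖Df‖₂, where ‖Df‖₂ sums, over all voxels, the Euclidean norm of the weighted forward differences along x, y, and t. The names `mu` and `beta`, the unit default values, and the zero difference at the far boundary are all assumptions made for illustration.

```python
import math


def tv_l1_objective(f, g, mu=1.0, beta=(1.0, 1.0, 1.0)):
    """Assumed objective: mu * ||f - g||_1 + ||D f||_2 for volumes [t][x][y]."""
    K, M, N = len(f), len(f[0]), len(f[0][0])
    beta_x, beta_y, beta_t = beta
    data_term = 0.0
    tv_term = 0.0
    for t in range(K):
        for x in range(M):
            for y in range(N):
                v = f[t][x][y]
                data_term += abs(v - g[t][x][y])
                # Forward differences; zero at the far boundary (assumption).
                dx = f[t][x + 1][y] - v if x + 1 < M else 0.0
                dy = f[t][x][y + 1] - v if y + 1 < N else 0.0
                dt = f[t + 1][x][y] - v if t + 1 < K else 0.0
                tv_term += math.sqrt((beta_x * dx) ** 2
                                     + (beta_y * dy) ** 2
                                     + (beta_t * dt) ** 2)
    return mu * data_term + tv_term


# One pixel over three frames whose initial disparity flickers 1, 2, 1.
g = [[[1.0]], [[2.0]], [[1.0]]]
flat = [[[1.0]], [[1.0]], [[1.0]]]
keep_flicker = tv_l1_objective(g, g)       # no data cost, temporal TV of 2
remove_flicker = tv_l1_objective(flat, g)  # data cost of 1, no TV cost
```

In this toy case the temporally smooth candidate has the lower objective value, which illustrates why minimizing such an objective suppresses flickering while staying close to the initial disparity map g.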
Spatio-temporal Consistency [3] • Solving the minimization problem: • The total variation norm favors a piecewise smooth solution • Solve the sub-problems for f, u, and r iteratively [1] S. H. Chan, R. Khoshabeh, K. B. Gibson, P. E. Gill, and T. Q. Nguyen, “An Augmented Lagrangian Method for Total Variation Video Restoration,” in ICASSP, May 2011.
Experimental Results • 5 synthetic videos with ground truth [14] (400×300, disparity range of 64) • Compare LASW, Cost-filter, and the proposed method • Without post-processing • γs = 17, γm = 1, γI = 3, γH = 20, and τ = 1 • Support window: 11×11, Census window: 7 [14] C. Richardt, D. Orr, I. Davies, A. Criminisi, and N. A. Dodgson, “Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid,” ECCV, 2010.
Experimental Results • Jamie1 from Microsoft i2i database
Experimental Results • Ilkay from Microsoft i2i database
Experimental Results • Tunnel
Experimental Results • Performance comparison of methods: the average percentage of bad pixels (threshold of 1)
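For reference, the bad-pixel metric can be sketched as below. It assumes the usual definition, in which a pixel is bad when its estimated disparity differs from the ground truth by more than the threshold (strict inequality); the function name and the list-of-rows representation are hypothetical.

```python
def bad_pixel_percentage(estimated, ground_truth, threshold=1.0):
    """Percentage of pixels with |estimated - ground truth| > threshold."""
    total = 0
    bad = 0
    for row_est, row_gt in zip(estimated, ground_truth):
        for d_est, d_gt in zip(row_est, row_gt):
            total += 1
            if abs(d_est - d_gt) > threshold:
                bad += 1
    return 100.0 * bad / total


# Absolute errors are 0, 2, 1, and 3, so two of the four pixels are bad.
estimated = [[10, 12], [30, 33]]
ground_truth = [[10, 10], [31, 30]]
score = bad_pixel_percentage(estimated, ground_truth)
```

For a video, the per-frame percentages would then be averaged over all frames.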
Experimental Results • 19 s to compute the disparity map • Could be adapted to a real-time application by using a GPU • Refinement using the TV method [3] reduces errors in the background (spatial noise and temporal inconsistencies)
Experimental Results • Spatio-Temporal Consistency[3]
Conclusion • Proposed an accurate local stereo matching method for video disparity estimation • Motion cue • To obtain more accurate support weights • Modified census transform • To obtain more reliable raw matching costs in flat areas • Spatio-temporal volume • To improve spatial and temporal consistency • The method shows the possibility of directly extending current image-based disparity algorithms to the video domain