Zucheul Lee, Ramsin Khoshabeh, Jason Juang, and Truong Q. Nguyen (UCSD)

Local Stereo Matching Using Motion Cue and Modified Census in Video Disparity Estimation. Zucheul Lee, Ramsin Khoshabeh, Jason Juang, and Truong Q. Nguyen (UCSD). 20th European Signal Processing Conference (EUSIPCO 2012)

Outline • Introduction • Framework • Proposed Algorithm • Experimental Results • Conclusion

Introduction

Background • Disparity estimation has been thoroughly studied, but the focus has been strictly on images • Video disparity estimation faces two problems: • (1) Lack of video datasets with ground-truth disparity maps • (2) Temporal inconsistency: flickering that results from simply applying image-based algorithms to video

Background • Fundamental attributes that group objects together locally: • Proximity • Similarity • Motion • Objects grouped by these attributes are most likely to have the same depth • Proximity and similarity: used in image disparity estimation • Motion: important for accurate depth estimation near edges of moving objects

Objective • Propose a more accurate and noise-tolerant method for video disparity estimation • More accurate than other methods on edges and in flat (textureless) areas • Using: • Motion cues (for edges) • Modified census transform (for flat areas) • Spatio-temporal consistency (for refinement)

Related Work • Adaptive Weight [6] • Cost-volume filtering [7] • Guided filter • Spatio-Temporal Consistency [3] • These methods do not provide a reliable solution for disparity estimation in textureless (flat) areas
[6] K.-J. Yoon and I.-S. Kweon, “Adaptive Support-Weight Approach for Correspondence Search,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 650-656, 2006.
[7] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, “Fast Cost-Volume Filtering for Visual Correspondence and Beyond,” in Proc. IEEE Intl. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3017-3024, 2011.
[3] R. Khoshabeh, S. H. Chan, and T. Q. Nguyen, “Spatio-Temporal Consistency in Video Disparity Estimation,” ICASSP, pp. 885-888, 2011.

Framework

Proposed Algorithm

Support Weight Using Correlated Color and Motion • The support weight depends on: • The color difference • The motion difference • γm: motion parameter • γs: similarity parameter

Support Weight Using Correlated Color and Motion • Color difference: computed between the color coordinates of pixel c and neighbor pixel q in the CIELab color space • Truncated motion difference: computed between the flow vectors [10] of pixel c and neighbor pixel q • τ: truncation value
[10] D. Sun, S. Roth, and M. J. Black, “Secrets of Optical Flow Estimation and Their Principles,” CVPR, pp. 2432-2439, 2010.
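The slide's support-weight equation did not survive extraction, so the following is only a minimal sketch of how color and motion differences could be combined. It assumes an exponential form in the style of the adaptive support-weight approach [6], a Euclidean color distance in CIELab, and a flow-vector distance capped at τ. The function name `support_weight` and the exact combination are assumptions, not the authors' confirmed formula. The defaults γs = 17, γm = 1, and τ = 1 are the values listed on the Experimental Results slide.

```python
import numpy as np

def support_weight(lab_c, lab_q, flow_c, flow_q,
                   gamma_s=17.0, gamma_m=1.0, tau=1.0):
    """Assumed form: w = exp(-(dc / gamma_s + dm / gamma_m))."""
    # Color difference: Euclidean distance between CIELab coordinates (assumed).
    dc = np.linalg.norm(np.asarray(lab_c, float) - np.asarray(lab_q, float))
    # Truncated motion difference: flow-vector distance capped at tau.
    dm = min(np.linalg.norm(np.asarray(flow_c, float) - np.asarray(flow_q, float)), tau)
    return float(np.exp(-(dc / gamma_s + dm / gamma_m)))
```

With this form, a neighbor with the same color but a clearly different motion has its weight reduced by at most a factor of exp(-τ/γm), so the motion term separates moving objects from the background without overriding the color term.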

Benefits of a Motion Cue • The “car” video frames (480×270, 15 disparity levels) • Figure panels: Proximity + Similarity; Proximity + Similarity + Motion; Proximity

Modified Census Transform • Problem: it is difficult to find the correct correspondences in flat areas • The census matching cost is extremely sensitive to image noise there, since all pixels in flat areas have a similar intensity • Solution: a three-mode census transform with a noise buffer

Modified Census Transform • Using two bits to implement three modes • α: noise buffer threshold • Set 10 if (neighbor pixel intensity) - (center pixel intensity) > α • Set 01 if (neighbor pixel intensity) - (center pixel intensity) < -α • Set 00 otherwise • Intensity value 0~50: α = 0 • Intensity value 50~100: α = 1 • Intensity value 100~150: α = 2 • Intensity value 150~200: α = 3 • Intensity value 200~255: α = 4
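The three-mode rule can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes the 01 case uses the symmetric threshold -α (so that 00 covers the whole noise buffer), that α is chosen from the center pixel's intensity, and that the shared boundaries (50, 100, ...) fall into the upper range. The names `noise_buffer` and `three_mode_census` are hypothetical.

```python
import numpy as np

def noise_buffer(center):
    """Noise buffer threshold alpha for a center intensity in 0..255."""
    # Ranges from the slide: 0~50 -> 0, 50~100 -> 1, ..., 200~255 -> 4.
    return min(int(center) // 50, 4)

def three_mode_census(img, y, x, radius=3):
    """Two-bit census codes for the (2*radius+1)^2 window centered at (y, x).

    The caller must keep the window inside the image.
    """
    center = int(img[y, x])
    alpha = noise_buffer(center)
    patch = img[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(int)
    diff = patch - center
    codes = np.zeros(diff.shape, dtype=np.uint8)  # 00: within the noise buffer
    codes[diff > alpha] = 0b10                    # neighbor clearly brighter
    codes[diff < -alpha] = 0b01                   # neighbor clearly darker
    return codes.ravel()
```

In a flat region, small intensity fluctuations stay inside the buffer and map to 00, so noise no longer flips census bits the way it does in the original one-bit transform.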

Modified Census Transform • The raw matching cost combines two terms: • Intensity difference: compares the two center pixels • Hamming distance: compares the spatial structure; calculated by the bitwise XOR operation on the census transforms
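The slide's cost equation is not recoverable, so the sketch below only illustrates the two ingredients. The Hamming distance via bitwise XOR follows the slide. The way the two terms are combined (an exponential normalization of each term, summed) is an assumption, as is the reading of γI = 3 and γH = 20 from the Experimental Results slide as the normalization constants. The function names are hypothetical.

```python
import numpy as np

def hamming_distance(codes_left, codes_right):
    """Number of differing bits between two arrays of two-bit census codes."""
    x = np.bitwise_xor(codes_left, codes_right)
    # Each code has two bits; count the set bits of the XOR result.
    return int(np.sum((x & 1) + (x >> 1)))

def raw_cost(i_left, i_right, codes_left, codes_right,
             gamma_i=3.0, gamma_h=20.0):
    """Assumed combination: each term is mapped to [0, 1) and the two are summed."""
    d_i = abs(float(i_left) - float(i_right))        # center-pixel intensity difference
    d_h = hamming_distance(codes_left, codes_right)  # spatial-structure difference
    return float(2.0 - np.exp(-d_i / gamma_i) - np.exp(-d_h / gamma_h))
```

Under this assumed form, the intensity term still discriminates between candidates in flat areas, where most census codes are 00 and the Hamming distance alone carries little information.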

Aggregation and Disparity Computation • Aggregated matching cost: computed over the left and right support windows • Winner-take-all (WTA): selects the disparity with the lowest aggregated cost • w(cd, qd): support weight of pixel qd in the right window • D: the set of all possible disparities
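As a rough sketch of this step for a single pixel, the code below assumes the normalized weighted sum used in the adaptive support-weight approach [6], where the left-window and right-window weights are multiplied, and then picks the disparity with the minimum aggregated cost. The array layout and the name `aggregate_and_select` are assumptions made for illustration.

```python
import numpy as np

def aggregate_and_select(w_left, w_right, raw_costs):
    """Aggregate raw costs for one pixel and pick a disparity by WTA.

    w_left:    (n,) support weights in the left window
    w_right:   (num_disp, n) support weights in the right window, per disparity
    raw_costs: (num_disp, n) raw matching costs, per disparity
    """
    joint = w_left[None, :] * w_right  # combined left/right support weight
    aggregated = (joint * raw_costs).sum(axis=1) / joint.sum(axis=1)
    # Winner-take-all: the disparity index with the lowest aggregated cost.
    return int(np.argmin(aggregated)), aggregated
```

The returned index is a position within the set D of candidate disparities, so a caller would map it back to the actual disparity value.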

Aggregation and Disparity Computation • Figure panels: Left view; Original census; Modified census (without intensity difference); Modified census

Spatio-temporal Consistency [3] • Problem: simply applying image-based algorithms to individual frames gives temporally inconsistent results (even with the best methods) • Solution: consider the sequence of disparity maps as a space-time volume • A three-dimensional function f(x,y,t) with • (x,y): spatial coordinates • t: temporal coordinate • The piecewise smooth solution: • has less temporal noise • preserves the disparity information as much as possible

Spatio-temporal Consistency [3] • l1-minimization problem: • f: unknown disparity map • g: initial disparity map from the previous step • D: forward difference operator • Piecewise smooth solution via the total variation norm • Video restoration problem: • g = Hf + η • f: unknown image (M×N) • g: observed image (M×N) • H: linear transformation representing the convolution operator • η: noise

Spatio-temporal Consistency [3] • l1-minimization problem: • f: unknown disparity map f(x,y,t) • Each frame of the video: M rows, N columns • Total: K frames • Stack the entries of f(x,y,t) into a column vector of size MNK × 1 • Axes: x (M rows), y (N columns), t (K frames)

Spatio-temporal Consistency [3] • l1-minimization problem: • D: forward difference operator • Its parameters are constants
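The objective itself is missing from the extracted slides. As a hedged illustration only, the sketch below evaluates an objective of the form μ‖f − g‖₁ + ‖Df‖₂, which is one common way to pair an l1 data term with a total variation norm. The weights `mu` and `beta`, the periodic boundary handling via `np.roll`, and both function names are assumptions, not details confirmed by the slides.

```python
import numpy as np

def forward_differences(f, beta=(1.0, 1.0, 1.0)):
    """Weighted forward differences of a volume f with shape (M, N, K)."""
    bx, by, bt = beta
    # np.roll gives periodic boundaries; this boundary choice is an assumption.
    dx = bx * (np.roll(f, -1, axis=0) - f)  # along x (rows)
    dy = by * (np.roll(f, -1, axis=1) - f)  # along y (columns)
    dt = bt * (np.roll(f, -1, axis=2) - f)  # along t (frames)
    return dx, dy, dt

def tv_l1_objective(f, g, mu=1.0, beta=(1.0, 1.0, 1.0)):
    """Assumed objective: mu * ||f - g||_1 + sum of per-voxel gradient magnitudes."""
    dx, dy, dt = forward_differences(f, beta)
    tv = np.sum(np.sqrt(dx ** 2 + dy ** 2 + dt ** 2))
    return float(mu * np.sum(np.abs(f - g)) + tv)
```

Calling `f.ravel()` on the (M, N, K) array gives an MNK-element vector of the size described on the slide; the sketch keeps the 3D shape because the differences are easier to read that way.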

Spatio-temporal Consistency [3] • Solve for a piecewise smooth solution using the total variation norm • Solve the sub-problems for f, u, and r iteratively
[1] S. H. Chan, R. Khoshabeh, K. B. Gibson, P. E. Gill, and T. Q. Nguyen, “An Augmented Lagrangian Method for Total Variation Video Restoration,” in ICASSP, May 2011.

Experimental Results

Experimental Results • 5 synthetic videos with ground truth [14] (400×300, 64 disparity range) • Compare LASW, Cost-filter, and the proposed method • Without post-processing • γs = 17, γm = 1, γI = 3, γH = 20, and τ = 1 • Support window: 11×11, Census window: 7
[14] C. Richardt, D. Orr, I. Davies, A. Criminisi, and N. A. Dodgson, “Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid,” ECCV, 2010.

Experimental Results • Jamie1 from Microsoft i2i database

Experimental Results • Ilkay from Microsoft i2i database

Experimental Results • Tunnel

Experimental Results • Performance comparison of methods: the average percentage of bad pixels (threshold of 1)
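For reference, the bad-pixel metric can be sketched as below. It assumes the usual definition, in which a pixel is bad when its absolute disparity error exceeds the threshold. The function name `bad_pixel_percentage` is hypothetical.

```python
import numpy as np

def bad_pixel_percentage(disparity, ground_truth, threshold=1.0):
    """Percentage of pixels whose absolute disparity error exceeds the threshold."""
    err = np.abs(np.asarray(disparity, float) - np.asarray(ground_truth, float))
    return 100.0 * float(np.mean(err > threshold))
```

Averaging this value over all frames of a sequence gives a per-video score of the kind reported in the comparison.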

Experimental Results • 19 s to compute the disparity map • Can be adapted to a real-time application (by using a GPU) • Refinement using the TV method [3] reduces errors in the background (spatial noise and temporal inconsistencies)

Experimental Results • Spatio-Temporal Consistency [3]

Conclusion

Conclusion • Proposed an accurate local stereo matching method for video disparity estimation • Motion cue • To obtain more accurate support weights • Modified census transform • To obtain more reliable raw matching costs in flat areas • Spatio-temporal volume • To improve spatial and temporal consistency • The method shows the possibility of directly extending current image-based disparity algorithms to the video domain
