Multi-view video based multiple objects segmentation using graph cut and spatiotemporal projections Journal of Visual Communication and Image Representation Volume 21, Issues 5–6, July–August 2010, Pages 453–461 Qian Zhang, King Ngi Ngan Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Speaker: Yi-Ting Chen
Outline • Introduction • The proposed framework • Method • Segmentation for key view • Multi-view video segmentation • Experimental results • Conclusion
Introduction • Most of the interest has been focused on research into single-view segmentation. • Depth information of the 3D scene can be reconstructed from multi-view images, but multiple-view segmentation has not attracted much attention. • Most of the classical and state-of-the-art graph cut based segmentation algorithms require user intervention to specify the initial foreground and background regions as hard constraints.
Overview of the proposed framework • We built a five-view camera system for views v∈{0,1,2,3,4} • To reduce the projection error and avoid an extensive computational load, we select view 2 as the key view to start the segmentation process.
Automatic initial interested objects (IIOs) extraction based on saliency model (SM) • Inspired by the work in [33], more sophisticated cues such as motion and depth are combined into our topographical SM. (a) input image, (b) saliency map using depth and motion, (c) extracted IIOs. [33] W.X. Yang, K.N. Ngan, Unsupervised multiple object segmentation of multiview images, Advanced Concepts for Intelligent Vision Systems Conference (2007), pp. 178–189
Multiple objects segmentation using graph cut • For each individual object, we construct a sub-graph over the pixels belonging to its “Object Rectangle” • an enlarged rectangle that encompasses the whole object and restricts the segmentation region • this converts multiple-object segmentation into several sub-segmentation problems
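A minimal sketch of how such an enlarged “Object Rectangle” could be derived from one object’s pixels; the function name and the margin value are illustrative, not from the paper:

```python
# Sketch (not the authors' code): derive an enlarged bounding rectangle
# for one initial object so that graph cut runs only on a sub-image.
def object_rectangle(mask_coords, img_h, img_w, margin=10):
    """mask_coords: list of (row, col) pixels of one initial object.
    Returns (top, left, bottom, right) of the enlarged rectangle,
    clamped to the image bounds. `margin` is an assumed enlargement."""
    rows = [r for r, _ in mask_coords]
    cols = [c for _, c in mask_coords]
    top = max(min(rows) - margin, 0)
    left = max(min(cols) - margin, 0)
    bottom = min(max(rows) + margin, img_h - 1)
    right = min(max(cols) + margin, img_w - 1)
    return top, left, bottom, right
```

Each object’s sub-graph is then built only over the pixels inside its rectangle, which keeps the per-object cut small.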
Object segmentation using graph cut • Graph cut • The general formulation of the energy function: E(f) = Σp Ep(fp) + Σ(p,q)∈N Ep,q(fp, fq), where Ep(fp) is the data term and Ep,q(fp, fq) is the smoothness term
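The two-term energy can be evaluated directly for any candidate labeling; a minimal sketch, where the Potts-style smoothness penalty and all names are illustrative rather than the paper’s exact formulation:

```python
def energy(labels, data_cost, smooth_cost, neighbors):
    """E(f) = sum_p E_p(f_p) + sum_{(p,q) in N} E_pq(f_p, f_q).
    labels: dict pixel -> label; data_cost: dict pixel -> [cost per label];
    neighbors: list of neighboring pixel pairs (p, q)."""
    e = sum(data_cost[p][labels[p]] for p in labels)          # data term
    e += sum(smooth_cost(labels[p], labels[q]) for p, q in neighbors)  # smoothness term
    return e

# Potts-style smoothness: constant penalty when neighboring labels differ.
potts = lambda fp, fq: 2.0 if fp != fq else 0.0
```

Graph cut finds the labeling minimizing this energy by computing a minimum s–t cut on a graph whose edge weights encode both terms.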
Basic energy function • Data term • evaluates the likelihood of a certain pixel p being assigned the label fp • color (RGB) and depth information are combined: the color distribution is modeled by a Gaussian Mixture Model (GMM) and the depth is modeled by a histogram model • g(·) denotes a Gaussian probability distribution, h(·) is the histogram model, and w(·) is the mixture weighting coefficient; the GMM component variable indexes the mixture • the feature vector for pixel p is four-dimensional, {d, r, g, b}
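A sketch of a data cost of this shape, written as the negative log of a combined likelihood. For brevity the GMM here is over a single color value rather than a full RGB vector, and all parameter names are illustrative:

```python
import math

def gaussian(x, mean, var):
    """1-D Gaussian probability density g(x)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def data_cost(color, depth, gmm, depth_hist):
    """-log of the combined likelihood: a GMM over color and a histogram
    over depth. gmm: list of (weight, mean, var) components w(.) and g(.);
    depth_hist: dict depth bin -> probability h(.). Illustrative only."""
    p_color = sum(w * gaussian(color, m, v) for w, m, v in gmm)
    p_depth = depth_hist.get(depth, 1e-6)       # unseen bins get a tiny prob
    return -math.log(p_color * p_depth + 1e-12)  # guard against log(0)
```

A likely pixel (high combined probability) gets a low cost, so the cut prefers keeping it under the corresponding label.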
Basic energy function • Smoothness term • Ep,q(fp,fq) measures the penalty of assigning different labels to two neighboring pixels p and q • dist(p,q) is the coordinate distance between p and q • diff(cp,cq) is the average RGB color difference between p and q • βr = (2⟨(rp−rq)²⟩)⁻¹, where ⟨·⟩ is the expectation operator for the red channel.
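A sketch of the channel-wise β estimate and a contrast-sensitive smoothness penalty of the usual exponential form; the exact combination in the paper may differ, so treat the `smoothness` shape as an assumption:

```python
import math

def beta_from_pairs(pairs):
    """beta_r = (2 * <(r_p - r_q)^2>)^-1, with the expectation taken
    over all neighboring pixel pairs of one channel."""
    mean_sq = sum((rp - rq) ** 2 for rp, rq in pairs) / len(pairs)
    return 1.0 / (2.0 * mean_sq)

def smoothness(fp, fq, diff, dist, beta):
    """Contrast-sensitive penalty, charged only when the two neighboring
    labels differ; low color difference -> high penalty for cutting here."""
    if fp == fq:
        return 0.0
    return math.exp(-beta * diff ** 2) / dist
```

The normalization by β adapts the penalty to the overall contrast level of the image, so low-contrast images are not over-smoothed.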
The result with the basic energy function • segmentation using the basic energy function with: (a) color, (b) depth, (c) combined color and depth
The segmentation errors in the rectangles • errors occur because the color and depth information of these regions is very similar to the foreground data
Background penalty with occlusion reasoning (1/2) • Since we capture the same scene from different viewpoints, occluded background regions often occur around the object boundary. • the occluded regions have a higher probability of being background than the visible ones. • we impose a background penalty factor αbp = 3.5
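One way such a penalty could be applied: scale the background data cost down for occluded pixels, making the background label cheaper there. Whether the factor divides the cost or multiplies the probability is an assumption here; only the value αbp = 3.5 comes from the slides:

```python
def penalized_bg_cost(bg_cost, occluded, alpha_bp=3.5):
    """Sketch: lower the cost of the background label for occluded pixels.
    bg_cost: per-pixel background data costs; occluded: per-pixel booleans
    from the occlusion map. alpha_bp = 3.5 as stated in the slides."""
    return [c / alpha_bp if occ else c for c, occ in zip(bg_cost, occluded)]
```

After this adjustment the graph cut is biased toward labeling occluded boundary regions as background, which is where the earlier errors occurred.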
Background penalty with occlusion reasoning (2/2) • (a) background probability map without occlusion penalty, (b) combined occlusion map, (c) background probability map with occlusion penalty
The erroneous segmentations marked by ellipses • errors are mainly caused by the strong color contrast in the background compared to the weak contrast across the “true” object boundary
Foreground contrast enhancement (1/3) • To make the color contrast representation more effective • the average color difference is computed in the perceptually uniform L*a*b* color space • To enhance the contrast across the foreground/background boundary and attenuate the background contrast • we adopt the motion residual information
Foreground contrast enhancement (2/3) • The motion residual of a pixel is the difference between the current frame and the image reconstructed from the previous frame by motion compensation with the estimated motion field • the smoothness term combines the L*a*b* color contrast with the motion residual contrasts of p and q
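A minimal sketch of one way the two contrast cues could be combined for a neighboring pair; the linear blend and the weight `lam` are assumptions, not the paper’s formula:

```python
def combined_contrast(lab_diff, res_p, res_q, lam=0.5):
    """Blend L*a*b* color contrast with motion-residual contrast for a
    neighboring pixel pair (p, q). lab_diff: average L*a*b* difference;
    res_p, res_q: motion residuals; `lam` is an assumed weight."""
    return (1.0 - lam) * lab_diff + lam * abs(res_p - res_q)
```

Static background pairs have small motion residuals on both sides, so their combined contrast drops even when their color contrast is strong.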
Foreground contrast enhancement (3/3) • But combining the color contrast and the motion residual contrast will not only attenuate the background contrast but also weaken the “true” foreground contrast
Foreground contrast enhancement (3/3) • we define a local color contrast to enhance the discontinuity distribution in its neighborhood • calculate the local mean μ and the local variance δ of the contrast [26] J. Wang, P. Bhat, R.A. Colburn, M. Agrawala, M.F. Cohen, Interactive video cutout, ACM Transactions on Graphics 24 (2005) 585–594.
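The enhancement formula itself is not recoverable from the slide, so the sketch below is a stand-in: it boosts contrasts above the local mean and attenuates those below, using the local statistics μ and δ mentioned above. The `tanh` shaping and the `gain` parameter are invented for illustration:

```python
import math

def enhance(c, mu, sigma, gain=2.0, eps=1e-6):
    """Illustrative local contrast enhancement: contrasts above the local
    mean mu are amplified, those below are attenuated, with the local
    standard deviation sigma setting the scale. Not the paper's formula."""
    z = (c - mu) / (sigma + eps)          # how unusual this contrast is locally
    return c * (1.0 + 0.5 * math.tanh(gain * z))
```

The intent matches the slide: a genuine boundary stands out from its neighborhood and is strengthened, while uniformly strong background texture is not.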
Multi-view video segmentation • the mask for a neighboring view is projected by pixel-based disparity compensation from the segmented key view • this exploits the spatial consistency across views • the mask for the current frame is projected by pixel-based motion compensation from the mask of its previous frame • this enforces the temporal consistency
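Both projections move each mask pixel by a per-pixel displacement (a disparity across views, or a motion vector across frames). A 1-D scanline sketch, with illustrative names; a 2-D mask is handled the same way row by row:

```python
def project_mask(mask, displacement, width):
    """Project a binary mask scanline into another view/frame by per-pixel
    displacement compensation. mask: list of 0/1; displacement: per-pixel
    integer shifts (disparity or motion); pixels landing outside are dropped."""
    out = [0] * width
    for x, (m, d) in enumerate(zip(mask, displacement)):
        tx = x + d
        if m and 0 <= tx < width:
            out[tx] = 1
    return out
```

The projected mask serves only as a prediction; it is refined afterwards, since disparity and motion estimates are noisy near object boundaries.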
Uncertain boundary band validation • To improve the segmentation results, we construct an uncertain band along the object boundary based on an activity measure • our graph cut algorithm is then applied within the band to yield more accurate segmentation layers • (a) prediction mask of view 3, (b) uncertain band for the post-processing
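A simple way to obtain such a band is the difference between a dilated and an eroded copy of the predicted mask; this morphological construction is an assumed stand-in for the paper’s activity measure. A 1-D sketch:

```python
def uncertain_band(mask, radius=1):
    """Band = dilation minus erosion of a binary mask (1-D sketch).
    Pixels inside the band are re-labeled by graph cut; pixels outside
    keep their predicted label. `radius` sets the band half-width."""
    n = len(mask)
    dil = [1 if any(mask[max(0, i - radius):i + radius + 1]) else 0 for i in range(n)]
    ero = [1 if all(mask[max(0, i - radius):min(n, i + radius + 1)]) else 0 for i in range(n)]
    return [d - e for d, e in zip(dil, ero)]
```

Restricting the cut to the band keeps the refinement cheap while correcting exactly the region where projection errors concentrate.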
Experimental results • five-view camera system • resolution of 640 × 480 at a frame rate of 30 frames per second (fps) • we demonstrate on two types of multi-view videos simulating different scenarios • objects at similar, low depths • objects at different depths
Comparison with others’ methods (1/2) • compare with Kolmogorov’s bilayer segmentation algorithm [21] using their test images (a) left view, (b) right view, (c) result by our proposed algorithm, (d) result by Kolmogorov’s algorithm. [21] V. Kolmogorov, A. Criminisi, A. Blake, G. Cross, C. Rother, Probabilistic fusion of stereo with color and contrast for bi-layer segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (9) (2006) 1480–1492.
Comparison with others’ methods (2/2) • compare our proposed algorithm with an existing method employing multi-way cut with α-expansion [33] using our test images [33] W.X. Yang, K.N. Ngan, Unsupervised multiple object segmentation of multiview images, Advanced Concepts for Intelligent Vision Systems Conference (2007), pp. 178–189
Conclusion • In this paper, we propose an automatic segmentation algorithm for multiple objects from multi-view video. • Experiments were conducted on two representative multi-view videos. • Accurate segmentation results with good visual quality and favorable subjective comparison with other methods attest to the efficiency and robustness of the proposed algorithm.