Multi-view video based multiple objects segmentation using graph cut and spatiotemporal projections Journal of Visual Communication and Image Representation Volume 21, Issues 5–6, July–August 2010, Pages 453–461 Qian Zhang, King Ngi Ngan Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Speaker: Yi-Ting Chen
Outline • Introduction • The proposed framework • Method • Segmentation for key view • Multi-view video segmentation • Experimental results • Conclusion
Introduction • Most of the interest has been focused on research into single-view segmentation. • Depth information of the 3D scene can be reconstructed from multi-view images, but multiple-view segmentation has not attracted much attention. • Most of the classical and state-of-the-art graph cut based segmentation algorithms require user intervention to specify the initial foreground and background regions as hard constraints.
Overview of the proposed framework • We built a five-view camera system for views v∈{0,1,2,3,4} • To reduce the projection error and avoid an extensive computational load, we select view 2 as the key view to start the segmentation process.
Automatic initial interested objects (IIOs) extraction based on saliency model (SM) • Inspired by the work in [33], more sophisticated cues such as motion and depth are combined into our topographical SM. (a) input image, (b) saliency map using depth and motion, (c) extracted IIOs. [33] W.X. Yang, K.N. Ngan, Unsupervised multiple object segmentation of multiview images, Advanced Concepts for Intelligent Vision Systems Conference (2007), pp. 178–189
Multiple objects segmentation using graph cut • For each individual object, we construct a sub-graph over the pixels belonging to its “Object Rectangle” • an enlarged rectangle that encompasses the whole object and restricts the segmentation region • this converts multiple-object segmentation into several sub-segmentation problems
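A minimal sketch of how such an enlarged “Object Rectangle” could be derived from one object’s pixels; the function name and the margin value are illustrative, not from the paper:

```python
# Sketch (not the authors' code): derive an enlarged bounding rectangle
# for one initial object so that graph cut runs only on a sub-image.
def object_rectangle(mask_coords, img_h, img_w, margin=10):
    """mask_coords: list of (row, col) pixels of one initial object.
    Returns (top, left, bottom, right) of the enlarged rectangle,
    clamped to the image bounds. `margin` is an assumed enlargement."""
    rows = [r for r, _ in mask_coords]
    cols = [c for _, c in mask_coords]
    top = max(min(rows) - margin, 0)
    left = max(min(cols) - margin, 0)
    bottom = min(max(rows) + margin, img_h - 1)
    right = min(max(cols) + margin, img_w - 1)
    return top, left, bottom, right
```

Each object’s sub-graph is then built only over the pixels inside its rectangle, which keeps the per-object cut small.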
Object segmentation using graph cut • Graph cut • The general formulation of the energy function: E(f) = Σp Ep(fp) + Σ(p,q)∈N Ep,q(fp, fq), where Ep(fp) is the data term and Ep,q(fp, fq) is the smoothness term
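The two-term energy can be evaluated directly for any candidate labeling; a minimal sketch, where the Potts-style smoothness penalty and all names are illustrative rather than the paper’s exact formulation:

```python
def energy(labels, data_cost, smooth_cost, neighbors):
    """E(f) = sum_p E_p(f_p) + sum_{(p,q) in N} E_pq(f_p, f_q).
    labels: dict pixel -> label; data_cost: dict pixel -> [cost per label];
    neighbors: list of neighboring pixel pairs (p, q)."""
    e = sum(data_cost[p][labels[p]] for p in labels)          # data term
    e += sum(smooth_cost(labels[p], labels[q]) for p, q in neighbors)  # smoothness term
    return e

# Potts-style smoothness: constant penalty when neighboring labels differ.
potts = lambda fp, fq: 2.0 if fp != fq else 0.0
```

Graph cut finds the labeling minimizing this energy by computing a minimum s–t cut on a graph whose edge weights encode both terms.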
Basic energy function • Data term • evaluates the likelihood of a certain pixel p being assigned the label fp • color (RGB) and depth information are combined: the color distribution is modeled by a Gaussian Mixture Model (GMM) and the depth is modeled by a histogram model • g(·) denotes a Gaussian probability distribution, h(·) is the histogram model, and w(·) is the mixture weighting coefficient; the GMM component variable indexes the mixture • the feature vector for pixel p is four-dimensional, {d, r, g, b}
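A sketch of a data cost of this shape, written as the negative log of a combined likelihood. For brevity the GMM here is over a single color value rather than a full RGB vector, and all parameter names are illustrative:

```python
import math

def gaussian(x, mean, var):
    """1-D Gaussian probability density g(x)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def data_cost(color, depth, gmm, depth_hist):
    """-log of the combined likelihood: a GMM over color and a histogram
    over depth. gmm: list of (weight, mean, var) components w(.) and g(.);
    depth_hist: dict depth bin -> probability h(.). Illustrative only."""
    p_color = sum(w * gaussian(color, m, v) for w, m, v in gmm)
    p_depth = depth_hist.get(depth, 1e-6)       # unseen bins get a tiny prob
    return -math.log(p_color * p_depth + 1e-12)  # guard against log(0)
```

A likely pixel (high combined probability) gets a low cost, so the cut prefers keeping it under the corresponding label.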
Basic energy function • Smoothness term • Ep,q(fp,fq) measures the penalty of assigning different labels to two neighboring pixels p and q • dist(p,q) is the coordinate distance between p and q • diff(cp,cq) is the average RGB color difference between p and q • βr = (2⟨(rp−rq)²⟩)⁻¹, where ⟨·⟩ is the expectation operator for the red channel.
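A sketch of the channel-wise β estimate and a contrast-sensitive smoothness penalty of the usual exponential form; the exact combination in the paper may differ, so treat the `smoothness` shape as an assumption:

```python
import math

def beta_from_pairs(pairs):
    """beta_r = (2 * <(r_p - r_q)^2>)^-1, with the expectation taken
    over all neighboring pixel pairs of one channel."""
    mean_sq = sum((rp - rq) ** 2 for rp, rq in pairs) / len(pairs)
    return 1.0 / (2.0 * mean_sq)

def smoothness(fp, fq, diff, dist, beta):
    """Contrast-sensitive penalty, charged only when the two neighboring
    labels differ; low color difference -> high penalty for cutting here."""
    if fp == fq:
        return 0.0
    return math.exp(-beta * diff ** 2) / dist
```

The normalization by β adapts the penalty to the overall contrast level of the image, so low-contrast images are not over-smoothed.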
The result with the basic energy function • segmentation using the basic energy function with: (a) color, (b) depth, (c) combined color and depth
The segmentation errors in the rectangles • errors occur because the color and depth information of these regions is very similar to the foreground data
Background penalty with occlusion reasoning (1/2) • Since we capture the same scene from different viewpoints, occluded background regions often occur around the object boundary. • the occluded regions have a higher probability of being background than the visible ones. • we impose a background penalty factor αbp = 3.5
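One way such a penalty could be applied: scale the background data cost down for occluded pixels, making the background label cheaper there. Whether the factor divides the cost or multiplies the probability is an assumption here; only the value αbp = 3.5 comes from the slides:

```python
def penalized_bg_cost(bg_cost, occluded, alpha_bp=3.5):
    """Sketch: lower the cost of the background label for occluded pixels.
    bg_cost: per-pixel background data costs; occluded: per-pixel booleans
    from the occlusion map. alpha_bp = 3.5 as stated in the slides."""
    return [c / alpha_bp if occ else c for c, occ in zip(bg_cost, occluded)]
```

After this adjustment the graph cut is biased toward labeling occluded boundary regions as background, which is where the earlier errors occurred.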
Background penalty with occlusion reasoning (2/2) • (a) background probability map without occlusion penalty, (b) combined occlusion map, (c) background probability map with occlusion penalty
The erroneous segmentations marked by ellipses • errors are mainly caused by the strong color contrast in the background compared to the weak contrast across the “true” object boundary
Foreground contrast enhancement (1/3) • To make the color contrast representation more effective • the average color difference is computed in the perceptually uniform L*a*b* color space • To enhance the contrast across the foreground/background boundary and attenuate the background contrast • we adopt the motion residual information
Foreground contrast enhancement (2/3) • The motion residual of a pixel is the difference between the current frame and the image reconstructed from the previous frame by motion compensation with the estimated motion field • the smoothness term combines the L*a*b* color contrast with the motion residual contrasts of p and q
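A minimal sketch of one way the two contrast cues could be combined for a neighboring pair; the linear blend and the weight `lam` are assumptions, not the paper’s formula:

```python
def combined_contrast(lab_diff, res_p, res_q, lam=0.5):
    """Blend L*a*b* color contrast with motion-residual contrast for a
    neighboring pixel pair (p, q). lab_diff: average L*a*b* difference;
    res_p, res_q: motion residuals; `lam` is an assumed weight."""
    return (1.0 - lam) * lab_diff + lam * abs(res_p - res_q)
```

Static background pairs have small motion residuals on both sides, so their combined contrast drops even when their color contrast is strong.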
Foreground contrast enhancement (3/3) • But combining the color contrast and the motion residual contrast will not only attenuate the background contrast but also weaken the “true” foreground contrast
Foreground contrast enhancement (3/3) • we define a local color contrast to enhance the discontinuity distribution in its neighborhood • calculate the local mean μ and the local variance δ of the contrast [26] J. Wang, P. Bhat, R.A. Colburn, M. Agrawala, M.F. Cohen, Interactive video cutout, ACM Transactions on Graphics 24 (2005) 585–594.
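The enhancement formula itself is not recoverable from the slide, so the sketch below is a stand-in: it boosts contrasts above the local mean and attenuates those below, using the local statistics μ and δ mentioned above. The `tanh` shaping and the `gain` parameter are invented for illustration:

```python
import math

def enhance(c, mu, sigma, gain=2.0, eps=1e-6):
    """Illustrative local contrast enhancement: contrasts above the local
    mean mu are amplified, those below are attenuated, with the local
    standard deviation sigma setting the scale. Not the paper's formula."""
    z = (c - mu) / (sigma + eps)          # how unusual this contrast is locally
    return c * (1.0 + 0.5 * math.tanh(gain * z))
```

The intent matches the slide: a genuine boundary stands out from its neighborhood and is strengthened, while uniformly strong background texture is not.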
Multi-view video segmentation • the mask for a neighboring view is projected by pixel-based disparity compensation from the segmented key view • this exploits the spatial consistency across views • the mask for the current frame is projected by pixel-based motion compensation from the mask of its previous frame • this enforces the temporal consistency
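Both projections move each mask pixel by a per-pixel displacement (a disparity across views, or a motion vector across frames). A 1-D scanline sketch, with illustrative names; a 2-D mask is handled the same way row by row:

```python
def project_mask(mask, displacement, width):
    """Project a binary mask scanline into another view/frame by per-pixel
    displacement compensation. mask: list of 0/1; displacement: per-pixel
    integer shifts (disparity or motion); pixels landing outside are dropped."""
    out = [0] * width
    for x, (m, d) in enumerate(zip(mask, displacement)):
        tx = x + d
        if m and 0 <= tx < width:
            out[tx] = 1
    return out
```

The projected mask serves only as a prediction; it is refined afterwards, since disparity and motion estimates are noisy near object boundaries.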
Uncertain boundary band validation • To improve the segmentation results, we construct an uncertain band along the object boundary based on an activity measure • our graph cut algorithm is then applied within the band to yield more accurate segmentation layers • (a) prediction mask of view 3, (b) uncertain band for the post-processing
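A simple way to obtain such a band is the difference between a dilated and an eroded copy of the predicted mask; this morphological construction is an assumed stand-in for the paper’s activity measure. A 1-D sketch:

```python
def uncertain_band(mask, radius=1):
    """Band = dilation minus erosion of a binary mask (1-D sketch).
    Pixels inside the band are re-labeled by graph cut; pixels outside
    keep their predicted label. `radius` sets the band half-width."""
    n = len(mask)
    dil = [1 if any(mask[max(0, i - radius):i + radius + 1]) else 0 for i in range(n)]
    ero = [1 if all(mask[max(0, i - radius):min(n, i + radius + 1)]) else 0 for i in range(n)]
    return [d - e for d, e in zip(dil, ero)]
```

Restricting the cut to the band keeps the refinement cheap while correcting exactly the region where projection errors concentrate.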
Experimental results • five-view camera system • resolution of 640 × 480 at a frame rate of 30 frames per second (fps) • we demonstrate on two types of multi-view videos simulating different scenarios • objects at similar, low depths • objects at different depths
Comparison with others’ methods (1/2) • compare with Kolmogorov’s bilayer segmentation algorithm [21] using their test images (a) left view, (b) right view, (c) result by our proposed algorithm, (d) result by Kolmogorov’s algorithm. [21] V. Kolmogorov, A. Criminisi, A. Blake, G. Cross, C. Rother, Probabilistic fusion of stereo with color and contrast for bi-layer segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (9) (2006) 1480–1492.
Comparison with others’ methods (2/2) • compare our proposed algorithm with an existing method employing multi-way cut with α-expansion [33] using our test images [33] W.X. Yang, K.N. Ngan, Unsupervised multiple object segmentation of multiview images, Advanced Concepts for Intelligent Vision Systems Conference (2007), pp. 178–189
Conclusion • In this paper, we propose an automatic segmentation algorithm for multiple objects from multi-view video. • Experiments were conducted on two representative multi-view videos. • Accurate segmentation results with good visual quality and favorable subjective comparison with other methods attest to the efficiency and robustness of the proposed algorithm.