Utilization of 3D Scene Data for Improving Segmentation and Tracking

Yingdong Ma • Supervisors: Dr. Stewart Worrall and Prof. Ahmet Kondoz

Outline • Introduction • Surface Reconstruction from Unorganised Sample Points • 3D Object Segmentation • Depth Assisted Video Object Segmentation • Depth Assisted Video Object Tracking • Conclusions and Future Work • List of Publications

Introduction • Motivation: • Automatic video object detection and tracking is a difficult task for current mono-view video processing systems • The ability to handle new input, such as 3D scene data

Introduction • Reconstruction of the 3D scene further benefits video content detection • Possible application: NASA’s Mars Pathfinder mission • Decomposition of a 3D object model reveals structural information (Image from NASA Mars Pathfinder web page)

Introduction • Aim: development of techniques for better utilization of 3D scene data in semantic object segmentation and tracking • Objectives: • 3D object surface reconstruction • 3D object segmentation • Depth assisted video object segmentation • Depth assisted multiple object tracking

3D Object Surface Reconstruction • Given a point cloud on or near the object surface, recover the geometric shape from the point cloud and represent it by means of a polygon mesh • Challenges: • The sample point set may be very large and noisy, and the object may have an arbitrary shape • Local undersampling, where point samples are not dense enough to capture local features

3D Object Surface Reconstruction • Typical reconstruction approaches overview • Functional approaches: an implicit surface is defined as the zero-set of a scalar function, such as weighted quadrics • Voronoi/Delaunay filtering: the surface mesh is obtained by extracting triangles from a Delaunay triangulation/Voronoi diagram • Region growing: starts from an initial triangle and iteratively attaches new triangles to the region’s boundaries

3D Object Surface Reconstruction • Delaunay/Voronoi-based methods • Provide geometric structure information for unorganized sample points • Require multiple Delaunay/Voronoi computations • Introduce extra points (poles) • Region growing methods • Computationally efficient • Reconstruction quality depends on user-defined parameters

3D Object Surface Reconstruction • Delaunay-based region growing method • Build the Delaunay triangulation • Start from the smallest top triangle • Select a candidate triangle for each boundary edge • 2D example (figure)
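A minimal Python sketch of the region-growing loop, under stated assumptions: the candidate triangles (vertex-index triples) are assumed to have been extracted from a precomputed Delaunay triangulation, the seed is simply the candidate with the smallest perimeter, and the default cost is also the perimeter. `grow_surface`, `perimeter` and the `score` parameter are hypothetical names; the seed rule and selection criterion used in this work may differ.

```python
import heapq
import math
from itertools import combinations


def perimeter(tri, pts):
    """Perimeter of a triangle given as a triple of indices into pts."""
    a, b, c = (pts[i] for i in tri)
    return math.dist(a, b) + math.dist(b, c) + math.dist(c, a)


def grow_surface(pts, candidates, score=None):
    """Greedy region growing over candidate triangles (hypothetical sketch).

    pts: list of 3D points; candidates: index triples, assumed to come
    from a Delaunay triangulation. Lower score = more confident.
    """
    if score is None:
        score = lambda tri: perimeter(tri, pts)  # assumed default cost
    tris = [tuple(sorted(t)) for t in candidates]
    # Map every edge to the candidate triangles that contain it.
    edge_to_tris = {}
    for t in tris:
        for e in combinations(t, 2):
            edge_to_tris.setdefault(e, []).append(t)

    seed = min(tris, key=score)  # assumed seed rule
    mesh = {seed}
    edge_use = {e: 1 for e in combinations(seed, 2)}
    heap = []

    def push_edge(e):
        # Queue every unused candidate that touches boundary edge e.
        for t in edge_to_tris[e]:
            if t not in mesh:
                heapq.heappush(heap, (score(t), t))

    for e in combinations(seed, 2):
        push_edge(e)

    while heap:
        _, t = heapq.heappop(heap)
        if t in mesh:
            continue
        edges = list(combinations(t, 2))
        # Keep the mesh manifold: no edge may be shared by more than two triangles.
        if any(edge_use.get(e, 0) >= 2 for e in edges):
            continue
        # Only attach across a current boundary edge (an edge used exactly once).
        if not any(edge_use.get(e, 0) == 1 for e in edges):
            continue
        mesh.add(t)
        for e in edges:
            edge_use[e] = edge_use.get(e, 0) + 1
            if edge_use[e] == 1:  # a new boundary edge
                push_edge(e)
    return mesh
```

The priority queue attaches the most confident boundary candidate first, and the edge-use counter rejects any triangle that would make an edge non-manifold.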

3D Object Surface Reconstruction • Selection of the candidate triangle • The confidence order of a candidate triangle is defined by its angular distance and its geometric distance
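The slide does not give the confidence formula, so the following is only a hypothetical cost that combines the two named terms: the angle between the normals of the mesh triangle and the candidate across their shared edge, and the distance from the candidate's free vertex to the edge midpoint, scaled by the edge length. The weights `w_angle` and `w_dist` and all function names are assumptions; a lower cost means a more confident candidate.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def unit_normal(p, q, r):
    n = cross(sub(q, p), sub(r, p))
    length = norm(n)
    return tuple(x / length for x in n)


def confidence_cost(edge, apex_mesh, apex_cand, w_angle=0.5, w_dist=0.5):
    """Hypothetical cost of attaching a candidate across a boundary edge.

    edge = (p, q); apex_mesh = third vertex of the mesh triangle on that edge;
    apex_cand = third vertex of the candidate triangle.
    """
    p, q = edge
    n_mesh = unit_normal(p, q, apex_mesh)
    # Traverse the shared edge in the opposite direction for a consistent orientation.
    n_cand = unit_normal(q, p, apex_cand)
    cos_a = max(-1.0, min(1.0, dot(n_mesh, n_cand)))
    angular = math.acos(cos_a) / math.pi  # 0 = flat continuation, 1 = folded back
    mid = tuple((a + b) / 2 for a, b in zip(p, q))
    geometric = norm(sub(apex_cand, mid)) / norm(sub(q, p))
    return w_angle * angular + w_dist * geometric
```

A candidate that continues the surface smoothly scores lower than one folded back onto the existing mesh, which is the behaviour the angular term is meant to capture at sharp boundaries.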

3D Object Surface Reconstruction • Experimental results (figure: reconstructed meshes labelled with the number of triangles selected so far: 200f, 1000f, 5000f, 10040f and 14785f)

3D Object Surface Reconstruction • Performance analysis • Efficient: the Delaunay triangulation is computed only once • Implemented in C and run on a laptop computer (1400 MHz, 512 MB memory) • Small holes are filled by greedy triangulation • Robust at sharp boundaries if the point cloud is dense enough
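One simple reading of greedy triangulation for hole filling is sketched below, assuming the hole is given as an ordered loop of boundary vertices: repeatedly cut off the vertex whose two neighbours are closest, so that each step adds the shortest possible new edge. `fill_hole_greedy` is a hypothetical name, and the sketch does not check for self-intersections, so it only suits small, roughly planar holes.

```python
import math


def fill_hole_greedy(loop):
    """Triangulate a hole given as an ordered loop of 3D boundary points.

    Returns triangles as index triples into loop (n - 2 triangles).
    """
    idx = list(range(len(loop)))
    tris = []
    while len(idx) > 3:
        n = len(idx)
        # Choose the vertex whose neighbours are closest: the shortest new edge.
        best = min(range(n),
                   key=lambda k: math.dist(loop[idx[k - 1]], loop[idx[(k + 1) % n]]))
        tris.append((idx[best - 1], idx[best], idx[(best + 1) % n]))
        del idx[best]  # the clipped vertex is no longer on the hole boundary
    if len(idx) == 3:
        tris.append(tuple(idx))
    return tris
```

For a square hole the four candidate diagonals are equally long, so the first vertex is clipped and the result is `[(3, 0, 1), (1, 2, 3)]`.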

3D Object Segmentation • Given a 3D object represented by a set of sample points or a surface mesh, the 3D object segmentation problem is to partition the object into meaningful parts based on the geometric properties of the sample points or faces that comprise it • Properly decomposing an object into meaningful components recovers useful structural properties of the 3D object • 3D object segmentation is an ill-posed problem: segmentation results depend heavily on the application context

3D Object Segmentation • Approaches overview • The fuzzy k-means clustering method computes the probability that a point/face belongs to a patch based on a distance measurement • Requires the number of components • Hierarchical binary decomposition method • Skeleton-based methods represent objects by a lower-dimensional structure • Skeleton generation: principal axis, edge contraction • Skeleton segmentation or sweeping of skeleton branches
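For reference, a minimal fuzzy k-means (also known as fuzzy c-means) sketch in plain Python. It alternates the standard membership-weighted centre update with the membership update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)); the fuzzifier `m = 2`, the random initialisation and the function name are assumptions, not the settings used in this work. Note that the number of clusters `c` has to be supplied.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, U), where U[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(iters):
        # Centres are means weighted by membership raised to the power m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sw
                                 for d in range(dim)))
        # Membership update from the distances to all centres.
        max_change = 0.0
        p = 2.0 / (m - 1.0)
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # The point coincides with a centre: give it full membership there.
                new = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(new)
                new = [u / s for u in new]
            else:
                new = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                       for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new, U[i])))
            U[i] = new
        if max_change < tol:
            break
    return centers, U
```

Each point keeps a graded membership in every cluster; a hard partition, if needed, takes the cluster with the largest membership.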

3D Object Segmentation • Aim: given a 3D object (point cloud), automatically divide the object into several meaningful components • System overview • Find the number of components using a set of local maxima and a fuzzy cluster validity index • Decompose the object by means of fuzzy k-means • Find the cut lines between components by max-flow/min-cut
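The slide names a fuzzy cluster validity index (a "separation index") without giving its form. The sketch below assumes the Xie-Beni index, which divides the fuzzy within-cluster scatter by the minimum squared distance between centres, so smaller values indicate compact, well-separated clusters. `xie_beni` and `best_number_of_components` are hypothetical helpers, and each candidate clustering (for example, one per number of local maxima) is assumed to be computed beforehand.

```python
import math
from itertools import combinations


def xie_beni(points, centers, U, m=2.0):
    """Xie-Beni index: fuzzy compactness / (n * minimum squared centre distance)."""
    n = len(points)
    compactness = sum(U[i][j] ** m * math.dist(points[i], centers[j]) ** 2
                      for i in range(n) for j in range(len(centers)))
    separation = min(math.dist(a, b) ** 2 for a, b in combinations(centers, 2))
    return compactness / (n * separation)


def best_number_of_components(candidates):
    """candidates: dict c -> (points, centers, U). Returns the c with the smallest index."""
    return min(candidates, key=lambda c: xie_beni(*candidates[c]))
```

With two clear groups, the two-cluster solution has tight clusters and well-separated centres, so it scores lower than a three-cluster solution that splits one group.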

3D Object Segmentation • Find local maxima • The distance measure: the shortest (geodesic) path between two vertices • The root vertex is the one with the longest geodesic distance to the other vertices • A vertex is labelled as a local maximum if all its neighbours are closer to the root vertex • Fuzzy cluster validity index (the separation index) • Order the local maxima by their geodesic distance • The smallest S(f) indicates the optimal number of components • (figures: separation index S(f); local maxima)
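A sketch of the local-maxima step on a mesh graph, assuming edge lengths as weights and Dijkstra's algorithm for the shortest-path (geodesic) distance. The root rule is read here as "the vertex with the largest total geodesic distance to all other vertices", which is an assumption. A vertex is reported as a local maximum when every neighbour is strictly closer to the root. Function names are hypothetical.

```python
import heapq


def dijkstra(adj, src):
    """adj: dict vertex -> list of (neighbour, edge_length). Returns geodesic distances."""
    dist = {v: float("inf") for v in adj}
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def find_local_maxima(adj):
    # Assumption: the root maximises the sum of geodesic distances to all other vertices.
    totals = {v: sum(dijkstra(adj, v).values()) for v in adj}
    root = max(totals, key=totals.get)
    d = dijkstra(adj, root)
    # A local maximum has every neighbour strictly closer to the root.
    maxima = [v for v in adj if all(d[u] < d[v] for u, _ in adj[v])]
    # Order the local maxima by geodesic distance from the root, farthest first.
    maxima.sort(key=lambda v: d[v], reverse=True)
    return root, maxima
```

Choosing the root this way costs one Dijkstra run per vertex, which is acceptable for a sketch; a large mesh would call for a cheaper root heuristic.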

3D Object Segmentation • Object segmentation by the fuzzy k-means method • Minimization of the objective function • Find the cut line between components • Give a large weight to an edge if the angle between the normals at its two end points is negative (concave edge)
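The cut-line step can be sketched as a max-flow/min-cut problem between two seed vertices on the mesh graph. Here the concave-edge weight is assumed to be mapped to a low capacity, `1 / (1 + weight)`, so that the minimum cut prefers to pass through concave edges. This mapping, the Edmonds-Karp solver and the names `concave_capacity` and `min_cut` are illustrative choices, not documented details of the method.

```python
from collections import deque


def concave_capacity(weight):
    # Hypothetical mapping: a large concavity weight gives a small capacity,
    # so the minimum cut is drawn through concave edges.
    return 1.0 / (1.0 + weight)


def min_cut(n, edges, source, sink):
    """edges: undirected (u, v, capacity). Returns the source-side vertex set."""
    cap = [[0.0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
        cap[v][u] += c
    while True:
        # Breadth-first search for an augmenting path (Edmonds-Karp).
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        flow, v = float("inf"), sink
        while v != source:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Vertices still reachable in the residual graph form the source side of the cut.
    return {v for v in range(n) if parent[v] != -1}
```

On a four-vertex path whose middle edge is strongly concave, the cut separates `{0, 1}` from `{2, 3}`.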

3D Object Segmentation • Experimental results (figures: separation index S(f); local maxima)

3D Object Segmentation • Performance evaluation • Automatic calculation of the optimal number of meaningful components • The MDS (multi-dimensional scaling) transform method • Unfolds the object model and constructs the convex hull of the transformed model • Critical points: local maxima on the convex hull

Depth Assisted Object Segmentation • A process that partitions a video scene into semantically meaningful components in a generic video sequence • Most vision-based systems involving video object tracking and moving object recognition require fast and reliable detection of foreground regions • The performance of video object segmentation techniques can be affected by factors such as shadows, illumination changes, cluttered backgrounds and background movement

Depth Assisted Object Segmentation • Object segmentation approaches overview • Spatial segmentation • Efficient for simple scenes but tends to over-segment cluttered backgrounds • Post-processing, such as merging small regions, is required • Motion-based segmentation • The frame difference method is fast but sensitive to shadows and background movement
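The frame difference method amounts to thresholding the absolute intensity change between consecutive frames. A minimal sketch on greyscale frames stored as nested lists; the function name and the threshold value are assumptions. Anything that changes brightness, such as a moving shadow, also passes the test, which is why the method is sensitive to shadows.

```python
def frame_difference_mask(prev, curr, threshold):
    """prev, curr: 2D lists of grey levels. Returns a binary mask of changed pixels."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]
```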

Depth Assisted Object Segmentation • Joint spatial-temporal segmentation takes into account motion information and various spatial features including colour, texture and edge • Depth-based segmentation • Robust in complex backgrounds but sensitive to object texture and the distance between objects and the camera

Depth Assisted Object Segmentation • Mono-view video loses some important information about the scene, such as depth • Develop an automatic segmentation method that extracts meaningful video objects by combining depth with other spatial-temporal features • System overview

Depth Assisted Object Segmentation • Object mask generation • Depth map segmentation • u/v-projection images • Motion segmentation • Frame difference images • Object boundary refinement by an active contour model • The object masks provide the initial contour • Intensity and edge information provide the external energy to be minimized in the active contour model
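Assuming the u/v-projection images are the usual u-disparity and v-disparity histograms of the disparity map, they can be computed as below: each column (u) or row (v) of the map becomes a histogram over disparity values. An upright object at a roughly constant disparity then appears as a horizontal segment in the u-disparity image. `uv_disparity` is a hypothetical helper, and disparities are assumed to be integers in `[0, max_d]`.

```python
def uv_disparity(disp, max_d):
    """disp: 2D list (rows x cols) of integer disparities in [0, max_d].

    Returns (u_disp, v_disp):
      u_disp[d][u] = number of pixels in column u with disparity d
      v_disp[v][d] = number of pixels in row v with disparity d
    """
    rows, cols = len(disp), len(disp[0])
    u_disp = [[0] * cols for _ in range(max_d + 1)]
    v_disp = [[0] * (max_d + 1) for _ in range(rows)]
    for v in range(rows):
        for u in range(cols):
            d = disp[v][u]
            u_disp[d][u] += 1
            v_disp[v][d] += 1
    return u_disp, v_disp
```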

Depth Assisted Object Segmentation • Experimental results • Depth map, motion-based segmentation, without boundary refinement • Depth map, motion, active contour model based segmentation

Depth Assisted Object Segmentation • Performance evaluation • Object segmentation by the proposed method and by the background subtraction method • How many of the relevant object pixels the proposed method extracts
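The second criterion appears to correspond to pixel-level recall. The metrics are not defined on the slide, so the sketch below assumes binary masks compared against a hand-labelled ground-truth mask, with recall = TP / (TP + FN) and precision = TP / (TP + FP). `mask_scores` is a hypothetical name.

```python
def mask_scores(pred, truth):
    """pred, truth: 2D lists of 0/1. Returns (precision, recall)."""
    tp = fp = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1  # object pixel correctly extracted
            elif p:
                fp += 1  # background pixel wrongly labelled as object
            elif t:
                fn += 1  # object pixel missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```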

Depth Assisted Video Object Tracking • Locating and assigning consistent labels to the tracked objects in a generic video sequence • Target localisation: predict the location of the objects being tracked in the next frame • Object matching: establish the correspondence of detected objects across frames • Accurate and robust multiple object tracking is a challenging problem • Complex backgrounds, arbitrary object motion, changing appearance patterns of non-rigid objects, and partial or full object overlap

Depth Assisted Video Object Tracking • Region-based tracking • Video objects are segmented into a set of small regions • Object identification is established based on these regions • Kernel-based tracking • The kernel is an object region, either a simple shape with an associated colour histogram or a small area that roughly represents the object shape • Feature-based tracking • Image elements, including edges, colour, texture, motion vectors, object contours, etc. • Object identification is established in the feature space, such as the colour histogram
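As an illustration of identification in a colour-histogram feature space, the sketch below quantises RGB pixels into a normalised 3D histogram and compares histograms with the Bhattacharyya coefficient (1 for identical distributions, 0 for disjoint ones). The bin count, the choice of the Bhattacharyya coefficient and the function names are assumptions, not details of the trackers surveyed here.

```python
import math


def colour_histogram(pixels, bins=4):
    """pixels: list of (r, g, b) in 0..255. Returns a normalised bins**3 histogram."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # Quantise each channel into one of `bins` equal-width intervals.
        qr, qg, qb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(qr * bins + qg) * bins + qb] += 1
    total = sum(hist)
    return [h / total for h in hist]


def bhattacharyya(h1, h2):
    """Similarity of two normalised histograms: 1 = identical, 0 = no overlap."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def best_match(target_hist, candidates):
    """candidates: dict label -> histogram. Returns the most similar label."""
    return max(candidates, key=lambda k: bhattacharyya(target_hist, candidates[k]))
```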

Depth Assisted Video Object Tracking • Develop a depth assisted solution for multiple object tracking under various types of overlap • A stereo-vision system can separate objects at different depth layers under partial overlap • System overview: • Stable object segmentation from cluttered backgrounds • Depth assisted overlap detection • Depth-based partial overlap handling • Severe/full overlap handling

Depth Assisted Video Object Tracking • Depth assisted overlap detection • Detect the start and the end of partial overlap based on the depth map segmentation and the objects’ overlap situation in the previous frame • Overlap handling • Different object tracking techniques are employed according to the overlap situation

Depth Assisted Video Object Tracking • Tracking non-overlapping objects: the shortest three-dimensional Euclidean distance • Tracking partially overlapping objects in different disparity layers: colour-based silhouette matching • Tracking partially overlapping objects in one disparity layer: iterative silhouette matching algorithm • Overlapping objects are separated based on their average disparity range in the previous frame
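Two of the rules above can be sketched in a few lines, under stated assumptions. Each object is summarised by a hypothetical centroid `(x, y, disparity)`, and tracks are paired with detections greedily in order of increasing 3D Euclidean distance. Pixels of an overlap region are assigned to the object whose previous-frame disparity range contains their disparity, or to the nearest range otherwise. The centroid representation, the greedy pairing, the optional `max_dist` gate and all names are assumptions; in practice the image and disparity axes would also need consistent scaling.

```python
import math


def match_by_3d_distance(tracks, detections, max_dist=float("inf")):
    """tracks, detections: dict id -> (x, y, disparity). Returns {track_id: detection_id}."""
    # Consider all pairs, closest first (ids within each dict must be mutually comparable).
    pairs = sorted((math.dist(t, d), tid, did)
                   for tid, t in tracks.items() for did, d in detections.items())
    used_t, used_d, result = set(), set(), {}
    for dist, tid, did in pairs:
        if dist > max_dist:
            break  # every remaining pair is farther away
        if tid in used_t or did in used_d:
            continue
        result[tid] = did
        used_t.add(tid)
        used_d.add(did)
    return result


def range_gap(d, lo, hi):
    """Zero inside [lo, hi], otherwise the distance to the nearer end of the range."""
    return max(lo - d, 0.0, d - hi)


def split_by_disparity(pixels, ranges):
    """pixels: list of (x, y, d); ranges: dict object_id -> (d_min, d_max) from the previous frame."""
    return {(x, y): min(ranges, key=lambda oid: range_gap(d, *ranges[oid]))
            for x, y, d in pixels}
```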

Depth Assisted Video Object Tracking • Severe/full overlap handling: • An unmatched foreground region can be a new object or a splitting object • Local best matching • Splitting object: the foreground region matches one of the overlapping objects

Depth Assisted Video Object Tracking • Experimental results • Object tracking under partial overlap in different disparity layers • Object tracking under partial and severe overlap • Tracking system can rematch objects after full overlap

Depth Assisted Video Object Tracking • Performance evaluation • Object tracking by template matching • Works well under no overlap and partial overlap because the template is updated, but fails under severe overlap • Object tracking by mean-shift • Loses severely overlapping objects and mismatches splitting objects

Depth Assisted Video Object Tracking • Performance evaluation

Conclusions • A Delaunay-based region growing method is developed to reconstruct a 3D object surface from a set of sample points • The new candidate triangle selection criterion keeps the region growing process smooth and robust at sharp boundaries • The 3D object segmentation algorithm divides an object into several meaningful components by means of the fuzzy k-means method • A fuzzy cluster validity index is used to find the optimal number of components from a set of local maxima • Depth assisted video object segmentation • A depth-based segmentation framework is introduced, which consists of a depth and motion based object mask generation step and an object boundary refinement step to extract semantic object regions • Depth assisted video object tracking • The overlap detection method is based on the depth map segmentation and the overlap situation of each track in the previous frame • Different object tracking strategies are employed according to the overlap situation

Future Work • Non-uniform point cloud simplification: remove redundant sample points adaptively according to the surface curvature • Combination of depth map segmentation and motion segmentation • A quality measure is needed to evaluate the quality of the depth map and motion segmentation • Assign a probability to each associated object according to the matching result, so that the tracking system can recover from a failed object association in the previous frame

List of Publications • Y. Ma, S. Worrall, and A. M. Kondoz, “Depth Assisted Occlusion Handling in Video Object Tracking,” Signal Processing: Image Communication, Elsevier Science (under review) • Y. Ma, S. Worrall, and A. M. Kondoz, “3D Point Segmentation Using Critical Points and Fuzzy Clustering,” in Proc. 4th IET Conference on Visual Information Engineering, London, 2007 • Y. Ma, S. Worrall, and A. M. Kondoz, “Automatic Video Object Segmentation Using Depth Information and an Active Contour Model,” in Proc. IEEE International Workshop on Multimedia Signal Processing, Cairns, Queensland, Australia, 2008 • Y. Ma, S. Worrall, and A. M. Kondoz, “Video Object Segmentation in Cluttered Background Using Depth and Spatial-temporal information,” in Proc. 3rd International Workshop on Hybrid Artificial Intelligence Systems, Burgos, Spain, 2008 • Y. Ma, S. Worrall, and A. M. Kondoz, “Depth Assisted Visual Tracking,” in Proc. 10th IEEE International Workshop on Image Analysis for Multimedia Interactive Services, London, 2009

Thank you! Any questions? • Contact: Yingdong Ma, Yingdong.ma@surrey.ac.uk
