Motion Segmentation from Clustering of Sparse Point Features Using Spatially Constrained Mixture Models
Shrinivas Pundlik
Committee members: Dr. Stan Birchfield (chair), Dr. Adam Hoover, Dr. Ian Walker, Dr. Damon Woodard
Motion Segmentation
Gestalt insight: grouping forms the basis of human perception.
Gestalt laws are the factors (cues) that affect the grouping process: similarity, proximity, continuity, and common motion (common fate).
Motion segmentation: segmenting images based on common motion, so that points moving together are grouped together. Typically, motion segmentation combines common motion with proximity.
Applications of Motion Segmentation
• object detection (e.g., pedestrian detection)
• tracking (e.g., vehicle tracking)
• robotics
• surveillance
• image and video compression
• scene reconstruction
• video manipulation / editing (video matting, video annotation, motion magnification)
[Figures: pedestrian detection (Viola et al., 2003), vehicle tracking (Kanhere et al., 2005), video editing (Criminisi et al., 2006)]
Previous Work
[Chart: prior algorithms organized by nature of data (sparse features vs. dense motion) and by approach: motion layer estimation, expectation maximization, graph cuts, multi-body factorization, motion + image cues, object-level grouping, normalized cuts, variational methods, belief propagation, and miscellaneous.]
Representative works: Wang and Adelson 1994; Ayer and Sawhney 1995; Costeira and Kanade 1995; Shi and Malik 1998; Black and Fleet 1998; Birchfield 1999; Jojic and Frey 2001; Ke and Kanade 2002; Vidal and Sastry 2003; Willis et al. 2003; Kokkinos and Maragos 2004; Rothganger et al. 2004; Sivic et al. 2004; Smith et al. 2004; Brox et al. 2005; Cremers and Soatto 2005; Kanhere et al. 2005; Kumar et al. 2005; Xiao and Shah 2005; Criminisi et al. 2006; Gruber and Weiss 2006; Levine and Weiss 2006; Yan and Pollefeys 2006.
Challenges: Short Term
[Figure: example scene with regions labeled 1. statue, 2. wall, 3. trees, 4. grass, 5. biker, 6. pedestrian]
• computation of motion in the scene
• influence of the neighboring motion
• number of objects / regions in the scene
• initialization of motion parameters
• description of complex motions (e.g., articulated human motion)
Challenges: Long Term
[Figure: displacement x vs. time for fast, medium, slow, and crawling features; a fixed threshold over a time window illustrates batch processing vs. incremental processing]
• batch processing vs. incremental processing
• updating the reference frame
• maintaining existing groups
• growing existing regions
• splitting groups
• adding new groups (new objects) and deleting invisible groups
Objectives
• motion segmentation using sparse point features
• automatically determine the number of groups
• handling dynamic sequences
• real-time performance
• handling complex motions
[Diagram: a mixture model framework relating observed data, group assignment, and parameter estimation; feature tracking provides motion computation, motion segmentation provides two-frame clustering and long-term maintenance of groups, and the motion models range from translation and affine to the complex models of articulated human motion]
Overview of the Topics
• Feature Tracking: tracking sparse point features for computation of image motion, and its extension to joint feature tracking.
• S. T. Birchfield and S. J. Pundlik, “Joint Tracking of Features and Edges”, CVPR, 2008.
• Motion Segmentation: clustering point features in videos based on their motion and spatial connectivity.
• S. J. Pundlik and S. T. Birchfield, “Motion Segmentation at Any Speed”, BMVC, 2006.
• S. J. Pundlik and S. T. Birchfield, “Real Time Motion Segmentation of Sparse Feature Points at Any Speed”, IEEE Trans. on Systems, Man, and Cybernetics, 2008.
• Articulated Human Motion Models: learning human walking motion from various poses and view angles for segmentation and pose estimation (a special handling of a complex motion model).
• Iris Segmentation: texture- and intensity-based segmentation of non-ideal iris images.
• S. J. Pundlik, D. L. Woodard and S. T. Birchfield, “Non-Ideal Iris Segmentation Using Graph Cuts”, CVPR Workshop on Biometrics, 2008.
Point Features
[Figure: input image, its gradients, and the detected point features capturing the information content]
• Popular features:
• Harris corner feature [Harris & Stephens 1988, Schmid et al. 2000]
• Shi-Tomasi feature [Shi & Tomasi 1994]
• Forstner corner feature [Forstner 1994]
• Scale invariant feature transform (SIFT) [Lowe 1999]
• Gradient Location and Orientation Histogram (GLOH) [Mikolajczyk and Schmid 2005]
• Features from accelerated segment test (FAST) [Rosten and Drummond 2005]
• Speeded up robust features (SURF) [Bay et al. 2006]
• DAISY [Tola et al. 2008]
Utility of Point Features
• Advantages:
• highly repeatable and extensible (work for a variety of images)
• efficient to compute (real-time implementations available)
• local methods for processing (tracking through multiple frames)
Tracking multiple point features amounts to sparse optical flow: sparse point feature tracks yield the image motion.
Tracking Point Features: Lucas-Kanade
Assume constant brightness between frames: $I(x, y, t) = I(x + u, y + v, t + 1)$. Linearizing gives the optic flow constraint equation
$I_x u + I_y v + I_t = 0,$
where $I_x, I_y$ are the image spatial derivatives, $I_t$ is the image temporal derivative, and $\mathbf{u} = (u, v)^T$ is the pixel displacement.
Estimate the displacement by minimizing, over a window $W$ around the feature,
$E_{LK}(u, v) = \sum_{\mathbf{x} \in W} K(\mathbf{x}) \, (I_x u + I_y v + I_t)^2,$
where $K$ is a convolution kernel weighting the window. Differentiating with respect to $u$ and $v$ and setting the derivatives to zero leads to the linear system
$Z \mathbf{u} = \mathbf{e}, \qquad Z = \sum_{\mathbf{x} \in W} K \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}, \qquad \mathbf{e} = -\sum_{\mathbf{x} \in W} K \begin{bmatrix} I_x I_t \\ I_y I_t \end{bmatrix},$
where $Z$ is the gradient covariance matrix. Iterate using the Newton-Raphson method.
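To make the step concrete, here is a minimal NumPy sketch of one Newton-Raphson update for a single feature window; the function name and the explicit weights argument are illustrative, not from the original system:

```python
import numpy as np

def lucas_kanade_step(Ix, Iy, It, weights=None):
    """One Newton-Raphson step of Lucas-Kanade for one feature window.

    Ix, Iy  : spatial derivative patches around the feature (2D arrays)
    It      : temporal derivative patch between the two frames
    weights : optional convolution kernel K (e.g., a Gaussian window)
    Returns the displacement update (u, v).
    """
    if weights is None:
        weights = np.ones_like(Ix)
    # Gradient covariance matrix Z = sum K * [Ix^2, IxIy; IxIy, Iy^2]
    Z = np.array([[np.sum(weights * Ix * Ix), np.sum(weights * Ix * Iy)],
                  [np.sum(weights * Ix * Iy), np.sum(weights * Iy * Iy)]])
    # Right-hand side e = -sum K * [Ix*It; Iy*It]
    e = -np.array([np.sum(weights * Ix * It), np.sum(weights * Iy * It)])
    # Solve the 2x2 linear system Z u = e
    return np.linalg.solve(Z, e)
```

In practice the second image is re-warped by the current displacement estimate, $I_t$ is recomputed, and the step is repeated until the update is small.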
Detection of Point Features
Gradient covariance matrix, computed from the image gradients with a convolution kernel $K$:
$Z = \sum_{\mathbf{x} \in W} K \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}.$
Good feature: both eigenvalues of $Z$ exceed a threshold.
[Figure: three intensity surfaces over (x, y) illustrating the cases]
• two small eigenvalues (e.g., $e_{max} = 5.15$, $e_{min} = 3.13$): low intensity variation, no feature
• a small and a large eigenvalue (e.g., $e_{max} = 1026.9$, $e_{min} = 29.9$): unidirectional intensity variation, edge feature
• two large eigenvalues (e.g., $e_{max} = 1672.44$, $e_{min} = 932.4$): bidirectional intensity variation, good feature
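A sketch of this detection rule, assuming SciPy's uniform_filter as the window summation and a hypothetical threshold value:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def detect_features(Ix, Iy, win=7, min_eig_thresh=1.0):
    """Shi-Tomasi style detection sketch: a pixel is a good feature when
    the smaller eigenvalue of its gradient covariance matrix exceeds a
    threshold. Ix, Iy are image gradient arrays; win is the window size."""
    # Windowed sums of the covariance entries (box filter as the kernel K)
    Ixx = uniform_filter(Ix * Ix, win)
    Ixy = uniform_filter(Ix * Iy, win)
    Iyy = uniform_filter(Iy * Iy, win)
    # Smaller eigenvalue of [[Ixx, Ixy], [Ixy, Iyy]] in closed form
    trace = Ixx + Iyy
    diff = np.sqrt((Ixx - Iyy) ** 2 + 4.0 * Ixy ** 2)
    min_eig = 0.5 * (trace - diff)
    return np.argwhere(min_eig > min_eig_thresh)  # (row, col) candidates
```

A full detector would also keep only local maxima of the smaller eigenvalue so that features do not cluster on a single corner.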
Dense Optical Flow: Horn-Schunck
Horn-Schunck: find global displacement functions $u(x, y)$ and $v(x, y)$ by minimizing
$E_{HS} = \iint (I_x u + I_y v + I_t)^2 + \lambda \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) dx \, dy,$
where the first term is the data term (optical flow constraint), the second is the smoothness term, and $\lambda$ is the regularization parameter. Solving via the Euler-Lagrange equations gives
$I_x (I_x u + I_y v + I_t) = \lambda \nabla^2 u, \qquad I_y (I_x u + I_y v + I_t) = \lambda \nabla^2 v.$
Approximating the Laplacian as $\nabla^2 u \approx \kappa (\bar{u} - u)$, with $\bar{u}$ the average displacement in the neighborhood and $\kappa$ a constant, leads to a sparse system.
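A minimal sketch of the resulting Jacobi-style iteration, using a circular-shift 4-neighbor average as $\bar{u}$; the boundary handling and iteration count are simplifying assumptions:

```python
import numpy as np

def horn_schunck(Ix, Iy, It, lam=100.0, n_iters=100):
    """Horn-Schunck iteration sketch. Ix, Iy, It are derivative images;
    lam is the regularization weight. Uses Laplacian(u) ~ (ubar - u),
    where ubar is the 4-neighbor average of the flow field."""
    u = np.zeros_like(Ix, dtype=float)
    v = np.zeros_like(Ix, dtype=float)
    avg = lambda f: 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                            np.roll(f, 1, 1) + np.roll(f, -1, 1))
    for _ in range(n_iters):
        ubar, vbar = avg(u), avg(v)
        # Update derived from the Euler-Lagrange equations above
        num = Ix * ubar + Iy * vbar + It
        den = lam + Ix ** 2 + Iy ** 2
        u = ubar - Ix * num / den
        v = vbar - Iy * num / den
    return u, v
```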
Need for a Joint Approach
Lucas-Kanade (1981):
• local method (local smoothing)
• pixel displacement: constant within a small neighborhood
• robust under noise
• produces sparse optical flow
Horn-Schunck (1981):
• global method (global smoothing)
• pixel displacement: a smooth function over the image domain
• sensitive to noise
• produces dense optical flow
Using local smoothing to improve dense optical flow leads to the combined local-global approach (Bruhn et al., 2004); using global smoothing to improve feature tracking leads to joint feature tracking.
Joint Lucas-Kanade (JLK)
Joint Lucas-Kanade energy functional over $N$ feature points:
$E_{JLK} = \sum_{i=1}^{N} \Big[ \sum_{\mathbf{x} \in W_i} K \, (I_x u_i + I_y v_i + I_t)^2 + \lambda_i \big( (u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2 \big) \Big],$
where the first term is the data term (optical flow constraint), the second is the smoothness term (regularization), and $(\hat{u}_i, \hat{v}_i)$ are the expected values of feature $i$'s displacement computed from its neighbors. Differentiating $E_{JLK}$ with respect to $(u_i, v_i)$ gives a $2N \times 2N$ system whose $(2i-1)$th and $(2i)$th rows are the equations for feature $i$:
$(Z_i + \lambda_i I) \, \mathbf{u}_i = \mathbf{e}_i + \lambda_i \hat{\mathbf{u}}_i.$
The sparse system is solved using Jacobi iterations.
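The following sketch shows one Jacobi sweep of such a system, assuming the per-feature 2x2 matrices Z and right-hand sides e have been computed as in standard Lucas-Kanade, and using the neighbors' mean displacement as the expected value; the data layout is illustrative:

```python
import numpy as np

def jlk_jacobi_sweep(Z, e, neighbors, u, lam=1.0):
    """One Jacobi sweep of Joint Lucas-Kanade (sketch).

    Z[i]         : 2x2 gradient covariance matrix of feature i
    e[i]         : 2-vector right-hand side from the data term
    neighbors[i] : indices of spatially nearby features
    u            : (N, 2) array of current displacements
    lam          : smoothness weight pulling each feature toward the
                   expected value computed from its neighbors
    """
    u_new = np.empty_like(u)
    for i in range(len(u)):
        u_hat = u[neighbors[i]].mean(axis=0)   # expected displacement
        A = Z[i] + lam * np.eye(2)             # regularized 2x2 block
        b = e[i] + lam * u_hat
        u_new[i] = np.linalg.solve(A, b)       # solve rows (2i-1), (2i)
    return u_new
```

Sweeps are repeated until the displacements converge, which lets reliable features pull poorly textured neighbors toward consistent motion.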
Results of JLK
[Videos: tracking results on sequences with repetitive texture and with low texture]
[Overview of the Topics slide repeated as a section divider; next topic: Motion Segmentation]
Mixture Models Basics
Consider drawing a sample from 3 bins (components). By Bayes' rule,
$P(\text{Red} \mid \text{sample}) \propto P(\text{sample} \mid \text{Red}) \, P(\text{Red}),$
where the posterior $P(\text{Red} \mid \text{sample})$ is the probability that the sample was drawn from the Red bin, the likelihood $P(\text{sample} \mid \text{Red})$ measures how Red the drawn sample is, and the prior $P(\text{Red})$ measures how big the Red bin is.
Probability of drawing a sample from a mixture of three bins:
$P(\text{sample}) = P(\text{sample} \mid \text{Red}) P(\text{Red}) + P(\text{sample} \mid \text{Green}) P(\text{Green}) + P(\text{sample} \mid \text{Blue}) P(\text{Blue}).$
Mixture model: likelihoods and priors for all the components. Challenge: the only available information is the drawn sample!
Mixture Model Example: GMM
[Figure: histogram of grayscale values fit by four Gaussian components with parameters $\theta_j = \{\mu_j, \sigma_j\}$, $j = 1, \ldots, 4$]
Parameters of a Gaussian density $\theta$: mean $\mu$ and variance $\sigma^2$. Gaussian density for the $i$th pixel conditioned on the parameters of the $j$th component:
$p(x_i \mid \theta_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left( -\frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right).$
Learning Mixture Models
Mixture model defined as
$p(x_i \mid \Theta) = \sum_{j=1}^{K} \pi_j \, p(x_i \mid \theta_j),$
where $K$ is the number of components (known), $x_i$ is an observed data point (known), $\theta_j$ are the component density parameters (unknown), and $\pi_j$ are the mixing weights (unknown).
Learning mixture models (parameter estimation): estimate the mixing weights and the component density parameters.
Circular nature of the problem: parameter estimation requires the class association (segmentation) of the data, while segmentation requires the parameters.
Expectation Maximization
EM: an iterative two-step algorithm for parameter estimation.
• E step: find the expectation of the likelihood function (segmentation / label assignment).
• M step: maximize the likelihood function (parameter estimation based on the segmentation).
Algorithm:
1. Initialize: the number of components K, the component density parameters θ for all components, the mixing weights π, and a convergence criterion.
2. Repeat until convergence:
a. E step: for all N data points, (i) compute the likelihood from the component density, and (ii) estimate the membership weights w.
b. M step: estimate the mixing weights.
c. M step: estimate the component density parameters.
Convergence: when the likelihood cannot be further maximized (i.e., when the estimates do not change between successive iterations).
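A compact sketch of EM for a 1D Gaussian mixture (e.g., the grayscale values of the previous slide); the evenly spaced initialization is an assumption:

```python
import numpy as np

def em_gmm_1d(x, K, n_iters=100):
    """EM sketch for a 1D Gaussian mixture. Returns mixing weights pi,
    means mu, variances var, and soft memberships w (N x K)."""
    N = len(x)
    pi = np.full(K, 1.0 / K)
    mu = np.linspace(x.min(), x.max(), K)   # simple spread-out initialization
    var = np.full(K, x.var())
    for _ in range(n_iters):
        # E step: membership weights w[i, j] = P(component j | x_i)
        gauss = (np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                 / np.sqrt(2 * np.pi * var))
        w = pi * gauss
        w /= w.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the soft assignments
        Nj = w.sum(axis=0)
        pi = Nj / N
        mu = (w * x[:, None]).sum(axis=0) / Nj
        var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nj
    return pi, mu, var, w
```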
Various Mixture Models
• Finite Mixture Model (FMM): one prior for each component (the mixing weights); estimated with the EM algorithm.
• Spatially Variant Finite Mixture Model (ML-SVFMM and MAP-SVFMM) [1]: a prior distribution for each data element (label probabilities); neighbors mostly have similar labels (a loose constraint).
• Spatially Constrained Finite Mixture Model (SCFMM): enforces spatial connectivity of labels; estimated with a greedy EM algorithm.
Each model balances a data term (how closely the data follow the models) against a smoothness term (the spatial interaction of the data elements).
[1] S. Sanjay-Gopal and T. Hebert, “Bayesian Pixel Classification Using Spatially Variant Finite Mixtures and Generalized EM Algorithm”, IEEE Trans. on Image Processing, 1998.
Greedy-EM (Iterative Region Growing)
[Figure: region growing on a 4-connected grid from three different start locations]
• Properties of Greedy EM:
• enforces spatial connectivity of labels (SCFMM)
• automatically determines the number of groups
• local initialization of parameters
• primary user-defined parameters: the inclusion criterion and the minimum number of elements in a group
Grouping Point Features
Between two frames, repeat until all features have been considered:
1. Randomly select a seed feature.
2. Fit a motion model to its neighbors.
3. Repeat until the group does not change:
a. Discard all features except the one nearest the group centroid.
b. Grow the group by recursively including neighboring features with similar motion.
c. Update the motion model.
[Figure: growing a feature group from a single seed point; each pass restarts from the feature nearest the centroid of the previous group]
A code sketch of this loop follows.
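A minimal sketch of the grouping loop, with a translational motion model (the group's mean flow) standing in for the affine model of the actual algorithm; the tolerance and data layout are illustrative:

```python
import numpy as np

def grow_group(features, flows, neighbors, seed, tol=0.5, n_passes=3):
    """Region-growing sketch for grouping features under one motion model.

    features     : (N, 2) feature positions
    flows        : (N, 2) two-frame displacements of the features
    neighbors[i] : indices of spatially adjacent features
    seed         : index of the starting feature
    """
    group = {seed}
    model = flows[seed].copy()
    for _ in range(n_passes):
        # Restart from the feature nearest the current group centroid
        idx = list(group)
        centroid = features[idx].mean(axis=0)
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        seed = idx[int(np.argmin(dists))]
        group, frontier = {seed}, [seed]
        # Recursively include neighbors whose motion fits the model
        while frontier:
            i = frontier.pop()
            for j in neighbors[i]:
                if j not in group and np.linalg.norm(flows[j] - model) < tol:
                    group.add(j)
                    frontier.append(j)
        model = flows[list(group)].mean(axis=0)  # update the motion model
    return group
```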
Grouping Consistent Features
• input: point features tracked between two frames
• output: groups of point features
For each of N seed points, group the point features as above; then gather the sets of features that are always grouped together.
[Figure: groups grown from seeds 1, 2, and 3, and the resulting consistent feature group]
Grouping Consistent Features
Consistency check: features that are always grouped together, no matter the seed point, form a consistent group.
[Figure: for features a, b, c, d, each seed point yields a binary membership matrix; summing these co-occurrence matrices over the seed points counts how often each pair was grouped together, and pairs grouped together in every run are consistent]
In practice, we use 7 seed points.
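One simple way to implement the consistency check, assuming each seed run returns a boolean membership mask: features with identical membership patterns across all runs form one consistent group. This is a sketch, not the exact bookkeeping of the original algorithm:

```python
import numpy as np

def consistent_groups(group_masks):
    """group_masks: (S, N) boolean array, True if feature j belonged to
    the grown group for seed s. Features sharing the same membership
    pattern across all S seeds get the same label (sketch). Features
    never included by any seed share the all-False pattern and can be
    treated as ungrouped by the caller."""
    S, N = group_masks.shape
    labels = {}
    return np.array([labels.setdefault(tuple(group_masks[:, j]), len(labels))
                     for j in range(N)])
```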
Consistent Features: Multiple Groups
[Figure: feature groups obtained for various iterations, and the resulting consistent feature groups]
Maintaining Groups Over Time
Between frame k and frame k + n:
• track the features
• find consistent groups
• if the χ² test fails, either the features are regrouped or multiple groups are found
• handle lost features and newly added features
[Figure: numbered feature groups evolving between frame k and frame k + n]
Experimental Results
[Figures: segmentation results on the freethrow, mobile-calendar, robots, car-map, and statue sequences]
Videos
[Videos: statue sequence and mobile-calendar sequence]
Results Over Time
The algorithm dynamically determines the number of feature groups.
[Plots: number of feature groups over time for the statue, mobile-calendar, freethrow, vehicles, robots, and car-map sequences]
Effect of Joint Feature Tracking
[Figure: input sequence; segmentation using standard Lucas-Kanade vs. joint Lucas-Kanade tracking]
[Overview of the Topics slide repeated as a section divider; next topic: Articulated Human Motion Models]
Articulated Motion Models
• Purpose of human motion analysis: pedestrian detection/surveillance, action recognition, pose estimation.
• Traditional approaches use appearance and frame differencing.
Theme: sparse motion alone captures a wealth of information.
• Objectives:
• learn articulated human motion models (motion only, no appearance)
• viewpoint- and scale-invariant detection
• varying lighting conditions (day and nighttime sequences)
• detection in the presence of camera and background motion
• pose estimation
Use of Motion Capture Data
Top-down approach: train high-level descriptors (appearance- or motion-based) that describe articulated motion at a global level for detection.
Bottom-up approach: learn the motion of individual joints from the training data and aggregate the information to detect human motion.
[Figure: hand motion capture (mocap) data in 3D, with the body center and feet labeled; the displacement of the limbs is measured w.r.t. the body center, as in the sketch below]
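As a hedged illustration of the bottom-up training data, this sketch expresses mocap joint trajectories relative to the body center; the array layout and the choice of center joint are assumptions, not the original pipeline:

```python
import numpy as np

def limb_displacements(joints, center_idx=0):
    """Bottom-up training sketch: joints is a (T, J, 3) array of 3D joint
    positions over T frames; center_idx picks the joint used as the body
    center. Returns (T-1, J, 3) frame-to-frame displacements of each joint
    with the body center's motion removed, which is the kind of signal the
    joint-level motion models are learned from."""
    rel = joints - joints[:, center_idx:center_idx + 1, :]  # center-relative
    return np.diff(rel, axis=0)                             # per-frame motion
```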
Training
[Figure: 3D motion capture points rendered from various angular viewpoints and walking poses]
Motion Descriptor
[Figures: spatial arrangement of the descriptor bins w.r.t. the body center; Gaussian weight maps for the various means and orientations that constitute the motion descriptor; bin values of the motion descriptor describing human subjects from various viewpoints (views) and pose configurations (poses); confusion matrix for the 64 training descriptors]
Segmentation Results
View-invariant segmentation of articulated motion using the motion descriptor: front, angular, left-profile, and right-profile views.
Segmentation of articulated motion in a challenging sequence involving camera and background motion.
Pose Estimation Results
[Figures: pose estimation for the front, angular, and right-profile views, and for a nighttime sequence]
[Overview of the Topics slide repeated as a section divider; next topic: Iris Segmentation]
Iris Image Segmentation
Goal: non-ideal iris image segmentation using texture and intensity.
• Ideas:
• local intensity variations (computed from the gradient magnitude and the density of point features) can be used as a texture representation that separates eyelash from non-eyelash regions: eyelash regions show higher gradient magnitude and a higher density of point features, un-textured regions show lower values of both
• possible segments based on image intensity: iris, pupil, and background
Four regions overall: pupil, iris, eyelashes, and background.
[Figure: input image, gradient magnitude, and point features over textured and un-textured regions]
Coarse texture computation: a sketch follows.
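A sketch of the coarse texture idea, using a local average of gradient magnitude; the SciPy filters, window size, and any threshold are assumed stand-ins for the actual computation:

```python
import numpy as np
from scipy.ndimage import uniform_filter, sobel

def coarse_texture(image, win=15):
    """Coarse texture measure sketch: local average of gradient magnitude.
    High values suggest textured (e.g., eyelash) regions, low values
    un-textured regions; any threshold on the result is a tuning choice."""
    img = image.astype(float)
    gx = sobel(img, axis=1)           # horizontal derivative
    gy = sobel(img, axis=0)           # vertical derivative
    grad_mag = np.hypot(gx, gy)
    return uniform_filter(grad_mag, win)   # local texture energy
```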
Iris Segmentation and Recognition
Iris segmentation: input iris image → preprocessed input (specular reflections removed) → iris segmentation → iris mask → iris refinement → iris ellipse.
• Iris recognition:
• unwrap and normalize the iris mask
• generate an iris signature from the iris mask (using the texture in the iris)
• compare iris signatures using the Hamming distance (see the sketch below)
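A minimal sketch of the signature comparison, assuming binary iris codes with validity masks; the fractional Hamming distance counts disagreements only over bits valid in both codes:

```python
import numpy as np

def hamming_distance(code1, code2, mask1, mask2):
    """Fractional Hamming distance between two binary iris codes
    (boolean arrays), counting only bits valid in both masks."""
    valid = mask1 & mask2
    disagree = (code1 != code2) & valid
    return disagree.sum() / max(valid.sum(), 1)  # guard against empty overlap
```

Lower distances indicate the same iris; the decision threshold is chosen from the score distributions of genuine and impostor comparisons.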
Image Segmentation Results
[Figures: input image, iris mask, and full segmentation into pupil, iris, eyelashes, and background]
Iris Recognition
Iris recognition using our segmentation algorithm, evaluated on:
• West Virginia Non-Ideal Database: 1868 images, 467 classes, 4 images/class
• West Virginia Off-Axis Database: 584 images, 146 classes, 4 images/class
Conclusions and Future Work
• Motion segmentation based on sparse feature clustering
• spatially constrained mixture model and greedy EM algorithm
• automatically determines the number of groups
• real-time performance
• ability to handle long, dynamic sequences and an arbitrary number of feature groups
• Joint feature tracking
• incorporation of neighboring feature motion
• improved performance in areas of low texture or repetitive texture
• Detection of articulated motion
• motion-based approach for learning high-level human motion models
• segments and tracks human motion under varying pose, scale, and lighting conditions
• view-invariant pose estimation
• Iris segmentation
• graph-cuts-based dense segmentation using texture and intensity
• combines appearance and eye geometry
• handles non-ideal iris images with occlusions, illumination changes, and eye rotation
• Future work
• integration of motion segmentation, joint feature tracking, and articulated motion segmentation
• dense segmentation from the sparse feature groups
• handling non-rigid motions, non-textured regions, and occlusions
• combining sparse feature groups, discontinuities, and image contours for a novel representation of video