Segmentation Techniques Luis E. Tirado PhD qualifying exam presentation Northeastern University
Segmentation • Spectral Clustering • Graph-cut • Normalized graph-cut • Expectation Maximization (EM) clustering
Graph Theory Terminology (figure: example weighted graph partitioned into vertex sets A and B) • Graph G(V,E) • Set of vertices and edges • Edge numbers represent weights • Graphs for Clustering • Points are vertices • Weights decrease with distance • Segmentation: look for the minimum cut in the graph
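A minimal sketch of this affinity construction, assuming d-dimensional points stored in a NumPy array and a Gaussian kernel so that edge weights shrink with distance; the kernel width sigma is a free parameter not fixed on the slide.

```python
import numpy as np

def affinity_matrix(points, sigma=1.0):
    """Build a weighted, undirected graph over the points.

    Each point is a vertex; the edge weight between two vertices decays
    with their Euclidean distance via a Gaussian kernel, so nearby points
    get large weights and distant points get weights near zero.
    """
    diffs = points[:, None, :] - points[None, :, :]   # pairwise differences
    sq_dists = np.sum(diffs ** 2, axis=-1)            # squared Euclidean distances
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))        # weights shrink with distance
    np.fill_diagonal(A, 0.0)                          # no self-loops
    return A
```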
Spectral Clustering (figure: example weighted graph, from Forsyth & Ponce) • Graph-cut • Undirected, weighted graph G = (V,E) represented as an affinity matrix A • Use eigenvectors for segmentation • Assume k elements and c clusters • Represent cluster n with a vector w_n of k components • Values represent cluster association; normalize so that w_n^T w_n = 1 • Extract good clusters • Select the w_n which maximizes w_n^T A w_n • Solution is A w_n = λ w_n • w_n is an eigenvector of A; select the eigenvector with the largest eigenvalue
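A rough illustration of the leading-eigenvector idea, assuming an affinity matrix like the one sketched above; the threshold used to pick out the dominant cluster is an arbitrary heuristic, not something the slide specifies.

```python
import numpy as np

def dominant_cluster(A):
    """Graph-cut clustering via the leading eigenvector of the affinity matrix A.

    The association vector w that maximizes w^T A w subject to w^T w = 1
    is the eigenvector of A with the largest eigenvalue; large entries of
    that eigenvector mark points in the dominant cluster.
    """
    eigvals, eigvecs = np.linalg.eigh(A)   # A is symmetric; eigenvalues ascending
    w = eigvecs[:, -1]                     # eigenvector with the largest eigenvalue
    w = np.abs(w)                          # the sign of an eigenvector is arbitrary
    return w > 0.5 * w.max()               # heuristic threshold: True = dominant cluster
```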
Spectral Clustering • Normalized Cut • Addresses drawbacks of the graph-cut approach • Define the association between a vertex subset A and the full set V as: assoc(A,V) = Σ_{u∈A, t∈V} w(u,t) • Previously we maximized assoc(A,A); the normalized cut instead normalizes the cut between A and B by each subset's total association. Define the normalized cut as: Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), where cut(A,B) = Σ_{u∈A, v∈B} w(u,v)
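One way these definitions might be evaluated in code, assuming a symmetric affinity matrix and a boolean membership mask; both subsets are assumed non-empty.

```python
import numpy as np

def ncut_value(A, in_a):
    """Evaluate the normalized cut for a two-way partition of the graph.

    A    : symmetric affinity matrix.
    in_a : boolean mask, True for vertices in subset A (the rest form B).

    cut(A,B)   = total weight of edges crossing the partition
    assoc(A,V) = total weight of edges from A to all vertices
    Ncut(A,B)  = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)
    """
    in_b = ~in_a
    cut = A[np.ix_(in_a, in_b)].sum()     # weight crossing the partition
    assoc_a = A[in_a, :].sum()            # assoc(A, V)
    assoc_b = A[in_b, :].sum()            # assoc(B, V)
    return cut / assoc_a + cut / assoc_b  # assumes both subsets are non-empty
```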
Spectral Clustering • Normalized Cuts Algorithm • Define D as the diagonal degree matrix with d(i) = Σ_j A(i,j), where A is the affinity matrix • Define a vector x depicting cluster membership: x_i = 1 if point i is in A, and -1 otherwise • Define a real-valued relaxation y of x, constrained so that y^T D 1 = 0 • We now wish to minimize the objective function: y^T (D - A) y / (y^T D y) • This constitutes solving the generalized eigensystem: (D - A) y = λ D y • The solution is the eigenvector with the second smallest eigenvalue • Recursively re-partition the resulting segments; stop when the Ncut value exceeds some threshold.
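A sketch of one bipartition step, assuming SciPy's generalized symmetric eigensolver is acceptable and that every vertex has positive degree; splitting y at zero is one common choice, whereas Shi & Malik search several split points for the lowest Ncut value.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cut_partition(A):
    """One bipartition step in the spirit of normalized cuts.

    Solves the generalized eigenproblem (D - A) y = lambda * D y, where D is
    the diagonal degree matrix with d(i) = sum_j A(i, j), and splits the
    graph on the sign of the eigenvector with the second smallest eigenvalue.
    """
    d = A.sum(axis=1)                 # vertex degrees (assumed positive)
    D = np.diag(d)
    eigvals, eigvecs = eigh(D - A, D) # generalized symmetric eigenproblem, ascending order
    y = eigvecs[:, 1]                 # eigenvector with the second smallest eigenvalue
    return y > 0                      # sign of y gives the two-way partition
```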
Probabilistic Mixture Resolving Approach to Clustering • Expectation Maximization (EM) Algorithm • Density estimation of data points in an unsupervised setting • Finds ML estimates when the data depends on latent variables • E step – computes the expected log-likelihood, treating the latent variables as if they were observed, using the current parameter estimates • M step – computes ML estimates of the parameters by maximizing that expectation • Start with a Gaussian Mixture Model: p(x|Θ) = Σ_{k=1..c} α_k p_k(x|θ_k), with α_k ≥ 0 and Σ_k α_k = 1 • Segmentation: reformulate as a missing-data problem • Latent variable Z provides the labeling • Gaussian bivariate PDF: p(x|μ,Σ) = (1 / (2π |Σ|^{1/2})) exp(-½ (x-μ)^T Σ^{-1} (x-μ))
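A small sketch of evaluating the mixture density under these definitions; the function names are illustrative only.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate (e.g. bivariate) Gaussian density N(x | mu, cov)."""
    d = mu.shape[0]
    diff = x - mu
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def mixture_pdf(x, alphas, mus, covs):
    """Gaussian mixture density p(x) = sum_k alpha_k N(x | mu_k, Sigma_k)."""
    return sum(a * gaussian_pdf(x, mu, cov)
               for a, mu, cov in zip(alphas, mus, covs))
```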
Probabilistic Mixture Resolving Approach to Clustering • EM Process • Maximize the log-likelihood function: log L(Θ|X) = Σ_n log Σ_k α_k p_k(x_n|θ_k) • Not trivial (a log of a sum); introduce Z, and denote the complete data Y = [X^T Z^T]^T • If the complete data Y were known, ML estimation would be easy: log L(Θ|Y) = Σ_n log( α_{z_n} p_{z_n}(x_n|θ_{z_n}) )
Probabilistic Mixture Resolving Approach to Clustering • EM steps
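A possible rendering of one EM iteration for the Gaussian mixture, using SciPy's multivariate normal density; it is a sketch that omits numerical safeguards (e.g. covariance regularization) a real implementation would need.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, alphas, mus, covs):
    """One EM iteration for a Gaussian mixture model.

    E step: responsibilities gamma(z_nk) proportional to alpha_k * N(x_n | mu_k, Sigma_k).
    M step: re-estimate alpha_k, mu_k, Sigma_k from the responsibility-weighted data.
    """
    N, d = X.shape
    K = len(alphas)

    # E step: posterior probability that component k generated point n
    resp = np.empty((N, K))
    for k in range(K):
        resp[:, k] = alphas[k] * multivariate_normal.pdf(X, mean=mus[k], cov=covs[k])
    resp /= resp.sum(axis=1, keepdims=True)

    # M step: maximum-likelihood updates given the responsibilities
    Nk = resp.sum(axis=0)
    new_alphas = Nk / N
    new_mus = [resp[:, k] @ X / Nk[k] for k in range(K)]
    new_covs = []
    for k in range(K):
        diff = X - new_mus[k]
        new_covs.append((resp[:, k, None] * diff).T @ diff / Nk[k])
    return new_alphas, new_mus, new_covs
```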
Conclusions • For a simple case like the four-Gaussian example, both algorithms perform well, as the results show • From the literature (k = number of clusters): • EM works well for small k; it gives only coarse segmentation for large k • It needs to know the number of components to cluster • Initial conditions are essential; prior knowledge helps accelerate convergence and reach a good local (or the global) maximum of the likelihood • Ncut gives good results for large k • For a fully connected graph, it has intensive space and computation-time requirements • Graph-cut's first-eigenvector approach finds points in the 'dominant' cluster • It is not very consistent; the literature advocates the normalized approach • In the end, the choice is a tradeoff depending on the source data
References (for slide images) • J. Shi & J. Malik “Normalized Cuts and Image Segmentation” • http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf • C. Bishop “Latent Variables, Mixture Models and EM” • http://cmp.felk.cvut.cz/cmp/courses/recognition/Resources/_EM/Bishop-EM.ppt • R. Nugent & L. Stanberry “Spectral Clustering” • http://www.stat.washington.edu/wxs/Stat593-s03/Student-presentations/SpectralClustering2.ppt • S. Candemir “Graph-based Algorithms for Segmentation” • http://www.bilmuh.gyte.edu.tr/BIL629/special%20section-%20graphs/GraphBasedAlgorithmsForComputerVision.ppt • W. H. Liao “Segmentation: Graph-Theoretic Clustering” • http://www.cs.nccu.edu.tw/~whliao/acv2008/segmentation_by_graph.ppt • D. Forsyth & J. Ponce “Computer Vision: A Modern Approach”
K-means (used by some clustering algorithms) • Determine the Euclidean distance of each object in the data set to the (randomly picked) center points • Construct K clusters by assigning every point to its closest center • Move the center points to the actual centroids of the resulting clusters • Repeat until the assignments no longer change
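A compact sketch following these steps, with an added convergence check; random initialization and the iteration cap are assumptions, since the slide does not fix them.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Basic K-means: pick K random points as initial centers, then alternate
    between assigning each point to its nearest center (Euclidean distance)
    and moving each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)                       # assign to the closest cluster
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):               # stop once centers stop moving
            break
        centers = new_centers
    return labels, centers
```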
Responsibilities • Responsibilities r_nk assign data points to clusters such that r_nk ∈ {0,1} and Σ_k r_nk = 1 (each point belongs to exactly one cluster) • Example: 5 data points and 3 clusters give a 5×3 responsibility matrix with a single 1 in each row
K-means Cost Function • J = Σ_n Σ_k r_nk ||x_n − μ_k||², where the x_n are the data, the μ_k are the prototypes (cluster centers), and the r_nk are the responsibilities
Minimizing the Cost Function • E-step: minimize J w.r.t. the responsibilities r_nk • assigns each data point to its nearest prototype • M-step: minimize J w.r.t. the prototypes μ_k • gives μ_k = Σ_n r_nk x_n / Σ_n r_nk • each prototype is set to the mean of the points in its cluster • Convergence is guaranteed since each step can only decrease J and there is a finite number of possible settings for the responsibilities
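The cost function and the two minimization steps written out with an explicit one-of-K responsibility matrix; this sketch assumes every cluster retains at least one point.

```python
import numpy as np

def kmeans_cost(X, R, mu):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2 (R is the one-of-K responsibility matrix)."""
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    return (R * sq).sum()

def e_step(X, mu):
    """Minimize J w.r.t. r_nk: assign each data point to its nearest prototype."""
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    R = np.zeros_like(sq)
    R[np.arange(len(X)), sq.argmin(axis=1)] = 1.0
    return R

def m_step(X, R):
    """Minimize J w.r.t. mu_k: mu_k = sum_n r_nk x_n / sum_n r_nk (clusters assumed non-empty)."""
    return (R.T @ X) / R.sum(axis=0)[:, None]
```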
Limitations of K-means • Hard assignments of data points to clusters – a small shift of a data point can flip it to a different cluster • Not clear how to choose the value of K – and the value must be chosen beforehand • Solution: replace the 'hard' clustering of K-means with the 'soft' probabilistic assignments of EM • Not robust to outliers – points far from a centroid can pull the centroid away from its true position
EM Algorithm – Informal Derivation • Let us proceed by simply differentiating the log likelihood • Setting the derivative with respect to μ_k equal to zero gives Σ_n γ(z_nk)(x_n − μ_k) = 0, giving μ_k = (1/N_k) Σ_n γ(z_nk) x_n with N_k = Σ_n γ(z_nk), which is simply the responsibility-weighted mean of the data
Ng, Jordan, Weiss Algorithm • Form the matrix L = D^{-1/2} A D^{-1/2}, where A is the affinity matrix and D the diagonal degree matrix • Find x_1, …, x_k, the k largest eigenvectors of L • These form the columns of the new matrix X • Note: we have reduced the dimension from n×n to n×k
Ng, Jordan, Weiss Algorithm • Form the matrix Y by renormalizing each of X's rows to have unit length: Y_ij = X_ij / (Σ_j X_ij²)^{1/2} • Treat each row of Y as a point in R^k • Cluster into k clusters via K-means • Final cluster assignment: assign point i to cluster j iff row i of Y was assigned to cluster j
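A sketch of the whole pipeline just described; the kmeans_fn argument stands in for any K-means routine returning labels and centers (for instance the sketch given earlier) and is a hypothetical helper, not part of the original algorithm statement.

```python
import numpy as np

def njw_spectral_clustering(A, k, kmeans_fn):
    """Ng-Jordan-Weiss spectral clustering sketch.

    A         : n x n affinity matrix (zero diagonal, positive degrees assumed).
    k         : number of clusters.
    kmeans_fn : any K-means routine taking (points, k) and returning (labels, centers).
    """
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ A @ D_inv_sqrt                    # L = D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    X = eigvecs[:, -k:]                                # k largest eigenvectors as columns (n x k)
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)   # renormalize rows to unit length
    labels, _ = kmeans_fn(Y, k)                        # cluster the rows of Y in R^k
    return labels                                      # point i gets the cluster of row i of Y
```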
Reasoning for Ng, Jordan, Weiss • If we eventually use K-means, why not just apply K-means to the original data? • This method allows us to cluster non-convex regions
User's Prerogative • Choice of k, the number of clusters • Choice of the scaling factor σ² • Realistically, search over a range of σ² values and pick the one that gives the tightest clusters (see the sketch below) • Choice of clustering method
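One way the scaling-factor search might be automated; affinity_fn and cluster_fn are hypothetical helpers (e.g. the affinity and spectral-clustering sketches earlier), and "tightness" is scored here as the within-cluster sum of squared distances, one reasonable choice among several.

```python
import numpy as np

def pick_sigma(points, k, sigmas, affinity_fn, cluster_fn):
    """Search over candidate scaling factors and keep the one whose clustering
    is tightest (smallest within-cluster sum of squared distances)."""
    best_sigma, best_score = None, np.inf
    for sigma in sigmas:
        labels = cluster_fn(affinity_fn(points, sigma), k)  # cluster with this scaling factor
        score = sum(((points[labels == c] - points[labels == c].mean(axis=0)) ** 2).sum()
                    for c in np.unique(labels))             # within-cluster distortion
        if score < best_score:
            best_sigma, best_score = sigma, score
    return best_sigma
```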
Advantages/Disadvantages • Perona & Freeman • For block-diagonal affinity matrices, the first eigenvector finds points in the "dominant" cluster; not very consistent • Shi & Malik • The 2nd generalized eigenvector minimizes the affinity between groups divided by the affinity within each group; no optimality guarantee once the discrete constraints are relaxed • Ng, Jordan, Weiss • Again depends on the choice of k • Claim: effectively handles clusters whose overlap or connectedness varies across clusters
(Result figures, repeated for three example data sets: affinity matrix, Perona/Freeman first eigenvector, and Shi/Malik second generalized eigenvector.)