
Clustering with k-means and mixture of Gaussian densities

Clustering with k-means and mixture of Gaussian densities. Jakob Verbeek, December 4, 2009. Plan for this course: introduction to machine learning; clustering techniques (k-means, Gaussian mixture density); Gaussian mixture density continued (parameter estimation with EM, Fisher kernels).


Presentation Transcript


  1. Clustering with k-means and mixture of Gaussian densities • Jakob Verbeek • December 4, 2009

  2. Plan for this course • Introduction to machine learning • Clustering techniques • k-means, Gaussian mixture density • Gaussian mixture density continued • Parameter estimation with EM, Fisher kernels • Classification techniques 1 • Introduction, generative methods, semi-supervised • Classification techniques 2 • Discriminative methods, kernels • Decomposition of images • Topic models, …

  3. Clustering • Finding a group structure in the data • Data in one cluster similar to each other • Data in different clusters dissimilar • Map each data point to a discrete cluster index • “flat” methods find k groups (k known, or automatically set) • “hierarchical” methods define a tree structure over the data

  4. Hierarchical Clustering • Data set is partitioned into a tree structure • Top-down construction: start with all data in one cluster (the root node), apply “flat” clustering into k groups, then recursively cluster the data in each group • Bottom-up construction: start with each point in a separate cluster, then recursively merge the “closest” clusters • Distance between clusters A and B: min, max, or mean distance between x in A and y in B
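
The bottom-up construction can be sketched as follows. This is a minimal illustration, not code from the course: the function name `agglomerative`, the `linkage` argument, and the use of Euclidean distance are assumptions, and the naive search over cluster pairs is only practical for small data sets.

```python
import numpy as np

def agglomerative(X, n_clusters, linkage="min"):
    """Bottom-up clustering sketch: merge the two closest clusters until n_clusters remain."""
    # Cluster distance: min, max, or mean distance between x in A and y in B.
    link = {"min": np.min, "max": np.max, "mean": np.mean}[linkage]
    # Pairwise Euclidean distances between all points (an assumed choice of distance).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = link(D[np.ix_(clusters[a], clusters[b])])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merges.append((clusters[a], clusters[b]))  # record the tree structure
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges
```

The recorded `merges` list gives the tree: each entry holds the two groups of point indices that were joined at that step.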

  5. Clustering example [Guillaumin, Verbeek, Schmid, ICCV 2009] • Learn face similarity from training pairs labeled as same/different • Cluster faces based on identity • Example: Picasa web albums, label face clusters

  6. Clustering example: visual words

  7. Clustering for visual vocabulary construction • Clustering of local image descriptors • Most often done using k-means or mixture of Gaussians • Divide the space of region descriptors into a collection of non-overlapping cells • Recap of the image representation pipeline • Extract image regions at different locations and scales: randomly, on a regular grid, or using an interest point detector • Compute a descriptor for each region (e.g. SIFT) • Assign each descriptor to a cluster center, or do “soft assignment” or “multiple assignment” • Make a histogram for the complete image, possibly with separate histograms for different image regions
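
The last two pipeline steps (hard assignment and histogram) might look like the sketch below. The function name `bow_histogram` is hypothetical, the descriptors are assumed to be already extracted as rows of an array, and normalizing the histogram to sum to 1 is an added assumption.

```python
import numpy as np

def bow_histogram(descriptors, centers):
    """Hard-assign each descriptor to its nearest center and count the visual words."""
    # Squared Euclidean distance from every descriptor to every cluster center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the closest center per descriptor
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()   # normalization is an assumption, not from the slides
```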

  8. Definition of k-means clustering • Given: data set of N points x_n, n=1,…,N • Goal: find K cluster centers m_k, k=1,…,K • Clustering: assign each point to its closest center • Error criterion: sum over all data points of the squared distance to the closest cluster center
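
One standard way to write this error criterion, assuming squared Euclidean distance and the symbol E for the error (neither is spelled out in the transcript), with data points x_n and centers m_k:

```latex
E\bigl(\{m_k\}_{k=1}^{K}\bigr) \;=\; \sum_{n=1}^{N} \min_{k \in \{1,\dots,K\}} \lVert x_n - m_k \rVert^2
```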

  9. Examples of k-means clustering • Data uniformly sampled in R^2 • Data non-uniformly sampled in R^3

  10. Minimizing the error function • The error function is non-differentiable due to the min operator • Non-convex, i.e. there are local minima • Minimization can be done with an iterative algorithm: 1) Initialize cluster centers 2) Assign each data point to nearest center 3) Update the cluster centers as mean of associated data 4) If cluster centers changed: return to step 2) 5) Return cluster centers • Each iteration decreases the error function or leaves it unchanged
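
The iterative algorithm on this slide can be sketched in NumPy as follows. The function name `kmeans`, the initialization from randomly chosen data points, and the handling of empty clusters are assumptions made for illustration; the slide does not prescribe them.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal k-means sketch: X is an (N, d) array, K the number of centers."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centers as K distinct data points (one common choice).
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: assign each data point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Step 3: update each center as the mean of its associated data.
        new_centers = centers.copy()
        for k in range(K):
            if np.any(z == k):  # keep the old center if a cluster is empty
                new_centers[k] = X[z == k].mean(axis=0)
        # Step 4: stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Step 5: return the centers, with final assignments and error value.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    z = d2.argmin(axis=1)
    error = d2[np.arange(len(X)), z].sum()
    return centers, z, error
```

Because the result depends on the initial centers, a caller would typically run this with several values of `seed` and keep the run with the lowest `error`.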

  11. Iteratively minimizing the error function • Introduce “latent” variable z_n, with value in {1,…,K} • z_n is the assignment of data point x_n to one of the clusters • Upper bound on the error function, without the min operator • Error function and bound are equal for the “min” assignment, i.e. when each z_n points to the closest center • Minimize the bound w.r.t. the cluster centers • Update the cluster centers as mean of associated data
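
A sketch of the bound in formulas, with assumed notation: E is the k-means error, F the bound, z_n the assignment of point x_n, and N_k the number of points assigned to cluster k.

```latex
% Bound: replace the min over k by a fixed assignment z_n
F\bigl(\{m_k\}, \{z_n\}\bigr) \;=\; \sum_{n=1}^{N} \lVert x_n - m_{z_n} \rVert^2
\;\ge\; \sum_{n=1}^{N} \min_{k} \lVert x_n - m_k \rVert^2 \;=\; E\bigl(\{m_k\}\bigr)

% Equality holds for the "min" assignment
z_n \;=\; \arg\min_{k} \lVert x_n - m_k \rVert^2

% Minimizing F over m_k for fixed assignments gives the cluster mean
\frac{\partial F}{\partial m_k} \;=\; -2 \sum_{n:\, z_n = k} (x_n - m_k) \;=\; 0
\quad\Longrightarrow\quad
m_k \;=\; \frac{1}{N_k} \sum_{n:\, z_n = k} x_n
```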

  12. Iteratively minimizing the error function • Minimization can be done with an iterative algorithm: 1) Assign each data point to nearest center: this constructs a tight bound on the error function 2) Update the cluster centers as mean of associated data: this minimizes the bound • Example of “iterative bound optimization” • The EM algorithm is another example

  13. Examples of k-means clustering • Several iterations with two centers • Error function

  14. Examples of k-means clustering • Clustering RGB vectors of pixels in images • Compression of image file, which takes N x 24 bits uncompressed • Store RGB values of cluster centers: K x 24 bits • Store cluster index of each pixel: N x log2 K bits • Compression ratios of the examples shown: 4.2%, 8.3%, 16.7%
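
The bit counts above give the compression ratio directly. The sketch below is an illustration with a hypothetical function name; rounding log2 K up to a whole number of bits per pixel is an assumption.

```python
import math

def compression_ratio(n_pixels, k, bits_per_pixel=24):
    """Ratio of compressed to original size when each pixel stores a cluster index."""
    # Bits per cluster index, rounded up to a whole number (an assumption).
    index_bits = math.ceil(math.log2(k))
    # K x 24 bits for the center colors plus N x log2 K bits for the indices.
    compressed = k * bits_per_pixel + n_pixels * index_bits
    return compressed / (n_pixels * bits_per_pixel)
```

For large N the cost of storing the centers is negligible and the ratio approaches the index bits divided by 24, for example 1/24 ≈ 4.2% for K = 2.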

  15. Clustering with Gaussian mixture density • Each cluster represented by Gaussian density • Center, as in k-means • Covariance matrix: cluster spread around center • Terms of the density: determinant of covariance matrix C, data dimension d, quadratic function of point x and mean m
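
For reference, the standard multivariate Gaussian density in terms of the quantities named on this slide (point x, mean m, covariance matrix C, data dimension d); the symbol N(x; m, C) is assumed notation:

```latex
\mathcal{N}(x;\, m, C) \;=\; (2\pi)^{-d/2}\, \lvert C \rvert^{-1/2}
\exp\!\Bigl( -\tfrac{1}{2}\, (x - m)^{\top} C^{-1} (x - m) \Bigr)
```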

  16. Clustering with Gaussian mixture density • Mixture density is a weighted sum of Gaussians • Mixing weight: importance of each cluster • Density has to integrate to 1, so we require the mixing weights to be non-negative and to sum to 1
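
In formulas, with assumed symbols π_k for the mixing weights and N(x; m_k, C_k) for the k-th Gaussian density with mean m_k and covariance C_k:

```latex
p(x) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x;\, m_k, C_k),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```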

  17. Clustering with Gaussian mixture density • Given: data set of N points x_n, n=1,…,N • Find the mixture of Gaussians (MoG) that best explains the data, i.e. that assigns maximum likelihood to the data • Assume data points are drawn independently from the MoG • Maximize the log-likelihood of the fixed data set X w.r.t. the parameters of the MoG • As with k-means, the objective function has local optima (here local maxima of the log-likelihood) • Can use the Expectation-Maximization (EM) algorithm • Similar to the iterative k-means algorithm
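
Under the independence assumption the log-likelihood is a sum over data points. The notation is assumed: θ collects all mixing weights π_k, means m_k, and covariances C_k.

```latex
\mathcal{L}(\theta) \;=\; \log p(X \mid \theta)
\;=\; \sum_{n=1}^{N} \log p(x_n)
\;=\; \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n;\, m_k, C_k)
```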

  18. Assignment of data points to clusters • As with k-means, z_n indicates the cluster index for x_n • To sample a point from the MoG: select cluster index k with probability given by the mixing weight, then sample a point from the k-th Gaussian • MoG is recovered if we marginalize over the unknown index
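
The two-stage sampling procedure can be sketched with NumPy's random generator. The function name `sample_mog` and its arguments are hypothetical; `weights`, `means`, and `covs` stand for the mixing weights, Gaussian means, and covariance matrices.

```python
import numpy as np

def sample_mog(weights, means, covs, n_samples, seed=0):
    """Draw n_samples points from a mixture of Gaussians, returning points and indices."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Select a cluster index for each sample with probability given by the mixing weights.
    z = rng.choice(len(weights), size=n_samples, p=weights)
    # Sample each point from the Gaussian selected by its index.
    x = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in z])
    return x, z
```

Discarding `z` and keeping only `x` corresponds to marginalizing over the unknown index: the points are then distributed according to the mixture density.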

  19. Soft assignment of data points to clusters • Given data point x_n, infer the value of z_n • Conditional probability of z_n given x_n
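
By Bayes' rule the conditional probability takes the following form, with assumed symbols π_k for the mixing weights and N(x; m_k, C_k) for the k-th Gaussian density:

```latex
p(z_n = k \mid x_n) \;=\;
\frac{\pi_k \, \mathcal{N}(x_n;\, m_k, C_k)}
     {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n;\, m_j, C_j)}
```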

  20. Maximum likelihood estimation of Gaussian • Given data points x_n, n=1,…,N • Find the Gaussian that maximizes the data log-likelihood • Set the derivative of the data log-likelihood w.r.t. the parameters to zero • Parameters are set to the data mean and data covariance
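
Written out, the resulting estimates are the sample mean and the sample covariance with normalization 1/N (symbols m and C for the mean and covariance are assumed notation):

```latex
m \;=\; \frac{1}{N} \sum_{n=1}^{N} x_n,
\qquad
C \;=\; \frac{1}{N} \sum_{n=1}^{N} (x_n - m)(x_n - m)^{\top}
```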

  21. Maximum likelihood estimation of MoG • Use EM algorithm • Initialize MoG: parameters or soft-assign • E-step: soft assign of data points to clusters • M-step: update the cluster parameters • Repeat EM steps, terminate if converged: convergence of parameters or assignments • E-step: compute posterior on z given x • M-step: update Gaussians from data points weighted by posterior
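
A compact sketch of these EM steps in NumPy is given below. The names `em_mog` and `gaussian_logpdf`, the initialization from random data points with a shared data covariance, the fixed number of iterations, and the small regularizer `reg` added to each covariance are all assumptions made to keep the example short and numerically stable.

```python
import numpy as np

def gaussian_logpdf(X, mean, cov):
    """Log density of a Gaussian with the given mean and covariance at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_mog(X, K, n_iter=50, reg=1e-6, seed=0):
    """EM for a mixture of K Gaussians on an (N, d) array X."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialize parameters: random data points as means, data covariance for all clusters.
    means = X[rng.choice(N, size=K, replace=False)].astype(float)
    data_cov = np.atleast_2d(np.cov(X.T)) + reg * np.eye(d)
    covs = np.stack([data_cov.copy() for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior q[n, k] = p(z_n = k | x_n), computed in the log domain.
        log_joint = np.stack(
            [np.log(weights[k]) + gaussian_logpdf(X, means[k], covs[k]) for k in range(K)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        log_liks.append(log_norm.sum())  # data log-likelihood before this update
        q = np.exp(log_joint - log_norm[:, None])
        # M-step: update each Gaussian from the data points weighted by the posterior.
        Nk = q.sum(axis=0)
        weights = Nk / N
        means = (q.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (q[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return weights, means, covs, q, log_liks
```

The returned `log_liks` list can be used as a convergence check in place of the fixed iteration count, and `q` holds the soft assignments from the last E-step.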

  22. Maximum likelihood estimation of MoG Example of several EM iterations

  23. Clustering with k-means and MoG • Hard assignment in k-means is not robust near the border of quantization cells • Soft assignment in MoG accounts for ambiguity in the assignment • Both algorithms are sensitive to initialization: run from several initializations and keep the best result • The number of clusters needs to be set • Both algorithms can be generalized to other types of distances or densities • Images from [Gemert et al, IEEE TPAMI, 2010]

  24. Plan for this course • Introduction to machine learning • Clustering techniques • k-means, Gaussian mixture density • Reading for next week: • Neal & Hinton “A view of the EM algorithm that justifies incremental, sparse, and other variants”, in “Learning in graphical models”, 1998. • Part of chapter 3 of my thesis • Both available on course website http://lear.inrialpes.fr/~verbeek/teaching • Gaussian mixture density continued • Parameter estimation with EM, Fisher kernels • Classification techniques 1 • Introduction, generative methods, semi-supervised • Classification techniques 2 • Discriminative methods, kernels • Decomposition of images • Topic models, …
