
EM Algorithm and its Applications


Presentation Transcript


  1. EM Algorithm and its Applications slides adapted from Yi Li, WU

  2. Outline • Introduction to EM: K-Means → EM • EM Applications • Image Segmentation using EM in Clustering Algorithms

  3. Clustering • How can we separate the data into different classes without supervision?

  4. Example

  5. Clustering • There are K clusters C1, …, CK with means m1, …, mK. • The least-squares error is defined as D = \sum_{k=1}^{K} \sum_{x_i \in C_k} \| x_i - m_k \|^2. • Out of all possible partitions into K clusters, choose the one that minimizes D. Why don't we just do this? If we could, would we get meaningful objects?
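As a concrete reading of that objective, here is a minimal Python sketch (the function and variable names are mine, not from the slides; assumes NumPy):

import numpy as np

def kmeans_error(X, labels, means):
    # D: sum over clusters k, and over points x_i assigned to C_k, of ||x_i - m_k||^2
    D = 0.0
    for k, m_k in enumerate(means):
        diffs = X[labels == k] - m_k
        D += np.sum(diffs ** 2)
    return D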

  6. Color Clustering by K-means Algorithm • Form K-means clusters from a set of n-dimensional vectors: • 1. Set ic (iteration count) to 1. • 2. Choose randomly a set of K means m1(1), …, mK(1). • 3. For each vector xi, compute the distance D(xi, mk(ic)) for k = 1,…,K and assign xi to the cluster Cj with the nearest mean. • 4. Update the means to get m1(ic+1), …, mK(ic+1); increment ic by 1. • 5. Repeat steps 3 and 4 until Ck(ic) = Ck(ic+1) for all k.
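A minimal sketch of those five steps in Python (names are mine; assumes NumPy; empty clusters simply keep their previous mean):

import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    # X: (n, d) array of feature vectors; returns cluster labels and means.
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)  # step 2
    labels = np.full(len(X), -1)
    for it in range(max_iter):
        # step 3: assign each vector to the cluster with the nearest mean
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):       # step 5: clusters unchanged
            break
        labels = new_labels
        # step 4: update the means
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return labels, means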

  7. K-Means Classifier • Input: feature vectors x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … • Classifier (K-Means) produces Classification Results: x1 → C(x1), x2 → C(x2), …, xi → C(xi), … • Cluster Parameters: m1 for C1, m2 for C2, …, mk for Ck

  8. K-Means Classifier (Cont.) • Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … • Output (Unknown): Classification Results x1 → C(x1), x2 → C(x2), …, xi → C(xi), …, and Cluster Parameters m1 for C1, m2 for C2, …, mk for Ck

  9. Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … Output (Unknown), estimated by alternating: • Initial guess of cluster parameters m1, m2, …, mk • Classification results (1): C(x1), C(x2), …, C(xi) → Cluster parameters (1): m1, m2, …, mk • Classification results (2): C(x1), C(x2), …, C(xi) → Cluster parameters (2): m1, m2, …, mk • … • Classification results (ic): C(x1), C(x2), …, C(xi) → Cluster parameters (ic): m1, m2, …, mk → …

  10. K-Means (Cont.) • Boot step: Initialize K clusters C1, …, CK; each cluster Cj is represented by its mean mj. • Iteration step: Estimate the cluster of each data point, xi → C(xi), then re-estimate the cluster parameters.

  11. K-Means Example

  12. K-Means Example

  13. Image Segmentation by K-Means • Select a value of K. • Select a feature vector for every pixel (color, texture, position, or a combination of these). • Define a similarity measure between feature vectors (usually Euclidean distance). • Apply the K-Means algorithm. • Apply a connected components algorithm. • Merge any component smaller than some threshold into the adjacent component that is most similar to it (see the sketch below). * From Marc Pollefeys COMP 256 2003
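A rough sketch of that pipeline, assuming scikit-learn for K-means and SciPy for connected components (the final small-component merge is only noted in a comment; all names are mine):

import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def segment_by_kmeans(img, K=4):
    # img: (H, W, 3) color image; per-pixel feature = color plus (row, col) position.
    h, w, c = img.shape
    rows, cols = np.mgrid[0:h, 0:w]
    feats = np.column_stack([img.reshape(-1, c).astype(float),
                             rows.ravel(), cols.ravel()])
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(feats).reshape(h, w)
    # Split each K-means cluster into spatially connected components.
    segments = np.zeros((h, w), dtype=int)
    next_id = 0
    for k in range(K):
        comp, n = ndimage.label(labels == k)
        segments[comp > 0] = comp[comp > 0] + next_id
        next_id += n
    # Components smaller than a threshold would then be merged into the
    # most similar adjacent component.
    return segments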

  14. Results of K-Means Clustering • Panels: original image, clusters on intensity, clusters on color. • K-means clustering using intensity alone and color alone. * From Marc Pollefeys COMP 256 2003

  15. K-means: Challenges • K-means will converge, • but not necessarily to the global minimum of the objective function. • Variation: search for an appropriate number of clusters by applying K-means with different K and comparing the results.

  16. K-means Variants • Different ways to initialize the means • Different stopping criteria • Dynamic methods for determining the right number of clusters (K) for a given image • The EM Algorithm: a probabilistic formulation of K-means

  17. EM algorithm • Like K-means, but with soft assignment (vs. hard assignment). • Assign each point partly to all clusters, based on the probability that it belongs to each. • Compute weighted averages (centers and covariances).

  18. K-means vs. EM • Cluster representation: K-means uses the mean; EM uses the mean, variance, and weight. • Cluster initialization: K-means randomly selects K means; EM initializes K Gaussian distributions. • Expectation: K-means assigns each point to the closest mean; EM soft-assigns each point to each distribution. • Maximization: K-means computes the means of the current clusters; EM computes the new parameters of each distribution.

  19. Notation: Normal distribution, 1D case • N(μ, σ) is a 1D normal (Gaussian) distribution with mean μ and standard deviation σ (so the variance is σ²).
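For reference (the density itself is not written out in this transcript), the standard form of that 1D Gaussian is:

N(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)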

  20. Normal distribution: Multi-dimensional case • N(μ, Σ) is a multivariate Gaussian distribution with mean μ and covariance matrix Σ. • What is a covariance matrix? For an RGB feature vector it is the 3×3 matrix

\Sigma = \begin{pmatrix} \sigma_R^2 & \sigma_{RG} & \sigma_{RB} \\ \sigma_{GR} & \sigma_G^2 & \sigma_{GB} \\ \sigma_{BR} & \sigma_{BG} & \sigma_B^2 \end{pmatrix}

  • variance(X): \sigma_X^2 = \frac{1}{N} \sum_i (x_i - \mu)^2 • cov(X, Y) = \frac{1}{N} \sum_i (x_i - \mu_X)(y_i - \mu_Y)
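As a small illustration, NumPy computes that matrix directly; the `pixels` array below is only a random placeholder, and bias=True gives the 1/N normalization used above (NumPy's default is 1/(N-1)):

import numpy as np

pixels = np.random.rand(1000, 3)                  # placeholder (N, 3) array of RGB vectors
mu = pixels.mean(axis=0)                          # per-channel means
Sigma = np.cov(pixels, rowvar=False, bias=True)   # 3x3 covariance matrix, 1/N normalization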

  21. K-Means → EM with GMM: The Intuition (1) • Instead of making a "hard" decision about which class a pixel belongs to, we use probability theory and assign pixels to classes probabilistically. • Using Bayes' rule, the posterior is proportional to the likelihood times the prior; we want to maximize the posterior.

  22. K-Means → EM: The Intuition (2) • We model the likelihood as a Gaussian (normal) distribution.

  23. K-Means → EM: The Intuition (3) • Instead of modeling the color distribution with one Gaussian, we model it as a weighted sum of Gaussians. • We want to find the parameters that maximize the likelihood of the observed data. • We use the general trick of taking the logarithm of the likelihood and maximizing that instead. This is called Maximum Likelihood Estimation (MLE). • We solve the maximization in 2 steps (E and M); the log-likelihood being maximized is shown below.
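In the notation introduced on slide 33 (mixture weights α_j and Gaussian parameters θ_j = (μ_j, Σ_j)), the log-likelihood being maximized over N pixels is:

\log p(X \mid \Theta) = \sum_{i=1}^{N} \log \sum_{j=1}^{K} \alpha_j \, N(x_i \mid \mu_j, \Sigma_j)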

  24. K-Means → EM with Gaussian Mixture Models (GMM) • Boot step: Initialize K clusters C1, …, CK, with (μj, Σj) and P(Cj) for each cluster j. • Iteration step: Estimate the (soft) cluster assignment of each data point (Expectation), then re-estimate the cluster parameters (μj, Σj) and P(Cj) for each cluster j (Maximization).

  25. EM Classifier • Input: feature vectors x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … • Classifier (EM) produces Classification Results: p(Cj|x1), p(Cj|x2), …, p(Cj|xi), … • Cluster Parameters: (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck

  26. EM Classifier (Cont.) • Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … • Output (Unknown): Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck, and Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), …

  27. Expectation Step • Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … • Input (Estimation): Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck • Output: Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), …

  28. Maximization Step • Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … • Input (Estimation): Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), … • Output: Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck

  29. EM Algorithm • Boot step: Initialize K clusters C1, …, CK, with (μj, Σj) and P(Cj) for each cluster j. • Iteration step: Expectation step, then Maximization step, repeated until convergence (a compact sketch of the full loop follows).
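A compact Python sketch of this loop (names are mine; assumes NumPy and SciPy; a fixed iteration count stands in for a real convergence test, and a small ridge keeps the covariances invertible):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    # X: (N, d) feature vectors. Returns weights alpha, means mu, covariances Sigma.
    N, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigma = np.stack([np.eye(d) for _ in range(K)])     # identity init, as on slide 34
    alpha = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities p(C_j | x_i), one column per Gaussian
        resp = np.stack([alpha[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
                         for j in range(K)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances
        Nj = resp.sum(axis=0)
        alpha = Nj / N
        mu = (resp.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = (resp[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return alpha, mu, Sigma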

  30. EM Demo • Video of Image Classification with EM • Online Tutorial

  31. EM Applications • Blobworld: Image Segmentation Using Expectation-Maximization and its Application to Image Querying

  32. Image Segmentation using EM • Step 1: Feature Extraction • Step 2: Image Segmentation using EM • In the literature, EM is often used to segment the image into multiple regions; usually each region corresponds to a Gaussian model. • In our project we segment one foreground region from the background. For training, we crop the foreground pixels and train a Gaussian Mixture Model on those pixels. For testing, we evaluate the conditional probability of each pixel and threshold on that value (see the sketch below).
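A hedged sketch of that foreground/background scheme, assuming scikit-learn (GaussianMixture.score_samples gives per-pixel log-likelihoods to threshold; the threshold value and all names here are mine, not from the project):

import numpy as np
from sklearn.mixture import GaussianMixture

def train_foreground_model(fg_pixels, K=3):
    # fg_pixels: (N, d) feature vectors cropped from the foreground in training images.
    return GaussianMixture(n_components=K).fit(fg_pixels)

def segment_foreground(gmm, pixel_feats, log_threshold):
    # pixel_feats: (H*W, d) feature vectors of a test image. Pixels whose
    # log-likelihood under the foreground GMM exceeds the threshold are
    # labeled foreground (True).
    return gmm.score_samples(pixel_feats) > log_threshold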

  33. Symbols • The feature vector for pixel i is called xi. • There are going to be K Gaussian models; K is given. • The j-th model has a Gaussian distribution with parameters θj = (μj, Σj). • The αj's are the weights (which sum to 1) of the Gaussians. • Θ is the collection of parameters: Θ = (α1, …, αK, θ1, …, θK)

  34. Initialization • Each of the K Gaussians will have parameters θj = (μj, Σj), where μj is the mean and Σj is the covariance matrix of the j-th Gaussian. • The covariance matrices are initialized to be the identity matrix. • The means can be initialized by finding the average feature vectors in each of K windows in the image; this is a data-driven initialization.

  35. E-Step: compute the posteriors p(Cj|xi), referred to as responsibilities.
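The formula on this slide is not included in the transcript; in the notation of slide 33, the standard GMM responsibility computed in the E-step is:

p(C_j \mid x_i) = \frac{\alpha_j \, N(x_i \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \alpha_k \, N(x_i \mid \mu_k, \Sigma_k)}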

  36. M-Step
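The update equations on this slide are likewise missing from the transcript; the standard GMM M-step re-estimates the parameters from the responsibilities above:

\alpha_j = \frac{1}{N} \sum_{i=1}^{N} p(C_j \mid x_i), \qquad \mu_j = \frac{\sum_i p(C_j \mid x_i) \, x_i}{\sum_i p(C_j \mid x_i)}, \qquad \Sigma_j = \frac{\sum_i p(C_j \mid x_i) (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_i p(C_j \mid x_i)}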

  37. Sample Results for Scene Segmentation

  38. Finding good feature vectors

  39. Fitting Gaussians in RGB space

  40. Fitting Gaussians in other spaces • Experiment with other color spaces. • Experiment with dimensionality reduction (PCA), as in the sketch below.
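A tiny sketch of the PCA idea, assuming scikit-learn (the feature array is only a random placeholder):

import numpy as np
from sklearn.decomposition import PCA

feats = np.random.rand(10000, 9)                    # placeholder per-pixel feature vectors
reduced = PCA(n_components=3).fit_transform(feats)  # project onto the top 3 principal components
# The reduced features would then be fed to K-means or to the GMM/EM fit above.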
