EM Algorithm and its Applications slides adapted from Yi Li, WU
Outline • Introduction of EM: K-Means, EM • EM Applications • Image Segmentation using EM in Clustering Algorithms
Clustering How can we separate the data into different classes in an unsupervised way?
Clustering • There are K clusters C1, …, CK with means m1, …, mK. • The least-squares error is defined as D = Σ_{k=1..K} Σ_{x_i ∈ C_k} ||x_i − m_k||². • Out of all possible partitions into K clusters, choose the one that minimizes D. Why don't we just do this? If we could, would we get meaningful objects?
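A minimal sketch of computing this error for a given partition (numpy only; the data, assignments, and means below are illustrative, not from the slides):

```python
import numpy as np

def kmeans_error(X, labels, means):
    """Least-squares error D: sum over clusters k of the squared distances
    of the points assigned to cluster k from its mean m_k."""
    D = 0.0
    for k, m in enumerate(means):
        members = X[labels == k]            # points x_i in C_k
        D += np.sum((members - m) ** 2)     # ||x_i - m_k||^2, summed
    return D

# Illustrative 2-D data with two clusters (hypothetical values)
X = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.8, 5.2]])
labels = np.array([0, 0, 1, 1])
means = np.array([X[labels == 0].mean(axis=0), X[labels == 1].mean(axis=0)])
print(kmeans_error(X, labels, means))
```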
Color Clustering by the K-means Algorithm • Form K-means clusters from a set of n-dimensional vectors • 1. Set ic (iteration count) to 1. • 2. Choose randomly a set of K means m1(1), …, mK(1). • 3. For each vector xi, compute the distance D(xi, mk(ic)), k = 1, …, K, and assign xi to the cluster Cj with the nearest mean. • 4. Update the means to get m1(ic+1), …, mK(ic+1); increment ic by 1. • 5. Repeat steps 3 and 4 until Ck(ic) = Ck(ic+1) for all k.
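A sketch of these steps in Python (numpy only; K random data points serve as the initial means, and clusters that become empty simply keep their previous mean, a detail the slide does not specify):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """K-means as described above: assign each vector to the nearest mean,
    recompute the means, and stop when assignments no longer change."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), K, replace=False)].astype(float)  # step 2
    labels = None
    for _ in range(max_iter):
        # step 3: distance of every x_i to every mean, assign to the nearest
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # step 5: clusters unchanged
        labels = new_labels
        # step 4: update each mean (skip empty clusters)
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return labels, means
```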
K-Means Classifier (diagram) Input: x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … → Classifier (K-Means) → Classification Results: x1→C(x1), x2→C(x2), …, xi→C(xi), …; the classifier uses the Cluster Parameters m1 for C1, m2 for C2, …, mk for Ck.
K-Means Classifier (Cont.) Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … Output (Unknown): the Classification Results x1→C(x1), x2→C(x2), …, xi→C(xi), … and the Cluster Parameters m1 for C1, m2 for C2, …, mk for Ck.
K-Means Classifier (Cont.) Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … Output (Unknown), computed by alternating: Initial Guess of Cluster Parameters m1, m2, …, mk → Classification Results (1) C(x1), C(x2), …, C(xi) → Cluster Parameters (1) m1, m2, …, mk → Classification Results (2) C(x1), C(x2), …, C(xi) → Cluster Parameters (2) m1, m2, …, mk → … → Classification Results (ic) C(x1), C(x2), …, C(xi) → Cluster Parameters (ic) m1, m2, …, mk.
K-Means (Cont.) Boot Step: Initialize K clusters C1, …, CK; each cluster Cj is represented by its mean mj. Iteration Step: Estimate the cluster of each data point (xi → C(xi)), then re-estimate the cluster parameters (the means mj).
Image Segmentation by K-Means • Select a value of K • Select a feature vector for every pixel (color, texture, position, or a combination of these) • Define a similarity measure between feature vectors (usually Euclidean distance) • Apply the K-Means Algorithm • Apply the Connected Components Algorithm • Merge any components of size less than some threshold into the adjacent component that is most similar to it * From Marc Pollefeys COMP 256 2003
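A sketch of this pipeline (assuming an RGB image as a numpy array and scipy for both K-means and connected components; the merging of small components is omitted for brevity):

```python
import numpy as np
from scipy import ndimage
from scipy.cluster.vq import kmeans2

def segment_image(image, K=4):
    """Cluster pixels by color with K-means, then label connected components
    within each cluster."""
    h, w, c = image.shape
    features = image.reshape(-1, c).astype(float)      # one feature vector per pixel
    _, labels = kmeans2(features, K, minit='++')        # K-means on pixel colors
    label_image = labels.reshape(h, w)
    segments = np.zeros((h, w), dtype=int)
    next_id = 0
    for k in range(K):
        comps, n = ndimage.label(label_image == k)      # connected components of cluster k
        segments[comps > 0] = comps[comps > 0] + next_id
        next_id += n
    return segments
```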
Results of K-Means Clustering [figure]: original image, clusters on intensity, clusters on color (K-means clustering using intensity alone and color alone). * From Marc Pollefeys COMP 256 2003
K-means: Challenges • Will converge • But not necessarily to the global minimum of the objective function • Variations: search for an appropriate number of clusters by applying K-means with different K and comparing the results
K-means Variants • Different ways to initialize the means • Different stopping criteria • Dynamic methods for determining the right number of clusters (K) for a given image • The EM Algorithm: a probabilistic formulation of K-means
EM Algorithm • Like K-means, but with soft assignment (vs. hard assignment). • Assign each point partly to all clusters, based on the probability that it belongs to each. • Compute weighted averages (centers and covariances).
K-means vs. EM • Cluster representation: K-means uses the mean; EM uses the mean, variance, and weight. • Cluster initialization: K-means randomly selects K means; EM initializes K Gaussian distributions. • Expectation: K-means assigns each point to the closest mean; EM soft-assigns each point to each distribution. • Maximization: K-means computes the means of the current clusters; EM computes new parameters of each distribution.
Notation: Normal distribution, 1D case N(μ, σ) is a 1D normal (Gaussian) distribution with mean μ and standard deviation σ (so the variance is σ²).
Normal distribution: multi-dimensional case N(μ, Σ) is a multivariate Gaussian distribution with mean μ and covariance matrix Σ. What is a covariance matrix? For an RGB feature vector:

        R       G       B
R     σ_R²    σ_RG    σ_RB
G     σ_GR    σ_G²    σ_GB
B     σ_BR    σ_BG    σ_B²

variance(X): σ_X² = (1/N) Σ_i (x_i − μ_X)²    cov(X, Y): σ_XY = (1/N) Σ_i (x_i − μ_X)(y_i − μ_Y)
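A quick numpy check of these definitions (the RGB samples below are purely illustrative; np.cov with bias=True matches the 1/N convention used above):

```python
import numpy as np

# Hypothetical RGB samples, one row per pixel
pixels = np.array([[200,  30,  40],
                   [180,  50,  60],
                   [ 20, 200, 210],
                   [ 40, 180, 190]], dtype=float)

mu = pixels.mean(axis=0)                       # per-channel mean
centered = pixels - mu
Sigma = centered.T @ centered / len(pixels)    # 3x3 covariance matrix, 1/N convention
print(Sigma)
print(np.allclose(Sigma, np.cov(pixels.T, bias=True)))  # matches numpy's biased estimator
```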
K-Means → EM with GMM: The Intuition (1) Instead of making a "hard" decision about which class a pixel belongs to, we use probability theory and assign pixels to classes probabilistically. Using Bayes' rule, P(Cj | x) = p(x | Cj) P(Cj) / p(x), where P(Cj | x) is the posterior, p(x | Cj) is the likelihood, and P(Cj) is the prior. We want to maximize the posterior.
K-Means → EM: The Intuition (2) • We model the likelihood p(x | Cj) as a Gaussian (normal) distribution.
K-Means → EM: The Intuition (3) • Instead of modeling the color distribution with one Gaussian, we model it as a weighted sum of Gaussians. • We want to find the parameters that maximize the likelihood of the observed data. • We use the general trick of maximizing the logarithm of the probability function instead; this is called Maximum Likelihood Estimation (MLE). • We solve this maximization in 2 steps (E and M).
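For reference, a standard way to write this mixture likelihood and the log-likelihood being maximized (notation matches the Symbols slide later: α_j are the mixture weights, (μ_j, Σ_j) the Gaussian parameters):

```latex
p(x_i \mid \Theta) = \sum_{j=1}^{K} \alpha_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)
\qquad
\log L(\Theta) = \sum_{i=1}^{N} \log \sum_{j=1}^{K} \alpha_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)
```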
K-Means → EM with Gaussian Mixture Models (GMM) Boot Step: Initialize K clusters C1, …, CK: (μj, Σj) and P(Cj) for each cluster j. Iteration Step: • Expectation: estimate the (soft) cluster assignment of each data point. • Maximization: re-estimate the cluster parameters (μj, Σj) and P(Cj) for each cluster j.
EM Classifier (diagram) Input: x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … → Classifier (EM) → Classification Results: p(Cj|x1), p(Cj|x2), …, p(Cj|xi), …; the classifier uses the Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck.
EM Classifier (Cont.) Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … Output (Unknown): the Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck, and the Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), …
Expectation Step Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … + Input (Estimation): Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck → Output: Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), …
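The E-step computation on this slide can be written as the standard posterior for a Gaussian mixture (a textbook formula, using the slide's notation):

```latex
p(C_j \mid x_i) = \frac{p(x_i \mid C_j)\, p(C_j)}{p(x_i)}
                = \frac{\mathcal{N}(x_i \mid \mu_j, \Sigma_j)\, p(C_j)}
                       {\sum_{l=1}^{K} \mathcal{N}(x_i \mid \mu_l, \Sigma_l)\, p(C_l)}
```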
Maximization Step Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … + Input (Estimation): Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), … → Output: Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck.
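The corresponding M-step updates are the standard re-estimation formulas, weighted by the posteriors from the E-step (again a textbook result, not text recovered from the original slide):

```latex
\mu_j = \frac{\sum_i p(C_j \mid x_i)\, x_i}{\sum_i p(C_j \mid x_i)}
\qquad
\Sigma_j = \frac{\sum_i p(C_j \mid x_i)\,(x_i - \mu_j)(x_i - \mu_j)^{\top}}{\sum_i p(C_j \mid x_i)}
\qquad
p(C_j) = \frac{1}{N}\sum_i p(C_j \mid x_i)
```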
EM Algorithm Boot Step: Initialize K clusters C1, …, CK: (μj, Σj) and P(Cj) for each cluster j. Iteration Step: Expectation Step, then Maximization Step; repeat until convergence.
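A compact sketch of the whole loop (numpy and scipy; random means, identity covariances, and uniform weights as the boot step, a fixed iteration count, and no covariance regularization; this mirrors the slides but is not the original authors' code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """EM for a Gaussian mixture: the E-step computes p(C_j | x_i),
    the M-step re-estimates (mu_j, Sigma_j) and p(C_j)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Boot step
    mus = X[rng.choice(n, K, replace=False)].astype(float)
    Sigmas = np.array([np.eye(d) for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities p(C_j | x_i)
        dens = np.column_stack([
            weights[j] * multivariate_normal.pdf(X, mus[j], Sigmas[j])
            for j in range(K)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        Nk = resp.sum(axis=0)
        weights = Nk / n
        mus = (resp.T @ X) / Nk[:, None]
        for j in range(K):
            diff = X - mus[j]
            Sigmas[j] = (resp[:, j, None] * diff).T @ diff / Nk[j]
    return mus, Sigmas, weights, resp
```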
EM Demo • Video of Image Classification with EM • Online Tutorial
EM Applications • Blobworld: Image Segmentation Using Expectation-Maximization and its Application to Image Querying
Image Segmentation using EM • Step 1: Feature Extraction • Step 2: Image Segmentation using EM • In the literature, EM is often used to segment the image into multiple regions; usually each region corresponds to a Gaussian model. • In our project, we segment one foreground region from the background. For training, we crop the foreground pixels and train a Gaussian Mixture Model on those pixels. For testing, we evaluate the conditional probability of each pixel and threshold on that value.
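A minimal sketch of that train-and-threshold idea, assuming scikit-learn's GaussianMixture and a hypothetical foreground crop; the component count and threshold below are illustrative, not the values used in the project:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_foreground_model(fg_pixels, n_components=3):
    """Fit a GMM to the cropped foreground pixels (rows are feature vectors)."""
    return GaussianMixture(n_components=n_components, covariance_type='full').fit(fg_pixels)

def foreground_mask(image, gmm, log_thresh=-12.0):
    """Per-pixel log-likelihood under the foreground GMM, thresholded."""
    h, w, c = image.shape
    scores = gmm.score_samples(image.reshape(-1, c).astype(float))
    return (scores > log_thresh).reshape(h, w)
```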
Symbols • The feature vector for pixel i is called xi. • There are going to be K Gaussian models; K is given. • The j-th model has a Gaussian distribution with parameters θj = (μj, Σj). • The αj's are the weights (which sum to 1) of the Gaussians. • Θ is the collection of parameters: Θ = (α1, …, αK, θ1, …, θK).
Initialization • Each of the K Gaussians will have parameters θj = (μj, Σj), where μj is the mean of the j-th Gaussian and Σj is its covariance matrix. • The covariance matrices are initialized to be the identity matrix. • The means can be initialized by finding the average feature vectors in each of K windows in the image; this is a data-driven initialization.
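A sketch of that initialization, assuming the "K windows" are K equal horizontal bands of the image (the slide does not specify the window layout, so that choice and the uniform weights are assumptions):

```python
import numpy as np

def init_params(image, K):
    """Identity covariances; means = average feature vector in each of K
    horizontal bands (one possible reading of 'K windows')."""
    h, w, d = image.shape
    bands = np.array_split(image.astype(float), K, axis=0)   # K horizontal windows
    mus = np.array([band.reshape(-1, d).mean(axis=0) for band in bands])
    Sigmas = np.array([np.eye(d) for _ in range(K)])
    alphas = np.full(K, 1.0 / K)    # uniform weights (not specified on the slide)
    return mus, Sigmas, alphas
```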
E-Step: for every pixel i and every Gaussian j, compute the posterior p(Cj | xi, Θ) = αj N(xi | μj, Σj) / Σl αl N(xi | μl, Σl); these posteriors are referred to as responsibilities.
Fitting Gaussians in other spaces • Experiment with other color spaces. • Experiment with dimensionality reduction (PCA).
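A minimal PCA sketch for reducing the per-pixel feature vectors before fitting the Gaussians (plain numpy via SVD; the number of components is an illustrative choice):

```python
import numpy as np

def pca_reduce(features, n_components=2):
    """Project feature vectors onto their top principal components."""
    mu = features.mean(axis=0)
    centered = features - mu
    # Right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T
```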