EM Algorithm and its Applications slides adapted from Yi Li, WU
Outline • Introduction of EM: K-Means, EM • EM Applications • Image Segmentation using EM in Clustering Algorithms
Clustering How can we separate the data into different classes in an unsupervised way?
Clustering • There are K clusters C1, …, CK with means m1, …, mK. • The least-squares error is defined as D = Σ_{k=1..K} Σ_{x_i ∈ C_k} ||x_i − m_k||². • Out of all possible partitions into K clusters, choose the one that minimizes D. Why don't we just do this? If we could, would we get meaningful objects?
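A minimal sketch of computing this error for a given partition (numpy only; the data, assignments, and means below are illustrative, not from the slides):

```python
import numpy as np

def kmeans_error(X, labels, means):
    """Least-squares error D: sum over clusters k of the squared distances
    of the points assigned to cluster k from its mean m_k."""
    D = 0.0
    for k, m in enumerate(means):
        members = X[labels == k]            # points x_i in C_k
        D += np.sum((members - m) ** 2)     # ||x_i - m_k||^2, summed
    return D

# Illustrative 2-D data with two clusters (hypothetical values)
X = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.8, 5.2]])
labels = np.array([0, 0, 1, 1])
means = np.array([X[labels == 0].mean(axis=0), X[labels == 1].mean(axis=0)])
print(kmeans_error(X, labels, means))
```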
Color Clustering by the K-means Algorithm • Form K-means clusters from a set of n-dimensional vectors • 1. Set ic (iteration count) to 1. • 2. Choose randomly a set of K means m1(1), …, mK(1). • 3. For each vector xi, compute the distance D(xi, mk(ic)), k = 1, …, K, and assign xi to the cluster Cj with the nearest mean. • 4. Update the means to get m1(ic+1), …, mK(ic+1); increment ic by 1. • 5. Repeat steps 3 and 4 until Ck(ic) = Ck(ic+1) for all k.
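A sketch of these steps in Python (numpy only; K random data points serve as the initial means, and clusters that become empty simply keep their previous mean, a detail the slide does not specify):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """K-means as described above: assign each vector to the nearest mean,
    recompute the means, and stop when assignments no longer change."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), K, replace=False)].astype(float)  # step 2
    labels = None
    for _ in range(max_iter):
        # step 3: distance of every x_i to every mean, assign to the nearest
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # step 5: clusters unchanged
        labels = new_labels
        # step 4: update each mean (skip empty clusters)
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return labels, means
```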
K-Means Classifier (diagram) Input: x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … → Classifier (K-Means) → Classification Results: x1→C(x1), x2→C(x2), …, xi→C(xi), …; the classifier uses the Cluster Parameters m1 for C1, m2 for C2, …, mk for Ck.
K-Means Classifier (Cont.) Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … Output (Unknown): the Classification Results x1→C(x1), x2→C(x2), …, xi→C(xi), … and the Cluster Parameters m1 for C1, m2 for C2, …, mk for Ck.
K-Means Classifier (Cont.) Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … Output (Unknown), computed by alternating: Initial Guess of Cluster Parameters m1, m2, …, mk → Classification Results (1) C(x1), C(x2), …, C(xi) → Cluster Parameters (1) m1, m2, …, mk → Classification Results (2) C(x1), C(x2), …, C(xi) → Cluster Parameters (2) m1, m2, …, mk → … → Classification Results (ic) C(x1), C(x2), …, C(xi) → Cluster Parameters (ic) m1, m2, …, mk.
K-Means (Cont.) Boot Step: Initialize K clusters C1, …, CK; each cluster Cj is represented by its mean mj. Iteration Step: Estimate the cluster of each data point (xi → C(xi)), then re-estimate the cluster parameters (the means mj).
Image Segmentation by K-Means • Select a value of K • Select a feature vector for every pixel (color, texture, position, or a combination of these) • Define a similarity measure between feature vectors (usually Euclidean distance) • Apply the K-Means Algorithm • Apply the Connected Components Algorithm • Merge any components of size less than some threshold into the adjacent component that is most similar to it * From Marc Pollefeys COMP 256 2003
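A sketch of this pipeline (assuming an RGB image as a numpy array and scipy for both K-means and connected components; the merging of small components is omitted for brevity):

```python
import numpy as np
from scipy import ndimage
from scipy.cluster.vq import kmeans2

def segment_image(image, K=4):
    """Cluster pixels by color with K-means, then label connected components
    within each cluster."""
    h, w, c = image.shape
    features = image.reshape(-1, c).astype(float)      # one feature vector per pixel
    _, labels = kmeans2(features, K, minit='++')        # K-means on pixel colors
    label_image = labels.reshape(h, w)
    segments = np.zeros((h, w), dtype=int)
    next_id = 0
    for k in range(K):
        comps, n = ndimage.label(label_image == k)      # connected components of cluster k
        segments[comps > 0] = comps[comps > 0] + next_id
        next_id += n
    return segments
```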
Results of K-Means Clustering [figure]: original image, clusters on intensity, clusters on color (K-means clustering using intensity alone and color alone). * From Marc Pollefeys COMP 256 2003
K-means: Challenges • Will converge • But not necessarily to the global minimum of the objective function • Variations: search for an appropriate number of clusters by applying K-means with different K and comparing the results
K-means Variants • Different ways to initialize the means • Different stopping criteria • Dynamic methods for determining the right number of clusters (K) for a given image • The EM Algorithm: a probabilistic formulation of K-means
EM Algorithm • Like K-means, but with soft assignment (vs. hard assignment). • Assign each point partly to all clusters, based on the probability that it belongs to each. • Compute weighted averages (centers and covariances).
K-means vs. EM • Cluster representation: K-means uses the mean; EM uses the mean, variance, and weight. • Cluster initialization: K-means randomly selects K means; EM initializes K Gaussian distributions. • Expectation: K-means assigns each point to the closest mean; EM soft-assigns each point to each distribution. • Maximization: K-means computes the means of the current clusters; EM computes new parameters of each distribution.
Notation: Normal distribution, 1D case N(μ, σ) is a 1D normal (Gaussian) distribution with mean μ and standard deviation σ (so the variance is σ²).
Normal distribution: multi-dimensional case N(μ, Σ) is a multivariate Gaussian distribution with mean μ and covariance matrix Σ. What is a covariance matrix? For an RGB feature vector:

        R       G       B
R     σ_R²    σ_RG    σ_RB
G     σ_GR    σ_G²    σ_GB
B     σ_BR    σ_BG    σ_B²

variance(X): σ_X² = (1/N) Σ_i (x_i − μ_X)²    cov(X, Y): σ_XY = (1/N) Σ_i (x_i − μ_X)(y_i − μ_Y)
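A quick numpy check of these definitions (the RGB samples below are purely illustrative; np.cov with bias=True matches the 1/N convention used above):

```python
import numpy as np

# Hypothetical RGB samples, one row per pixel
pixels = np.array([[200,  30,  40],
                   [180,  50,  60],
                   [ 20, 200, 210],
                   [ 40, 180, 190]], dtype=float)

mu = pixels.mean(axis=0)                       # per-channel mean
centered = pixels - mu
Sigma = centered.T @ centered / len(pixels)    # 3x3 covariance matrix, 1/N convention
print(Sigma)
print(np.allclose(Sigma, np.cov(pixels.T, bias=True)))  # matches numpy's biased estimator
```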
K-Means → EM with GMM: The Intuition (1) Instead of making a "hard" decision about which class a pixel belongs to, we use probability theory and assign pixels to classes probabilistically. Using Bayes' rule, P(Cj | x) = p(x | Cj) P(Cj) / p(x), where P(Cj | x) is the posterior, p(x | Cj) is the likelihood, and P(Cj) is the prior. We want to maximize the posterior.
K-Means → EM: The Intuition (2) • We model the likelihood p(x | Cj) as a Gaussian (normal) distribution.
K-Means → EM: The Intuition (3) • Instead of modeling the color distribution with one Gaussian, we model it as a weighted sum of Gaussians. • We want to find the parameters that maximize the likelihood of the observed data. • We use the general trick of maximizing the logarithm of the probability function instead; this is called Maximum Likelihood Estimation (MLE). • We solve this maximization in 2 steps (E and M).
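For reference, a standard way to write this mixture likelihood and the log-likelihood being maximized (notation matches the Symbols slide later: α_j are the mixture weights, (μ_j, Σ_j) the Gaussian parameters):

```latex
p(x_i \mid \Theta) = \sum_{j=1}^{K} \alpha_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)
\qquad
\log L(\Theta) = \sum_{i=1}^{N} \log \sum_{j=1}^{K} \alpha_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)
```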
K-Means → EM with Gaussian Mixture Models (GMM) Boot Step: Initialize K clusters C1, …, CK: (μj, Σj) and P(Cj) for each cluster j. Iteration Step: • Expectation: estimate the (soft) cluster assignment of each data point. • Maximization: re-estimate the cluster parameters (μj, Σj) and P(Cj) for each cluster j.
EM Classifier (diagram) Input: x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … → Classifier (EM) → Classification Results: p(Cj|x1), p(Cj|x2), …, p(Cj|xi), …; the classifier uses the Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck.
EM Classifier (Cont.) Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … Output (Unknown): the Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck, and the Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), …
Expectation Step Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … + Input (Estimation): Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck → Output: Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), …
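The E-step computation on this slide can be written as the standard posterior for a Gaussian mixture (a textbook formula, using the slide's notation):

```latex
p(C_j \mid x_i) = \frac{p(x_i \mid C_j)\, p(C_j)}{p(x_i)}
                = \frac{\mathcal{N}(x_i \mid \mu_j, \Sigma_j)\, p(C_j)}
                       {\sum_{l=1}^{K} \mathcal{N}(x_i \mid \mu_l, \Sigma_l)\, p(C_l)}
```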
Maximization Step Input (Known): x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}, … + Input (Estimation): Classification Results p(Cj|x1), p(Cj|x2), …, p(Cj|xi), … → Output: Cluster Parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck.
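The corresponding M-step updates are the standard re-estimation formulas, weighted by the posteriors from the E-step (again a textbook result, not text recovered from the original slide):

```latex
\mu_j = \frac{\sum_i p(C_j \mid x_i)\, x_i}{\sum_i p(C_j \mid x_i)}
\qquad
\Sigma_j = \frac{\sum_i p(C_j \mid x_i)\,(x_i - \mu_j)(x_i - \mu_j)^{\top}}{\sum_i p(C_j \mid x_i)}
\qquad
p(C_j) = \frac{1}{N}\sum_i p(C_j \mid x_i)
```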
EM Algorithm Boot Step: Initialize K clusters C1, …, CK: (μj, Σj) and P(Cj) for each cluster j. Iteration Step: Expectation Step, then Maximization Step; repeat until convergence.
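A compact sketch of the whole loop (numpy and scipy; random means, identity covariances, and uniform weights as the boot step, a fixed iteration count, and no covariance regularization; this mirrors the slides but is not the original authors' code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """EM for a Gaussian mixture: the E-step computes p(C_j | x_i),
    the M-step re-estimates (mu_j, Sigma_j) and p(C_j)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Boot step
    mus = X[rng.choice(n, K, replace=False)].astype(float)
    Sigmas = np.array([np.eye(d) for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities p(C_j | x_i)
        dens = np.column_stack([
            weights[j] * multivariate_normal.pdf(X, mus[j], Sigmas[j])
            for j in range(K)])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        Nk = resp.sum(axis=0)
        weights = Nk / n
        mus = (resp.T @ X) / Nk[:, None]
        for j in range(K):
            diff = X - mus[j]
            Sigmas[j] = (resp[:, j, None] * diff).T @ diff / Nk[j]
    return mus, Sigmas, weights, resp
```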
EM Demo • Video of Image Classification with EM • Online Tutorial
EM Applications • Blobworld: Image Segmentation Using Expectation-Maximization and its Application to Image Querying
Image Segmentation using EM • Step 1: Feature Extraction • Step 2: Image Segmentation using EM • In the literature, EM is often used to segment the image into multiple regions; usually each region corresponds to a Gaussian model. • In our project, we segment one foreground region from the background. For training, we crop the foreground pixels and train a Gaussian Mixture Model on those pixels. For testing, we evaluate the conditional probability of each pixel and threshold on that value.
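A minimal sketch of that train-and-threshold idea, assuming scikit-learn's GaussianMixture and a hypothetical foreground crop; the component count and threshold below are illustrative, not the values used in the project:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_foreground_model(fg_pixels, n_components=3):
    """Fit a GMM to the cropped foreground pixels (rows are feature vectors)."""
    return GaussianMixture(n_components=n_components, covariance_type='full').fit(fg_pixels)

def foreground_mask(image, gmm, log_thresh=-12.0):
    """Per-pixel log-likelihood under the foreground GMM, thresholded."""
    h, w, c = image.shape
    scores = gmm.score_samples(image.reshape(-1, c).astype(float))
    return (scores > log_thresh).reshape(h, w)
```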
Symbols • The feature vector for pixel i is called xi. • There are going to be K Gaussian models; K is given. • The j-th model has a Gaussian distribution with parameters θj = (μj, Σj). • The αj's are the weights (which sum to 1) of the Gaussians. • Θ is the collection of parameters: Θ = (α1, …, αK, θ1, …, θK).
Initialization • Each of the K Gaussians will have parameters θj = (μj, Σj), where μj is the mean of the j-th Gaussian and Σj is its covariance matrix. • The covariance matrices are initialized to be the identity matrix. • The means can be initialized by finding the average feature vectors in each of K windows in the image; this is a data-driven initialization.
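A sketch of that initialization, assuming the "K windows" are K equal horizontal bands of the image (the slide does not specify the window layout, so that choice and the uniform weights are assumptions):

```python
import numpy as np

def init_params(image, K):
    """Identity covariances; means = average feature vector in each of K
    horizontal bands (one possible reading of 'K windows')."""
    h, w, d = image.shape
    bands = np.array_split(image.astype(float), K, axis=0)   # K horizontal windows
    mus = np.array([band.reshape(-1, d).mean(axis=0) for band in bands])
    Sigmas = np.array([np.eye(d) for _ in range(K)])
    alphas = np.full(K, 1.0 / K)    # uniform weights (not specified on the slide)
    return mus, Sigmas, alphas
```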
E-Step: for every pixel i and every Gaussian j, compute the posterior p(Cj | xi, Θ) = αj N(xi | μj, Σj) / Σl αl N(xi | μl, Σl); these posteriors are referred to as responsibilities.
Fitting Gaussians in other spaces • Experiment with other color spaces. • Experiment with dimensionality reduction (PCA).
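A minimal PCA sketch for reducing the per-pixel feature vectors before fitting the Gaussians (plain numpy via SVD; the number of components is an illustrative choice):

```python
import numpy as np

def pca_reduce(features, n_components=2):
    """Project feature vectors onto their top principal components."""
    mu = features.mean(axis=0)
    centered = features - mu
    # Right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T
```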