
Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9


Presentation Transcript


  1. Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9 CS479/679 Pattern Recognition Dr. George Bebis

  2. Expectation-Maximization (EM) • EM is an iterative ML estimation method: • Starts with an initial estimate for θ. • Refines the current estimate iteratively to increase the likelihood of the observed data, p(D/θ).

  3. Expectation-Maximization (cont’d) • EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as being incomplete). • Some creativity is required to recognize where the EM algorithm can be used. • It is the standard method for estimating the parameters of Mixtures of Gaussians (MoG).

  4. Incomplete Data • Many times, it is impossible to apply ML estimation because certain features cannot be measured directly. • The EM algorithm is ideal for problems with unobserved (missing) data, as in the example on the following slides.

  5. Example (from Moon, 1996) Todd Moon, “The Expectation-Maximization Algorithm,” IEEE Signal Processing Magazine, November 1996.

  6. Example (from Moon, 1996) (cont’d) [Figure with axes y1 and y2 omitted from the transcript.]

  7. EM: Main Idea • If x were available, we would estimate θ using ML. • Since only y is available, estimate θ by: maximize the expectation of ln p(Dx/θ) (with respect to the unknown variables) given Dy and the current estimate θt.

  8. EM Steps (1) Initialization (2) E-Step: Expectation (3) M-Step: Maximization (4) Repeat (2)-(3) until convergence

  9. EM Steps (cont’d) (1) Initialization Step: initialize the parameters θ0. (2) Expectation Step: performed with respect to the unobserved variables, using the current estimate of the parameters θt and conditioned upon the observations. • Note that if ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to substituting the expected values of the unobserved variables into ln p(Dx/θ).
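The expectation formed in this step is not reproduced in the transcript. In standard EM notation (a reconstruction, assuming Duda et al.'s Dx/Dy notation for the complete and observed data), the E-step builds the auxiliary function

    Q(\theta \mid \theta^t) = E_{D_x}\left[\, \ln p(D_x \mid \theta) \mid D_y,\, \theta^t \,\right]

and the M-step on the next slide then chooses \theta^{t+1} = \arg\max_{\theta} Q(\theta \mid \theta^t).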

  10. EM Steps (cont’d) (3) Maximization Step: provides a new estimate θt+1 of the parameters. (4) Test for Convergence: if the change from θt to θt+1 is below a chosen threshold, stop; otherwise, go to Step 2.
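Putting the four steps together, a minimal generic EM loop might look as follows in Python; e_step and m_step are hypothetical placeholders for a concrete model and are not part of the slides:

    import numpy as np

    def em(observed_data, theta0, e_step, m_step, tol=1e-6, max_iter=100):
        """Generic EM skeleton: alternate E and M steps until the estimate stabilizes.

        e_step(observed_data, theta) -> expected values of the unobserved variables
        m_step(observed_data, expectations) -> new parameter estimate
        (Both are hypothetical placeholders for a concrete model.)
        """
        theta = np.asarray(theta0, dtype=float)
        for _ in range(max_iter):
            expectations = e_step(observed_data, theta)      # (2) Expectation
            theta_new = m_step(observed_data, expectations)  # (3) Maximization
            if np.linalg.norm(theta_new - theta) < tol:      # (4) convergence test
                return theta_new
            theta = theta_new
        return theta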

  11. Example (from Moon, 1996) (cont’d) Assume a trinomial distribution: P(x1, x2, x3) = [k! / (x1! x2! x3!)] p1^x1 p2^x2 p3^x3, with x1 + x2 + x3 = k.

  12. Example (from Moon, 1996) (cont’d) [Expression omitted from the transcript], where xi = (xi1, xi2, xi3).

  13. Example (Moon, 1996) (cont’d) • Take the expected value: [expression omitted from the transcript].

  14. Example (Moon, 1996) (cont’d) • We only need to estimate: [expression omitted from the transcript].

  15. Example (Moon, 1996) (cont’d) (see Moon’s paper, page 53)

  16. Example (Moon, 1996) (cont’d) • Initialization: θ0 • Expectation Step • Maximization Step • Convergence Step [the corresponding formulas are omitted from the transcript].

  17. Example (Moon, 1996) (cont’d) [Iteration-by-iteration estimates θt omitted from the transcript.]

  18. Convergence properties of EM • The solution depends on the initial estimate θ0. • At each iteration, a value of θ is computed so that the likelihood function does not decrease. • The algorithm is guaranteed to be stable (i.e., it does not oscillate). • There is no guarantee that it will converge to a global maximum.

  19. Mixture Models • EM is the standard method for estimating the parameters of “mixture models”. Example: mixture of 2D Gaussians

  20. Mixture Model (cont’d) [Figure: a mixture as a weighted combination of component densities with weights π1, π2, π3, …, πk.]

  21. Mixture of 1D Gaussians - Example [Figure: a mixture of three 1D Gaussians with weights π1 = 0.3, π2 = 0.2, π3 = 0.5.]
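Since the figure itself is not reproduced, below is a minimal Python sketch of such a 1D mixture density; the weights 0.3, 0.2, 0.5 come from the slide, while the means and standard deviations are illustrative placeholders only:

    import numpy as np

    # Mixing weights from the slide; means/standard deviations are placeholders,
    # not the values used in the original figure.
    weights = np.array([0.3, 0.2, 0.5])
    means   = np.array([-2.0, 0.0, 3.0])
    stds    = np.array([0.5, 1.0, 0.8])

    def gaussian_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    def mixture_pdf(x):
        # p(x) = sum_k pi_k N(x; mu_k, sigma_k^2)
        return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

    xs = np.linspace(-5, 6, 200)
    print(mixture_pdf(xs)[:5])  # first few density values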

  22. Estimating the parameters of a Mixture Model • Two fundamental problems: (1) Estimate the number of mixture components K (2) Estimate mixture parameters (πk,θk), k=1,2,…,K

  23. Mixtures of Gaussians (Chapter 10) p(x/θ) = Σk πk p(x/θk), where p(x/θk) = N(x; μk, Σk) • In this case, θk = (μk, Σk).

  24. Data Generation Process Using Mixtures of Gaussians [Figure: each sample is generated by first selecting a component k with probability πk and then drawing x from that component’s Gaussian.]
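A short Python sketch of this generation process for a 2D mixture is given below; the specific parameter values are hypothetical examples, not taken from the slide:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 2-D mixture parameters for illustration only.
    weights = np.array([0.5, 0.3, 0.2])
    means   = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 3.0]])
    covs    = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])

    def sample_mog(n):
        # 1) pick a component k with probability pi_k, 2) draw x ~ N(mu_k, Sigma_k)
        ks = rng.choice(len(weights), size=n, p=weights)
        return np.array([rng.multivariate_normal(means[k], covs[k]) for k in ks]), ks

    X, labels = sample_mog(500)
    print(X.shape, np.bincount(labels))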

  25. Estimating Mixture Parameters Using ML – not easy! • Maximize p(D/θ) = Πi p(xi/θ), which is equivalent to maximizing ln p(D/θ) = Σi ln Σk πk p(xi/θk). • In general, it is not possible to solve this explicitly, and iterative schemes must be employed.
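For concreteness, here is a hedged Python sketch of the log-likelihood that ML tries to maximize for a Mixture of Gaussians (using scipy for the Gaussian densities; the example parameters and data are hypothetical):

    import numpy as np
    from scipy.stats import multivariate_normal

    def mog_log_likelihood(X, weights, means, covs):
        """ln p(D/theta) = sum_i ln sum_k pi_k N(x_i; mu_k, Sigma_k)."""
        n, _ = X.shape
        dens = np.zeros(n)
        for w, mu, cov in zip(weights, means, covs):
            dens += w * multivariate_normal.pdf(X, mean=mu, cov=cov)
        return np.sum(np.log(dens))

    # Example with hypothetical parameters and random data:
    X = np.random.default_rng(1).normal(size=(100, 2))
    print(mog_log_likelihood(X,
                             weights=[0.5, 0.5],
                             means=[np.zeros(2), np.ones(2)],
                             covs=[np.eye(2), np.eye(2)]))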

  26. Estimating Mixture Parameters Using EM: Case of Unknown Means • Assumptions: only the means μk are unknown (the number of components, the priors, and the covariances are taken as known). • Observation: if we knew which component had generated each sample, estimating the means would be easy … but we don’t!

  27. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Introduce hidden or unobserved variables zi

  28. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Main steps using EM

  29. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step [derivation omitted from the transcript]

  30. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) (ignoring πk since they are all equal) [derivation omitted from the transcript]

  31. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

  32. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step E(zik) is just the probability that xi was generated by the k-th component:
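The expression itself is not captured in the transcript. Below is a Python sketch of this E-step under the assumptions suggested by the surrounding slides (equal priors and a known, common spherical variance sigma^2); details may differ from the original slides:

    import numpy as np

    def e_step_unknown_means(X, mus, sigma):
        """E(z_ik): probability that sample x_i came from component k,
        assuming equal priors and a common covariance sigma^2 * I."""
        # squared distances ||x_i - mu_k||^2, shape (n, K)
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        unnorm = np.exp(-d2 / (2.0 * sigma ** 2))      # equal priors cancel out
        return unnorm / unnorm.sum(axis=1, keepdims=True)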

  33. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Maximization Step [update equations omitted from the transcript; see the sketch after slide 35]

  34. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

  35. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
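The update formulas on these slides are likewise not reproduced in the transcript. The standard M-step for the unknown-means case reestimates each mean as a responsibility-weighted average of the samples, sketched below (a reconstruction under the same assumptions as the E-step sketch, not a copy of the slides):

    import numpy as np

    def m_step_unknown_means(X, resp):
        """mu_k <- sum_i E(z_ik) x_i / sum_i E(z_ik), for each component k."""
        # resp has shape (n, K); X has shape (n, d)
        counts = resp.sum(axis=0)              # effective number of samples per component
        return (resp.T @ X) / counts[:, None]

Alternating e_step_unknown_means and m_step_unknown_means gives the EM iteration for this special case.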

  36. Estimating Mixture Parameters Using EM: General Case • Estimate θk = (μk, Σk) and πk for every k • Introduce hidden variables zi again

  37. Estimating Mixture Parameters Using EM: General Case (cont’d)

  38. Estimating Mixture Parameters Using EM: General Case (cont’d) • Expectation Step

  39. Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d) Use Lagrange optimization (to enforce the constraint Σk πk = 1)!

  40. Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d) [equations omitted from the transcript; see the sketch after slide 43]

  41. Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d)

  42. Estimating Mixture Parameters Using EM: General Case (cont’d)

  43. Estimating Mixture Parameters Using EM: General Case (cont’d)
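The general-case update equations are again omitted from the transcript. A compact Python sketch of the standard EM updates for a full Gaussian mixture (responsibilities, then new πk, μk, Σk) follows; it is a reconstruction under the usual assumptions, not a copy of the slides:

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step_gmm(X, weights, means, covs):
        """One EM iteration for a full Gaussian mixture (general case)."""
        n, d = X.shape
        K = len(weights)

        # E-step: gamma_ik = pi_k N(x_i; mu_k, Sigma_k) / sum_j pi_j N(x_i; mu_j, Sigma_j)
        resp = np.zeros((n, K))
        for k in range(K):
            resp[:, k] = weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: reestimate priors, means, and covariances from the responsibilities
        Nk = resp.sum(axis=0)                              # effective counts
        new_weights = Nk / n                               # pi_k
        new_means = (resp.T @ X) / Nk[:, None]             # mu_k
        new_covs = []
        for k in range(K):
            diff = X - new_means[k]
            new_covs.append((resp[:, k, None] * diff).T @ diff / Nk[k])
        return new_weights, new_means, np.array(new_covs)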

  44. Estimating the Number of Components K • Other methods are possible, such as using mutual information theory: Zheng Rong Yang and Mark Zwolinski, “Mutual information theory for adaptive mixture models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 4, April 2001.
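The selection criteria discussed on this slide are not captured in the transcript. As one generic illustration only (not the slides' method, and not the mutual-information approach of Yang and Zwolinski), a model-selection score such as BIC can be compared across mixtures fitted with different K:

    import numpy as np

    def bic(log_likelihood, n_samples, n_params):
        """Bayesian Information Criterion: lower is better."""
        return n_params * np.log(n_samples) - 2.0 * log_likelihood

    # Hypothetical usage: fit a mixture for each candidate K (e.g., with the EM
    # sketches above), then keep the K with the smallest BIC.
    # For a d-dimensional Gaussian mixture with K full-covariance components:
    #   n_params = (K - 1) + K * d + K * d * (d + 1) / 2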
