Expectation-Maximization (EM)
Chapter 3 (Duda et al.), Section 3.9
CS479/679 Pattern Recognition
Dr. George Bebis
Expectation-Maximization (EM) • EM is an iterative method to perform ML estimation: • Starts with an initial estimate for θ. • Refines the current estimate iteratively to increase the likelihood of the observed data: p(D/θ)
Expectation-Maximization (EM) • EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as being incomplete). • Some creativity is required to recognize where the EM algorithm can be used. • It is the standard method for estimating the parameters of Mixtures of Gaussians (MoG).
The Case of Incomplete Data • Many times, it is impossible to apply ML estimation because we cannot measure all the features or because certain feature values are missing. • The EM algorithm is ideal for problems with unobserved (missing) data.
Example (Moon, 1996) • Assume a trinomial distribution over counts x1, x2, x3 with x1+x2+x3=n:
p(x1,x2,x3/θ) = [n!/(x1! x2! x3!)] p1^x1 p2^x2 p3^x3,
where the cell probabilities p1, p2, p3 depend on the unknown parameter θ.
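To make the setup concrete, here is a minimal sketch of evaluating this likelihood in Python; the particular cell_probs(θ) parameterization below is a hypothetical placeholder, since the exact dependence of p1, p2, p3 on θ is not reproduced in these slides.

```python
import numpy as np
from scipy.stats import multinomial

def cell_probs(theta):
    # Hypothetical parameterization; the actual p_k(theta) in the
    # example may differ. Must be nonnegative and sum to 1.
    return np.array([0.25, 0.25 + theta / 4.0, 0.5 - theta / 4.0])

n = 100
x = np.array([24, 38, 38])   # complete data with x1 + x2 + x3 = n
print(multinomial.pmf(x, n=n, p=cell_probs(0.5)))
```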
EM: Main Idea • If x were available, we could use ML to estimate θ, i.e., maximize the complete-data log-likelihood: θ̂ = arg max_θ ln p(Dx/θ). • Since x is not available: maximize the expectation of ln p(Dx/θ) with respect to the unknown variables, given y and an estimate of θ.
EM Steps (1) Initialization (2) Expectation (3) Maximization (4) Test for convergence
EM Steps (cont’d) (1) Initialization Step: initialize the algorithm with a guess θ^0. (2) Expectation Step: performed with respect to the unobserved variables, using the current estimate of the parameters and conditioning upon the observations:
Q(θ; θ^t) = E[ln p(Dx/θ) / Dy, θ^t]
• When ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to replacing the unobserved variables by their expected values given Dy and θ^t.
EM Steps (cont’d) (3) Maximization Step: provides a new estimate of the parameters by maximizing Q: θ^(t+1) = arg max_θ Q(θ; θ^t). (4) Test for Convergence: if ||θ^(t+1) − θ^t|| < ε, stop; otherwise, set t ← t+1 and go to Step 2. A generic sketch of this loop is given below.
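A minimal Python sketch of the four steps as a generic loop; the e_step and m_step callables are hypothetical problem-specific placeholders, not names from the text.

```python
import numpy as np

def em(y, theta0, e_step, m_step, eps=1e-6, max_iter=200):
    """Generic EM loop. e_step and m_step are problem-specific
    callables (hypothetical placeholders); theta is a numpy array."""
    theta = np.asarray(theta0, dtype=float)          # (1) Initialization
    for _ in range(max_iter):
        q = e_step(y, theta)                         # (2) Expectation
        theta_new = m_step(y, q)                     # (3) Maximization
        if np.linalg.norm(theta_new - theta) < eps:  # (4) Convergence test
            return theta_new
        theta = theta_new
    return theta
```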
Example (Moon, 1996) (cont’d) • Suppose the complete data x = (x1, x2, x3) follow the trinomial distribution above, p(x1,x2,x3/θ) = [n!/(x1! x2! x3!)] p1^x1 p2^x2 p3^x3, but only incomplete observations y are available (some counts are not observed individually).
Example (Moon, 1996) (cont’d) • Take the expected value of the complete-data log-likelihood. Let’s look at the M-step before completing the E-step …
Example (Moon, 1996) (cont’d) • We only need to estimate: Let’s complete the E-step now …
Example (Moon, 1996) (cont’d) (see Moon’s paper, page 53, for a proof)
Example (Moon, 1996) (cont’d) • Initialization: θ^0 • Expectation Step: • Maximization Step: • Convergence Step:
Convergence properties of EM • The solution depends on the initial estimate θ^0. • At each iteration, a value of θ is computed so that the likelihood function does not decrease. • There is no guarantee that it will converge to a global maximum. • The algorithm is guaranteed to be stable, i.e., there is no chance of "overshooting" or diverging from the maximum.
Mixture of 1D Gaussians - Example • A mixture of three 1D Gaussians with mixing weights π1=0.3, π2=0.2, π3=0.5 (figure).
Mixture Model • A mixture model combines K component densities with mixing weights π1, …, πK:
p(x/θ) = Σk πk p(x/θk), with πk ≥ 0 and Σk πk = 1 (figure).
Fitting a Mixture Model to a set of observations Dx • Two fundamental issues: (1) Estimate the number of mixture components K. (2) Estimate the mixture parameters (πk, θk), k=1,2,…,K.
Mixtures of Gaussians (see Chapter 10)
• p(x/θ) = Σk πk p(x/θk), where each component p(x/θk) is a Gaussian:
p(x/θk) = 1/[(2π)^(d/2) |Σk|^(1/2)] exp(−(1/2)(x−μk)^T Σk^(−1) (x−μk))
• The parameters θk are (μk, Σk).
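As an illustration (not from the slides), a minimal sketch of evaluating a MoG density with scipy; the means and variances below are assumed values, only the weights come from the earlier 1D example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_pdf(x, pis, mus, covs):
    """Evaluate p(x/theta) = sum_k pi_k * N(x; mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, covs))

pis  = [0.3, 0.2, 0.5]        # weights from the 1D example
mus  = [-2.0, 0.0, 3.0]       # assumed means (not in the slides)
covs = [0.5, 1.0, 2.0]        # assumed variances (not in the slides)
print(mog_pdf(0.0, pis, mus, covs))
```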
Data Generation Process Assuming Mixtures of Gaussians • To generate a sample: first select a component k with probability πk, then draw x from the selected component density p(x/θk) (figure).
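A minimal numpy sketch of this two-stage process; the means and standard deviations are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
pis = np.array([0.3, 0.2, 0.5])    # weights from the 1D example
mus = np.array([-2.0, 0.0, 3.0])   # assumed means
sds = np.array([0.7, 1.0, 1.4])    # assumed standard deviations

def sample_mog(n):
    # Stage 1: pick component k with probability pi_k.
    ks = rng.choice(len(pis), size=n, p=pis)
    # Stage 2: draw x from the selected Gaussian p(x/theta_k).
    return rng.normal(mus[ks], sds[ks])

x = sample_mog(1000)
```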
Estimating Mixture Parameters Using EM: Case of Unknown Means • Assumptions: the number of components K, the priors πk, and the covariances Σk are known; only the means μk are unknown.
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Introduce hidden or unobserved variables zi = (zi1, …, ziK), where zik = 1 if sample xi was generated by the k-th component and zik = 0 otherwise.
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Main steps using EM
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step: E(zik) is just the probability that xi was generated by the k-th component:
E(zik) = P(zik=1/xi, θ^t) = πk p(xi/μk^t) / Σj πj p(xi/μj^t)
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Maximization Step: update each mean as the responsibility-weighted average of the samples:
μk^(t+1) = Σi E(zik) xi / Σi E(zik)
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Summary: iterate between the Expectation Step (compute E(zik)) and the Maximization Step (update the means μk) until the means stop changing; a sketch follows.
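A minimal numpy sketch for this case, assuming (per the setup above) known priors and a known common variance σ², with only the 1D means unknown. The data in the usage lines are assumed for illustration.

```python
import numpy as np

def em_unknown_means(x, mu, pis, sigma2, eps=1e-6, max_iter=200):
    """EM for a 1D Gaussian mixture where only the means are unknown.
    x: (n,) samples; mu: (K,) initial means; pis: (K,) known priors;
    sigma2: known common variance."""
    for _ in range(max_iter):
        # E-step: responsibilities E(z_ik), shape (n, K). The Gaussian
        # normalizer is common to all components, so it cancels.
        lik = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2)
        r = pis * lik
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means.
        mu_new = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        if np.abs(mu_new - mu).max() < eps:
            return mu_new
        mu = mu_new
    return mu

# Example usage on synthetic data (assumed values):
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])
print(em_unknown_means(x, np.array([0.0, 1.0]), np.array([0.3, 0.7]), 1.0))
```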
Estimating Mixture Parameters Using EM: General Case • Need to review Lagrange Optimization first …
Lagrange Optimization • To maximize f(x) subject to the constraint g(x)=0, form the Lagrangian L(x,λ) = f(x) + λ g(x) and set ∇x L = 0 together with g(x)=0; then solve for x and λ: n+1 equations / n+1 unknowns.
Lagrange Optimization (cont’d) • Example: Maximize f(x1,x2) = x1 x2 subject to the constraint g(x1,x2) = x1 + x2 − 1 = 0: 3 equations / 3 unknowns.
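Writing out and solving the three equations:

```latex
% Lagrangian: L(x_1, x_2, \lambda) = x_1 x_2 + \lambda (x_1 + x_2 - 1)
\frac{\partial L}{\partial x_1} = x_2 + \lambda = 0, \qquad
\frac{\partial L}{\partial x_2} = x_1 + \lambda = 0, \qquad
x_1 + x_2 - 1 = 0
\;\Rightarrow\; x_1 = x_2 = \tfrac{1}{2}, \quad \lambda = -\tfrac{1}{2},
\quad f_{\max} = \tfrac{1}{4}
```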
Estimating Mixture Parameters Using EM: General Case • Introduce the same hidden variables zik as before; now all of the parameters (πk, μk, Σk), k=1,…,K, must be estimated.
Estimating Mixture Parameters Using EM: General Case (cont’d) • Expectation Step: as in the unknown-means case, compute the responsibilities using the current parameter estimates:
E(zik) = πk^t p(xi/μk^t, Σk^t) / Σj πj^t p(xi/μj^t, Σj^t)
Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step: maximize the expected log-likelihood subject to the constraint Σk πk = 1; use Lagrange optimization.
Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d): the resulting updates are
πk^(t+1) = (1/n) Σi E(zik)
μk^(t+1) = Σi E(zik) xi / Σi E(zik)
Σk^(t+1) = Σi E(zik) (xi − μk^(t+1))(xi − μk^(t+1))^T / Σi E(zik)
Estimating Mixture Parameters Using EM: General Case (cont’d) • Summary: initialize (πk, μk, Σk), then alternate the Expectation Step (compute E(zik)) and the Maximization Step (update πk, μk, Σk) until convergence; a sketch follows.
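Finally, a compact numpy sketch of the general case, combining the E-step responsibilities with the three Lagrange-derived updates above; this is an illustrative implementation, not the course's code.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """N(x; mu, cov) evaluated for each row of x, shape (n, d)."""
    d = x.shape[1]
    diff = x - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)) / norm

def em_gmm(x, pis, mus, covs, eps=1e-6, max_iter=200):
    """General-case EM for a Gaussian mixture: estimates (pi_k, mu_k, Sigma_k).
    x: (n, d) data; pis: (K,); mus: (K, d); covs: list of (d, d) arrays."""
    x = np.asarray(x, dtype=float)
    pis, mus = np.asarray(pis, dtype=float), np.asarray(mus, dtype=float)
    n = x.shape[0]
    for _ in range(max_iter):
        # E-step: responsibilities E(z_ik), shape (n, K).
        r = np.column_stack([pi * gaussian_pdf(x, mu, cov)
                             for pi, mu, cov in zip(pis, mus, covs)])
        r /= r.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0)                     # effective component counts
        # M-step: the three updates derived via Lagrange optimization.
        pis_new = nk / n
        mus_new = (r.T @ x) / nk[:, None]
        covs_new = []
        for k in range(len(pis)):
            diff = x - mus_new[k]
            covs_new.append((r[:, k, None] * diff).T @ diff / nk[k])
        if np.abs(mus_new - mus).max() < eps:  # simple convergence test
            return pis_new, mus_new, covs_new
        pis, mus, covs = pis_new, mus_new, covs_new
    return pis, mus, covs
```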