Expectation-Maximization (EM)
Chapter 3 (Duda et al.), Section 3.9
CS479/679 Pattern Recognition
Dr. George Bebis
Expectation-Maximization (EM) • EM is an iterative method to perform ML estimation: • Starts with an initial estimate for θ. • Refines the current estimate iteratively to increase the likelihood of the observed data: p(D/θ)
Expectation-Maximization (EM) • EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as being incomplete). • Some creativity is required to recognize where the EM algorithm can be used. • It is the standard method for estimating the parameters of Mixtures of Gaussians (MoG).
The Case of Incomplete Data • Many times, it is impossible to apply ML estimation because we cannot measure all the features or because certain feature values are missing. • The EM algorithm is ideal for problems with unobserved (missing) data.
Example (Moon, 1996) • Assume a trinomial distribution over counts x1, x2, x3 with x1+x2+x3=n:
p(x1,x2,x3/θ) = [n!/(x1! x2! x3!)] p1^x1 p2^x2 p3^x3,
where the cell probabilities p1, p2, p3 depend on the unknown parameter θ.
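To make the setup concrete, here is a minimal sketch of evaluating this likelihood in Python; the particular cell_probs(θ) parameterization below is a hypothetical placeholder, since the exact dependence of p1, p2, p3 on θ is not reproduced in these slides.

```python
import numpy as np
from scipy.stats import multinomial

def cell_probs(theta):
    # Hypothetical parameterization; the actual p_k(theta) in the
    # example may differ. Must be nonnegative and sum to 1.
    return np.array([0.25, 0.25 + theta / 4.0, 0.5 - theta / 4.0])

n = 100
x = np.array([24, 38, 38])   # complete data with x1 + x2 + x3 = n
print(multinomial.pmf(x, n=n, p=cell_probs(0.5)))
```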
EM: Main Idea • If x were available, we could use ML to estimate θ, i.e., maximize the complete-data log-likelihood: θ̂ = arg max_θ ln p(Dx/θ). • Since x is not available: maximize the expectation of ln p(Dx/θ) with respect to the unknown variables, given y and an estimate of θ.
EM Steps (1) Initialization (2) Expectation (3) Maximization (4) Test for convergence
EM Steps (cont’d) (1) Initialization Step: initialize the algorithm with a guess θ^0. (2) Expectation Step: performed with respect to the unobserved variables, using the current estimate of the parameters and conditioning upon the observations:
Q(θ; θ^t) = E[ln p(Dx/θ) / Dy, θ^t]
• When ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to replacing the unobserved variables by their expected values given Dy and θ^t.
EM Steps (cont’d) (3) Maximization Step: provides a new estimate of the parameters by maximizing Q: θ^(t+1) = arg max_θ Q(θ; θ^t). (4) Test for Convergence: if ||θ^(t+1) − θ^t|| < ε, stop; otherwise, set t ← t+1 and go to Step 2. A generic sketch of this loop is given below.
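A minimal Python sketch of the four steps as a generic loop; the e_step and m_step callables are hypothetical problem-specific placeholders, not names from the text.

```python
import numpy as np

def em(y, theta0, e_step, m_step, eps=1e-6, max_iter=200):
    """Generic EM loop. e_step and m_step are problem-specific
    callables (hypothetical placeholders); theta is a numpy array."""
    theta = np.asarray(theta0, dtype=float)          # (1) Initialization
    for _ in range(max_iter):
        q = e_step(y, theta)                         # (2) Expectation
        theta_new = m_step(y, q)                     # (3) Maximization
        if np.linalg.norm(theta_new - theta) < eps:  # (4) Convergence test
            return theta_new
        theta = theta_new
    return theta
```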
Example (Moon, 1996) (cont’d) • Suppose the complete data x = (x1, x2, x3) follow the trinomial distribution above, p(x1,x2,x3/θ) = [n!/(x1! x2! x3!)] p1^x1 p2^x2 p3^x3, but only incomplete observations y are available (some counts are not observed individually).
Example (Moon, 1996) (cont’d) • Take the expected value of the complete-data log-likelihood. Let’s look at the M-step before completing the E-step …
Example (Moon, 1996) (cont’d) • We only need to estimate: Let’s complete the E-step now …
Example (Moon, 1996) (cont’d) (see Moon’s paper, page 53, for a proof)
Example (Moon, 1996) (cont’d) • Initialization: θ^0 • Expectation Step: • Maximization Step: • Convergence Step:
Convergence properties of EM • The solution depends on the initial estimate θ^0. • At each iteration, a value of θ is computed so that the likelihood function does not decrease. • There is no guarantee that it will converge to a global maximum. • The algorithm is guaranteed to be stable, i.e., there is no chance of "overshooting" or diverging from the maximum.
Mixture of 1D Gaussians - Example • A mixture of three 1D Gaussians with mixing weights π1=0.3, π2=0.2, π3=0.5 (figure).
Mixture Model • A mixture model combines K component densities with mixing weights π1, …, πK:
p(x/θ) = Σk πk p(x/θk), with πk ≥ 0 and Σk πk = 1 (figure).
Fitting a Mixture Model to a set of observations Dx • Two fundamental issues: (1) Estimate the number of mixture components K. (2) Estimate the mixture parameters (πk, θk), k=1,2,…,K.
Mixtures of Gaussians (see Chapter 10)
• p(x/θ) = Σk πk p(x/θk), where each component p(x/θk) is a Gaussian:
p(x/θk) = 1/[(2π)^(d/2) |Σk|^(1/2)] exp(−(1/2)(x−μk)^T Σk^(−1) (x−μk))
• The parameters θk are (μk, Σk).
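As an illustration (not from the slides), a minimal sketch of evaluating a MoG density with scipy; the means and variances below are assumed values, only the weights come from the earlier 1D example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_pdf(x, pis, mus, covs):
    """Evaluate p(x/theta) = sum_k pi_k * N(x; mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, covs))

pis  = [0.3, 0.2, 0.5]        # weights from the 1D example
mus  = [-2.0, 0.0, 3.0]       # assumed means (not in the slides)
covs = [0.5, 1.0, 2.0]        # assumed variances (not in the slides)
print(mog_pdf(0.0, pis, mus, covs))
```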
Data Generation Process Assuming Mixtures of Gaussians • To generate a sample: first select a component k with probability πk, then draw x from the selected component density p(x/θk) (figure).
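A minimal numpy sketch of this two-stage process; the means and standard deviations are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
pis = np.array([0.3, 0.2, 0.5])    # weights from the 1D example
mus = np.array([-2.0, 0.0, 3.0])   # assumed means
sds = np.array([0.7, 1.0, 1.4])    # assumed standard deviations

def sample_mog(n):
    # Stage 1: pick component k with probability pi_k.
    ks = rng.choice(len(pis), size=n, p=pis)
    # Stage 2: draw x from the selected Gaussian p(x/theta_k).
    return rng.normal(mus[ks], sds[ks])

x = sample_mog(1000)
```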
Estimating Mixture Parameters Using EM: Case of Unknown Means • Assumptions: the number of components K, the priors πk, and the covariances Σk are known; only the means μk are unknown.
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Introduce hidden or unobserved variables zi = (zi1, …, ziK), where zik = 1 if sample xi was generated by the k-th component and zik = 0 otherwise.
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Main steps using EM
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step: E(zik) is just the probability that xi was generated by the k-th component:
E(zik) = P(zik=1/xi, θ^t) = πk p(xi/μk^t) / Σj πj p(xi/μj^t)
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Maximization Step: update each mean as the responsibility-weighted average of the samples:
μk^(t+1) = Σi E(zik) xi / Σi E(zik)
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Summary: iterate between the Expectation Step (compute E(zik)) and the Maximization Step (update the means μk) until the means stop changing; a sketch follows.
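A minimal numpy sketch for this case, assuming (per the setup above) known priors and a known common variance σ², with only the 1D means unknown. The data in the usage lines are assumed for illustration.

```python
import numpy as np

def em_unknown_means(x, mu, pis, sigma2, eps=1e-6, max_iter=200):
    """EM for a 1D Gaussian mixture where only the means are unknown.
    x: (n,) samples; mu: (K,) initial means; pis: (K,) known priors;
    sigma2: known common variance."""
    for _ in range(max_iter):
        # E-step: responsibilities E(z_ik), shape (n, K). The Gaussian
        # normalizer is common to all components, so it cancels.
        lik = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2)
        r = pis * lik
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means.
        mu_new = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        if np.abs(mu_new - mu).max() < eps:
            return mu_new
        mu = mu_new
    return mu

# Example usage on synthetic data (assumed values):
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])
print(em_unknown_means(x, np.array([0.0, 1.0]), np.array([0.3, 0.7]), 1.0))
```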
Estimating Mixture Parameters Using EM: General Case • Need to review Lagrange Optimization first …
Lagrange Optimization • To maximize f(x) subject to the constraint g(x)=0, form the Lagrangian L(x,λ) = f(x) + λ g(x) and set ∇x L = 0 together with g(x)=0; then solve for x and λ: n+1 equations / n+1 unknowns.
Lagrange Optimization (cont’d) • Example: Maximize f(x1,x2) = x1 x2 subject to the constraint g(x1,x2) = x1 + x2 − 1 = 0: 3 equations / 3 unknowns.
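Writing out and solving the three equations:

```latex
% Lagrangian: L(x_1, x_2, \lambda) = x_1 x_2 + \lambda (x_1 + x_2 - 1)
\frac{\partial L}{\partial x_1} = x_2 + \lambda = 0, \qquad
\frac{\partial L}{\partial x_2} = x_1 + \lambda = 0, \qquad
x_1 + x_2 - 1 = 0
\;\Rightarrow\; x_1 = x_2 = \tfrac{1}{2}, \quad \lambda = -\tfrac{1}{2},
\quad f_{\max} = \tfrac{1}{4}
```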
Estimating Mixture Parameters Using EM: General Case • Introduce the same hidden variables zik as before; now all of the parameters (πk, μk, Σk), k=1,…,K, must be estimated.
Estimating Mixture Parameters Using EM: General Case (cont’d) • Expectation Step: as in the unknown-means case, compute the responsibilities using the current parameter estimates:
E(zik) = πk^t p(xi/μk^t, Σk^t) / Σj πj^t p(xi/μj^t, Σj^t)
Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step: maximize the expected log-likelihood subject to the constraint Σk πk = 1; use Lagrange optimization.
Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d): the resulting updates are
πk^(t+1) = (1/n) Σi E(zik)
μk^(t+1) = Σi E(zik) xi / Σi E(zik)
Σk^(t+1) = Σi E(zik) (xi − μk^(t+1))(xi − μk^(t+1))^T / Σi E(zik)
Estimating Mixture Parameters Using EM: General Case (cont’d) • Summary: initialize (πk, μk, Σk), then alternate the Expectation Step (compute E(zik)) and the Maximization Step (update πk, μk, Σk) until convergence; a sketch follows.
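Finally, a compact numpy sketch of the general case, combining the E-step responsibilities with the three Lagrange-derived updates above; this is an illustrative implementation, not the course's code.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """N(x; mu, cov) evaluated for each row of x, shape (n, d)."""
    d = x.shape[1]
    diff = x - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)) / norm

def em_gmm(x, pis, mus, covs, eps=1e-6, max_iter=200):
    """General-case EM for a Gaussian mixture: estimates (pi_k, mu_k, Sigma_k).
    x: (n, d) data; pis: (K,); mus: (K, d); covs: list of (d, d) arrays."""
    x = np.asarray(x, dtype=float)
    pis, mus = np.asarray(pis, dtype=float), np.asarray(mus, dtype=float)
    n = x.shape[0]
    for _ in range(max_iter):
        # E-step: responsibilities E(z_ik), shape (n, K).
        r = np.column_stack([pi * gaussian_pdf(x, mu, cov)
                             for pi, mu, cov in zip(pis, mus, covs)])
        r /= r.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0)                     # effective component counts
        # M-step: the three updates derived via Lagrange optimization.
        pis_new = nk / n
        mus_new = (r.T @ x) / nk[:, None]
        covs_new = []
        for k in range(len(pis)):
            diff = x - mus_new[k]
            covs_new.append((r[:, k, None] * diff).T @ diff / nk[k])
        if np.abs(mus_new - mus).max() < eps:  # simple convergence test
            return pis_new, mus_new, covs_new
        pis, mus, covs = pis_new, mus_new, covs_new
    return pis, mus, covs
```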