Expectation-Maximization (EM)
Chapter 3 (Duda et al.), Section 3.9
CS479/679 Pattern Recognition, Dr. George Bebis
Expectation-Maximization (EM)
• EM is an iterative ML estimation method:
• Starts with an initial estimate for θ.
• Refines the current estimate iteratively to increase the likelihood of the observed data: p(D/θ).
Expectation-Maximization (cont'd)
• EM is a general framework; it works best in situations where the data are incomplete (or can be thought of as incomplete).
• Some creativity is required to recognize where the EM algorithm can be applied.
• It is the standard method for estimating the parameters of Mixtures of Gaussians (MoG).
Incomplete Data
• Often, ML estimation cannot be applied directly because certain features cannot be measured.
• The EM algorithm is well suited to problems with unobserved (missing) data.
Example (from Moon, 1996)
Todd Moon, "The Expectation-Maximization Algorithm," IEEE Signal Processing Magazine, November 1996.
EM: Main Idea
• If x were available, we could estimate θ using ML.
• Since only y is available, estimate θ by maximizing the expectation of ln p(Dx/θ) (taken with respect to the unknown variables) given Dy and the current estimate θt.
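The equations on this slide did not survive extraction; in standard notation (a reconstruction, but it matches the textbook definition of EM) the two quantities being contrasted are:

```latex
\hat{\theta}_{ML} = \arg\max_{\theta} \ln p(D_x/\theta)
\qquad\text{vs.}\qquad
Q(\theta;\theta^{t}) = E\!\left[\,\ln p(D_x/\theta) \mid D_y, \theta^{t}\,\right]
```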
EM Steps
(1) Initialization
(2) E-Step: Expectation
(3) M-Step: Maximization
(4) Repeat (2)-(3) until convergence
EM Steps (cont'd)
(1) Initialization Step: initialize the parameters to some θ0.
(2) Expectation Step: compute Q(θ; θt), the expectation of ln p(Dx/θ) taken with respect to the unobserved variables, using the current estimate θt and conditioned upon the observations.
• Note that if ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to replacing the unobserved variables by their conditional expectations.
EM Steps (cont'd)
(3) Maximization Step: provides a new estimate of the parameters: θt+1 = argmax over θ of Q(θ; θt).
(4) Test for Convergence: if the change from θt to θt+1 is below a small threshold, stop; otherwise, go to Step 2.
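A minimal sketch of how these four steps fit together as code; the callables e_step/m_step, the tolerance eps, and the norm-based test are illustrative placeholders rather than anything specified on the slides:

```python
import numpy as np

def em(data, theta0, e_step, m_step, eps=1e-6, max_iters=200):
    """Generic EM loop: alternate E- and M-steps until theta stabilizes.

    e_step(data, theta) -> expected sufficient statistics of the hidden variables
    m_step(data, stats) -> new parameter estimate theta
    """
    theta = np.asarray(theta0, dtype=float)          # (1) initialization
    for _ in range(max_iters):                       # (4) repeat (2)-(3)
        stats = e_step(data, theta)                  # (2) E-step
        new = np.asarray(m_step(data, stats), dtype=float)  # (3) M-step
        if np.linalg.norm(new - theta) < eps:        # convergence test
            return new
        theta = new
    return theta
```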
Example (from Moon, 1996) (cont'd)
Assume a trinomial distribution:
P(x1, x2, x3) = (k! / (x1! x2! x3!)) p1^x1 p2^x2 p3^x3, with x1 + x2 + x3 = k.
Example (from Moon, 1996) (cont'd)
The complete-data likelihood is the product of such trinomial terms over the observations, where xi = (xi1, xi2, xi3).
Example (Moon, 1996) (cont'd)
• Take the expected value of the complete-data log-likelihood, given the observations and the current estimate θt.
Example (Moon, 1996) (cont'd)
• Because the log-likelihood is linear in the unobserved counts, we only need to estimate their conditional expectations.
Example (Moon, 1996) (cont’d) (see Moon’s paper, page 53)
Example (Moon, 1996) (cont'd)
• Initialization: choose θ0.
• Expectation Step: compute the expected unobserved counts given θt.
• Maximization Step: update the estimate to θt+1 in closed form.
• Convergence Step: stop when successive estimates agree.
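Since the slide's equations were lost, here is a self-contained toy of the same flavor, not Moon's exact numbers: a trinomial with cell probabilities (1/2, θ/2, (1−θ)/2) in which only the merged count y1 = x1 + x2 and the count x3 are observed:

```python
def em_trinomial(y1, x3, theta=0.5, iters=50):
    """EM when (x1, x2, x3) ~ Multinomial(k; 1/2, theta/2, (1-theta)/2)
    but only y1 = x1 + x2 and x3 are observed."""
    for _ in range(iters):
        # E-step: x2 | y1 ~ Binomial(y1, (theta/2) / (1/2 + theta/2)),
        # so its conditional expectation is y1 * theta / (1 + theta).
        x2_hat = y1 * theta / (1.0 + theta)
        # M-step: complete-data ML estimate with x2 replaced by x2_hat.
        theta = x2_hat / (x2_hat + x3)
    return theta

# Converges to the observed-data MLE (y1 - x3) / (y1 + x3) = 0.6:
print(em_trinomial(y1=100, x3=25))
```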
Convergence Properties of EM
• The solution depends on the initial estimate θ0.
• At each iteration, a value of θ is computed so that the likelihood function does not decrease.
• The algorithm is guaranteed to be stable (i.e., it does not oscillate).
• There is no guarantee that it will converge to a global maximum.
Mixture Models
• EM is the standard method for estimating the parameters of "mixture models".
[Figure: example of a mixture of 2D Gaussians]
Mixture Model (cont'd)
• A mixture density combines K component densities p(x/θk) with mixing weights π1, π2, π3, …, πk:
p(x/θ) = Σk πk p(x/θk), with Σk πk = 1 and πk ≥ 0.
Mixture of 1D Gaussians - Example
[Figure: a three-component 1D Gaussian mixture with π1 = 0.3, π2 = 0.2, π3 = 0.5]
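To make the lost figure concrete, a small sketch that evaluates such a three-component 1D mixture density; the weights come from the slide, while the means and standard deviations are made-up placeholders:

```python
import numpy as np

# Mixing weights from the slide; means and sigmas are illustrative assumptions.
pis    = np.array([0.3, 0.2, 0.5])
mus    = np.array([-2.0, 0.0, 3.0])
sigmas = np.array([0.5, 1.0, 0.8])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_pdf(x):
    # p(x) = sum_k pi_k N(x; mu_k, sigma_k^2)
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))

xs = np.linspace(-6.0, 7.0, 2001)
print(np.trapz(mixture_pdf(xs), xs))   # ~1.0: a valid probability density
```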
Estimating the Parameters of a Mixture Model
Two fundamental problems:
(1) Estimate the number of mixture components K.
(2) Estimate the mixture parameters (πk, θk), k = 1, 2, …, K.
Mixtures of Gaussians (Chapter 10)
• In this case each component is Gaussian, θk = (μk, Σk):
p(x/θk) = N(μk, Σk) = (1 / ((2π)^(d/2) |Σk|^(1/2))) exp(-(1/2)(x - μk)^T Σk^(-1) (x - μk)).
Data Generation Process Using Mixtures of Gaussians
• To generate a sample: first select a component k with probability πk, then draw x from the corresponding Gaussian p(x/θk).
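A minimal sketch of this two-stage generative process; the specific 2D parameter values are assumptions for illustration, not from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2D mixture parameters (assumed for the sketch).
pis  = np.array([0.3, 0.2, 0.5])
mus  = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([2.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 0.3])]

def sample_mixture(n):
    # Stage 1: pick a component k with probability pi_k.
    ks = rng.choice(len(pis), size=n, p=pis)
    # Stage 2: draw x from the chosen Gaussian p(x/theta_k).
    return np.array([rng.multivariate_normal(mus[k], covs[k]) for k in ks]), ks

X, labels = sample_mixture(500)
print(X.shape, np.bincount(labels) / len(labels))  # fractions approach pi_k
```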
Estimating Mixture Parameters Using ML - Not Easy!
• Maximize the likelihood p(D/θ), which is equivalent to maximizing ln p(D/θ) = Σi ln (Σk πk p(xi/θk)).
• In general, this cannot be solved explicitly, and iterative schemes must be employed.
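One way to see why no explicit solution exists (a standard observation, added here for completeness): the log of a sum does not decouple across components, so the gradient with respect to any one component's parameters involves all components through the denominator:

```latex
\frac{\partial}{\partial \theta_k} \ln p(D/\theta)
  = \sum_{i=1}^{n}
    \frac{\pi_k \, \partial p(x_i/\theta_k) / \partial \theta_k}
         {\sum_{j=1}^{K} \pi_j \, p(x_i/\theta_j)}
```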
Estimating Mixture Parameters Using EM: Case of Unknown Means
• Assumptions: the number of components K is known, the priors πk are equal, and the covariances Σk are known; only the means μk must be estimated.
• Observation: if we knew which component had generated each sample, estimating the means would be easy … but we don't!
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Introduce hidden (unobserved) variables zi = (zi1, …, ziK), where zik = 1 if xi was generated by the k-th component and 0 otherwise.
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Main steps using EM
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Expectation Step: take the expectation of the complete-data log-likelihood with respect to the hidden variables zi, given the data and the current estimate θt, and substitute the Gaussian form of the component densities (the πk can be ignored since they are all equal).
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Expectation Step: E(zik) is just the probability that xi was generated by the k-th component:
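The formula lost from this slide is presumably the standard posterior; with equal priors, the responsibilities reduce to a ratio of component densities evaluated at the current means:

```latex
E(z_{ik}) = P(z_{ik} = 1 \mid x_i, \theta^{t})
          = \frac{p(x_i/\mu_k^{t})}{\sum_{j=1}^{K} p(x_i/\mu_j^{t})}
```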
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Maximization Step: maximize Q(θ; θt) with respect to the unknown means μk.
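Setting the derivative of Q with respect to each mean to zero gives the familiar weighted-average update (reconstructed here; this is the standard result for this case):

```latex
\mu_k^{t+1} = \frac{\sum_{i=1}^{n} E(z_{ik}) \, x_i}{\sum_{i=1}^{n} E(z_{ik})}
```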
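Combining the two steps, a minimal runnable sketch for this special case; the equal priors and shared spherical covariance sigma2 follow the assumptions above, while the initialization, the toy data, and all names are illustrative:

```python
import numpy as np

def em_unknown_means(X, K, sigma2=1.0, iters=100, seed=0):
    """EM for a Gaussian mixture where only the K means are unknown
    (equal priors pi_k = 1/K, known spherical covariances sigma2 * I)."""
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(len(X), size=K, replace=False)]  # init means from data
    for _ in range(iters):
        # E-step: E(z_ik) proportional to exp(-||x_i - mu_k||^2 / (2 sigma2));
        # the equal priors and the common normalizing constant cancel out.
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        d2 -= d2.min(axis=1, keepdims=True)   # stabilize the exponentials
        w = np.exp(-0.5 * d2 / sigma2)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: each mean becomes the responsibility-weighted data average.
        mus = (w.T @ X) / w.sum(axis=0)[:, None]
    return mus

# Toy data: two clusters; the estimated means approach (0,0) and (5,5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1, (200, 2)), rng.normal([5, 5], 1, (300, 2))])
print(em_unknown_means(X, K=2))
```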
Estimating Mixture Parameters Using EM: General Case
• Estimate θk = (μk, Σk) and πk for every k.
• Introduce the hidden variables zi again.
Estimating Mixture Parameters Using EM: General Case (cont'd)
• Expectation Step: the priors no longer cancel, so E(zik) = πk p(xi/θk) / Σj πj p(xi/θj), evaluated at the current estimate θt.
Estimating Mixture Parameters Using EM: General Case (cont'd)
• Maximization Step: maximize Q subject to the constraint Σk πk = 1; use Lagrange optimization!
• This yields πk ← (1/n) Σi E(zik): each prior becomes the average responsibility of its component.
Estimating Mixture Parameters Using EM: General Case (cont'd)
• The remaining updates have the same weighted-average form:
μk ← Σi E(zik) xi / Σi E(zik)
Σk ← Σi E(zik)(xi - μk)(xi - μk)^T / Σi E(zik)
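A hedged end-to-end sketch of the general case, with all of πk, μk, Σk updated each iteration; the initialization strategy and all names are assumptions for illustration, not from the slides:

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """N(x; mu, Sigma) evaluated at every row of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)) / norm

def em_gmm(X, K, iters=100, seed=0):
    """EM for a full Gaussian mixture: estimates (pi_k, mu_k, Sigma_k)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pis = np.full(K, 1.0 / K)                         # start from equal priors
    mus = X[rng.choice(n, size=K, replace=False)]     # means drawn from data
    Sigmas = np.array([np.cov(X.T) for _ in range(K)])
    for _ in range(iters):
        # E-step: E(z_ik) = pi_k p(x_i/theta_k) / sum_j pi_j p(x_i/theta_j)
        w = np.column_stack([pis[k] * gaussian_pdf(X, mus[k], Sigmas[k])
                             for k in range(K)])
        w /= w.sum(axis=1, keepdims=True)
        Nk = w.sum(axis=0)
        # M-step: standard updates (pi_k via the Lagrange constraint).
        pis = Nk / n                                  # pi_k = (1/n) sum_i E(z_ik)
        mus = (w.T @ X) / Nk[:, None]                 # weighted means
        for k in range(K):                            # weighted covariances
            diff = X - mus[k]
            Sigmas[k] = (w[:, k, None] * diff).T @ diff / Nk[k]
    return pis, mus, Sigmas

# e.g., on X from the sampler above, em_gmm(X, K=3) recovers its parameters.
```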
Estimating the Number of Components K
• Other methods are possible, such as using mutual information theory:
Zheng Rong Yang and Mark Zwolinski, "Mutual Information Theory for Adaptive Mixture Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 4, April 2001.