
Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9


Presentation Transcript


  1. Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9 CS479/679 Pattern Recognition Dr. George Bebis

  2. Expectation-Maximization (EM) • EM is an iterative ML estimation method: • Starts with an initial estimate for θ. • Refines the current estimate iteratively to increase the likelihood of the observed data, p(D/θ).

  3. Expectation-Maximization (cont’d) • EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as being incomplete). • Some creativity is required to recognize where the EM algorithm can be used. • It is the standard method for estimating the parameters of Mixtures of Gaussians (MoG).

  4. Incomplete Data • Many times, it is impossible to apply ML estimation because certain features cannot be measured directly. • The EM algorithm is ideal for problems with unobserved (missing) data, as in the example on the following slides.

  5. Example (from Moon, 1996) Todd Moon, “The Expectation-Maximization Algorithm,” IEEE Signal Processing Magazine, November 1996.

  6. Example (from Moon, 1996) (cont’d) [Figure with axes y1 and y2 omitted from the transcript.]

  7. EM: Main Idea • If x were available, we would estimate θ using ML. • Since only y is available, estimate θ by: maximize the expectation of ln p(Dx/θ) (with respect to the unknown variables) given Dy and the current estimate θt.

  8. EM Steps (1) Initialization (2) E-Step: Expectation (3) M-Step: Maximization (4) Repeat (2)-(3) until convergence

  9. EM Steps (cont’d) (1) Initialization Step: initialize the parameters θ0. (2) Expectation Step: performed with respect to the unobserved variables, using the current estimate of the parameters θt and conditioned upon the observations. • Note that if ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to substituting the expected values of the unobserved variables into ln p(Dx/θ).
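The expectation formed in this step is not reproduced in the transcript. In standard EM notation (a reconstruction, assuming Duda et al.'s Dx/Dy notation for the complete and observed data), the E-step builds the auxiliary function

    Q(\theta \mid \theta^t) = E_{D_x}\left[\, \ln p(D_x \mid \theta) \mid D_y,\, \theta^t \,\right]

and the M-step on the next slide then chooses \theta^{t+1} = \arg\max_{\theta} Q(\theta \mid \theta^t).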

  10. EM Steps (cont’d) (3) Maximization Step: provides a new estimate θt+1 of the parameters. (4) Test for Convergence: if the change from θt to θt+1 is below a chosen threshold, stop; otherwise, go to Step 2.
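Putting the four steps together, a minimal generic EM loop might look as follows in Python; e_step and m_step are hypothetical placeholders for a concrete model and are not part of the slides:

    import numpy as np

    def em(observed_data, theta0, e_step, m_step, tol=1e-6, max_iter=100):
        """Generic EM skeleton: alternate E and M steps until the estimate stabilizes.

        e_step(observed_data, theta) -> expected values of the unobserved variables
        m_step(observed_data, expectations) -> new parameter estimate
        (Both are hypothetical placeholders for a concrete model.)
        """
        theta = np.asarray(theta0, dtype=float)
        for _ in range(max_iter):
            expectations = e_step(observed_data, theta)      # (2) Expectation
            theta_new = m_step(observed_data, expectations)  # (3) Maximization
            if np.linalg.norm(theta_new - theta) < tol:      # (4) convergence test
                return theta_new
            theta = theta_new
        return theta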

  11. Example (from Moon, 1996) (cont’d) Assume a trinomial distribution: P(x1, x2, x3) = [k! / (x1! x2! x3!)] p1^x1 p2^x2 p3^x3, with x1 + x2 + x3 = k.

  12. Example (from Moon, 1996) (cont’d) [Expression omitted from the transcript], where xi = (xi1, xi2, xi3).

  13. Example (Moon, 1996) (cont’d) • Take the expected value: [expression omitted from the transcript].

  14. Example (Moon, 1996) (cont’d) • We only need to estimate: [expression omitted from the transcript].

  15. Example (Moon, 1996) (cont’d) (see Moon’s paper, page 53)

  16. Example (Moon, 1996) (cont’d) • Initialization: θ0 • Expectation Step • Maximization Step • Convergence Step [the corresponding formulas are omitted from the transcript].

  17. Example (Moon, 1996) (cont’d) [Iteration-by-iteration estimates θt omitted from the transcript.]

  18. Convergence properties of EM • The solution depends on the initial estimate θ0. • At each iteration, a value of θ is computed so that the likelihood function does not decrease. • The algorithm is guaranteed to be stable (i.e., it does not oscillate). • There is no guarantee that it will converge to a global maximum.

  19. Mixture Models • EM is the standard method for estimating the parameters of “mixture models”. Example: mixture of 2D Gaussians

  20. Mixture Model (cont’d) [Figure: a mixture as a weighted combination of component densities with weights π1, π2, π3, …, πk.]

  21. Mixture of 1D Gaussians - Example [Figure: a mixture of three 1D Gaussians with weights π1 = 0.3, π2 = 0.2, π3 = 0.5.]
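Since the figure itself is not reproduced, below is a minimal Python sketch of such a 1D mixture density; the weights 0.3, 0.2, 0.5 come from the slide, while the means and standard deviations are illustrative placeholders only:

    import numpy as np

    # Mixing weights from the slide; means/standard deviations are placeholders,
    # not the values used in the original figure.
    weights = np.array([0.3, 0.2, 0.5])
    means   = np.array([-2.0, 0.0, 3.0])
    stds    = np.array([0.5, 1.0, 0.8])

    def gaussian_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    def mixture_pdf(x):
        # p(x) = sum_k pi_k N(x; mu_k, sigma_k^2)
        return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

    xs = np.linspace(-5, 6, 200)
    print(mixture_pdf(xs)[:5])  # first few density values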

  22. Estimating the parameters of a Mixture Model • Two fundamental problems: (1) Estimate the number of mixture components K (2) Estimate mixture parameters (πk,θk), k=1,2,…,K

  23. Mixtures of Gaussians (Chapter 10) p(x/θ) = Σk πk p(x/θk), where p(x/θk) = N(x; μk, Σk) • In this case, θk = (μk, Σk).

  24. Data Generation Process Using Mixtures of Gaussians [Figure: each sample is generated by first selecting a component k with probability πk and then drawing x from that component’s Gaussian.]
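A short Python sketch of this generation process for a 2D mixture is given below; the specific parameter values are hypothetical examples, not taken from the slide:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 2-D mixture parameters for illustration only.
    weights = np.array([0.5, 0.3, 0.2])
    means   = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 3.0]])
    covs    = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])

    def sample_mog(n):
        # 1) pick a component k with probability pi_k, 2) draw x ~ N(mu_k, Sigma_k)
        ks = rng.choice(len(weights), size=n, p=weights)
        return np.array([rng.multivariate_normal(means[k], covs[k]) for k in ks]), ks

    X, labels = sample_mog(500)
    print(X.shape, np.bincount(labels))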

  25. Estimating Mixture Parameters Using ML – not easy! • Maximize p(D/θ) = Πi p(xi/θ), which is equivalent to maximizing ln p(D/θ) = Σi ln Σk πk p(xi/θk). • In general, it is not possible to solve this explicitly, and iterative schemes must be employed.
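For concreteness, here is a hedged Python sketch of the log-likelihood that ML tries to maximize for a Mixture of Gaussians (using scipy for the Gaussian densities; the example parameters and data are hypothetical):

    import numpy as np
    from scipy.stats import multivariate_normal

    def mog_log_likelihood(X, weights, means, covs):
        """ln p(D/theta) = sum_i ln sum_k pi_k N(x_i; mu_k, Sigma_k)."""
        n, _ = X.shape
        dens = np.zeros(n)
        for w, mu, cov in zip(weights, means, covs):
            dens += w * multivariate_normal.pdf(X, mean=mu, cov=cov)
        return np.sum(np.log(dens))

    # Example with hypothetical parameters and random data:
    X = np.random.default_rng(1).normal(size=(100, 2))
    print(mog_log_likelihood(X,
                             weights=[0.5, 0.5],
                             means=[np.zeros(2), np.ones(2)],
                             covs=[np.eye(2), np.eye(2)]))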

  26. Estimating Mixture Parameters Using EM: Case of Unknown Means • Assumptions: only the means μk are unknown (the number of components, the priors, and the covariances are taken as known). • Observation: if we knew which component had generated each sample, estimating the means would be easy … but we don’t!

  27. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Introduce hidden or unobserved variables zi

  28. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Main steps using EM

  29. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step [derivation omitted from the transcript]

  30. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) (ignoring πk since they are all equal) [derivation omitted from the transcript]

  31. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

  32. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Expectation Step E(zik) is just the probability that xi was generated by the k-th component:
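The expression itself is not captured in the transcript. Below is a Python sketch of this E-step under the assumptions suggested by the surrounding slides (equal priors and a known, common spherical variance sigma^2); details may differ from the original slides:

    import numpy as np

    def e_step_unknown_means(X, mus, sigma):
        """E(z_ik): probability that sample x_i came from component k,
        assuming equal priors and a common covariance sigma^2 * I."""
        # squared distances ||x_i - mu_k||^2, shape (n, K)
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        unnorm = np.exp(-d2 / (2.0 * sigma ** 2))      # equal priors cancel out
        return unnorm / unnorm.sum(axis=1, keepdims=True)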

  33. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d) • Maximization Step [update equations omitted from the transcript; see the sketch after slide 35]

  34. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

  35. Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
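The update formulas on these slides are likewise not reproduced in the transcript. The standard M-step for the unknown-means case reestimates each mean as a responsibility-weighted average of the samples, sketched below (a reconstruction under the same assumptions as the E-step sketch, not a copy of the slides):

    import numpy as np

    def m_step_unknown_means(X, resp):
        """mu_k <- sum_i E(z_ik) x_i / sum_i E(z_ik), for each component k."""
        # resp has shape (n, K); X has shape (n, d)
        counts = resp.sum(axis=0)              # effective number of samples per component
        return (resp.T @ X) / counts[:, None]

Alternating e_step_unknown_means and m_step_unknown_means gives the EM iteration for this special case.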

  36. Estimating Mixture Parameters Using EM: General Case • Estimate θk = (μk, Σk) and πk for every k • Introduce hidden variables zi again

  37. Estimating Mixture Parameters Using EM: General Case (cont’d)

  38. Estimating Mixture Parameters Using EM: General Case (cont’d) • Expectation Step

  39. Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d) Use Lagrange optimization (to enforce the constraint Σk πk = 1)!

  40. Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d) [equations omitted from the transcript; see the sketch after slide 43]

  41. Estimating Mixture Parameters Using EM: General Case (cont’d) • Maximization Step (cont’d)

  42. Estimating Mixture Parameters Using EM: General Case (cont’d)

  43. Estimating Mixture Parameters Using EM: General Case (cont’d)
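The general-case update equations are again omitted from the transcript. A compact Python sketch of the standard EM updates for a full Gaussian mixture (responsibilities, then new πk, μk, Σk) follows; it is a reconstruction under the usual assumptions, not a copy of the slides:

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step_gmm(X, weights, means, covs):
        """One EM iteration for a full Gaussian mixture (general case)."""
        n, d = X.shape
        K = len(weights)

        # E-step: gamma_ik = pi_k N(x_i; mu_k, Sigma_k) / sum_j pi_j N(x_i; mu_j, Sigma_j)
        resp = np.zeros((n, K))
        for k in range(K):
            resp[:, k] = weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: reestimate priors, means, and covariances from the responsibilities
        Nk = resp.sum(axis=0)                              # effective counts
        new_weights = Nk / n                               # pi_k
        new_means = (resp.T @ X) / Nk[:, None]             # mu_k
        new_covs = []
        for k in range(K):
            diff = X - new_means[k]
            new_covs.append((resp[:, k, None] * diff).T @ diff / Nk[k])
        return new_weights, new_means, np.array(new_covs)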

  44. Estimating the Number of Components K • Other methods are possible, such as using mutual information theory: Zheng Rong Yang and Mark Zwolinski, “Mutual information theory for adaptive mixture models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 4, April 2001.
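The selection criteria discussed on this slide are not captured in the transcript. As one generic illustration only (not the slides' method, and not the mutual-information approach of Yang and Zwolinski), a model-selection score such as BIC can be compared across mixtures fitted with different K:

    import numpy as np

    def bic(log_likelihood, n_samples, n_params):
        """Bayesian Information Criterion: lower is better."""
        return n_params * np.log(n_samples) - 2.0 * log_likelihood

    # Hypothetical usage: fit a mixture for each candidate K (e.g., with the EM
    # sketches above), then keep the K with the smallest BIC.
    # For a d-dimensional Gaussian mixture with K full-covariance components:
    #   n_params = (K - 1) + K * d + K * d * (d + 1) / 2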
