
Expectation Maximization Algorithm


Presentation Transcript


  1. Expectation Maximization Algorithm Rong Jin

  2. A Mixture Model Problem • Apparently, the dataset consists of two modes • How can we automatically identify the two modes?

  3. Gaussian Mixture Model (GMM) • Assume that the dataset is generated by two mixed Gaussian distributions • Gaussian model 1: • Gaussian model 2: • If we know the memberships for each bin, estimating the two Gaussian models is easy. • How to estimate the two Gaussian models without knowing the memberships of bins?
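
A standard way to write the two-component mixture referred to here, with assumed notation (mixing weights π_k and per-component means and variances μ_k, σ_k²):

```latex
% Two-component 1-D Gaussian mixture density (standard form; notation assumed)
p(x) = \pi_1\,\mathcal{N}(x \mid \mu_1, \sigma_1^2) + \pi_2\,\mathcal{N}(x \mid \mu_2, \sigma_2^2),
\qquad \pi_1 + \pi_2 = 1,
\qquad \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,
      \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```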

  4. EM Algorithm for GMM • Let the memberships be hidden variables • EM algorithm for the Gaussian mixture model • Unknown memberships: • Unknown Gaussian models: • Learn these two sets of parameters iteratively

  5. Start with A Random Guess • Randomly assign the memberships to each bin

  6. Start with A Random Guess • Randomly assign the memberships to each bin • Estimate the mean and variance of each Gaussian model

  7. E-step • Fix the two Gaussian models • Estimate the posterior for each data point
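
The posterior (membership probability) estimated in the E-step takes the standard form below; the notation follows the mixture density sketched after slide 3.

```latex
% E-step: responsibility of component k for a point x (standard form; notation assumed)
p(k \mid x) = \frac{\pi_k\,\mathcal{N}(x \mid \mu_k, \sigma_k^2)}
                   {\sum_{j=1}^{2} \pi_j\,\mathcal{N}(x \mid \mu_j, \sigma_j^2)},
\qquad k = 1, 2
```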

  8. EM Algorithm for GMM • Re-estimate the memberships for each bin

  9. M-Step • Fix the memberships • Re-estimate the two Gaussian models, weighted by the posteriors
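
The "weighted by posteriors" re-estimates mentioned on this slide have the standard closed form below, where p(k | x_i) are the E-step responsibilities (notation assumed).

```latex
% M-step: re-estimate each Gaussian, weighting every point by its posterior
\pi_k = \frac{1}{n}\sum_{i=1}^{n} p(k \mid x_i), \qquad
\mu_k = \frac{\sum_{i} p(k \mid x_i)\, x_i}{\sum_{i} p(k \mid x_i)}, \qquad
\sigma_k^2 = \frac{\sum_{i} p(k \mid x_i)\,(x_i - \mu_k)^2}{\sum_{i} p(k \mid x_i)}
```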

  10. EM Algorithm for GMM • Re-estimate the memberships for each bin • Re-estimate the models
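
A minimal sketch of the alternation described on slides 5-10, for a 1-D two-component mixture; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def em_gmm_1d(x, n_iter=100, seed=0):
    """Minimal EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Start with a random guess (slides 5-6): random responsibilities per point
    resp = rng.dirichlet([1.0, 1.0], size=len(x))            # shape (n, 2)
    for _ in range(n_iter):
        # M-step: re-estimate weights, means, variances from the responsibilities
        nk = resp.sum(axis=0)                                 # effective counts
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        # E-step: recompute the posterior of each component for every point
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
    return pi, mu, var

# Example: data with two modes, as in the slides' histogram
x = np.concatenate([np.random.normal(-2, 1, 500), np.random.normal(3, 1, 500)])
print(em_gmm_1d(x))
```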

  11. At the 5-th Iteration • Red Gaussian component slowly shifts toward the left end of the x axis

  12. At the 10-th Iteration • Red Gaussian component still slowly shifts toward the left end of the x axis

  13. At the 20-th Iteration • Red Gaussian component makes a more noticeable shift toward the left end of the x axis

  14. At the 50-th Iteration • Red Gaussian component is close to the desired location

  15. At the 100-th Iteration • The results are almost identical to the ones for the 50-th iteration

  16. EM as A Bound Optimization • EM algorithm in fact maximizes the log-likelihood function of training data • Likelihood for a data point x • Log-likelihood of training data

  17. EM as A Bound Optimization • EM algorithm in fact maximizes the log-likelihood function of training data • Likelihood for a data point x • Log-likelihood of training data

  18. EM as A Bound Optimization • EM algorithm in fact maximizes the log-likelihood function of training data • Likelihood for a data point x • Log-likelihood of training data
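
In standard notation, the likelihood of a point and the log-likelihood of the training data that slides 16-18 refer to are (symbols assumed):

```latex
% Likelihood of a point and log-likelihood of the data (standard form; notation assumed)
p(x \mid \theta) = \sum_{k} \pi_k\,\mathcal{N}(x \mid \mu_k, \sigma_k^2),
\qquad
\ell(\theta) = \log \prod_{i=1}^{n} p(x_i \mid \theta)
             = \sum_{i=1}^{n} \log \sum_{k} \pi_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)
```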

  19. Logarithm Bound Algorithm • Start with an initial guess

  20. Logarithm Bound Algorithm (touch point) • Start with an initial guess • Come up with a lower bound

  21. Logarithm Bound Algorithm • Start with an initial guess • Come up with a lower bound • Search for the solution that maximizes the lower bound

  22. Logarithm Bound Algorithm • Start with an initial guess • Come up with a lower bound • Search for the solution that maximizes the lower bound • Repeat the procedure

  23. Logarithm Bound Algorithm (optimal point) • Start with an initial guess • Come up with a lower bound • Search for the solution that maximizes the lower bound • Repeat the procedure • Converge to a local optimum
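
The guarantee behind the procedure on slides 19-23 can be summarized as follows, writing ℓ for the log-likelihood and Q(θ; θ^t) for a lower bound that touches ℓ at θ^t (notation assumed):

```latex
% Bound (MM) optimization step and why it never decreases the objective
\theta^{t+1} = \arg\max_{\theta} Q(\theta; \theta^{t}),
\quad\text{where } Q(\theta; \theta^{t}) \le \ell(\theta)\ \forall\theta
\ \text{ and }\ Q(\theta^{t}; \theta^{t}) = \ell(\theta^{t}),
\quad\Longrightarrow\quad
\ell(\theta^{t+1}) \ge Q(\theta^{t+1}; \theta^{t}) \ge Q(\theta^{t}; \theta^{t}) = \ell(\theta^{t})
```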

  24. EM as A Bound Optimization • Parameters from the previous iteration: • Parameters for the current iteration: • Compute the change in log-likelihood between them

  25. Concavity of the logarithm function

  26. Definition of posterior
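
Combining slide 25 (concavity of the logarithm) with slide 26 (the definition of the posterior) gives the standard lower bound on the improvement in log-likelihood; θ' denotes the previous parameters and θ the current ones (notation assumed):

```latex
% Lower bound on the improvement, via Jensen's inequality and the posterior p(k | x_i, \theta')
\ell(\theta) - \ell(\theta')
  = \sum_i \log \frac{p(x_i \mid \theta)}{p(x_i \mid \theta')}
  = \sum_i \log \sum_k p(k \mid x_i, \theta')\,
        \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}
             {p(k \mid x_i, \theta')\, p(x_i \mid \theta')}
  \;\ge\; \sum_i \sum_k p(k \mid x_i, \theta')\,
        \log \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}
                  {p(k \mid x_i, \theta')\, p(x_i \mid \theta')}
```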

  27. Log-Likelihood of the EM Algorithm (plot annotates saddle points)

  28. Maximize GMM Model • What is the global optimal solution to GMM? • Maximizing the objective function of GMM is an ill-posed problem

  29. Maximize GMM Model • What is the global optimal solution to GMM? • Maximizing the objective function of GMM is an ill-posed problem
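
The standard argument for why this is ill-posed (an assumption about what the slide's equations showed): placing one component exactly on a single data point and shrinking its variance makes the likelihood grow without bound, so the "global optimum" is a degenerate solution.

```latex
% Degenerate GMM solution: set \mu_1 = x_1 and let \sigma_1 \to 0
\pi_1\,\mathcal{N}(x_1 \mid \mu_1 = x_1, \sigma_1^2) = \frac{\pi_1}{\sqrt{2\pi}\,\sigma_1}
\;\longrightarrow\; \infty \quad\text{as } \sigma_1 \to 0,
\qquad\text{hence } \ell(\theta) \to \infty
```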

  30. Identify Hidden Variables • For certain learning problems, identifying hidden variables is not an easy task • Consider a simple translation model • For a pair of English and Chinese sentences: • A simple translation model is • The log-likelihood of the training corpus

  31. Identify Hidden Variables • Consider a simple case • Alignment variable a(i) • Rewrite

  32. Identify Hidden Variables • Consider a simple case • Alignment variable a(i) • Rewrite

  33. Identify Hidden Variables • Consider a simple case • Alignment variable a(i) • Rewrite

  34. Identify Hidden Variables • Consider a simple case • Alignment variable a(i) • Rewrite
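
One common way to write such a word-based model with alignment variables a(i), in the spirit of IBM Model 1 (an assumption, since the slides' exact equations are not reproduced here):

```latex
% Word-based translation likelihood with alignments a(i) (assumed Model-1-style form)
\Pr(e \mid c) \;\propto\; \prod_{i=1}^{|e|} \sum_{j=1}^{|c|} \Pr(e_i \mid c_j),
\qquad
\log \Pr(e \mid c) = \sum_{i=1}^{|e|} \log \sum_{j=1}^{|c|} \Pr(e_i \mid c_j) \;+\; \text{const}
```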

  35. EM Algorithm for A Translation Model • Introduce an alignment variable for each translation pair • EM algorithm for the translation model • E-step: compute the posterior for each alignment variable • M-step: estimate the translation probability Pr(e|c)

  36. EM Algorithm for A Translation Model • Introduce an alignment variable for each translation pair • EM algorithm for the translation model • E-step: compute the posterior for each alignment variable • M-step: estimate the translation probability Pr(e|c) We are lucky here. In general, this step can be extremely difficult and usually requires approximate approaches
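
A minimal sketch of the E- and M-steps described on slides 35-36, assuming a Model-1-style word-translation table; function and variable names are illustrative.

```python
from collections import defaultdict

def em_translation(pairs, n_iter=10):
    """EM for a simple word-translation table Pr(e|c) (Model-1-style sketch).

    `pairs` is a list of (english_words, chinese_words) sentence pairs.
    """
    # Uniform initial guess for Pr(e|c)
    prob = defaultdict(lambda: 1.0)
    for _ in range(n_iter):
        counts = defaultdict(float)   # expected counts for (e, c)
        totals = defaultdict(float)   # expected counts for c
        for eng, chi in pairs:
            for e in eng:
                # E-step: posterior of each alignment a(i) = j for word e
                z = sum(prob[(e, c)] for c in chi)
                for c in chi:
                    post = prob[(e, c)] / z
                    counts[(e, c)] += post
                    totals[c] += post
        # M-step: re-estimate Pr(e|c) by normalizing the expected counts
        prob = defaultdict(float,
                           {(e, c): counts[(e, c)] / totals[c] for (e, c) in counts})
    return prob

# Toy usage with made-up tokens
pairs = [(["red", "book"], ["hong", "shu"]), (["book"], ["shu"])]
table = em_translation(pairs)
print(table[("book", "shu")])
```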

  37. Compute Pr(e|c) • First compute

  38. Compute Pr(e|c) • First compute

  39. Bound Optimization for A Translation Model

  40. Bound Optimization for A Translation Model

  41. Iterative Scaling • Maximum entropy model • Iterative scaling • All features • The sum of the features is constant

  42. Iterative Scaling • Compute the empirical mean for each feature of every class, i.e., for every j and every class y • Start with w1, w2, …, wc = 0 • Repeat • Compute p(y|x) for each training data point (xi, yi) using w from the previous iteration • Compute the mean of each feature of every class using the estimated probabilities, i.e., for every j and every y • Compute for every j and every y • Update w as
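
A sketch of the procedure on slide 42 as generalized iterative scaling, assuming nonnegative features that sum to the same constant C for every example (the condition stated on slide 41); the names and the smoothing constant are illustrative.

```python
import numpy as np

def iterative_scaling(X, y, n_classes, n_iter=100):
    """Generalized iterative scaling for a conditional maxent model (sketch).

    Assumes nonnegative features that sum to the same constant C for every
    example; X has shape (n, d), y holds integer class ids.
    """
    n, d = X.shape
    C = X.sum(axis=1)[0]                       # constant feature sum per example
    w = np.zeros((n_classes, d))               # start with w = 0
    # Empirical mean of each feature for every class
    emp = np.zeros((n_classes, d))
    for xi, yi in zip(X, y):
        emp[yi] += xi
    emp /= n
    for _ in range(n_iter):
        # p(y|x) under the current weights
        scores = X @ w.T                       # shape (n, n_classes)
        scores -= scores.max(axis=1, keepdims=True)
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)
        # Model expectation of each feature for every class
        exp_feat = (p.T @ X) / n               # shape (n_classes, d)
        # GIS update: w <- w + (1/C) * log(empirical / expected)
        w += np.log((emp + 1e-12) / (exp_feat + 1e-12)) / C
    return w
```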

  43. Iterative Scaling

  44. Iterative Scaling Can we use the concave property of the logarithm function? No, we can't, because we need a lower bound

  45. Iterative Scaling • Weights still couple with each other • Still need further decomposition

  46. Iterative Scaling

  47. Iterative Scaling Wait a minute, this cannot be right! What happened?

  48. Logarithm Bound Algorithm • Start with an initial guess • Come up with a lower bound • Search for the solution that maximizes the lower bound
