1 / 54

The EM algorithm

The EM algorithm. LING 572 Fei Xia 03/01/07. What is EM?. EM stands for “expectation maximization”. A parameter estimation method: it falls into the general framework of maximum-likelihood estimation (MLE).

lindley
Download Presentation

The EM algorithm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The EM algorithm LING 572 Fei Xia 03/01/07

  2. What is EM? • EM stands for “expectation maximization”. • A parameter estimation method: it falls into the general framework of maximum-likelihood estimation (MLE). • The general form was given in (Dempster, Laird, and Rubin, 1977), although essence of the algorithm appeared previously in various forms.

  3. Outline • MLE • EM • Basic concepts • Main ideas of EM • EM for PM models • An example: Forward-backward algorithm

  4. MLE

  5. What is MLE? • Given • A sample X={X1, …, Xn} • A vector of parameters θ • We define • Likelihood of the data: P(X | θ) • Log-likelihood of the data: L(θ)=log P(X|θ) • Given X, find

  6. MLE (cont) • Often we assume that Xis are independently identically distributed (i.i.d.) • Depending on the form of P(X | θ), solving optimization problem can be easy or hard.

  7. An easy case • Assuming • A coin has a probability p of being heads, 1-p of being tails. • Observation: We toss a coin N times, and the result is a set of Hs and Ts, and there are m Hs. • What is the value of p based on MLE, given the observation?

  8. An easy case (cont) p= m/N

  9. EM: basic concepts

  10. Basic setting in EM • X is a set of data points: observed data • Θ is a parameter vector. • EM is a method to find θML where • Calculating P(X | θ) directly is hard. • Calculating P(X,Y|θ) is much simpler, where Y is “hidden” data (or “missing” data).

  11. The basic EM strategy • Z = (X, Y) • Z: complete data (“augmented data”) • X: observed data (“incomplete” data) • Y: hidden data (“missing” data)

  12. The “missing” data Y • Y need not necessarily be missing in the practical sense of the word. • It may just be a conceptually convenient technical device to simplify the calculation of P(X | θ). • There could be many possible Ys.

  13. Examples of EM

  14. The EM algorithm • Consider a set of starting parameters • Use these to “estimate” the missing data • Use “complete” data to update parameters • Repeat until convergence

  15. Highlights • General algorithm for missing data problems • Requires “specialization” to the problem at hand • Examples of EM: • Forward-backward algorithm for HMM • Inside-outside algorithm for PCFG • EM in IBM MT Models

  16. EM: main ideas

  17. Idea #1: find θ that maximizes the likelihood of training data

  18. Idea #2: find the θt sequence No analytical solution  iterative approach, find s.t.

  19. Idea #3: find θt+1 that maximizes a tight lower bound of a tight lower bound

  20. Idea #4: find θt+1 that maximizes the Q function Lower bound of The Q function

  21. The EM algorithm • Start with initial estimate, θ0 • Repeat until convergence • E-step: calculate • M-step: find

  22. Important classes of EM problem • Products of multinomial (PM) models • Exponential families • Gaussian mixture • …

  23. The EM algorithm for PM models

  24. PM models Where is a partition of all the parameters, and for any j

  25. HMM is a PM

  26. PCFG • PCFG: each sample point (x,y): • x is a sentence • y is a possible parse tree for that sentence.

  27. PCFG is a PM

  28. Q-function for PM

  29. Maximizing the Q function Maximize Subject to the constraint Use Lagrange multipliers

  30. Optimal solution Expected count Normalization factor

  31. PCFG example • Calculate expected counts • Update parameters

  32. EM: An example

  33. Forward-backward algorithm • HMM is a PM (Product of Multi-nominal) Model • Forward-backward algorithm is a special case of the EM algorithm for PM Models. • X (observed data): each data point is an O1T. • Y (hidden data): state sequence X1T. • Θ (parameters): aij, bijk, πi.

  34. Expected counts

  35. Expected counts (cont)

  36. The inner loop for forward-backward algorithm Given an input sequence and • Calculate forward probability: • Base case • Recursive case: • Calculate backward probability: • Base case: • Recursive case: • Calculate expected counts: • Update the parameters:

  37. HMM • A HMM is a tuple : • A set of states S={s1, s2, …, sN}. • A set of output symbols Σ={w1, …, wM}. • Initial state probabilities • State transition prob: A={aij}. • Symbol emission prob: B={bijk} • State sequence: X1…XT+1 • Output sequence: o1…oT

  38. Forward probability The probability of producing oi,t-1 while ending up in state si:

  39. Calculating forward probability Initialization: Induction:

  40. Backward probability • The probability of producing the sequence Ot,T, given that at time t, we are at state si.

  41. Calculating backward probability Initialization: Induction:

  42. Calculating the prob of the observation

  43. Estimating parameters • The prob of traversing a certain arc at time t given O: (denoted by pt(i, j) in M&S)

  44. The prob of being at state i at time t given O:

  45. Expected counts Sum over the time index: • Expected # of transitions from state i to j in O: • Expected # of transitions from state i in O:

  46. Update parameters

  47. Final formulae

  48. Summary

  49. Summary • The EM algorithm • An iterative approach • L(θ) is non-decreasing at each iteration • Optimal solution in M-step exists for many classes of problems. • The EM algorithm for PM models • Simpler formulae • Three special cases • Inside-outside algorithm • Forward-backward algorithm • IBM Models for MT

  50. Relations among the algorithms The generalized EM The EM algorithm PM Inside-Outside Forward-backward IBM models Gaussian Mix

More Related