The EM Method
Arthur Pece (aecp@diku.dk)
• Basic concepts
• EM clustering algorithm
• EM method and relationship to ML estimation
What is EM?
• Expectation-Maximization
• A fairly general optimization method
• Useful when the model includes 3 kinds of variables:
  • visible variables x
  • intermediate variables h *
  • parameters/state variables s
and we want to optimize only w.r.t. the parameters.
* Here we assume that the intermediate variables are discrete.
EM Method
• A method to obtain ML parameter estimates, i.e. to maximize the log-likelihood w.r.t. the parameters.
Assuming that the x_i are statistically independent, the log-likelihood for the data set is the sum of the log-likelihoods for the data points:

L = \sum_i \log p(x_i | s) = \sum_i \log \sum_k p(x_i | h_k, s) \, p(h_k | s)

(replace the second sum with an integral if the intermediate variables are continuous rather than discrete)
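As a concrete illustration (not part of the slides), a minimal sketch of this log-likelihood assuming Gaussian component densities; the function name and the parameter layout (weights, means, covs) are choices made for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, weights, means, covs):
    """L = sum_i log sum_k p(x_i | h_k, s) p(h_k | s), Gaussian components."""
    # per_component[i, k] = p(x_i | h_k, s) * p(h_k | s)
    per_component = np.stack(
        [w * multivariate_normal.pdf(X, mean=mu, cov=C)
         for w, mu, C in zip(weights, means, covs)], axis=1)
    return np.sum(np.log(per_component.sum(axis=1)))
```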
EM functional
Given a pdf q(h) for the intermediate variables, we define the EM functional:

Q_q = \sum_i \sum_k q(h_k) \log \bigl[ p(x_i | h_k, s) \, p(h_k | s) \bigr]

This is usually much simpler than the log-likelihood:

L = \sum_i \log \sum_k p(x_i | h_k, s) \, p(h_k | s)

because there is no logarithm of a sum in Q_q.
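A matching sketch of the EM functional (illustrative, not from the slides), again assuming Gaussian component densities; since in the clustering example below q is a per-point distribution q_ij, it is represented here as an (m, n) array Q.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_functional(X, Q, weights, means, covs):
    """Q_q = sum_i sum_k q_ik log[ p(x_i | h_k, s) p(h_k | s) ]."""
    # log_joint[i, k] = log p(x_i | h_k, s) + log p(h_k | s)
    log_joint = np.stack(
        [multivariate_normal.logpdf(X, mean=mu, cov=C) + np.log(w)
         for w, mu, C in zip(weights, means, covs)], axis=1)
    return np.sum(Q * log_joint)
```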
EM iteration
Two steps: E and M
• E step: q(h) is set equal to the pdf of h conditional on x_i and the current parameter estimate s^{(t-1)}:

q^{(t)}(h_k) = p(h_k | x_i, s^{(t-1)})

• M step: the EM functional is maximized w.r.t. s to obtain the new estimate s^{(t)}.
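The iteration can be written as a generic loop. The sketch below is illustrative: the function names, the convergence test and the tolerance are not part of the slides, and e_step / m_step are supplied by the application (e.g. the clustering steps later on).

```python
def em(X, s0, e_step, m_step, log_likelihood, tol=1e-6, max_iter=200):
    """Generic EM sketch: alternate E and M steps until L stops improving."""
    s, prev_L = s0, -float("inf")
    for _ in range(max_iter):
        q = e_step(X, s)          # q^(t)(h_k) = p(h_k | x_i, s^(t-1))
        s = m_step(X, q)          # maximize the EM functional w.r.t. s
        L = log_likelihood(X, s)
        if L - prev_L < tol:      # L is non-decreasing (proved below)
            break
        prev_L = L
    return s
```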
Example: EM clustering
• m data points x_i are generated by n generative processes, each process j generating a fraction w_j of the data points with pdf f_j(x_i), parameterized by the parameter set s_j (which includes w_j)
• We want to estimate the parameters s_j for all processes
Example: EM clustering
• Visible variables: m data points x_i
• Intermediate variables: m × n binary labels h_{ij}, with \sum_j h_{ij} = 1
• State variables: n parameter sets s_j
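To make the example concrete, an illustrative sketch (not from the slides) that samples m data points from this generative model with Gaussian processes; the particular weights, means and covariances are arbitrary values chosen just to have something to cluster.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500
weights = np.array([0.5, 0.3, 0.2])                      # fractions w_j
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
covs = np.array([np.eye(2) * v for v in (1.0, 0.5, 2.0)])

labels = rng.choice(len(weights), size=m, p=weights)     # hidden labels h_ij
X = np.array([rng.multivariate_normal(means[j], covs[j]) for j in labels])
```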
EM clustering for Gaussian pdf’s
• The parameters of cluster j are its weight w_j, centroid c_j, and covariance A_j
• If we knew which data point belongs to which cluster, we could compute the fraction, mean and covariance for each cluster:

w_j = \frac{1}{m} \sum_i h_{ij}

c_j = \frac{\sum_i h_{ij} \, x_i}{\sum_i h_{ij}}

A_j = \frac{\sum_i h_{ij} \, (x_i - c_j)(x_i - c_j)^T}{\sum_i h_{ij}}

(note that \sum_i h_{ij} = m \, w_j)
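A sketch of these statistics for known labels (illustrative, not from the slides); H is an m × n 0/1 label matrix with one 1 per row.

```python
import numpy as np

def hard_label_stats(X, H):
    """Cluster fractions, centroids and covariances from known labels."""
    m, n = H.shape
    counts = H.sum(axis=0)                    # sum_i h_ij = m * w_j
    w = counts / m                            # fractions
    c = (H.T @ X) / counts[:, None]           # centroids
    A = np.empty((n, X.shape[1], X.shape[1]))
    for j in range(n):
        d = X - c[j]
        A[j] = (H[:, j, None] * d).T @ d / counts[j]   # covariances
    return w, c, A
```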
EM clustering (continued)
• Since we do not know which cluster a data point belongs to, we assign each point to all clusters, with different probabilities q_{ij}, \sum_j q_{ij} = 1:

w_j = \frac{1}{m} \sum_i q_{ij}

c_j = \frac{\sum_i q_{ij} \, x_i}{\sum_i q_{ij}}

A_j = \frac{\sum_i q_{ij} \, (x_i - c_j)(x_i - c_j)^T}{\sum_i q_{ij}}

(again \sum_i q_{ij} = m \, w_j)
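The same computation with soft assignments is the M step of EM clustering; a sketch (names illustrative), where Q is the (m, n) matrix of probabilities q_ij:

```python
import numpy as np

def m_step(X, Q):
    """Weighted fractions, centroids and covariances from soft assignments."""
    m, n = Q.shape
    totals = Q.sum(axis=0)                    # sum_i q_ij = m * w_j
    w = totals / m
    c = (Q.T @ X) / totals[:, None]
    A = np.empty((n, X.shape[1], X.shape[1]))
    for j in range(n):
        d = X - c[j]
        A[j] = (Q[:, j, None] * d).T @ d / totals[j]
    return w, c, A
```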
EM clustering (continued)
• The probabilities q_{ij} can be computed from the cluster parameters
• Chicken & egg problem: the cluster parameters are needed to compute the probabilities, and the probabilities are needed to compute the cluster parameters
EM clustering (continued)
The solution: iterate to convergence:
• E step: for each data point and each cluster, compute the probability q_{ij} that the point belongs to the cluster (from the cluster parameters)
• M step: re-compute the cluster parameters for all clusters by weighted averages over all points (use the equations given two slides ago).
How to compute the probability that a given data point originates from a given process?
• Use Bayes’ theorem:

q_{ij} = \frac{w_j \, f_j(x_i)}{\sum_k w_k \, f_k(x_i)}

This is how the cluster parameters are used to compute the q_{ij}.
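Putting the E step (Bayes’ theorem) and the M step (weighted averages) together gives the full EM clustering iteration for Gaussian pdf’s. The sketch below is illustrative: the initialization, the fixed iteration count and the function name are choices not specified in the slides. With the synthetic X generated a few slides back, it can be called as em_cluster(X, 3).

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_cluster(X, n, n_iter=100, seed=0):
    """Minimal EM clustering sketch for a mixture of n Gaussian pdf's."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.full(n, 1.0 / n)                         # initial fractions
    c = X[rng.choice(m, size=n, replace=False)]     # initial centroids
    A = np.array([np.cov(X.T) for _ in range(n)])   # initial covariances
    for _ in range(n_iter):
        # E step: q_ij = w_j f_j(x_i) / sum_k w_k f_k(x_i)
        f = np.stack([w[j] * multivariate_normal.pdf(X, mean=c[j], cov=A[j])
                      for j in range(n)], axis=1)
        Q = f / f.sum(axis=1, keepdims=True)
        # M step: weighted averages over all points
        totals = Q.sum(axis=0)
        w = totals / m
        c = (Q.T @ X) / totals[:, None]
        A = np.array([((Q[:, j, None] * (X - c[j])).T @ (X - c[j])) / totals[j]
                      for j in range(n)])
    return w, c, A
```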
Non-decreasing log-likelihood in the EM method
Let’s return to the general EM method: we want to prove that the log-likelihood does not decrease from one iteration to the next. To do so we introduce two more functionals.
Entropy and Kullback-Leibler divergence
Define the entropy

S(q) = -\sum_i \sum_k q(h_k) \log q(h_k)

and the Kullback-Leibler divergence

D_{KL}[q ; p(h | x, s)] = \sum_i \sum_k q(h_k) \log \frac{q(h_k)}{p(h_k | x_i, s)}
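Both quantities are easy to compute for the per-point responsibility matrices used in clustering; a small illustrative sketch (Q and P are (m, n) arrays whose rows are distributions over the h_k):

```python
import numpy as np

def entropy(Q):
    """S(q) = -sum_i sum_k q(h_k) log q(h_k), with 0 log 0 taken as 0."""
    mask = Q > 0
    return -np.sum(Q[mask] * np.log(Q[mask]))

def kl_divergence(Q, P):
    """D_KL[q ; p] = sum_i sum_k q(h_k) log( q(h_k) / p(h_k | x_i, s) )."""
    mask = Q > 0
    return np.sum(Q[mask] * np.log(Q[mask] / P[mask]))
```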
Non-decreasing log-likelihood I
It can be proven that

L = Q_q + S(q) + D_{KL}[q ; p(h | x, s)]

After the E step, q^{(t)}(h) = p(h | x, s^{(t-1)}) and therefore D_{KL} is zero:

L^{(t-1)} = Q_q^{(t-1)} + S(q^{(t)})

(here Q_q^{(t-1)} denotes the EM functional evaluated with the new q^{(t)} and the old parameters s^{(t-1)})
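The identity follows from Bayes’ theorem. A short derivation (not in the slides) for a single data point x_i; summing over i gives the three terms Q_q, S(q) and D_{KL} above:

```latex
\begin{aligned}
\log p(x_i | s)
  &= \sum_k q(h_k)\,\log p(x_i | s)
   = \sum_k q(h_k)\,\log \frac{p(x_i | h_k, s)\, p(h_k | s)}{p(h_k | x_i, s)} \\
  &= \underbrace{\sum_k q(h_k)\,\log\bigl[\,p(x_i | h_k, s)\, p(h_k | s)\bigr]}_{\to\; Q_q}
   \;\underbrace{-\,\sum_k q(h_k)\,\log q(h_k)}_{\to\; S(q)}
   \;+\;\underbrace{\sum_k q(h_k)\,\log \frac{q(h_k)}{p(h_k | x_i, s)}}_{\to\; D_{KL}}
\end{aligned}
```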
Non-decreasing log-likelihood II
After the M step, Q_q is maximized in standard EM [Q_q is only increased, not maximized, in GEM (generalized EM), but the result is the same] and therefore:

Q_q^{(t)} \ge Q_q^{(t-1)}

In addition we have that:

L^{(t)} \ge Q_q^{(t)} + S(q^{(t)})

[This is because, for any two pdf’s q and p: D_{KL}[q ; p] \ge 0.]
Non-decreasing log-likelihood III
Putting the above results together:

L^{(t)} \ge Q_q^{(t)} + S(q^{(t)}) \ge Q_q^{(t-1)} + S(q^{(t)}) = L^{(t-1)}

which proves that L is non-decreasing.