Opinionated Lessons in Statistics by Bill Press
#30 Expectation Maximization (EM) Methods
Professor William H. Press, Department of Computer Science, The University of Texas at Austin
Let's look at the theory behind EM methods.

Preliminary: Jensen's inequality. If a function is concave (downward), then function(interpolation) ≥ interpolation(function). Log is concave (downward), so Jensen's inequality for it reads

$$\ln\Big(\sum_i \lambda_i Q_i\Big) \;\ge\; \sum_i \lambda_i \ln Q_i, \qquad \text{if } \lambda_i \ge 0 \text{ and } \sum_i \lambda_i = 1 .$$

This gets used a lot when playing with log-likelihoods. The proof of the EM method that we now give is just one example.
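As a quick sanity check of the inequality (my addition, not part of the original slides), the following short Python snippet verifies it numerically for random weights and values:

```python
import numpy as np

# Numerical check of Jensen's inequality for the concave log:
# ln(sum_i lambda_i * Q_i) >= sum_i lambda_i * ln(Q_i),
# with lambda_i >= 0 and sum_i lambda_i = 1.
rng = np.random.default_rng(0)
for _ in range(1000):
    lam = rng.random(5)
    lam /= lam.sum()            # nonnegative weights that sum to 1
    Q = rng.random(5) + 1e-3    # strictly positive values
    lhs = np.log(np.dot(lam, Q))
    rhs = np.dot(lam, np.log(Q))
    assert lhs >= rhs - 1e-12
print("Jensen's inequality held in all 1000 random trials")
```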
The basic EM theorem (Dempster, Laird, and Rubin). Here x are the data, z are missing data or nuisance variables, and θ are the parameters to be determined. We want to find the θ that maximizes the log-likelihood of the data, marginalizing over z:

$$L(\theta) \equiv \ln P(x\mid\theta) = \ln\Big[\sum_z P(x\mid z\,\theta)\,P(z\mid\theta)\Big].$$

For any other parameter value θ' we can write

$$
\begin{aligned}
L(\theta) &= L(\theta') - \ln P(x\mid\theta') + \ln \sum_z P(z\mid x\,\theta')\,\frac{P(x\mid z\,\theta)\,P(z\mid\theta)}{P(z\mid x\,\theta')} \\
&\ge L(\theta') + \sum_z P(z\mid x\,\theta')\,\ln\!\left[\frac{P(x\mid z\,\theta)\,P(z\mid\theta)}{P(z\mid x\,\theta')\,P(x\mid\theta')}\right] \\
&= L(\theta') + \sum_z P(z\mid x\,\theta')\,\ln\!\left[\frac{P(x\,z\mid\theta)}{P(x\,z\mid\theta')}\right] \;\equiv\; B(\theta,\theta'),
\end{aligned}
$$

where the inequality is Jensen's, applied to the concave log. So B(θ, θ') is a bound on L(θ) for any θ'.
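To make the bound B(θ, θ') concrete, here is a small hypothetical discrete model (a single observed symbol x and a binary hidden z; the model and all names are illustrative, not from the lecture) in which both L(θ) and B(θ, θ') can be evaluated exactly and compared:

```python
import numpy as np

# Toy model: z in {0,1} with P(z=1|theta) = theta; one observed symbol x with
# fixed emission probabilities P(x|z).  (Hypothetical example for illustration.)
px_given_z = np.array([0.2, 0.7])        # P(x | z=0), P(x | z=1)

def L(theta):
    """Log-likelihood ln P(x|theta) = ln sum_z P(x|z) P(z|theta)."""
    pz = np.array([1.0 - theta, theta])
    return np.log(np.dot(px_given_z, pz))

def B(theta, theta0):
    """The EM lower bound built around theta0, as on the slide."""
    pz0 = np.array([1.0 - theta0, theta0])
    post = px_given_z * pz0
    post /= post.sum()                       # P(z | x, theta0)
    pz = np.array([1.0 - theta, theta])
    log_joint  = np.log(px_given_z * pz)     # ln P(x z | theta)
    log_joint0 = np.log(px_given_z * pz0)    # ln P(x z | theta0)
    return L(theta0) + np.dot(post, log_joint - log_joint0)

theta0 = 0.3
for theta in np.linspace(0.05, 0.95, 19):
    assert B(theta, theta0) <= L(theta) + 1e-12   # B is a lower bound on L...
assert np.isclose(B(theta0, theta0), L(theta0))   # ...that is tight at theta = theta0
print("bound verified, e.g. B(0.6, 0.3) =", B(0.6, theta0), "<= L(0.6) =", L(0.6))
```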
Notice that B(θ', θ') = L(θ'). So if we maximize B(θ, θ') over θ, holding θ' fixed, we are guaranteed that the new maximizing θ has L(new) ≥ L(old), because the true log-likelihood is always at least the bound we maximized. Iterating this maximization, the log-likelihood can never decrease, so the only way the iteration can stop is by converging to (at least) a local max of L(θ). (Can you see why?) And it works whether the maximization step is local or global.
So the general EM algorithm repeats the maximization

$$
\begin{aligned}
\theta'' &= \arg\max_{\theta} \sum_z P(z\mid x\,\theta')\,\ln\!\left[\frac{P(x\,z\mid\theta)}{P(z\mid x\,\theta')}\right] \\
&= \arg\max_{\theta} \sum_z P(z\mid x\,\theta')\,\ln\big[P(x\,z\mid\theta)\big] \qquad\text{(the denominator has no dependence on $\theta$)} \\
&= \arg\max_{\theta} \sum_z P(z\mid x\,\theta')\,\ln\big[P(x\mid z\,\theta)\,P(z\mid\theta)\big].
\end{aligned}
$$

Each form is sometimes useful: sometimes (missing data) there is no dependence on z; sometimes (nuisance parameters) there is a uniform prior. Computing P(z|x θ') is the E-step; this is an expectation that can often be computed in some better way than literally integrating over all possible values of z. Maximizing the sum over θ is the M-step. This is a general way of handling missing data or nuisance parameters if you can estimate the probability of what is missing, given what you see (and a guess at the parameters).
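As a minimal, self-contained illustration of this E-step/M-step loop (and of the guaranteed monotone increase in L from the previous slide), here is a Python sketch for a hypothetical "two biased coins" mixture; the example and all variable names are mine, not from the lecture:

```python
import numpy as np
from math import comb

# Hypothetical model: each trial secretly picks coin A or coin B with probability
# 1/2 (z = which coin), flips it m times, and we observe only the number of heads.
# theta = (pA, pB), the two heads probabilities.
rng = np.random.default_rng(1)
m, n = 20, 300
true_pA, true_pB = 0.8, 0.35
z_true = rng.integers(0, 2, size=n)
heads = rng.binomial(m, np.where(z_true == 0, true_pA, true_pB))
binom_coef = np.array([comb(m, int(h)) for h in heads])

def component_lik(p):
    """P(heads_i | coin with heads-probability p), for every trial i."""
    return binom_coef * p**heads * (1.0 - p)**(m - heads)

def log_lik(pA, pB):
    """ln P(data | theta), marginalized over the hidden coin identities z."""
    return np.sum(np.log(0.5 * component_lik(pA) + 0.5 * component_lik(pB)))

pA, pB, prev = 0.6, 0.5, -np.inf         # arbitrary starting guess
for _ in range(200):
    # E-step: responsibilities P(z_i = A | heads_i, theta_old)
    wA, wB = 0.5 * component_lik(pA), 0.5 * component_lik(pB)
    rA = wA / (wA + wB)
    # M-step: maximize the expected complete-data log-likelihood
    # (here just a responsibility-weighted coin-flip MLE)
    pA = np.sum(rA * heads) / (np.sum(rA) * m)
    pB = np.sum((1.0 - rA) * heads) / (np.sum(1.0 - rA) * m)
    cur = log_lik(pA, pB)
    assert cur >= prev - 1e-9            # the log-likelihood never decreases
    prev = cur
print("estimated heads probabilities:", round(pA, 3), round(pB, 3))
```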
It might not be instantly obvious how the GMM fits this paradigm! Here z (the missing data) is the assignment of data points to components, θ consists of the μ's and Σ's, and P(z|x θ') becomes the responsibilities p_nk. The quantity to be maximized is then

$$\sum_z P(z\mid x\,\theta')\,\ln\big[P(x\,z\mid\theta)\big] = \sum_{n,k} p_{nk}\Big[-\tfrac12 \ln\det\Sigma_k \;-\; \tfrac12\,(x_n-\mu_k)\cdot\Sigma_k^{-1}\cdot(x_n-\mu_k)\Big] \;+\; (\text{terms not involving } \mu_k, \Sigma_k).$$

Showing that this is maximized by the previous re-estimation formulas for μ and Σ is a multidimensional (fancy) form of the theorem that the mean is the measure of central tendency that minimizes the mean square deviation. See Wikipedia: Expectation-Maximization Algorithm for a detailed derivation.

The next time we see an EM method will be when we discuss Hidden Markov Models: the "Baum-Welch re-estimation algorithm" for HMMs is an EM method.
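For reference, here is a sketch of one GMM EM sweep in this notation; the function and variable names are illustrative, and the M-step below is exactly the responsibility-weighted mean and covariance re-estimation referred to above:

```python
import numpy as np

def gmm_e_step(x, weights, means, covs):
    """Responsibilities p_nk = P(z_n = k | x_n, theta_old) from the current guess."""
    N, d = x.shape
    K = len(weights)
    p = np.empty((N, K))
    for k in range(K):
        diff = x - means[k]
        inv = np.linalg.inv(covs[k])
        det = np.linalg.det(covs[k])
        quad = np.einsum('ni,ij,nj->n', diff, inv, diff)   # (x_n-mu_k).Sigma_k^-1.(x_n-mu_k)
        p[:, k] = weights[k] * np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * det)
    return p / p.sum(axis=1, keepdims=True)

def gmm_m_step(x, p):
    """M-step: responsibility-weighted means and covariances (and mixture weights)."""
    N, d = x.shape
    Nk = p.sum(axis=0)                       # effective number of points per component
    weights = Nk / N
    means = (p.T @ x) / Nk[:, None]          # weighted means
    covs = np.empty((p.shape[1], d, d))
    for k in range(p.shape[1]):
        diff = x - means[k]
        covs[k] = (p[:, k, None] * diff).T @ diff / Nk[k]   # weighted covariances
    return weights, means, covs

# One full EM sweep, given data x and a current guess (weights, means, covs):
#   p = gmm_e_step(x, weights, means, covs)
#   weights, means, covs = gmm_m_step(x, p)
```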