Machine Learning: Expectation Maximization. Tom M. Mitchell
Expectation Maximization (EM)
• When to use:
  • Data is only partially observable
  • Unsupervised clustering (target value unobservable)
  • Supervised learning (some instance attributes unobservable)
• Some uses:
  • Train Bayesian belief networks
  • Unsupervised clustering (AUTOCLASS)
  • Learning hidden Markov models
Generating Data from a Mixture of k Gaussians
• Each instance x is generated by:
  1. Choosing one of the k Gaussians with uniform probability
  2. Generating an instance at random according to that Gaussian
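A minimal sketch of this two-step generative process in Python; the means, shared standard deviation, sample count, and the name sample_mixture are illustrative assumptions, not part of the slides:

```python
import random

def sample_mixture(means, sigma, n):
    """Generate n instances: choose one of the k Gaussians uniformly,
    then draw an instance from that Gaussian."""
    return [random.gauss(random.choice(means), sigma) for _ in range(n)]

# Illustrative parameters: k = 2 Gaussians sharing unit variance.
points = sample_mixture(means=[4.0, 50.0], sigma=1.0, n=240)
```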
EM for Estimating k Means (1/2)
• Given:
  • Instances from X generated by a mixture of k Gaussian distributions
  • Unknown means <μ1, …, μk> of the k Gaussians
  • Don't know which instance xi was generated by which Gaussian
• Determine:
  • Maximum likelihood estimates of <μ1, …, μk>
• Think of the full description of each instance as yi = <xi, zi1, zi2>, where
  • zij is 1 if xi was generated by the jth Gaussian
  • xi is observable
  • zij is unobservable
EM for Estimating k Means (2/2)
• EM Algorithm: Pick a random initial h = <μ1, μ2>, then iterate:
  E step: Calculate the expected value E[zij] of each hidden variable zij, assuming the current hypothesis h = <μ1, μ2> holds.
  M step: Calculate a new maximum likelihood hypothesis h' = <μ'1, μ'2>, assuming the value taken on by each hidden variable zij is its expected value E[zij] calculated above. Replace h = <μ1, μ2> by h' = <μ'1, μ'2>.
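Written out concretely, the two steps use the standard updates for Gaussians sharing a known variance σ² (as in Mitchell's textbook treatment; m is the number of instances):

```latex
% E step: expected value of the hidden indicator z_ij
E[z_{ij}] = \frac{\exp\!\left(-\tfrac{(x_i - \mu_j)^2}{2\sigma^2}\right)}
                 {\sum_{n=1}^{2} \exp\!\left(-\tfrac{(x_i - \mu_n)^2}{2\sigma^2}\right)}

% M step: new mean = E[z_ij]-weighted average of the samples
\mu_j' = \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}
```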
EM Algorithm
• Converges to a local maximum likelihood h and provides estimates of the hidden variables zij
• In fact, it finds a local maximum of E[ln P(Y|h)]
  • Y is the complete (observed plus unobserved) data
  • The expected value is taken over the possible values of the unobserved variables in Y
General EM Problem
• Given:
  • Observed data X = {x1, …, xm}
  • Unobserved data Z = {z1, …, zm}
  • Parameterized probability distribution P(Y|h), where
    • Y = {y1, …, ym} is the full data, yi = xi ∪ zi
    • h are the parameters
• Determine: h that (locally) maximizes E[ln P(Y|h)]
• Many uses:
  • Train Bayesian belief networks
  • Unsupervised clustering (e.g., k means)
  • Hidden Markov models
General EM Method
• Define a likelihood function Q(h'|h), which calculates Y = X ∪ Z using the observed X and the current parameters h to estimate Z:
  Q(h'|h) ← E[ln P(Y|h') | h, X]
• EM Algorithm:
  • Estimation (E) step: Calculate Q(h'|h) using the current hypothesis h and the observed data X to estimate the probability distribution over Y:
    Q(h'|h) ← E[ln P(Y|h') | h, X]
  • Maximization (M) step: Replace hypothesis h by the hypothesis h' that maximizes this Q function.
Algorithm for EM to Estimate k Means
• EM Algorithm: Pick a random initial h = <μ1, μ2, …, μk>, then iterate until convergence:
  E step: Calculate the expected value E[zij] of each hidden variable zij, assuming the current hypothesis h = <μ1, μ2, …, μk> holds.
  M step: Calculate a new maximum likelihood hypothesis h' = <μ'1, μ'2, …, μ'k>, assuming the value taken on by each hidden variable zij is its expected value E[zij] calculated above. Replace h = <μ1, μ2, …, μk> by h' = <μ'1, μ'2, …, μ'k>.
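A compact sketch of this loop in Python, under the assumptions the homework below also uses (shared σ = 1.0, convergence when the largest mean change drops below 0.001, at most 100 cycles); the function and variable names are my own:

```python
import math
import random

def em_k_means(x, k, sigma=1.0, tol=1e-3, max_iter=100):
    """1-D EM for estimating k Gaussian means with a known, shared sigma.
    Returns the estimated means and the responsibilities z[i][j]."""
    mu = random.sample(x, k)                      # random initial hypothesis h
    for cycle in range(max_iter):
        # E step: E[z_ij] proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2));
        # subtracting the max exponent keeps the ratios numerically stable.
        z = []
        for xi in x:
            d = [-(xi - m) ** 2 / (2 * sigma ** 2) for m in mu]
            mx = max(d)
            w = [math.exp(dj - mx) for dj in d]
            s = sum(w)
            z.append([wj / s for wj in w])
        # M step: each new mean is the E[z_ij]-weighted average of the samples.
        new_mu = [sum(z[i][j] * x[i] for i in range(len(x))) /
                  sum(z[i][j] for i in range(len(x)))
                  for j in range(k)]
        if max(abs(a - b) for a, b in zip(mu, new_mu)) < tol:
            return new_mu, z                      # converged
        mu = new_mu
    return mu, z                                  # hit the cycle limit

# Example: recover two means from an unlabeled mixed sample.
data = [random.gauss(4.0, 1.0) for _ in range(120)] + \
       [random.gauss(50.0, 1.0) for _ in range(120)]
means, resp = em_k_means(data, k=2)
```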
Example: We have data x1, …, xn and do not know which state (mixture component) M1, …, Ma each instance belongs to; the algorithm is expected to make its guesses more and more accurate.
• The goal is therefore to find the θ that maximizes ln P(X|θ), where, under the assumption θ = {(μ1,σ1), …, (μk,σk)}, the likelihood of X is built from the following quantities:
• Given θ: the total probability of x, summed over all populations
• Given θ: the mixing proportion of a particular population k
• Given θ and a particular population k: the probability distribution of the data x
• Given θ: the probability that x falls in population k
• Then take the logarithm
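In symbols (my own notation: πk is the mixing proportion of population k and N(x; μk, σk) its Gaussian density), these labelled quantities combine into the standard mixture likelihood, and taking the logarithm gives the objective above:

```latex
P(x \mid \theta) = \sum_{k=1}^{K} P(k \mid \theta)\, P(x \mid k, \theta)
                 = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x;\, \mu_k, \sigma_k)

\ln P(X \mid \theta) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n;\, \mu_k, \sigma_k)
```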
• Given the previous θ and a particular sample xn: the probability that xn belongs to population k
• Given θ: the probability that xn falls in population k
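As a formula (same notation as above), this posterior responsibility follows from Bayes' rule:

```latex
P(k \mid x_n, \theta) = \frac{\pi_k \, \mathcal{N}(x_n;\, \mu_k, \sigma_k)}
                             {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n;\, \mu_j, \sigma_j)}
```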
An EM problem from Prof. Chun-Nan Hsu's (許鈞南) data mining course at National Tsing Hua University: The Problem: Suppose we have captured a group of aliens who invaded Earth. We know they come from several different planets, but we cannot tell from their appearance which planet each one came from. We also know that aliens from different planets differ in intelligence, so we gave them IQ tests and collected the data. Your task is to use the EM algorithm to separate these aliens into groups based on the IQ test results… Well, before you can complete this mission, you might need to learn more about the EM algorithm. The purpose of this homework is to let you observe the performance of the EM algorithm.
Your task:
• Write a program that generates normally distributed samples. The program outputs two sets of normal samples; its input parameters are the number of values per set, two means, and two standard deviations. The generation algorithm is as follows (see the sketch after this list):
  • First generate two independent sets of uniformly distributed random numbers between 0 and 1.
  • Let U1, U2 be the random numbers from the previous step, and let X1 = sqrt(-2*ln U1)*cos(2*pi*U2), X2 = sqrt(-2*ln U1)*sin(2*pi*U2).
  • The two sets of samples formed by X1 and X2 are then two sets of normal samples with mean = 0, std = 1.
  • By the standard theorem, if X is a normal sample with mean = 0 and std = 1, then Y = m + s*X is a normal sample with mean = m and std = s. Use this theorem to transform X1, X2 into the required normal samples.
• Following the formulas derived in class, write a program that runs the EM algorithm. Its input is a set of samples and the assumed number of clusters; its output is the samples and each sample's probability of belonging to each cluster (that is, arrays x[i] and z[i,j] as in the textbook). In each E/M cycle, the program must also print the hypothesized mean of each cluster and the cycle number. In this program, the standard deviation of every cluster's normal distribution is assumed to be 1.0. The program terminates with convergence when the largest absolute difference between the new and previous hypothesized means is less than 0.001, and terminates with failure if it has not converged after more than 100 EM cycles.
• Once both programs are written, you can begin the experiments. Experiment 1: generate two sets of normal samples with std = 1.0 and means 4 and 50, 120 samples per set (240 values in total). Does your EM program converge correctly when told there are two clusters?
• Experiment 2: generate 3 sets of normal samples with std = 1.0 and means 4, 50, 80, 80 samples per set. What does your EM program output when it assumes only two clusters?
• Experiment 3: generate 2 sets of normal samples with std = 1.0 and means 4, 50, 120 samples per set. What does your EM program output when it assumes three clusters?
• Experiment 4: generate 2 sets of normal samples with std = 1.0 and means 9 and 50, 120 samples per set, and cluster them with your EM program. Repeat this, increasing the first set's mean from 9 in steps of 5 up to 39. What does your EM program output when it assumes only two clusters?
• Write up your results into a report. The report must include (1) for every experiment, the number of EM cycles and the resulting hypothesized means; (2) a one-page summary of the results; (3) some insight you obtained while working on this homework.
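A minimal sketch of the sample generator under the spec above (the function name and the usage values at the end are my own):

```python
import math
import random

def normal_pairs(n, mean1, std1, mean2, std2):
    """Generate two sets of n normal samples via the Box-Muller transform."""
    set1, set2 = [], []
    for _ in range(n):
        u1 = 1.0 - random.random()              # in (0, 1], so log(u1) is safe
        u2 = random.random()                    # in [0, 1)
        r = math.sqrt(-2.0 * math.log(u1))
        x1 = r * math.cos(2.0 * math.pi * u2)   # N(0, 1)
        x2 = r * math.sin(2.0 * math.pi * u2)   # N(0, 1), independent of x1
        set1.append(mean1 + std1 * x1)          # Y = m + s*X gives N(m, s)
        set2.append(mean2 + std2 * x2)
    return set1, set2

# Experiment 1 setup: std = 1.0, means 4 and 50, 120 samples per set.
a, b = normal_pairs(120, 4.0, 1.0, 50.0, 1.0)
sample = a + b                                  # 240 unlabeled values for EM
```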
EM Algorithm
• With the EM algorithm, we can take a set of unlabeled data drawn from several populations, find each group's mean (μ) and standard deviation (σ), and identify which population each record came from.
Example:
• Suppose aliens from planets A, B, C, and D look identical, so their origin cannot be determined from appearance alone. However, we know that aliens from different planets have different hemoglobin levels. We hope that by drawing blood from the aliens we can determine which planet each one came from, and also learn the mean and standard deviation of the hemoglobin level on each planet.
Data:
• Use a random number generator to produce 1000 sample records from N(25,2), N(40,3), N(55,2), N(80,2), with sampling proportions 20%, 30%, 30%, 20%.
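One way to produce such a sample in Python, assuming the slide's proportions (the variable names are my own):

```python
import random

# Mixture components (mean, std) and their sampling proportions.
components = [(25, 2), (40, 3), (55, 2), (80, 2)]
weights = [0.20, 0.30, 0.30, 0.20]

data = [random.gauss(*random.choices(components, weights=weights)[0])
        for _ in range(1000)]
```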
Results predicted with Matlab
Original data distributions: N(25.0,2.0), N(40.0,3.0), N(55.0,2.0), N(80.0,2.0)
Learned results: N(24.9495,1.9683), N(39.8179,2.8696), N(54.9388,1.7878), N(79.8573,2.1)