EM Algorithm and Mixture of Gaussians Collard Fabien - 20046056 김진식 (Kim Jinsik) - 20043152 주찬혜 (Joo Chanhye) - 20043595
Summary • Hidden Factors • EM Algorithm • Principles • Formalization • Mixture of Gaussians • Generalities • Processing • Formalization • Other Issues • Bayesian Network with hidden variables • Hidden Markov models • Bayes net structures with hidden variables 2
Hidden factors The Problem : Hidden Factors • Unobservable / latent / hidden causes • Model them as variables • Can greatly simplify the model 3
Hidden factors Simplicity details (graph 1) • [Diagram: Bayes net without the hidden node; Smoking, Diet and Exercise (2 priors each) point directly at Symptom 1, Symptom 2 and Symptom 3 (54, 162 and 486 parameters)] • 708 priors ! 4
Hidden factors Simplicity details (graph 2) • [Diagram: the same Bayes net with a hidden Heart Disease node (54 parameters) between the causes Smoking, Diet and Exercise (2 priors each) and the three symptoms (6 parameters each)] • 78 priors 5
EM Algorithm A Solution : EM Algorithm • Expectation • Maximization 6
EM Algorithm Principles : Generalities • Given : • Causes (or factors / components) • Evidence • Compute : • The probabilities in the conditional probability tables 7
EM Algorithm Principles : The two steps • Parameters : P(effects | causes), P(causes) • E Step : for each piece of evidence (E), use the current parameters to compute a probability distribution over the causes, the weighted evidence P(causes | evidence) • M Step : update the estimates of the parameters, based on the weighted evidence 8
EM Algorithm Principles : the E-Step • Perception step • For each evidence and cause • Compute the probabilities • Find the probable relationships 9
EM Algorithm Principles : the M-Step • Learning step • Recompute the probabilities • Cause event / evidence event • Sum over all evidence events • Maximize the log likelihood • Modify the model parameters 10
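To make the two steps concrete, here is a minimal, self-contained sketch of the E/M loop on a toy two-coin mixture (an illustrative assumption, not the slides' heart-disease example): the coin used in each session is the hidden cause, the head counts are the evidence, and the coin biases and mixing weights are the parameters.

```python
# A minimal, runnable sketch of the E/M loop on a toy two-coin mixture
# (an illustrative assumption, not an example taken from the slides).
import numpy as np

heads = np.array([5, 9, 8, 4, 7])   # heads observed in each session of 10 flips
flips = 10                          # flips per session; which coin was used is hidden

theta = np.array([0.6, 0.5])        # initial guesses for the two coins' head probabilities
pi = np.array([0.5, 0.5])           # initial mixing weights P(coin)

for _ in range(50):
    # E-step: P(coin | session) via Bayes' rule with the current parameters
    like = theta ** heads[:, None] * (1 - theta) ** (flips - heads[:, None])
    resp = pi * like
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the weighted (expected) counts
    pi = resp.mean(axis=0)
    theta = (resp * heads[:, None]).sum(axis=0) / (resp.sum(axis=0) * flips)

print("mixing weights:", pi)
print("head probabilities:", theta)
```

Each pass alternates exactly the two slides above: the E-step turns evidence into weighted evidence, and the M-step refits the parameters to those weights.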
EM Algorithm Formulae : Notations • Terms • θ : underlying probability distribution • x : observed data • z : unobserved data • h : current hypothesis of θ • h' : revised hypothesis • q : a hidden variable distribution • Task : estimate θ from x • E-step : q(z) ← p(z | x, h) • M-step : h' ← argmax_h A(q, h) 11
EM Algorithm Formulae : the Log Likelihood • L(h) measures how well the parameters h fit the data x with the given hidden variables z : L(h) = ln p(x | h) = ln Σ_z p(x, z | h) • Jensen's inequality, for any distribution of hidden states q(z) : ln Σ_z q(z) [ p(x, z | h) / q(z) ] ≥ Σ_z q(z) ln [ p(x, z | h) / q(z) ] • This defines the auxiliary function A(q, h) = Σ_z q(z) ln [ p(x, z | h) / q(z) ] • Lower bound on the log likelihood • What we want to optimize 12
EM Algorithm Formulae : the E-step • Lower bound on the log likelihood : L(h) ≥ A(q, h) = Σ_z q(z) ln p(x, z | h) + H(q) • H(q) = - Σ_z q(z) ln q(z), the entropy of q(z) • Optimize A(q, h) with respect to q • By distributing the data over the hidden variables : q(z) = p(z | x, h) 13
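A quick numeric check of this bound, on an assumed toy model where z takes three values and the joint p(x, z | h) is given directly as a small table: the auxiliary function never exceeds the log likelihood, and the E-step choice q(z) = p(z | x, h) makes the bound tight.

```python
# Numeric check of the lower bound (assumption: a toy model where z has 3
# states and the joint p(x, z | h) is given directly as a small table).
import numpy as np

p_xz = np.array([0.10, 0.25, 0.05])       # p(x, z | h) for the observed x
L = np.log(p_xz.sum())                    # L(h) = ln p(x | h)

def A(q):                                 # auxiliary function A(q, h)
    return np.sum(q * np.log(p_xz / q))

q_any = np.array([0.2, 0.3, 0.5])         # an arbitrary distribution over z
q_post = p_xz / p_xz.sum()                # the E-step choice q(z) = p(z | x, h)

print(A(q_any) <= L)                      # True: A(q, h) is a lower bound on L(h)
print(np.isclose(A(q_post), L))           # True: the bound is tight at the posterior
```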
EM Algorithm Formulae : the M-step • Maximise A(q, h) with respect to h : h' = argmax_h A(q, h) • By choosing the optimal parameters • Equivalent to optimizing the likelihood 14
EM Algorithm Formulae : Convergence (1/2) • EM increases the log likelihood of the data at every iteration • Kullback-Leibler (KL) divergence : KL(q || p) = Σ_z q(z) ln [ q(z) / p(z | x, h) ] • Non-negative • Equals 0 iff q(z) = p(z | x, h) 15
EM Algorithm Formulae : Convergence (2/2) • Since L(h) = A(q, h) + KL(q || p(z | x, h)), the likelihood increases at each iteration • Usually, EM converges to a local optimum of L 16
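The convergence argument rests on that decomposition. The short check below, on the same kind of assumed three-state toy model as before, verifies that the KL term is non-negative and that the decomposition holds exactly.

```python
# Numeric check of L(h) = A(q, h) + KL(q || p(z | x, h)) on an assumed
# three-state toy model.
import numpy as np

p_xz = np.array([0.10, 0.25, 0.05])            # joint p(x, z | h)
posterior = p_xz / p_xz.sum()                  # p(z | x, h)
L = np.log(p_xz.sum())                         # log likelihood L(h)

q = np.array([0.2, 0.3, 0.5])                  # any distribution over the hidden states
A = np.sum(q * np.log(p_xz / q))               # auxiliary lower bound A(q, h)
KL = np.sum(q * np.log(q / posterior))         # Kullback-Leibler divergence

print(KL >= 0)                                 # the KL term is non-negative
print(np.isclose(A + KL, L))                   # the decomposition holds exactly
```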
EM Algorithm Problem of the likelihood • The likelihood can be a high-dimensional integral • Latent variables add extra dimensions • The likelihood term can be complicated 17
Mixture of Gaussians The Issue : Mixture of Gaussians • Unsupervised clustering • Set of data points (evidence) • Data generated from a mixture distribution • Continuous data : mixture of Gaussians • Not easy to handle : • The number of parameters grows with the square of the dimension 18
Mixture of Gaussians Gaussian Mixture model (2/2) • Distribution • Likelihood of a Gaussian distribution : N(x | μ, Σ) = (2π)^{-d/2} |Σ|^{-1/2} exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) ) • Likelihood given a GMM : p(x) = Σ_{i=1..N} w_i N(x | μ_i, Σ_i) • N : number of Gaussians • w_i : the weight of Gaussian i • All weights positive • Total weight = 1 19
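As an illustration of the mixture likelihood, a small sketch assuming 1-D data and hand-picked weights, means and variances; it simply evaluates p(x) = Σ_i w_i N(x | μ_i, σ_i²) at one point.

```python
# Evaluating the mixture likelihood (assumption: 1-D data and hand-picked,
# purely illustrative values for w_i, mu_i and sigma_i^2).
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of x under a 1-D Gaussian N(mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

weights = np.array([0.3, 0.7])      # w_i >= 0, summing to 1
means = np.array([-1.0, 2.0])
variances = np.array([0.5, 1.5])

x = 0.8
p_x = np.sum(weights * gaussian_pdf(x, means, variances))   # p(x) = sum_i w_i N(x | mu_i, sigma_i^2)
print(p_x)
```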
Mixture of Gaussians EM for Gaussian Mixture Model • What for ? • Find the parameters : • Weights : w_i = P(C = i) • Means : μ_i • Covariances : Σ_i • How ? • Guess the prior distribution • Guess the components (classes, or causes) • Guess the distribution function 20
Mixture of Gaussians Processing : EM Initialization • Initialization : • Assign random values to the parameters 21
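One possible way to do that random initialization, under the assumption of 1-D data and K components: uniform weights, means picked at random from the data, and the overall sample variance for every component.

```python
# One possible random initialization (assumption: 1-D data, K components).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=[-2.0, 3.0], scale=[0.8, 1.2], size=(200, 2)).ravel()  # toy 1-D sample
K = 2

weights = np.full(K, 1.0 / K)                      # uniform mixing weights
means = rng.choice(data, size=K, replace=False)    # random data points as initial means
variances = np.full(K, data.var())                 # overall sample variance for every component
print(weights, means, variances)
```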
Mixture of Gaussians Processing : the E-Step (1/2) • Expectation : • Pretend to know the parameters • Assign each data point to a component 22
Mixture of Gaussians Processing : the E-Step (2/2) • Competition of hypotheses • Compute the expected values p_ij of the hidden indicator variables • Each hypothesis gives membership weights to the data points • Normalization • Weight = relative likelihood of class membership 23
Mixture of Gaussians Processing : the M-Step (1/2) • Maximization : • Fit each component's parameters to its set of points 24
Mixture of Gaussians Processing : the M-Step (2/2) • For each hypothesis • Find the new values of the parameters that maximize the log likelihood • Based on • The weight of the points in the class • The location of the points • Hypotheses are pulled toward the data 25
Mixture of Gaussians Applied formulae : the E-Step • Find the responsible Gaussian for every data point • Use Bayes' rule : p_ij = P(C = i | x_j) = w_i N(x_j | μ_i, Σ_i) / Σ_k w_k N(x_j | μ_k, Σ_k) 26
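A sketch of this E-step formula, assuming 1-D data, two components and hand-picked parameters: the responsibilities p_ij are the normalized products w_i N(x_j | μ_i, σ_i²).

```python
# E-step sketch (assumption: 1-D data, two components, hand-picked parameters).
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

x = np.array([-2.1, -1.7, 0.2, 2.9, 3.4])        # data points x_j
weights = np.array([0.5, 0.5])                   # w_i = P(C = i)
means = np.array([-2.0, 3.0])
variances = np.array([1.0, 1.0])

num = weights * gaussian_pdf(x[:, None], means, variances)  # w_i * N(x_j | mu_i, sigma_i^2)
resp = num / num.sum(axis=1, keepdims=True)                 # normalize over components: p_ij
print(resp)
```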
Mixture of Gaussians Applied formulae : the M-Step • Maximize A : for each parameter of h, search for the value that maximizes it • Results : μ_i* = Σ_j p_ij x_j / Σ_j p_ij ; Σ_i* (σ_i²* in 1-D) = Σ_j p_ij (x_j - μ_i*)(x_j - μ_i*)^T / Σ_j p_ij ; w_i* = Σ_j p_ij / M, with M the number of data points 27
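And a matching sketch of these M-step updates, assuming 1-D data and that `resp` holds the membership weights p_ij produced by the E-step (filled in below with made-up values).

```python
# M-step sketch (assumption: 1-D data; `resp` holds the E-step weights p_ij,
# here filled in with plausible made-up values, shape (points, components)).
import numpy as np

x = np.array([-2.1, -1.7, 0.2, 2.9, 3.4])
resp = np.array([[0.99, 0.01],
                 [0.98, 0.02],
                 [0.60, 0.40],
                 [0.02, 0.98],
                 [0.01, 0.99]])

n_i = resp.sum(axis=0)                            # effective number of points per component
weights = n_i / len(x)                            # w_i* = (1/M) sum_j p_ij
means = (resp * x[:, None]).sum(axis=0) / n_i     # mu_i* = sum_j p_ij x_j / sum_j p_ij
variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / n_i   # sigma_i^2*
print(weights, means, variances)
```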
Mixture of Gaussians Possible problems • A Gaussian component shrinks : variance → 0, likelihood → ∞ • Gaussian components merge : same parameter values, sharing the same data points • A solution : reasonable prior values 28
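One common guard against the shrinking-component problem is to keep each variance away from zero, for example with a variance floor (the value 1e-3 below is an arbitrary assumption); a prior over the parameters has a similar regularizing effect.

```python
# A simple variance floor (the value 1e-3 is an arbitrary assumption);
# a prior over the parameters would regularize in a similar way.
import numpy as np

variances = np.array([2.3e-7, 1.4])          # the first component has collapsed onto one point
VAR_FLOOR = 1e-3
variances = np.maximum(variances, VAR_FLOOR) # keep every variance away from zero
print(variances)
```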
Other Issues Bayesian Networks 29
Other Issues Hidden Markov models • Forward-Backward Algorithm • Smoothing rather than filtering 30
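For reference, a minimal forward-backward smoothing sketch for a discrete HMM with two hidden states and hand-picked transition and emission matrices (all of these numbers are illustrative assumptions); the backward pass is what turns the filtered estimates P(z_t | obs_1..t) into smoothed ones P(z_t | obs_1..T).

```python
# Minimal forward-backward smoothing for a discrete HMM (all numbers are
# illustrative assumptions: 2 hidden states, 2 observation symbols).
import numpy as np

T = np.array([[0.7, 0.3],        # transition matrix P(z_t | z_{t-1})
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],        # emission matrix P(obs | z), columns index the symbols
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])
obs = [0, 0, 1, 0, 1]            # observed symbol indices

# Forward pass: alpha_t(z) proportional to P(z_t = z | obs_1..t)  (filtering)
alphas = []
alpha = prior * E[:, obs[0]]
alpha /= alpha.sum()
alphas.append(alpha)
for o in obs[1:]:
    alpha = (alpha @ T) * E[:, o]
    alpha /= alpha.sum()
    alphas.append(alpha)

# Backward pass: beta_t(z) proportional to P(obs_{t+1}..T | z_t = z)
beta = np.ones(2)
smoothed = [None] * len(obs)
for t in range(len(obs) - 1, -1, -1):
    gamma = alphas[t] * beta
    smoothed[t] = gamma / gamma.sum()             # P(z_t | obs_1..T): smoothing
    beta = T @ (E[:, obs[t]] * beta)
print(np.round(smoothed, 3))
```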
Other Issues Bayes nets with hidden variables • Pretend that the data is complete • Or invent a new hidden variable • It has no label or predefined meaning 31
Conclusion • Widely applicable • Diagnosis • Classification • Distribution discovery • Does not work well for complex models • High dimension • Structural EM 32