EM Algorithm and Mixture of Gaussians Collard Fabien - 20046056 김진식 (Kim Jinsik) - 20043152 주찬혜 (Joo Chanhye) - 20043595
Summary • Hidden Factors • EM Algorithm • Principles • Formalization • Mixture of Gaussians • Generalities • Processing • Formalization • Other Issues • Bayesian Network with hidden variables • Hidden Markov models • Bayes net structures with hidden variables 2
Hidden factors The Problem : Hidden Factors • Unobservable / latent / hidden causes • Model them as variables • Can greatly simplify the model 3
Hidden factors Simplicity details (graph 1) • [Diagram: Bayes net without the hidden node; Smoking, Diet and Exercise (2 priors each) point directly at Symptom 1, Symptom 2 and Symptom 3 (54, 162 and 486 parameters)] • 708 priors ! 4
Hidden factors Simplicity details (graph 2) • [Diagram: the same Bayes net with a hidden Heart Disease node (54 parameters) between the causes Smoking, Diet and Exercise (2 priors each) and the three symptoms (6 parameters each)] • 78 priors 5
EM Algorithm A Solution : EM Algorithm • Expectation • Maximization 6
EM Algorithm Principles : Generalities • Given : • Causes (or factors / components) • Evidence • Compute : • The probabilities in the conditional probability tables 7
EM Algorithm Principles : The two steps • Parameters : P(effects | causes), P(causes) • E Step : for each piece of evidence (E), use the current parameters to compute a probability distribution over the causes, the weighted evidence P(causes | evidence) • M Step : update the estimates of the parameters, based on the weighted evidence 8
EM Algorithm Principles : the E-Step • Perception step • For each evidence and cause • Compute the probabilities • Find the probable relationships 9
EM Algorithm Principles : the M-Step • Learning step • Recompute the probabilities • Cause event / evidence event • Sum over all evidence events • Maximize the log likelihood • Modify the model parameters 10
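To make the two steps concrete, here is a minimal, self-contained sketch of the E/M loop on a toy two-coin mixture (an illustrative assumption, not the slides' heart-disease example): the coin used in each session is the hidden cause, the head counts are the evidence, and the coin biases and mixing weights are the parameters.

```python
# A minimal, runnable sketch of the E/M loop on a toy two-coin mixture
# (an illustrative assumption, not an example taken from the slides).
import numpy as np

heads = np.array([5, 9, 8, 4, 7])   # heads observed in each session of 10 flips
flips = 10                          # flips per session; which coin was used is hidden

theta = np.array([0.6, 0.5])        # initial guesses for the two coins' head probabilities
pi = np.array([0.5, 0.5])           # initial mixing weights P(coin)

for _ in range(50):
    # E-step: P(coin | session) via Bayes' rule with the current parameters
    like = theta ** heads[:, None] * (1 - theta) ** (flips - heads[:, None])
    resp = pi * like
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the weighted (expected) counts
    pi = resp.mean(axis=0)
    theta = (resp * heads[:, None]).sum(axis=0) / (resp.sum(axis=0) * flips)

print("mixing weights:", pi)
print("head probabilities:", theta)
```

Each pass alternates exactly the two slides above: the E-step turns evidence into weighted evidence, and the M-step refits the parameters to those weights.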
EM Algorithm Formulae : Notations • Terms • θ : underlying probability distribution • x : observed data • z : unobserved data • h : current hypothesis of θ • h' : revised hypothesis • q : a hidden variable distribution • Task : estimate θ from x • E-step : q(z) ← p(z | x, h) • M-step : h' ← argmax_h A(q, h) 11
EM Algorithm Formulae : the Log Likelihood • L(h) measures how well the parameters h fit the data x with the given hidden variables z : L(h) = ln p(x | h) = ln Σ_z p(x, z | h) • Jensen's inequality, for any distribution of hidden states q(z) : ln Σ_z q(z) [ p(x, z | h) / q(z) ] ≥ Σ_z q(z) ln [ p(x, z | h) / q(z) ] • This defines the auxiliary function A(q, h) = Σ_z q(z) ln [ p(x, z | h) / q(z) ] • Lower bound on the log likelihood • What we want to optimize 12
EM Algorithm Formulae : the E-step • Lower bound on the log likelihood : L(h) ≥ A(q, h) = Σ_z q(z) ln p(x, z | h) + H(q) • H(q) = - Σ_z q(z) ln q(z), the entropy of q(z) • Optimize A(q, h) with respect to q • By distributing the data over the hidden variables : q(z) = p(z | x, h) 13
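A quick numeric check of this bound, on an assumed toy model where z takes three values and the joint p(x, z | h) is given directly as a small table: the auxiliary function never exceeds the log likelihood, and the E-step choice q(z) = p(z | x, h) makes the bound tight.

```python
# Numeric check of the lower bound (assumption: a toy model where z has 3
# states and the joint p(x, z | h) is given directly as a small table).
import numpy as np

p_xz = np.array([0.10, 0.25, 0.05])       # p(x, z | h) for the observed x
L = np.log(p_xz.sum())                    # L(h) = ln p(x | h)

def A(q):                                 # auxiliary function A(q, h)
    return np.sum(q * np.log(p_xz / q))

q_any = np.array([0.2, 0.3, 0.5])         # an arbitrary distribution over z
q_post = p_xz / p_xz.sum()                # the E-step choice q(z) = p(z | x, h)

print(A(q_any) <= L)                      # True: A(q, h) is a lower bound on L(h)
print(np.isclose(A(q_post), L))           # True: the bound is tight at the posterior
```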
EM Algorithm Formulae : the M-step • Maximise A(q, h) with respect to h : h' = argmax_h A(q, h) • By choosing the optimal parameters • Equivalent to optimizing the likelihood 14
EM Algorithm Formulae : Convergence (1/2) • EM increases the log likelihood of the data at every iteration • Kullback-Leibler (KL) divergence : KL(q || p) = Σ_z q(z) ln [ q(z) / p(z | x, h) ] • Non-negative • Equals 0 iff q(z) = p(z | x, h) 15
EM Algorithm Formulae : Convergence (2/2) • Since L(h) = A(q, h) + KL(q || p(z | x, h)), the likelihood increases at each iteration • Usually, EM converges to a local optimum of L 16
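The convergence argument rests on that decomposition. The short check below, on the same kind of assumed three-state toy model as before, verifies that the KL term is non-negative and that the decomposition holds exactly.

```python
# Numeric check of L(h) = A(q, h) + KL(q || p(z | x, h)) on an assumed
# three-state toy model.
import numpy as np

p_xz = np.array([0.10, 0.25, 0.05])            # joint p(x, z | h)
posterior = p_xz / p_xz.sum()                  # p(z | x, h)
L = np.log(p_xz.sum())                         # log likelihood L(h)

q = np.array([0.2, 0.3, 0.5])                  # any distribution over the hidden states
A = np.sum(q * np.log(p_xz / q))               # auxiliary lower bound A(q, h)
KL = np.sum(q * np.log(q / posterior))         # Kullback-Leibler divergence

print(KL >= 0)                                 # the KL term is non-negative
print(np.isclose(A + KL, L))                   # the decomposition holds exactly
```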
EM Algorithm Problem of the likelihood • The likelihood can be a high-dimensional integral • Latent variables add extra dimensions • The likelihood term can be complicated 17
Mixture of Gaussians The Issue : Mixture of Gaussians • Unsupervised clustering • Set of data points (evidence) • Data generated from a mixture distribution • Continuous data : mixture of Gaussians • Not easy to handle : • The number of parameters grows with the square of the dimension 18
Mixture of Gaussians Gaussian Mixture model (2/2) • Distribution • Likelihood of a Gaussian distribution : N(x | μ, Σ) = (2π)^{-d/2} |Σ|^{-1/2} exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) ) • Likelihood given a GMM : p(x) = Σ_{i=1..N} w_i N(x | μ_i, Σ_i) • N : number of Gaussians • w_i : the weight of Gaussian i • All weights positive • Total weight = 1 19
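As an illustration of the mixture likelihood, a small sketch assuming 1-D data and hand-picked weights, means and variances; it simply evaluates p(x) = Σ_i w_i N(x | μ_i, σ_i²) at one point.

```python
# Evaluating the mixture likelihood (assumption: 1-D data and hand-picked,
# purely illustrative values for w_i, mu_i and sigma_i^2).
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of x under a 1-D Gaussian N(mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

weights = np.array([0.3, 0.7])      # w_i >= 0, summing to 1
means = np.array([-1.0, 2.0])
variances = np.array([0.5, 1.5])

x = 0.8
p_x = np.sum(weights * gaussian_pdf(x, means, variances))   # p(x) = sum_i w_i N(x | mu_i, sigma_i^2)
print(p_x)
```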
Mixture of Gaussians EM for Gaussian Mixture Model • What for ? • Find the parameters : • Weights : w_i = P(C = i) • Means : μ_i • Covariances : Σ_i • How ? • Guess the prior distribution • Guess the components (classes, or causes) • Guess the distribution function 20
Mixture of Gaussians Processing : EM Initialization • Initialization : • Assign random values to the parameters 21
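One possible way to do that random initialization, under the assumption of 1-D data and K components: uniform weights, means picked at random from the data, and the overall sample variance for every component.

```python
# One possible random initialization (assumption: 1-D data, K components).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=[-2.0, 3.0], scale=[0.8, 1.2], size=(200, 2)).ravel()  # toy 1-D sample
K = 2

weights = np.full(K, 1.0 / K)                      # uniform mixing weights
means = rng.choice(data, size=K, replace=False)    # random data points as initial means
variances = np.full(K, data.var())                 # overall sample variance for every component
print(weights, means, variances)
```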
Mixture of Gaussians Processing : the E-Step (1/2) • Expectation : • Pretend to know the parameters • Assign each data point to a component 22
Mixture of Gaussians Processing : the E-Step (2/2) • Competition of hypotheses • Compute the expected values p_ij of the hidden indicator variables • Each hypothesis gives membership weights to the data points • Normalization • Weight = relative likelihood of class membership 23
Mixture of Gaussians Processing : the M-Step (1/2) • Maximization : • Fit each component's parameters to its set of points 24
Mixture of Gaussians Processing : the M-Step (2/2) • For each hypothesis • Find the new values of the parameters that maximize the log likelihood • Based on • The weight of the points in the class • The location of the points • Hypotheses are pulled toward the data 25
Mixture of Gaussians Applied formulae : the E-Step • Find the responsible Gaussian for every data point • Use Bayes' rule : p_ij = P(C = i | x_j) = w_i N(x_j | μ_i, Σ_i) / Σ_k w_k N(x_j | μ_k, Σ_k) 26
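A sketch of this E-step formula, assuming 1-D data, two components and hand-picked parameters: the responsibilities p_ij are the normalized products w_i N(x_j | μ_i, σ_i²).

```python
# E-step sketch (assumption: 1-D data, two components, hand-picked parameters).
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

x = np.array([-2.1, -1.7, 0.2, 2.9, 3.4])        # data points x_j
weights = np.array([0.5, 0.5])                   # w_i = P(C = i)
means = np.array([-2.0, 3.0])
variances = np.array([1.0, 1.0])

num = weights * gaussian_pdf(x[:, None], means, variances)  # w_i * N(x_j | mu_i, sigma_i^2)
resp = num / num.sum(axis=1, keepdims=True)                 # normalize over components: p_ij
print(resp)
```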
Mixture of Gaussians Applied formulae : the M-Step • Maximize A : for each parameter of h, search for the value that maximizes it • Results : μ_i* = Σ_j p_ij x_j / Σ_j p_ij ; Σ_i* (σ_i²* in 1-D) = Σ_j p_ij (x_j - μ_i*)(x_j - μ_i*)^T / Σ_j p_ij ; w_i* = Σ_j p_ij / M, with M the number of data points 27
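And a matching sketch of these M-step updates, assuming 1-D data and that `resp` holds the membership weights p_ij produced by the E-step (filled in below with made-up values).

```python
# M-step sketch (assumption: 1-D data; `resp` holds the E-step weights p_ij,
# here filled in with plausible made-up values, shape (points, components)).
import numpy as np

x = np.array([-2.1, -1.7, 0.2, 2.9, 3.4])
resp = np.array([[0.99, 0.01],
                 [0.98, 0.02],
                 [0.60, 0.40],
                 [0.02, 0.98],
                 [0.01, 0.99]])

n_i = resp.sum(axis=0)                            # effective number of points per component
weights = n_i / len(x)                            # w_i* = (1/M) sum_j p_ij
means = (resp * x[:, None]).sum(axis=0) / n_i     # mu_i* = sum_j p_ij x_j / sum_j p_ij
variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / n_i   # sigma_i^2*
print(weights, means, variances)
```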
Mixture of Gaussians Possible problems • A Gaussian component shrinks : variance → 0, likelihood → ∞ • Gaussian components merge : same parameter values, sharing the same data points • A solution : reasonable prior values 28
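One common guard against the shrinking-component problem is to keep each variance away from zero, for example with a variance floor (the value 1e-3 below is an arbitrary assumption); a prior over the parameters has a similar regularizing effect.

```python
# A simple variance floor (the value 1e-3 is an arbitrary assumption);
# a prior over the parameters would regularize in a similar way.
import numpy as np

variances = np.array([2.3e-7, 1.4])          # the first component has collapsed onto one point
VAR_FLOOR = 1e-3
variances = np.maximum(variances, VAR_FLOOR) # keep every variance away from zero
print(variances)
```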
Other Issues Bayesian Networks 29
Other Issues Hidden Markov models • Forward-Backward Algorithm • Smoothing rather than filtering 30
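For reference, a minimal forward-backward smoothing sketch for a discrete HMM with two hidden states and hand-picked transition and emission matrices (all of these numbers are illustrative assumptions); the backward pass is what turns the filtered estimates P(z_t | obs_1..t) into smoothed ones P(z_t | obs_1..T).

```python
# Minimal forward-backward smoothing for a discrete HMM (all numbers are
# illustrative assumptions: 2 hidden states, 2 observation symbols).
import numpy as np

T = np.array([[0.7, 0.3],        # transition matrix P(z_t | z_{t-1})
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],        # emission matrix P(obs | z), columns index the symbols
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])
obs = [0, 0, 1, 0, 1]            # observed symbol indices

# Forward pass: alpha_t(z) proportional to P(z_t = z | obs_1..t)  (filtering)
alphas = []
alpha = prior * E[:, obs[0]]
alpha /= alpha.sum()
alphas.append(alpha)
for o in obs[1:]:
    alpha = (alpha @ T) * E[:, o]
    alpha /= alpha.sum()
    alphas.append(alpha)

# Backward pass: beta_t(z) proportional to P(obs_{t+1}..T | z_t = z)
beta = np.ones(2)
smoothed = [None] * len(obs)
for t in range(len(obs) - 1, -1, -1):
    gamma = alphas[t] * beta
    smoothed[t] = gamma / gamma.sum()             # P(z_t | obs_1..T): smoothing
    beta = T @ (E[:, obs[t]] * beta)
print(np.round(smoothed, 3))
```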
Other Issues Bayes nets with hidden variables • Pretend that the data is complete • Or invent a new hidden variable • It has no label or predefined meaning 31
Conclusion • Widely applicable • Diagnosis • Classification • Distribution discovery • Does not work well for complex models • High dimension • Structural EM 32