Incomplete Graphical Models Nan Hu
Outline • Motivation • K-means clustering • Coordinate descent algorithm • Density estimation • EM on unconditional mixture • Regression and classification • EM on conditional mixture • A general formulation of the EM algorithm
K-means clustering Problem: given a set of observations x_1, …, x_N, how do we group them into K clusters, assuming the value of K is given? • First phase: holding the cluster means fixed, assign each point to the cluster with the nearest mean μ_k • Second phase: holding the assignments fixed, recompute each mean μ_k as the average of the points assigned to cluster k
K-means clustering [Figure: an example data set — original set, first iteration, second iteration, third iteration]
K-means clustering • Coordinate descent algorithm • The algorithm minimizes the distortion measure J = ∑_n ∑_k r_nk ‖x_n − μ_k‖², where r_nk indicates the cluster assignments; the mean update follows from setting the partial derivatives ∂J/∂μ_k to zero (a code sketch follows below)
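To make the two phases concrete, here is a minimal NumPy sketch of the coordinate descent; the function and variable names (kmeans, means, assign) are my own and not from the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, rng=None):
    """Minimal K-means: alternate the assignment and mean-update phases."""
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    # Initialize the means with K randomly chosen data points.
    means = X[rng.choice(N, size=K, replace=False)]
    assign = np.zeros(N, dtype=int)
    for _ in range(n_iters):
        # First phase: assign each point to the nearest mean (minimizes J over r_nk).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Second phase: recompute each mean as the average of its points (dJ/dmu_k = 0).
        new_means = np.array([
            X[assign == k].mean(axis=0) if np.any(assign == k) else means[k]
            for k in range(K)
        ])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, assign
```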
Unconditional Mixture Problem: if the given sample data exhibit a multimodal density, how do we estimate the true density? Fitting a single density to a bimodal case: although the algorithm converges, the result bears little relationship to the truth.
Unconditional Mixture • A “divide-and-conquer” way to solve this problem • Introduce a latent variable Z, a multinomial node taking on one of K values, with p(z^k = 1) = π_k [graphical model: Z → X] • Assign a density model to each subpopulation; the overall density is p(x | θ) = ∑_k π_k p(x | z^k = 1, θ_k)
Unconditional Mixture • Gaussian Mixture Models • In this model, the mixture components are Gaussian distributions with parameters μ_k, Σ_k • Probability model for a Gaussian mixture: p(x | θ) = ∑_k π_k N(x | μ_k, Σ_k)
Unconditional Mixture • Posterior probability of the latent variable Z: τ_nk ≡ p(z_n^k = 1 | x_n, θ) = π_k N(x_n | μ_k, Σ_k) / ∑_j π_j N(x_n | μ_j, Σ_j) • Log likelihood: ℓ(θ; D) = ∑_n log ∑_k π_k N(x_n | μ_k, Σ_k)
Unconditional Mixture • Partial derivative of ℓ with respect to π_k, using a Lagrange multiplier for the constraint ∑_k π_k = 1 • Solving it, we have π_k = (1/N) ∑_n τ_nk
Unconditional Mixture • Partial derivative of ℓ with respect to μ_k • Setting it to zero, we have μ_k = ∑_n τ_nk x_n / ∑_n τ_nk
Unconditional Mixture • Partial derivative of ℓ with respect to Σ_k • Setting it to zero, we have Σ_k = ∑_n τ_nk (x_n − μ_k)(x_n − μ_k)ᵀ / ∑_n τ_nk
Unconditional Mixture • The EM Algorithm • First phase (E step): compute the posterior probabilities τ_nk from the current parameters • Second phase (M step): update π_k, μ_k, Σ_k using the formulas above
Unconditional Mixture • EM algorithm from the expected complete log likelihood point of view Suppose we observed the latent variables z_n; the data set then becomes completely observed, and the likelihood is defined via the complete log likelihood ℓ_c(θ; D) = ∑_n ∑_k z_n^k log [π_k N(x_n | μ_k, Σ_k)]
Unconditional Mixture We treat the z_n^k as random variables and take expectations conditioned on X and θ. Note that the z_n^k are binary r.v., where E[z_n^k | x_n, θ] = p(z_n^k = 1 | x_n, θ) = τ_nk. Using this as the “best guess” for z_n^k, we have the expected complete log likelihood ⟨ℓ_c(θ; D)⟩ = ∑_n ∑_k τ_nk log [π_k N(x_n | μ_k, Σ_k)]
Unconditional Mixture • Maximizing the expected complete log likelihood by setting the derivatives to zero, we recover the same updates for π_k, μ_k, and Σ_k as above (a code sketch of the resulting EM iteration follows below)
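As an illustration of the E and M steps, here is a minimal NumPy/SciPy sketch of EM for a Gaussian mixture; the function name gmm_em, the use of scipy.stats.multivariate_normal, and the small ridge added to the covariances are my own choices, not material from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iters=100, rng=None):
    """EM for a Gaussian mixture: E step (posteriors tau), M step (pi, mu, Sigma)."""
    rng = np.random.default_rng(rng)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E step: tau[n, k] is the posterior probability that x_n came from component k.
        tau = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
            for k in range(K)
        ])
        tau /= tau.sum(axis=1, keepdims=True)
        # M step: re-estimate mixing proportions, means, and covariances.
        Nk = tau.sum(axis=0)
        pi = Nk / N
        mu = (tau.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (tau[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, Sigma, tau
```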
Conditional Mixture • Graphical model for regression and classification: X is a parent of both the latent variable Z and the output Y, and Z is a parent of Y • Latent variable Z: a multinomial node taking on one of K values • The relationship between X and Z can be modeled in a discriminative classification way, e.g. with a softmax function
Conditional Mixture • By marginalizing over Z, p(y | x, θ) = ∑_k p(z^k = 1 | x, ξ) p(y | z^k = 1, x, θ_k) • X is taken to be always observed. The posterior probability of the latent variable is defined as τ_nk ≡ p(z_n^k = 1 | x_n, y_n, θ) = p(z_n^k = 1 | x_n, ξ) p(y_n | z_n^k = 1, x_n, θ_k) / ∑_j p(z_n^j = 1 | x_n, ξ) p(y_n | z_n^j = 1, x_n, θ_j)
Conditional Mixture • Some specific choices of mixture components • Gaussian components (regression): p(y | z^k = 1, x, θ_k) = N(y | β_kᵀx, σ_k²) • Logistic components (classification): p(y | z^k = 1, x, θ_k) = μ_k(x)^y (1 − μ_k(x))^(1−y), where μ_k(x) = φ(θ_kᵀx) and φ is the logistic function φ(z) = 1 / (1 + e^(−z))
Conditional Mixture • Parameter estimation via EM Complete log likelihood: ℓ_c(θ; D) = ∑_n ∑_k z_n^k log [p(z_n^k = 1 | x_n, ξ) p(y_n | z_n^k = 1, x_n, θ_k)] Using the expectation as the “best guess” for z_n^k, we have E[z_n^k | x_n, y_n, θ] = τ_nk
Conditional Mixture • The expected complete log likelihood can then be written as ⟨ℓ_c(θ; D)⟩ = ∑_n ∑_k τ_nk log [p(z_n^k = 1 | x_n, ξ) p(y_n | z_n^k = 1, x_n, θ_k)] • Taking partial derivatives and setting them to zero gives the update formulas for EM
Conditional Mixture Summary of the EM algorithm for the conditional mixture • (E step): calculate the posterior probabilities τ_nk • (M step): use the IRLS algorithm to update the gating parameters ξ, based on the data pairs (x_n, τ_n) • (M step): use the weighted IRLS algorithm to update the component parameters θ_k, based on the data points (x_n, y_n), with weights τ_nk (a sketch of the E step in code follows below)
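For concreteness, here is a sketch of the E step for a conditional mixture with a softmax gate and linear-Gaussian experts; the M step (IRLS for the gate, weighted IRLS for the experts, as summarized above) is omitted, and the names e_step, V, beta, and sigma2 are my own assumptions rather than notation from the slides.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax gate p(z^k = 1 | x)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def e_step(X, y, V, beta, sigma2):
    """Posterior responsibilities tau[n, k] for a mixture of linear-Gaussian experts.

    V:      (D, K) gating parameters, gate = softmax(X @ V)
    beta:   (D, K) expert regression weights, expert mean = X @ beta
    sigma2: (K,)   expert noise variances
    """
    gate = softmax(X @ V)                        # p(z^k = 1 | x_n, V)
    mean = X @ beta                              # (N, K) expert predictions
    lik = np.exp(-0.5 * (y[:, None] - mean) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    tau = gate * lik                             # proportional to the joint
    return tau / tau.sum(axis=1, keepdims=True)  # normalize to get the posterior
```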
General Formulation • X - all observable variables • Z - all latent variables • θ - all parameters Suppose Z is observed; the ML estimate is θ̂_ML = argmax_θ log p(x, z | θ), i.e. it maximizes the complete log likelihood ℓ_c(θ; x, z) = log p(x, z | θ). However, Z is in fact not observed, so we only have the incomplete log likelihood ℓ(θ; x) = log p(x | θ) = log ∑_z p(x, z | θ)
General Formulation • Suppose p(x, z | θ) factors in some way; the complete log likelihood then decomposes into terms that are easy to maximize • Since z is unknown, it is not clear how to solve this ML estimation directly. However, we can average over the randomness in z using an averaging distribution q(z | x)
General Formulation • Using q(z | x) as an estimate of the unobserved z, the complete log likelihood becomes the expected complete log likelihood ⟨ℓ_c(θ; x, z)⟩_q = ∑_z q(z | x) log p(x, z | θ) • This expected complete log likelihood is solvable, and, hopefully, maximizing it will also improve the incomplete log likelihood in some way (the basic idea behind EM)
General Formulation • EM maximizes the incomplete log likelihood through a lower bound: ℓ(θ; x) = log ∑_z p(x, z | θ) = log ∑_z q(z | x) p(x, z | θ) / q(z | x) ≥ ∑_z q(z | x) log [p(x, z | θ) / q(z | x)] ≡ L(q, θ) (Jensen’s inequality) • L(q, θ) is the auxiliary function
General Formulation • Given q, maximizing L(q, θ) over θ is equal to maximizing the expected complete log likelihood, since L(q, θ) = ∑_z q(z | x) log p(x, z | θ) − ∑_z q(z | x) log q(z | x) and the second (entropy) term does not depend on θ
General Formulation • Given θ, the choice q(z | x) = p(z | x, θ) yields the maximum of L(q, θ). Note: ℓ(θ; x) is the upper bound of L(q, θ), and this choice attains it, L(p(z | x, θ), θ) = ℓ(θ; x)
General Formulation • From the above, at every step of EM we maximize L(q, θ). • However, how do we know that maximizing L(q, θ) also maximizes the incomplete log likelihood ℓ(θ; x)?
General Formulation • The difference between ℓ(θ; x) and L(q, θ) is the KL divergence D(q(z | x) ‖ p(z | x, θ)) = ∑_z q(z | x) log [q(z | x) / p(z | x, θ)], which is non-negative and uniquely minimized at q(z | x) = p(z | x, θ) (a small numeric check follows below)
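As a small numeric check of this decomposition, the snippet below uses made-up numbers for a two-component one-dimensional Gaussian mixture and a single observation: the auxiliary function never exceeds the incomplete log likelihood, and choosing q to be the exact posterior attains equality.

```python
import numpy as np
from scipy.stats import norm

x = 1.3                                  # a single (made-up) observation
pi = np.array([0.4, 0.6])                # mixing proportions
mu = np.array([-1.0, 2.0])               # component means (unit variances)

joint = pi * norm.pdf(x, loc=mu, scale=1.0)   # p(x, z^k = 1 | theta)
log_lik = np.log(joint.sum())                 # incomplete log likelihood l(theta; x)
post = joint / joint.sum()                    # exact posterior p(z | x, theta)

def lower_bound(q):
    """Auxiliary function L(q, theta) = sum_z q(z) log[p(x, z | theta) / q(z)]."""
    return np.sum(q * (np.log(joint) - np.log(q)))

print(lower_bound(np.array([0.5, 0.5])) <= log_lik)  # True: L is a lower bound
print(np.isclose(lower_bound(post), log_lik))        # True: equality at the posterior
```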
General Formulation • EM and alternating minimization • Recall that maximization of the likelihood is exactly the same as minimization of the KL divergence between the empirical distribution and the model. • Including the latent variable Z, the KL divergence becomes a “complete KL divergence” between joint distributions on (X, Z).
General Formulation • Reformulated EM algorithm • (E step): q^(t+1) = argmin_q D(q ‖ θ^(t)) • (M step): θ^(t+1) = argmin_θ D(q^(t+1) ‖ θ) • An alternating minimization algorithm on the complete KL divergence
Summary • Unconditional Mixture • Graphical model • EM algorithm • Conditional Mixture • Graphical model • EM algorithm • A general formulation of the EM algorithm • Maximizing the auxiliary function • Minimizing the “complete KL divergence”
Incomplete Graphical Models Thank You!