Incomplete Graphical Models Nan Hu
Outline • Motivation • K-means clustering • Coordinate descent algorithm • Density estimation • EM on unconditional mixture • Regression and classification • EM on conditional mixture • A general formulation of the EM algorithm
K-means clustering Problem: given a set of observations x_1, …, x_N, how do we group them into K clusters, assuming the value of K is given? • First phase: holding the cluster means fixed, assign each point to the cluster with the nearest mean μ_k • Second phase: holding the assignments fixed, recompute each mean μ_k as the average of the points assigned to cluster k
K-means clustering [Figure: an example data set — original set, first iteration, second iteration, third iteration]
K-means clustering • Coordinate descent algorithm • The algorithm minimizes the distortion measure J = ∑_n ∑_k r_nk ‖x_n − μ_k‖², where r_nk indicates the cluster assignments; the mean update follows from setting the partial derivatives ∂J/∂μ_k to zero (a code sketch follows below)
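To make the two phases concrete, here is a minimal NumPy sketch of the coordinate descent; the function and variable names (kmeans, means, assign) are my own and not from the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, rng=None):
    """Minimal K-means: alternate the assignment and mean-update phases."""
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    # Initialize the means with K randomly chosen data points.
    means = X[rng.choice(N, size=K, replace=False)]
    assign = np.zeros(N, dtype=int)
    for _ in range(n_iters):
        # First phase: assign each point to the nearest mean (minimizes J over r_nk).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Second phase: recompute each mean as the average of its points (dJ/dmu_k = 0).
        new_means = np.array([
            X[assign == k].mean(axis=0) if np.any(assign == k) else means[k]
            for k in range(K)
        ])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, assign
```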
Unconditional Mixture Problem: if the given sample data exhibit a multimodal density, how do we estimate the true density? Fitting a single density to a bimodal case: although the algorithm converges, the result bears little relationship to the truth.
Unconditional Mixture • A “divide-and-conquer” way to solve this problem • Introduce a latent variable Z, a multinomial node taking on one of K values, with p(z^k = 1) = π_k [graphical model: Z → X] • Assign a density model to each subpopulation; the overall density is p(x | θ) = ∑_k π_k p(x | z^k = 1, θ_k)
Unconditional Mixture • Gaussian Mixture Models • In this model, the mixture components are Gaussian distributions with parameters μ_k, Σ_k • Probability model for a Gaussian mixture: p(x | θ) = ∑_k π_k N(x | μ_k, Σ_k)
Unconditional Mixture • Posterior probability of the latent variable Z: τ_nk ≡ p(z_n^k = 1 | x_n, θ) = π_k N(x_n | μ_k, Σ_k) / ∑_j π_j N(x_n | μ_j, Σ_j) • Log likelihood: ℓ(θ; D) = ∑_n log ∑_k π_k N(x_n | μ_k, Σ_k)
Unconditional Mixture • Partial derivative of ℓ with respect to π_k, using a Lagrange multiplier for the constraint ∑_k π_k = 1 • Solving it, we have π_k = (1/N) ∑_n τ_nk
Unconditional Mixture • Partial derivative of ℓ with respect to μ_k • Setting it to zero, we have μ_k = ∑_n τ_nk x_n / ∑_n τ_nk
Unconditional Mixture • Partial derivative of ℓ with respect to Σ_k • Setting it to zero, we have Σ_k = ∑_n τ_nk (x_n − μ_k)(x_n − μ_k)ᵀ / ∑_n τ_nk
Unconditional Mixture • The EM Algorithm • First phase (E step): compute the posterior probabilities τ_nk from the current parameters • Second phase (M step): update π_k, μ_k, Σ_k using the formulas above
Unconditional Mixture • EM algorithm from the expected complete log likelihood point of view Suppose we observed the latent variables z_n; the data set then becomes completely observed, and the likelihood is defined via the complete log likelihood ℓ_c(θ; D) = ∑_n ∑_k z_n^k log [π_k N(x_n | μ_k, Σ_k)]
Unconditional Mixture We treat the z_n^k as random variables and take expectations conditioned on X and θ. Note that the z_n^k are binary r.v., where E[z_n^k | x_n, θ] = p(z_n^k = 1 | x_n, θ) = τ_nk. Using this as the “best guess” for z_n^k, we have the expected complete log likelihood ⟨ℓ_c(θ; D)⟩ = ∑_n ∑_k τ_nk log [π_k N(x_n | μ_k, Σ_k)]
Unconditional Mixture • Maximizing the expected complete log likelihood by setting the derivatives to zero, we recover the same updates for π_k, μ_k, and Σ_k as above (a code sketch of the resulting EM iteration follows below)
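As an illustration of the E and M steps, here is a minimal NumPy/SciPy sketch of EM for a Gaussian mixture; the function name gmm_em, the use of scipy.stats.multivariate_normal, and the small ridge added to the covariances are my own choices, not material from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iters=100, rng=None):
    """EM for a Gaussian mixture: E step (posteriors tau), M step (pi, mu, Sigma)."""
    rng = np.random.default_rng(rng)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E step: tau[n, k] is the posterior probability that x_n came from component k.
        tau = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
            for k in range(K)
        ])
        tau /= tau.sum(axis=1, keepdims=True)
        # M step: re-estimate mixing proportions, means, and covariances.
        Nk = tau.sum(axis=0)
        pi = Nk / N
        mu = (tau.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (tau[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, Sigma, tau
```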
Conditional Mixture • Graphical model for regression and classification: X is a parent of both the latent variable Z and the output Y, and Z is a parent of Y • Latent variable Z: a multinomial node taking on one of K values • The relationship between X and Z can be modeled in a discriminative classification way, e.g. with a softmax function
Conditional Mixture • By marginalizing over Z, p(y | x, θ) = ∑_k p(z^k = 1 | x, ξ) p(y | z^k = 1, x, θ_k) • X is taken to be always observed. The posterior probability of the latent variable is defined as τ_nk ≡ p(z_n^k = 1 | x_n, y_n, θ) = p(z_n^k = 1 | x_n, ξ) p(y_n | z_n^k = 1, x_n, θ_k) / ∑_j p(z_n^j = 1 | x_n, ξ) p(y_n | z_n^j = 1, x_n, θ_j)
Conditional Mixture • Some specific choices of mixture components • Gaussian components (regression): p(y | z^k = 1, x, θ_k) = N(y | β_kᵀx, σ_k²) • Logistic components (classification): p(y | z^k = 1, x, θ_k) = μ_k(x)^y (1 − μ_k(x))^(1−y), where μ_k(x) = φ(θ_kᵀx) and φ is the logistic function φ(z) = 1 / (1 + e^(−z))
Conditional Mixture • Parameter estimation via EM Complete log likelihood: ℓ_c(θ; D) = ∑_n ∑_k z_n^k log [p(z_n^k = 1 | x_n, ξ) p(y_n | z_n^k = 1, x_n, θ_k)] Using the expectation as the “best guess” for z_n^k, we have E[z_n^k | x_n, y_n, θ] = τ_nk
Conditional Mixture • The expected complete log likelihood can then be written as ⟨ℓ_c(θ; D)⟩ = ∑_n ∑_k τ_nk log [p(z_n^k = 1 | x_n, ξ) p(y_n | z_n^k = 1, x_n, θ_k)] • Taking partial derivatives and setting them to zero gives the update formulas for EM
Conditional Mixture Summary of the EM algorithm for the conditional mixture • (E step): calculate the posterior probabilities τ_nk • (M step): use the IRLS algorithm to update the gating parameters ξ, based on the data pairs (x_n, τ_n) • (M step): use the weighted IRLS algorithm to update the component parameters θ_k, based on the data points (x_n, y_n), with weights τ_nk (a sketch of the E step in code follows below)
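For concreteness, here is a sketch of the E step for a conditional mixture with a softmax gate and linear-Gaussian experts; the M step (IRLS for the gate, weighted IRLS for the experts, as summarized above) is omitted, and the names e_step, V, beta, and sigma2 are my own assumptions rather than notation from the slides.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax gate p(z^k = 1 | x)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def e_step(X, y, V, beta, sigma2):
    """Posterior responsibilities tau[n, k] for a mixture of linear-Gaussian experts.

    V:      (D, K) gating parameters, gate = softmax(X @ V)
    beta:   (D, K) expert regression weights, expert mean = X @ beta
    sigma2: (K,)   expert noise variances
    """
    gate = softmax(X @ V)                        # p(z^k = 1 | x_n, V)
    mean = X @ beta                              # (N, K) expert predictions
    lik = np.exp(-0.5 * (y[:, None] - mean) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    tau = gate * lik                             # proportional to the joint
    return tau / tau.sum(axis=1, keepdims=True)  # normalize to get the posterior
```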
General Formulation • X - all observable variables • Z - all latent variables • θ - all parameters Suppose Z is observed; the ML estimate is θ̂_ML = argmax_θ log p(x, z | θ), i.e. it maximizes the complete log likelihood ℓ_c(θ; x, z) = log p(x, z | θ). However, Z is in fact not observed, so we only have the incomplete log likelihood ℓ(θ; x) = log p(x | θ) = log ∑_z p(x, z | θ)
General Formulation • Suppose p(x, z | θ) factors in some way; the complete log likelihood then decomposes into terms that are easy to maximize • Since z is unknown, it is not clear how to solve this ML estimation directly. However, we can average over the randomness in z using an averaging distribution q(z | x)
General Formulation • Using q(z | x) as an estimate of the unobserved z, the complete log likelihood becomes the expected complete log likelihood ⟨ℓ_c(θ; x, z)⟩_q = ∑_z q(z | x) log p(x, z | θ) • This expected complete log likelihood is solvable, and, hopefully, maximizing it will also improve the incomplete log likelihood in some way (the basic idea behind EM)
General Formulation • EM maximizes the incomplete log likelihood through a lower bound: ℓ(θ; x) = log ∑_z p(x, z | θ) = log ∑_z q(z | x) p(x, z | θ) / q(z | x) ≥ ∑_z q(z | x) log [p(x, z | θ) / q(z | x)] ≡ L(q, θ) (Jensen’s inequality) • L(q, θ) is the auxiliary function
General Formulation • Given q, maximizing L(q, θ) over θ is equal to maximizing the expected complete log likelihood, since L(q, θ) = ∑_z q(z | x) log p(x, z | θ) − ∑_z q(z | x) log q(z | x) and the second (entropy) term does not depend on θ
General Formulation • Given θ, the choice q(z | x) = p(z | x, θ) yields the maximum of L(q, θ). Note: ℓ(θ; x) is the upper bound of L(q, θ), and this choice attains it, L(p(z | x, θ), θ) = ℓ(θ; x)
General Formulation • From the above, at every step of EM we maximize L(q, θ). • However, how do we know that maximizing L(q, θ) also maximizes the incomplete log likelihood ℓ(θ; x)?
General Formulation • The difference between ℓ(θ; x) and L(q, θ) is the KL divergence D(q(z | x) ‖ p(z | x, θ)) = ∑_z q(z | x) log [q(z | x) / p(z | x, θ)], which is non-negative and uniquely minimized at q(z | x) = p(z | x, θ) (a small numeric check follows below)
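As a small numeric check of this decomposition, the snippet below uses made-up numbers for a two-component one-dimensional Gaussian mixture and a single observation: the auxiliary function never exceeds the incomplete log likelihood, and choosing q to be the exact posterior attains equality.

```python
import numpy as np
from scipy.stats import norm

x = 1.3                                  # a single (made-up) observation
pi = np.array([0.4, 0.6])                # mixing proportions
mu = np.array([-1.0, 2.0])               # component means (unit variances)

joint = pi * norm.pdf(x, loc=mu, scale=1.0)   # p(x, z^k = 1 | theta)
log_lik = np.log(joint.sum())                 # incomplete log likelihood l(theta; x)
post = joint / joint.sum()                    # exact posterior p(z | x, theta)

def lower_bound(q):
    """Auxiliary function L(q, theta) = sum_z q(z) log[p(x, z | theta) / q(z)]."""
    return np.sum(q * (np.log(joint) - np.log(q)))

print(lower_bound(np.array([0.5, 0.5])) <= log_lik)  # True: L is a lower bound
print(np.isclose(lower_bound(post), log_lik))        # True: equality at the posterior
```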
General Formulation • EM and alternating minimization • Recall that maximization of the likelihood is exactly the same as minimization of the KL divergence between the empirical distribution and the model. • Including the latent variable Z, the KL divergence becomes a “complete KL divergence” between joint distributions on (X, Z).
General Formulation • Reformulated EM algorithm • (E step): q^(t+1) = argmin_q D(q ‖ θ^(t)) • (M step): θ^(t+1) = argmin_θ D(q^(t+1) ‖ θ) • An alternating minimization algorithm on the complete KL divergence
Summary • Unconditional Mixture • Graphical model • EM algorithm • Conditional Mixture • Graphical model • EM algorithm • A general formulation of the EM algorithm • Maximizing the auxiliary function • Minimizing the “complete KL divergence”
Incomplete Graphical Models Thank You!