Introduction to Graphical Models

Introduction to Graphical Models Brookes Vision Lab Reading Group

Graphical Models • Build a complex system from simpler parts • The system as a whole should be consistent • Parts are combined using probability theory • Undirected – Markov random fields • Directed – Bayesian networks

Overview • Representation • Inference • Linear Gaussian Models • Approximate inference • Learning

Representation • Causality: Sprinkler “causes” wet grass

Conditional Independence • Each node is independent of its ancestors given its parents • P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R) (chain rule) • = P(C) P(S|C) P(R|C) P(W|S,R) (conditional independence) • Space required for n binary nodes • $O(2^n)$ without factorization • $O(n\,2^k)$ with factorization, k = maximum fan-in
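To make the space comparison concrete, the sketch below counts free parameters for binary nodes. It assumes the sprinkler structure implied by the factorization above (C → S, C → R, S → W, R → W); the function names are illustrative.

```python
def full_joint_params(n: int) -> int:
    """Free parameters of an unrestricted joint table over n binary variables."""
    return 2 ** n - 1

def factored_params(parents: dict[str, list[str]]) -> int:
    """Free parameters of a Bayesian network over binary nodes: each node stores
    P(node = 1 | parent configuration) for every one of its 2^(#parents) configurations."""
    return sum(2 ** len(pa) for pa in parents.values())

# Sprinkler network: C -> S, C -> R, S -> W, R -> W
sprinkler = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}

print(full_joint_params(len(sprinkler)))  # 15
print(factored_params(sprinkler))         # 1 + 2 + 2 + 4 = 9
```

The gap is small for four nodes but grows quickly: 20 binary nodes need over a million entries as a full table, while a network with fan-in k needs at most $n\,2^k$.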

Inference • Pr(S=1|W=1) = Pr(S=1,W=1)/Pr(W=1) = 0.2781/0.6471 = 0.430 • Pr(R=1|W=1) = Pr(R=1,W=1)/Pr(W=1) = 0.4581/0.6471 = 0.708

Explaining Away • S and R “compete” to explain W=1 • Given W=1, S and R are conditionally dependent • Learning that R=1 lowers the probability of S=1 from 0.430 to Pr(S=1|R=1,W=1) = 0.1945
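The numbers on the two previous slides can be reproduced by brute-force enumeration of the joint. The slides do not list the conditional probability tables, so the values below are an assumption: they are the CPTs commonly used with this sprinkler example, and they reproduce the quoted probabilities.

```python
from itertools import product

# Assumed CPTs (not given on the slides).
P_C = {1: 0.5, 0: 0.5}                       # P(C = c)
P_S_given_C = {1: 0.1, 0: 0.5}               # P(S = 1 | C = c)
P_R_given_C = {1: 0.8, 0: 0.2}               # P(R = 1 | C = c)
P_W_given_SR = {(0, 0): 0.0, (1, 0): 0.9,    # P(W = 1 | S = s, R = r)
                (0, 1): 0.9, (1, 1): 0.99}

def bern(p_one: float, value: int) -> float:
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (P_C[c] * bern(P_S_given_C[c], s) * bern(P_R_given_C[c], r)
            * bern(P_W_given_SR[(s, r)], w))

def prob(**evidence):
    """Sum the joint over every assignment consistent with the evidence."""
    total = 0.0
    for c, s, r, w in product((0, 1), repeat=4):
        assignment = {"C": c, "S": s, "R": r, "W": w}
        if all(assignment[name] == value for name, value in evidence.items()):
            total += joint(c, s, r, w)
    return total

p_w = prob(W=1)
print(round(p_w, 4))                                    # 0.6471
print(round(prob(S=1, W=1) / p_w, 3))                   # 0.43
print(round(prob(R=1, W=1) / p_w, 3))                   # 0.708
print(round(prob(S=1, R=1, W=1) / prob(R=1, W=1), 4))   # 0.1945
```

Enumeration costs $O(2^n)$ time, which is exactly what variable elimination is designed to avoid.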

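Inference • $P(W=w) = \sum_c \sum_s \sum_r P(C=c)\,P(S=s \mid C=c)\,P(R=r \mid C=c)\,P(W=w \mid S=s, R=r)$ • $= \sum_c P(C=c) \sum_s P(S=s \mid C=c) \sum_r P(R=r \mid C=c)\,P(W=w \mid S=s, R=r)$ • $= \sum_c P(C=c) \sum_s P(S=s \mid C=c)\,T_1(c,w,s)$, where $T_1(c,w,s) = \sum_r P(R=r \mid C=c)\,P(W=w \mid S=s, R=r)$ • $= \sum_c P(C=c)\,T_2(c,w)$, where $T_2(c,w) = \sum_s P(S=s \mid C=c)\,T_1(c,w,s)$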

Inference • Variable elimination • Choosing an optimal elimination ordering is NP-hard • Greedy heuristics work well in practice • Computing several marginals • Dynamic programming avoids redundant computation • Sound familiar?
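Pushing the sums inwards can be checked numerically. This sketch uses random, hypothetical CPTs on the sprinkler structure (so no particular values are implied) and compares brute-force summation of the full joint with eliminating R, then S, then C; `numpy.einsum` performs each sum of products.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random CPT whose last axis is the child variable (normalized to sum to 1)."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

# Hypothetical CPTs, indexed [parents..., child].
pC = random_cpt(2)          # P(C)        indexed [c]
pS = random_cpt(2, 2)       # P(S | C)    indexed [c, s]
pR = random_cpt(2, 2)       # P(R | C)    indexed [c, r]
pW = random_cpt(2, 2, 2)    # P(W | S,R)  indexed [s, r, w]

# Brute force: build the full 2x2x2x2 joint, then sum out C, S and R.
joint = np.einsum("c,cs,cr,srw->csrw", pC, pS, pR, pW)
brute = joint.sum(axis=(0, 1, 2))

# Variable elimination with ordering R, S, C.
T1 = np.einsum("cr,srw->cws", pR, pW)   # T1(c,w,s) = sum_r P(r|c) P(w|s,r)
T2 = np.einsum("cs,cws->cw", pS, T1)    # T2(c,w)   = sum_s P(s|c) T1(c,w,s)
ve = np.einsum("c,cw->w", pC, T2)       # P(w)      = sum_c P(c) T2(c,w)

print(np.allclose(brute, ve))           # True
```

The intermediate factors never involve more than three variables, whereas the brute-force joint grows as $2^n$; the ordering decides how large these intermediate tables become.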

Bayes Balls for Conditional Independence
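Bayes Ball decides conditional independence by checking whether a “ball” can travel from X to Y given the observed set Z. The sketch below uses a standard reachability formulation of d-separation rather than Shachter's original bouncing rules; the parents-dictionary representation and the function names are illustrative assumptions.

```python
from collections import deque

def ancestors_of(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents[n])
    return result

def d_separated(x, y, z, parents):
    """True if x and y are d-separated (conditionally independent) given the set z."""
    children = {n: [] for n in parents}
    for n, pa in parents.items():
        for p in pa:
            children[p].append(n)
    z = set(z)
    anc_z = ancestors_of(z, parents)
    # Items are (node, direction): "up" = arrived from a child, "down" = arrived from a parent.
    queue, visited, reachable = deque([(x, "up")]), set(), set()
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in z:
            reachable.add(node)
        if direction == "up" and node not in z:
            # Unobserved node reached from below: the ball passes to parents and children.
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in z:
                # Chain through an unobserved node: keep travelling down.
                queue.extend((c, "down") for c in children[node])
            if node in anc_z:
                # Collider that is observed (or has an observed descendant):
                # the ball bounces back up to the other parents.
                queue.extend((p, "up") for p in parents[node])
    return y not in reachable

sprinkler = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}
print(d_separated("S", "R", {"C"}, sprinkler))       # True
print(d_separated("S", "R", {"C", "W"}, sprinkler))  # False (explaining away)
print(d_separated("S", "R", set(), sprinkler))       # False (common cause C)
```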

A Unifying (Re)View • Linear Gaussian Model (LGM): basic model • Continuous-State LGM: FA, SPCA, PCA, LDS • Discrete-State LGM: Mixture of Gaussians, VQ, HMM

Basic Model • State of the system is a k-vector x (unobserved) • Output of the system is a p-vector y (observed) • Often k << p • Basic model • $x_{t+1} = A x_t + w$ • $y_t = C x_t + v$ • A is the k × k transition matrix • C is the p × k observation matrix • $w \sim \mathcal{N}(0, Q)$ • $v \sim \mathcal{N}(0, R)$ • Noise processes are essential • Zero mean w.l.o.g.
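A minimal simulation of the basic model may help fix the shapes. The parameter values (a slowly decaying rotation for A, random C, small isotropic noise) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def simulate_lds(A, C, Q, R, mu1, Q1, T, rng):
    """Sample x_1..x_T and y_1..y_T from x_{t+1} = A x_t + w, y_t = C x_t + v."""
    k, p = A.shape[0], C.shape[0]
    X = np.zeros((T, k))
    Y = np.zeros((T, p))
    X[0] = rng.multivariate_normal(mu1, Q1)                # x_1 ~ N(mu1, Q1)
    for t in range(T):
        if t > 0:
            X[t] = A @ X[t - 1] + rng.multivariate_normal(np.zeros(k), Q)
        Y[t] = C @ X[t] + rng.multivariate_normal(np.zeros(p), R)
    return X, Y

# Illustrative parameters: k = 2 hidden dimensions, p = 5 observed (k << p).
rng = np.random.default_rng(0)
theta = 0.1
A = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])    # slowly decaying rotation
C = rng.standard_normal((5, 2))
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(5)
X, Y = simulate_lds(A, C, Q, R, np.zeros(2), np.eye(2), T=100, rng=rng)
print(X.shape, Y.shape)  # (100, 2) (100, 5)
```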

Degeneracy in Basic Model • Any structure in Q can be absorbed into A and C • So w.l.o.g. Q = I • R cannot be restricted in this way, because the $y_t$ are observed • Components of x can be reordered arbitrarily • A canonical ordering is fixed by the norms of the columns of C • $x_1 \sim \mathcal{N}(\mu_1, Q_1)$ • A and C are assumed to have rank k • Q, R, $Q_1$ are assumed to be full rank
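Why Q = I costs nothing: whitening the state with $Q = L L^\top$ (Cholesky) and $x' = L^{-1} x$ gives new state noise with identity covariance, while the structure of Q moves into $A' = L^{-1} A L$ and $C' = C L$. The sketch below, with randomly chosen illustrative parameters, checks that the marginal covariance of every $y_t$ is unchanged.

```python
import numpy as np

def output_covariances(A, C, Q, R, Q1, T):
    """Marginal covariances Cov(y_t), t = 1..T, using P_{t+1} = A P_t A' + Q."""
    P, covs = Q1, []
    for _ in range(T):
        covs.append(C @ P @ C.T + R)
        P = A @ P @ A.T + Q
    return covs

rng = np.random.default_rng(1)
k, p = 2, 4
A = 0.5 * rng.standard_normal((k, k))
C = rng.standard_normal((p, k))
M = rng.standard_normal((k, k))
Q = M @ M.T + np.eye(k)               # arbitrary full-rank state noise
R = np.eye(p)
Q1 = np.eye(k)

# Whiten the state: x' = L^{-1} x with Q = L L'.
L = np.linalg.cholesky(Q)
Linv = np.linalg.inv(L)
A2, C2 = Linv @ A @ L, C @ L          # structure of Q absorbed into A and C
Q2 = Linv @ Q @ Linv.T                # new state noise covariance: the identity
Q1_2 = Linv @ Q1 @ Linv.T             # initial-state covariance transforms too

same = all(np.allclose(c1, c2) for c1, c2 in zip(
    output_covariances(A, C, Q, R, Q1, 10),
    output_covariances(A2, C2, Q2, R, Q1_2, 10)))
print(np.allclose(Q2, np.eye(k)), same)  # True True
```

No such transformation is available for R, because y is observed and cannot be reparameterized.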

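Probability Computation • $P(x_{t+1} \mid x_t) = \mathcal{N}(A x_t, Q;\, x_{t+1})$ • $P(y_t \mid x_t) = \mathcal{N}(C x_t, R;\, y_t)$ • $P(\{x_1,\dots,x_T\},\{y_1,\dots,y_T\}) = P(x_1) \prod_{t=1}^{T-1} P(x_{t+1} \mid x_t) \prod_{t=1}^{T} P(y_t \mid x_t)$ • Negative log probability: $-\log P(\{x\},\{y\}) = \sum_{t=1}^{T} \left( \tfrac12 [y_t - C x_t]^\top R^{-1} [y_t - C x_t] + \tfrac12 \log|R| \right) + \sum_{t=1}^{T-1} \left( \tfrac12 [x_{t+1} - A x_t]^\top Q^{-1} [x_{t+1} - A x_t] + \tfrac12 \log|Q| \right) + \tfrac12 [x_1 - \mu_1]^\top Q_1^{-1} [x_1 - \mu_1] + \tfrac12 \log|Q_1| + \tfrac{T(p+k)}{2} \log 2\pi$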
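The negative log of this product is a sum of Gaussian quadratic forms and log-determinants, one per factor. A direct numpy sketch (function names are hypothetical), with a one-dimensional sanity check:

```python
import numpy as np

def quad_logdet(r, S):
    """Return r' S^{-1} r and log|S| for a residual r and covariance S."""
    _, logdet = np.linalg.slogdet(S)
    return r @ np.linalg.solve(S, r), logdet

def neg_log_joint(X, Y, A, C, Q, R, mu1, Q1):
    """-log P({x_1..x_T}, {y_1..y_T}) for the linear Gaussian state-space model."""
    T, k = X.shape
    p = Y.shape[1]
    total = 0.0
    for t in range(T):                        # observation terms, t = 1..T
        q, ld = quad_logdet(Y[t] - C @ X[t], R)
        total += 0.5 * q + 0.5 * ld
    for t in range(T - 1):                    # transition terms, t = 1..T-1
        q, ld = quad_logdet(X[t + 1] - A @ X[t], Q)
        total += 0.5 * q + 0.5 * ld
    q, ld = quad_logdet(X[0] - mu1, Q1)       # initial-state term
    total += 0.5 * q + 0.5 * ld
    return total + 0.5 * T * (p + k) * np.log(2 * np.pi)

# 1-D check: T = 1, unit covariances, all residuals zero -> only the constant remains.
one = np.eye(1)
val = neg_log_joint(np.zeros((1, 1)), np.zeros((1, 1)), one, one, one, one, np.zeros(1), one)
print(np.isclose(val, np.log(2 * np.pi)))  # True
```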

Inference • Given model parameters $\{A, C, Q, R, \mu_1, Q_1\}$ • Given observations y • What can be inferred about the hidden states x? • Total likelihood • Filtering: P(x(t) | {y(1), ..., y(t)}) • Smoothing: P(x(t) | {y(1), ..., y(T)}) • Partial smoothing: P(x(t) | {y(1), ..., y(t+t')}) • Partial prediction: P(x(t) | {y(1), ..., y(t−t')}) • These posteriors also appear as intermediate values of the recursive methods for computing the total likelihood

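Learning • Unknown parameters $\theta = \{A, C, Q, R, \mu_1, Q_1\}$ • Given observations y • Maximize the log-likelihood $\mathcal{L}(\theta) = \log P(Y \mid \theta)$ through the lower bound $F(Q, \theta) = \sum_X Q(X) \log P(X, Y \mid \theta) - \sum_X Q(X) \log Q(X) \le \mathcal{L}(\theta)$ (negative free energy) • Here Q is any distribution over the hidden states X, not the state-noise covariance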

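EM algorithm • Alternate between maximizing $F(Q, \theta)$ w.r.t. Q (E-step) and w.r.t. θ (M-step) • The E-step sets $Q(X) = P(X \mid Y, \theta)$, so $F = \mathcal{L}$ at the beginning of the M-step • The M-step increases F, which is a lower bound on $\mathcal{L}$ • The E-step does not change θ • Therefore, the likelihood never decreases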
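The monotonicity argument rests on one identity. Writing $\mathcal{L}(\theta) = \log P(Y \mid \theta)$, with Q a distribution over the hidden states X (not the noise covariance) and superscript $(i)$ for the iteration:

```latex
\begin{aligned}
\log P(Y \mid \theta)
  &= \underbrace{\sum_X Q(X) \log \frac{P(X, Y \mid \theta)}{Q(X)}}_{F(Q,\,\theta)}
   + \underbrace{\sum_X Q(X) \log \frac{Q(X)}{P(X \mid Y, \theta)}}_{\mathrm{KL}(Q \,\|\, P(X \mid Y, \theta)) \;\ge\; 0} \\
\text{E-step: } Q^{(i+1)} &= P(X \mid Y, \theta^{(i)})
  \;\Rightarrow\; F(Q^{(i+1)}, \theta^{(i)}) = \mathcal{L}(\theta^{(i)}) \\
\text{M-step: } \theta^{(i+1)} &= \arg\max_\theta F(Q^{(i+1)}, \theta) \\
\mathcal{L}(\theta^{(i+1)}) &\ge F(Q^{(i+1)}, \theta^{(i+1)}) \ge F(Q^{(i+1)}, \theta^{(i)}) = \mathcal{L}(\theta^{(i)})
\end{aligned}
```

The first inequality in the last line is the KL term being non-negative; the second is the M-step.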

Continuous-State LGM • Static data modelling (no temporal dependence): factor analysis, SPCA, PCA • Time-series modelling (time ordering of data crucial): LDS (Kalman filter models)

Static Data Modelling • A = 0 • $x = w \sim \mathcal{N}(0, Q)$ • y = C x + v • $y \sim \mathcal{N}(0, CQC^\top + R)$ • Degeneracy in model • Learning: EM • R must be restricted • Inference
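With A = 0 the model is static, and y is marginally Gaussian with covariance $CQC^\top + R$. The Monte Carlo sketch below checks this; the dimensions and the diagonal R are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, n = 2, 4, 200_000
C = rng.standard_normal((p, k))
Q = np.eye(k)                                     # Q = I w.l.o.g.
R = np.diag(rng.uniform(0.1, 0.5, size=p))        # an illustrative restricted R

x = rng.multivariate_normal(np.zeros(k), Q, size=n)   # x = w ~ N(0, Q)
v = rng.multivariate_normal(np.zeros(p), R, size=n)   # v ~ N(0, R)
y = x @ C.T + v                                       # y = C x + v

empirical = np.cov(y, rowvar=False)
model = C @ Q @ C.T + R
print(np.max(np.abs(empirical - model)))   # small: sampling error only
```

Only $CQC^\top + R$ is identifiable from the data, which is one way to see the degeneracy: any C with the same $CC^\top$ (for example, C times a rotation) fits equally well.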

Factor Analysis • Restrict R to be diagonal • Q = I • x – factors • C – factor loading matrix • R – uniquenesses • Learning – EM, quasi-Newton optimization • Inference
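Inference in factor analysis is closed-form: with Q = I, the posterior over factors is $x \mid y \sim \mathcal{N}(\beta y,\, I - \beta C)$ with $\beta = C^\top (C C^\top + R)^{-1}$. The sketch below (random illustrative loadings and uniquenesses) computes it and checks it against the equivalent information form, which only inverts a k × k matrix.

```python
import numpy as np

def fa_posterior(y, C, R):
    """Posterior over factors given y when Q = I: N(beta y, I - beta C),
    with beta = C' (C C' + R)^{-1}."""
    k = C.shape[1]
    beta = C.T @ np.linalg.inv(C @ C.T + R)
    return beta @ y, np.eye(k) - beta @ C

rng = np.random.default_rng(0)
p, k = 6, 2
C = rng.standard_normal((p, k))                   # factor loading matrix
R = np.diag(rng.uniform(0.2, 1.0, size=p))        # diagonal: uniquenesses
y = rng.standard_normal(p)                        # an illustrative observation

mean, cov = fa_posterior(y, C, R)

# Same posterior in information form: (I + C' R^{-1} C)^{-1} is only k x k,
# which is cheaper when k << p and R is diagonal.
Rinv = np.diag(1.0 / np.diag(R))
cov_info = np.linalg.inv(np.eye(k) + C.T @ Rinv @ C)
mean_info = cov_info @ C.T @ Rinv @ y
print(np.allclose(mean, mean_info), np.allclose(cov, cov_info))  # True True
```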

SPCA • $R = \varepsilon I$ • ε – global noise level • Columns of C span the principal subspace • Learning – EM algorithm • Inference
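The slide lists EM for learning; for this isotropic-noise model a closed-form maximum-likelihood solution is also known from probabilistic PCA (Tipping and Bishop), and the sketch below uses that instead: ε is the average of the discarded eigenvalues of the sample covariance, and C scales the leading eigenvectors. The data and names are illustrative.

```python
import numpy as np

def spca_ml(Y, k):
    """Closed-form ML estimates (C, eps) for y = C x + v, x ~ N(0, I), v ~ N(0, eps I).
    Y is n x p; returns C (p x k), eps, and the sorted eigen-decomposition."""
    S = np.cov(Y, rowvar=False)                   # p x p sample covariance
    evals, evecs = np.linalg.eigh(S)              # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]    # sort descending
    eps = evals[k:].mean()                        # average discarded variance
    C = evecs[:, :k] * np.sqrt(evals[:k] - eps)   # scale each leading eigenvector
    return C, eps, evals, evecs

rng = np.random.default_rng(0)
Y = (rng.standard_normal((500, 3)) @ rng.standard_normal((3, 8))
     + 0.3 * rng.standard_normal((500, 8)))
C, eps, evals, evecs = spca_ml(Y, k=3)

# Columns of C lie in the span of the top-k eigenvectors (the principal subspace).
Uk = evecs[:, :3]
print(np.allclose(Uk @ (Uk.T @ C), C))  # True
```

C is only determined up to a rotation of its columns; this solution picks the orthogonal-columns representative.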

PCA • $R = \lim_{\varepsilon \to 0} \varepsilon I$ • Learning • Diagonalize the sample covariance of the data • The leading k eigenvalues and eigenvectors define C • EM finds the subspace of the leading eigenvectors without diagonalization • Inference • Noise becomes infinitesimal • Posterior collapses to a single point
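In the zero-noise limit, EM alternates an E-step $X = (C^\top C)^{-1} C^\top Y$ (the collapsed posterior) with an M-step $C = Y X^\top (X X^\top)^{-1}$, and never forms or diagonalizes the covariance. A minimal sketch on illustrative synthetic data, compared with explicit eigen-decomposition:

```python
import numpy as np

def em_pca(Y, k, n_iter=200, seed=0):
    """EM for PCA in the zero-noise limit. Y is p x n and already centred.
    Returns C whose columns span the leading k-dimensional principal subspace."""
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((Y.shape[0], k))
    for _ in range(n_iter):
        X = np.linalg.solve(C.T @ C, C.T @ Y)    # E-step: x = (C'C)^{-1} C' y
        C = Y @ X.T @ np.linalg.inv(X @ X.T)     # M-step: C = Y X' (X X')^{-1}
    return C

rng = np.random.default_rng(1)
scales = np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
Y = scales[:, None] * rng.standard_normal((6, 1000))
Y -= Y.mean(axis=1, keepdims=True)

C = em_pca(Y, k=2)

# Compare with the top-2 eigenvectors of the sample covariance.
evals, evecs = np.linalg.eigh(Y @ Y.T / Y.shape[1])
U = evecs[:, -2:]
Qc, _ = np.linalg.qr(C)                          # orthonormal basis for span(C)
print(np.allclose(Qc @ Qc.T, U @ U.T, atol=1e-6))  # True: same subspace
```

EM recovers the subspace, not individual eigenvectors; orthogonalizing C and diagonalizing the small k × k projected covariance recovers them if needed.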

Linear Dynamical Systems • Inference – Kalman filter • Smoothing – RTS recursions • Learning – EM algorithm • C known – Shumway and Stoffer, 1982 • All unknown – Ghahramani and Hinton, 1995
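A compact sketch of the Kalman filter (filtering and total likelihood) followed by the RTS backward recursions (smoothing). It assumes the parameterization of the basic model; the function name and return layout are illustrative.

```python
import numpy as np

def kalman_smoother(Y, A, C, Q, R, mu1, Q1):
    """Kalman filter followed by Rauch-Tung-Striebel (RTS) smoothing.
    Returns filtered means/covs, smoothed means/covs and log P(y_1..y_T)."""
    T, k = Y.shape[0], A.shape[0]
    mf, Pf = np.zeros((T, k)), np.zeros((T, k, k))   # filtered:  P(x_t | y_1..y_t)
    mp, Pp = np.zeros((T, k)), np.zeros((T, k, k))   # predicted: P(x_t | y_1..y_{t-1})
    loglik = 0.0
    m, P = mu1, Q1
    for t in range(T):
        mp[t], Pp[t] = m, P
        S = C @ P @ C.T + R                          # innovation covariance
        r = Y[t] - C @ m                             # innovation
        K = P @ C.T @ np.linalg.inv(S)               # Kalman gain
        mf[t], Pf[t] = m + K @ r, P - K @ C @ P
        _, logdet = np.linalg.slogdet(S)
        loglik -= 0.5 * (r @ np.linalg.solve(S, r) + logdet + len(r) * np.log(2 * np.pi))
        m, P = A @ mf[t], A @ Pf[t] @ A.T + Q        # predict x_{t+1}
    ms, Ps = mf.copy(), Pf.copy()                    # smoothed: P(x_t | y_1..y_T)
    for t in range(T - 2, -1, -1):
        J = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])   # smoother gain
        ms[t] = mf[t] + J @ (ms[t + 1] - mp[t + 1])
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
    return mf, Pf, ms, Ps, loglik

# Scalar example: prior N(0, 1), unit noise, one observation y_1 = 2.
one = np.eye(1)
mf, Pf, ms, Ps, ll = kalman_smoother(np.array([[2.0]]), one, one, one, one, np.zeros(1), one)
print(mf[0], Pf[0])  # [1.] [[0.5]]
```

The filtered quantities are exactly the intermediate values the forward pass needs to accumulate the total likelihood; the backward pass reuses them without touching the data again.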

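Discrete-State LGM • $x_{t+1} = \mathrm{WTA}[A x_t + w]$ • $y_t = C x_t + v$ • $x_1 = \mathrm{WTA}[\mathcal{N}(\mu_1, Q_1)]$ • WTA[·] is the winner-take-all nonlinearity: it maps a vector to the unit vector $e_j$ for its largest component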
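The only change from the continuous model is the winner-take-all step, so the state is always one of the unit vectors $e_j$ and $C x_t$ simply selects column $C_j$. A minimal sampling sketch; the parameter values (a “sticky” A, three well-separated output means) are illustrative assumptions.

```python
import numpy as np

def wta(v):
    """Winner-take-all: unit vector e_j for the largest component of v."""
    e = np.zeros_like(v)
    e[np.argmax(v)] = 1.0
    return e

def simulate_discrete_lgm(A, C, Q, R, mu1, Q1, T, rng):
    """x_{t+1} = WTA[A x_t + w], y_t = C x_t + v, x_1 = WTA[N(mu1, Q1)]."""
    k, p = A.shape[0], C.shape[0]
    X = np.zeros((T, k))
    Y = np.zeros((T, p))
    X[0] = wta(rng.multivariate_normal(mu1, Q1))
    for t in range(T):
        if t > 0:
            X[t] = wta(A @ X[t - 1] + rng.multivariate_normal(np.zeros(k), Q))
        Y[t] = C @ X[t] + rng.multivariate_normal(np.zeros(p), R)   # C x_t = C_j
    return X, Y

rng = np.random.default_rng(0)
k, p = 3, 2
A = 2.0 * np.eye(k)                     # illustrative: favours staying in the same state
C = np.array([[0.0, 3.0, -3.0],
              [3.0, 0.0, -3.0]])        # column j = output mean for state e_j
X, Y = simulate_discrete_lgm(A, C, np.eye(k), 0.1 * np.eye(p),
                             np.zeros(k), np.eye(k), T=50, rng=rng)
```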

Discrete-State LGM • Static data modelling: mixture of Gaussians, VQ • Time-series modelling: HMM

Static Data Modelling • A = 0 • x = WTA[w] • $w \sim \mathcal{N}(\mu, Q)$ • y = C x + v • $\pi_j = P(x = e_j)$ • Nonzero µ for nonuniform $\pi_j$ • $y \mid x = e_j \sim \mathcal{N}(C_j, R)$ • $C_j$ – jth column of C
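The mixing proportions $\pi_j$ are implied by µ and Q rather than stored directly. A Monte Carlo sketch (the function name and sample size are illustrative) shows that µ = 0 with Q = I gives uniform $\pi_j$, while shifting µ favours one state:

```python
import numpy as np

def state_priors(mu, Q, n=100_000, seed=0):
    """Monte Carlo estimate of pi_j = P(x = e_j) when x = WTA[w], w ~ N(mu, Q)."""
    rng = np.random.default_rng(seed)
    w = rng.multivariate_normal(mu, Q, size=n)
    winners = np.argmax(w, axis=1)                  # index j of e_j for each sample
    return np.bincount(winners, minlength=len(mu)) / n

Q = np.eye(3)
uniform = state_priors(np.zeros(3), Q)               # roughly 1/3 each
skewed = state_priors(np.array([1.0, 0.0, 0.0]), Q)  # state e_1 favoured
print(uniform, skewed)
```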

Mixture of Gaussians • Mixing coefficients of the clusters: $\pi_j$ • Means: columns $C_j$ • Covariance: R (shared by all clusters) • Learning: EM (corresponds to ML competitive learning) • Inference
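A sketch of EM for this particular mixture: means are the columns $C_j$ and a single covariance R is shared by all clusters. The E-step computes responsibilities $P(x = e_j \mid y)$ (the inference step); the M-step re-estimates $\pi$, C and R. The synthetic two-cluster data is an illustrative assumption, and the recorded log-likelihoods let one check that EM never decreases them.

```python
import numpy as np

def log_gauss(Y, mean, R):
    """Row-wise log N(y; mean, R) for the rows of Y."""
    p = Y.shape[1]
    diff = Y - mean
    _, logdet = np.linalg.slogdet(R)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(R), diff)
    return -0.5 * (maha + logdet + p * np.log(2 * np.pi))

def em_mog(Y, k, n_iter=50, seed=0):
    """EM for y | x = e_j ~ N(C_j, R), P(x = e_j) = pi_j, with one shared covariance R."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    C = Y[rng.choice(n, size=k, replace=False)].T     # p x k: column j is mean C_j
    R = np.cov(Y, rowvar=False)
    pi = np.full(k, 1.0 / k)
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities P(x = e_j | y) for every data point.
        logp = np.stack([np.log(pi[j]) + log_gauss(Y, C[:, j], R) for j in range(k)], axis=1)
        norm = np.logaddexp.reduce(logp, axis=1)
        history.append(norm.sum())                    # log-likelihood of current parameters
        resp = np.exp(logp - norm[:, None])
        # M-step: re-estimate pi, the means C_j and the shared covariance R.
        Nj = resp.sum(axis=0)
        pi = Nj / n
        C = (Y.T @ resp) / Nj
        R = sum((resp[:, j, None] * (Y - C[:, j])).T @ (Y - C[:, j]) for j in range(k)) / n
    return pi, C, R, history

rng = np.random.default_rng(1)
Y = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
               rng.normal([5.0, 5.0], 1.0, size=(200, 2))])
pi, C, R, history = em_mog(Y, k=2)
```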

Vector Quantization • Observation noise becomes infinitesimal • Inference reduces to the 1-NN rule • Euclidean distance when R = εI (spherical) • Mahalanobis distance for a general (unscaled) R • Posterior collapses to the closest cluster • Learning with EM = batch version of k-means
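In the zero-noise limit with spherical R, the E-step becomes a hard 1-NN assignment and the M-step moves each code vector to the mean of its points, which is batch k-means. A minimal sketch; here the code vectors are stored as rows (row j plays the role of column $C_j$), and the data is illustrative.

```python
import numpy as np

def kmeans(Y, k, n_iter=100, seed=0):
    """Batch k-means: the zero-noise (R = eps I, eps -> 0) limit of EM for the mixture."""
    rng = np.random.default_rng(seed)
    C = Y[rng.choice(len(Y), size=k, replace=False)]      # k x p codebook (rows = C_j)
    for _ in range(n_iter):
        # Inference: 1-NN rule with Euclidean distance (posterior = closest cluster).
        d2 = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Learning: move each code vector to the mean of its assigned points.
        new_C = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, labels

rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(10.0, 0.5, size=(100, 2))])
C, labels = kmeans(Y, k=2)
```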

Time-series modelling

HMM • Transition matrix T • $T_{i,j} = P(x_{t+1} = e_j \mid x_t = e_i)$ • For every T there exist A and Q for which the WTA dynamics reproduce T • Filtering: forward recursions • Smoothing: forward-backward algorithm • Learning: EM (called Baum-Welch reestimation) • MAP state sequence: Viterbi algorithm
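The forward recursion and Viterbi are both dynamic programs over the trellis of states. The sketch below takes per-state likelihoods `B[t, j]` $= P(y_t \mid x_t = e_j)$ as input (for this model these would be Gaussian densities $\mathcal{N}(C_j, R; y_t)$), and checks both routines against brute-force enumeration of all paths using random, hypothetical parameters.

```python
import numpy as np
from itertools import product

def forward_loglik(pi, T, B):
    """Forward recursion; T[i, j] = P(x_{t+1} = e_j | x_t = e_i), B[t, j] = P(y_t | x_t = e_j).
    Normalizes alpha at each step to avoid underflow and returns log P(y_1..y_n)."""
    alpha = pi * B[0]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, len(B)):
        alpha = (alpha @ T) * B[t]
        s = alpha.sum()                          # P(y_t | y_1..y_{t-1})
        loglik += np.log(s)
        alpha = alpha / s
    return loglik

def viterbi(pi, T, B):
    """Most probable (MAP) state sequence by dynamic programming in log space."""
    n, k = B.shape
    delta = np.log(pi) + np.log(B[0])
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + np.log(T)      # scores[i, j]: from state i to state j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[t])
    path = [int(delta.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
k, n = 3, 5
pi = rng.dirichlet(np.ones(k))
T = rng.dirichlet(np.ones(k), size=k)            # each row sums to 1
B = rng.random((n, k))                           # hypothetical emission likelihoods

def path_prob(path):
    p = pi[path[0]] * B[0, path[0]]
    for t in range(1, n):
        p *= T[path[t - 1], path[t]] * B[t, path[t]]
    return p

probs = {path: path_prob(path) for path in product(range(k), repeat=n)}
print(np.isclose(forward_loglik(pi, T, B), np.log(sum(probs.values()))))  # True
print(tuple(viterbi(pi, T, B)) == max(probs, key=probs.get))              # True
```

Both routines cost $O(n k^2)$, against $O(k^n)$ for the enumeration used as a check.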
