Introduction to Graphical Models Brookes Vision Lab Reading Group
Graphical Models • To build a complex system using simpler parts. • System should be consistent • Parts are combined using probability • Undirected – Markov random fields • Directed – Bayesian Networks
Overview • Representation • Inference • Linear Gaussian Models • Approximate inference • Learning
Representation Causality : Sprinkler “causes” wet grass
Conditional Independence • Independent of ancestors given parents • P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R) • = P(C) P(S|C) P(R|C) P(W|S,R) • Space required for n binary nodes • O(2^n) without factorization • O(n 2^k) with factorization, k = maximum fan-in
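As a worked count for the four-node sprinkler network (C, S, R, W, all binary), the saving from factorization can be tallied directly. The per-table counts below assume one free parameter per parent configuration of each binary node; the maximum fan-in is k = 2 (W has parents S and R), so the bound n 2^k = 16 is respected.

```latex
\begin{aligned}
\text{full joint:}\quad & 2^{4} - 1 = 15 \ \text{free parameters} \\
\text{factored:}\quad & \underbrace{1}_{P(C)} + \underbrace{2}_{P(S\mid C)}
  + \underbrace{2}_{P(R\mid C)} + \underbrace{4}_{P(W\mid S,R)} = 9 \ \text{free parameters}
\end{aligned}
```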
Inference • Pr(S=1|W=1) = Pr(S=1,W=1)/Pr(W=1) = 0.2781/0.6471 = 0.430 • Pr(R=1|W=1) = Pr(R=1,W=1)/Pr(W=1) = 0.4581/0.6471 = 0.708
Explaining Away • S and R “compete” to explain W=1 • S and R are conditionally dependent • Pr(S=1|R=1,W=1) = 0.1945
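The numbers on the two slides above can be reproduced by brute-force enumeration of the joint. This is a minimal sketch: the conditional probability tables are not given on the slides, so the values below are an assumption (the commonly used textbook sprinkler tables), chosen because they yield the quoted figures. The helper names `bern`, `joint` and `prob` are hypothetical.

```python
from itertools import product

# ASSUMED conditional probability tables (not stated on the slides).
P_C1 = 0.5                                   # P(C=1)
P_S1_GIVEN_C = {0: 0.5, 1: 0.1}              # P(S=1 | C)
P_R1_GIVEN_C = {0: 0.2, 1: 0.8}              # P(R=1 | C)
P_W1_GIVEN_SR = {(0, 0): 0.0, (1, 0): 0.9,   # P(W=1 | S, R)
                 (0, 1): 0.9, (1, 1): 0.99}


def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c, s, r, w):
    """Factored joint P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (bern(P_C1, c)
            * bern(P_S1_GIVEN_C[c], s)
            * bern(P_R1_GIVEN_C[c], r)
            * bern(P_W1_GIVEN_SR[(s, r)], w))


def prob(**fixed):
    """Marginal probability of the fixed assignments, summing out the rest."""
    names = ("c", "s", "r", "w")
    total = 0.0
    for values in product((0, 1), repeat=4):
        assignment = dict(zip(names, values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(**assignment)
    return total


p_w = prob(w=1)
p_s_given_w = prob(s=1, w=1) / p_w
p_r_given_w = prob(r=1, w=1) / p_w
# Explaining away: once rain is known, the sprinkler becomes less likely.
p_s_given_rw = prob(s=1, r=1, w=1) / prob(r=1, w=1)
```

Enumeration costs O(2^n) and is only practical for tiny networks; it is shown here because it makes the conditioning arithmetic explicit.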
Inference [slide equations not recovered]
Inference • Variable elimination • Choosing optimal ordering – NP-hard • Greedy methods work well • Computing several marginals • Dynamic programming avoids redundant computation • Sound familiar?
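Variable elimination can be illustrated on the sprinkler network by pushing each sum inside the product as far as it will go. The sketch below eliminates R first, then S, then C; the intermediate term names T_1 and T_2 are labels introduced here for illustration.

```latex
\begin{aligned}
P(W=w) &= \sum_{c}\sum_{s}\sum_{r} P(c)\,P(s\mid c)\,P(r\mid c)\,P(w\mid s,r) \\
       &= \sum_{c} P(c) \sum_{s} P(s\mid c)\, T_1(c,s,w),
          \qquad T_1(c,s,w) = \sum_{r} P(r\mid c)\,P(w\mid s,r) \\
       &= \sum_{c} P(c)\, T_2(c,w),
          \qquad T_2(c,w) = \sum_{s} P(s\mid c)\, T_1(c,s,w)
\end{aligned}
```

Each intermediate term is a small table over the variables it still depends on, which is where the dependence on elimination order comes from.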
A Unifying (Re)View • Basic Model: Linear Gaussian Model (LGM) • Continuous-State LGM: FA, SPCA, PCA, LDS • Discrete-State LGM: Mixture of Gaussians, VQ, HMM
Basic Model • State of a system is a k-vector x (unobserved) • Output of a system is a p-vector y (observed) • Often k << p • Basic model • x_{t+1} = A x_t + w • y_t = C x_t + v • A is the k x k transition matrix • C is a p x k observation matrix • w ~ N(0, Q) • v ~ N(0, R) • Noise processes are essential • Zero mean w.l.o.g.
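To make the generative process concrete, here is a minimal simulation sketch of the basic model in the scalar case (k = p = 1), so that A, C, Q, R reduce to the numbers `a`, `c`, `q`, `r`. The function name, the parameter values and the initial state distribution N(mu1, q1) are illustrative assumptions, not taken from the slides.

```python
import random


def simulate_lgm(a, c, q, r, mu1, q1, T, seed=0):
    """Sample hidden states xs and observations ys from a scalar LGM.

    x_{t+1} = a * x_t + w,  w ~ N(0, q)
    y_t     = c * x_t + v,  v ~ N(0, r)
    x_1 ~ N(mu1, q1)   (assumed initial state distribution)
    """
    rng = random.Random(seed)
    x = rng.gauss(mu1, q1 ** 0.5)  # random.gauss takes a standard deviation
    xs, ys = [], []
    for _ in range(T):
        xs.append(x)
        ys.append(c * x + rng.gauss(0.0, r ** 0.5))  # observation noise v
        x = a * x + rng.gauss(0.0, q ** 0.5)         # state noise w
    return xs, ys


# Illustrative parameter values only.
xs, ys = simulate_lgm(a=0.9, c=2.0, q=1.0, r=0.5, mu1=0.0, q1=1.0, T=200)
```

Only `ys` would be available in practice; `xs` is returned so that inference results can be compared against the true hidden states.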
Degeneracy in Basic Model • Structure in Q can be moved to A and C • W.l.o.g. Q = I • R cannot be restricted as y_t are observed • Components of x can be reordered arbitrarily. • Ordering is based on norms of columns of C. • x_1 ~ N(µ_1, Q_1) • A and C are assumed to have rank k. • Q, R, Q_1 are assumed to be full rank.
Probability Computation • P(x_{t+1} | x_t) = N(A x_t, Q; x_{t+1}) • P(y_t | x_t) = N(C x_t, R; y_t) • P({x_1,..,x_T}, {y_1,..,y_T}) = P(x_1) ∏_{t=1}^{T-1} P(x_{t+1} | x_t) ∏_{t=1}^{T} P(y_t | x_t) • Negative log probability
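The negative log probability is not legible on the slide; the expression below is a reconstruction obtained by taking logarithms of the Gaussian densities in the factorization above, with x_1 ~ N(µ_1, Q_1) as the initial state distribution. Primes denote transposes.

```latex
\begin{aligned}
-2\log P(\{x_1,\dots,x_T\},\{y_1,\dots,y_T\})
 ={}& \sum_{t=1}^{T} (y_t - C x_t)'\, R^{-1}\, (y_t - C x_t) + T \log|R| \\
 &+ \sum_{t=1}^{T-1} (x_{t+1} - A x_t)'\, Q^{-1}\, (x_{t+1} - A x_t) + (T-1)\log|Q| \\
 &+ (x_1 - \mu_1)'\, Q_1^{-1}\, (x_1 - \mu_1) + \log|Q_1| \\
 &+ T\,(p + k)\,\log 2\pi
\end{aligned}
```

The result is a sum of quadratic forms, which is why inference and learning in this model reduce to linear algebra.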
Inference • Given model parameters {A, C, Q, R, µ1, Q1} • Given observations y • What can be inferred about hidden states x? • Total likelihood • Filtering: P (x(t) | {y(1), ... , y(t)}) • Smoothing: P (x(t) | {y(1), ... , y(T)}) • Partial smoothing: P (x(t) | {y(1), ... , y(t+t')}) • Partial prediction: P (x(t) | {y(1), ... , y(t-t')}) • Intermediate values of recursive methods for computing total likelihood.
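Learning • Unknown parameters θ = {A, C, Q, R, µ1, Q1} • Given observations y • Log-likelihood L(θ) = log P(y | θ) • F(Q,θ) – free energy, a lower bound on L(θ); here Q denotes a distribution over the hidden states, not the noise covariance
EM algorithm • Alternate between maximizing F(Q,θ) w.r.t. Q (E-step) and w.r.t. θ (M-step). • F = L at the beginning of the M-step • E-step does not change θ • Therefore, the likelihood does not decrease.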
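The monotonicity argument can be written out as follows. This is a sketch of the standard free-energy view of EM; the iteration index n and the symbol L for the log-likelihood are notation introduced here, and Q is a distribution over hidden states rather than the noise covariance.

```latex
\begin{aligned}
L(\theta) = \log P(y \mid \theta)
  &= \log \int P(x, y \mid \theta)\, dx
  \;\ge\; \int Q(x) \log \frac{P(x, y \mid \theta)}{Q(x)}\, dx = F(Q, \theta) \\
\text{E-step:}\quad Q_{n+1} &= \arg\max_{Q} F(Q, \theta_n) = P(x \mid y, \theta_n)
  \;\Rightarrow\; F(Q_{n+1}, \theta_n) = L(\theta_n) \\
\text{M-step:}\quad \theta_{n+1} &= \arg\max_{\theta} F(Q_{n+1}, \theta) \\
L(\theta_n) = F(Q_{n+1}, \theta_n) &\le F(Q_{n+1}, \theta_{n+1}) \le L(\theta_{n+1})
\end{aligned}
```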
Continuous-State LGM • Static Data Modeling: no temporal dependence; factor analysis, SPCA, PCA • Time-series Modeling: time ordering of data crucial; LDS (Kalman filter models)
Static Data Modelling • A = 0 • x = w • y = C x + v • x ~ N(0, Q) • y ~ N(0, C Q C' + R) • Degeneracy in model • Learning: EM • R restricted • Inference
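The inference step for the static model is ordinary Gaussian conditioning. A sketch of the derivation, using the covariances implied by y = C x + v; the symbol β for the regression matrix is notation introduced here.

```latex
\begin{aligned}
\operatorname{Cov}(y) &= C\,Q\,C' + R, \qquad \operatorname{Cov}(x, y) = Q\,C' \\
\beta &= Q\,C'\,(C\,Q\,C' + R)^{-1} \\
P(x \mid y) &= N\!\left(\beta\, y,\; Q - \beta\, C\, Q\right)
\end{aligned}
```

With Q = I this simplifies to β = C'(C C' + R)^{-1} and posterior covariance I − βC.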
Factor Analysis • Restrict R to be diagonal. • Q = I • x – factors • C – factor loading matrix • Diagonal elements of R – uniquenesses • Learning – EM, quasi-Newton optimization • Inference
SPCA (sensible PCA) • R = εI • ε – global noise level • Columns of C span the principal subspace. • Learning – EM algorithm • Inference
PCA • R = lim_{ε→0} εI • Learning • Diagonalize sample covariance of data • Leading k eigenvalues and eigenvectors define C • EM determines leading eigenvectors without diagonalization • Inference • Noise becomes infinitesimal • Posterior collapses to a single point
Linear Dynamical Systems • Inference – Kalman filter • Smoothing – RTS recursions • Learning – EM algorithm • C known – Shumway and Stoffer, 1982 • All unknown – Ghahramani and Hinton, 1995
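A minimal sketch of the Kalman filter for the scalar case (k = p = 1) of the basic model, alternating a measurement update with a time update. The function and variable names are illustrative, and the scalar restriction is a simplification made here; the matrix version replaces the divisions with matrix inverses.

```python
def kalman_filter_1d(ys, a, c, q, r, mu1, q1):
    """Return filtered means and variances of P(x_t | y_1..y_t).

    Model (scalar): x_{t+1} = a x_t + w, w ~ N(0, q)
                    y_t     = c x_t + v, v ~ N(0, r)
                    x_1 ~ N(mu1, q1)
    """
    means, variances = [], []
    pred_mean, pred_var = mu1, q1  # prediction for x_1 before seeing y_1
    for y in ys:
        # Measurement update: fold in the current observation.
        innovation_var = c * pred_var * c + r
        gain = pred_var * c / innovation_var          # Kalman gain
        mean = pred_mean + gain * (y - c * pred_mean)
        var = (1.0 - gain * c) * pred_var
        means.append(mean)
        variances.append(var)
        # Time update: propagate the estimate through the dynamics.
        pred_mean = a * mean
        pred_var = a * var * a + q
    return means, variances


# Static state (a=1, q=0) with a nearly flat prior: the filtered mean
# approaches the running average of the observations.
means, variances = kalman_filter_1d([1.0, 2.0, 3.0], a=1.0, c=1.0,
                                    q=0.0, r=1.0, mu1=0.0, q1=1e6)
```

Smoothing (the RTS recursions) would add a backward pass over these filtered estimates; it is omitted here to keep the sketch short.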
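Discrete-State LGM • x_{t+1} = WTA[A x_t + w] • y_t = C x_t + v • x_1 = WTA[N(µ_1, Q_1)] • WTA[·] – winner-take-all nonlinearity: returns the unit vector e_j, where j indexes the largest component of the input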
Discrete-State LGM • Static Data Modeling: Mixture of Gaussians, VQ • Time-series Modeling: HMM
Static Data Modelling • A = 0 • x = WTA[w] • w ~ N(µ, Q) • y = C x + v • π_j = P(x = e_j) • Nonzero µ for nonuniform π_j • y | x = e_j ~ N(C_j, R) • C_j – jth column of C
Mixture of Gaussians • Mixing coefficients of clusters – π_j • Means – columns C_j • Variance – R • Learning: EM (corresponds to ML competitive learning) • Inference
Vector Quantization • Observation noise becomes infinitesimal • Inference problem solved by 1NN rule • Euclidean distance when R is a multiple of the identity • Mahalanobis distance for unscaled R • Posterior collapses to closest cluster • Learning with EM = batch version of k-means
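The correspondence between EM in the zero-noise limit and batch k-means can be sketched in a few lines. This is an illustrative one-dimensional version using Euclidean distance (that is, assuming R is a multiple of the identity); the function name, data and initial centers are made up for the example.

```python
def kmeans_1d(data, centers, iters=20):
    """Batch k-means on scalar data; returns the final cluster centers."""
    centers = list(centers)
    for _ in range(iters):
        # "E-step": hard-assign each point to its nearest center (1NN rule).
        clusters = [[] for _ in centers]
        for y in data:
            j = min(range(len(centers)), key=lambda i: (y - centers[i]) ** 2)
            clusters[j].append(y)
        # "M-step": move each center to the mean of its assigned points.
        # An empty cluster keeps its previous center.
        new_centers = [sum(pts) / len(pts) if pts else centers[i]
                       for i, pts in enumerate(clusters)]
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    return centers


data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers = kmeans_1d(data, centers=[0.0, 6.0])
```

The hard assignment is what the posterior collapsing to the closest cluster means in practice: each responsibility becomes 0 or 1.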
HMM • Transition matrix T • T_{i,j} = P(x_{t+1} = e_j | x_t = e_i) • For every T, there exist A and Q • Filtering: forward recursions • Smoothing: forward-backward algorithm • Learning: EM (called Baum-Welch re-estimation) • MAP state sequences – Viterbi
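The forward recursions for filtering can be sketched as below, using scalar Gaussian emissions y | x = e_j ~ N(C_j, R) to match the discrete-state model. The function names, the initial state distribution `pi` and all numeric values are illustrative assumptions. Normalizing at every step yields both the filtered posteriors and the total log-likelihood.

```python
import math


def gauss_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def hmm_filter(ys, pi, T, means, var):
    """Forward recursions: returns P(x_t | y_1..y_t) for each t and log P(y_1..y_T).

    pi[j]    : P(x_1 = e_j)              (assumed initial distribution)
    T[i][j]  : P(x_{t+1} = e_j | x_t = e_i)
    means[j] : emission mean C_j, with shared emission variance var (scalar R)
    """
    n = len(pi)
    filtered, loglik = [], 0.0
    pred = list(pi)  # P(x_t | y_1..y_{t-1}); for t = 1 this is the prior
    for y in ys:
        # Weight the prediction by the emission likelihood, then normalize.
        joint = [pred[j] * gauss_pdf(y, means[j], var) for j in range(n)]
        norm = sum(joint)  # P(y_t | y_1..y_{t-1})
        post = [v / norm for v in joint]
        filtered.append(post)
        loglik += math.log(norm)
        # Propagate through the transition matrix for the next step.
        pred = [sum(post[i] * T[i][j] for i in range(n)) for j in range(n)]
    return filtered, loglik


filtered, loglik = hmm_filter(
    ys=[0.1, -0.2, 4.8],
    pi=[0.5, 0.5],
    T=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 5.0],
    var=1.0,
)
```

Smoothing adds a matching backward pass, and Viterbi replaces the sum in the propagation step with a max.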