Expectation Propagation and Generalized EP Methods for Inference in Switching LDSs
Onno Zoeter & Tom Heskes
Bayesian Time Series Models Seminar, 2012.07.19
Summarized and presented by Heo, Min-Oh
(c) 2012, Biointelligence Lab, http://bi.snu.ac.kr
Contents
• Basic Model: SLDS
• Motivation: Complexity of Posteriors
• Methods
  • Assumed Density Filtering (cf. clique tree inference in HMMs)
  • EP in a Nutshell
  • Expectation Propagation for Smoothing in SLDS
  • Generalized EP
• Experiments
• Appendix: Canonical form and the corresponding operations
Model
• Switching linear dynamical system (SLDS), also known as:
  • conditionally Gaussian state-space model
  • switching Kalman filter model
  • hybrid model
[Figure: graphical model of the SLDS. Ellipses: Gaussian variables; rectangles: multinomial switch variables; shading: observed nodes; the arrows define the transition and observation models.]
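The slide gives only the graphical model; written out, the standard conditionally Gaussian form it depicts is as follows (a sketch with my own symbol names: M switch states, parameters A, C, Q, R, Pi are assumed, not taken from the slide):

```latex
s_t \mid s_{t-1} \sim \mathrm{Discrete}\big(\Pi_{s_{t-1}}\big), \qquad s_t \in \{1,\dots,M\}
x_t \mid x_{t-1}, s_t \sim \mathcal{N}\big(A_{s_t} x_{t-1},\, Q_{s_t}\big)  % transition model
y_t \mid x_t, s_t \sim \mathcal{N}\big(C_{s_t} x_t,\, R_{s_t}\big)          % observation model
```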
Complexity of Posteriors
• The posterior distribution for filtering must consider all possible switch sequences s_1:T.
• Conditional on the final switch state, P(x_T | s_T, y_1:T) is a mixture of M^(T-1) Gaussians, where M is the number of switch states.
Complexity of Posteriors
• Example (M = 2):
  • For i = 2, consider P(X_1, X_2): the number of Gaussian components in P(X_2) without approximation is 2 x 2 = 4.
  • In general, P(X_i) is a mixture of 2^i Gaussians.
• Representing the correct marginal distribution in a hybrid network can require space that is exponential in the size of the network.
• Exact inference in CLG networks that include standard discrete variables is NP-hard (even in polytrees); even computing the probability of a single discrete variable is NP-hard.
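To make the count explicit (same two-state example, my notation):

```latex
p(x_2 \mid y_{1:2}) \;=\; \sum_{s_1=1}^{2}\sum_{s_2=1}^{2}
    p(s_1, s_2 \mid y_{1:2})\; p(x_2 \mid s_1, s_2, y_{1:2})
```

Each inner term is a single Gaussian, giving 2 x 2 = 4 components; continuing the recursion gives 2^i components at step i.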
Exact Inference: Filtering as Clique Tree Propagation
• The recursive filtering process is message passing in a clique tree, with the belief state as the forward message.
• HMM case: the forward pass of the sum-product clique tree algorithm.
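For reference, the HMM forward recursion this corresponds to (standard form, not spelled out on the slide):

```latex
\alpha_1(s_1) = p(s_1)\, p(y_1 \mid s_1), \qquad
\alpha_t(s_t) = p(y_t \mid s_t) \sum_{s_{t-1}} p(s_t \mid s_{t-1})\, \alpha_{t-1}(s_{t-1})
```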
Assumed Density Filtering (ADF)
• ADF forces the belief state to live in some restricted family F, e.g., a product of histograms, or a Gaussian.
• Given a prior q_{t-1}(z) in F, do one step of exact Bayesian updating to get p'_t(z) proportional to p(y_t | z) * Integral[ p(z | z') q_{t-1}(z') dz' ]. Then do a projection step to find the closest approximation in the family:
  q_t = argmin_{q in F} KL(p'_t || q)
• If F is an exponential family, we can solve the KL minimization by moment matching.
Assumed Density Filtering (ADF)
• Minimizing KL(p || q) with respect to an exponential-family q(z) = h(z) g(eta) exp(eta^T u(z)):
  KL(p || q) = -ln g(eta) - eta^T E_p[u(z)] + const.
• Setting the gradient w.r.t. eta to zero: -grad ln g(eta) = E_p[u(z)].
• For a general exponential family, the following holds: -grad ln g(eta) = E_q[u(z)].
• So E_q[u(z)] = E_p[u(z)]: moment matching.
• The optimal solution is to match the expected sufficient statistics, obtained from the derivative of ln g(eta). (A runnable sketch follows below.)
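A minimal 1-D sketch of the projection step, assuming a Gaussian family F and a two-state switch; the weights, means, and variances below are hypothetical numbers for illustration, not from the paper:

```python
# Collapsing the exact one-step posterior -- a Gaussian mixture over switch
# states -- to a single Gaussian by matching mean and variance, which is the
# moment-matching solution of argmin_q KL(p || q) for Gaussian q.
import numpy as np

def collapse(weights, means, variances):
    """Moment-match a 1-D Gaussian mixture to a single Gaussian."""
    w = np.asarray(weights, float) / np.sum(weights)
    mu = np.asarray(means, float)
    var = np.asarray(variances, float)
    mean = np.sum(w * mu)
    # Var[x] = E[x^2] - E[x]^2, with E[x^2] = sum_k w_k (var_k + mu_k^2)
    variance = np.sum(w * (var + mu ** 2)) - mean ** 2
    return mean, variance

# Exact update under M = 2 switch states gives a 2-component mixture ...
w = [0.7, 0.3]    # posterior switch probabilities (hypothetical)
mu = [0.0, 2.0]   # per-switch posterior means
v = [1.0, 0.5]    # per-switch posterior variances
m, s2 = collapse(w, mu, v)
print(f"projected belief: N({m:.3f}, {s2:.3f})")  # single Gaussian carried forward
```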
Forward Message: Potentials in the Sum-Product Algorithm
• Approximating the forward-pass message: this handles filtering only!
One Example for a DBN
EP in a nutshell
• Approximate a function by a simpler one: p(x) = prod_i f_i(x) is approximated by q(x) = prod_i f~_i(x)
• where each f~_i(x) lives in a parametric exponential family (e.g. Gaussian).
• The factors f_i can be conditional distributions in a Bayesian network.
EP algorithm
• Iterate the fixed-point equations:
  f~_i^new = argmin_{f~_i} KL( f_i(x) q^{\i}(x) || f~_i(x) q^{\i}(x) ), where q^{\i}(x) = prod_{j != i} f~_j(x).
• The cavity distribution q^{\i}(x) specifies where the approximation needs to be good.
• Coordinated local approximations. (A runnable sketch of this loop follows below.)
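As a concrete illustration of the loop, a hedged 1-D sketch with Gaussian sites, where the moment matching is done numerically on a grid; the factors, grid, and schedule are my own choices, not the paper's SLDS instantiation:

```python
# Generic EP fixed-point iteration: cavity -> tilted distribution -> moment
# matching -> divide out the cavity to get the new site approximation.
import numpy as np

grid = np.linspace(-10, 10, 4001)
dx = grid[1] - grid[0]

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

# Two non-Gaussian factors (Cauchy-like bumps; hypothetical stand-ins):
factors = [lambda x: 1.0 / (1.0 + (x - 1.0) ** 2),
           lambda x: 1.0 / (1.0 + (x + 1.0) ** 2)]

# Gaussian sites f~_i stored as natural parameters (precision, precision*mean)
prec = np.array([0.1, 0.1])
pm = np.array([0.0, 0.0])

for sweep in range(20):
    for i in range(len(factors)):
        # 1. cavity q^{\i} = q / f~_i
        cprec, cpm = prec.sum() - prec[i], pm.sum() - pm[i]
        cm, cv = cpm / cprec, 1.0 / cprec
        # 2. tilted distribution f_i(x) * q^{\i}(x)
        tilt = factors[i](grid) * gauss(grid, cm, cv)
        Z = tilt.sum() * dx
        mean = (grid * tilt).sum() * dx / Z
        var = ((grid - mean) ** 2 * tilt).sum() * dx / Z
        # 3. project (moment match), then divide out the cavity
        prec[i] = 1.0 / var - cprec
        pm[i] = mean / var - cpm

q_prec, q_pm = prec.sum(), pm.sum()
print("EP posterior: mean %.3f  var %.3f" % (q_pm / q_prec, 1.0 / q_prec))
```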
(Loopy) Belief propagation
• Specialize to fully factorized approximations: q(x) = prod_k q_k(x_k)
• Minimizing the KL-divergence = matching marginals of the (partially factorized) tilted distribution and the (fully factorized) approximation.
• "Sending messages" between nodes corresponds to the EP site updates.
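Written out, the resulting updates are the familiar sum-product messages (standard BP form; the slide leaves them implicit):

```latex
m_{x \to f}(x) \;=\; \prod_{f' \in \partial x \setminus \{f\}} m_{f' \to x}(x), \qquad
m_{f \to x}(x) \;=\; \sum_{x_{\partial f} \setminus x} f(x_{\partial f})
    \prod_{x' \in \partial f \setminus \{x\}} m_{x' \to f}(x')
```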
EP versus BP
• The EP approximation can be in a restricted family, e.g. Gaussian.
• The EP approximation does not have to be factorized.
• EP applies to many more problems, e.g. mixtures of discrete and continuous variables.
Expectation Propagation
• EP gives an approximate smoothing algorithm: the smoother is the backward (smoothing) counterpart of the assumed density filter, considering the forward and backward passes together.
• In the exact case the backward message is beta_t(z_t) = p(y_{t+1:T} | z_t), with z_t = (s_t, x_t), and p(z_t | y_1:T) is proportional to alpha_t(z_t) beta_t(z_t).
• In the approximation, alpha_t and beta_t are restricted to the chosen exponential family and refined iteratively.
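A sketch of the resulting two-slice update, written in the standard EP-smoother form (a reconstruction, since the slide's formulas were lost; psi_t denotes the two-slice potential and Proj the family projection):

```latex
\hat p(z_{t-1}, z_t) \;\propto\; \alpha_{t-1}(z_{t-1})\, \psi_t(z_{t-1}, z_t)\, \beta_t(z_t),
\qquad \psi_t(z_{t-1}, z_t) = p(z_t \mid z_{t-1})\, p(y_t \mid z_t)

\alpha_t^{\mathrm{new}}(z_t) = \frac{\mathrm{Proj}\big[\sum_{s_{t-1}}\!\int \hat p(z_{t-1}, z_t)\, dx_{t-1}\big]}{\beta_t(z_t)},
\qquad
\beta_{t-1}^{\mathrm{new}}(z_{t-1}) = \frac{\mathrm{Proj}\big[\sum_{s_t}\!\int \hat p(z_{t-1}, z_t)\, dx_t\big]}{\alpha_{t-1}(z_{t-1})}
```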
Expectation Propagation (continued)
Convergence in EP for SLDS
• Sometimes the approximation may fail to converge. Two remedies:
• Iteration: repeat steps 1 to 4 of the ADF recursion, to find local approximations that are as consistent as possible.
• Use damped messages: normalizability in step 4 of ADF is guaranteed if the sum of the respective inverse covariance matrices is positive definite. Messages are damped in canonical space (see appendix, and the form below).
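The slide's damping formula was lost; this is the usual form, with step size epsilon in (0, 1]. Because canonical forms are log-linear in their parameters, taking powers amounts to a convex combination of the canonical parameters:

```latex
\tilde f^{\,\mathrm{damped}} = \big(\tilde f^{\,\mathrm{old}}\big)^{1-\epsilon}\,\big(\tilde f^{\,\mathrm{new}}\big)^{\epsilon}
\;\Longleftrightarrow\;
K \leftarrow (1-\epsilon)K^{\mathrm{old}} + \epsilon K^{\mathrm{new}}, \quad
h \leftarrow (1-\epsilon)h^{\mathrm{old}} + \epsilon h^{\mathrm{new}}
```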
Generalized EP
• A more accurate approximation, similar to Kikuchi's extension of the Bethe free energy.
• Outer clusters: larger than the cliques of the junction tree.
• Overlaps between outer clusters.
K=1 case
• Clusters form the cliques and separators of a junction tree.
• Outer clusters: counting number 1.
• First-level overlaps: counting number -1 (1 - 2 = -1).
• Deeper overlaps: counting number 0 (1 - (3 - 2) = 0), and again 0 at the next level (1 - (4 - 3 + 0) = 0).
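These numbers follow the standard Kikuchi recursion over the region hierarchy (consistent with the values on the slide):

```latex
c_R \;=\; 1 \;-\; \sum_{R' \,\supset\, R} c_{R'}
```

For example, an overlap contained in three outer clusters (c = 1 each) and two first-level overlaps (c = -1 each) gets c = 1 - (3 - 2) = 0.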
Alternative Backward Pass (ABP)
• An approximation to the smoothed posteriors, based on the traditional Kalman smoother form.
• Treats the discrete and continuous latent states separately.
Experiments – with exact posteriors
• 100 models, generated by drawing parameters from conjugate priors.
• Dataset: for each model, a generated sequence of length 8.
Experiments – with exact posteriors, Gibbs sampling
Experiments – Effect of larger outer clusters
Appendix
Canonical form
• Represents intermediate results as a log-quadratic form exp(Q(x)):
  C(x; K, h, g) = exp( -1/2 x^T K x + h^T x + g )
Operations on canonical form (1/4)
• Multiplication: the product of two canonical-form factors adds the parameters (after extending both to a common scope):
  C(K1, h1, g1) * C(K2, h2, g2) = C(K1 + K2, h1 + h2, g1 + g2)
• Example: multiplying a Gaussian prior by a Gaussian likelihood simply sums their (K, h, g) parameters.
Operations on canonical form (2/4)
• Division: subtract the parameters:
  C(K1, h1, g1) / C(K2, h2, g2) = C(K1 - K2, h1 - h2, g1 - g2)
• Vacuous canonical form: defined as K = 0, h = 0, g = 0.
• It causes no effect under multiplication and division.
Operations on canonical form (3/4)
• Marginalization: integrating Y out of C(X, Y; K, h, g), with blocks K = [[K_XX, K_XY], [K_YX, K_YY]] and h = (h_X, h_Y), gives C(X; K', h', g') with
  K' = K_XX - K_XY K_YY^{-1} K_YX
  h' = h_X - K_XY K_YY^{-1} h_Y
  g' = g + 1/2 ( |Y| log(2 pi) - log|K_YY| + h_Y^T K_YY^{-1} h_Y )
• The integral is finite iff K_YY is positive definite.
Operations on canonical form (4/4)
• Reduction: reduce a canonical form to a context representing evidence.
• If Y = y:
  K' = K_XX
  h' = h_X - K_XY y
  g' = g + h_Y^T y - 1/2 y^T K_YY y
(A code sketch of all four operations follows.)
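Putting the four operations together, a hedged Python sketch; the class layout, names, and scope bookkeeping are mine, while the parameter updates follow the formulas above:

```python
import numpy as np

class CanonicalForm:
    """C(x; K, h, g) = exp(-1/2 x^T K x + h^T x + g) over variables `scope`.
    The vacuous form is K = 0, h = 0, g = 0."""
    def __init__(self, scope, K, h, g):
        self.scope = list(scope)  # variable names; fixes the ordering
        self.K = np.atleast_2d(np.asarray(K, float))
        self.h = np.asarray(h, float).reshape(-1)
        self.g = float(g)

    def _extend_to(self, scope):
        """Embed into a larger scope by zero-padding K and h."""
        idx = [scope.index(v) for v in self.scope]
        K = np.zeros((len(scope), len(scope))); h = np.zeros(len(scope))
        K[np.ix_(idx, idx)] = self.K; h[idx] = self.h
        return CanonicalForm(scope, K, h, self.g)

    def multiply(self, other):
        scope = self.scope + [v for v in other.scope if v not in self.scope]
        a, b = self._extend_to(scope), other._extend_to(scope)
        return CanonicalForm(scope, a.K + b.K, a.h + b.h, a.g + b.g)

    def divide(self, other):
        # assumes other's scope is contained in self's scope
        b = other._extend_to(self.scope)
        return CanonicalForm(self.scope, self.K - b.K, self.h - b.h, self.g - b.g)

    def marginalize(self, out_vars):
        """Integrate out `out_vars`; requires the K_YY block positive definite."""
        Y = [self.scope.index(v) for v in out_vars]
        X = [i for i in range(len(self.scope)) if i not in Y]
        Kxx, Kxy = self.K[np.ix_(X, X)], self.K[np.ix_(X, Y)]
        Kyy, hX, hY = self.K[np.ix_(Y, Y)], self.h[X], self.h[Y]
        KyyI = np.linalg.inv(Kyy)
        K = Kxx - Kxy @ KyyI @ Kxy.T
        h = hX - Kxy @ KyyI @ hY
        g = self.g + 0.5 * (len(Y) * np.log(2 * np.pi)
                            - np.linalg.slogdet(Kyy)[1] + hY @ KyyI @ hY)
        return CanonicalForm([self.scope[i] for i in X], K, h, g)

    def reduce(self, var, value):
        """Condition on the evidence var = value."""
        Y = [self.scope.index(var)]
        X = [i for i in range(len(self.scope)) if i not in Y]
        y = np.array([value], float)
        K = self.K[np.ix_(X, X)]
        h = self.h[X] - self.K[np.ix_(X, Y)] @ y
        g = self.g + self.h[Y] @ y - 0.5 * y @ self.K[np.ix_(Y, Y)] @ y
        return CanonicalForm([self.scope[i] for i in X], K, h, g)

# Tiny usage example (hypothetical potentials over x and (x, y)):
a = CanonicalForm(["x"], [[1.0]], [0.0], 0.0)
b = CanonicalForm(["x", "y"], [[1.0, -0.5], [-0.5, 1.0]], [0.0, 1.0], 0.0)
print(a.multiply(b).marginalize(["y"]).K)  # still a canonical form over x
```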
Sum-product algorithms
• Inference in linear Gaussian networks: variable elimination and clique tree algorithms can be adapted to use canonical forms.
• The marginalization operation is well defined for an arbitrary canonical form (provided K_YY is positive definite).
• Reduction instantiates continuous variables; cf. the discrete case, where one simply zeroes the entries that are not consistent with Z = z.
• Computational complexity: linear in the number of cliques, and at most cubic in the size of the largest clique.