Knowledge Repn. & Reasoning Lec #24: Approximate Inference in DBNs UIUC CS 498: Section EA Professor: Eyal Amir Fall Semester 2004 (Some slides by X. Boyen & D. Koller, and by S. H. Lim; Some slides by Doucet, de Freitas, Murphy, Russell, and H. Zhou)
Dynamic Systems • Filtering in stochastic, dynamic systems: • Monitoring freeway traffic (from an autonomous driver or for traffic analysis) • Monitoring a patient's symptoms • Models to deal with uncertainty and/or partial observability in dynamic systems: • Hidden Markov Models (HMMs), Kalman filters, etc. • All are special cases of Dynamic Bayesian Networks (DBNs)
Previously • Exact DBN inference • Filtering • Smoothing • Projection • Explanation
DBN Myth • Bayesian Network: a decomposed structure to represent the full joint distribution • Does it imply easy decomposition for the belief state? • No!
Tractable, approximate representation • Exact inference in DBNs is intractable • Need approximation • Maintain an approximate belief state • E.g. assume Gaussian processes • Today: • Factored belief-state approximation [Boyen & Koller ’98] • Particle filtering (if time permits)
Idea • Use a decomposable representation for the belief state (pre-assume some independence)
Problem • What about the approximation errors? • They might accumulate and grow without bound…
Contraction property • Main result: • If the process is mixing, then every state transition results in a contraction of the distance between the two distributions by a constant factor • Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely
Basic framework • Definition 1: • Prior belief state (before seeing the current observation): P(X(t) = x | r(1), …, r(t-1)) • Posterior belief state (after seeing the current observation): P(X(t) = x | r(1), …, r(t)) • Monitoring task: maintain the posterior belief state over X(t) as each new observation r(t) arrives
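To make the monitoring task concrete, here is a minimal sketch of one prior/posterior update for a single two-state process; the transition matrix T, observation matrix O, and the observation sequence are illustrative assumptions, not values from the slides.

```python
# A minimal sketch of the monitoring task for a single discrete process.
# T, O, and the observations below are illustrative assumptions.
import numpy as np

T = np.array([[0.9, 0.1],    # T[i, j] = P(X(t+1)=j | X(t)=i)
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],    # O[i, k] = P(obs=k | X=i)
              [0.3, 0.7]])

def monitor_step(posterior_t, obs):
    """One monitoring step: compute the prior belief state, then condition on the observation."""
    prior_t1 = posterior_t @ T                 # prior belief state at t+1
    unnorm = prior_t1 * O[:, obs]              # weight by observation likelihood
    posterior_t1 = unnorm / unnorm.sum()       # posterior belief state at t+1
    return prior_t1, posterior_t1

belief = np.array([0.5, 0.5])                  # initial belief state
for obs in [0, 0, 1]:                          # a short observation stream
    prior, belief = monitor_step(belief, obs)
    print(prior, belief)
```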
Simple contraction • Distance measure: • Relative entropy (KL-divergence) between the actual and the approximate belief state • Contraction due to O: conditioning both belief states on the same observation does not increase the expected KL-divergence • Contraction due to T (can we do better?): passing both belief states through the stochastic transition model never increases the KL-divergence
Simple contraction (cont) • Definition: • Minimal mixing rate: γQ = min over pairs of states x1, x2 of Σx min( Q(x1 → x), Q(x2 → x) ) • Theorem 3 (the single process contraction theorem): • For process Q, anterior distributions φ and ψ, ulterior distributions φ’ and ψ’: D(φ’ || ψ’) ≤ (1 - γQ) · D(φ || ψ)
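A small numerical sketch of the contraction bound, using the definition of the minimal mixing rate above; the transition matrix and the two belief states are arbitrary illustrative choices.

```python
# Numerical check of the single-process contraction bound
# D(phi' || psi') <= (1 - gamma) * D(phi || psi), where phi' = phi @ T.
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def mixing_rate(T):
    # gamma = min over state pairs (i, j) of sum_x min(T[i, x], T[j, x])
    n = T.shape[0]
    return min(np.minimum(T[i], T[j]).sum() for i in range(n) for j in range(n))

T = np.array([[0.7, 0.2, 0.1],      # illustrative stochastic transition matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
phi = np.array([0.8, 0.1, 0.1])     # "actual" belief state
psi = np.array([0.3, 0.4, 0.3])     # approximate belief state

gamma = mixing_rate(T)
print("KL before:", kl(phi, psi))
print("bound:    ", (1 - gamma) * kl(phi, psi))
print("KL after: ", kl(phi @ T, psi @ T))   # should not exceed the bound
```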
Simple contraction (cont) • Proof Intuition:
Compound processes • Mixing rate could be very small for large processes • The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses • Fully independent subprocesses: • Theorem 5: • For L independent subprocesses T1, …, TL, let γl be the mixing rate of Tl and let γ = minl γl. Let φ and ψ be distributions over S1(t), …, SL(t), and assume that ψ renders the Sl(t) marginally independent. Then: D(φ’ || ψ’) ≤ (1 - γ) · D(φ || ψ)
Compound processes (cont) • Conditionally independent subprocesses • Theorem 6 (the main theorem): • For L conditionally independent subprocesses T1, …, TL, assume each subprocess depends on at most r others and influences at most q others. Let γl be the mixing rate of Tl and let γ = minl γl. Let φ and ψ be distributions over S1(t), …, SL(t), and assume that ψ renders the Sl(t) marginally independent. Then the KL-divergence still contracts at every step, with a rate that degrades gracefully as q and r grow.
Efficient, approximate monitoring • If each approximation (projection onto the factored representation) incurs an error bounded by ε, then • Total error ≤ ε + (1 - γ)ε + (1 - γ)²ε + … = ε / γ • => error remains bounded indefinitely • Conditioning on observations might introduce momentary errors, but the expected error will contract
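A few lines illustrating the error-accumulation argument: the new error ε added at each step is shrunk by the contraction factor (1 - γ) thereafter, so the total converges to the geometric-series bound ε / γ. The values of ε and γ below are illustrative.

```python
# Sketch of the error-accumulation argument: contract old error, add new projection error.
eps, gamma = 0.01, 0.2
err = 0.0
for t in range(200):
    err = (1 - gamma) * err + eps   # one monitoring step
print(err, "<=", eps / gamma)       # converges to the geometric-series bound eps / gamma
```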
Approximate DBN monitoring • Algorithm (based on standard clique tree inference): • Construct a clique tree from the 2-TBN • Initialize the clique tree with conditional probabilities from the CPTs of the DBN • For each time step: • Create a working copy Y of the tree. Create σ(t+1). • For each subprocess l, incorporate the marginal σ(t)[Xl(t)] in the appropriate factor in Y. • Incorporate evidence r(t+1) in Y. • Calibrate the potentials in Y. • For each l, query Y for the marginal over Xl(t+1) and store it in σ(t+1).
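Clique-tree machinery aside, the core of the step is: form the joint from the stored factored marginals, propagate and condition exactly for one step, then project back onto independent marginals. Below is a simplified sketch for two coupled binary subprocesses; the CPTs and observation model are illustrative assumptions, not the slides' example.

```python
# A simplified sketch of Boyen-Koller factored monitoring for two binary subprocesses.
import numpy as np
from itertools import product

# P(X1' | X1, X2) and P(X2' | X1, X2), indexed [x1, x2, x'] (illustrative CPTs)
T1 = np.array([[[0.9, 0.1], [0.6, 0.4]],
               [[0.3, 0.7], [0.1, 0.9]]])
T2 = np.array([[[0.8, 0.2], [0.7, 0.3]],
               [[0.4, 0.6], [0.2, 0.8]]])
# P(obs_l | X_l): one noisy sensor per subprocess
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def bk_step(m1, m2, obs1, obs2):
    """One factored monitoring step: join, propagate, condition, project."""
    joint = np.zeros((2, 2))
    for x1, x2, y1, y2 in product(range(2), repeat=4):
        joint[y1, y2] += (m1[x1] * m2[x2]          # approximate (factored) prior
                          * T1[x1, x2, y1] * T2[x1, x2, y2])
    joint *= np.outer(O[:, obs1], O[:, obs2])      # condition on the evidence
    joint /= joint.sum()
    return joint.sum(axis=1), joint.sum(axis=0)    # project back onto marginals

m1, m2 = np.array([0.5, 0.5]), np.array([0.5, 0.5])
for obs1, obs2 in [(0, 1), (0, 0), (1, 1)]:
    m1, m2 = bk_step(m1, m2, obs1, obs2)
    print(m1, m2)
```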
Conclusion of Factored DBNs • Accuracy-efficiency tradeoff: • Small partition => • Faster inference • Better contraction • Worse approximation • Key to good approximation: • Discover weak/sparse interactions among subprocesses and factor the DBN along these lines • Domain knowledge helps
Agenda • Factored inference in DBNs • Sampling: Particle Filtering
Introduction • Analytical methods • Kalman filter: linear-Gaussian models • HMM: models with finite state space • Statistical approximation methods for non-parametric distributions and large discrete DBNs • Different names: • Sequential Monte Carlo (Handschin and Mayne 1969, Akashi and Kumamoto 1975) • Particle filtering (Doucet et al. 1997) • Survival of the fittest (Kanazawa, Koller and Russell 1995) • Condensation in computer vision (Isard and Blake 1996)
Outline • Importance Sampling (IS) revisited • Sequential IS (SIS) • Particle Filtering = SIS + Resampling • Dynamic Bayesian Networks • A Simple example: ABC network • Inference in DBN: • Exact inference • Pure Particle Filtering • Rao-Blackwellised PF • Demonstration in ABC network • Discussions
Importance Sampling Revisited • Goal: evaluate the functional I(f) = E[ f(x0:k) | y1:k ] = ∫ f(x0:k) p(x0:k | y1:k) dx0:k • Importance Sampling (batch mode): • Sample x0:k(i) from a proposal q(x0:k | y1:k) • Assign w(i) ∝ p(x0:k(i) | y1:k) / q(x0:k(i) | y1:k) as the weight of each sample • The posterior estimation of I(f) is: Î(f) = Σi w̃(i) f(x0:k(i)), where w̃(i) are the normalized weights
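A minimal batch importance-sampling sketch; the target, proposal, and functional f below are illustrative choices used only to show the sample/weight/estimate steps.

```python
# Estimate E_p[f(x)] by sampling from a proposal q and weighting by p(x)/q(x).
import numpy as np

rng = np.random.default_rng(0)

def p(x):  return np.exp(-0.5 * (x - 1.0) ** 2)    # unnormalized target, here N(1, 1)
def q_sample(n): return rng.normal(0.0, 2.0, n)    # proposal N(0, 2^2)
def q(x):  return np.exp(-0.5 * (x / 2.0) ** 2)    # unnormalized proposal density
def f(x):  return x ** 2                           # functional of interest

x = q_sample(100_000)
w = p(x) / q(x)                 # importance weights (constants cancel after normalization)
w_tilde = w / w.sum()           # self-normalized weights
print((w_tilde * f(x)).sum())   # estimate of E_p[f(x)], approximately 2 for N(1, 1)
```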
Sequential Importance Sampling • How to make it sequential? • Choose the importance function to factorize as q(x0:k | y1:k) = q(x0:k-1 | y1:k-1) q(xk | x0:k-1, y1:k) • The weights can then be updated recursively: wk ∝ wk-1 · p(yk | xk) p(xk | xk-1) / q(xk | x0:k-1, y1:k) • We get the SIS filter • Benefit of SIS: • Observations yk don't have to be given in batch; they can be processed as they arrive
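A sketch of SIS for a toy 1-D linear-Gaussian state-space model, using the transition prior as the importance function so that the weight update reduces to multiplying by the observation likelihood; the model parameters and observation stream are illustrative.

```python
# Sequential importance sampling with the transition prior as proposal.
import numpy as np

rng = np.random.default_rng(1)
N = 500                                    # number of samples (particles)

def transition(x):  return 0.8 * x + rng.normal(0, 1.0, x.shape)   # x_k = 0.8 x_{k-1} + noise
def obs_lik(y, x):  return np.exp(-0.5 * (y - x) ** 2)             # y_k = x_k + N(0, 1) noise

ys = [0.5, 1.2, -0.3, 0.8]                 # observation stream, processed online

x = rng.normal(0, 1.0, N)                  # samples from the initial prior
w = np.full(N, 1.0 / N)
for y_k in ys:
    x = transition(x)                      # propose from p(x_k | x_{k-1})
    w *= obs_lik(y_k, x)                   # recursive weight update
    w /= w.sum()
    print("posterior mean estimate:", (w * x).sum())
```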
Resampling • Why we need to resample: • Degeneracy of SIS • The variance of the importance weights (with y0:k treated as a random variable) increases at each recursion step • Optimal importance function: q(xk | x0:k-1, y1:k) = p(xk | xk-1, yk) • Need to sample from p(xk | xk-1, yk) and evaluate p(yk | xk-1), which is rarely possible in closed form • Resampling: eliminate small weights and concentrate on large weights
Resampling • Measure of degeneracy: effective sample size Neff = 1 / Σi (w̃k(i))²
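The effective sample size in code; the weight vector and the N/2 resampling threshold mentioned in the comment are illustrative conventions, not prescribed by the slides.

```python
# Effective sample size as a degeneracy measure: N_eff = 1 / sum_i w_i^2 for normalized weights.
import numpy as np

def effective_sample_size(w):
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

w = np.array([0.70, 0.25, 0.03, 0.01, 0.01])
print(effective_sample_size(w))   # well below 5: the sample set has degenerated
# A common convention is to resample when N_eff drops below N / 2.
```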
Resampling Step • Particle filtering = SIS + Resampling
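A sketch of the resampling step (systematic resampling is one common choice) that turns the SIS loop above into a particle filter; the names x, w, rng, and effective_sample_size refer to the earlier sketches.

```python
# Systematic resampling: replicate high-weight particles, drop low-weight ones,
# then reset the weights to uniform.
import numpy as np

def systematic_resample(x, w, rng):
    N = len(w)
    positions = (rng.random() + np.arange(N)) / N   # stratified positions in [0, 1)
    idx = np.searchsorted(np.cumsum(w), positions)  # map positions to particle indices
    return x[idx], np.full(N, 1.0 / N)

# Inside the SIS loop, after normalizing the weights:
#     if effective_sample_size(w) < N / 2:
#         x, w = systematic_resample(x, w, rng)
```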
Rao-Blackwellisation for SIS • A method to reduce the variance of the final posterior estimation • Useful when the state can be partitioned as xk = (rk, zk), in which zk can be marginalized analytically (e.g., by a Kalman filter or an exact HMM step) • Assuming p(zk | y1:k, r0:k) can be evaluated analytically given r0:k, one can rewrite the posterior estimation as an expectation over the sampled part r0:k only, so particles are needed only for rk
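A sketch of a Rao-Blackwellised particle filter for a toy switching linear-Gaussian model: the discrete regime rk is sampled with particles, while the continuous state zk is marginalized per particle by a 1-D Kalman filter. All model parameters are illustrative, and the resampling step is omitted for brevity.

```python
# Rao-Blackwellised particle filter: sample the regime, marginalize the continuous state.
import numpy as np

rng = np.random.default_rng(2)
N = 200
A = {0: 0.9, 1: 0.3}        # regime-dependent dynamics: z_k = A[r] * z_{k-1} + noise
Q, R = 0.5, 1.0             # process / observation noise variances
P_switch = 0.1              # P(r_k != r_{k-1})

ys = [0.4, 1.1, 0.2, -0.5]  # observation stream

r = rng.integers(0, 2, N)                  # sampled regimes
mu, var = np.zeros(N), np.ones(N)          # per-particle Kalman statistics of z
w = np.full(N, 1.0 / N)

for y in ys:
    flip = rng.random(N) < P_switch        # sample r_k from its transition prior
    r = np.where(flip, 1 - r, r)
    a = np.where(r == 0, A[0], A[1])
    mu_p, var_p = a * mu, a**2 * var + Q   # Kalman predict, conditioned on r_0:k
    S = var_p + R                          # predictive variance of y
    lik = np.exp(-0.5 * (y - mu_p)**2 / S) / np.sqrt(S)
    w *= lik                               # weight by marginal likelihood p(y_k | r_0:k, y_1:k-1)
    w /= w.sum()
    K = var_p / S                          # Kalman update of the marginalized state
    mu, var = mu_p + K * (y - mu_p), (1 - K) * var_p
    print("posterior mean of z:", (w * mu).sum())
```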
Discussions • Structure of the ABC network: • A and C depend on B • yt can also be separated into 3 independent parts