CS498-EA Reasoning in AI, Lecture #18 Instructor: Eyal Amir Fall Semester 2009
Last Time • Time and uncertainty • Inference: filtering, prediction, smoothing • Hidden Markov Models (HMMs) • Model • Exact Reasoning • Dynamic Bayesian Networks • Model • Exact Reasoning
Inference Tasks • Filtering: • Belief state: probability of the current state given the evidence so far • Prediction: • Like filtering, but without new evidence • Smoothing: • Better estimates of past states • Most likely explanation: • The state sequence that best explains the evidence
Filtering (forward algorithm) (figure: chain Xt-1 → Xt → Xt+1 with evidence Et-1, Et, Et+1) • Predict step, then update step • Together they form the recursive step
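A sketch of the standard predict/update recursion behind the forward algorithm, writing P(X_{t+1} | x_t) for the transition model and P(e_{t+1} | X_{t+1}) for the sensor model (notation assumed):

Predict:  P(X_{t+1} \mid e_{1:t}) = \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})
Update:   P(X_{t+1} \mid e_{1:t+1}) \propto P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t})

The recursive step is the composition of the two: each new belief state depends only on the previous belief state and the latest evidence.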
Smoothing • Forward-backward algorithm
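A sketch of the forward-backward combination, with f denoting the forward (filtering) message and b the backward message (notation assumed, not from the slide):

P(X_k \mid e_{1:t}) \propto f_{1:k}(X_k)\, b_{k+1:t}(X_k), \qquad 1 \le k \le t
b_{k+1:t}(X_k) = \sum_{x_{k+1}} P(e_{k+1} \mid x_{k+1})\, b_{k+2:t}(x_{k+1})\, P(x_{k+1} \mid X_k)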
Most Likely Explanation • Finding the most likely path (figure: chain Xt-1 → Xt → Xt+1 with evidence Et-1, Et, Et+1) • Called the Viterbi algorithm
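The Viterbi recursion replaces the sum in filtering with a max over the previous state; a sketch, using m_{1:t} for the probability of the best path ending in each state (notation assumed):

m_{1:t+1}(X_{t+1}) = P(e_{t+1} \mid X_{t+1}) \max_{x_t} \big[ P(X_{t+1} \mid x_t)\, m_{1:t}(x_t) \big]

The most likely path is then recovered by backtracking through the argmax choices.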
Today • Dynamic Bayesian Networks • Exact Inference • Approximate Inference
Dynamic Bayesian Network (figure: a standard BN for time 0 and a standard BN for time 1) • A DBN is like a two-time-slice BN (2-TBN) • Uses the first-order Markov assumption
Dynamic Bayesian Network • Basic idea: • Copy state and evidence for each time step • Xt: set of unobservable (hidden) variables (e.g.: Pos, Vel) • Et: set of observable (evidence) variables (e.g.: Sens.A, Sens.B) • Notice: Time is discrete
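In a 2-TBN the transition model factors over the slice-(t+1) variables according to their parents; a generic sketch of the factorization (the DelC example later in the lecture instantiates it):

P(X_{t+1} \mid X_t) = \prod_i P\big(X^i_{t+1} \mid \mathrm{Pa}(X^i_{t+1})\big), \qquad \mathrm{Pa}(X^i_{t+1}) \subseteq X_t \cup X_{t+1}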
Inference in DBNs • Unroll: run standard BN inference in the unrolled network • Not efficient (cost depends on the sequence length)
Exact Inference in DBNs • No conditional independence remains after a few steps (figure: unrolled network over variables x1-x4 for slices 0-3) • Variable Elimination: • Add slice t+1, sum out slice t using variable elimination
Variable Elimination (figures: a chain s1-s5 unrolled over slices; the slice-t variables are summed out one at a time, eliminating s1, then s2, then s3, and so on)
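A minimal sketch of the "add slice t+1, sum out slice t" step for a toy two-variable slice, using numpy; the structure and all numbers below are made up for illustration and are not the lecture's example:

```python
import numpy as np

# Toy 2-TBN slice with two binary variables X1, X2 (hypothetical numbers).
# Assumed transition factors: P(X1' | X1) and P(X2' | X1, X2).
p_x1p_given_x1 = np.array([[0.9, 0.1],
                           [0.2, 0.8]])            # indexed [x1, x1']
p_x2p_given_x1x2 = np.array([[[0.7, 0.3],
                              [0.4, 0.6]],
                             [[0.5, 0.5],
                              [0.1, 0.9]]])        # indexed [x1, x2, x2']

def advance(belief):
    """Add slice t+1, then sum out slice t (the exact propagation step)."""
    # Joint over (x1, x2, x1', x2') = belief at time t times the transition factors
    joint = (belief[:, :, None, None]
             * p_x1p_given_x1[:, None, :, None]
             * p_x2p_given_x1x2[:, :, None, :])
    return joint.sum(axis=(0, 1))                  # new joint over (x1', x2')

belief = np.full((2, 2), 0.25)                     # uniform joint over (X1, X2) at time t
belief = advance(belief)
print(belief, belief.sum())                        # still a proper distribution
```

Even here the belief state is a joint over all slice variables; with n binary state variables it has 2^n entries, which is why exact DBN inference blows up and motivates the approximations below.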
DBN Representation: DelC

fT(Tt, Tt+1):
  Tt | Tt+1 = T   Tt+1 = F
  T  | 0.91       0.09
  F  | 0.0        1.0

fRHM(RHMt, RHMt+1):
  RHMt | RHMt+1 = T   RHMt+1 = F
  T    | 1.0          0.0
  F    | 0.0          1.0

fCR(Lt, CRt, RHCt, CRt+1):
  Lt  CRt  RHCt | CRt+1 = T   CRt+1 = F
  O   T    T    | 0.2         0.8
  E   T    T    | 1.0         0.0
  O   F    T    | 0.0         1.0
  E   F    T    | 0.0         1.0
  O   T    F    | 1.0         0.0
  E   T    F    | 1.0         0.0
  O   F    F    | 0.0         1.0
  E   F    F    | 0.0         1.0

2-TBN arcs: RHMt → RHMt+1, Mt → Mt+1, Tt → Tt+1, Lt → Lt+1, (Lt, CRt, RHCt) → CRt+1, RHCt → RHCt+1
Benefits of DBN Representation

Pr(RHMt+1, Mt+1, Tt+1, Lt+1, CRt+1, RHCt+1 | RHMt, Mt, Tt, Lt, CRt, RHCt)
  = fRHM(RHMt, RHMt+1) · fM(Mt, Mt+1) · fT(Tt, Tt+1) · fL(Lt, Lt+1) · fCR(Lt, CRt, RHCt, CRt+1) · fRHC(RHCt, RHCt+1)

• Only a few parameters, vs. 25,440 for the explicit matrix (a 160 × 160 transition matrix over the joint state space has 160 × 159 = 25,440 free parameters)
• Removes the global exponential dependence
DBN Myth • Bayesian network: a decomposed structure representing the full joint distribution • Does it imply an equally easy decomposition for the belief state? • No!
Tractable, approximate representation • Exact inference in DBNs is intractable • Need approximation • Maintain an approximate belief state • E.g., assume Gaussian processes • Boyen-Koller approximation: • Factored belief state
Idea • Use a decomposable representation for the belief state (pre-assume some independence)
Problem • What about the approximation errors? • They might accumulate and grow without bound…
Contraction property • Main properties of B-K approximation: • Under reasonable assumptions about the stochasticity of the process, every state transition results in a contraction of the distance between the two distributions by a constant factor • Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely
Basic framework • Definition 1: • Prior belief state: • Posterior belief state: • Monitoring task:
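A sketch of the standard definitions from [BK98], with σ denoting a belief state and r^{(t)} the observation at time t (notation assumed rather than copied from the slide):

Prior belief state:      \sigma^{(\cdot t)}(s) = P\big(S^{(t)} = s \mid r^{(1)}, \ldots, r^{(t-1)}\big)
Posterior belief state:  \sigma^{(t \cdot)}(s) = P\big(S^{(t)} = s \mid r^{(1)}, \ldots, r^{(t)}\big)
Monitoring task:         maintain the posterior belief state incrementally as each new observation r^{(t)} arrives.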
Simple contraction • Distance measure: • Relative entropy (KL-divergence) between the actual and the approximate belief state • Contraction due to O: • Contraction due to T (can we do better?):
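A sketch of the distance measure and the two contraction statements, with T denoting the transition step and O conditioning on an observation (notation assumed):

D(\varphi \parallel \psi) = \sum_s \varphi(s) \ln \frac{\varphi(s)}{\psi(s)}
Contraction due to T:  D(T\varphi \parallel T\psi) \le D(\varphi \parallel \psi)
Contraction due to O:  E_o\big[ D(\varphi|_o \parallel \psi|_o) \big] \le D(\varphi \parallel \psi)  (in expectation over the observation)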
Simple contraction (cont) • Definition: • Minimal mixing rate: • Theorem 3 (the single process contraction theorem): • For process Q, anterior distributions φ and ψ, ulterior distributions φ’ and ψ’,
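Sketched from the standard statement in [BK98], with Q(i → j) denoting the probability of moving from state i to state j (formulas assumed, not copied from the slide):

Minimal mixing rate:  \gamma_Q = \min_{i_1, i_2} \sum_{j} \min\big( Q(i_1 \to j),\; Q(i_2 \to j) \big)
Theorem 3 (contraction):  D(\varphi' \parallel \psi') \le (1 - \gamma_Q)\, D(\varphi \parallel \psi)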
Simple contraction (cont) • Proof Intuition:
Compound processes • The mixing rate can be very small for large processes • The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses • Fully independent subprocesses: • Theorem 5 of [BK98]: • For L independent subprocesses T_1, …, T_L, let γ_l be the mixing rate of T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1(t), …, S_L(t), and assume that ψ renders the S_l(t) marginally independent. Then:
Compound processes (cont) • Conditionally independent subprocesses • Theorem 6 of [BK98]: • For L subprocesses T_1, …, T_L, assume each process depends on at most r others and influences at most q others. Let γ_l be the mixing rate of T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1(t), …, S_L(t), and assume that ψ renders the S_l(t) marginally independent. Then:
Efficient, approximate monitoring • If each approximation (projection) step incurs an error bounded by ε, then the total error remains bounded (see the bound sketched below) • Conditioning on observations might introduce momentary errors, but the expected error still contracts
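Sketch of the bound: at every step a projection error of at most ε is added and the previously accumulated error is contracted by a factor (1 − γ), so the total error is at most the geometric series

\sum_{k \ge 0} (1 - \gamma)^k\, \varepsilon = \frac{\varepsilon}{\gamma}

which is bounded independently of how long the process runs.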
Approximate DBN monitoring • Algorithm (based on standard clique-tree inference): • Construct a clique tree from the 2-TBN • Initialize the clique tree with the conditional probabilities from the CPTs of the DBN • For each time step: • Create a working copy Y of the tree; create σ(t+1) • For each subprocess l, incorporate the marginal σ(t)[X_l(t)] into the appropriate factor in Y • Incorporate the evidence r(t+1) in Y • Calibrate the potentials in Y • For each l, query Y for the marginal over X_l(t+1) and store it in σ(t+1)
Solution: the BK algorithm (figure: break the belief state into smaller clusters, then alternate an exact propagation step with an approximation/marginalization step) • With mixing and a bounded projection error, the total error is bounded
Boyen-Koller Approximation • An example of variational inference with DBNs • Compute the posterior for time t from the (factored) state estimate at time t-1 • Assume the posterior has a factored form • Error is bounded
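A minimal sketch of the BK update for a toy DBN with two binary hidden chains and one observed variable, skipping the clique-tree machinery and using plain numpy; the structure, names, and numbers below are illustrative assumptions, not the lecture's example:

```python
import numpy as np

# Toy DBN: two binary hidden chains X1, X2; one binary observation E of X1.
# Assumed structure and made-up numbers, for illustration only.
p_x1p = np.array([[[0.9, 0.1],
                   [0.6, 0.4]],
                  [[0.3, 0.7],
                   [0.1, 0.9]]])    # P(X1' | X1, X2), indexed [x1, x2, x1']
p_x2p = np.array([[0.8, 0.2],
                  [0.3, 0.7]])      # P(X2' | X2), indexed [x2, x2']
p_e = np.array([[0.85, 0.15],
                [0.2, 0.8]])        # P(E | X1'), indexed [x1', e]

def bk_step(marg1, marg2, evidence):
    """One Boyen-Koller step: exact propagation of the factored belief, then projection."""
    # 1. Factored belief: approximate joint = product of the per-chain marginals
    belief = np.outer(marg1, marg2)                   # indexed [x1, x2]
    # 2. Exact step: add slice t+1 and sum out slice t
    joint = (belief[:, :, None, None]
             * p_x1p[:, :, :, None]                   # [x1, x2, x1', 1]
             * p_x2p[None, :, None, :])               # [1, x2, 1, x2']
    new_belief = joint.sum(axis=(0, 1))               # joint over (x1', x2')
    # 3. Condition on the observed value of E
    new_belief = new_belief * p_e[:, evidence][:, None]
    new_belief /= new_belief.sum()
    # 4. Projection (the BK approximation): keep only the marginals
    return new_belief.sum(axis=1), new_belief.sum(axis=0)

marg1 = marg2 = np.array([0.5, 0.5])
for e in [1, 1, 0, 1]:                                # made-up observation sequence
    marg1, marg2 = bk_step(marg1, marg2, e)
    print(marg1, marg2)
```

Step 4 is the only approximation: the joint over (X1', X2') is replaced by the product of its marginals, which is exactly the factored belief state the BK error analysis assumes.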