
CS498-EA Reasoning in AI Lecture #18


Presentation Transcript


  1. CS498-EA Reasoning in AI, Lecture #18 Instructor: Eyal Amir, Fall Semester 2009

  2. Last Time • Time and uncertainty • Inference: filtering, prediction, smoothing • Hidden Markov Models (HMMs) • Model • Exact Reasoning • Dynamic Bayesian Networks • Model • Exact Reasoning

  3. Inference Tasks • Filtering: • Belief state: probability of the current state given the evidence so far • Prediction: • Like filtering, but without evidence for the predicted steps • Smoothing: • Better estimate of past states • Most likely explanation: • The scenario that best explains the evidence

  4. Filtering (forward algorithm) [Figure: HMM slices Xt-1 → Xt → Xt+1 with evidence Et-1, Et, Et+1] • Recursive step: predict, then update • Predict: P(Xt+1 | e1:t) = Σxt P(Xt+1 | xt) P(xt | e1:t) • Update: P(Xt+1 | e1:t+1) ∝ P(et+1 | Xt+1) P(Xt+1 | e1:t)
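
A minimal sketch of this predict/update recursion in Python, assuming a discrete HMM given as a transition matrix T, a sensor matrix O, and a prior; the matrices and observations in the usage lines are made-up toy numbers, not from the lecture:

```python
import numpy as np

def forward_filter(T, O, prior, evidence):
    """Recursive filtering (forward algorithm) for a discrete HMM.
    T[i, j]  = P(X_{t+1}=j | X_t=i)    (transition model)
    O[j, e]  = P(E_t=e | X_t=j)        (sensor model)
    prior[i] = P(X_0=i)
    Returns the belief state P(X_t | e_{1:t}) after each observation."""
    belief = prior.copy()
    beliefs = []
    for e in evidence:
        predicted = T.T @ belief           # predict: sum_x P(X_{t+1}|x) P(x|e_{1:t})
        updated = O[:, e] * predicted      # update: weight by P(e_{t+1}|X_{t+1})
        belief = updated / updated.sum()   # normalize
        beliefs.append(belief)
    return beliefs

# Toy usage with a 2-state chain and a binary sensor (invented numbers).
T = np.array([[0.7, 0.3], [0.3, 0.7]])
O = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_filter(T, O, np.array([0.5, 0.5]), [0, 0, 1]))
```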

  5. Smoothing • Forward-backward algorithm: combine the forward (filtering) messages with a backward pass over the evidence
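
A sketch of forward-backward smoothing under the same toy HMM conventions as the filtering sketch above (T, O, prior); this is a generic textbook formulation, not code from the course:

```python
import numpy as np

def smooth(T, O, prior, evidence):
    """Forward-backward smoothing: P(X_t | e_{1:T}) for every time step t."""
    n = len(evidence)
    # Forward pass: unnormalized messages alpha_t(x) proportional to P(x, e_{1:t}).
    alphas, alpha = [], prior.copy()
    for e in evidence:
        alpha = O[:, e] * (T.T @ alpha)
        alphas.append(alpha)
    # Backward pass: beta_t(x) = P(e_{t+1:T} | X_t = x), combined with alpha_t.
    beta = np.ones_like(prior)
    smoothed = [None] * n
    for t in range(n - 1, -1, -1):
        post = alphas[t] * beta
        smoothed[t] = post / post.sum()
        beta = T @ (O[:, evidence[t]] * beta)
    return smoothed

# Toy usage (same invented model as the filtering sketch).
T = np.array([[0.7, 0.3], [0.3, 0.7]])
O = np.array([[0.9, 0.1], [0.2, 0.8]])
print(smooth(T, O, np.array([0.5, 0.5]), [0, 0, 1]))
```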

  6. Most Likely Explanation • Finding the most likely state path given the evidence [Figure: HMM slices Xt-1, Xt, Xt+1 with evidence Et-1, Et, Et+1] • Known as the Viterbi algorithm
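
A sketch of the Viterbi recursion under the same toy conventions (here the prior is taken to be over the first state X1); the model numbers are invented:

```python
import numpy as np

def viterbi(T, O, prior, evidence):
    """Most likely state path x_{1:T} given evidence e_{1:T} (Viterbi algorithm)."""
    n, S = len(evidence), len(prior)
    m = np.zeros((n, S))                 # m[t, x]: best path probability ending in x at t
    back = np.zeros((n, S), dtype=int)   # back[t, x]: best predecessor of x at time t
    m[0] = prior * O[:, evidence[0]]
    for t in range(1, n):
        scores = m[t - 1][:, None] * T               # scores[i, j] = m[t-1, i] * P(j | i)
        back[t] = scores.argmax(axis=0)
        m[t] = scores.max(axis=0) * O[:, evidence[t]]
    path = [int(m[-1].argmax())]                     # backtrack from the best final state
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

T = np.array([[0.7, 0.3], [0.3, 0.7]])
O = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(T, O, np.array([0.5, 0.5]), [0, 0, 1, 1]))
```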

  7. Today • Dynamic Bayesian Networks • Exact Inference • Approximate Inference

  8. Dynamic Bayesian Network [Figure: two standard BNs, one for time 0 and one for time 1, connected across the slices] • A DBN is like a 2-time-slice BN (2-TBN) • Uses the first-order Markov assumption

  9. Dynamic Bayesian Network • Basic idea: • Copy state and evidence for each time step • Xt: set of unobservable (hidden) variables (e.g.: Pos, Vel) • Et: set of observable (evidence) variables (e.g.: Sens.A, Sens.B) • Notice: Time is discrete

  10. Example

  11. Inference in DBN • Unroll the DBN over the sequence and run inference in the resulting BN • Not efficient: the cost depends on the sequence length
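
To illustrate the point about sequence length, a hedged sketch that re-runs inference over the whole unrolled chain after every new observation, so the cost of each update grows with t (contrast with the constant-per-step recursive filter on slide 4); all numbers are made up:

```python
import numpy as np

def filter_by_unrolling(T, O, prior, evidence_so_far):
    """Filtering by unrolling: rebuild and sweep the entire unrolled network
    each time a new observation arrives, instead of reusing the last belief.
    Each call walks all t slices, so the per-update cost grows with t."""
    belief = prior.copy()
    for e in evidence_so_far:                  # one pass over every unrolled slice
        belief = O[:, e] * (T.T @ belief)
        belief /= belief.sum()
    return belief

T = np.array([[0.7, 0.3], [0.3, 0.7]])
O = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 1]
for t in range(1, len(obs) + 1):               # redo the whole prefix at each step
    print(filter_by_unrolling(T, O, np.array([0.5, 0.5]), obs[:t]))
```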

  12. Exact Inference in DBNs • No conditional independence between the state variables after a few steps [Figure: an unrolled DBN with variables X1(t), ..., X4(t) for t = 0, ..., 3] • Variable elimination: • Add slice t+1, sum out slice t using variable elimination

  13. Exact Inference in DBNs [Figure: the same scheme on an unrolled chain of slices over state variables s1, ..., s5] • Variable elimination: • Add slice t+1, sum out slice t using variable elimination

  14. Variable Elimination [Figure: elimination step; the remaining variables in each slice are s2, ..., s5]

  15. Variable Elimination [Figure: next elimination step; the remaining variables are s3, s4, s5]

  16. Variable Elimination [Figure: next elimination step; the remaining variables are s4, s5]
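
A toy sketch of the "add slice t+1, sum out slice t" step, assuming binary state variables whose slice-(t+1) parents all live in slice t; the factor format (parents, child, cpt) and the numbers in the usage example are my own illustration, not the lecture's notation:

```python
from itertools import product

def advance_slice(belief, factors, n_vars):
    """One exact DBN step: multiply in the slice-(t+1) factors, sum out slice t.
    belief  : dict mapping a slice-t assignment (tuple of 0/1) to its probability
    factors : list of (parents, child, cpt), where cpt maps a tuple of slice-t
              parent values to P(child_{t+1} = 1 | parents)
    Returns a belief over slice t+1. Note it is a joint over the whole slice,
    which is why exact DBN inference blows up as the number of variables grows."""
    new_belief = {}
    for x_t, p in belief.items():                      # enumerate slice t
        for x_next in product((0, 1), repeat=n_vars):  # enumerate slice t+1
            q = p
            for parents, child, cpt in factors:
                p1 = cpt[tuple(x_t[i] for i in parents)]
                q *= p1 if x_next[child] == 1 else 1.0 - p1
            new_belief[x_next] = new_belief.get(x_next, 0.0) + q
    return new_belief

# Toy usage: two binary variables that each persist with probability 0.9.
factors = [((0,), 0, {(0,): 0.1, (1,): 0.9}),
           ((1,), 1, {(0,): 0.1, (1,): 0.9})]
belief = {x: 0.25 for x in product((0, 1), repeat=2)}
print(advance_slice(belief, factors, n_vars=2))
```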

  17. DBN Representation: DelC
  [Figure: 2-TBN with state variables RHM, M, T, L, CR, RHC, each with a copy at time t and time t+1, and local factors fRHM(RHMt,RHMt+1), fT(Tt,Tt+1), fCR(Lt,CRt,RHCt,CRt+1)]
  fT(Tt,Tt+1):        Tt=T: T(t+1)=T 0.91, T(t+1)=F 0.09;   Tt=F: T(t+1)=T 0.0, T(t+1)=F 1.0
  fRHM(RHMt,RHMt+1):  RHMt=T: R(t+1)=T 1.0, R(t+1)=F 0.0;   RHMt=F: R(t+1)=T 0.0, R(t+1)=F 1.0
  fCR(Lt,CRt,RHCt,CRt+1):
    L  CR  RHC | CR(t+1)=T  CR(t+1)=F
    O  T   T   |   0.2        0.8
    E  T   T   |   1.0        0.0
    O  F   T   |   0.0        1.0
    E  F   T   |   0.0        1.0
    O  T   F   |   1.0        0.1
    E  T   F   |   1.0        0.0
    O  F   F   |   0.0        1.0
    E  F   F   |   0.0        1.0

  18. Benefits of DBN Representation
  [Figure: the DelC 2-TBN next to a fragment of the equivalent flat transition matrix over the 160 joint states s1, ..., s160]
  Pr(RHMt+1, Mt+1, Tt+1, Lt+1, CRt+1, RHCt+1 | RHMt, Mt, Tt, Lt, CRt, RHCt) = fRHM(RHMt,RHMt+1) * fM(Mt,Mt+1) * fT(Tt,Tt+1) * fL(Lt,Lt+1) * fCR(Lt,CRt,RHCt,CRt+1) * fRHC(RHCt,RHCt+1)
  • Only a few parameters vs. 25,440 (160 x 159 entries) for the flat matrix
  • Removes the global exponential dependence
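
A small sketch of the factored transition model, reusing the (parents, child, cpt) format from the sketch after slide 16; the fT numbers come from the DelC slide, and the 25,440 figure is just the 160 x 159 free entries of a flat row-stochastic matrix:

```python
def transition_prob(x_t, x_next, factors):
    """P(x_{t+1} | x_t) for a factored 2-TBN: a product of local factors,
    one per next-slice variable, instead of one entry in a flat matrix."""
    p = 1.0
    for parents, child, cpt in factors:
        p1 = cpt[tuple(x_t[i] for i in parents)]
        p *= p1 if x_next[child] == 1 else 1.0 - p1
    return p

# Single-variable example using fT from the DelC slide: P(T'=T | T=T) = 0.91.
factors = [((0,), 0, {(0,): 0.0, (1,): 0.91})]
print(transition_prob((1,), (1,), factors))        # -> 0.91

# Parameter comparison: a flat matrix over 160 joint states needs 160 * 159
# free entries, versus a handful of CPT rows in the factored form.
print(160 * 159)                                   # -> 25440
```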

  19. DBN Myth • Bayesian Network: a decomposed structure to represent the full joint distribution • Does it imply easy decomposition for the belief state? • No!

  20. Tractable, approximate representation • Exact inference in DBNs is intractable • Need approximation • Maintain an approximate belief state • E.g., assume Gaussian processes • Boyen-Koller approximation: • Factored belief state

  21. Idea • Use a decomposable representation for the belief state (assume some independence in advance)

  22. Problem • What about the approximation errors? • They might accumulate and grow without bound…

  23. Contraction property • Main properties of B-K approximation: • Under reasonable assumptions about the stochasticity of the process, every state transition results in a contraction of the distance between the two distributions by a constant factor • Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely

  24. Basic framework • Definition 1: • Prior belief state: the distribution over the state at time t given the evidence up to time t-1 • Posterior belief state: the distribution over the state at time t given the evidence up to time t • Monitoring task: maintain the (posterior) belief state as each new observation arrives

  25. Simple contraction • Distance measure: • Relative entropy (KL-divergence) between the actual and the approximate belief state • Contraction due to O (conditioning on the observation does not increase the expected distance) • Contraction due to T (passing through the transition model never increases the distance; can we do better?)

  26. Simple contraction (cont) • Definition: • Minimal mixing rate of a stochastic process Q: γQ = min over state pairs (i1, i2) of Σj min{ Q(i1 → j), Q(i2 → j) } • Theorem 3 (the single process contraction theorem): • For process Q, anterior distributions φ and ψ, and ulterior distributions φ' and ψ' (obtained by propagating φ and ψ through Q): D[φ' || ψ'] ≤ (1 − γQ) · D[φ || ψ]
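
A numerical sanity check of the definition and the contraction claim above, on a made-up 3-state stochastic matrix (my own example, not from the lecture):

```python
import numpy as np
from itertools import product

def mixing_rate(Q):
    """Minimal mixing rate of a row-stochastic matrix Q:
    gamma_Q = min over state pairs (i1, i2) of sum_j min(Q[i1, j], Q[i2, j])."""
    n = Q.shape[0]
    return min(np.minimum(Q[i1], Q[i2]).sum() for i1, i2 in product(range(n), repeat=2))

def kl(p, q):
    """Relative entropy D(p || q)."""
    return float(np.sum(p * np.log(p / q)))

Q = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
phi = np.array([0.7, 0.2, 0.1])      # "actual" belief
psi = np.array([0.2, 0.3, 0.5])      # "approximate" belief
gamma = mixing_rate(Q)
# Propagating both beliefs through Q should contract their KL-divergence
# by at least a factor of (1 - gamma).
print(gamma, kl(Q.T @ phi, Q.T @ psi) <= (1 - gamma) * kl(phi, psi))
```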

  27. Simple contraction (cont) • Proof Intuition:

  28. Compound processes • The mixing rate can be very small for large processes • The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses • Fully independent subprocesses: • Theorem 5 of [BK98]: • For L independent subprocesses T1, …, TL, let γl be the mixing rate of Tl and let γ = minl γl. Let φ and ψ be distributions over S1(t), …, SL(t), and assume that ψ renders the Sl(t) marginally independent. Then:

  29. Compound processes (cont) • Conditionally independent subprocesses • Theorem 6 of [BK98]: • For L conditionally independent subprocesses T1, …, TL, assume each subprocess depends on at most r others and influences at most q others. Let γl be the mixing rate of Tl and let γ = minl γl. Let φ and ψ be distributions over S1(t), …, SL(t), and assume that ψ renders the Sl(t) marginally independent. Then:

  30. Efficient, approximate monitoring • If each approximation (projection) step incurs an error bounded by ε, and each transition contracts old error by (1 − γ), then • Total error ≤ ε + (1 − γ)ε + (1 − γ)²ε + … = ε / γ • => the error remains bounded indefinitely • Conditioning on observations might introduce momentary errors, but the expected error still contracts
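
A tiny numeric illustration of this geometric bound, with invented ε and γ: each step adds at most ε of new projection error while the transition contracts the old error by (1 − γ), so the accumulated error climbs toward ε/γ and stays there:

```python
def accumulated_error(eps, gamma, steps):
    """Worst-case error recursion: contract the old error, then add the new one."""
    err = 0.0
    for _ in range(steps):
        err = (1 - gamma) * err + eps
    return err

print(accumulated_error(eps=0.01, gamma=0.2, steps=100), 0.01 / 0.2)  # ~0.05 vs 0.05
```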

  31. Approximate DBN monitoring • Algorithm (based on standard clique tree inference): • Construct a clique tree from the 2-TBN • Initialize the clique tree with the conditional probabilities from the CPTs of the DBN • For each time step: • Create a working copy Y of the tree; create σ(t+1) • For each subprocess l, incorporate the marginal σ(t)[Xl(t)] into the appropriate factor in Y • Incorporate the evidence r(t+1) in Y • Calibrate the potentials in Y • For each l, query Y for the marginal over Xl(t+1) and store it in σ(t+1)
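
A hedged sketch of one monitoring step in the spirit of the algorithm above, but with the simplest possible choices: binary variables, each variable as its own cluster (so the projection keeps only single-variable marginals), brute-force enumeration instead of a calibrated clique tree, and a caller-supplied obs_likelihood function standing in for the evidence r(t+1). All of these simplifications are mine:

```python
from itertools import product

def bk_step(marginals, factors, obs_likelihood):
    """One Boyen-Koller style update with single-variable clusters.
    marginals      : list of P(X_l(t) = 1 | evidence so far) -- the factored belief
    factors        : per-variable 2-TBN factors, as in the earlier sketches
    obs_likelihood : maps a slice-(t+1) assignment to P(observation | assignment)
    Exact step on the product-form belief, then projection back to marginals."""
    n = len(marginals)
    joint = {}
    # Exact step: propagate the assumed-independent belief through the 2-TBN
    # and weight each next-slice assignment by the observation likelihood.
    for x_t in product((0, 1), repeat=n):
        p_t = 1.0
        for m, v in zip(marginals, x_t):
            p_t *= m if v == 1 else 1.0 - m
        for x_next in product((0, 1), repeat=n):
            q = p_t
            for parents, child, cpt in factors:
                p1 = cpt[tuple(x_t[i] for i in parents)]
                q *= p1 if x_next[child] == 1 else 1.0 - p1
            joint[x_next] = joint.get(x_next, 0.0) + q * obs_likelihood(x_next)
    z = sum(joint.values())
    # Approximation step: project the posterior onto independent marginals.
    return [sum(p for x, p in joint.items() if x[l] == 1) / z for l in range(n)]

# Toy usage: two persistent binary variables; the (made-up) sensor reports
# variable 0 and is right 80% of the time.
factors = [((0,), 0, {(0,): 0.1, (1,): 0.9}),
           ((1,), 1, {(0,): 0.1, (1,): 0.9})]
obs_likelihood = lambda x: 0.8 if x[0] == 1 else 0.2   # observed "variable 0 is on"
print(bk_step([0.5, 0.5], factors, obs_likelihood))
```

Using larger clusters, as in the clique-tree version above, trades more computation per step for a smaller projection error.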

  32. Solution: BK algorithm [Figure: the belief state broken into smaller clusters, alternating an exact propagation step with an approximation/marginalization step] • With mixing and a bounded projection error, the total error is bounded

  33. Boyen-Koller Approximation • An example of variational inference with DBNs • Compute the posterior for time t from the (factored) state estimate at time t-1 • Assume the posterior has a factored form • The error is bounded
