
CS 541: Artificial Intelligence


Presentation Transcript


  1. CS 541: Artificial Intelligence Lecture VIII: Temporal Probability Models

  2. Re-cap of Lecture VII • Exact inference by variable elimination: • polytime on polytrees, NP-hard on general graphs • space = time, very sensitive to topology • Approximate inference by LW (likelihood weighting), MCMC: • LW does poorly when there is lots of (downstream) evidence • LW, MCMC generally insensitive to topology • Convergence can be very slow with probabilities close to 1 or 0 • Can handle arbitrary combinations of discrete and continuous variables

  3. Re-cap of Lecture IX • Machine Learning • Classification (Naïve Bayes) • Decision trees • Regression (Linear, Smoothing) • Linear Separation (Perceptron, SVMs) • Non-parametric classification (KNN)

  4. Outline • Time and uncertainty • Inference: filtering, prediction, smoothing • Hidden Markov models • Kalman filters (a brief mention) • Dynamic Bayesian networks • Particle filtering

  5. Time and uncertainty

  6. Time and uncertainty • The world changes: we need to track and predict it • Diabetes management vs vehicle diagnosis • Basic idea: copy state and evidence variables for each time step • Xt = set of unobservable state variables at time t • E.g., BloodSugart, StomachContentst, etc. • Et = set of observable evidence variables at time t • E.g., MeasuredBloodSugart, PulseRatet, FoodEatent • This assumes discrete time; step size depends on problem • Notation: Xa:b denotes the sequence Xa, Xa+1, ..., Xb

  7. Markov processes (Markov chain) • Construct a Bayesian net from these variables: parents? • Markov assumption: Xt depends on a bounded subset of X0:t-1 • First-order Markov process: P(Xt|X0:t-1) = P(Xt|Xt-1) • Second-order Markov process: P(Xt|X0:t-1) = P(Xt|Xt-2, Xt-1) • Sensor Markov assumption: P(Et|X0:t, E0:t-1) = P(Et|Xt) • Stationary process: transition model P(Xt|Xt-1) and sensor model P(Et|Xt) are fixed for all t
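As a minimal illustration of a stationary first-order Markov chain, the sketch below propagates a rain probability through an assumed 0.7/0.3 transition model (the classic umbrella-world numbers, used here purely for illustration) until it settles at the stationary distribution:

```python
# Propagating a state distribution through a stationary first-order
# Markov chain.  The 0.7/0.3 rain-transition numbers are assumed
# illustrative values (the classic umbrella world).

# P(X_t = rain | X_{t-1} = key)
T = {True: 0.7, False: 0.3}

def step(p_rain):
    """One transition: P(X_t) = sum_x P(X_t | x) P(X_{t-1} = x)."""
    return T[True] * p_rain + T[False] * (1.0 - p_rain)

p = 1.0  # start certain it rains at t = 0
for _ in range(50):
    p = step(p)
print(round(p, 3))  # converges to the stationary distribution, 0.5
```

With a symmetric transition model the chain forgets its initial state: whatever p starts at, repeated application of `step` drives it to 0.5.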

  8. Example • First-order Markov assumption not exactly true in real world! • Possible fixes: • 1. Increase order of Markov process • 2. Augment state, e.g., add Tempt, Pressuret • Example: robot motion. • Augment position and velocity with Batteryt

  9. Inference

  10. Inference tasks • Filtering: P(Xt|e1:t) • Belief state--input to the decision process of a rational agent • Prediction: P(Xt+k|e1:t) for k > 0 • Evaluation of possible action sequences; • Like filtering without the evidence • Smoothing: P(Xk|e1:t) for 0 ≤ k < t • Better estimate of past states, essential for learning • Most likely explanation: arg max over x1:t of P(x1:t|e1:t) • Speech recognition, decoding with a noisy channel

  11. Filtering • Aim: devise a recursive state estimation algorithm, f1:t+1 = Forward(f1:t, et+1) • I.e., prediction + estimation. Prediction by summing out xt: • P(Xt+1|e1:t+1) = α P(et+1|Xt+1) Σxt P(Xt+1|xt) P(xt|e1:t) • where f1:t = P(Xt|e1:t) • Time and space requirements are constant (independent of t)
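The forward recursion (predict by summing out xt, then reweight by the sensor model and normalize) can be sketched directly; the 0.7/0.3 transition and 0.9/0.2 sensor probabilities below are the standard umbrella-world values, assumed for illustration:

```python
# Forward (filtering) update sketch, assuming umbrella-world numbers:
# P(rain_t | rain_{t-1}) = 0.7, P(umbrella | rain) = 0.9,
# P(umbrella | not rain) = 0.2.

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

T = {True: {True: 0.7, False: 0.3},   # P(X_{t+1} | X_t = key)
     False: {True: 0.3, False: 0.7}}
O = {True: 0.9, False: 0.2}           # P(e = umbrella | X)

def forward(f, evidence):
    # Predict: sum out X_t, then weight by the sensor likelihood.
    pred = {x1: sum(T[x0][x1] * f[x0] for x0 in f) for x1 in (True, False)}
    like = O if evidence else {x: 1 - O[x] for x in O}
    return normalize({x: like[x] * pred[x] for x in pred})

f = {True: 0.5, False: 0.5}           # prior P(X_0)
for e in [True, True]:                # umbrella observed on days 1 and 2
    f = forward(f, e)
print(round(f[True], 3))              # ≈ 0.883 after two umbrella days
```

Note that only the current message f is kept between steps, which is why the space cost does not grow with t.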

  12. Filtering example

  13. Smoothing • Divide evidence e1:t into e1:k, ek+1:t: • P(Xk|e1:t) = α P(Xk|e1:k) P(ek+1:t|Xk) = α f1:k bk+1:t • Backward message computed by a backwards recursion: • bk+1:t = Σxk+1 P(ek+1|xk+1) bk+2:t(xk+1) P(xk+1|Xk)

  14. Smoothing example • Forward-backward algorithm: cache forward messages along the way • Time linear in t (polytree inference), space O(t|f|)
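A two-step worked instance of forward-backward, again assuming the standard umbrella-world numbers (transitions 0.7/0.3, sensor 0.9/0.2) rather than anything stated on the slide:

```python
# Forward-backward smoothing sketch for a two-day umbrella world with
# assumed illustrative numbers: transitions 0.7/0.3, sensor 0.9/0.2.

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

T = {True: {True: 0.7, False: 0.3}, False: {True: 0.3, False: 0.7}}
O = {True: 0.9, False: 0.2}

# Forward message f_{1:1} after observing umbrella on day 1.
pred = {x: sum(T[x0][x] * 0.5 for x0 in (True, False)) for x in (True, False)}
f1 = normalize({x: O[x] * pred[x] for x in pred})

# Backward message b_{2:2}(x1) = sum_{x2} P(u2 | x2) * 1 * P(x2 | x1).
b2 = {x1: sum(O[x2] * 1.0 * T[x1][x2] for x2 in (True, False))
      for x1 in (True, False)}

# Smoothed estimate: P(X_1 | u_{1:2}) = alpha * f_{1:1} * b_{2:2}.
smoothed = normalize({x: f1[x] * b2[x] for x in f1})
print(round(smoothed[True], 3))   # ≈ 0.883
```

Caching the forward messages on the way out and multiplying them by backward messages on the way back is exactly the linear-time, O(t|f|)-space scheme the slide describes.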

  15. Most likely explanation • Most likely sequence ≠ sequence of most likely states! • Most likely path to each xt+1 = most likely path to some xt plus one more step • Identical to filtering, except f1:t is replaced by m1:t = max over x1...xt-1 of P(x1, ..., xt-1, Xt|e1:t) • I.e., m1:t(i) gives the probability of the most likely path to state i • Update has the sum replaced by a max, giving the Viterbi algorithm: • m1:t+1 = P(et+1|Xt+1) maxxt (P(Xt+1|xt) m1:t)
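A sketch of the Viterbi update, replacing the filtering sum with a max and keeping back-pointers to recover the sequence; the model numbers are assumed umbrella-world values (transitions 0.7/0.3, sensor 0.9/0.2), not taken from the slide:

```python
# Viterbi sketch: the filtering sum is replaced by a max, and
# back-pointers recover the most likely state sequence.
# Assumed illustrative umbrella-world model.

T = {True: {True: 0.7, False: 0.3}, False: {True: 0.3, False: 0.7}}
O = {True: 0.9, False: 0.2}

def viterbi(evidence, prior={True: 0.5, False: 0.5}):
    like = lambda e: O if e else {x: 1 - O[x] for x in O}
    m = {x: like(evidence[0])[x] * prior[x] for x in prior}
    back = []
    for e in evidence[1:]:
        ptr, nxt = {}, {}
        for x1 in (True, False):
            x0 = max((True, False), key=lambda x: m[x] * T[x][x1])
            ptr[x1] = x0
            nxt[x1] = like(e)[x1] * m[x0] * T[x0][x1]
        back.append(ptr)
        m = nxt
    # Follow back-pointers from the best final state.
    x = max(m, key=m.get)
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    return list(reversed(path))

# Umbrella on days 1-2, none on day 3, umbrella on days 4-5:
print(viterbi([True, True, False, True, True]))
# -> [True, True, False, True, True]
```

The rainy-days result illustrates the slide's warning: the jointly most likely sequence is found by maximizing over whole paths, not by picking the most likely state at each step independently.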

  16. Viterbi algorithm

  17. Hidden Markov models

  18. Hidden Markov models • Xt is a single, discrete variable (usually Et is too) • Domain of Xt is {1, ..., S} • Transition matrix Tij = P(Xt = j|Xt-1 = i), e.g., for the umbrella world T = [[0.7, 0.3], [0.3, 0.7]] • Sensor matrix Ot for each time step, diagonal elements P(et|Xt = i) • e.g., with U1 = true, O1 = diag(0.9, 0.2) • Forward and backward messages as column vectors: f1:t+1 = α Ot+1 T^T f1:t, bk+1:t = T Ok+1 bk+2:t • Forward-backward algorithm needs time O(S^2 t) and space O(St)
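The matrix form of the forward update, f1:t+1 = α Ot+1 T^T f1:t, can be sketched with plain 2x2 lists; the transition and sensor values are the assumed umbrella-world numbers used for illustration:

```python
# Matrix-form forward update, f_{1:t+1} = alpha * O_{t+1} T^T f_{1:t},
# using plain 2x2 lists and assumed umbrella-world numbers.

T = [[0.7, 0.3],            # T[i][j] = P(X_t = j | X_{t-1} = i)
     [0.3, 0.7]]
O_umbrella = [[0.9, 0.0],   # diagonal sensor matrix for e = umbrella
              [0.0, 0.2]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def forward_step(f):
    g = matvec(O_umbrella, matvec(transpose(T), f))
    z = sum(g)
    return [x / z for x in g]

f = [0.5, 0.5]
f = forward_step(f)     # day 1, umbrella
f = forward_step(f)     # day 2, umbrella
print(round(f[0], 3))   # ≈ 0.883
```

Each step is one S x S matrix-vector product per message, which is where the O(S^2 t) total time on the slide comes from.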

  19. Country dance algorithm • Can avoid storing all forward messages in smoothing by running the forward algorithm backwards: • f1:t+1 = α Ot+1 T^T f1:t implies f1:t = α' (T^T)^-1 Ot+1^-1 f1:t+1 • Algorithm: forward pass computes ft, backward pass computes bi and reconstructs fi


  29. Kalman filtering

  30. Kalman filtering • Modelling systems described by a set of continuous variables • E.g., tracking a bird flying: Xt = position and velocity (X, Y, Z and their derivatives) • Airplanes, robots, ecosystems, economies, chemical plants, planets, … • Gaussian prior, linear Gaussian transition model and sensor model

  31. Updating Gaussian distributions • Prediction step: if P(Xt|e1:t) is Gaussian, then the prediction P(Xt+1|e1:t) is Gaussian; if P(Xt+1|e1:t) is Gaussian, then the updated distribution P(Xt+1|e1:t+1) is Gaussian • Hence P(Xt|e1:t) is multivariate Gaussian N(μt, Σt) for all t • General (nonlinear, non-Gaussian) process: description of posterior grows unboundedly as t → ∞

  32. Simple 1-D example • Gaussian random walk on X-axis, transition s.d. σx, sensor s.d. σz

  33. General Kalman update • Transition and sensor models: P(xt+1|xt) = N(Fxt, Σx), P(zt|xt) = N(Hxt, Σz) • F is the matrix for the transition; Σx the transition noise covariance • H is the matrix for the sensors; Σz the sensor noise covariance • Filter computes the following update: • μt+1 = Fμt + Kt+1(zt+1 − HFμt), Σt+1 = (I − Kt+1H)(FΣtF^T + Σx) • where Kt+1 = (FΣtF^T + Σx)H^T (H(FΣtF^T + Σx)H^T + Σz)^-1 is the Kalman gain matrix • Σt and Kt are independent of the observation sequence, so compute offline
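In one dimension with identity F and H, the update collapses to a few scalar operations; the noise variances and observations below are made-up illustrative values:

```python
# One-dimensional Kalman update sketch (Gaussian random walk with
# transition noise var_x and sensor noise var_z; all numbers here
# are assumed for illustration).

def kalman_1d(mu, var, z, var_x=2.0, var_z=1.0):
    """One predict + update step; returns the new (mean, variance)."""
    p_var = var + var_x                      # predicted variance
    k = p_var / (p_var + var_z)              # scalar Kalman gain
    mu_new = mu + k * (z - mu)               # mean shifts toward z
    var_new = (1.0 - k) * p_var
    return mu_new, var_new

mu, var = 0.0, 1.0
for z in [2.5, 2.5, 2.5]:                    # repeated observation
    mu, var = kalman_1d(mu, var, z)
print(mu, var)                               # mean approaches 2.5
```

Note that the variance sequence depends only on var_x and var_z, never on the observed z values, which is the scalar version of the slide's point that Σt and Kt can be computed offline.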

  34. 2-D tracking example: filtering

  35. 2-D tracking example: smoothing

  36. Where it breaks • Cannot be applied if the transition model is nonlinear • Extended Kalman Filter models the transition as locally linear around xt = μt • Fails if the system is locally unsmooth

  37. Dynamic Bayesian networks

  38. Dynamic Bayesian networks • Xt, Et contain arbitrarily many variables in a replicated Bayes net

  39. DBNs vs. HMMs • Every HMM is a single-variable DBN; every discrete DBN is an HMM • Sparse dependencies: exponentially fewer parameters; • e.g., 20 Boolean state variables, three parents each • DBN has 20 × 2^3 = 160 parameters, HMM has 2^20 × 2^20 ≈ 10^12
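The parameter counts for the slide's example (20 Boolean state variables, 3 parents each) work out as follows:

```python
# Parameter-count comparison for 20 Boolean state variables with
# 3 Boolean parents each: per-variable CPTs vs. one flat transition
# matrix over the joint state.
n, parents = 20, 3
dbn_params = n * 2 ** parents          # one CPT row per parent setting
hmm_params = (2 ** n) ** 2             # full joint transition matrix
print(dbn_params, hmm_params)          # 160 vs 2^40 ≈ 10^12
```

The gap grows exponentially in n, which is the whole argument for factored (DBN) representations.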

  40. DBN vs. Kalman filters • Every Kalman filter model is a DBN, but few DBNs are KFs; • Real world requires non-Gaussian posteriors • E.g., where are bin Laden and my keys? What's the battery charge?

  41. Exact inference in DBNs • Naive method: unroll the network and run any exact algorithm • Problem: inference cost for each update grows with t • Rollup filtering: add slice t+1, “sum out” slice t using variable elimination • Largest factor is O(d^(n+1)), update cost O(d^(n+2)) (cf. HMM update cost O(d^(2n)))

  42. Likelihood weighting for DBN • Set of weighted samples approximates the belief state • LW samples pay no attention to the evidence! • fraction “agreeing” falls exponentially with t • Number of samples required grows exponentially with t

  43. Particle filtering

  44. Particle filtering • Basic idea: ensure that the population of samples ("particles") tracks the high-likelihood regions of the state-space • Replicate particles proportional to likelihood for et • Widely used for tracking non-linear systems, esp. vision • Also used for simultaneous localization and mapping in mobile robots, 10^5-dimensional state space

  45. Particle filtering • Assume consistent at time t: N(xt|e1:t)/N = P(xt|e1:t) • Propagate forward: populations of xt+1 are N(xt+1|e1:t) = Σxt P(xt+1|xt) N(xt|e1:t) • Weight samples by their likelihood for et+1: W(xt+1|e1:t+1) = P(et+1|xt+1) N(xt+1|e1:t) • Resample to obtain populations proportional to W: N(xt+1|e1:t+1)/N = α W(xt+1|e1:t+1)
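The propagate / weight / resample cycle can be sketched for a single Boolean state variable; the 0.7/0.3 transition and 0.9/0.2 sensor probabilities are assumed umbrella-world values, and with enough particles the estimate lands near the exact filtered answer:

```python
# Particle-filter sketch for a Boolean rain variable with assumed
# umbrella-world numbers: propagate each particle through the
# transition model, weight by the sensor model, then resample.
import random

random.seed(0)
T = {True: 0.7, False: 0.3}            # P(rain_{t+1} | X_t = key)
O = {True: 0.9, False: 0.2}            # P(umbrella | X)

def pf_step(particles, evidence):
    # 1. Propagate each particle through P(X_{t+1} | x_t).
    moved = [random.random() < T[x] for x in particles]
    # 2. Weight each particle by the likelihood of the evidence.
    w = [O[x] if evidence else 1 - O[x] for x in moved]
    # 3. Resample N particles with probability proportional to weight.
    return random.choices(moved, weights=w, k=len(moved))

particles = [random.random() < 0.5 for _ in range(5000)]
for e in [True, True]:                 # umbrella observed twice
    particles = pf_step(particles, e)
est = sum(particles) / len(particles)
print(round(est, 2))                   # close to the exact 0.883
```

Resampling is what keeps the population in the high-likelihood region: unweighted particles that disagree with the evidence die out instead of accumulating.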

  46. Particle filtering performance • Approximation error of particle filtering remains bounded over time, at least empirically; theoretical analysis is difficult

  47. How particle filter works & applications

  48. Summary • Temporal models use state and sensor variables replicated over time • Markov assumptions and stationarity assumption, so we need • Transition model P(Xt|Xt-1) • Sensor model P(Et|Xt) • Tasks are filtering, prediction, smoothing, most likely sequence; • All done recursively with constant cost per time step • Hidden Markov models have a single discrete state variable; • Widely used for speech recognition • Kalman filters allow n state variables, linear Gaussian, O(n^3) update • Dynamic Bayesian nets subsume HMMs, Kalman filters; exact update intractable • Particle filtering is a good approximate filtering algorithm for DBNs
