Artificial Intelligence: Representation and Problem Solving. Probabilistic Reasoning (4): Temporal Models 15-381 / 681 Instructors: Fei Fang (This Lecture) and Dave Touretzky feifang@cmu.edu Wean Hall 4126
Recap • Probability Models and Probabilistic Inference • Bayes’ Net • Exact Inference • Approximate inference: sampling • Today: Probabilistic reasoning over time Fei Fang
Outline • Temporal Probability Model • Hidden Markov Model (HMM) • Kalman Filter • Dynamic Bayes’ Net (DBN) • Particle filtering • Applications of DBN Special classes of DBN Fei Fang
Temporal Probabilistic Model • Why do we need a temporal probabilistic model? • The world changes over time, and what happens now impacts what will happen in the future • Stock market • Weather • Sometimes the world state becomes clearer as more evidence is collected over time • Diagnosis, e.g., cold vs chronic pharyngitis given coughing • How to model time? • View the world as time slices: discrete time steps Fei Fang
Temporal Probabilistic Model • State variables (often hidden) • State of the environment • Not directly observable but defines causal dynamics • Evidence variables • Caused by the state of the environment • How to model the example problems? • Stock market • Weather • Diagnosis, e.g., cold vs chronic pharyngitis given coughing Fei Fang
Temporal Probabilistic Model • Transition model: how the world (i.e., the state of the environment, $X_t$) evolves • Generally $P(X_t \mid X_{0:t-1})$ • Markov assumption: the current state depends on only a finite, fixed number of previous states • First-order Markov process: the current state depends only on the previous state and not on earlier states, i.e., $P(X_t \mid X_{0:t-1}) = P(X_t \mid X_{t-1})$ • Stationary assumption: the transition model $P(X_t \mid X_{t-1})$ is the same for all $t$ • Markov process or Markov Chain Andrei Andreyevich Markov (1856-1922) Fei Fang
Temporal Probabilistic Model • Sensor model / observation model: how the evidence variables ($E_t$) get their values (assume we get observations starting from $t=1$) • Generally $P(E_t \mid X_{0:t}, E_{1:t-1})$ • Sensor Markov assumption: $E_t$ depends only on the current state, i.e., $P(E_t \mid X_{0:t}, E_{1:t-1}) = P(E_t \mid X_t)$ • Initial state model: prior probability distribution at time 0, i.e., $P(X_0)$ • For a first-order Markov process with the sensor Markov assumption, the full joint probability distribution factorizes as shown below Fei Fang
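For reference, the standard factorization implied by these assumptions (notation as defined above):

$$P(X_{0:t}, E_{1:t}) \;=\; P(X_0)\,\prod_{i=1}^{t} P(X_i \mid X_{i-1})\, P(E_i \mid X_i)$$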
Inference in Temporal Probabilistic Model • Filtering / State estimation: posterior distribution over the current state given all evidence so far, $P(X_t \mid e_{1:t})$ • Prediction: posterior distribution over a future state given all evidence to date, $P(X_{t+k} \mid e_{1:t})$ for $k > 0$ • Smoothing: posterior distribution of a past state given all evidence up to the present, $P(X_k \mid e_{1:t})$ for $0 \le k < t$ • Most likely explanation: $\arg\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t})$ • Learning: learn the transition and sensor models from observations (not covered) Fei Fang
Outline • Temporal Probability Model • Hidden Markov Model (HMM) • Kalman Filter • Dynamic Bayes’ Net (DBN) • Applications of DBN Special classes of DBN Fei Fang
Hidden Markov Model • HMM • A first-order Markov process that is stationary • Satisfies the sensor Markov assumption • A single discrete random variable represents the (hidden) state • A single evidence variable • Specified by the transition model $P(X_t \mid X_{t-1})$, the sensor model $P(E_t \mid X_t)$, and the prior $P(X_0)$ Fei Fang
Example: Umbrella • Security guard in an underground installation • Want to infer whether it is raining based on whether the director brings an umbrella • Random Variables: • Hidden variable: whether it is raining, Domain = {true, false} • Evidence variable: whether an umbrella is brought, Domain = {true, false} Fei Fang
Hidden Markov Model • Matrix representation for the transition model (time invariant): a matrix $T$ whose entry $T_{ij}$ represents $P(X_t = j \mid X_{t-1} = i)$ • Matrix representation for the sensor model (depends on the evidence): for each observed value $e_t$, a diagonal matrix $O_t$ whose $i$-th diagonal entry represents $P(e_t \mid X_t = i)$ • With concrete values for the transition and sensor probabilities, $T$ and the $O$ matrices can be written out explicitly, as in the example below Fei Fang
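For instance, with illustrative umbrella-model parameters (assumed here for concreteness, not necessarily the values on the slides) $P(\text{rain}_t \mid \text{rain}_{t-1}) = 0.7$, $P(\text{rain}_t \mid \lnot\text{rain}_{t-1}) = 0.3$, $P(\text{umbrella}_t \mid \text{rain}_t) = 0.9$, $P(\text{umbrella}_t \mid \lnot\text{rain}_t) = 0.2$, and state order (rain, not rain):

$$T = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}, \qquad O_{\text{umbrella}} = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.2 \end{pmatrix}, \qquad O_{\lnot\text{umbrella}} = \begin{pmatrix} 0.1 & 0 \\ 0 & 0.8 \end{pmatrix}$$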
Inference in HMM: Filtering • Filtering / State estimation: posterior distribution over the current state given all evidence so far, $P(X_t \mid e_{1:t})$ Fei Fang
Inference in HMM: Filtering • Filtering / State estimation: posterior distribution over the current state given all evidence so far, $P(X_t \mid e_{1:t})$ • The prior $P(X_0)$ is given • How do we compute $P(X_1 \mid e_1)$? • How do we compute $P(X_{t+1} \mid e_{1:t+1})$ from $P(X_t \mid e_{1:t})$? Bayes' Rule: $P(Y \mid X) = P(X \mid Y)P(Y)/P(X)$ Product Rule: $P(X, Y) = P(X \mid Y)P(Y)$ Sum Rule: $P(X) = \sum_y P(X, y)$ Fei Fang
Inference in HMM: Filtering • The recursive filtering update follows from the Bayes' Rule, the Product Rule, and the Sum Rule, as sketched in the derivation below Fei Fang
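The standard forward-recursion derivation under the assumptions above (normalization constant written as $\alpha$):

$$\begin{aligned} P(X_{t+1} \mid e_{1:t+1}) &= P(X_{t+1} \mid e_{1:t}, e_{t+1}) \\ &= \alpha\, P(e_{t+1} \mid X_{t+1}, e_{1:t})\, P(X_{t+1} \mid e_{1:t}) && \text{(Bayes' Rule)} \\ &= \alpha\, P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t}) && \text{(sensor Markov assumption)} \\ &= \alpha\, P(e_{t+1} \mid X_{t+1}) \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t}) && \text{(Sum Rule, Product Rule, first-order Markov assumption)} \end{aligned}$$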
Inference in HMM: Filtering • So given $P(X_t \mid e_{1:t})$, we can compute $P(X_{t+1} \mid e_{1:t+1})$ according to the recursion above • Denote $P(X_t \mid e_{1:t})$ by $f_{1:t}$ (the forward message). Since $X_t$ is discrete valued, $f_{1:t}$ can be viewed as a vector whose $i$-th element is $P(X_t = i \mid e_{1:t})$ • The element-wise recursion is not a matrix multiplication, but using the matrix representation for the HMM it becomes one: $f_{1:t+1} = \alpha\, O_{t+1} T^{\top} f_{1:t}$ Fei Fang
Inference in HMM: Filtering • Filtering / State estimation: posterior distribution over the current state given all evidence so far, $P(X_t \mid e_{1:t})$ • Set $f_{1:0} = P(X_0)$ • Recursively compute $f_{1:t+1} = \text{Forward}(f_{1:t}, e_{t+1}) = \alpha\, O_{t+1} T^{\top} f_{1:t}$ (Forward operation) • Return $f_{1:t}$ • $f_{1:t+1}$ is determined by $f_{1:t}$ and $e_{t+1}$ alone, so each update takes a constant amount of computation (see the sketch below) Fei Fang
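Below is a minimal runnable sketch of this filtering loop, assuming the illustrative umbrella parameters introduced earlier (hypothetical values, not necessarily the ones used on the slides):

```python
import numpy as np

# Illustrative umbrella HMM (assumed parameters). State order: [rain, not rain].
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])        # T[i, j] = P(X_{t+1} = j | X_t = i)
sensor = np.array([[0.9, 0.1],
                   [0.2, 0.8]])   # sensor[i, e] = P(E_t = e | X_t = i); e: 0 = umbrella, 1 = no umbrella

def forward(f, e):
    """One filtering step: f_{1:t+1} = alpha * O_{t+1} T^T f_{1:t}."""
    O = np.diag(sensor[:, e])     # diagonal observation matrix for this evidence value
    f_new = O @ T.T @ f
    return f_new / f_new.sum()    # normalization (the alpha factor)

f = np.array([0.5, 0.5])          # prior P(X_0)
for e in [0, 0, 1]:               # evidence sequence: umbrella, umbrella, no umbrella
    f = forward(f, e)
    print(f)                      # P(X_t | e_{1:t}) after each observation
```

With these assumed numbers, the first update gives approximately (0.818, 0.182), the familiar textbook value for the umbrella example.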
Example: Umbrella • , • Given , • Given , Evidence: , Fei Fang
Quiz 1 • We have computed , if , what do we know about ? • A: • B: • C: Evidence: , Fei Fang
Inference in HMM: Find Most Likely Explanation • Most likely explanation: $\arg\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t})$ • Viterbi Algorithm (a dynamic-programming-based algorithm) • Applications: decoding in communications, speech recognition, bioinformatics, etc. Andrew James Viterbi (1935-present) Fei Fang
Viterbi Algorithm • State-time graph: each node represents a state value at a time step • Task: given the evidence sequence, find the most likely path • Intuition: if the most likely path reaching each state at time $t$ is known, then it is easy to find the most likely path reaching each state at time $t+1$ Fei Fang
Viterbi Algorithm • Based on the intuition, is it possible to compute the most likely explanation recursively from the filtered distribution $P(X_t \mid e_{1:t})$ alone? Unfortunately, no • Rewrite the target in terms of $\max_{x_{1:t}} P(x_{1:t}, X_{t+1} \mid e_{1:t+1})$; we notice that this quantity can be computed recursively Fei Fang
Viterbi Algorithm • The posterior probability of the most likely path ending at a given node at time $t+1$ can be found by checking the most likely paths ending at each node at time $t$ Fei Fang
Viterbi Algorithm • Denote $\max_{x_{1:t}} P(x_{1:t}, X_{t+1} \mid e_{1:t+1})$ as $m_{1:t+1}$ • Since $X_{t+1}$ is discrete valued, $m_{1:t+1}$ can be viewed as a vector whose $i$-th element is $\max_{x_{1:t}} P(x_{1:t}, X_{t+1} = i \mid e_{1:t+1})$ Fei Fang
Viterbi Algorithm • So each node in the state-time graph is associated with a value of $m$, which can be computed recursively: $m_{1:t+1} = P(e_{t+1} \mid X_{t+1}) \max_{x_t} \big( P(X_{t+1} \mid x_t)\, m_{1:t}(x_t) \big)$ • How to get the most likely path? • Highlight the "edge" that leads to the maximum on the state-time graph, then follow the highlighted edges backward from the best final node (Note: the normalization coefficient can be ignored) Fei Fang
Viterbi Algorithm Fei Fang
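A minimal sketch of the Viterbi recursion with back-pointers, using the same illustrative umbrella parameters as above (the function name and evidence encoding are assumptions for this sketch):

```python
import numpy as np

# Illustrative umbrella parameters (assumed, as above). State order: [rain, not rain].
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])                      # T[i, j] = P(X_{t+1} = j | X_t = i)
sensor = np.array([[0.9, 0.1],
                   [0.2, 0.8]])                 # sensor[i, e] = P(E_t = e | X_t = i)

def viterbi(evidence, prior):
    """Most likely hidden-state sequence for an evidence sequence (0 = umbrella, 1 = no umbrella)."""
    m = prior.copy()                            # message over X_0
    backpointers = []
    for e in evidence:
        scores = T.T * m                        # scores[j, i] = P(X_next = j | X_prev = i) * m[i]
        backpointers.append(scores.argmax(axis=1))
        m = sensor[:, e] * scores.max(axis=1)   # unnormalized max-product message over X_{t+1}
    # Follow back-pointers from the best final state to recover the path.
    path = [int(m.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    path.reverse()
    return path[1:]                             # drop the X_0 entry

print(viterbi([0, 0, 1, 0, 0], prior=np.array([0.5, 0.5])))  # most likely rain / no-rain sequence
```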
Quiz 2 • Assume an HMM with two hidden states and two observation values. What is the most probable hidden state sequence for the given observation sequence? • A: • B: • C: • D: Fei Fang
Outline • Temporal Probability Model • Hidden Markov Model (HMM) • Kalman Filter • Dynamic Bayes’ Net (DBN) • Applications of DBN Special classes of DBN Fei Fang
Kalman Filter • A glimpse of probabilistic modeling with continuous variables • Estimates the internal state of a linear dynamic system from a series of noisy measurements • We will only consider a simple case • State variable (hidden) • Evidence variable (observation) • First-order Markov process • Stationary process • Linear Gaussian distribution • Example: consumer confidence index • Measured by consumer survey Rudolf E. Kálmán (1930-2016) Fei Fang
Kalman Filter • Recall the 1-D Gaussian distribution • Mean $\mu$, variance $\sigma^2$ (standard deviation $\sigma$) • pdf: $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ • The prior $P(X_0)$, the transition model $P(X_{t+1} \mid X_t)$, and the sensor model $P(E_t \mid X_t)$ are all Gaussian Fei Fang
Kalman Filter • $P(X_{t+1} \mid e_{1:t+1})$ is also Gaussian • Let $\mu_t$ and $\sigma_t^2$ be the mean and variance of $P(X_t \mid e_{1:t})$; the update for $\mu_{t+1}$ and $\sigma_{t+1}^2$ is sketched below • Interpretation • $\mu_{t+1}$ is a weighted mean of the new observation $z_{t+1}$ and the predicted mean $\mu_t$. If the observation is unreliable ($\sigma_z^2$ is large), then $\mu_{t+1}$ is closer to $\mu_t$; otherwise it is closer to $z_{t+1}$ • $\sigma_{t+1}^2$ is independent of the observation (see detailed derivation in the textbook) Fei Fang
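A minimal sketch of the one-dimensional update, assuming the textbook's simple random-walk case ($x_{t+1} = x_t + \text{noise}$) with transition noise variance $\sigma_x^2$ and observation noise variance $\sigma_z^2$; the slide's exact model may differ:

```python
def kalman_1d_update(mu, sigma2, z, sigma_x2, sigma_z2):
    """One filtering step for a 1-D random-walk Kalman filter.

    mu, sigma2 : mean and variance of P(X_t | e_{1:t})
    z          : new observation z_{t+1}
    sigma_x2   : transition (process) noise variance
    sigma_z2   : observation noise variance
    Returns the mean and variance of P(X_{t+1} | e_{1:t+1}).
    """
    predicted_var = sigma2 + sigma_x2   # variance after the transition step
    mu_new = (predicted_var * z + sigma_z2 * mu) / (predicted_var + sigma_z2)
    sigma2_new = (predicted_var * sigma_z2) / (predicted_var + sigma_z2)
    return mu_new, sigma2_new

# Example: a noisy observation pulls the estimate toward z; how far depends on sigma_z2.
print(kalman_1d_update(mu=0.0, sigma2=1.0, z=2.5, sigma_x2=0.5, sigma_z2=1.0))
```

Note how the new mean is a weighted average of $z_{t+1}$ and $\mu_t$, and the new variance does not depend on the observed value, matching the interpretation above.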
Outline • Temporal Probability Model • Hidden Markov Model (HMM) • Kalman Filter • Dynamic Bayes’ Net (DBN) • Applications of DBN Special classes of DBN Fei Fang
Dynamic Bayesian Networks • A Bayes’ Net that represents a temporal probability model • Any temporal probability model can be represented as a DBN • DBN to represent knowledge of the domain and describe the structure of the problem First-order Markov Chain Second-order Markov Chain Fei Fang
Dynamic Bayesian Networks • For simplicity, here we consider the case where variables and their links are replicated from slice to slice and the DBN represents a first-order Markov process that is stationary • Such a DBN is specified by the prior $P(X_0)$, the transition model $P(X_{t+1} \mid X_t)$, and the sensor model $P(E_t \mid X_t)$ • HMMs and Kalman filters are special cases of DBNs • Any discrete-variable DBN can be cast as an HMM • By introducing a metavariable whose values are the joint assignments of all state variables • However, using a DBN ensures the sparsity of the model (see the count below) Fei Fang
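To make the sparsity argument concrete, a standard illustrative count (assuming 20 Boolean state variables, each with at most 3 parents in the previous slice):

$$\underbrace{20 \times 2^{3} = 160}_{\text{probabilities in the DBN transition model}} \qquad \text{vs.} \qquad \underbrace{2^{20} \times 2^{20} = 2^{40} \approx 10^{12}}_{\text{entries in the equivalent HMM transition matrix}}$$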
Inference in DBN • Exact inference • "Unroll" the network, apply exact inference techniques directly • Approximate inference • Variant of likelihood weighting (not very efficient) • Particle filtering (commonly used) Fei Fang
Particle Filtering (Not Required) • One step of particle filtering: given a set of samples of $X_t$ (approximating $P(X_t \mid e_{1:t})$) and the evidence $e_{t+1}$, get a set of samples of $X_{t+1}$ • Propagate: for each sample of $X_t$, sample a value of $X_{t+1}$ based on the transition model $P(X_{t+1} \mid X_t)$ • Weight: associate each new sample with a weight equal to the likelihood of the evidence, $P(e_{t+1} \mid X_{t+1})$ • Resample based on the weights to get a new set of samples for $X_{t+1}$ • Each new sample is selected from the propagated set; the probability of selecting a sample is proportional to its weight • Sampled with replacement, i.e., one sample can be selected multiple times Fei Fang
Particle Filtering (Not Required) • Approximate inference using particle filtering over multiple time steps: apply one-step particle filtering in every time step, recursively updating the set of samples • Initialize by sampling from the prior $P(X_0)$ (see the sketch below) Fei Fang
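A minimal sketch of the propagate-weight-resample loop for the umbrella model, again with assumed illustrative parameters (not necessarily those on the slides):

```python
import random

# Illustrative umbrella model. State True = rain; evidence True = umbrella observed.
P_RAIN = {True: 0.7, False: 0.3}       # P(rain_{t+1} | X_t)
P_UMBRELLA = {True: 0.9, False: 0.2}   # P(umbrella_t | X_t)

def particle_filter_step(particles, evidence):
    # Propagate: sample X_{t+1} for each particle from the transition model.
    propagated = [random.random() < P_RAIN[x] for x in particles]
    # Weight: likelihood of the observed evidence under each propagated particle.
    weights = [P_UMBRELLA[x] if evidence else 1 - P_UMBRELLA[x] for x in propagated]
    # Resample with replacement, proportionally to the weights.
    return random.choices(propagated, weights=weights, k=len(particles))

# Initialize from the prior P(X_0) (assumed uniform) and filter a short evidence sequence.
particles = [random.random() < 0.5 for _ in range(1000)]
for e in [True, True, False]:
    particles = particle_filter_step(particles, e)
    print(sum(particles) / len(particles))  # approximate P(rain_t | e_{1:t})
```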
Example: Umbrella (Not Required) Propagate Weight Resample Fei Fang
Particle Filtering (Not Required) • We can prove that if the initial samples approximate $P(X_t \mid e_{1:t})$, then the new samples approximate $P(X_{t+1} \mid e_{1:t+1})$ (see details in the textbook) • By induction, particle filtering is consistent: it provides the correct probabilities as the number of samples goes to infinity • In practice, particle filtering works very well Fei Fang
Quiz 3 (Not Required) • Using particle sampling for the Umbrella example, if in one step, we get 100 samples with and total weight , and 400 samples with and total weight after propagating and weighting (before resampling), which of the following best estimates the number of samples with after resampling? • A: • B: • C: • D: • Each new sample is selected from . The probability of sampling is proportional to its weight • Sampled with replacement, i.e., one item can be sampled multiple times Fei Fang
Applications of DBN: Place and Object Recognition Which are hidden variables? Torralba, et al. ICCV, 2003. Context-based vision system for place and object recognition Fei Fang
Applications of DBN: Place and Object Recognition low-level features • Use scene context to disambiguate object recognition • Inferring object types based on scene and object features • Context priming to decide which object detectors to run Torralba, et al. ICCV, 2003. Context-based vision system for place and object recognition Fei Fang
Applications of DBN: Infer and Predict Poaching Activity • Attacked • Not Attacked
Applications of DBN: Infer and Predict Poaching Activity • Domain knowledge • Poaching activity is impacted by ranger patrol effort, as well as features such as animal density • Detection probability is also impacted by ranger patrol effort and a subset of these features (DBN nodes in the figure: ranger patrol, probability of attack on target j, detection probability; features: area habitat, animal density, area slope, distance to rivers / roads, …) Nguyen et al. Capture: A new predictive anti-poaching tool for wildlife protection. In AAMAS, 2016
Applications of DBN: Infer and Predict Poaching Activity • : Whether there is poaching • : Ranger patrol effort • : Whether poaching sign is found • : features, e.g., distance from road, animal density etc Nguyen et al. Capture: A new predictive anti-poaching tool for wildlife protection. In AAMAS, 2016 Fei Fang
Applications of DBN: Predict Urban Crime • Opportunistic criminals: Wander around and seek opportunities to commit crimes • : #defenders (known) • : #criminals (hidden) • : #crimes (known) Fei Fang
Summary • Applications of DBN • Place and Object Recognition • Infer and Predict Poaching Activity • Predict Urban Crime Temporal Models Dynamic Bayes’ Net (DBN) Particle Filtering Hidden Markov Models (HMM) Kalman Filter Viterbi Algorithm Fei Fang
Acknowledgment • Some slides are borrowed from previous slides made by Tai Sing Lee Fei Fang
Backup Slides Material in the backup slides in this lecture is not required Fei Fang
Viterbi Algorithm Bayes’ Rule: Product Rule: Sum Rule: If and , , then Fei Fang