Hidden Markov Models Lirong Xia Tue, March 28, 2014
The “Markov”s we have learned so far • Markov decision process (MDP) • the transition probability depends only on the (state, action) pair at the previous step • Reinforcement learning • unknown transition probabilities/rewards • Markov models • Hidden Markov models
Markov Models • A Markov model is a chain-structured Bayes’ net (BN) • The conditional probabilities are the same at every time step (stationarity) • The value of X at a given time is called the state • As a BN: • Parameters: the initial distribution p(X1) and the transition probabilities p(Xt|Xt-1)
Computing the stationary distribution • p(X=sun) = p(X=sun|X-1=sun)·p(X=sun) + p(X=sun|X-1=rain)·p(X=rain) • p(X=rain) = p(X=rain|X-1=sun)·p(X=sun) + p(X=rain|X-1=rain)·p(X=rain)
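A minimal sketch, in Python, of iterating this fixed-point update until it converges; the transition numbers below are illustrative assumptions, not values from the slides.

```python
# Minimal sketch: iterate the fixed-point update until the belief stops changing.
# The transition numbers here are illustrative assumptions.
P = {  # P[prev][nxt] = p(X = nxt | X_-1 = prev)
    "sun":  {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def stationary(P, iters=1000):
    """Repeatedly apply p(X=s) = sum_prev p(X=s | X_-1=prev) * p(prev)."""
    states = list(P)
    belief = {s: 1.0 / len(states) for s in states}          # start uniform
    for _ in range(iters):
        belief = {s: sum(P[prev][s] * belief[prev] for prev in states)
                  for s in states}
    return belief

print(stationary(P))   # converges to roughly {'sun': 0.75, 'rain': 0.25}
```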
Hidden Markov Models • Hidden Markov models (HMMs) • Underlying Markov chain over states X • Effects (observations) at each time step • As a Bayes’ net:
Example • An HMM is defined by: • Initial distribution: p(X1) • Transitions: p(Xt|Xt-1) • Emissions: p(Et|Xt)
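As a rough sketch, the three ingredients can be stored as plain dictionaries; the umbrella-world numbers below are assumed placeholders, not values from the slides.

```python
# Illustrative parameter dictionaries for an umbrella-world HMM; all numbers are
# assumed placeholders for illustration only.
initial = {"rain": 0.5, "sun": 0.5}                 # p(X1)
transition = {                                      # transition[i][j] = p(Xt = j | Xt-1 = i)
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun":  {"rain": 0.3, "sun": 0.7},
}
emission = {                                        # emission[x][e] = p(Et = e | Xt = x)
    "rain": {"+u": 0.9, "-u": 0.1},
    "sun":  {"+u": 0.2, "-u": 0.8},
}
```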
Filtering / Monitoring • Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time • B(Xt) = p(Xt|e1:t) • We start with B(X) in an initial setting, usually uniform • As time passes, or we get observations, we update B(X)
Example: Robot Localization • Sensor model: never more than 1 mistake • Motion model: the robot may fail to execute an action, with small probability
HMM weather example: a question • [State-transition diagram over s (sun), c (cloudy), r (rain); emission model: p(w|s) = .1, p(w|c) = .3, p(w|r) = .8] • You have been stuck in the lab for three days (!) • On those days, your labmate was dry, wet, wet, respectively • What is the probability that it is now raining outside? • p(X3 = r | E1 = d, E2 = w, E3 = w)
Filtering • [Same weather HMM as on the previous slide] • Computationally efficient approach: first compute p(X1 = i, E1 = d) for all states i • then p(Xt, e1:t) = p(et | Xt) Σxt-1 p(xt-1, e1:t-1) p(Xt | xt-1)
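A hedged sketch of this recursion on the weather question above: the emission probabilities come from the text, while the initial distribution and the transition matrix are assumed for illustration (the actual values live in the slide’s state diagram).

```python
# Sketch of p(Xt, e_1:t) = p(et|Xt) * sum_{x_{t-1}} p(x_{t-1}, e_1:t-1) * p(Xt|x_{t-1}).
# Emission values are from the slide; init and trans are assumed for illustration.
states = ["s", "c", "r"]
init = {"s": 1 / 3, "c": 1 / 3, "r": 1 / 3}     # assumed uniform p(X1)
trans = {                                       # trans[i][j] = p(Xt = j | Xt-1 = i), assumed
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.3, "r": 0.5},
}
p_wet = {"s": 0.1, "c": 0.3, "r": 0.8}          # p(w|state) from the slide
emit = {i: {"w": p_wet[i], "d": 1 - p_wet[i]} for i in states}

def forward(observations):
    """Return alpha[i] = p(Xt = i, e_1:t) after the last observation."""
    alpha = {i: init[i] * emit[i][observations[0]] for i in states}
    for e in observations[1:]:
        alpha = {j: emit[j][e] * sum(alpha[i] * trans[i][j] for i in states)
                 for j in states}
    return alpha

alpha = forward(["d", "w", "w"])
total = sum(alpha.values())
print({i: alpha[i] / total for i in states})    # p(X3 | d, w, w); rain gets the largest mass
```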
Today • Formal algorithm for filtering • Elapse of time • compute p(Xt+1|e1:t) from p(Xt|e1:t) • Observe • compute p(Xt+1|e1:t+1) from p(Xt+1|e1:t) • Renormalization • Introduction to sampling
Elapse of Time • Assume we have the current belief p(Xt-1 | evidence up to t-1): B(Xt-1) = p(Xt-1|e1:t-1) • Then, after one time step passes: p(Xt|e1:t-1) = Σxt-1 p(Xt|xt-1) p(xt-1|e1:t-1) • Or, compactly: B’(Xt) = Σxt-1 p(Xt|xt-1) B(xt-1) • With the “B” notation, be careful about • what time step t the belief is about, • what evidence it includes
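A one-function sketch of the elapse-of-time update, assuming beliefs and transition probabilities are stored as nested dictionaries:

```python
# Sketch of B'(Xt) = sum_{x_{t-1}} p(Xt | x_{t-1}) * B(x_{t-1}).
def elapse_time(belief, trans):
    """belief: {state: B(x_{t-1})};  trans[i][j] = p(Xt = j | Xt-1 = i)."""
    return {j: sum(trans[i][j] * belief[i] for i in belief) for j in trans}
```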
Observe and renormalize • Assume we have the current belief p(Xt | previous evidence): B’(Xt) = p(Xt|e1:t-1) • Then: p(Xt|e1:t) ∝ p(et|Xt) p(Xt|e1:t-1) • Or: B(Xt) ∝ p(et|Xt) B’(Xt) • Basic idea: beliefs are reweighted by the likelihood of the evidence • Need to renormalize B(Xt)
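A matching sketch of the observe-and-renormalize step, under the same dictionary representation:

```python
# Sketch of B(Xt) ∝ p(et | Xt) * B'(Xt), renormalized to sum to 1.
def observe(belief_prior, evidence, emit):
    """belief_prior: {state: B'(Xt)};  emit[x][e] = p(e | Xt = x)."""
    unnormalized = {x: emit[x][evidence] * belief_prior[x] for x in belief_prior}
    total = sum(unnormalized.values())
    return {x: p / total for x, p in unnormalized.items()}
```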
Recap: The Forward Algorithm • We are given evidence at each time step and want to know B(Xt) = p(Xt|e1:t) • We can derive the following update: p(xt, e1:t) = p(et|xt) Σxt-1 p(xt|xt-1) p(xt-1, e1:t-1) • We can normalize as we go if we want to have p(xt|e1:t) at each time step, or just once at the end…
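A compact vectorized sketch of the same update that normalizes as it goes, so the running vector is p(Xt|e1:t) directly; the array layout (rows = states, observations as column indices) is an assumption for illustration.

```python
import numpy as np

# Forward pass that renormalizes at every step.
def forward_normalized(init, T, E, obs):
    """init: (S,) with p(X1);  T[i, j] = p(Xt = j | Xt-1 = i);  E[x, e] = p(e | x)."""
    f = init * E[:, obs[0]]
    f /= f.sum()                       # normalize after folding in e1
    for e in obs[1:]:
        f = E[:, e] * (T.T @ f)        # observe * elapse-time
        f /= f.sum()                   # normalize as we go
    return f
```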
Observe and time elapse • Want to know B(Rain2) = p(Rain2|+u1,+u2) • Elapse time from B(Rain1), then observe +u2 and renormalize
Online Belief Updates • Each time step, we start with p(Xt-1 | previous evidence): • Elapse of time: B’(Xt) = Σxt-1 p(Xt|xt-1) B(xt-1) • Observe: B(Xt) ∝ p(et|Xt) B’(Xt) • Renormalize B(Xt) • Problem: space is O(|X|) and time is O(|X|²) per time step • what if the state is continuous?
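Putting the two steps together, a sketch of the online update loop, reusing the elapse_time and observe sketches above:

```python
# Online filtering loop built from the elapse_time and observe sketches above.
def filter_online(initial, trans, emit, evidence):
    """Return B(Xt) = p(Xt | e_1:t) for t = 1..len(evidence)."""
    belief = observe(initial, evidence[0], emit)      # fold e1 into p(X1)
    beliefs = [belief]
    for e in evidence[1:]:
        belief = elapse_time(belief, trans)           # B'(Xt) from B(Xt-1)
        belief = observe(belief, e, emit)             # B(Xt), renormalized
        beliefs.append(belief)
    return beliefs

# e.g. filter_online(initial, transition, emission, ["+u", "+u"])
# with the umbrella-world parameter dictionaries sketched earlier.
```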
Continuous probability space • Real-world robot localization
Approximate Inference • Sampling is a hot topic in machine learning, and it’s really simple • Basic idea: • Draw N samples from a sampling distribution S • Compute an approximate posterior probability • Show this converges to the true probability P • Why sample? • Learning: get samples from a distribution you don’t know • Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)
Prior Sampling Samples: +c, -s, +r, +w -c, +s, -r, +w
Prior Sampling (without evidence) • This process generates samples with probability SPS(x1,…,xn) = Πi p(xi | parents(Xi)) = p(x1,…,xn), i.e. the BN’s joint probability • Let NPS(x1,…,xn) be the number of samples of an event • Then limN→∞ NPS(x1,…,xn)/N = p(x1,…,xn) • I.e., the sampling procedure is consistent
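A sketch of prior sampling for the cloudy/sprinkler/rain/wet-grass network used on these slides; the CPT numbers are assumed illustrative values, not read off the slide.

```python
import random

# Prior sampling: sample each variable in topological order from its CPT.
# The CPT numbers below are assumed for illustration.
def flip(p):
    return random.random() < p

def prior_sample():
    """Sample (C, S, R, W) in topological order."""
    c = flip(0.5)                          # p(+c) = 0.5
    s = flip(0.1 if c else 0.5)            # p(+s | c)
    r = flip(0.8 if c else 0.2)            # p(+r | c)
    if s and r:
        p_w = 0.99
    elif s or r:
        p_w = 0.90
    else:
        p_w = 0.01
    w = flip(p_w)                          # p(+w | s, r)
    return c, s, r, w

samples = [prior_sample() for _ in range(10000)]
```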
Example • We’ll get a bunch of samples from the BN: +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w • If we want to estimate p(W) • We have counts <+w:4, -w:1> • Normalize to get p(W) ≈ <+w:0.8, -w:0.2> • This will get closer to the true distribution with more samples • Can estimate anything else, too • What about p(C|+w)? p(C|+r,+w)? p(C|-r,-w)? • Fast: can use fewer samples if less time (what’s the drawback?)
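Estimating p(W) is just counting and normalizing; a sketch reusing the samples list from the block above:

```python
from collections import Counter

# Count W outcomes over the samples from the prior-sampling sketch, then normalize.
counts = Counter("+w" if w else "-w" for (_, _, _, w) in samples)
total = sum(counts.values())
print({k: v / total for k, v in counts.items()})   # e.g. counts <+w:4, -w:1> give <0.8, 0.2>
```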
Rejection Sampling • Let’s say we want p(C) • No point keeping all samples around • Just tally counts of C as we go • Let’s say we want p(C|+s) • Same thing: tally C outcomes, but ignore (reject) samples which don’t have S=+s • This is called rejection sampling • It is also consistent for conditional probabilities (i.e., correct in the limit) +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w
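A rejection-sampling sketch for p(C|+s), reusing prior_sample from the sketch above:

```python
from collections import Counter

# Rejection sampling: tally C outcomes, rejecting samples that disagree with S = +s.
def rejection_sample(n=100_000):
    counts = Counter()
    for _ in range(n):
        c, s, r, w = prior_sample()
        if not s:                          # reject: evidence S = +s not matched
            continue
        counts["+c" if c else "-c"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

print(rejection_sample())                  # estimate of p(C | +s)
```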
Likelihood Weighting • Problem with rejection sampling: • If evidence is unlikely, you reject a lot of samples • You don’t exploit your evidence as you sample • Consider p(B|+a): samples -b,-a -b,-a -b,-a -b,-a +b,+a (most are rejected) • Idea: fix evidence variables and sample the rest: samples -b,+a -b,+a -b,+a -b,+a +b,+a • Problem: the sample distribution is not consistent! • Solution: weight each sample by the probability of the evidence given its parents
Likelihood Weighting Samples: +c, +s, +r, +w ……
Likelihood Weighting • Sampling distribution if z is sampled and e is fixed evidence: SWS(z,e) = Πi p(zi | parents(Zi)) • Now, samples have weights: w(z,e) = Πi p(ei | parents(Ei)) • Together, the weighted sampling distribution is consistent: SWS(z,e)·w(z,e) = p(z,e)
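A likelihood-weighting sketch for p(C | +s, +w) on the same cloudy/sprinkler/rain/wet-grass network, with the same assumed CPTs and reusing flip from the prior-sampling sketch; only the non-evidence variables C and R are sampled, and each sample is weighted by p(+s|c)·p(+w|+s,r).

```python
# Likelihood weighting: fix the evidence S = +s, W = +w, sample C and R,
# and weight each sample by the probability of the evidence given its parents.
def likelihood_weighting(n=100_000):
    totals = {"+c": 0.0, "-c": 0.0}
    for _ in range(n):
        weight = 1.0
        c = flip(0.5)                      # sample C from p(C)
        weight *= 0.1 if c else 0.5        # evidence S = +s: multiply by p(+s | c)
        r = flip(0.8 if c else 0.2)        # sample R from p(R | c)
        weight *= 0.99 if r else 0.90      # evidence W = +w: multiply by p(+w | +s, r)
        totals["+c" if c else "-c"] += weight
    z = sum(totals.values())
    return {k: v / z for k, v in totals.items()}

print(likelihood_weighting())              # estimate of p(C | +s, +w)
```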
Ghostbusters HMM • p(X1) = uniform • p(X|X’) = usually move clockwise, but sometimes move in a random direction or stay in place • p(Rij|X) = same sensor model as before: red means close, green means far away. [Figures: p(X1); p(X|X’=<1,2>)]
Example: Passage of Time • As time passes, uncertainty “accumulates” [belief maps at T = 1, T = 2, T = 5] • Transition model: ghosts usually go clockwise
Example: Observation • As we get observations, beliefs get reweighted and uncertainty “decreases” [belief maps before and after the observation]