Probabilistic Reasoning Over Time (HMM and Kalman filter) Based on CS570 Class Note of Year 2004, Prof. Bongki Sin's tutorial (Year 2003), and Dr. Sung Jung Cho's tutorial (Year 2005)
Contents • Markov Models • Hidden Markov Models • HMMs as Generative Processes • Markov Assumptions for HMMs • The 3 Problems of HMMs • HMMs for Speech Recognition • Kalman filters
Markov Process • Stochastic process over a temporal sequence • The probability distribution of the variable q at time t depends on the variables q at times t-1 to 1 • First-order Markov process • The state transition depends only on the previous state: • P(q_t = j | q_{t-1} = i, q_{t-2} = k, …) = P(q_t = j | q_{t-1} = i) • The state transition is independent of time: • a_ij = P(q_t = j | q_{t-1} = i)
Markov Models • Markov Model • Model of a Markov process with discrete states • Given the observation sequence, the state sequence is uniquely determined • Ex. the probability of the state sequence 's1 s3 s1 s2 s2 s3' given the observation sequence 'A C A B B C' is 1
Markov Models (Graphical View) • A Markov model: • A Markov model unfolded in time:
Example of Markov Model • Markov chain with 3 states • 3 states : sunny, cloudy, rain • Transition probabilities P(weather of tomorrow | weather of today):
  today \ tomorrow   sunny   cloudy   rain
  sunny               0.8     0.1     0.1
  cloudy              0.2     0.6     0.2
  rain                0.3     0.3     0.4
Example of Markov Model (cont') • Probability of a sequence S • Compute the product of successive transition probabilities • Ex. What is the weather for the next 2 days (today : sunny)? • P(sunny, cloudy, rain) = P(sunny) P(cloudy|sunny) P(rain|cloudy) = 1.0 x 0.1 x 0.2 = 0.02 • P(sunny, sunny, sunny) = P(sunny) P(sunny|sunny) P(sunny|sunny) = 1.0 x 0.8 x 0.8 = 0.64 • Possible answer : sunny-sunny with 64%
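As a minimal sketch, the sequence-probability computation above can be written out directly; the state names and transition numbers come from the slide's table, and the function name is just illustrative:

```python
# The 3-state weather Markov chain from the slide's transition table.
# A[today][tomorrow] = P(tomorrow | today)
A = {
    "sunny":  {"sunny": 0.8, "cloudy": 0.1, "rain": 0.1},
    "cloudy": {"sunny": 0.2, "cloudy": 0.6, "rain": 0.2},
    "rain":   {"sunny": 0.3, "cloudy": 0.3, "rain": 0.4},
}

def sequence_probability(seq, initial_prob=1.0):
    """Probability of a state sequence: product of successive transitions."""
    p = initial_prob
    for today, tomorrow in zip(seq, seq[1:]):
        p *= A[today][tomorrow]   # first-order Markov step
    return p

print(sequence_probability(["sunny", "cloudy", "rain"]))    # the 0.02 example above
print(sequence_probability(["sunny", "sunny", "sunny"]))    # the 0.64 example above
```

Each row of the transition table sums to 1, so this is a proper Markov chain.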
Sequential Data • Often highly variable, but has an embedded structure • Information is contained in the structure
More examples • Text, on-line handwriting, music notes, DNA sequences, program code main() { char q=34, n=10, *a="main() { char q=34, n=10, *a=%c%s%c; printf(a,q,a,q,n);}%c"; printf(a,q,a,q,n); }
Why HMM? • Because the HMM is a very good model for such patterns! • highly variable spatiotemporal data sequence • often unclear, uncertain, and incomplete • Because it is very successful in many applications! • Because it is quite easy to use! • Tools already exist…
Time Series Example • Representation • X = x1 x2 x3 x4 x5 … xT-1 xT = s p iy iy iy ch ch ch ch
Hidden Markov Model • Hidden Markov Model • The state is not observed (hidden) • Only a symptom (output) of the state is observable • Transition probabilities between states • Depend only on the previous state: P(q_t = i | q_{t-1} = j) • Emission probabilities • Depend only on the current state: P(x_t | q_t = i) (where x_t is observed)
Markov Assumptions • Emissions • The probability to emit x_t at time t in state q_t = i does not depend on anything else: P(x_t | q_t = i, q_{t-1}, …, q_1, x_{t-1}, …, x_1) = P(x_t | q_t = i) • Transitions • The probability to go from state j to state i at time t does not depend on anything else: P(q_t = i | q_{t-1} = j, q_{t-2}, …, q_1) = P(q_t = i | q_{t-1} = j) • The probability does not depend on the time t: a_ij = P(q_t = i | q_{t-1} = j) for all t
Hidden Markov Models (Graphical View) • A hidden Markov model: • A hidden Markov model unfolded in time:
HMMs as Generative Processes • An HMM can be used to generate sequences • Define a set of starting states with initial probabilities P(q_0 = i) • Define a set of final states • For each sequence to generate: • 1. Select an initial state j according to P(q_0) • 2. Select the next state i according to P(q_t = i | q_{t-1} = j) • 3. Emit an output according to the emission distribution P(x_t | q_t = i) • 4. If i is a final state, stop; otherwise loop to step 2
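The generation loop above can be sketched as follows. The 2-state heads/tails HMM and all of its probabilities are illustrative assumptions, not numbers from the slides, and for simplicity a fixed sequence length stands in for the final-state stopping rule:

```python
import random

# A hypothetical 2-state heads/tails HMM; every number is illustrative.
pi = [0.5, 0.5]                  # initial probabilities P(q_0 = i)
A  = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j] = P(q_t = j | q_{t-1} = i)
B  = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k] = P(emit symbol k in state i)
symbols = ["H", "T"]

def sample(dist):
    """Draw an index from a discrete probability distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

def generate(T):
    """Pick a start state, then alternately emit a symbol and transition."""
    state, out = sample(pi), []
    for _ in range(T):
        out.append(symbols[sample(B[state])])   # emit from the current state
        state = sample(A[state])                # move to the next state
    return out

print(generate(10))
```

Only the emitted H/T sequence is returned; the visited state sequence stays hidden, which is exactly the point of the coin-toss and urn-and-ball examples below.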
Coin Toss Model • 2-coins model • Description • States S = {S1, S2} : two different biased coins (hidden) • Each state is characterized by a probability distribution over heads and tails • State transitions are characterized by a state transition matrix • Observation symbols V = {H, T} (H: head, T: tail); the toss outcomes are given (observed), the coin in use is hidden
Urn and Ball Model • Each urn contains colored balls (4 distinct colors) • Basic step • Choose an urn according to some probabilistic procedure • Get a ball from the urn • Record (observe) its color • Replace the ball • Repeat the above procedure • The colors of the selected balls are observed, but the sequence of chosen urns is hidden
Summary of the Concept • Markov chain process (hidden state sequence) • Output process (observed symbols)
HMM Characterization • λ = (A, B, π) • A : state transition probability distribution { a_ij | a_ij = P(q_{t+1} = j | q_t = i) } • B : symbol output/observation probability distribution { b_j(v) | b_j(v) = P(x_t = v | q_t = j) } • π : initial state distribution { π_i | π_i = P(q_1 = i) }
Graphical Example • A 4-state left-to-right HMM over the output symbols s, p, iy, ch • Example observation sequence: s p iy ch iy p ch ch iy p s
π = [ 1.0 0.0 0.0 0.0 ]
A = [ 0.6 0.4 0.0 0.0
      0.0 0.5 0.5 0.0
      0.0 0.0 0.7 0.3
      0.0 0.0 0.0 1.0 ]
B = [ 0.2 0.2 0.0 0.6 …
      0.0 0.2 0.5 0.3 …
      0.0 0.8 0.1 0.1 …
      0.6 0.0 0.2 0.2 … ]   (rows: states 1-4, columns: output symbols)
The 3 Problems of HMMs • The HMM model gives rise to 3 different problems: • The Evaluation Problem • Given an HMM parameterized by λ, compute the likelihood of a sequence X: P(X | λ) • The Decoding Problem • Given an HMM parameterized by λ, compute the optimal path Q through the state space given a sequence X: Q* = argmax_Q P(Q | X, λ) • The Learning Problem • Given an HMM parameterized by λ and a set of sequences X_n, select the parameters such that: λ* = argmax_λ Π_n P(X_n | λ)
The Evaluation Problem: Finding the Probability of an Observation • Sphinx quiz • A sphinx lives in a castle and proposes a quiz. • Unseen to you, the sphinx shows a card of one of 4 kinds (spade, heart, diamond, clover) every day. • Which card is chosen depends on her feeling that day. • The pattern of feeling changes and the card preference for each feeling are known. • After 3 cards are shown, you must give the probability of the observation sequence
The Evaluation Problem: The Straightforward Way • Straightforward way • Enumerate every possible state sequence of length T (the number of observations) • P( ) = P( ) + P( ) + … + P( ) • Time complexity : about 2 * T * N^T operations • This time complexity is too high • Idea: use the probabilities of partial observations instead
The Evaluation Problem: Forward Variable Approach • Forward variable • Save the probability of each partial observation sequence in a state lattice • Forward variable at S_j • Use the forward variables of the previous states • Multiply each previous forward variable by the transition probability and the emission probability • Sum all of these terms
The Evaluation Problem: Forward Variable Approach • Forward variable • Probability of having generated the sequence O_1 … O_t and being in state i at time t: α_t(i) = P(O_1 … O_t, q_t = i | λ)
The Evaluation Problem: Forward Variable Approach • Reminder: α_t(j) = [ Σ_i α_{t-1}(i) a_ij ] b_j(O_t) • Initial condition: α_1(i) = π_i b_i(O_1) -> prior probability of each state i • Compute α_t(i) for each state i and each time t of a given sequence • Compute the likelihood by summing the α_T(i)'s: P(O | λ) = Σ_i α_T(i)
The Evaluation Problem: Forward Variable Approach • Let's do it • Assume prior probabilities P( ) = P( ) = .5 • α( ,1) = P( ) * P( | ) = .5 * .2 • α( ,1) = P( ) * P( | ) = .5 * .1 • α( ,2) = α( ,1) * P( | ) * P( | ) + α( ,1) * P( | ) * P( | ) • …
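The forward recursion can be sketched as below. The priors (.5, .5) and the factors visible in the worked example (.2, .1, .8, .6) are echoed in the parameters; the remaining entries of A and B are illustrative assumptions:

```python
# Illustrative 2-state, 2-symbol model.
pi = [0.5, 0.5]
A  = [[0.8, 0.2], [0.6, 0.4]]    # a_ij = P(q_t = j | q_{t-1} = i)
B  = [[0.2, 0.8], [0.1, 0.9]]    # b_i(k) = P(symbol k | state i)

def forward(obs):
    """Evaluation problem: P(obs | model) via the forward recursion."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]          # initialization
    for x in obs[1:]:                                         # induction
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][x]
                 for j in range(N)]
    return sum(alpha)                                         # termination

print(forward([0, 1, 1]))   # likelihood of a 3-symbol observation sequence
```

Each induction step costs O(N^2), so the whole computation is O(N^2 T) instead of the O(N^T) enumeration.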
The Decoding Problem: Finding the Best State Sequence • Sphinx quiz • The sphinx changes the quiz. • Same conditions as before • After 3 cards are shown, you must find the sequence of her feelings (the most likely state sequence) • Answer is : ? ? …
The Decoding Problem: Choosing the Individually Most Likely States • Find the individually most likely state at each time • Find the most likely first state, the most likely second state, and so on • In the quiz • We get a sequence of individually chosen states • Problem • No guarantee that the path is a valid one when the HMM has state transitions with zero probability
The Decoding Problem: Viterbi Algorithm • Find the single best state sequence (path) • Maximize P(Q | X, λ), i.e. maximize P(Q, X | λ) • Based on dynamic programming • Dynamic programming • Similar to a shortest-path algorithm • Use the "Viterbi variables" of the previous states • Each holds the maximum probability of the partial sequence, along with the state sequence that achieves it • Multiply each previous Viterbi variable by the transition probability and the emission probability • Choose the previous state with the maximum result
The Decoding Problem: Viterbi Algorithm • The Viterbi algorithm finds the best state sequence • Viterbi variable: δ_t(i) = max over q_1 … q_{t-1} of P(q_1 … q_{t-1}, q_t = i, O_1 … O_t | λ)
The Decoding Problem: Viterbi Algorithm • Step 1 : Initialization • δ_1(i) = π_i b_i(O_1) for 1≤i≤N (π is the initial prob., b is the output prob.) • ψ_1(i) = 0 (sequence of the best path) • Step 2 : Induction • δ_t(j) = max_i [ δ_{t-1}(i) a_ij ] b_j(O_t), 1≤j≤N • ψ_t(j) = argmax_i [ δ_{t-1}(i) a_ij ], 1≤j≤N (store the backtrace) • Step 3 : Termination • P* = max_i [ δ_T(i) ] • q_T* = argmax_i [ δ_T(i) ] • Step 4 : Path (state sequence) backtracking (t = T-1 … 1) • q_t* = ψ_{t+1}(q_{t+1}*)
The Decoding Problem: Viterbi Algorithm • Let's do it • Step 1: Initialization • δ_1( ) = P( ) * P( | ) = .5 * .2 = .1 • δ_1( ) = P( ) * P( | ) = .5 * .1 = .05 • Step 2: Induction • δ_1( ) * P( | ) * P( | ) = .1 * .8 * .6 = 0.048 • δ_1( ) * P( | ) * P( | ) = .05 * .6 * .6 = 0.018 • δ_2( ) = 0.048 • …
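The four Viterbi steps can be sketched with the same illustrative 2-state parameters used for the forward example (only the visible .5/.2/.1/.8/.6 factors mirror the slides; the rest are assumptions):

```python
# Illustrative parameters, identical to the forward-algorithm sketch.
pi = [0.5, 0.5]
A  = [[0.8, 0.2], [0.6, 0.4]]
B  = [[0.2, 0.8], [0.1, 0.9]]

def viterbi(obs):
    """Decoding problem: return (best path probability, best state sequence)."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]    # step 1: initialization
    psi = []                                            # backpointers
    for x in obs[1:]:                                   # step 2: induction
        back = [max(range(N), key=lambda i: delta[i] * A[i][j])
                for j in range(N)]
        delta = [delta[back[j]] * A[back[j]][j] * B[j][x] for j in range(N)]
        psi.append(back)
    best = max(range(N), key=lambda i: delta[i])        # step 3: termination
    path = [best]
    for back in reversed(psi):                          # step 4: backtracking
        path.append(back[path[-1]])
    return delta[best], path[::-1]

print(viterbi([0, 1, 1]))
```

It is the forward algorithm with the sum replaced by a max, plus backpointers to recover the path.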
The Learning Problem: Parameter Estimation • Sphinx quiz • The sphinx changes the quiz again!! • No information about the conditions of the "feeling changes" or of the "card choosing" • Given many card sequences, you have to find the model that best explains the "feeling changes" and the "card choosing"
The Learning Problem: Baum-Welch Method • Find λ* = argmax_λ P(O | λ) • λ : the model parameters • Locally maximize it by an iterative hill-climbing algorithm • Work out the probability of the observations using some current model. • Find which state transitions and symbol emissions are used most. • By increasing their probabilities, choose a revised model that gives a higher probability to the observations • This is training!
The Learning Problem: Baum-Welch Method • Baum-Welch algorithm • Step 1 : Begin with some model λ (perhaps pre-selected, or just chosen randomly) • Step 2 : Run O through the current model to estimate the expected use of each model parameter • Step 3 : Change the model to favor the transitions and emissions used a lot • Step 4 : Repeat this process, converging to locally optimal values of the model parameters
The Learning Problem: Baum-Welch Method • Let's do it • Step 1: Choose an initial model • Step 2: Run O through the current model to estimate the expected use of each model parameter • Step 3 : Change the model to favor the transitions and emissions used a lot • Step 4 : Repeat this process, converging to locally optimal values of the model parameters
HMMs for Speech Recognition • Find a sequence of phonemes (or words) given an acoustic sequence • ex. "How to wreck a nice beach." • ex. "How to recognize speech." • Idea: use a phoneme model
Phoneme model • Phoneme • The smallest unit of sound that carries a distinct meaning • Consonants • Vowels • Phoneme model • Observe the speech signals • Find the sequence of states • that maximizes P(signals | states)
Embedded Training of HMMs • For each acoustic sequence in the training set, create a new HMM as the concatenation of the HMMs representing the underlying sequence of phonemes. • Maximize the likelihood of the training sentences.
HMMs: Decoding a Sentence • Decide on the accepted vocabulary • Optionally add a language model: P(word sequence) • Use an efficient algorithm to find the optimal path in the decoding HMM:
A demo of HMM application • http://www.mmk.e-technik.tu-muenchen.de/rotdemo.html • This demo shows the image retrieval system, which enables the user to search a grayscale image database intuitively by presenting simple sketches. • You can find the detailed description of this demo at: • http://www.mmk.e-technik.tu-muenchen.de/demo/imagedb/theory.html
Kalman Filter? • What is the Kalman Filter? • A technique that can be used to recursively estimate unobservable quantities called state variables, {xt}, from an observed time series {yt}. • What is it used for? • Tracking missiles • Extracting lip motion from video • Lots of computer vision applications • Economics • Navigation
Estimating the Location of a Ship • Problem: "Suppose that you are lost at sea during the night and have no idea at all of your location." • Problem: inherent measuring-device inaccuracies • Your measurement has some uncertainty!
Uncertainty • Conditional density of position based on measured value z1 • Assume Gaussian distribution z1 : Measured position x : Real position Q: What can be a measure of uncertainty?
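As a minimal sketch of the idea, a scalar Kalman filter can track the ship's position from noisy measurements; the random-walk state model and the noise variances Q and R here are illustrative assumptions:

```python
# A minimal 1-D Kalman filter: the state is the ship's (scalar) position,
# assumed to follow a random walk; Q and R are illustrative noise variances.
def kalman_step(x, P, z, Q=0.01, R=4.0):
    """One predict/update cycle.
    x, P : position estimate and its variance (the measure of uncertainty)
    z    : new noisy position measurement
    Q, R : process and measurement noise variances
    """
    P = P + Q                 # predict: uncertainty grows over time
    K = P / (P + R)           # Kalman gain: how much to trust the measurement
    x = x + K * (z - x)       # update: blend the prediction and the measurement
    P = (1.0 - K) * P         # update: uncertainty shrinks after measuring
    return x, P

x, P = 0.0, 1000.0            # "lost at sea": huge initial uncertainty
for z in [10.2, 9.7, 10.5, 9.9]:
    x, P = kalman_step(x, P, z)
print(x, P)                   # estimate settles near the measurements
```

Note how P answers the slide's question: the variance of the conditional (Gaussian) density is the measure of uncertainty, and each measurement shrinks it.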