Learn about hidden Markov models: their definition, the three basic problems associated with them, and the solutions to these problems. Explore implementation issues and extensions of HMMs.
Definition A doubly embedded stochastic process: an underlying stochastic process that is not observable (it is hidden) and can only be observed through another set of stochastic processes that produce the sequence of observations
Interpretation A state machine where, at each state, one randomly chooses both the symbol to emit and the next state to move to
Elements • N - the number of states; the set of states is denoted by S = {S_1, S_2, ..., S_N}, and the state at time t by q_t • M - the number of distinct observation symbols, i.e. the discrete alphabet size; the set of symbols is denoted by V = {v_1, v_2, ..., v_M} • The state transition probability distribution A = {a_ij}, where a_ij = P(q_{t+1} = S_j | q_t = S_i), 1 ≤ i, j ≤ N
Elements - continued • The observation symbol probability distribution in state j, B = {b_j(k)}, where b_j(k) = P(O_t = v_k | q_t = S_j), 1 ≤ j ≤ N, 1 ≤ k ≤ M • The initial state distribution π = {π_i}, where π_i = P(q_1 = S_i), 1 ≤ i ≤ N • The final state distribution η = {η_i}, where η_i = P(q_T = S_i), 1 ≤ i ≤ N • The compact notation λ = (A, B, π) denotes the complete parameter set of the model
Observation Sequence Generation • Choose an initial state q_1 = S_i according to the initial state distribution π • Set t = 1 • Choose O_t = v_k according to the symbol probability distribution in state S_i, i.e. b_i(k) • Transit to a new state q_{t+1} = S_j according to the state transition probability distribution for state S_i, i.e. a_ij • Repeat for t = t + 1 until t = T
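The generation steps above can be sketched in Python. This is a minimal sketch; the two-state toy model (its sizes and probability values) is an illustrative assumption, not taken from the slides.

```python
import random

# Toy model in the slide's notation; all numbers here are illustrative assumptions.
pi = [0.6, 0.4]                    # initial state distribution pi_i = P(q_1 = S_i)
A  = [[0.7, 0.3],                  # a_ij = P(q_{t+1} = S_j | q_t = S_i)
      [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5],             # b_j(k) = P(O_t = v_k | q_t = S_j)
      [0.6, 0.3, 0.1]]

def generate(T, pi, A, B, seed=0):
    """Generate an observation sequence O_1..O_T following the steps above."""
    rng = random.Random(seed)
    N, M = len(pi), len(B[0])
    # Step 1: choose the initial state according to pi
    q = rng.choices(range(N), weights=pi)[0]
    O = []
    for _ in range(T):
        # Choose O_t according to the symbol distribution b_q(.) of the current state
        O.append(rng.choices(range(M), weights=B[q])[0])
        # Transit to a new state according to row q of A
        q = rng.choices(range(N), weights=A[q])[0]
    return O
```

States and symbols are represented by their indices 0..N-1 and 0..M-1, so each row of A and B can be fed directly to `random.choices` as a weight vector.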
The 3 basic problems for HMMs • Problem 1 (evaluation): Given an observation sequence O = O_1 O_2 ··· O_T and a model λ = (A, B, π), what is the probability of the observation sequence given the model, P(O | λ)?
The 3 basic problems for HMMs • Problem 2 (recognition): Given an observation sequence O = O_1 O_2 ··· O_T and a model λ, what is the corresponding state sequence Q = q_1 q_2 ··· q_T which best “explains” the observation sequence?
The 3 basic problems for HMMs • Problem 3 (training): How do we adjust the model parameters λ = (A, B, π) to maximize P(O | λ)?
Solution to Problem 1 Analysis The probability of the observation sequence O for a fixed state sequence Q = q_1 q_2 ··· q_T is P(O | Q, λ) = Π_{t=1..T} P(O_t | q_t, λ); assuming statistical independence of observations, we get P(O | Q, λ) = b_{q_1}(O_1) · b_{q_2}(O_2) ··· b_{q_T}(O_T)
Solution to Problem 1 Analysis - continued The probability of such a state sequence Q is P(Q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ··· a_{q_{T-1} q_T}, and the joint probability of O and Q is P(O, Q | λ) = P(O | Q, λ) P(Q | λ); therefore the probability of O is a sum over all possible state sequences Q: P(O | λ) = Σ_Q P(O | Q, λ) P(Q | λ)
Solution to Problem 1 Definitions The forward variable α_t(i), defined as α_t(i) = P(O_1 O_2 ··· O_t, q_t = S_i | λ), is the probability of the partial observation sequence (until time t) and state S_i at time t, given the model λ
Solution to Problem 1 The Forward-Backward Procedure α_t(i) is solved inductively as follows: • Initialization: α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N • Induction: α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(O_{t+1}), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N • Termination: P(O | λ) = Σ_{i=1..N} α_T(i)
Solution to Problem 1 Illustration of the Forward Procedure
Solution to Problem 1 Lattice Illustration: a trellis with states 1..N on the vertical axis and observation times t = 1..T on the horizontal axis
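The three steps of the forward procedure can be sketched in Python. This is a minimal sketch; the two-state toy model below is an illustrative assumption, not from the slides.

```python
# Toy model (illustrative assumption): 2 states, 3 observation symbols.
pi = [0.6, 0.4]
A  = [[0.7, 0.3],
      [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5],
      [0.6, 0.3, 0.1]]

def forward(O, pi, A, B):
    """Return P(O | model) via the forward procedure."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(O_{t+1})
    for o in O[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)
```

Each induction step costs O(N^2), so the whole procedure is O(N^2 T), versus the O(N^T) cost of summing over all state sequences directly.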
Solution to Problem 2 Definitions The quantity δ_t(i), defined as δ_t(i) = max over q_1, ..., q_{t-1} of P(q_1 q_2 ··· q_t = S_i, O_1 O_2 ··· O_t | λ), is the best score (highest probability) along a single path, at time t, which accounts for the first t observations and ends in state S_i
Solution to Problem 2 The Viterbi Algorithm The best state sequence is found as follows: • Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0, 1 ≤ i ≤ N • Recursion: δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] · b_j(O_t), ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij], 2 ≤ t ≤ T, 1 ≤ j ≤ N
Solution to Problem 2 The Viterbi Algorithm - continued • Termination: P* = max_{1≤i≤N} δ_T(i), q_T* = argmax_{1≤i≤N} δ_T(i) • Path (state sequence) backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1
Solution to Problem 2 Lattice Illustration: the same trellis of N states against observation times t = 1..T, with the single best path traced through it
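The Viterbi recursion and backtracking can be sketched in Python. Again, the toy model is an illustrative assumption; states and symbols are indices.

```python
# Toy model (illustrative assumption): 2 states, 3 observation symbols.
pi = [0.6, 0.4]
A  = [[0.7, 0.3],
      [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5],
      [0.6, 0.3, 0.1]]

def viterbi(O, pi, A, B):
    """Return (best state sequence, its probability P*) for observations O."""
    N = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(O_1), psi_1(i) = 0
    delta = [pi[i] * B[i][O[0]] for i in range(N)]
    psi = []
    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) a_ij] * b_j(O_t)
    for o in O[1:]:
        step, new_delta = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        psi.append(step)
        delta = new_delta
    # Termination: pick the most probable final state ...
    q = [max(range(N), key=lambda i: delta[i])]
    # ... then backtrack through psi: q_t* = psi_{t+1}(q_{t+1}*)
    for step in reversed(psi):
        q.append(step[q[-1]])
    q.reverse()
    return q, max(delta)
```

The structure mirrors the forward procedure, with the sum over predecessor states replaced by a max, plus the ψ array recording which predecessor achieved it. (In practice the recursion is usually run in log space to avoid underflow for long sequences.)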
Solution to Problem 3 Definitions The backward variable β_t(i), defined as β_t(i) = P(O_{t+1} O_{t+2} ··· O_T | q_t = S_i, λ), is the probability of the partial observation sequence from time t+1 to the end, given state S_i at time t and the model λ
Solution to Problem 3 Definitions - continued Let γ_t(i) be defined as γ_t(i) = P(q_t = S_i | O, λ), i.e. the probability of being in state S_i at time t, given the observation sequence O and the model λ
Solution to Problem 3 Definitions - continued Let ξ_t(i, j) be defined as ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ), i.e. the probability of being in state S_i at time t and in state S_j at time t+1, given the observation sequence O and the model λ
Solution to Problem 3 Variable Illustration
Solution to Problem 3 Analysis In terms of the forward-backward variables, γ_t(i) = α_t(i) β_t(i) / P(O | λ) = α_t(i) β_t(i) / Σ_{i=1..N} α_t(i) β_t(i)
Solution to Problem 3 Analysis - continued In terms of the forward-backward variables, ξ_t(i, j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O | λ)
Solution to Problem 3 Interpretations Σ_{t=1..T-1} γ_t(i) = expected number of transitions made from S_i; Σ_{t=1..T-1} ξ_t(i, j) = expected number of transitions from S_i to S_j
Solution to Problem 3 The Forward-Backward Procedure The backward variable β_t(i) is solved inductively as follows: • Initialization: β_T(i) = 1, 1 ≤ i ≤ N • Induction: β_t(i) = Σ_{j=1..N} a_ij b_j(O_{t+1}) β_{t+1}(j), t = T-1, T-2, ..., 1, 1 ≤ i ≤ N
Solution to Problem 3 Illustration of the Backward Procedure
Solution to Problem 3 The Baum-Welch Algorithm A set of reasonable reestimation formulas for π, A and B is: • π̄_i = γ_1(i) = expected frequency in state S_i at time t = 1 • ā_ij = Σ_{t=1..T-1} ξ_t(i, j) / Σ_{t=1..T-1} γ_t(i) = expected number of transitions from S_i to S_j, divided by the expected number of transitions from S_i • b̄_j(k) = Σ_{t : O_t = v_k} γ_t(j) / Σ_{t=1..T} γ_t(j) = expected number of times in state S_j observing symbol v_k, divided by the expected number of times in state S_j
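One full reestimation pass — forward, backward, γ, ξ, and the three formulas above — can be sketched in Python for a single observation sequence. The toy model is again an illustrative assumption, and real implementations add scaling (or log-space arithmetic) to prevent underflow.

```python
# Toy model (illustrative assumption): 2 states, 3 observation symbols.
pi = [0.6, 0.4]
A  = [[0.7, 0.3],
      [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5],
      [0.6, 0.3, 0.1]]

def baum_welch_step(O, pi, A, B):
    """One Baum-Welch reestimation pass; returns (new_pi, new_A, new_B)."""
    N, T, M = len(pi), len(O), len(B[0])
    # Forward variables alpha_t(i)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
    # Backward variables beta_t(i): beta_T(i) = 1, then induct backwards
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        beta[T-1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
    P_O = sum(alpha[T-1])                       # P(O | lambda)
    # gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / P_O for i in range(N)] for t in range(T)]
    # xi_t(i,j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O | lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / P_O
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # Reestimation formulas
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B
```

By construction the reestimated π̄, each row of Ā, and each row of B̄ sum to 1, so every pass yields a valid model; iterating passes to convergence is the full training procedure.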
Solution to Problem 3 Summary Let λ = (A, B, π) be the initial model and let λ̄ = (Ā, B̄, π̄) be the reestimated model; then it has been proven that either • the initial model λ defines a critical point of the likelihood function, in which case λ̄ = λ, or • model λ̄ is more likely than model λ, in the sense that P(O | λ̄) > P(O | λ)
Extensions • Continuous observation densities • Autoregressive observation sequences • Explicit state duration densities
Implementation issues • Multiple Observation Sequences • Insufficient training data