Learn about hidden Markov models: their definition, the three basic problems associated with them, and the solutions to these problems. Explore implementation issues and extensions of HMMs.
Definition A doubly embedded stochastic process: an underlying stochastic process that is not observable (it is hidden) and can only be observed through another set of stochastic processes that produce the sequence of observations
Interpretation A state machine where, at each state, one randomly chooses both the symbol to emit and the next state to move to
Elements • N - the number of states; the set of states is denoted by S = {S_1, S_2, ..., S_N}, and the state at time t by q_t • M - the number of distinct observation symbols, i.e. the discrete alphabet size; the set of symbols is denoted by V = {v_1, v_2, ..., v_M} • The state transition probability distribution A = {a_ij}, where a_ij = P(q_{t+1} = S_j | q_t = S_i), 1 ≤ i, j ≤ N
Elements - continued • The observation symbol probability distribution in state j, B = {b_j(k)}, where b_j(k) = P(O_t = v_k | q_t = S_j), 1 ≤ j ≤ N, 1 ≤ k ≤ M • The initial state distribution π = {π_i}, where π_i = P(q_1 = S_i), 1 ≤ i ≤ N • The final state distribution η = {η_i}, where η_i = P(q_T = S_i), 1 ≤ i ≤ N • The compact notation λ = (A, B, π) denotes the complete parameter set of the model
Observation Sequence Generation • Choose an initial state q_1 = S_i according to the initial state distribution π • Set t = 1 • Choose O_t = v_k according to the symbol probability distribution in state S_i, i.e. b_i(k) • Transit to a new state q_{t+1} = S_j according to the state transition probability distribution for state S_i, i.e. a_ij • Repeat for t = t + 1 until t = T
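The generation steps above can be sketched in Python. This is a minimal sketch; the two-state toy model (its sizes and probability values) is an illustrative assumption, not taken from the slides.

```python
import random

# Toy model in the slide's notation; all numbers here are illustrative assumptions.
pi = [0.6, 0.4]                    # initial state distribution pi_i = P(q_1 = S_i)
A  = [[0.7, 0.3],                  # a_ij = P(q_{t+1} = S_j | q_t = S_i)
      [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5],             # b_j(k) = P(O_t = v_k | q_t = S_j)
      [0.6, 0.3, 0.1]]

def generate(T, pi, A, B, seed=0):
    """Generate an observation sequence O_1..O_T following the steps above."""
    rng = random.Random(seed)
    N, M = len(pi), len(B[0])
    # Step 1: choose the initial state according to pi
    q = rng.choices(range(N), weights=pi)[0]
    O = []
    for _ in range(T):
        # Choose O_t according to the symbol distribution b_q(.) of the current state
        O.append(rng.choices(range(M), weights=B[q])[0])
        # Transit to a new state according to row q of A
        q = rng.choices(range(N), weights=A[q])[0]
    return O
```

States and symbols are represented by their indices 0..N-1 and 0..M-1, so each row of A and B can be fed directly to `random.choices` as a weight vector.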
The 3 basic problems for HMMs • Problem 1 (evaluation): Given an observation sequence O = O_1 O_2 ··· O_T and a model λ = (A, B, π), what is the probability of the observation sequence given the model, P(O | λ)?
The 3 basic problems for HMMs • Problem 2 (recognition): Given an observation sequence O = O_1 O_2 ··· O_T and a model λ, what is the corresponding state sequence Q = q_1 q_2 ··· q_T which best “explains” the observation sequence?
The 3 basic problems for HMMs • Problem 3 (training): How do we adjust the model parameters λ = (A, B, π) to maximize P(O | λ)?
Solution to Problem 1 Analysis The probability of the observation sequence O for a fixed state sequence Q = q_1 q_2 ··· q_T is P(O | Q, λ) = Π_{t=1..T} P(O_t | q_t, λ); assuming statistical independence of observations, we get P(O | Q, λ) = b_{q_1}(O_1) · b_{q_2}(O_2) ··· b_{q_T}(O_T)
Solution to Problem 1 Analysis - continued The probability of such a state sequence Q is P(Q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ··· a_{q_{T-1} q_T}, and the joint probability of O and Q is P(O, Q | λ) = P(O | Q, λ) P(Q | λ); therefore the probability of O is a sum over all possible state sequences Q: P(O | λ) = Σ_Q P(O | Q, λ) P(Q | λ)
Solution to Problem 1 Definitions The forward variable α_t(i), defined as α_t(i) = P(O_1 O_2 ··· O_t, q_t = S_i | λ), is the probability of the partial observation sequence (until time t) and state S_i at time t, given the model λ
Solution to Problem 1 The Forward-Backward Procedure α_t(i) is solved inductively as follows: • Initialization: α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N • Induction: α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(O_{t+1}), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N • Termination: P(O | λ) = Σ_{i=1..N} α_T(i)
Solution to Problem 1 Illustration of the Forward Procedure
Solution to Problem 1 Lattice Illustration: a trellis with states 1..N on the vertical axis and observation times t = 1..T on the horizontal axis
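The three steps of the forward procedure can be sketched in Python. This is a minimal sketch; the two-state toy model below is an illustrative assumption, not from the slides.

```python
# Toy model (illustrative assumption): 2 states, 3 observation symbols.
pi = [0.6, 0.4]
A  = [[0.7, 0.3],
      [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5],
      [0.6, 0.3, 0.1]]

def forward(O, pi, A, B):
    """Return P(O | model) via the forward procedure."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(O_{t+1})
    for o in O[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)
```

Each induction step costs O(N^2), so the whole procedure is O(N^2 T), versus the O(N^T) cost of summing over all state sequences directly.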
Solution to Problem 2 Definitions The quantity δ_t(i), defined as δ_t(i) = max over q_1, ..., q_{t-1} of P(q_1 q_2 ··· q_t = S_i, O_1 O_2 ··· O_t | λ), is the best score (highest probability) along a single path, at time t, which accounts for the first t observations and ends in state S_i
Solution to Problem 2 The Viterbi Algorithm The best state sequence is found as follows: • Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0, 1 ≤ i ≤ N • Recursion: δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] · b_j(O_t), ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij], 2 ≤ t ≤ T, 1 ≤ j ≤ N
Solution to Problem 2 The Viterbi Algorithm - continued • Termination: P* = max_{1≤i≤N} δ_T(i), q_T* = argmax_{1≤i≤N} δ_T(i) • Path (state sequence) backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1
Solution to Problem 2 Lattice Illustration: the same trellis of N states against observation times t = 1..T, with the single best path traced through it
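The Viterbi recursion and backtracking can be sketched in Python. Again, the toy model is an illustrative assumption; states and symbols are indices.

```python
# Toy model (illustrative assumption): 2 states, 3 observation symbols.
pi = [0.6, 0.4]
A  = [[0.7, 0.3],
      [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5],
      [0.6, 0.3, 0.1]]

def viterbi(O, pi, A, B):
    """Return (best state sequence, its probability P*) for observations O."""
    N = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(O_1), psi_1(i) = 0
    delta = [pi[i] * B[i][O[0]] for i in range(N)]
    psi = []
    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) a_ij] * b_j(O_t)
    for o in O[1:]:
        step, new_delta = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        psi.append(step)
        delta = new_delta
    # Termination: pick the most probable final state ...
    q = [max(range(N), key=lambda i: delta[i])]
    # ... then backtrack through psi: q_t* = psi_{t+1}(q_{t+1}*)
    for step in reversed(psi):
        q.append(step[q[-1]])
    q.reverse()
    return q, max(delta)
```

The structure mirrors the forward procedure, with the sum over predecessor states replaced by a max, plus the ψ array recording which predecessor achieved it. (In practice the recursion is usually run in log space to avoid underflow for long sequences.)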
Solution to Problem 3 Definitions The backward variable β_t(i), defined as β_t(i) = P(O_{t+1} O_{t+2} ··· O_T | q_t = S_i, λ), is the probability of the partial observation sequence from time t+1 to the end, given state S_i at time t and the model λ
Solution to Problem 3 Definitions - continued Let γ_t(i) be defined as γ_t(i) = P(q_t = S_i | O, λ), i.e. the probability of being in state S_i at time t, given the observation sequence O and the model λ
Solution to Problem 3 Definitions - continued Let ξ_t(i, j) be defined as ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ), i.e. the probability of being in state S_i at time t and in state S_j at time t+1, given the observation sequence O and the model λ
Solution to Problem 3 Variable Illustration
Solution to Problem 3 Analysis In terms of the forward-backward variables, γ_t(i) = α_t(i) β_t(i) / P(O | λ) = α_t(i) β_t(i) / Σ_{i=1..N} α_t(i) β_t(i)
Solution to Problem 3 Analysis - continued In terms of the forward-backward variables, ξ_t(i, j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O | λ)
Solution to Problem 3 Interpretations Σ_{t=1..T-1} γ_t(i) = expected number of transitions made from S_i; Σ_{t=1..T-1} ξ_t(i, j) = expected number of transitions from S_i to S_j
Solution to Problem 3 The Forward-Backward Procedure The backward variable β_t(i) is solved inductively as follows: • Initialization: β_T(i) = 1, 1 ≤ i ≤ N • Induction: β_t(i) = Σ_{j=1..N} a_ij b_j(O_{t+1}) β_{t+1}(j), t = T-1, T-2, ..., 1, 1 ≤ i ≤ N
Solution to Problem 3 Illustration of the Backward Procedure
Solution to Problem 3 The Baum-Welch Algorithm A set of reasonable reestimation formulas for π, A and B is: • π̄_i = γ_1(i) = expected frequency in state S_i at time t = 1 • ā_ij = Σ_{t=1..T-1} ξ_t(i, j) / Σ_{t=1..T-1} γ_t(i) = expected number of transitions from S_i to S_j, divided by the expected number of transitions from S_i • b̄_j(k) = Σ_{t : O_t = v_k} γ_t(j) / Σ_{t=1..T} γ_t(j) = expected number of times in state S_j observing symbol v_k, divided by the expected number of times in state S_j
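One full reestimation pass — forward, backward, γ, ξ, and the three formulas above — can be sketched in Python for a single observation sequence. The toy model is again an illustrative assumption, and real implementations add scaling (or log-space arithmetic) to prevent underflow.

```python
# Toy model (illustrative assumption): 2 states, 3 observation symbols.
pi = [0.6, 0.4]
A  = [[0.7, 0.3],
      [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5],
      [0.6, 0.3, 0.1]]

def baum_welch_step(O, pi, A, B):
    """One Baum-Welch reestimation pass; returns (new_pi, new_A, new_B)."""
    N, T, M = len(pi), len(O), len(B[0])
    # Forward variables alpha_t(i)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
    # Backward variables beta_t(i): beta_T(i) = 1, then induct backwards
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        beta[T-1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
    P_O = sum(alpha[T-1])                       # P(O | lambda)
    # gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / P_O for i in range(N)] for t in range(T)]
    # xi_t(i,j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O | lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / P_O
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # Reestimation formulas
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B
```

By construction the reestimated π̄, each row of Ā, and each row of B̄ sum to 1, so every pass yields a valid model; iterating passes to convergence is the full training procedure.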
Solution to Problem 3 Summary Let λ = (A, B, π) be the initial model and let λ̄ = (Ā, B̄, π̄) be the reestimated model; then it has been proven that either • the initial model λ defines a critical point of the likelihood function, in which case λ̄ = λ, or • model λ̄ is more likely than model λ, in the sense that P(O | λ̄) > P(O | λ)
Extensions • Continuous observation densities • Autoregressive observation sequences • Explicit state duration densities
Implementation issues • Multiple Observation Sequences • Insufficient training data