310 likes | 573 Views
A Tutorial of Hidden Markov Models. Shuxing Cheng CS673 Project. Our talk is based on the following paper A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. By Lawrence R. Rabiner. Outline. Markov model. Hidden Markov model(HMM).
E N D
A Tutorial of Hidden Markov Models Shuxing Cheng CS673 Project
Our talk is based on the following paper A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. By Lawrence R. Rabiner
Outline • Markov model. • Hidden Markov model(HMM). • Three basic problems as well as their solutions. • More HMM types. • The application of HMM.
Background of Markov models Markov model: A stochastic processes holding the Markov property Markov property(memoryless property): • Future is independent of past given present • Three basic information to define a Markov model: • parameter space. • state space. • state transition probability. One step transition probability is the basis.
A Markov model example 0.5 1 A B C 0.5 1 1 1 D E F 0.3 1 1 0.4 H I G 1 0.3 graphical representation matrix representation
If the states are not observable! • States are not observable. • Observations are probabilistic functions of states. • State transitions are still probabilistic. Using Hidden Markov Models(HMMs)!
The urn and ball problem • Each state corresponds to a specific urn. • A (ball) color probability is defined for each state. • The choice of urns is dictated by the state transition matrix.
MMs vs. HMMs • MMs and HMMs represent different levels of knowledge about “real” state. • MMs use knowledge of the state history and the current state to predict future states. • HMMs use evidence of historical states and evidence of the current state to predict future states.
Elements of HMM • N: The number of states. • A: The transition probability matrix between each state. • M: The number of distinct observation symbols per state, we denote the individual symbols as • The observation symbol probability distribution in state j, where • The initial state distribution Two model parameters: N, M Three probability measures: A, B,
Build the observation sequence Given the above five elements, we can build an observation sequence: T is the number of observations. This observation sequence is build as following. Set t=1. Choose an initial state according to the initial distribution. Choose according to the symbol probability distribution. Transit to a new state according to the state transition probability distribution. Set t=t+1, return to line 3 while t<=T.
Three basic problems of HMMs • Given the observation sequence and a model, how do we efficiently compute , the probability of this observation sequence for the given model. • Given the observation sequence, and the model, how do we choose a corresponding state sequence . • How do we adjust the model parameter to maximize ?
The discussion for these three problems Problem1: How well a given model matches a given observation sequence. Problem2: To uncover the hidden part of the model. Problem3:Try to optimize the model parameter; in other words, try to train the HMM.
Solution to the first problem A straightforward way: We assume the statistical independence of observation here. NT possible state sequence. (2T-1) calculations per state sequence.
Forward procedure Define the forward variable We can solve it inductively: It requires on the order of N2T calculations.
Backward variable Similar to the forward variable, we can define the backward variable It can be solved inductively:
Solution to the second problem Several possible solutions exist because of the different optimality criterion. One of the most popular one is Viterbi algorithm: Find the single best state sequence. Formally: with the given observation sequence define the quantity: is called the best score along a single path at time t. By induction, we have
The complete procedure Initialization Recursion
The complete procedure Termination Path backtracking
Solution to the third problem • Given any finite observation sequences as training data, there is no optimal way of estimating the model parameters. • Locally maximized approach include: • Baum-Welch method(EM, expectation-modification method). • Gradient techniques.
Baum-Welch method Define
Baum-Welch method (continue) : the probability of being in state Si at time t. : expected number of transitions from Si. : expected number of transitions from Si to Sj. Using the above three formulas, we can give a method for re-estimation of the parameters of an HMM.
Baum-Welch method (continue) A set of reasonable re-estimation of formulas for : expected number of times in state Si at time (t=1)
Baum-Welch method (continue) Given the current model of The re-estimated model is It can be proved that either The initial model defines a critical point, in which or Model is more likely then model in the sense that
Baum-Welch method (continue) The above updating scheme can be derived by maximizing Baum’s auxiliary function over , they prove that maximization of leads to increased likelihood
More types of HMMs There exist a lot of different types of HMMs. We introduce one variation based on the Markov model itself. ergodic Markov Model reducible Markov Model
Distance between HMMs Given two HMMs, λ1 and λ2, what is a reasonable measure of the similarity of the two models? We can use this kind of measure: What is the problem of this measure?
Distance between HMMs Given two HMMs, λ1 and λ2, what is a reasonable measure of the similarity of the two models? We can use this kind of measure: What is the problem of this measure? It is nonsymmetric! We introduce this following measure.
Application of HMMs • Speech recognition. • Bioinformatics.
Isolated word recognition • For each word v, we build an HMM λv. • For each unknown word which is to be recognized, we have to compute P(O| λv). • Select the word whose model likelihood is highest.