Hidden Markov Models
So far we have considered systems for making a single decision (e.g. discriminant functions or estimation of class-conditional densities). Now we consider the problem of sequential decision making. Example: Automatic Speech Recognition (ASR). In ASR, we need to determine the sequence of phonemes (vowels and consonants) that makes up the observed speech sound. For this we introduce Hidden Markov Models (HMMs).
P01760 Advanced Concepts in Signal Processing
First-order Markov Models
NOTE: in a first-order model, the current state depends only on the previous state.
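As an illustration of the first-order property, here is a minimal sketch that samples such a chain; the state names, transition matrix and random seed are made-up values, not taken from the slides.

```python
import numpy as np

# Hypothetical 3-state first-order Markov chain (illustrative values only).
# Row i of A holds P(next state = j | current state = i), so the next state
# depends only on the current one.
states = ["w1", "w2", "w3"]
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])
pi = np.array([1.0, 0.0, 0.0])       # start in w1

rng = np.random.default_rng(0)
s = rng.choice(3, p=pi)
path = [s]
for _ in range(9):                    # draw 9 further states
    s = rng.choice(3, p=A[s])         # only the current row A[s] is consulted
    path.append(s)

print([states[i] for i in path])
```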
Markov Model State Transition Graph
Calculating the model probability
Calculating the model probability (cont.)
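To make the calculation concrete, here is a minimal sketch that multiplies the initial-state probability by the successive transition probabilities to obtain the probability of a particular state sequence; the chain and the example path are illustrative only.

```python
import numpy as np

# Same illustrative chain as in the sampling sketch above.
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])
pi = np.array([1.0, 0.0, 0.0])

def sequence_probability(path, A, pi):
    """P(w(1), ..., w(T)) for a first-order Markov chain."""
    p = pi[path[0]]
    for prev, cur in zip(path[:-1], path[1:]):
        p *= A[prev, cur]               # one factor per transition
    return p

print(sequence_probability([0, 0, 1, 2, 2], A, pi))   # e.g. w1 w1 w2 w3 w3
```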
Basic Markov Model: Example
Markov: Example 2
Hidden Markov Model
Hidden Markov Model
This model shows all state transitions as being possible; this is not always the case.
Left-to-Right Models
Probability Parameters
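A minimal sketch of how these parameters might be stored, assuming a discrete-observation HMM with illustrative (made-up) values: the transition matrix A holds a_ij, the emission matrix B holds b_jk, and pi is the initial-state distribution.

```python
import numpy as np

# Illustrative HMM parameters (not the values used on the slides).
# A[i, j] = a_ij = P(state w_j at t+1 | state w_i at t)   -- transitions
# B[j, k] = b_jk = P(symbol v_k observed | state w_j)      -- emissions
# pi[i]   = P(initial state = w_i)
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
B = np.array([[0.5, 0.4, 0.1, 0.0],
              [0.1, 0.3, 0.5, 0.1],
              [0.0, 0.1, 0.2, 0.7]])
pi = np.array([1.0, 0.0, 0.0])

# Each row must be a valid probability distribution.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)

# A left-to-right model would additionally require A to be upper triangular,
# e.g. np.allclose(A, np.triu(A)).
```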
Three central issues: evaluation (how probable is an observed sequence under a given model?), decoding (what is the most likely hidden state sequence?), and learning (how do we estimate the model parameters from training data?).
Evaluation: compute the probability P(V^T | θ) that a given model θ produced the observed sequence V^T.
Evaluation (cont.)
Recursive calculation of P(V^T)
Let us write P(V^T) as a sum over all possible hidden state sequences ω^T:
  P(V^T) = Σ_{ω^T} P(V^T | ω^T) P(ω^T) = Σ_{ω^T} Π_{t=1..T} P(v(t) | ω(t)) P(ω(t) | ω(t−1)).
However we don't have to do the calculation in this order! Re-ordering we get a nested form that can be evaluated one time step at a time:
  P(V^T) = Σ_{ω(T)} P(v(T) | ω(T)) Σ_{ω(T−1)} P(ω(T) | ω(T−1)) P(v(T−1) | ω(T−1)) ··· Σ_{ω(1)} P(ω(2) | ω(1)) P(v(1) | ω(1)) P(ω(1)).
Recursive calculation of P(V^T)
Graphically we can illustrate this as follows (N.B. this is not a state transition diagram; the horizontal axis is time). We observe {v(1), v(2), v(3), …}.
HMM Forward Algorithm
HMM Forward Algorithm (cont.)
Forward Algorithm Step
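A minimal Python sketch of the forward recursion, assuming a discrete-observation HMM; the parameter values and observation sequence are illustrative, not the ones used in the slides' worked example.

```python
import numpy as np

def forward(obs, A, B, pi):
    """Forward algorithm: returns alpha (T x N) and P(V^T | theta).

    alpha[t, j] = P(v(1..t), state w_j at time t), built up recursively:
        alpha[0, j] = pi[j] * B[j, obs[0]]
        alpha[t, j] = B[j, obs[t]] * sum_i alpha[t-1, i] * A[i, j]
    """
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
    return alpha, alpha[-1].sum()

# Illustrative parameters and observation sequence (indices into the symbol set).
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
B = np.array([[0.5, 0.4, 0.1, 0.0],
              [0.1, 0.3, 0.5, 0.1],
              [0.0, 0.1, 0.2, 0.7]])
pi = np.array([1.0, 0.0, 0.0])
obs = [1, 3, 2, 0]

alpha, prob = forward(obs, A, B, pi)
print("P(V^T | theta) =", prob)
```

The cost is O(N²T) rather than the O(Nᵀ) of summing over every state sequence explicitly.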
Evaluation Example
Evaluation Example (cont.)
[Figure: trellis of forward probabilities for states ω0–ω3 over times t = 0…4, for the observation sequence v1 v3 v2 v0 starting from the initial state.]
Making Decisions
Given the ability to calculate the probability of an observed sequence, we can now compare different HMMs. This is just Bayesian decision theory revisited! Recall that P(θ | V^T) ∝ P(V^T | θ) P(θ). Hence, given models θ1 and θ2, we select θ1 if:
  P(V^T | θ1) P(θ1) > P(V^T | θ2) P(θ2).
Example: suppose θ1 models 'y'-'e'-'s' and θ2 models 'n'-'o'. If we expect that the answer is more likely to be 'yes' we weight the priors accordingly.
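A small sketch of this decision rule: each candidate model is scored with the forward algorithm and the score is weighted by its prior before choosing the larger product. The two word models and the prior values are made up for illustration.

```python
import numpy as np

def forward_prob(obs, A, B, pi):
    """P(V^T | theta) via the forward recursion (see the forward-algorithm sketch)."""
    alpha = pi * B[:, obs[0]]
    for v in obs[1:]:
        alpha = B[:, v] * (alpha @ A)
    return alpha.sum()

# Two illustrative word models ("yes" and "no") with unequal priors.
theta_yes = dict(A=np.array([[0.8, 0.2], [0.1, 0.9]]),
                 B=np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]),
                 pi=np.array([1.0, 0.0]))
theta_no  = dict(A=np.array([[0.5, 0.5], [0.4, 0.6]]),
                 B=np.array([[0.2, 0.6, 0.2], [0.5, 0.3, 0.2]]),
                 pi=np.array([0.5, 0.5]))
priors = {"yes": 0.7, "no": 0.3}       # we expect "yes" to be more likely

obs = [0, 1, 2, 2]
scores = {
    "yes": forward_prob(obs, **theta_yes) * priors["yes"],
    "no":  forward_prob(obs, **theta_no)  * priors["no"],
}
# Select the model with the larger P(V^T | theta) P(theta).
print(max(scores, key=scores.get), scores)
```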
An Alternative Recursion
Alternatively, given the same sum over state sequences, we can re-order it in the opposite direction, so that the nested sums are accumulated backwards in time:
  P(V^T) = Σ_{ω(1)} P(ω(1)) P(v(1) | ω(1)) Σ_{ω(2)} P(ω(2) | ω(1)) P(v(2) | ω(2)) ··· Σ_{ω(T)} P(ω(T) | ω(T−1)) P(v(T) | ω(T)).
The Backward Algorithm
Backward Algorithm
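A matching sketch of the backward recursion, assuming the same illustrative parameters as the forward sketch; run on the same observation sequence it must return the same value of P(V^T | θ).

```python
import numpy as np

def backward(obs, A, B, pi):
    """Backward algorithm: returns beta (T x N) and P(V^T | theta).

    beta[t, i] = P(v(t+1..T) | state w_i at time t), computed backwards:
        beta[T-1, i] = 1
        beta[t, i]   = sum_j A[i, j] * B[j, obs[t+1]] * beta[t+1, j]
    """
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob = np.sum(pi * B[:, obs[0]] * beta[0])
    return beta, prob

# Same illustrative parameters and observations as in the forward sketch.
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
B = np.array([[0.5, 0.4, 0.1, 0.0],
              [0.1, 0.3, 0.5, 0.1],
              [0.0, 0.1, 0.2, 0.7]])
pi = np.array([1.0, 0.0, 0.0])
obs = [1, 3, 2, 0]

beta, prob = backward(obs, A, B, pi)
print("P(V^T | theta) =", prob)   # agrees with the forward result
```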
Decoding Problem
The problem is to choose the most likely state sequence, ω^T, for a given observation sequence V^T. Unlike the evaluation problem, this one is not uniquely defined. For example, at each time t we could pick the state that maximizes the posterior P(ω(t) = ω_j | V^T). However this only finds the states that are individually most likely – hence the resulting sequence, ω^T, may not be viable (it can even contain transitions with zero probability).
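The distinction can be checked by brute force on a toy model. The sketch below enumerates every state sequence, compares the single most probable sequence with the sequence of individually most probable states, and reports the joint probability of the latter, which, depending on the parameters, can even be zero. All parameter values here are made up.

```python
import itertools
import numpy as np

# Tiny illustrative HMM; note a_01 = 0, so the transition w0 -> w1 is forbidden.
A = np.array([[0.5, 0.0, 0.5],
              [0.3, 0.4, 0.3],
              [0.4, 0.3, 0.3]])
B = np.array([[0.9, 0.1],
              [0.1, 0.9],
              [0.5, 0.5]])
pi = np.array([0.6, 0.2, 0.2])
obs = [0, 1, 0, 1]
N, T = A.shape[0], len(obs)

def joint(path):
    """P(V^T, omega^T | theta) for one explicit state sequence."""
    p = pi[path[0]] * B[path[0], obs[0]]
    for t in range(1, T):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
    return p

paths = list(itertools.product(range(N), repeat=T))
probs = np.array([joint(p) for p in paths])

# Most probable single sequence (what Viterbi would return).
best_seq = paths[int(np.argmax(probs))]

# Per-time posteriors P(w(t) = j | V^T), then an argmax at each t separately.
post = np.zeros((T, N))
for p, pr in zip(paths, probs):
    for t, s in enumerate(p):
        post[t, s] += pr
post /= probs.sum()
pointwise = tuple(np.argmax(post, axis=1))

print("most probable sequence:", best_seq)
print("per-time argmax states:", pointwise, "-> joint probability", joint(pointwise))
```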
Viterbi Algorithm
Viterbi: is this possible?
[Figure: trellis over states ω0–ω3 and times t = 0…4 comparing the optimal sequence for t = 1,2,3,4 with the optimal sequence for t = 1,2,3.] If not – why not?
Viterbi Algorithm
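A minimal sketch of the Viterbi algorithm, which replaces the sum in the forward recursion with a maximisation and keeps back-pointers so the best state sequence can be traced; the parameters and observations are illustrative.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Return the most likely state sequence and its joint probability.

    delta[t, j] = max over paths ending in state j at time t of
                  P(v(1..t), w(1..t-1), w(t) = j), with back-pointers psi.
    """
    T, N = len(obs), A.shape[0]
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # scores[i, j]
        psi[t] = np.argmax(scores, axis=0)            # best predecessor of each j
        delta[t] = scores[psi[t], np.arange(N)] * B[:, obs[t]]
    # Trace the best path back from the most probable final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

# Illustrative parameters and observation sequence.
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
B = np.array([[0.5, 0.4, 0.1, 0.0],
              [0.1, 0.3, 0.5, 0.1],
              [0.0, 0.1, 0.2, 0.7]])
pi = np.array([1.0, 0.0, 0.0])
obs = [1, 3, 2, 0]

path, prob = viterbi(obs, A, B, pi)
print("best state sequence:", path, "with probability", prob)
```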
Decoding Example
[Figure: Viterbi trellis over states ω0–ω3 and times t = 0…4 for the observation sequence v1 v3 v2 v0, with the surviving path probability at each node, starting from the initial state.]
The Learning Problem (Briefly)
The third problem is the most difficult. Aim: to learn the parameters a_ij and b_jk from a set of training data. The obvious approach is maximum likelihood learning. However we have a familiar problem: the likelihood
  P(V^T | θ) = Σ_{ω^T} P(V^T, ω^T | θ)
involves the hidden states. That is, we must marginalize out the state sequences ω^T.
The Learning Problem (cont.)
The solution is similar to learning the prior probability weights in MoGs: using EM (a.k.a. Baum-Welch, or the Forward-Backward algorithm) we iteratively re-estimate the transition probabilities a_ij and the emission probabilities b_jk. The key ingredient is γ_ij(t), the probability of a transition from state ω_i at time t−1 to state ω_j at time t given the observed sequence:
  γ_ij(t) = α_i(t−1) a_ij b_jk β_j(t) / P(V^T | θ), where v(t) = v_k,
i.e. it can be calculated from the Forward and Backward steps and the current estimates of a_ij and b_jk.
The Learning Problem
Updating a_ij requires the estimated probability of moving from state i to state j, hence:
  â_ij = (expected number of transitions from i→j) / (expected number of transitions from i→anywhere) = Σ_t γ_ij(t) / Σ_t Σ_k γ_ik(t).
Updating b_jk requires the estimated probability of emitting visible symbol v_k when in state j, hence:
  b̂_jk = (expected number of occurrences of state j emitting v_k) / (expected number of occurrences of state j) = Σ_{t: v(t)=v_k} γ_j(t) / Σ_t γ_j(t), where γ_j(t) = Σ_i γ_ij(t).
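A sketch of a single Baum-Welch re-estimation step that combines the forward and backward passes and applies the expected-count ratios above; the parameters and training sequence are made up, and a practical implementation would add scaling (or work in the log domain) to avoid underflow on long sequences.

```python
import numpy as np

def forward(obs, A, B, pi):
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
    return alpha

def backward(obs, A, B):
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(obs, A, B, pi):
    """One EM re-estimation of A and B (no scaling; illustrative only)."""
    T, N, K = len(obs), A.shape[0], B.shape[1]
    alpha, beta = forward(obs, A, B, pi), backward(obs, A, B)
    prob = alpha[-1].sum()

    # xi[t, i, j] = P(state w_i at t, state w_j at t+1 | V^T, theta)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
    xi /= prob

    # gamma[t, j] = P(state w_j at t | V^T, theta)
    gamma = alpha * beta / prob

    # a_ij <- expected transitions i->j / expected transitions i->anywhere
    A_new = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
    # b_jk <- expected times state j emits v_k / expected occurrences of state j
    B_new = np.zeros((N, K))
    for k in range(K):
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new

# Illustrative parameters and training sequence.
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
B = np.array([[0.5, 0.4, 0.1, 0.0],
              [0.1, 0.3, 0.5, 0.1],
              [0.0, 0.1, 0.2, 0.7]])
pi = np.array([1.0, 0.0, 0.0])
obs = [1, 3, 2, 0, 1, 1, 3, 2]

A_new, B_new = baum_welch_step(obs, A, B, pi)
print(A_new.round(3), B_new.round(3), sep="\n")
```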
HMMs for speech recognition
• In ASR the observed data are usually a measure of the short-term spectral properties of the speech. There are two popular approaches:
• Continuous-density observations – the finite states ω(t) are mapped into a continuous feature space using a MoG density model.
• VQ observations – the continuous feature space is discretized into a finite symbol set using vector quantization.
An example of an isolated-word HMM recognition system: the speech signal is passed through LPC feature analysis and vector quantization, the resulting symbol sequence is scored by an HMM for each vocabulary word (word 1, word 2, …, word N), and the word whose model gives the maximum output is selected.
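A toy sketch of the isolated-word decision rule just described, assuming VQ (discrete) observations: the quantized feature sequence is scored against each word's HMM with the forward algorithm and the word with the maximum output is selected. The two word models and the codebook indices are invented for illustration.

```python
import numpy as np

def forward_prob(obs, A, B, pi):
    """P(V^T | theta) via the forward recursion."""
    alpha = pi * B[:, obs[0]]
    for v in obs[1:]:
        alpha = B[:, v] * (alpha @ A)
    return alpha.sum()

def recognize(obs, word_models):
    """Isolated-word recognition: pick the word whose HMM scores highest."""
    scores = {word: forward_prob(obs, **theta) for word, theta in word_models.items()}
    return max(scores, key=scores.get), scores

# Illustrative 2-state left-to-right word models over a 4-symbol VQ codebook.
word_models = {
    "yes": dict(A=np.array([[0.7, 0.3], [0.0, 1.0]]),
                B=np.array([[0.6, 0.3, 0.1, 0.0],
                            [0.0, 0.1, 0.3, 0.6]]),
                pi=np.array([1.0, 0.0])),
    "no":  dict(A=np.array([[0.8, 0.2], [0.0, 1.0]]),
                B=np.array([[0.1, 0.6, 0.3, 0.0],
                            [0.3, 0.0, 0.1, 0.6]]),
                pi=np.array([1.0, 0.0])),
}

# Stand-in for "LPC feature analysis & vector quantization": a sequence of
# codebook indices extracted from the (hypothetical) speech signal.
obs = [0, 1, 2, 3, 3]

word, scores = recognize(obs, word_models)
print("recognized:", word, scores)
```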