Hidden Markov Models Fundamentals and applications to bioinformatics.
Markov Chains • Given a finite discrete set S of possible states, a Markov chain process occupies one of these states at each unit of time. • The process either stays in the same state or moves to some other state in S. • This occurs in a stochastic way, rather than in a deterministic one. • The process is memoryless (the next state depends only on the current state) and time-homogeneous (the transition probabilities do not change over time).
Transition Matrix • Let S = {S1, S2, S3}. A Markov chain is described by a table of transition probabilities such as the following: [State-transition diagram over S1, S2, S3 with edge probabilities 1, 1/3, 2/3, 1/3, 1/2, 1/6.]
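As a concrete sketch: the state diagram on the original slide did not survive extraction, so the arrangement of its probabilities (1, 1/3, 2/3, 1/3, 1/2, 1/6) into the matrix below is an assumption. The points illustrated are only that each row of a transition matrix must sum to 1, and that the chain evolves by repeated sampling from the current state's row.

```python
import random

# Hypothetical 3-state transition matrix; the assignment of the diagram's
# probabilities to particular edges is assumed, not taken from the slide.
states = ["S1", "S2", "S3"]
A = [
    [1/3, 1/2, 1/6],  # from S1
    [2/3, 0.0, 1/3],  # from S2
    [1.0, 0.0, 0.0],  # from S3
]

# Every row of a transition matrix must sum to 1.
for row in A:
    assert abs(sum(row) - 1.0) < 1e-9

def step(i):
    """Sample the next state index given the current state index i."""
    r = random.random()
    cum = 0.0
    for j, p in enumerate(A[i]):
        cum += p
        if r < cum:
            return j
    return len(A[i]) - 1

# Simulate a short chain starting in S1.
random.seed(0)
chain = [0]
for _ in range(5):
    chain.append(step(chain[-1]))
print([states[i] for i in chain])
```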
A simple example • Consider a 3-state Markov model of the weather. We assume that once a day the weather is observed as being one of the following: rainy or snowy, cloudy, or sunny. • We postulate that on day t the weather is characterized by exactly one of the three states above, and give ourselves a transition probability matrix A = {aij}, where aij is the probability of moving from weather state i to weather state j.
- 2 - • Given that the weather on day 1 is sunny, what is the probability that the weather for the next 7 days will be “sun-sun-rain-rain-sun-cloudy-sun”?
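A short sketch of the computation. The slide's transition matrix was an image that did not survive extraction; the values below are the classic ones from Rabiner's HMM tutorial (consistent with the deck's later claim of 5 expected consecutive sunny days), so treat them as an assumption. The probability of the observed sequence is just the product of the transition probabilities along it.

```python
# Assumed transition matrix (Rabiner's weather example); state order:
# rain/snow, cloudy, sunny.  Only the sunny->sunny entry (0.8) is implied
# by the deck itself, via the expected-duration calculation.
RAIN, CLOUDY, SUNNY = 0, 1, 2
A = [
    [0.4, 0.3, 0.3],  # from rain/snow
    [0.2, 0.6, 0.2],  # from cloudy
    [0.1, 0.1, 0.8],  # from sunny
]

# Day 1 is sunny (given); the next 7 days are the queried sequence.
days = [SUNNY, SUNNY, SUNNY, RAIN, RAIN, SUNNY, CLOUDY, SUNNY]

p = 1.0
for prev, cur in zip(days, days[1:]):
    p *= A[prev][cur]
print(p)  # ~1.536e-4 with these (assumed) numbers
```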
- 3 - • Given that the model is in a known state Si, what is the probability that it stays in that state for exactly d days? • The answer is pi(d) = (aii)^(d−1) (1 − aii), a geometric distribution. • Thus the expected number of consecutive days in the same state is E[d] = Σd d · pi(d) = 1 / (1 − aii). • So, with a sunny-to-sunny probability of 0.8, the expected number of consecutive sunny days according to the model is 1 / (1 − 0.8) = 5.
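The geometric-duration formula can be checked numerically; the self-transition probability 0.8 used here is the one implied by the quoted expectation of 5 days.

```python
# Geometric duration model: with self-transition probability a_ii, the
# chance of staying exactly d days is a_ii**(d-1) * (1 - a_ii), and the
# expected duration is 1 / (1 - a_ii).
a_ii = 0.8

def p_stay(d):
    return a_ii ** (d - 1) * (1 - a_ii)

# Truncating the infinite sum at a large d approximates 1 / (1 - a_ii).
expected = sum(d * p_stay(d) for d in range(1, 10_000))
print(round(expected, 6))  # -> 5.0
```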
Elements of an HMM • What if each state does not correspond to an observable (physical) event? What if the observation is a probabilistic function of the state? An HMM is characterized by the following: • N, the number of states in the model. • M, the number of distinct observation symbols per state. • The state transition probability distribution A = {aij}, where aij = P(qt+1 = Sj | qt = Si). • The observation symbol probability distribution B = {bj(k)}, where bj(k) is the probability that the k-th observation symbol is emitted at time t, given that the model is in state Sj. • The initial state distribution π = {πi}, where πi = P(q1 = Si).
Three Basic Problems for HMMs • Given the observation sequence O = O1O2…OT and a model m = (A, B, π), how do we efficiently compute P(O | m)? • Given the observation sequence O and a model m, how do we choose a corresponding state sequence Q = q1q2…qT which is optimal in some meaningful sense? • How do we adjust the model parameters to maximize P(O | m)?
Solution to Problem (1) • Given an observed output sequence O, we have that P(O | m) = Σ over all state sequences Q = q1…qT of πq1 bq1(O1) aq1q2 bq2(O2) ⋯ aq(T−1)qT bqT(OT). • This calculation is a sum over N^T state sequences, each contributing a product of 2T terms, so the total number of operations is on the order of 2T · N^T. • Fortunately, there is a much more efficient algorithm, called the forward algorithm.
The Forward Algorithm • It focuses on the calculation of the quantity α(t, i) = P(O1…Ot, qt = Si | m), which is the joint probability that the sequence of observations seen up to and including time t is O1,…,Ot, and that the state of the HMM at time t is Si. Once these quantities are known, P(O | m) = Σi α(T, i).
…continuation • The calculation of the α(t, i)'s is by induction on t. The base case is α(1, i) = πi bi(O1), and from the law of total probability we get the recursion α(t + 1, j) = [Σi α(t, i) aij] bj(Ot+1).
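The recursion above can be sketched in a few lines; the 2-state toy model at the bottom uses made-up numbers purely for illustration.

```python
def forward(pi, A, B, obs):
    """alpha[t][i] = P(O_1..O_t, q_t = S_i).
    pi: initial probabilities, A[i][j]: transition probabilities,
    B[i][k]: emission probabilities, obs: observed symbol indices."""
    N = len(pi)
    # Base case: alpha(1, i) = pi_i * b_i(O_1).
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        # Recursion: alpha(t+1, j) = [sum_i alpha(t, i) a_ij] * b_j(O_{t+1}).
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
            for j in range(N)
        ])
    return alpha

# Toy model (numbers invented for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
alpha = forward(pi, A, B, [0, 1, 0])
# P(O | m) is the sum of the final alphas -- O(N^2 T) work instead of O(2T N^T).
print(sum(alpha[-1]))
```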
Backward Algorithm • Another approach is the backward algorithm. • Specifically, we calculate β(t, i) = P(Ot+1…OT | qt = Si) by the formula β(t, i) = Σj aij bj(Ot+1) β(t + 1, j), with β(T, i) = 1. • Again, by induction one can find the β(t, i)'s starting with the value t = T − 1, then for the value t = T − 2, and so on, eventually working back to t = 1.
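A matching sketch of the backward pass, on the same invented 2-state toy model. As a sanity check, P(O | m) = Σi πi bi(O1) β(1, i), which should agree with the forward algorithm's answer.

```python
def backward(A, B, obs):
    """beta[t][i] = P(O_{t+1}..O_T | q_t = S_i), with beta[T-1][i] = 1."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N]  # base case at t = T
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        # Recursion: beta(t, i) = sum_j a_ij * b_j(O_{t+1}) * beta(t+1, j).
        beta.insert(0, [
            sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
            for i in range(N)
        ])
    return beta

# Toy model (numbers invented for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 0]
beta = backward(A, B, obs)
# P(O | m) recovered from the backward variables.
print(sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(2)))
```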
Solution to Problem (2) • Given an observed sequence O = O1,…,OT of outputs, we want to compute efficiently a state sequence Q = q1,…,qT that has the highest conditional probability given O. • In other words, we want to find a Q that makes P[Q | O] maximal. • There may be many Q’s that make P[Q | O] maximal. We give an algorithm to find one of them.
The Viterbi Algorithm • It is divided into two steps. First it finds maxQ P[Q | O], and then it backtracks to find a Q that realizes this maximum. • First define, for arbitrary t and i, δ(t, i) to be the maximum probability, over all ways to end in state Si at time t, of having observed the sequence O1O2…Ot. • Then maxQ P[Q and O] = maxi δ(T, i).
- 2 - • But P[Q | O] = P[Q and O] / P[O]. Since the denominator on the RHS does not depend on Q, we have argmaxQ P[Q | O] = argmaxQ P[Q and O]. • We calculate the δ(t, i)'s inductively: δ(1, i) = πi bi(O1), and δ(t + 1, j) = [maxi δ(t, i) aij] bj(Ot+1).
- 3 - • Finally, we recover the qt's as follows. Define qT* = argmaxi δ(T, i). • This is the last state in the desired state sequence. The remaining qt* for t < T are found recursively, working backwards, by putting qt* = argmaxi δ(t, i) ai,q*(t+1).
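The two phases (forward maximization plus backtracking) can be sketched as follows; the toy model at the bottom is invented for illustration.

```python
def viterbi(pi, A, B, obs):
    """Return a most likely state sequence (as state indices) for obs."""
    N = len(pi)
    # delta(1, i) = pi_i * b_i(O_1).
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    back = []  # back[t-1][j]: best predecessor of state j at time t+1
    for t in range(1, len(obs)):
        new_delta, ptr = [], []
        for j in range(N):
            # delta(t+1, j) = [max_i delta(t, i) a_ij] * b_j(O_{t+1}).
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][obs[t]])
        delta, back = new_delta, back + [ptr]
    # Backtrack from the best final state.
    q = [max(range(N), key=lambda i: delta[i])]
    for ptr in reversed(back):
        q.insert(0, ptr[q[0]])
    return q

# Toy model (numbers invented for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(viterbi(pi, A, B, [0, 1, 0]))  # -> [0, 0, 0]
```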
Solution to Problem (3) • We are given a set of observed data from an HMM for which the topology is known. We wish to estimate the parameters of that HMM. • We briefly describe the intuition behind the Baum-Welch method of parameter estimation. • Assume that the alphabet size M and the number of states N are fixed at the outset. • The data we use to estimate the parameters constitute a set of observed sequences {O(d)}.
The Baum-Welch Algorithm • We start by setting the parameters πi, aij, bi(k) at some initial values. • We then calculate, using these initial parameter values: 1) πi* = the expected proportion of times in state Si at the first time point, given {O(d)}.
- 2 - 2) aij* = E[Nij] / E[Ni], and 3) bi*(k) = E[Ni(k)] / E[Ni], where Nij is the random number of times qt(d) = Si and qt+1(d) = Sj for some d and t; Ni is the random number of times qt(d) = Si for some d and t; and Ni(k) is the random number of times qt(d) = Si and symbol k is emitted, for some d and t.
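A minimal single-sequence sketch of one re-estimation pass, using the forward and backward variables to compute the expected counts above. A real implementation averages these expectations over all sequences {O(d)} and rescales α and β to avoid numerical underflow; the toy model below is invented for illustration.

```python
def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch re-estimation pass over a single observed sequence."""
    N, T = len(pi), len(obs)
    # Forward variables alpha(t, i) and backward variables beta(t, i).
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    pO = sum(alpha[-1])  # P(O | current model)
    # gamma[t][i] = P(q_t = S_i | O): expected occupancy of state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / pO for i in range(N)] for t in range(T)]
    # 1) pi_i* : expected occupancy at the first time point.
    new_pi = gamma[0][:]
    # 2) a_ij* = E[N_ij] / E[N_i].
    new_A = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                  for t in range(T - 1)) / pO
              for j in range(N)] for i in range(N)]
    for i in range(N):
        denom = sum(gamma[t][i] for t in range(T - 1))
        new_A[i] = [x / denom for x in new_A[i]]
    # 3) b_i*(k) = E[N_i(k)] / E[N_i].
    M = len(B[0])
    new_B = [[sum(g[i] for t, g in enumerate(gamma) if obs[t] == k) /
              sum(g[i] for g in gamma)
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B

# Toy model (numbers invented for illustration).
new_pi, new_A, new_B = baum_welch_step(
    [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]], [0, 1, 0])
print(new_pi)
```

Note that the re-estimated parameters remain valid probability distributions by construction: new_pi sums to 1, as does every row of new_A and new_B.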
Upshot • It can be shown that if m = (πi, aij, bi(k)) is replaced by m* = (πi*, aij*, bi*(k)), then P[{O(d)} | m*] ≥ P[{O(d)} | m], with equality holding if and only if m* = m. • Thus successive iterations continually increase the probability of the data given the model. Iterations continue until a local maximum of the probability is reached.