Hidden Markov Models Pairwise Alignments
Hidden Markov Models • Finite state automata with multiple states provide a convenient description of the complex dynamic programming algorithms for pairwise alignment • Converting the FSA into an HMM gives a basis for probabilistic modelling of the gapped alignment process • Advantages: 1) the resulting probabilistic model can be used to assess the reliability of an alignment and to explore alternative alignments 2) weighting all alternative alignments probabilistically yields a similarity score that is independent of any specific alignment
[Figure: state diagram of the pair HMM. The Begin state B and the match state M (emission p_{x_i y_j}) transition to the insert states X (emission q_{x_i}) and Y (emission q_{y_j}) with probability δ each, to M with 1-2δ-τ, and to the End state E with τ; X and Y loop on themselves with ε and return to M with 1-ε-τ.]
Hidden Markov Models • Pair Hidden Markov Models generate an aligned pair of sequences • Start in the Begin state B and cycle over the following two steps: 1) pick the next state according to the transition probability distributions leaving the current state 2) pick a symbol pair to be added to the alignment according to the emission probability distribution in the new state • Stop when a transition into the End state E is made
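A minimal sketch of this generative procedure in Python, assuming a toy two-letter alphabet and hypothetical parameter values delta, epsilon, tau (none of these values come from the slides):

```python
# Sample one alignment from a pair HMM: alternate between picking the next
# state and emitting a symbol pair in that state, until E is reached.
import random

delta, epsilon, tau = 0.2, 0.3, 0.1           # assumed toy transition parameters
p = {('A','A'): 0.4, ('A','C'): 0.1,          # match emissions p_ab
     ('C','A'): 0.1, ('C','C'): 0.4}
q = {'A': 0.5, 'C': 0.5}                      # insert emissions q_a

def sample_alignment():
    """Return a list of alignment columns: (a, b), (a, '-') or ('-', b)."""
    state, columns = 'M', []                  # Begin transitions like M
    while True:
        # 1) pick the next state from the current state's transition distribution
        r = random.random()
        if state == 'M':
            if r < tau:                 state = 'E'
            elif r < tau + delta:       state = 'X'
            elif r < tau + 2 * delta:   state = 'Y'
            else:                       state = 'M'
        else:                                   # in X or Y
            if r < tau:                 state = 'E'
            elif r < tau + epsilon:     pass    # stay in the insert state
            else:                       state = 'M'
        if state == 'E':                        # stop on transition into End
            return columns
        # 2) pick a symbol pair from the new state's emission distribution
        if state == 'M':
            a, b = random.choices(list(p), weights=p.values())[0]
            columns.append((a, b))
        elif state == 'X':
            columns.append((random.choices(list(q), weights=q.values())[0], '-'))
        else:
            columns.append(('-', random.choices(list(q), weights=q.values())[0]))

print(sample_alignment())
```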
Hidden Markov Models • State M has emission probability distribution p_{ab} for emitting an aligned pair a:b • States X and Y have emission distributions q_{x_i} and q_{y_j} for emitting symbol x_i from sequence x, or symbol y_j from sequence y, against a gap • The transition probability from M to an insert state X or Y is denoted δ, and the probability of staying in an insert state ε • The probability of a transition into the End state is denoted τ • All algorithms discussed so far carry across to pair HMMs • The total probability of generating a particular alignment is just the product of the probabilities of each individual step
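A sketch of that last point: the probability of one particular alignment computed as the product of its transition and emission probabilities (same assumed toy parameters as above; direct X↔Y transitions do not exist in this model):

```python
delta, epsilon, tau = 0.2, 0.3, 0.1
p = {('A','A'): 0.4, ('A','C'): 0.1, ('C','A'): 0.1, ('C','C'): 0.4}
q = {'A': 0.5, 'C': 0.5}

def state_of(col):
    """M for an aligned pair, X for a gap in y, Y for a gap in x."""
    return 'M' if '-' not in col else ('X' if col[1] == '-' else 'Y')

def alignment_probability(columns):
    trans = {('M','M'): 1 - 2*delta - tau, ('M','X'): delta, ('M','Y'): delta,
             ('X','X'): epsilon, ('Y','Y'): epsilon,
             ('X','M'): 1 - epsilon - tau, ('Y','M'): 1 - epsilon - tau}
    prob, prev = 1.0, 'M'                      # Begin transitions like M
    for col in columns:
        s = state_of(col)
        emit = p[col] if s == 'M' else q[col[0] if s == 'X' else col[1]]
        prob *= trans[(prev, s)] * emit        # one transition + one emission
        prev = s
    return prob * tau                          # final transition into End

# e.g. x = "AC" aligned to y = "A": columns A:A, C:-
print(alignment_probability([('A','A'), ('C','-')]))
```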
Hidden Markov Models Viterbi Algorithm for pair HMMs • Initialisation: v^M(0,0) = 1, v^X(0,0) = v^Y(0,0) = 0; all v^•(i,-1) and v^•(-1,j) are 0 • Recurrence: for i = 0..n, j = 0..m, except (0,0):
v^M(i,j) = p_{x_i y_j} max( (1-2δ-τ) v^M(i-1,j-1), (1-ε-τ) v^X(i-1,j-1), (1-ε-τ) v^Y(i-1,j-1) )
v^X(i,j) = q_{x_i} max( δ v^M(i-1,j), ε v^X(i-1,j) )
v^Y(i,j) = q_{y_j} max( δ v^M(i,j-1), ε v^Y(i,j-1) )
• Termination: v^E = τ max( v^M(n,m), v^X(n,m), v^Y(n,m) )
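A sketch of this recurrence in Python, in probability space and without traceback, using the same assumed toy parameters (real implementations work in log space to avoid underflow):

```python
delta, epsilon, tau = 0.2, 0.3, 0.1
p = {('A','A'): 0.4, ('A','C'): 0.1, ('C','A'): 0.1, ('C','C'): 0.4}
q = {'A': 0.5, 'C': 0.5}

def viterbi(x, y):
    n, m = len(x), len(y)
    # v[s][i][j]: probability of the best path emitting x[:i], y[:j], ending in s
    v = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in 'MXY'}
    v['M'][0][0] = 1.0                          # initialisation
    for i in range(n + 1):
        for j in range(m + 1):
            if i == j == 0:
                continue
            if i > 0 and j > 0:                 # match: emit x_i : y_j
                v['M'][i][j] = p[(x[i-1], y[j-1])] * max(
                    (1 - 2*delta - tau) * v['M'][i-1][j-1],
                    (1 - epsilon - tau) * v['X'][i-1][j-1],
                    (1 - epsilon - tau) * v['Y'][i-1][j-1])
            if i > 0:                           # insert in x: emit x_i : -
                v['X'][i][j] = q[x[i-1]] * max(delta * v['M'][i-1][j],
                                               epsilon * v['X'][i-1][j])
            if j > 0:                           # insert in y: emit - : y_j
                v['Y'][i][j] = q[y[j-1]] * max(delta * v['M'][i][j-1],
                                               epsilon * v['Y'][i][j-1])
    # termination: best final state, then transition into End
    return tau * max(v[s][n][m] for s in 'MXY')

print(viterbi("AC", "CA"))
```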
[Figure: state diagram of the probabilistic model for a random alignment. From the Begin state B, state X emits the symbols of x with q_{x_i}, looping with probability 1-η; with probability η control passes to a silent state, from which state Y emits the symbols of y with q_{y_j}, looping with 1-η, before a final transition with probability η into the End state E.]
Hidden Markov Model • The main states X and Y emit the two sequences independently of each other • The silent state does not emit any symbol; it gathers the incoming transitions from the X and Begin states • The probability of a pair of sequences according to the random model is
P(x,y|R) = η² (1-η)^{n+m} ∏_{i=1..n} q_{x_i} ∏_{j=1..m} q_{y_j}
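A direct sketch of this formula, with an assumed value for η and the toy q distribution from before:

```python
from math import prod

eta = 0.1
q = {'A': 0.5, 'C': 0.5}

def random_model_probability(x, y):
    """P(x,y|R): both sequences emitted independently from q."""
    n, m = len(x), len(y)
    return (eta ** 2) * (1 - eta) ** (n + m) * \
           prod(q[a] for a in x) * prod(q[b] for b in y)

print(random_model_probability("AC", "CA"))
```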
Hidden Markov Model • Allocate the terms in this expression to the steps that make up the probability of the Viterbi alignment, so that the log-odds ratio becomes a sum of individual log-odds terms • Allocate one factor of (1-η) and the corresponding q_a factor to each residue emitted in a Viterbi step • The match transitions are thus allocated (1-η)² q_a q_b, where a and b are the residues matched • The insert states are allocated (1-η) q_a, where a is the residue inserted • As the Viterbi path must account for all residues, exactly n+m such terms are used
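To make the allocation concrete, the random-model probability from above can be regrouped so that each emitted residue carries one (1-η) factor and its q term, leaving η² for the initialisation (a reconstruction of the regrouping, not taken from the slides):

```latex
\begin{equation*}
P(x,y \mid R) \;=\; \eta^{2}
  \prod_{i=1}^{n} (1-\eta)\, q_{x_i}
  \prod_{j=1}^{m} (1-\eta)\, q_{y_j}
\end{equation*}
```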
Hidden Markov Model • We can now compute the log-odds ratio of the Viterbi alignment in terms of an additive model with log-odds emission scores and log-odds transition scores • In practice this is the standard way to implement pair HMMs • Merge the emission scores with the transitions to produce scores that correspond to the standard terms used in sequence alignment by dynamic programming (the merged scores are given below) • The log-odds version of the Viterbi alignment algorithm can then be given in a form that looks like standard pairwise dynamic programming
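The merged scores, as derived in standard treatments of this pair HMM (the formulas below are a reconstruction from those treatments, not taken from the slides):

```latex
\begin{align*}
s(a,b) &= \log\frac{p_{ab}}{q_a q_b}
          + \log\frac{1-2\delta-\tau}{(1-\eta)^2} \\
d &= -\log\frac{\delta\,(1-\epsilon-\tau)}{(1-\eta)(1-2\delta-\tau)}
      && \text{(gap-open penalty)} \\
e &= -\log\frac{\epsilon}{1-\eta}
      && \text{(gap-extension penalty)}
\end{align*}
```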
Hidden Markov Model Optimal log-odds alignment • Initialisation: V^M(0,0) = -2 log η, V^X(0,0) = V^Y(0,0) = -∞; all V^•(i,-1) and V^•(-1,j) are -∞ • Recursion: for i = 0..n, j = 0..m, except (0,0):
V^M(i,j) = s(x_i, y_j) + max( V^M(i-1,j-1), V^X(i-1,j-1), V^Y(i-1,j-1) )
V^X(i,j) = max( V^M(i-1,j) - d, V^X(i-1,j) - e )
V^Y(i,j) = max( V^M(i,j-1) - d, V^Y(i,j-1) - e )
• Termination: V = max( V^M(n,m), V^X(n,m) + c, V^Y(n,m) + c )
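A sketch of the conversion from pair-HMM parameters to the additive scores used in this recursion, with the same assumed toy values as earlier:

```python
from math import log

delta, epsilon, tau, eta = 0.2, 0.3, 0.1, 0.1
p = {('A','A'): 0.4, ('A','C'): 0.1, ('C','A'): 0.1, ('C','C'): 0.4}
q = {'A': 0.5, 'C': 0.5}

def substitution_score(a, b):
    """Log-odds match score s(a,b), with the M-transition folded in."""
    return log(p[(a, b)] / (q[a] * q[b])) + log((1 - 2*delta - tau) / (1 - eta)**2)

d = -log(delta * (1 - epsilon - tau) / ((1 - eta) * (1 - 2*delta - tau)))  # gap open
e = -log(epsilon / (1 - eta))                                              # gap extend
c = log((1 - 2*delta - tau) / (1 - epsilon - tau))    # termination constant

print(substitution_score('A', 'A'), d, e, c)
```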
Hidden Markov Model • The constant c in the termination has the value c = log(1-2δ-τ) - log(1-ε-τ); it corrects for the fact that the gap-open penalty d already accounts for a return transition into M, which does not occur when the alignment ends in an insert state • The procedure shows how, for any pair HMM, we can derive an equivalent finite state automaton for obtaining the most probable alignment