EE3J2 Data Mining
Lecture 15: Hidden Markov Models
Martin Russell
Objectives
• Hidden Markov models (HMMs)
• Viterbi decoding
• HMM training
Dynamic Programming Distance Calculation
Sequence retrieval using DP: calculate the alignment distance d(S, Q) between the 'query' sequence Q and each sequence S in the corpus, and retrieve the sequences with the smallest distance.
[Figure: a corpus of sequential data (AAGDTDTDTDD, AABBCBDAAAAAAA, BABABABBCCDF, GGGGDDGDGDGDGDTDTD, DGDGDGDGD, AABCDTAABCDTAABCDTAAB, CDCDCDTGGG, GGAACDTGGGGGAAA, …) matched against a query sequence Q = …BBCCDDDGDGDGDCDTCDTTDCCC…]
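As a reminder of what d(S, Q) is, here is a minimal Python sketch of a DP alignment (edit) distance with unit costs for substitution, insertion and deletion, the basic path shapes discussed below. The corpus strings come from the figure above; the query string is illustrative.

```python
def dp_distance(s, q):
    """Alignment (edit) distance between sequences s and q,
    computed by dynamic programming with unit costs."""
    m, n = len(s), len(q)
    # d[i][j] = distance between the prefixes s[:i] and q[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                              # i deletions
    for j in range(1, n + 1):
        d[0][j] = j                              # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == q[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitution / match
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[m][n]

# Retrieval: the best-matching corpus sequence for a query Q
corpus = ["AAGDTDTDTDD", "AABBCBDAAAAAAA", "BABABABBCCDF"]
Q = "AABBCBDAAA"                                  # illustrative query
best = min(corpus, key=lambda s: dp_distance(s, Q))
```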
Limitations of 'template matching'
• This type of analysis is sometimes referred to as template matching
• The 'templates' are the sequences in the corpus
• Each template can be thought of as representing a 'class'
• The problem is to determine which class best fits the query
• Performance will depend on precisely which template is used to represent the class
Alternative path shapes
• The basic units of path considered so far are substitution, insertion and deletion
• Other path shapes are possible and may have advantages
[Figure: the basic substitution/insertion/deletion path shapes alongside an alternative set of substitution/insertion/deletion shapes]
Example
[Figure]
Hidden Markov Models (HMMs)
• One solution is to replace the individual template sequence with an 'average' sequence
• But what is an 'average' sequence?
• One solution is to use a type of statistical model called a hidden Markov model
HMMs
• Suppose the following sequences are in the same class: ABC, YBBC, ABXC, AZ
• Compute alignments:
[Figure: alignments of YBBC, ABXC and AZ against ABC]
Finite State Network Representation
• The sequence consists of 3 'states'
• First state is 'realised' as A (twice) or Y (once)
• Second state is 'realised' as B (three times) or X (once)
• Second state can be repeated or deleted
• Third state is 'realised' as C (twice) or Z (once)
Network representation
• Directed graph representation
• Each state is associated with a set of probabilities
• Called the 'state emission' probabilities
Hidden Markov Model (HMM)
Basic rule for drawing transition networks: connect state j to state k if a_jk > 0, where a_jk = Prob(state k follows state j)
[Figure: transition network with transition probabilities 0.5, 0.67, 1, 0.5, 1, 0.33]
Formal Definition
A hidden Markov model (HMM) for the symbols 1, 2, …, K consists of:
• A number of states N
• An N × N state transition probability matrix A, whose entries are the a_jk
• For each state j, a set of emission probabilities p_j(1), …, p_j(K), where p_j(k) is the probability that symbol k occurs in state j
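This definition translates directly into a few NumPy arrays. A minimal sketch: the 3-state model, symbol set and numbers below are illustrative, not taken from the slides, and the initial-state distribution pi is an added assumption (the slides' networks implicitly start in the first state).

```python
import numpy as np

# Symbols 1..K encoded as indices 0..K-1
symbols = ["A", "B", "C"]                  # K = 3 (illustrative)
K = len(symbols)
N = 3                                      # number of states

# N x N state transition probability matrix A; A[j, k] = a_jk
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])

# Emission probabilities: p[j, k] = p_j(k), the probability
# that state j emits symbol k
p = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

# Initial-state distribution (assumption: start in state 1)
pi = np.array([1.0, 0.0, 0.0])

# Every row of A and p must be a probability distribution
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(p.sum(axis=1), 1.0)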
Alignment paths for HMMs
• For HMMs, alignment paths are called state sequences
[Figure: a state sequence aligning the symbols Y A B B B X B C with the states of the model]
The optimal state sequence
• Let M be an HMM and s a sequence
• The probability on the previous slide depends on the state sequence x and the model, so we write it as P(s, x | M)
• By analogy with dynamic programming, the optimal state sequence x̂ is the sequence such that: x̂ = argmax_x P(s, x | M)
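To make the argmax concrete before introducing the trellis, here is a brute-force sketch that enumerates every state sequence of the right length and keeps the best. It assumes the (A, p, pi) parameterisation from the definition sketch above, and is exponential in the sequence length, so it is only viable for tiny examples.

```python
from itertools import product
import numpy as np

def joint_prob(obs, x, A, p, pi):
    """P(s, x | M): joint probability of observation indices obs
    and state sequence x under the model (A, p, pi)."""
    prob = pi[x[0]] * p[x[0], obs[0]]
    for t in range(1, len(obs)):
        prob *= A[x[t - 1], x[t]] * p[x[t], obs[t]]
    return prob

def best_state_sequence_brute(obs, A, p, pi):
    """Optimal state sequence by exhaustive enumeration (tiny inputs only)."""
    N = A.shape[0]
    return max(product(range(N), repeat=len(obs)),
               key=lambda x: joint_prob(obs, x, A, p, pi))
```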
Computing the optimal state sequence: the 'state-symbol' trellis
Rule: connect state j at symbol m with state k at symbol m+1 if a_jk > 0
[Figure: state-symbol trellis for the symbols Y A B B B X B C]
More examples
[Figure]
Dynamic Programming, a.k.a. Viterbi Decoding
[Figure: Viterbi trellis for the symbols Y A B B B X B C]
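A minimal Viterbi decoder over the (A, p, pi) parameterisation from the definition sketch: it fills the trellis one symbol at a time, keeping for each state only the best-scoring path into it, then follows back-pointers to recover the optimal state sequence. For readability it works with raw probabilities; a practical implementation would use log probabilities to avoid underflow.

```python
import numpy as np

def viterbi(obs, A, p, pi):
    """Most probable state sequence for observation indices obs,
    plus its joint probability max_x P(s, x | M)."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))            # best score ending in state k at time t
    psi = np.zeros((T, N), dtype=int)   # back-pointers

    delta[0] = pi * p[:, obs[0]]
    for t in range(1, T):
        for k in range(N):
            scores = delta[t - 1] * A[:, k]       # extend every path into k
            psi[t, k] = np.argmax(scores)         # remember best predecessor
            delta[t, k] = scores[psi[t, k]] * p[k, obs[t]]

    # Trace the back-pointers from the best final state
    x = [int(np.argmax(delta[T - 1]))]
    for t in range(T - 1, 0, -1):
        x.append(int(psi[t, x[-1]]))
    return list(reversed(x)), float(delta[T - 1].max())
```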
Viterbi Decoding
Sequence retrieval using HMMs: calculate p(Q | M) for each HMM M in a corpus of pre-built HMMs, and retrieve the models that give the 'query' sequence Q the highest probability.
[Figure: corpus of pre-built HMMs matched against the query sequence Q = …BBCCDDDGDGDGDCDTCDTTDCCC…]
HMM Construction
• Suppose we have a set of HMMs, each representing a different class (e.g. a protein sequence)
• Given an unknown sequence s:
• Use Viterbi decoding to compare s with each HMM M
• Compute p(s | M) for each model and choose the class whose model gives the largest probability
• But how do we obtain the HMMs in the first place?
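Putting the pieces together, classification is a loop over the models. The sketch below reuses the viterbi function above and scores each model by its best-path probability max_x P(s, x | M), a common stand-in for the full p(s | M), which would sum over all state sequences via the forward algorithm. The (name, A, p, pi) model-list format is hypothetical.

```python
# models: list of (name, A, p, pi) tuples, one HMM per class (hypothetical format)
def classify(obs, models):
    """Return the name of the model whose best state sequence
    gives the query the highest joint probability."""
    scores = {name: viterbi(obs, A, p, pi)[1] for name, A, p, pi in models}
    return max(scores, key=scores.get)
```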
HMM training
• Given a set of example sequences S, an HMM M can be built such that p(S | M) is locally maximised
• The procedure is as follows:
• Obtain an initial estimate of a suitable model M0
• Apply an algorithm – the 'Baum-Welch' algorithm – to obtain a new model M1 such that p(S | M1) ≥ p(S | M0)
• Repeat to produce a sequence of HMMs M0, M1, …, Mn with: p(S | M0) ≤ p(S | M1) ≤ p(S | M2) ≤ … ≤ p(S | Mn)
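Full Baum-Welch needs the forward-backward recursions, which these slides do not cover. As an illustration of the re-estimation loop only, here is a sketch of Viterbi re-estimation (hard EM), a simpler relative of Baum-Welch that replaces expected counts with counts along each sequence's single best state sequence. It reuses the viterbi function from the earlier sketch; the smoothing floor eps is an assumption added to keep every row normalisable.

```python
import numpy as np

def viterbi_train(sequences, A, p, pi, iterations=10, eps=1e-6):
    """Viterbi re-estimation (hard EM): align each training sequence
    with the current model, count transitions and emissions along the
    best state sequences, and re-normalise. A simpler stand-in for
    Baum-Welch, not the Baum-Welch algorithm itself."""
    N, K = p.shape
    for _ in range(iterations):
        A_count = np.full((N, N), eps)   # small floor avoids zero rows
        p_count = np.full((N, K), eps)
        pi_count = np.full(N, eps)
        for obs in sequences:
            x, _ = viterbi(obs, A, p, pi)
            pi_count[x[0]] += 1
            for t, k in enumerate(x):
                p_count[k, obs[t]] += 1          # state k emitted obs[t]
                if t > 0:
                    A_count[x[t - 1], k] += 1    # transition x[t-1] -> k
        A = A_count / A_count.sum(axis=1, keepdims=True)
        p = p_count / p_count.sum(axis=1, keepdims=True)
        pi = pi_count / pi_count.sum()
    return A, p, pi
```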
Local optimality
[Figure: p(S | M) plotted against the model, showing the training sequence M0, M1, …, Mn climbing to a local maximum rather than the global maximum]
Summary
• Hidden Markov Models
• Importance of HMMs for sequence matching
• Viterbi decoding
• HMM training