
Variants of HMMs


  1. Variants of HMMs

  2. Higher-order HMMs
  • How do we model “memory” larger than one time point?
  • First order: P(π_{i+1} = l | π_i = k) = a_kl
  • Second order: P(π_{i+1} = l | π_i = k, π_{i–1} = j) = a_jkl
  • …
  • A second order HMM with K states is equivalent to a first order HMM with K² states.
  Example (K = 2, states H and T): the equivalent first-order model has the four pair states HH, HT, TH, TT, with transitions such as a_HHT, a_HTH, a_HTT, a_THH, a_THT, a_TTH; the old transition a_HT now splits by context into a_HT(prev = H) and a_HT(prev = T).
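
To make the equivalence concrete, here is a minimal sketch in Python (the function name, the NumPy dependency, and the random example are mine, not from the slides) that expands a second-order transition tensor a[j, k, l] into a first-order transition matrix over the K² pair states:

```python
import numpy as np

def second_to_first_order(a):
    """Expand a second-order HMM transition tensor a[j, k, l] =
    P(pi_{i+1} = l | pi_i = k, pi_{i-1} = j) into an equivalent
    first-order transition matrix over K^2 ordered-pair states (j, k)."""
    K = a.shape[0]                      # a has shape (K, K, K)
    A = np.zeros((K * K, K * K))
    for j in range(K):
        for k in range(K):
            for l in range(K):
                # pair state (j, k) can only move to a pair state (k, l)
                A[j * K + k, k * K + l] = a[j, k, l]
    return A

# Example with K = 2 (states H = 0, T = 1): pair states HH, HT, TH, TT.
a = np.random.dirichlet(np.ones(2), size=(2, 2))   # rows a[j, k, :] sum to 1
A = second_to_first_order(a)
assert np.allclose(A.sum(axis=1), 1.0)             # A is row-stochastic
```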

  3. Modeling the Duration of States
  Two states X and Y; X has self-transition probability p and exits to Y with probability 1 – p, and Y has self-transition probability q and exits with probability 1 – q.
  Length distribution of region X:
  P(l_X = d) = (1 – p) p^(d–1)
  • Geometric distribution, with mean E[l_X] = 1/(1 – p)
  This is a significant disadvantage of HMMs. Several solutions exist for modeling different length distributions.
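
A quick simulation makes the geometric claim tangible; this is an illustrative check, not part of the slides:

```python
import random

def duration_in_X(p):
    """Sample the time spent in state X, given self-transition probability p."""
    d = 1
    while random.random() < p:   # stay in X with probability p
        d += 1
    return d

p = 0.9
samples = [duration_in_X(p) for _ in range(100_000)]
print(sum(samples) / len(samples))   # ~ 1/(1 - p) = 10
```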

  4. Sol’n 1: Chain several states
  Replace X by several copies of X in a row, the last of which keeps the self-transition p and exits to Y with probability 1 – p (Y loops with q as before). Then:
  l_X = C + geometric with mean 1/(1 – p)
  where C counts the copies that are passed through deterministically.
  Disadvantage: still very inflexible.

  5. Sol’n 2: Negative binomial distribution
  Chain n copies of X, each with self-transition probability p and probability 1 – p of following the arrow to the next state: X → X → … → X → Y.
  Duration in X: m turns, where
  • during the first m – 1 turns, exactly n – 1 arrows to the next state are followed
  • during the m-th turn, an arrow to the next state is followed
  P(l_X = m) = C(m – 1, n – 1) (1 – p)^(n–1+1) p^((m–1)–(n–1)) = C(m – 1, n – 1) (1 – p)^n p^(m–n)
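
As a sanity check (illustrative code, not from the slides), the pmf above can be compared against a direct simulation of the n chained copies of X:

```python
import math, random

def neg_binom_pmf(m, n, p):
    """P(l_X = m) = C(m-1, n-1) * (1-p)^n * p^(m-n), as on the slide."""
    return math.comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)

def simulate_duration(n, p):
    """Total turns spent passing through n chained copies of X."""
    m = 0
    for _ in range(n):              # visit each of the n copies in turn
        m += 1
        while random.random() < p:  # self-loop within the current copy
            m += 1
    return m

n, p = 3, 0.8
samples = [simulate_duration(n, p) for _ in range(200_000)]
for m in range(n, n + 5):
    empirical = samples.count(m) / len(samples)
    print(m, round(neg_binom_pmf(m, n, p), 4), round(empirical, 4))
```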

  6. Example: genes in prokaryotes
  • EasyGene: a prokaryotic gene-finder (Larsen TS, Krogh A)
  • Models state durations with a negative binomial with n = 3

  7. Solution 3: Duration modeling
  Upon entering a state:
  • Choose a duration d, according to a probability distribution
  • Generate d letters according to the emission probabilities
  • Take a transition to the next state according to the transition probabilities
  Disadvantage: increase in complexity.
  Time: O(D²) (why?); Space: O(D), where D = maximum duration of a state.
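
A minimal generation sketch for such a state might look as follows; all function and parameter names, and the toy distributions, are hypothetical:

```python
import random

def generate_from_state(duration_probs, emission_probs, transition_probs):
    """Generate from one explicit-duration state: choose a duration d,
    emit d letters, then transition to the next state."""
    durations, d_weights = zip(*duration_probs.items())
    d = random.choices(durations, weights=d_weights)[0]        # 1. choose d
    letters, e_weights = zip(*emission_probs.items())
    emitted = random.choices(letters, weights=e_weights, k=d)  # 2. emit d letters
    states, t_weights = zip(*transition_probs.items())
    next_state = random.choices(states, weights=t_weights)[0]  # 3. transition
    return "".join(emitted), next_state

seq, nxt = generate_from_state(
    duration_probs={3: 0.2, 4: 0.5, 5: 0.3},
    emission_probs={"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    transition_probs={"Y": 0.7, "Z": 0.3},
)
print(seq, nxt)
```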

  8. Connection Between Alignment and HMMs

  9. A state model for alignment
  Three states, labeled by how they advance through the two sequences:
  M (+1, +1)   I (+1, 0)   J (0, +1)
  Alignments correspond 1-to-1 with sequences of states M, I, J:
  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
  IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII
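
As a small illustration (the helper below is mine, not from the slides), the state string can be read off mechanically from the two gapped rows; in the slide's example, I marks a column with a gap in the top row and J a column with a gap in the bottom row:

```python
def alignment_to_states(top, bottom):
    """Recover the M/I/J state string from two gapped alignment rows,
    following the convention of the slide's example."""
    states = []
    for a, b in zip(top, bottom):
        if a != "-" and b != "-":
            states.append("M")   # both sequences advance
        elif a == "-":
            states.append("I")   # only the bottom sequence advances
        else:
            states.append("J")   # only the top sequence advances
    return "".join(states)

print(alignment_to_states("-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---",
                          "TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC"))
# -> IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII
```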

  10. Let’s score the transitions
  Same three states M (+1, +1), I (+1, 0), J (0, +1); alignments still correspond 1-to-1 with sequences of states M, I, J.
  Each step through M scores s(x_i, y_j); entering I or J from M scores -d (gap open); staying in I or J scores -e (gap extend).
  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
  IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

  11. How do we find the optimal alignment according to this model?
  Dynamic programming:
  M(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in M
  I(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in I
  J(i, j): optimal alignment of x_1…x_i to y_1…y_j ending in J
  The score is additive, therefore we can apply the DP recurrence formulas.

  12. Needleman Wunsch with affine gaps – state version
  Initialization:
  M(0, 0) = 0; M(i, 0) = M(0, j) = –∞, for i, j > 0
  I(i, 0) = –d – i·e; J(0, j) = –d – j·e
  Iteration:
  M(i, j) = s(x_i, y_j) + max { M(i – 1, j – 1), I(i – 1, j – 1), J(i – 1, j – 1) }
  I(i, j) = max { –e + I(i – 1, j), –d + M(i – 1, j) }
  J(i, j) = max { –e + J(i, j – 1), –d + M(i, j – 1) }
  Termination:
  Optimal alignment given by max { M(m, n), I(m, n), J(m, n) }
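
The recurrences translate directly into code. This is a compact sketch, with an illustrative match/mismatch scorer standing in for s(x_i, y_j) and made-up gap parameters d and e:

```python
NEG_INF = float("-inf")

def affine_align_score(x, y, match=1.0, mismatch=-1.0, d=5.0, e=0.5):
    """Three-matrix Needleman-Wunsch with affine gaps; returns the
    optimal global alignment score (no traceback, for brevity)."""
    m, n = len(x), len(y)
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    J = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0.0
    for i in range(1, m + 1):
        I[i][0] = -d - i * e          # leading gap in y
    for j in range(1, n + 1):
        J[0][j] = -d - j * e          # leading gap in x
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i-1][j-1], I[i-1][j-1], J[i-1][j-1])
            I[i][j] = max(-e + I[i-1][j], -d + M[i-1][j])   # extend or open
            J[i][j] = max(-e + J[i][j-1], -d + M[i][j-1])
    return max(M[m][n], I[m][n], J[m][n])

print(affine_align_score("AGGCTATCACC", "TAGCTATCACG"))
```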

  13. Probabilistic interpretation of an alignment
  An alignment is a hypothesis that the two sequences are related by evolution.
  Goals:
  • Produce the most likely alignment
  • Assess the likelihood that the sequences are indeed related

  14. 1 – 2 –  M P(xi, yj) 1 – 2 –  1 – 2 –      I P(xi) J P(yj)   A Pair HMM for alignments BEGIN  1 – 2 –    I M J    END

  15. A Pair HMM for unaligned sequences
  Model R: two independent chains, BEGIN → I → … → I → END emitting x and BEGIN → J → … → J → END emitting y; each chain stays in its emitting state with probability 1 – η and moves to END with probability η.
  P(x, y | R) = η(1 – η)^m P(x_1)…P(x_m) · η(1 – η)^n P(y_1)…P(y_n)
              = η²(1 – η)^(m+n) ∏_i P(x_i) ∏_j P(y_j)
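
The factorized form of P(x, y | R) is easy to evaluate in log space; the sketch below assumes uniform background frequencies purely for illustration:

```python
import math

def log_prob_random(x, y, p, eta):
    """log P(x, y | R) = 2 log(eta) + (m+n) log(1-eta)
       + sum_i log P(x_i) + sum_j log P(y_j)."""
    logp = 2 * math.log(eta) + (len(x) + len(y)) * math.log(1 - eta)
    for ch in x + y:
        logp += math.log(p[ch])
    return logp

p = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}   # illustrative background
print(log_prob_random("AGGCTA", "TAGCTA", p, eta=0.01))
```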

  16. To compare ALIGNMENT vs. RANDOM hypothesis
  Every pair of letters contributes:
  • (1 – 2δ – τ) P(x_i, y_j) when matched
  • δ P(x_i) P(y_j) when gapped
  • (1 – η)² P(x_i) P(y_j) in the random model
  Focus on the comparison of P(x_i, y_j) vs. P(x_i) P(y_j).

  17. To compare ALIGNMENT vs. RANDOM hypothesis
  Idea: divide the alignment score by the random score, and take logarithms.
  Let
  s(x_i, y_j) = log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log [ (1 – 2δ – τ) / (1 – η)² ]
  d = – log [ δ (1 – ε – τ) P(x_i) / ( (1 – η) (1 – 2δ – τ) P(x_i) ) ]
  e = – log [ ε P(x_i) / ( (1 – η) P(x_i) ) ]
  Every letter b in the random model contributes (1 – η) P(b); this is why the P(x_i) factors cancel in d and e.
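
Plugging illustrative parameter values into these definitions shows how s, d, and e arise from δ, ε, τ, and η (the numbers below are assumptions, not values from the slides):

```python
import math

def alignment_scores(p_pair, p_x, p_y, delta, eps, tau, eta):
    """Log-odds match score s and gap penalties d, e, as defined above
    (the P(x_i) factors cancel and are omitted)."""
    s = (math.log(p_pair / (p_x * p_y))
         + math.log((1 - 2 * delta - tau) / (1 - eta) ** 2))
    d = -math.log(delta * (1 - eps - tau) / ((1 - eta) * (1 - 2 * delta - tau)))
    e = -math.log(eps / (1 - eta))
    return s, d, e

print(alignment_scores(p_pair=1/8, p_x=0.25, p_y=0.25,
                       delta=0.05, eps=0.3, tau=1e-3, eta=1e-3))
```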

  18. The meaning of alignment scores
  Because δ and ε are small, and η and τ are very small,
  s(x_i, y_j) = log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log [ (1 – 2δ – τ) / (1 – η)² ]
             ≈ log [ P(x_i, y_j) / (P(x_i) P(y_j)) ] + log (1 – 2δ)
  d = – log [ δ (1 – ε – τ) / ( (1 – η) (1 – 2δ – τ) ) ] ≈ – log [ δ (1 – ε) / (1 – 2δ) ]
  e = – log [ ε / (1 – η) ] ≈ – log ε

  19. The meaning of alignment scores
  • The Viterbi algorithm for Pair HMMs corresponds exactly to the Needleman-Wunsch algorithm with affine gaps
  • However, now we need to score alignments with parameters that add up to probability distributions:
  • ε ≈ 1/(mean length of the next gap)
  • δ ≈ 1/(mean arrival time of the next gap)
  • Affine gaps decouple arrival time from length
  • τ ≈ 1/(mean length of aligned sequences) (set to ~0)
  • η ≈ 1/(mean length of unaligned sequences) (set to ~0)

  20. The meaning of alignment scores
  Match/mismatch scores:
  s(a, b) ≈ log [ P(x_i, y_j) / (P(x_i) P(y_j)) ]
  (let’s ignore log(1 – 2δ) for the moment, and assume no gaps)
  Example: say DNA regions between human and mouse have average conservation of 50%. Then:
  P(A,A) = P(C,C) = P(G,G) = P(T,T) = 1/8 (the 4 match pairs sum to ½)
  P(A,C) = P(A,G) = … = P(T,G) = 1/24 (the 12 mismatch pairs sum to ½)
  Say P(A) = P(C) = P(G) = P(T) = ¼. Then, with base-2 logarithms:
  s(a, a) = log [ (1/8) / (1/4 · 1/4) ] = log 2 = 1, for a match
  s(a, b) = log [ (1/24) / (1/4 · 1/4) ] = log (16/24) = –0.585, for a mismatch
  Cutoff similarity that scores 0: s · 1 – (1 – s) · 0.585 = 0, i.e. s ≈ 0.37
  According to this model, a 37.5%-conserved sequence with no gaps would score on average 0.375 · 1 – 0.625 · 0.585 ≈ 0
  Why? 37.5% is between the 50% conservation model and the random 25% conservation model!
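
The arithmetic in this example is easy to verify (base-2 logarithms throughout):

```python
import math

# Match and mismatch log-odds scores under the 50%-conserved model
# vs. a uniform background, as in the slide's example.
p_match, p_mismatch, q = 1/8, 1/24, 1/4

s_match = math.log2(p_match / (q * q))        # log2(2)   = 1.0
s_mismatch = math.log2(p_mismatch / (q * q))  # log2(2/3) ~ -0.585
print(s_match, s_mismatch)

# Conservation level s at which the expected per-column score is zero:
# s * 1 - (1 - s) * 0.585 = 0  =>  s = 0.585 / 1.585 ~ 0.37
s_zero = -s_mismatch / (s_match - s_mismatch)
print(s_zero)
```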

  21. Substitution matrices
  A more meaningful way to assign match/mismatch scores. For protein sequences, different substitutions have dramatically different frequencies!
  PAM matrices:
  • Start from a curated set of very similar protein sequences
  • Construct ancestral sequences (using parsimony)
  • Calculate A_ab: frequency of letters a and b interchanging
  • Calculate B_ab = P(b | a) = A_ab / (Σ_c A_ac)
  • Adjust matrix B so that the expected rate of change, Σ_a Σ_{b≠a} q_a B_ab, equals 0.01; this defines PAM(1)
  • Let PAM(N) = [PAM(1)]^N; commonly used: PAM(250)
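
A toy sketch of the last two steps, on a made-up 3-letter alphabet (real PAM matrices are built from curated amino-acid data, which is not reproduced here):

```python
import numpy as np

# Toy interchange counts A[a, b] for a 3-letter alphabet (illustrative only).
A = np.array([[90.0,  5.0,  5.0],
              [ 5.0, 80.0, 15.0],
              [ 5.0, 15.0, 80.0]])

B = A / A.sum(axis=1, keepdims=True)          # B[a, b] = P(b | a)
q = A.sum(axis=1) / A.sum()                   # background frequencies q_a

# Rescale the off-diagonal mass so the expected change per position is 1%.
change = 1.0 - float(q @ np.diag(B))          # current expected change rate
lam = 0.01 / change
PAM1 = lam * B + (1.0 - lam) * np.eye(3)      # rows still sum to 1

PAM250 = np.linalg.matrix_power(PAM1, 250)    # PAM(N) = PAM(1)^N
print(PAM250.round(3))
```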

  22. Substitution Matrices
  BLOSUM matrices:
  • Start from the BLOCKS database (curated, gap-free alignments)
  • Cluster sequences according to > X% identity
  • Calculate A_ab: # of aligned a-b pairs between distinct clusters, correcting each count by 1/(mn), where m, n are the two cluster sizes
  • Estimate P(a) = (Σ_b A_ab) / (Σ_{c≤d} A_cd); P(a, b) = A_ab / (Σ_{c≤d} A_cd)
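
A simplified sketch of the counting and log-odds steps on a toy gap-free block (real BLOSUM construction uses the BLOCKS database, clustering, and the 1/(mn) cluster correction, all omitted here):

```python
import math
from collections import Counter
from itertools import combinations

block = ["ACDA", "ACDA", "ASDA", "TCDA"]       # toy gap-free aligned segments

# Count unordered aligned pairs A_ab over all sequence pairs and columns.
A = Counter()
for s, t in combinations(block, 2):
    for a, b in zip(s, t):
        A[tuple(sorted((a, b)))] += 1

total = sum(A.values())
P_pair = {k: v / total for k, v in A.items()}  # P(a, b)

# Marginals: P(a) = P(a, a) + (1/2) sum over mismatch pairs containing a.
P = Counter()
for (a, b), pab in P_pair.items():
    P[a] += pab / 2 if a != b else pab
    P[b] += pab / 2 if a != b else 0.0

def score(a, b):
    """BLOSUM-style log-odds of the observed vs. expected pair frequency."""
    pab = P_pair.get(tuple(sorted((a, b))), 0.0)
    expected = P[a] * P[b] * (1 if a == b else 2)
    return math.log2(pab / expected) if pab else float("-inf")

print(score("A", "A"), score("A", "T"))
```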

  23. BLOSUM matrices
  [Slide shows the BLOSUM 50 and BLOSUM 62 matrices side by side; the two are scaled differently.]
