
CSE182-L10



  1. CSE182-L10 HMM applications

  2. Probability of being in specific states • What is the probability that we were in state k at step i? • Pr[πi = k | x] = Pr[all paths that passed through state k at step i and emitted x] / Pr[all paths that emitted x]

  3. The Forward Algorithm • Recall v[i,j]: the probability of the most likely path the automaton chose in emitting x1…xi and ending up in state j. • Define F[i,j]: the probability that the automaton started from state 1, emitted x1…xi, and ended up in state j. • What is the difference?

  4. Most likely path versus probability of arrival • There are multiple paths through states 1..j along which the automaton can output x1…xi. • In computing the Viterbi path, we choose the most likely one: v[i,j] = maxπ Pr[x1…xi, π], the maximum over all paths π ending in state j. • The probability of emitting x1…xi and ending up in state j sums over all such paths: F[i,j] = ∑π Pr[x1…xi, π]

  5. The Forward Algorithm • Recall that v(i,j) = maxl∈Q { v(i-1,l) · A[l,j] } · ej(xi) • Instead, F(i,j) = ∑l∈Q F(i-1,l) · A[l,j] · ej(xi)
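The sum-instead-of-max recurrence can be sketched in Python. This is a sketch, not code from the lecture; the two-state fair/biased coin model (A, e, pi) is purely an illustrative assumption.

```python
# Forward algorithm: F[i][j] = Pr[emitting x[0..i] and ending in state j].
# The two-state fair/biased coin model below is an illustrative assumption.

def forward(x, A, e, pi):
    m, n = len(A), len(x)
    F = [[0.0] * m for _ in range(n)]
    for j in range(m):
        F[0][j] = pi[j] * e[j].get(x[0], 0.0)
    for i in range(1, n):
        for j in range(m):
            # Sum over all predecessor states l, instead of taking the max
            # as in the Viterbi recurrence.
            F[i][j] = sum(F[i - 1][l] * A[l][j] for l in range(m)) * e[j].get(x[i], 0.0)
    return F

A = [[0.9, 0.1], [0.1, 0.9]]                      # transition probabilities
e = [{'H': 0.5, 'T': 0.5}, {'H': 0.9, 'T': 0.1}]  # emission probabilities
pi = [0.5, 0.5]                                   # initial distribution
F = forward('HH', A, e, pi)
total = sum(F[-1])                                # Pr[x]: sum over all end states
```

Summing the last row over all states gives Pr[x], the total probability of emitting x over all paths.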

  6. The Backward Algorithm • Define B[i,j]: the probability that the automaton, starting from state j at step i, emitted xi+1…xn and ended up in the final state.

  7. Forward-Backward Scoring • F(i,j) = ∑l∈Q F(i-1,l) · A[l,j] · ej(xi) • B(i,j) = ∑l∈Q A[j,l] · el(xi+1) · B(i+1,l) • Pr[x, πi = k] = F(i,k) · B(i,k)
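The three formulas combine into a posterior-decoding sketch. The two-state coin model is an illustrative assumption, and the base case B[n-1][j] = 1 reflects a simplified model without an explicit end-state transition.

```python
# Forward-backward: Pr[x, pi_i = k] = F(i,k) * B(i,k); dividing by Pr[x]
# gives the posterior probability of being in state k at step i.
# The two-state model below is an illustrative assumption.

def forward(x, A, e, pi):
    m, n = len(A), len(x)
    F = [[0.0] * m for _ in range(n)]
    for j in range(m):
        F[0][j] = pi[j] * e[j].get(x[0], 0.0)
    for i in range(1, n):
        for j in range(m):
            F[i][j] = sum(F[i - 1][l] * A[l][j] for l in range(m)) * e[j].get(x[i], 0.0)
    return F

def backward(x, A, e):
    m, n = len(A), len(x)
    B = [[0.0] * m for _ in range(n)]
    B[n - 1] = [1.0] * m          # no explicit end state: an assumption
    for i in range(n - 2, -1, -1):
        for j in range(m):
            B[i][j] = sum(A[j][l] * e[l].get(x[i + 1], 0.0) * B[i + 1][l]
                          for l in range(m))
    return B

def posterior(x, A, e, pi):
    F, B = forward(x, A, e, pi), backward(x, A, e)
    px = sum(F[-1])               # Pr[x]
    return [[F[i][k] * B[i][k] / px for k in range(len(A))]
            for i in range(len(x))]

A = [[0.9, 0.1], [0.1, 0.9]]
e = [{'H': 0.5, 'T': 0.5}, {'H': 0.9, 'T': 0.1}]
pi = [0.5, 0.5]
P = posterior('HTHH', A, e, pi)   # P[i][k] = Pr[state k at step i | x]
```

Each row of P sums to 1, since F(i,k)·B(i,k) summed over k equals Pr[x] at every step i.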

  8. Application of HMMs • A profile of per-position emission probabilities:

         1    2    3    4    5    6    7    8
     A  0.9  0.4  0.3  0.6  0.1  0.0  0.2  1.0
     C  0.0  0.2  0.7  0.0  0.3  0.0  0.0  0.0
     G  0.1  0.2  0.0  0.0  0.3  1.0  0.3  0.0
     T  0.0  0.2  0.0  0.4  0.3  0.0  0.5  0.0

  • How do we modify this to handle indels?
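Before indels are introduced, an ungapped 8-mer scores against the profile as a simple product of per-column emission probabilities. A minimal sketch, using the emission table from this slide:

```python
# Score an ungapped 8-mer against the profile: the product of the
# per-position emission probabilities from the table above.
PROFILE = {
    'A': [0.9, 0.4, 0.3, 0.6, 0.1, 0.0, 0.2, 1.0],
    'C': [0.0, 0.2, 0.7, 0.0, 0.3, 0.0, 0.0, 0.0],
    'G': [0.1, 0.2, 0.0, 0.0, 0.3, 1.0, 0.3, 0.0],
    'T': [0.0, 0.2, 0.0, 0.4, 0.3, 0.0, 0.5, 0.0],
}

def profile_score(s):
    score = 1.0
    for pos, base in enumerate(s):
        score *= PROFILE[base][pos]
    return score
```

For example, profile_score('AACACGTA') multiplies 0.9 · 0.4 · 0.7 · 0.6 · 0.3 · 1.0 · 0.5 · 1.0; any base in a zero-probability column drives the whole score to 0, which is exactly why indel handling is needed.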

  9. Applications of the HMM paradigm • Modifying profile HMMs to handle indels • States Ii: insertion states • States Di: deletion states (emission table as on the previous slide)

  10. Profile HMMs • An assignment of states implies insertion, match, or deletion. Ex: ACACTGTA aligned against the 8-column profile above.

  11. Viterbi Algorithm revisited • Define vMj(i) as the log-likelihood score of the best path matching x1..xi to the profile HMM and ending with xi emitted by state Mj. • vIj(i) and vDj(i) are defined similarly.

  12. Viterbi Equations for Profile HMMs

  vMj(i) = log(eMj(xi)) + max { vMj-1(i-1) + log(A[Mj-1, Mj]),
                                vIj-1(i-1) + log(A[Ij-1, Mj]),
                                vDj-1(i-1) + log(A[Dj-1, Mj]) }

  vIj(i) = log(eIj(xi)) + max { vMj(i-1) + log(A[Mj, Ij]),
                                vIj(i-1) + log(A[Ij, Ij]),
                                vDj(i-1) + log(A[Dj, Ij]) }
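These recurrences (plus the analogous non-emitting vDj recurrence) can be sketched as log-space dynamic programming. This is a sketch under several assumptions not in the slides: transition probabilities are shared across positions (logA), insert states emit uniformly over ACGT, and begin/end handling is simplified.

```python
import math

NEG = float('-inf')

def lg(p):
    return math.log(p) if p > 0 else NEG

def profile_viterbi_score(x, e_match, logA, e_insert=None):
    """Best log-likelihood of aligning x to the profile HMM.

    e_match[j]: emission probs of match state M(j+1). logA: log transition
    probs shared across positions, keyed 'MM','MI','MD','IM','II','ID',
    'DM','DI','DD' (an assumed parameterization)."""
    if e_insert is None:
        e_insert = {c: 0.25 for c in 'ACGT'}      # uniform inserts: assumed
    k, n = len(e_match), len(x)
    VM = [[NEG] * (n + 1) for _ in range(k + 1)]
    VI = [[NEG] * (n + 1) for _ in range(k + 1)]
    VD = [[NEG] * (n + 1) for _ in range(k + 1)]
    VM[0][0] = 0.0                                # begin state treated as M0
    for i in range(1, n + 1):                     # insertions before first match
        VI[0][i] = lg(e_insert[x[i - 1]]) + max(VM[0][i - 1] + logA['MI'],
                                                VI[0][i - 1] + logA['II'])
    for j in range(1, k + 1):
        # leading deletions (no symbol consumed)
        VD[j][0] = max(VM[j - 1][0] + logA['MD'], VD[j - 1][0] + logA['DD'])
        for i in range(1, n + 1):
            VM[j][i] = lg(e_match[j - 1].get(x[i - 1], 0.0)) + max(
                VM[j - 1][i - 1] + logA['MM'],
                VI[j - 1][i - 1] + logA['IM'],
                VD[j - 1][i - 1] + logA['DM'])
            VI[j][i] = lg(e_insert[x[i - 1]]) + max(
                VM[j][i - 1] + logA['MI'],
                VI[j][i - 1] + logA['II'],
                VD[j][i - 1] + logA['DI'])
            VD[j][i] = max(
                VM[j - 1][i] + logA['MD'],
                VI[j - 1][i] + logA['ID'],
                VD[j - 1][i] + logA['DD'])
    return max(VM[k][n], VI[k][n], VD[k][n])      # simplified end handling

# Match emissions from the table above; transition values are assumed.
e_match = [
    {'A': 0.9, 'G': 0.1},
    {'A': 0.4, 'C': 0.2, 'G': 0.2, 'T': 0.2},
    {'A': 0.3, 'C': 0.7},
    {'A': 0.6, 'T': 0.4},
    {'A': 0.1, 'C': 0.3, 'G': 0.3, 'T': 0.3},
    {'G': 1.0},
    {'A': 0.2, 'G': 0.3, 'T': 0.5},
    {'A': 1.0},
]
logA = {t: lg(p) for t, p in {'MM': 0.8, 'MI': 0.1, 'MD': 0.1,
                              'IM': 0.7, 'II': 0.2, 'ID': 0.1,
                              'DM': 0.7, 'DI': 0.1, 'DD': 0.2}.items()}
```

Unlike the gapless product score, sequences of length 7 (one deletion) or 9 (one insertion) still receive finite log-likelihoods, which is the point of the I and D states.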

  13. Compositional Signals • CpG islands: in genomic sequence, the CG dinucleotide is rarely seen. • The C in a CG pair is prone to methylation, and methylated C subsequently mutates to T. • In regions around a gene, methylation is suppressed, and therefore CG is more common. • CpG islands: islands of CG on the genome. • How can you detect CpG islands?

  14. An HMM for Genomic regions • Node A emits A with prob. 1, and 0 for all other bases (similarly for C, G, T). • The start and end nodes do not emit any symbol. • All outgoing edges from nodes are equi-probable, except for the ones coming out of C. [Figure: states A, C, G, T with start/end; edge probabilities shown include 0.25, 0.1, and 0.4]

  15. An HMM for CpG islands • Node A emits A with prob. 1, and 0 for all other bases. • The start and end node do not emit any symbol. • All outgoing edges from nodes are equi-probable (0.25), including those out of C, since CG is not suppressed inside an island. [Figure: states A, C, G, T with start/end; all edges 0.25]

  16. HMM for detecting CpG Islands • Combine two copies of the four-state model: copy A (genomic) and copy B (CpG island), with transitions between them. • In the best parse of a genomic sequence, each base is assigned a state from the sets A and B. • Any substring with multiple states coming from B can be described as a CpG island. [Figure: the two copies A and B side by side, sharing start/end states]
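A sketch of this detector. Since emissions are deterministic (node A emits A with prob. 1), each state is a (base, model) pair; the model-switching probability 0.2, the reduced C→G probability 0.1 inside the genomic model, the equal split of the remaining mass out of C, and the uniform start distribution are all illustrative assumptions.

```python
import math

def trans_prob(b1, m1, b2, m2):
    # Probability of moving from state (b1, m1) to (b2, m2).
    # Switch models with prob 0.2 (assumed); inside the genomic model '-',
    # C->G gets 0.1 and the rest of the mass out of C is split equally.
    switch = 0.2 if m1 != m2 else 0.8
    if m2 == '-' and b1 == 'C':
        base_p = 0.1 if b2 == 'G' else 0.3
    else:
        base_p = 0.25
    return switch * base_p

def viterbi_decode(x):
    # Log-space Viterbi over the combined model. Emissions are deterministic,
    # so the only live states at step i are (x[i], '-') and (x[i], '+').
    V = {(x[0], m): math.log(0.125) for m in '-+'}   # uniform start (assumed)
    back = []
    for c in x[1:]:
        Vn, bn = {}, {}
        for m2 in '-+':
            prev = max(V, key=lambda s: V[s] + math.log(trans_prob(s[0], s[1], c, m2)))
            Vn[(c, m2)] = V[prev] + math.log(trans_prob(prev[0], prev[1], c, m2))
            bn[(c, m2)] = prev
        back.append(bn)
        V = Vn
    state = max(V, key=V.get)
    path = [state]
    for bn in reversed(back):
        path.append(bn[path[-1]])
    return [m for (b, m) in reversed(path)]          # '+' marks CpG-island states
```

On a CG-rich stretch the decoder prefers the '+' copy, because each C→G step costs log 0.25 there versus log 0.1 in the genomic copy; runs of '+' labels are the predicted islands.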

  17. HMM: Summary • HMMs are a natural technique for modeling many biological domains. • They can capture position-dependent as well as compositional properties. • HMMs have been very useful in an important bioinformatics application: gene finding.
