
Presentation Transcript


  1. Lecture 5: Hidden Markov Model. Bioinformatics. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. Aleppo University, Faculty of Technical Engineering, Department of Biotechnology. 2010-2011.

  2. Gene prediction: Methods
  Gene prediction can be based upon:
  • Coding statistics (statistical approach)
  • Gene structure (statistical approach)
  • Comparison (similarity-based approach)


  4. Gene prediction: Coding statistics
  Coding regions of the sequence have different properties than non-coding regions; these non-random properties of coding regions include:
  • GC content
  • Codon bias (codon usage)
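As a small illustration of one such statistic, the sketch below computes GC content in fixed-size windows along a sequence; the window size and the toy sequence are arbitrary choices for demonstration:

```python
# Minimal sketch: GC content in sliding windows, one of the coding
# statistics mentioned above. The window size is an arbitrary choice.

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def gc_windows(seq: str, window: int = 100):
    """GC content for each non-overlapping window along the sequence."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]

if __name__ == "__main__":
    dna = "ATGGCGCCGTTAGCGCGCATATATATATGCGC" * 10  # toy sequence
    print(gc_windows(dna, window=80))
```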

  5. Markov Model

  6. Markov Model
  • A Markov model is a process that moves from state to state, where the next state depends only on the previous n states (for a first-order model, only on the current state).
  • For example: calculating the probability of observing this weather sequence in one week of March: Sunny, Sunny, Cloudy, Rainy, Rainy, Sunny, Cloudy.
  • If today is cloudy, tomorrow is more likely to be rainy than it would be otherwise.
  • In March, the week is more likely to start with a sunny day than with any other weather.
  • And so on.

  7. Markov Model: transition probabilities between the weather states Sunny, Cloudy, and Rainy.

  Weather today \ Weather tomorrow:
              Sunny   Cloudy   Rainy
  Sunny       0.5     0.25     0.25
  Cloudy      0.25    0.375    0.375
  Rainy       0.25    0.625    0.125

  8. Example:
  P(Sunny, Sunny, Cloudy, Rainy | Model)
  = Π(Sunny) × P(Sunny | Sunny) × P(Cloudy | Sunny) × P(Rainy | Cloudy)
  = 0.6 × 0.5 × 0.25 × 0.375
  ≈ 0.0281
  (using the transition probabilities from the previous slide; Π(Sunny) = 0.6 is the initial probability of starting with a sunny day)
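The same calculation in code; a minimal sketch in which only Π(Sunny) = 0.6 comes from the slide, while the remaining initial probabilities are assumed purely for illustration:

```python
# Sketch of the slide's calculation. Only Pi(Sunny) = 0.6 is given in
# the lecture; the Cloudy and Rainy initial probabilities below are
# illustrative assumptions.

pi = {"Sunny": 0.6, "Cloudy": 0.25, "Rainy": 0.15}

# Transition probabilities P(tomorrow | today) from the table on slide 7.
A = {
    "Sunny":  {"Sunny": 0.5,  "Cloudy": 0.25,  "Rainy": 0.25},
    "Cloudy": {"Sunny": 0.25, "Cloudy": 0.375, "Rainy": 0.375},
    "Rainy":  {"Sunny": 0.25, "Cloudy": 0.625, "Rainy": 0.125},
}

def sequence_probability(states):
    """P(s1, ..., sn) = pi(s1) * product of A[s_i][s_{i+1}]."""
    p = pi[states[0]]
    for today, tomorrow in zip(states, states[1:]):
        p *= A[today][tomorrow]
    return p

# 0.6 * 0.5 * 0.25 * 0.375 ~= 0.0281, as on the slide
print(sequence_probability(["Sunny", "Sunny", "Cloudy", "Rainy"]))
```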

  9. Hidden Markov Models
  • States are not observable.
  • Observations are probabilistic functions of the state.
  • State transitions are still probabilistic.

  10. CG Islands and the “Fair Bet Casino”
  • The CG islands problem can be modeled after a problem named “The Fair Bet Casino”.
  • The game is to flip coins, with only two possible outcomes: Head or Tail.
  • The Fair coin gives Heads and Tails with the same probability, ½.
  • The Biased coin gives Heads with probability ¾.

  11. The “Fair Bet Casino” (cont’d)
  Thus, we define the probabilities:
  • P(H|F) = P(T|F) = ½
  • P(H|B) = ¾, P(T|B) = ¼
  • The crooked dealer changes between the Fair and Biased coins with probability 10%.

  12. HMM for Fair Bet Casino (cont’d)
  [Figure: HMM model for the Fair Bet Casino problem]

  13. HMM Parameters
  Σ: set of emission characters. Examples:
  • Σ = {H, T} for coin tossing
  • Σ = {1, 2, 3, 4, 5, 6} for dice tossing
  • Σ = {A, C, G, T} for DNA sequences
  Q: set of hidden states, each emitting symbols from Σ. Examples:
  • Q = {F, B} for coin tossing
  • Q = {Non-coding, Coding, Regulatory} for sequences

  14. HMM Parameters (cont’d)
  A = (a_kl): a |Q| × |Q| matrix of the probability of changing from state k to state l.
  • a_FF = 0.9, a_FB = 0.1, a_BF = 0.1, a_BB = 0.9
  E = (e_k(b)): a |Q| × |Σ| matrix of the probability of emitting symbol b while in state k (writing 0 for Tails and 1 for Heads):
  • e_F(0) = ½, e_F(1) = ½
  • e_B(0) = ¼, e_B(1) = ¾
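These parameters translate directly into small data structures. A minimal sketch in Python, using H/T symbols instead of the slide's 1/0 encoding; the uniform start distribution is an assumption, since the slides do not give Π for this model:

```python
# Fair Bet Casino HMM parameters as plain dictionaries, written with
# H/T symbols rather than the slide's 1/0 encoding. The uniform start
# distribution is an assumption (the slides do not specify Pi here).

states = ["F", "B"]                       # Q: Fair and Biased coins
alphabet = ["H", "T"]                     # Sigma: emission characters

A = {"F": {"F": 0.9, "B": 0.1},           # a_kl: state transitions
     "B": {"F": 0.1, "B": 0.9}}

E = {"F": {"H": 0.5,  "T": 0.5},          # e_k(b): emissions
     "B": {"H": 0.75, "T": 0.25}}

def joint_probability(path, observations, initial):
    """P(path, observations): probability that one particular hidden
    path emits the observed sequence."""
    p = initial[path[0]] * E[path[0]][observations[0]]
    for i in range(1, len(path)):
        p *= A[path[i - 1]][path[i]] * E[path[i]][observations[i]]
    return p

print(joint_probability("FFBB", "HTHH", {"F": 0.5, "B": 0.5}))
```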

  15. HMM
  [Figure: an urn-style HMM with hidden states Q1, Q2, Q3, each emitting colored balls (Yellow, Red, Green, Blue); the hidden state moves from the i-th turn to the (i+1)-th turn]

  16. The three basic problems of HMMs
  • Problem 1: Given an observation sequence O = O1 O2 … OT and a model M = (Π, A, E), compute P(O | M).
  • Problem 2: Given an observation sequence O = O1 O2 … OT and a model M = (Π, A, E), how do we choose a corresponding state sequence Q = q1 q2 … qT which best “explains” the observations?
  • Problem 3: How do we adjust the model parameters Π, A, E to maximize P(O | Π, A, E)?

  17. The three basic problems of HMMs
  Problem 1: Given an observation sequence O = O1 O2 … OT and a model M = (Π, A, E), compute P(O | M).
  For example: compute P(⟨observed ball sequence⟩ | M) for a sequence of colored balls drawn from the urn model.

  18. Problem 1: Probability of an Observation Sequence
  • What is P(O | M)?
  • The probability of an observation sequence is the sum of the probabilities of all possible state sequences in the HMM.
  • Naive computation is very expensive. Given T observations and N states, there are N^T possible state sequences.
  • Even small HMMs, e.g. T = 10 and N = 10, contain 10 billion different paths.
  • The solution to this problem and to Problem 2 is dynamic programming.

  19. Problem 1: Given an observation sequence O = O1 O2 … OT and a model M = (Π, A, E), compute P(O | M). Solution: the forward algorithm.
  [Figure: a trellis over hidden states Q1, Q2, Q3 for an observed sequence of colored balls (Yellow, Red, Green, Blue)]
  Example step: the forward value at a state sums the contributions arriving from all three previous states (previous forward value × transition probability × emission probability):
  • from Q1: 0.15 × 0.1 × 0.25 = 0.00375
  • from Q2: 0.03 × 0.4 × 0.1 = 0.0012
  • from Q3: 0.065 × 0.2 × 0.65 = 0.00845
  • Sum = 0.0134
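In code, the forward algorithm keeps one value per state and updates all of them at each observation; a minimal sketch on the Fair Bet Casino model defined earlier (the uniform start distribution is again an assumption):

```python
# The standard forward recurrence on the Fair Bet Casino model:
# alpha_t(k) = e_k(o_t) * sum_j alpha_{t-1}(j) * a_jk.
# The uniform start distribution is an assumption for illustration.

A = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
E = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}

def forward(observations, states, A, E, initial):
    # Initialize with the first observation.
    alpha = {k: initial[k] * E[k][observations[0]] for k in states}
    for o in observations[1:]:
        prev = alpha
        alpha = {k: E[k][o] * sum(prev[j] * A[j][k] for j in states)
                 for k in states}
    return sum(alpha.values())   # P(observations | M)

print(forward("HTHHH", ["F", "B"], A, E, {"F": 0.5, "B": 0.5}))
```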

  20. The three basic problems of HMMs
  Problem 2: Given an observation sequence O = O1 O2 … OT and a model M = (Π, A, E), how do we choose a corresponding state sequence Q = q1 q2 … qT which best “explains” the observations?
  For example: what is the most probable state sequence Q1 Q2 Q3 Q4 given the observed sequence of balls?

  21. Problem 2: Decoding
  • The solution to Problem 1 efficiently gives us the sum over all paths through an HMM.
  • For Problem 2, we instead want to find the single path with the highest probability.

  22. Example (the same trellis step as on slide 19), but now keeping only the largest contribution instead of the sum:
  • from Q1: 0.15 × 0.1 × 0.25 = 0.00375
  • from Q2: 0.03 × 0.4 × 0.1 = 0.0012
  • from Q3: 0.065 × 0.2 × 0.65 = 0.00845 ← the largest
  The best score carried forward is 0.00845.
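Taking the max (with back-pointers to recover which predecessor won) instead of the sum turns the forward algorithm into the decoder commonly known as the Viterbi algorithm; the slides do not name it, so that label is supplied here. A minimal sketch on the same casino HMM, again assuming a uniform start distribution:

```python
# Viterbi sketch: like forward, but take the max over predecessor
# states and remember which predecessor achieved it (back-pointers).

def viterbi(observations, states, A, E, initial):
    V = [{k: initial[k] * E[k][observations[0]] for k in states}]
    back = []
    for o in observations[1:]:
        prev, col, ptr = V[-1], {}, {}
        for k in states:
            best = max(states, key=lambda j: prev[j] * A[j][k])
            ptr[k] = best
            col[k] = prev[best] * A[best][k] * E[k][o]
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), V[-1][last]

A = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
E = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}
print(viterbi("HHHHTT", ["F", "B"], A, E, {"F": 0.5, "B": 0.5}))
```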

  23. Hidden Markov Model and Gene Prediction

  24. How is it connected to gene prediction?
  [Figure: the same HMM trellis with hidden states Q1, Q2, Q3 emitting colored balls at the i-th and (i+1)-th turns]

  25. How is it connected to gene prediction?
  [Figure: the DNA sequence TTGAGTGGAATCTAGCCCCAGAGCTTAAGCTAGCTAGCT with regions labeled Exon, Intron, and UTR]

  26. Hidden Markov Models (HMM) for gene prediction
  • A basic probabilistic model of gene structure.
  Hidden states:
  • 5′: 5′ UTR
  • IE: initial exon
  • E: exon
  • I: intron
  • FE: final exon
  • SE: single exon
  • 3′: 3′ UTR
  Signals:
  • B: begin sequence
  • S: start translation
  • D: donor site (GT)
  • A: acceptor site (AG)
  • T: stop translation
  • F: end sequence
  (A sketch of this state topology in code follows below.)
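The same machinery applies to genes: the hidden states are the structural labels above, the emitted symbols are nucleotides, and only biologically sensible transitions get non-zero probability. Below is a hedged sketch of the transition topology as read from this slide's diagram; the state names are spelled out for readability, and real transition and emission probabilities would have to be trained from annotated genomes (none are given in the lecture):

```python
# Sketch of the gene-structure HMM topology from the slide. Only the
# allowed transitions are encoded; actual probabilities would be
# estimated from annotated training genomes.

allowed_transitions = {
    "5UTR":       ["5UTR", "InitialExon", "SingleExon"],  # after S (start translation)
    "InitialExon": ["InitialExon", "Intron"],             # via D (donor site, GT)
    "Intron":      ["Intron", "Exon", "FinalExon"],       # via A (acceptor site, AG)
    "Exon":        ["Exon", "Intron"],
    "FinalExon":   ["FinalExon", "3UTR"],                 # via T (stop translation)
    "SingleExon":  ["SingleExon", "3UTR"],
    "3UTR":        ["3UTR"],                              # until F (end of sequence)
}

def is_valid_path(path):
    """Check a label sequence against the allowed topology."""
    return all(b in allowed_transitions[a] for a, b in zip(path, path[1:]))

print(is_valid_path(["5UTR", "InitialExon", "Intron", "Exon",
                     "Intron", "FinalExon", "3UTR"]))  # True
```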

  27. Eukaryotic gene features
  [Figure: signal ordering along a eukaryotic gene: translation start (ATG), splice donor sites (GT), splice acceptor sites (AG), and the stop codon (TAG), in the order ATG → GT → AG → … → TAG]

  28. Thank you
