
Markov Chains



  1. Markov Chains

  2. Hidden Markov Models

  3. Review
  • A Markov chain can solve the CpG island finding problem
  • Positive model, negative model
  • Length? Solution: use a combined model

  4. Hidden Markov Models
  • The essential difference between a Markov chain and a hidden Markov model is that in a hidden Markov model there is no one-to-one correspondence between the states and the symbols (hence "hidden").
  • It is no longer possible to tell which state the model was in when xi was generated just by looking at xi.
  • In the previous example, there is no way to tell by looking at a single symbol C in isolation whether it was emitted by state C+ or C-.
  • Many states can emit the same letter, and one state can emit many letters.
  • We now have to distinguish the sequence of states from the sequence of symbols.

  5. Hidden Markov Models
  • States
  • A path of states: π = π1, π2, …, πn
  • Observable symbols: A, C, G, T
  • X = x1, x2, …, xn
  • Transition probabilities: akl = P(πi = l | πi-1 = k)
  • Emission probabilities: ek(b) = P(xi = b | πi = k)
  • The states are decoupled from the observable symbols
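To make these ingredients concrete, here is a minimal Python sketch (my addition, not part of the slides) that stores akl and ek(b) as nested dictionaries and evaluates the joint probability P(x, π) of a symbol sequence together with a state path. The numbers are the fair/loaded casino parameters introduced on slide 8; the uniform 1/2, 1/2 start distribution follows the begin-state values used in the worked example on slide 18 and is otherwise an assumption.

```python
# Minimal sketch: HMM parameters as plain dicts, plus the joint probability P(x, pi).
# Values are the dishonest-casino parameters from slide 8; uniform start is assumed.

START = {"F": 0.5, "L": 0.5}                    # P(pi_1 = k), assumed uniform
TRANS = {"F": {"F": 0.99, "L": 0.01},           # a_kl = P(pi_i = l | pi_{i-1} = k)
         "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},  # e_k(b) = P(x_i = b | pi_i = k)
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def joint_prob(x, path):
    """P(x, pi) = P(pi_1) * e_{pi_1}(x_1) * prod_{i>1} a_{pi_{i-1} pi_i} * e_{pi_i}(x_i)."""
    p = START[path[0]] * EMIT[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
    return p

# Unlike the conditional product on slide 11, this includes the start factor for pi_1.
print(joint_prob((3, 2, 6), ("F", "F", "L")))   # 0.5 * 1/6 * 0.99 * 1/6 * 0.01 * 0.5
```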

  6. Hidden Markov Models
  • We can think of an HMM as a generative model that generates, or emits, sequences.
  • First a state π1 is selected (either randomly or according to some prior probabilities), then symbol x1 is emitted at state π1 with probability eπ1(x1). The model then transitions to state π2 with probability aπ1π2, and so on.
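The generative reading can also be written down directly. The sampling sketch below (my illustration) picks a first state from assumed start probabilities, emits a symbol from ek, transitions with akl, and repeats; the parameter values are again the casino numbers from slide 8.

```python
import random

# Casino parameters (slide 8); the uniform start distribution is an assumption.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def pick(dist):
    """Draw one key from a {outcome: probability} dict."""
    keys = list(dist)
    return random.choices(keys, weights=[dist[k] for k in keys])[0]

def generate(n):
    """Emit n symbols: choose pi_1 from START, emit with e_k, transition with a_kl."""
    path, symbols = [], []
    state = pick(START)
    for _ in range(n):
        path.append(state)
        symbols.append(pick(EMIT[state]))
        state = pick(TRANS[state])
    return symbols, path

rolls, states = generate(20)
print("".join(str(r) for r in rolls))   # observed symbols, like the die-toss row on slide 10
print("".join(states))                  # the hidden state row
```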

  7. Hidden Markov Models
  • X: G C A T A G C G G C T A G C T G A A T A G G A …
  • π: G+C+A+T+A+G+C+G+G+C+T+A+G+C+T-G-A-A-T-A-G-G-A- …
  • Now it is the path of hidden states that we want to find.
  • Many paths could have generated X; we want to find the most likely one.
  • There are several ways to do this:
  • Brute-force enumeration
  • Dynamic programming
  • We will discuss both later.

  8. The occasionally dishonest casino
  • A casino uses a fair die most of the time, but occasionally switches to a loaded one
  • Fair die: Prob(1) = Prob(2) = … = Prob(6) = 1/6
  • Loaded die: Prob(1) = Prob(2) = … = Prob(5) = 1/10, Prob(6) = 1/2
  • These are the emission probabilities at the two states, loaded and fair
  • Transition probabilities: Prob(Fair → Loaded) = 0.01, Prob(Loaded → Fair) = 0.2
  • Transitions between states obey a Markov process
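One small side calculation (my addition): the fair/loaded switching is itself a two-state Markov chain, so its stationary distribution tells us what fraction of rolls we should expect from each die over a long game. For a two-state chain with p = P(Fair → Loaded) and q = P(Loaded → Fair), the stationary probability of Loaded is p / (p + q).

```python
# Illustrative check, not from the slides: long-run occupancy of the two states.
p = 0.01                         # P(Fair -> Loaded)
q = 0.20                         # P(Loaded -> Fair)

stationary_loaded = p / (p + q)  # about 0.048: roughly 1 roll in 21 comes from the loaded die
stationary_fair   = q / (p + q)  # about 0.952
print(stationary_fair, stationary_loaded)
```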

  9. An HMM for the occasionally dishonest casino

  10. The occasionally dishonest casino
  • The casino won't tell you when they use the fair or the loaded die.
  • Known: the structure of the model and the transition probabilities
  • Hidden: what the casino did, e.g. FFFFFLLLLLLLFFFF...
  • Observable: the series of die tosses, e.g. 3415256664666153...
  • What we must infer: when was the fair die used, and when was the loaded die used?
  • The answer is a sequence of states, e.g. FFFFFFFLLLLLLFFF...

  11. Making the inference
  • The model assigns a probability to each explanation of the observation:
  • P(326 | FFL) = P(3|F) · P(F→F) · P(2|F) · P(F→L) · P(6|L) = 1/6 · 0.99 · 1/6 · 0.01 · 1/2
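The product can be checked directly; this tiny snippet (mine) just multiplies the stated factors.

```python
# P(326 | FFL) = P(3|F) * P(F->F) * P(2|F) * P(F->L) * P(6|L)
prob = (1 / 6) * 0.99 * (1 / 6) * 0.01 * 0.5
print(prob)   # ~0.0001375
```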

  12. Notation
  • x is the sequence of symbols emitted by the model
  • xi is the symbol emitted at time i
  • A path, π, is a sequence of states; the i-th state in π is πi
  • akr is the probability of making a transition from state k to state r: akr = P(πi = r | πi-1 = k)
  • ek(b) is the probability that symbol b is emitted when in state k: ek(b) = P(xi = b | πi = k)

  13. A path of a sequence
  [Figure: trellis with a begin state 0 and states 1, 2, …, K repeated at each position; the observed symbols x1, x2, x3, …, xL label the columns, and a path is one route through this lattice]

  14. The occasionally dishonest casino

  15. The most probable path
  • The most likely path π* satisfies π* = argmaxπ P(x, π)
  • To find π*, consider all the possible ways the last symbol of x could have been emitted
  • Let vk(i) = the probability of the most probable path ending in state k with observation xi
  • Then vr(i) = er(xi) · maxk [vk(i-1) · akr]

  16. The Viterbi Algorithm
  • The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path, that results in a sequence of observed symbols
  • Assumptions:
  • Both the observed symbols and the hidden states must be in a sequence
  • These two sequences need to be aligned, and an observed symbol needs to correspond to exactly one hidden state
  • Computing the most likely path up to a certain point t must depend only on the observed symbol at point t and the most likely path up to point t − 1
  • These assumptions are all satisfied in a first-order hidden Markov model

  17. The Viterbi Algorithm
  • Initialization (i = 0): v0(0) = 1, vk(0) = 0 for k > 0
  • Recursion (i = 1, …, L): for each state k, vk(i) = ek(xi) · maxr [vr(i-1) · ark], recording the pointer ptri(k) = argmaxr [vr(i-1) · ark]
  • Termination: P(x, π*) = maxk [vk(L) · ak0], with π*L = argmaxk [vk(L) · ak0]
  • To find π*, use trace-back (i = L, …, 1), as in dynamic programming: π*i-1 = ptri(π*i)
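Below is a hedged Python sketch of these steps (my code, not from the slides). It follows the recursion in raw probabilities so the numbers line up with the worked example on the next slide; in practice one would use log probabilities to avoid underflow. The begin state is folded into start probabilities, and the ak0 end-transition factor in the termination step is dropped (assumed uniform).

```python
def viterbi(x, states, start, trans, emit):
    """Most probable state path for symbols x under an HMM given as dicts:
    start[k] = P(pi_1 = k), trans[k][l] = a_kl, emit[k][b] = e_k(b).
    Raw probabilities for clarity; switch to log space for long sequences."""
    # Initialization: v_k(1) = start_k * e_k(x_1)
    v = [{k: start[k] * emit[k][x[0]] for k in states}]
    ptr = []
    # Recursion: v_k(i) = e_k(x_i) * max_r v_r(i-1) * a_rk, remembering each argmax
    for i in range(1, len(x)):
        col, back = {}, {}
        for k in states:
            r_best = max(states, key=lambda r: v[i - 1][r] * trans[r][k])
            col[k] = emit[k][x[i]] * v[i - 1][r_best] * trans[r_best][k]
            back[k] = r_best
        v.append(col)
        ptr.append(back)
    # Termination: pick the best final state, then trace back through the pointers
    path = [max(states, key=lambda k: v[-1][k])]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    path.reverse()
    return path, v

# Casino parameters (slide 8); uniform start assumed.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

path, v = viterbi((6, 2, 6), ("F", "L"), START, TRANS, EMIT)
print(path)   # ['L', 'L', 'L'] for the three rolls used on slide 18
```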

  18. Viterbi: Example
  • Observed rolls x = 6, 2, 6; states B (begin), F (fair), L (loaded); the values vk(i) are filled in column by column:
  • B: 1, 0, 0, 0
  • F: 0; (1/2)(1/6) = 1/12; (1/6)·max{(1/12)·0.99, (1/4)·0.2} = 0.01375; (1/6)·max{0.01375·0.99, 0.02·0.2} = 0.00226875
  • L: 0; (1/2)(1/2) = 1/4; (1/10)·max{(1/12)·0.01, (1/4)·0.8} = 0.02; (1/2)·max{0.01375·0.01, 0.02·0.8} = 0.008
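These entries can be reproduced with a few lines of standalone arithmetic (my own check, assuming the 1/2, 1/2 begin probabilities shown in the table):

```python
# Standalone check of the slide-18 columns for the rolls 6, 2, 6.
aFF, aFL, aLF, aLL = 0.99, 0.01, 0.20, 0.80

vF1, vL1 = 0.5 * (1 / 6), 0.5 * 0.5          # first roll is a 6
vF2 = (1 / 6) * max(vF1 * aFF, vL1 * aLF)    # second roll (a non-six): e_F = 1/6, e_L = 1/10
vL2 = (1 / 10) * max(vF1 * aFL, vL1 * aLL)
vF3 = (1 / 6) * max(vF2 * aFF, vL2 * aLF)    # third roll is a 6 again
vL3 = (1 / 2) * max(vF2 * aFL, vL2 * aLL)
print(vF2, vL2, vF3, vL3)                    # ≈ 0.01375, 0.02, 0.00226875, 0.008
```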

  19. Viterbi gets it right more often than not

  20. Hidden Markov Models

  21. Total probability
  • Many different paths can result in observation x
  • The probability that our model will emit x is the total probability P(x) = Σπ P(x, π)
  • If an HMM models a family of objects, we want the total probability to peak at members of the family (training)

  22. Total probability
  • P(x) can be computed in the same way as the probability of the most likely path
  • Let fk(i) = P(x1 … xi, πi = k), the probability of the observation up to and including xi, requiring that πi = k
  • Then fr(i) = er(xi) · Σk fk(i-1) · akr
  • and P(x) = Σk fk(L) · ak0

  23. The Forward Algorithm
  • Initialization (i = 0): f0(0) = 1, fk(0) = 0 for k > 0
  • Recursion (i = 1, …, L): for each state k, fk(i) = ek(xi) · Σr fr(i-1) · ark
  • Termination: P(x) = Σk fk(L) · ak0
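A corresponding Python sketch of the forward algorithm (mine, not from the slides). As in the Viterbi sketch, the begin state is folded into start probabilities and the ak0 end term is dropped, so the termination step is just a sum over fk(L); raw probabilities would need scaling or logs on long sequences.

```python
def forward(x, states, start, trans, emit):
    """Total probability P(x), summed over all paths, computed column by column."""
    # Initialization: f_k(1) = start_k * e_k(x_1)
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    # Recursion: f_k(i) = e_k(x_i) * sum_r f_r(i-1) * a_rk
    for i in range(1, len(x)):
        f.append({k: emit[k][x[i]] * sum(f[i - 1][r] * trans[r][k] for r in states)
                  for k in states})
    # Termination: P(x) = sum_k f_k(L)   (end transitions a_k0 omitted)
    return sum(f[-1].values()), f

# Casino parameters (slide 8); uniform start assumed.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

px, f = forward((6, 2, 6), ("F", "L"), START, TRANS, EMIT)
print(px)   # P(x) for the three rolls: larger than the single best path's 0.008
```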

  24. Hidden Markov Models
  • Decoding:
  • Viterbi (maximum likelihood): determine which explanation is most likely, i.e. find the path most likely to have produced the observed sequence
  • Forward (total probability): determine the probability that the observed sequence was produced by the HMM, considering all paths that could have produced it
  • Forward and backward: the probability that xi came from state k given the observed sequence, i.e. P(πi = k | x)

  25. The Backward Algorithm
  • P(x) can be computed in the same way as the probability of the most likely path
  • Let bk(i) = P(xi+1 … xL | πi = k), the probability of the rest of the sequence given that the i-th state is k
  • Then bk(i) = Σr akr · er(xi+1) · br(i+1), for i = L-1, …, 1
  • and P(x) = Σk a0k · ek(x1) · bk(1)

  26. The Backward Algorithm
  • Initialization (i = L): bk(L) = ak0 for all k
  • Recursion (i = L-1, …, 1): for each state k, bk(i) = Σr akr · er(xi+1) · br(i+1)
  • Termination: P(x) = Σk a0k · ek(x1) · bk(1)
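A matching sketch of the backward algorithm (mine). Because the end state is omitted in these sketches, the initialization uses bk(L) = 1 rather than the slide's bk(L) = ak0; the P(x) returned by the termination step should agree with the forward sketch.

```python
def backward(x, states, start, trans, emit):
    """b_k(i) = P(x_{i+1} ... x_L | pi_i = k), filled in from the right."""
    L = len(x)
    # Initialization: b_k(L) = 1 here (no explicit end state / a_k0 assumed uniform)
    b = [{k: 1.0 for k in states} for _ in range(L)]
    # Recursion: b_k(i) = sum_r a_kr * e_r(x_{i+1}) * b_r(i+1)
    for i in range(L - 2, -1, -1):
        for k in states:
            b[i][k] = sum(trans[k][r] * emit[r][x[i + 1]] * b[i + 1][r] for r in states)
    # Termination: P(x) = sum_k start_k * e_k(x_1) * b_k(1)
    px = sum(start[k] * emit[k][x[0]] * b[0][k] for k in states)
    return px, b

# Casino parameters (slide 8); uniform start assumed.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

px, b = backward((6, 2, 6), ("F", "L"), START, TRANS, EMIT)
print(px)   # should equal the forward algorithm's P(x)
```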

  27. Posterior state probabilities
  • The probability that xi came from state k given the observed sequence, i.e. P(πi = k | x)
  • P(x, πi = k) = P(x1 … xi, πi = k) · P(xi+1 … xL | x1 … xi, πi = k) = P(x1 … xi, πi = k) · P(xi+1 … xL | πi = k) = fk(i) · bk(i)
  • P(πi = k | x) = fk(i) · bk(i) / P(x)
  • Posterior decoding: assign xi the state k that maximizes P(πi = k | x) = fk(i) · bk(i) / P(x)
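Putting forward and backward together, posterior decoding is only a few lines. This sketch assumes the forward() and backward() functions and the casino parameter dicts from the previous two sketches are already defined in the same session.

```python
def posterior_decode(x, states, start, trans, emit):
    """Assign each position i the state k maximizing P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)."""
    px, f = forward(x, states, start, trans, emit)    # f_k(i) and P(x), from the earlier sketch
    _, b = backward(x, states, start, trans, emit)    # b_k(i), from the earlier sketch
    post = [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]
    return [max(states, key=lambda k: col[k]) for col in post], post

decoded, post = posterior_decode((6, 2, 6), ("F", "L"), START, TRANS, EMIT)
print(decoded)                                 # posterior-decoded state per roll
print([round(col["L"], 3) for col in post])    # P(loaded | x) at each position
```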

  28. Estimating the probabilities

  29. Estimating the probabilities (“training”)
  • Baum-Welch algorithm
  • Start with an initial guess at the transition probabilities
  • Refine the guess in each step to improve the total probability of the training data
  • May get stuck at a local optimum
  • Special case of the expectation-maximization (EM) algorithm
  • Viterbi training (see the sketch below)
  • Derive probable paths for the training data using the Viterbi algorithm
  • Re-estimate transition probabilities based on the Viterbi paths
  • Iterate until the paths stop changing
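As a rough illustration of the second recipe, here is a sketch of Viterbi training (my code, and only one of several reasonable formulations). It assumes a viterbi() function like the sketch after slide 17, keeps the start probabilities fixed, and adds a pseudocount of 1 so that unseen transitions and emissions do not collapse to zero.

```python
def viterbi_training(seqs, states, alphabet, start, trans, emit, rounds=10):
    """Iteratively decode the training sequences with viterbi() and re-estimate
    a_kl and e_k(b) from counts along the decoded paths, until the paths stop changing."""
    prev_paths = None
    for _ in range(rounds):
        # Decode every training sequence with the current parameters
        paths = [viterbi(x, states, start, trans, emit)[0] for x in seqs]
        if paths == prev_paths:                      # paths unchanged: stop iterating
            break
        prev_paths = paths
        # Count transitions and emissions along the Viterbi paths (pseudocount 1)
        A = {k: {l: 1.0 for l in states} for k in states}
        E = {k: {b: 1.0 for b in alphabet} for k in states}
        for x, p in zip(seqs, paths):
            for i, (sym, st) in enumerate(zip(x, p)):
                E[st][sym] += 1
                if i > 0:
                    A[p[i - 1]][st] += 1
        # Re-estimate the probabilities from the counts
        trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        emit  = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return trans, emit

# Hypothetical usage, given a list of observed roll sequences `training_rolls`:
# trans_hat, emit_hat = viterbi_training(training_rolls, ("F", "L"),
#                                        tuple(range(1, 7)), START, TRANS, EMIT)
```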
