
Markov Chains



  1. Markov Chains

  2. Hidden Markov Models

  3. Review
  • A Markov chain can solve the CpG island finding problem
  • Positive model, negative model
  • Length? Solution: use a combined model

  4. Hidden Markov Models
  • The essential difference between a Markov chain and a hidden Markov model is that in a hidden Markov model there is no one-to-one correspondence between the states and the symbols (hence "hidden").
  • It is no longer possible to tell which state the model was in when xi was generated just by looking at xi.
  • In the previous example, there is no way to tell by looking at a single symbol C in isolation whether it was emitted by state C+ or C-.
  • Many states can emit the same letter, and one state can emit many letters.
  • We now have to distinguish the sequence of states from the sequence of symbols.

  5. Hidden Markov Models
  • States
  • A path of states: π = π1, π2, …, πn
  • Observable symbols: A, C, G, T
  • X = x1, x2, …, xn
  • Transition probabilities: akl = P(πi = l | πi-1 = k)
  • Emission probabilities: ek(b) = P(xi = b | πi = k)
  • The states are decoupled from the observable symbols
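To make these ingredients concrete, here is a minimal Python sketch (my addition, not part of the slides) that stores akl and ek(b) as nested dictionaries and evaluates the joint probability P(x, π) of a symbol sequence together with a state path. The numbers are the fair/loaded casino parameters introduced on slide 8; the uniform 1/2, 1/2 start distribution follows the begin-state values used in the worked example on slide 18 and is otherwise an assumption.

```python
# Minimal sketch: HMM parameters as plain dicts, plus the joint probability P(x, pi).
# Values are the dishonest-casino parameters from slide 8; uniform start is assumed.

START = {"F": 0.5, "L": 0.5}                    # P(pi_1 = k), assumed uniform
TRANS = {"F": {"F": 0.99, "L": 0.01},           # a_kl = P(pi_i = l | pi_{i-1} = k)
         "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},  # e_k(b) = P(x_i = b | pi_i = k)
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def joint_prob(x, path):
    """P(x, pi) = P(pi_1) * e_{pi_1}(x_1) * prod_{i>1} a_{pi_{i-1} pi_i} * e_{pi_i}(x_i)."""
    p = START[path[0]] * EMIT[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
    return p

# Unlike the conditional product on slide 11, this includes the start factor for pi_1.
print(joint_prob((3, 2, 6), ("F", "F", "L")))   # 0.5 * 1/6 * 0.99 * 1/6 * 0.01 * 0.5
```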

  6. Hidden Markov Models
  • We can think of an HMM as a generative model that generates, or emits, sequences.
  • First a state π1 is selected (either randomly or according to some prior probabilities), then symbol x1 is emitted at state π1 with probability eπ1(x1). The model then transitions to state π2 with probability aπ1π2, and so on.
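The generative reading can also be written down directly. The sampling sketch below (my illustration) picks a first state from assumed start probabilities, emits a symbol from ek, transitions with akl, and repeats; the parameter values are again the casino numbers from slide 8.

```python
import random

# Casino parameters (slide 8); the uniform start distribution is an assumption.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def pick(dist):
    """Draw one key from a {outcome: probability} dict."""
    keys = list(dist)
    return random.choices(keys, weights=[dist[k] for k in keys])[0]

def generate(n):
    """Emit n symbols: choose pi_1 from START, emit with e_k, transition with a_kl."""
    path, symbols = [], []
    state = pick(START)
    for _ in range(n):
        path.append(state)
        symbols.append(pick(EMIT[state]))
        state = pick(TRANS[state])
    return symbols, path

rolls, states = generate(20)
print("".join(str(r) for r in rolls))   # observed symbols, like the die-toss row on slide 10
print("".join(states))                  # the hidden state row
```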

  7. Hidden Markov Models
  • X: G C A T A G C G G C T A G C T G A A T A G G A …
  • π: G+C+A+T+A+G+C+G+G+C+T+A+G+C+T-G-A-A-T-A-G-G-A- …
  • Now it is the path of hidden states that we want to find.
  • Many paths could have generated X; we want to find the most likely one.
  • There are several ways to do this:
  • Brute-force enumeration
  • Dynamic programming
  • We will discuss both later.

  8. The occasionally dishonest casino
  • A casino uses a fair die most of the time, but occasionally switches to a loaded one
  • Fair die: Prob(1) = Prob(2) = … = Prob(6) = 1/6
  • Loaded die: Prob(1) = Prob(2) = … = Prob(5) = 1/10, Prob(6) = 1/2
  • These are the emission probabilities at the two states, loaded and fair
  • Transition probabilities: Prob(Fair → Loaded) = 0.01, Prob(Loaded → Fair) = 0.2
  • Transitions between states obey a Markov process
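One small side calculation (my addition): the fair/loaded switching is itself a two-state Markov chain, so its stationary distribution tells us what fraction of rolls we should expect from each die over a long game. For a two-state chain with p = P(Fair → Loaded) and q = P(Loaded → Fair), the stationary probability of Loaded is p / (p + q).

```python
# Illustrative check, not from the slides: long-run occupancy of the two states.
p = 0.01                         # P(Fair -> Loaded)
q = 0.20                         # P(Loaded -> Fair)

stationary_loaded = p / (p + q)  # about 0.048: roughly 1 roll in 21 comes from the loaded die
stationary_fair   = q / (p + q)  # about 0.952
print(stationary_fair, stationary_loaded)
```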

  9. An HMM for the occasionally dishonest casino

  10. The occasionally dishonest casino
  • The casino won't tell you when they use the fair or the loaded die.
  • Known: the structure of the model and the transition probabilities
  • Hidden: what the casino did, e.g. FFFFFLLLLLLLFFFF...
  • Observable: the series of die tosses, e.g. 3415256664666153...
  • What we must infer: when was the fair die used, and when was the loaded die used?
  • The answer is a sequence of states, e.g. FFFFFFFLLLLLLFFF...

  11. Making the inference
  • The model assigns a probability to each explanation of the observation:
  • P(326 | FFL) = P(3|F) · P(F→F) · P(2|F) · P(F→L) · P(6|L) = 1/6 · 0.99 · 1/6 · 0.01 · 1/2
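The product can be checked directly; this tiny snippet (mine) just multiplies the stated factors.

```python
# P(326 | FFL) = P(3|F) * P(F->F) * P(2|F) * P(F->L) * P(6|L)
prob = (1 / 6) * 0.99 * (1 / 6) * 0.01 * 0.5
print(prob)   # ~0.0001375
```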

  12. Notation
  • x is the sequence of symbols emitted by the model
  • xi is the symbol emitted at time i
  • A path, π, is a sequence of states; the i-th state in π is πi
  • akr is the probability of making a transition from state k to state r: akr = P(πi = r | πi-1 = k)
  • ek(b) is the probability that symbol b is emitted when in state k: ek(b) = P(xi = b | πi = k)

  13. A path of a sequence
  [Figure: trellis with a begin state 0 and states 1, 2, …, K repeated at each position; the observed symbols x1, x2, x3, …, xL label the columns, and a path is one route through this lattice]

  14. The occasionally dishonest casino

  15. The most probable path
  • The most likely path π* satisfies π* = argmaxπ P(x, π)
  • To find π*, consider all the possible ways the last symbol of x could have been emitted
  • Let vk(i) = the probability of the most probable path ending in state k with observation xi
  • Then vr(i) = er(xi) · maxk [vk(i-1) · akr]

  16. The Viterbi Algorithm
  • The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path, that results in a sequence of observed symbols
  • Assumptions:
  • Both the observed symbols and the hidden states must be in a sequence
  • These two sequences need to be aligned, and an observed symbol needs to correspond to exactly one hidden state
  • Computing the most likely path up to a certain point t must depend only on the observed symbol at point t and the most likely path up to point t − 1
  • These assumptions are all satisfied in a first-order hidden Markov model

  17. The Viterbi Algorithm
  • Initialization (i = 0): v0(0) = 1, vk(0) = 0 for k > 0
  • Recursion (i = 1, …, L): for each state k, vk(i) = ek(xi) · maxr [vr(i-1) · ark], recording the pointer ptri(k) = argmaxr [vr(i-1) · ark]
  • Termination: P(x, π*) = maxk [vk(L) · ak0], with π*L = argmaxk [vk(L) · ak0]
  • To find π*, use trace-back (i = L, …, 1), as in dynamic programming: π*i-1 = ptri(π*i)
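Below is a hedged Python sketch of these steps (my code, not from the slides). It follows the recursion in raw probabilities so the numbers line up with the worked example on the next slide; in practice one would use log probabilities to avoid underflow. The begin state is folded into start probabilities, and the ak0 end-transition factor in the termination step is dropped (assumed uniform).

```python
def viterbi(x, states, start, trans, emit):
    """Most probable state path for symbols x under an HMM given as dicts:
    start[k] = P(pi_1 = k), trans[k][l] = a_kl, emit[k][b] = e_k(b).
    Raw probabilities for clarity; switch to log space for long sequences."""
    # Initialization: v_k(1) = start_k * e_k(x_1)
    v = [{k: start[k] * emit[k][x[0]] for k in states}]
    ptr = []
    # Recursion: v_k(i) = e_k(x_i) * max_r v_r(i-1) * a_rk, remembering each argmax
    for i in range(1, len(x)):
        col, back = {}, {}
        for k in states:
            r_best = max(states, key=lambda r: v[i - 1][r] * trans[r][k])
            col[k] = emit[k][x[i]] * v[i - 1][r_best] * trans[r_best][k]
            back[k] = r_best
        v.append(col)
        ptr.append(back)
    # Termination: pick the best final state, then trace back through the pointers
    path = [max(states, key=lambda k: v[-1][k])]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    path.reverse()
    return path, v

# Casino parameters (slide 8); uniform start assumed.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

path, v = viterbi((6, 2, 6), ("F", "L"), START, TRANS, EMIT)
print(path)   # ['L', 'L', 'L'] for the three rolls used on slide 18
```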

  18. Viterbi: Example
  • Observed rolls x = 6, 2, 6; states B (begin), F (fair), L (loaded); the values vk(i) are filled in column by column:
  • B: 1, 0, 0, 0
  • F: 0; (1/2)(1/6) = 1/12; (1/6)·max{(1/12)·0.99, (1/4)·0.2} = 0.01375; (1/6)·max{0.01375·0.99, 0.02·0.2} = 0.00226875
  • L: 0; (1/2)(1/2) = 1/4; (1/10)·max{(1/12)·0.01, (1/4)·0.8} = 0.02; (1/2)·max{0.01375·0.01, 0.02·0.8} = 0.008
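These entries can be reproduced with a few lines of standalone arithmetic (my own check, assuming the 1/2, 1/2 begin probabilities shown in the table):

```python
# Standalone check of the slide-18 columns for the rolls 6, 2, 6.
aFF, aFL, aLF, aLL = 0.99, 0.01, 0.20, 0.80

vF1, vL1 = 0.5 * (1 / 6), 0.5 * 0.5          # first roll is a 6
vF2 = (1 / 6) * max(vF1 * aFF, vL1 * aLF)    # second roll (a non-six): e_F = 1/6, e_L = 1/10
vL2 = (1 / 10) * max(vF1 * aFL, vL1 * aLL)
vF3 = (1 / 6) * max(vF2 * aFF, vL2 * aLF)    # third roll is a 6 again
vL3 = (1 / 2) * max(vF2 * aFL, vL2 * aLL)
print(vF2, vL2, vF3, vL3)                    # ≈ 0.01375, 0.02, 0.00226875, 0.008
```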

  19. Viterbi gets it right more often than not

  20. Hidden Markov Models

  21. Total probability
  • Many different paths can result in observation x
  • The probability that our model will emit x is the total probability P(x) = Σπ P(x, π)
  • If an HMM models a family of objects, we want the total probability to peak at members of the family (training)

  22. Total probability
  • P(x) can be computed in the same way as the probability of the most likely path
  • Let fk(i) = P(x1 … xi, πi = k), the probability of the observation up to and including xi, requiring that πi = k
  • Then fr(i) = er(xi) · Σk fk(i-1) · akr
  • and P(x) = Σk fk(L) · ak0

  23. The Forward Algorithm
  • Initialization (i = 0): f0(0) = 1, fk(0) = 0 for k > 0
  • Recursion (i = 1, …, L): for each state k, fk(i) = ek(xi) · Σr fr(i-1) · ark
  • Termination: P(x) = Σk fk(L) · ak0
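A corresponding Python sketch of the forward algorithm (mine, not from the slides). As in the Viterbi sketch, the begin state is folded into start probabilities and the ak0 end term is dropped, so the termination step is just a sum over fk(L); raw probabilities would need scaling or logs on long sequences.

```python
def forward(x, states, start, trans, emit):
    """Total probability P(x), summed over all paths, computed column by column."""
    # Initialization: f_k(1) = start_k * e_k(x_1)
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    # Recursion: f_k(i) = e_k(x_i) * sum_r f_r(i-1) * a_rk
    for i in range(1, len(x)):
        f.append({k: emit[k][x[i]] * sum(f[i - 1][r] * trans[r][k] for r in states)
                  for k in states})
    # Termination: P(x) = sum_k f_k(L)   (end transitions a_k0 omitted)
    return sum(f[-1].values()), f

# Casino parameters (slide 8); uniform start assumed.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

px, f = forward((6, 2, 6), ("F", "L"), START, TRANS, EMIT)
print(px)   # P(x) for the three rolls: larger than the single best path's 0.008
```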

  24. Hidden Markov Models
  • Decoding:
  • Viterbi (maximum likelihood): determine which explanation is most likely, i.e. find the path most likely to have produced the observed sequence
  • Forward (total probability): determine the probability that the observed sequence was produced by the HMM, considering all paths that could have produced it
  • Forward and backward: the probability that xi came from state k given the observed sequence, i.e. P(πi = k | x)

  25. The Backward Algorithm
  • P(x) can be computed in the same way as the probability of the most likely path
  • Let bk(i) = P(xi+1 … xL | πi = k), the probability of the rest of the sequence given that the i-th state is k
  • Then bk(i) = Σr akr · er(xi+1) · br(i+1), for i = L-1, …, 1
  • and P(x) = Σk a0k · ek(x1) · bk(1)

  26. The Backward Algorithm
  • Initialization (i = L): bk(L) = ak0 for all k
  • Recursion (i = L-1, …, 1): for each state k, bk(i) = Σr akr · er(xi+1) · br(i+1)
  • Termination: P(x) = Σk a0k · ek(x1) · bk(1)
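A matching sketch of the backward algorithm (mine). Because the end state is omitted in these sketches, the initialization uses bk(L) = 1 rather than the slide's bk(L) = ak0; the P(x) returned by the termination step should agree with the forward sketch.

```python
def backward(x, states, start, trans, emit):
    """b_k(i) = P(x_{i+1} ... x_L | pi_i = k), filled in from the right."""
    L = len(x)
    # Initialization: b_k(L) = 1 here (no explicit end state / a_k0 assumed uniform)
    b = [{k: 1.0 for k in states} for _ in range(L)]
    # Recursion: b_k(i) = sum_r a_kr * e_r(x_{i+1}) * b_r(i+1)
    for i in range(L - 2, -1, -1):
        for k in states:
            b[i][k] = sum(trans[k][r] * emit[r][x[i + 1]] * b[i + 1][r] for r in states)
    # Termination: P(x) = sum_k start_k * e_k(x_1) * b_k(1)
    px = sum(start[k] * emit[k][x[0]] * b[0][k] for k in states)
    return px, b

# Casino parameters (slide 8); uniform start assumed.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.99, "L": 0.01}, "L": {"F": 0.20, "L": 0.80}}
EMIT  = {"F": {s: 1 / 6 for s in range(1, 7)},
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

px, b = backward((6, 2, 6), ("F", "L"), START, TRANS, EMIT)
print(px)   # should equal the forward algorithm's P(x)
```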

  27. Posterior state probabilities
  • The probability that xi came from state k given the observed sequence, i.e. P(πi = k | x)
  • P(x, πi = k) = P(x1 … xi, πi = k) · P(xi+1 … xL | x1 … xi, πi = k) = P(x1 … xi, πi = k) · P(xi+1 … xL | πi = k) = fk(i) · bk(i)
  • P(πi = k | x) = fk(i) · bk(i) / P(x)
  • Posterior decoding: assign xi the state k that maximizes P(πi = k | x) = fk(i) · bk(i) / P(x)
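Putting forward and backward together, posterior decoding is only a few lines. This sketch assumes the forward() and backward() functions and the casino parameter dicts from the previous two sketches are already defined in the same session.

```python
def posterior_decode(x, states, start, trans, emit):
    """Assign each position i the state k maximizing P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)."""
    px, f = forward(x, states, start, trans, emit)    # f_k(i) and P(x), from the earlier sketch
    _, b = backward(x, states, start, trans, emit)    # b_k(i), from the earlier sketch
    post = [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]
    return [max(states, key=lambda k: col[k]) for col in post], post

decoded, post = posterior_decode((6, 2, 6), ("F", "L"), START, TRANS, EMIT)
print(decoded)                                 # posterior-decoded state per roll
print([round(col["L"], 3) for col in post])    # P(loaded | x) at each position
```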

  28. Estimating the probabilities

  29. Estimating the probabilities (“training”)
  • Baum-Welch algorithm
  • Start with an initial guess at the transition probabilities
  • Refine the guess in each step to improve the total probability of the training data
  • May get stuck at a local optimum
  • Special case of the expectation-maximization (EM) algorithm
  • Viterbi training (see the sketch below)
  • Derive probable paths for the training data using the Viterbi algorithm
  • Re-estimate transition probabilities based on the Viterbi paths
  • Iterate until the paths stop changing
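As a rough illustration of the second recipe, here is a sketch of Viterbi training (my code, and only one of several reasonable formulations). It assumes a viterbi() function like the sketch after slide 17, keeps the start probabilities fixed, and adds a pseudocount of 1 so that unseen transitions and emissions do not collapse to zero.

```python
def viterbi_training(seqs, states, alphabet, start, trans, emit, rounds=10):
    """Iteratively decode the training sequences with viterbi() and re-estimate
    a_kl and e_k(b) from counts along the decoded paths, until the paths stop changing."""
    prev_paths = None
    for _ in range(rounds):
        # Decode every training sequence with the current parameters
        paths = [viterbi(x, states, start, trans, emit)[0] for x in seqs]
        if paths == prev_paths:                      # paths unchanged: stop iterating
            break
        prev_paths = paths
        # Count transitions and emissions along the Viterbi paths (pseudocount 1)
        A = {k: {l: 1.0 for l in states} for k in states}
        E = {k: {b: 1.0 for b in alphabet} for k in states}
        for x, p in zip(seqs, paths):
            for i, (sym, st) in enumerate(zip(x, p)):
                E[st][sym] += 1
                if i > 0:
                    A[p[i - 1]][st] += 1
        # Re-estimate the probabilities from the counts
        trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        emit  = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return trans, emit

# Hypothetical usage, given a list of observed roll sequences `training_rolls`:
# trans_hat, emit_hat = viterbi_training(training_rolls, ("F", "L"),
#                                        tuple(range(1, 7)), START, TRANS, EMIT)
```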
