Class 5: Hidden Markov Models
Sequence Models • So far we examined several probabilistic sequence models • These models, however, assumed that positions are independent • This means that the order of elements in the sequence did not play a role • In this class we learn about probabilistic models of sequences in which order does matter
Probability of Sequences • Fix an alphabet Σ • Let X1,…,Xn be a sequence of random variables over Σ • We want to model P(X1,…,Xn)
Markov Chains Assumption: • Xi+1 is independent of the past once we know Xi This allows us to write: P(X1,…,Xn) = P(X1) · P(X2|X1) ··· P(Xn|Xn-1) = P(X1) ∏i P(Xi+1|Xi)
Markov Chains (cont) Assumption: • P(Xi+1|Xi) is the same for all i Notation: P(Xi+1=b | Xi=a) = Aab • By specifying the matrix A and the initial probabilities, we define P(X1,…,Xn) • To avoid the special case of P(X1), we can use a special start state s, and denote P(X1=a) = Asa
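The chain above can be sketched directly in code. The transition probabilities below are illustrative assumptions, not estimates from data; the start distribution plays the role of the special start state s.

```python
import math

# A first-order Markov chain over a DNA alphabet.
# A[a][b] = P(X_{i+1} = b | X_i = a); each row sums to 1.
# All numbers here are made up for illustration.
A = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
    "C": {"A": 0.20, "C": 0.30, "G": 0.20, "T": 0.30},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.20, "C": 0.30, "G": 0.20, "T": 0.30},
}
start = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # P(X_1 = a) = A_sa

def log_prob(seq):
    """log P(x_1,...,x_n) = log A_s,x1 + sum_i log A_{x_i, x_{i+1}}."""
    lp = math.log(start[seq[0]])
    for a, b in zip(seq, seq[1:]):
        lp += math.log(A[a][b])
    return lp
```

Working in log space avoids underflow on long sequences, which matters once n reaches the thousands.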
Example: CpG islands • In the human genome, CpG dinucleotides are relatively rare • CpG pairs undergo a process called methylation that modifies the C nucleotide • A methylated C can (with relatively high chance) mutate to a T • Promoter regions are CpG-rich • These regions are not methylated, and thus mutate less often • These are called CpG islands
CpG Islands • We construct Markov chains for CpG-rich and CpG-poor regions • Using maximum likelihood estimates from 60K nucleotides, we get two models
Ratio Test for CpG islands • Given a sequence x1,…,xn we compute the log-likelihood ratio: score(x) = log [ P(x | + model) / P(x | − model) ] = ∑i log( A+xi,xi+1 / A−xi,xi+1 ) • A positive score suggests the sequence comes from a CpG island
Finding CpG islands Simple-minded approach: • Pick a window of size N (N = 100, for example) • Compute the log-ratio for the sequence in the window, and classify based on that Problems: • How do we select N? • What do we do when the window intersects the boundary of a CpG island?
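The windowed ratio test can be sketched as follows. The two transition matrices are illustrative assumptions, not the maximum-likelihood estimates from the slides; they only encode that C→G is much more likely inside an island.

```python
import math

# Hypothetical "+" (CpG-rich) and "-" (CpG-poor) transition matrices.
plus = {
    "A": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "C": {"A": 0.20, "C": 0.25, "G": 0.30, "T": 0.25},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}
minus = {
    "A": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "C": {"A": 0.30, "C": 0.30, "G": 0.05, "T": 0.35},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

def log_ratio(seq):
    """Sum of log(A+_{ab} / A-_{ab}) over adjacent pairs; positive => CpG-like."""
    return sum(math.log(plus[a][b] / minus[a][b]) for a, b in zip(seq, seq[1:]))

def classify_windows(seq, N=100):
    """Label each length-N window as island (True) or not (False)."""
    return [log_ratio(seq[i:i + N]) > 0 for i in range(len(seq) - N + 1)]
```

Note how this sketch exhibits exactly the problems listed above: the answer depends on the choice of N, and a window straddling an island boundary mixes the two regimes.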
Alternative Approach • Build a model that includes “+” states and “-” states • A state “remembers” the last nucleotide and the type of region • A transition from a “-” state to a “+” state marks the start of a CpG island
Hidden Markov Models Two components: • A Markov chain of hidden states H1,…,Hn with L values • P(Hi+1=k | Hi=l) = Alk • Observations X1,…,Xn • Assumption: Xi depends only on hidden state Hi • P(Xi=a | Hi=k) = Bka
Computing Most Probable Sequence Given: x1,…,xn Output: h*1,…,h*n such that (h*1,…,h*n) = argmax h1,…,hn P(x1,…,xn, h1,…,hn)
Idea: • If we know the value of hi, then the most probable sequence on i+1,…,n does not depend on observations before time i • Let Vi(l) be the probability of the most probable sequence h1,…,hi with hi = l that generates x1,…,xi
Viterbi Algorithm • Set V0(0) = 1, V0(l) = 0 for l > 0 • for i = 1,…,n • for l = 1,…,L • set Vi(l) = Bl,xi · maxk Vi-1(k) Akl and record the pointer Pi(l) = argmaxk Vi-1(k) Akl • Let h*n = argmaxl Vn(l) • for i = n-1,…,1 • set h*i = Pi+1(h*i+1)
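A minimal Viterbi sketch, using the dishonest-casino HMM (fair vs. loaded die) that appears later in the class; the parameter values are illustrative assumptions. Log probabilities replace the products to avoid underflow.

```python
import math

# Toy 2-state HMM: "F" = fair die, "L" = loaded die (made-up parameters).
states = ["F", "L"]
A = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
B = {"F": {o: 1 / 6 for o in "123456"},
     "L": {**{o: 0.1 for o in "12345"}, "6": 0.5}}
start = {"F": 0.5, "L": 0.5}

def viterbi(obs):
    """Most probable hidden sequence h*_1,...,h*_n for observations obs."""
    V = [{k: math.log(start[k]) + math.log(B[k][obs[0]]) for k in states}]
    ptr = [{}]
    for x in obs[1:]:
        row, back = {}, {}
        for k in states:
            # V_i(k) = B_{k,x} * max_l V_{i-1}(l) A_{lk}, in log space.
            best = max(states, key=lambda l: V[-1][l] + math.log(A[l][k]))
            row[k] = V[-1][best] + math.log(A[best][k]) + math.log(B[k][x])
            back[k] = best
        V.append(row)
        ptr.append(back)
    # Traceback from the best final state, following the stored pointers.
    h = max(states, key=lambda k: V[-1][k])
    path = [h]
    for i in range(len(obs) - 1, 0, -1):
        h = ptr[i][h]
        path.append(h)
    return path[::-1]
```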
Computing Probabilities Given: x1,…,xn Output: P(x1,…,xn) How do we sum over an exponential number of hidden sequences?
Forward Algorithm • Perform dynamic programming on sequences • Let fi(l) = P(x1,…,xi, Hi=l) • Recursion rule: fi(l) = Bl,xi ∑k fi-1(k) Akl, with f1(l) = Asl Bl,x1 • Conclusion: P(x1,…,xn) = ∑l fn(l)
Backward Algorithm • Perform dynamic programming on sequences • Let bi(l) = P(xi+1,…,xn | Hi=l) • Recursion rule: bi(l) = ∑k Alk Bk,xi+1 bi+1(k), with bn(l) = 1 • Conclusion: P(x1,…,xn) = ∑k Ask Bk,x1 b1(k)
Computing Posteriors • How do we compute P(Hi | x1,…,xn)? • Combine the forward and backward messages: P(Hi=l | x1,…,xn) = fi(l) bi(l) / P(x1,…,xn)
Dishonest Casino (again) • Computing the posterior probability of the “fair” state at each point in a long sequence reveals the stretches where the loaded die was likely in use
Learning Given sequences x1,…,xn and h1,…,hn • How do we learn Akl and Bka? • We want to find parameters that maximize the likelihood P(x1,…,xn, h1,…,hn) We simply count: • Nkl - number of times hi=k & hi+1=l • Nka - number of times hi=k & xi=a • Maximum likelihood estimates: Akl = Nkl / ∑l' Nkl' and Bka = Nka / ∑a' Nka'
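The count-and-normalize estimator can be sketched as below; `ml_estimate` is a hypothetical helper name, and the sketch assumes every state has at least one outgoing transition (otherwise its row is undefined).

```python
from collections import Counter

def ml_estimate(xs, hs):
    """Maximum-likelihood A and B from a fully observed (x, h) sequence pair.

    N_kl counts transitions h_i=k -> h_{i+1}=l; N_ka counts emissions of a
    from state k; each row is then normalized to sum to 1.
    """
    Nkl, Nka = Counter(), Counter()
    for k, l in zip(hs, hs[1:]):
        Nkl[(k, l)] += 1
    for k, a in zip(hs, xs):
        Nka[(k, a)] += 1
    hidden, symbols = set(hs), set(xs)
    est_A = {k: {l: Nkl[(k, l)] / sum(Nkl[(k, m)] for m in hidden)
                 for l in hidden} for k in hidden}
    est_B = {k: {a: Nka[(k, a)] / sum(Nka[(k, b)] for b in symbols)
                 for a in symbols} for k in hidden}
    return est_A, est_B
```

In practice pseudocounts are often added to the raw counts so that unseen transitions do not get probability exactly zero.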
Learning Given only the sequence x1,…,xn • How do we learn Akl and Bka? • We want to find parameters that maximize the likelihood P(x1,…,xn) Problem: • Counts are inaccessible since we do not observe hi
Expected Counts • We can compute the expected number of times hi=k & hi+1=l: E[Nkl] = ∑i P(Hi=k, Hi+1=l | x1,…,xn) = ∑i fi(k) Akl Bl,xi+1 bi+1(l) / P(x1,…,xn) • Similarly, E[Nka] = ∑i:xi=a P(Hi=k | x1,…,xn)
Expectation Maximization (EM) • Choose initial Akl and Bka E-step: • Compute expected counts E[Nkl], E[Nka] M-step: • Re-estimate: Akl = E[Nkl] / ∑l' E[Nkl'] and Bka = E[Nka] / ∑a' E[Nka'] • Reiterate until convergence
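One EM (Baum-Welch) iteration can be sketched end to end: forward/backward messages give the expected counts, which are then renormalized. The 2-state HMM over a coin alphabet and all parameter values are illustrative assumptions.

```python
import math

# Tiny 2-state HMM over {"H", "T"}; all numbers are made up.
states = [0, 1]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [{"H": 0.9, "T": 0.1}, {"H": 0.2, "T": 0.8}]
start = [0.5, 0.5]

def forward(obs):
    f = [[start[k] * B[k][obs[0]] for k in states]]
    for x in obs[1:]:
        f.append([B[k][x] * sum(f[-1][l] * A[l][k] for l in states)
                  for k in states])
    return f

def backward(obs):
    b = [[1.0, 1.0]]
    for x in reversed(obs[1:]):
        b.insert(0, [sum(A[k][l] * B[l][x] * b[0][l] for l in states)
                     for k in states])
    return b

def em_step(obs):
    """One EM iteration: expected counts (E-step), renormalize (M-step)."""
    f, b = forward(obs), backward(obs)
    px = sum(f[-1][k] for k in states)
    # E-step: E[N_kl] = sum_i f_i(k) A_kl B_{l,x_{i+1}} b_{i+1}(l) / P(x).
    ENkl = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(len(obs) - 1):
        for k in states:
            for l in states:
                ENkl[k][l] += f[i][k] * A[k][l] * B[l][obs[i + 1]] * b[i + 1][l] / px
    # E[N_ka] = sum over positions with x_i = a of P(H_i = k | x).
    ENka = [{}, {}]
    for i, x in enumerate(obs):
        for k in states:
            ENka[k][x] = ENka[k].get(x, 0.0) + f[i][k] * b[i][k] / px
    # M-step: new parameters are the normalized expected counts.
    newA = [[ENkl[k][l] / sum(ENkl[k]) for l in states] for k in states]
    newB = [{x: c / sum(ENka[k].values()) for x, c in ENka[k].items()}
            for k in states]
    return newA, newB, px
```

Iterating (feeding newA, newB back in) drives the likelihood P(x1,…,xn) monotonically upward, as the next slide states.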
EM - basic properties • P(x1,…,xn : A'kl, B'ka) ≥ P(x1,…,xn : Akl, Bka) for the re-estimated parameters A'kl, B'ka • Likelihood grows in each iteration • If P(x1,…,xn : A'kl, B'ka) = P(x1,…,xn : Akl, Bka), then Akl, Bka is a stationary point of the likelihood • either a local maximum, a local minimum, or a saddle point
Complexity of E-step • Compute forward and backward messages • Time complexity O(nL²), space complexity O(nL) • Accumulate expected counts • Time complexity O(nL²) • Space complexity O(L²)
EM - problems Local maxima: • Learning can get stuck in a local maximum • Sensitive to initialization • Requires some method for escaping such maxima Choosing L: • We often do not know how many hidden values we should use, or can learn