Markov Chains, Lecture #5
Background readings: Durbin et al., Section 3.1; Polanski & Kimmel, Section 2.8.
Prepared by Shlomo Moran, based on Danny Geiger's and Nir Friedman's.
Dependencies along the genome
• In previous classes we assumed that every letter in a sequence is sampled independently from some distribution q() over the alphabet {A,C,T,G}.
• This is too restrictive for true genomes.
• There are special subsequences in the genome, such as TATA within the regulatory region upstream of a gene.
• The pattern CG is less common than expected under random sampling.
• We model such dependencies by Markov chains and hidden Markov models, which we define next.
Finite Markov Chain
An integer-time stochastic process, consisting of:
• a domain D of m > 1 states {s1,…,sm},
• an m-dimensional initial distribution vector (p(s1),…,p(sm)),
• an m×m transition probability matrix M = (mst), where mst is the probability of a transition from state s to state t.
For example, D can be the letters {A, C, T, G}, p(A) the probability that A is the 1st letter in a sequence, and mAG the probability that G follows A in a sequence.
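As a concrete illustration (my addition, not part of the original lecture), the following Python sketch encodes a chain over {A, C, T, G}: the state list, an initial distribution, and a transition matrix whose rows sum to one. All numeric values are made up for illustration.

```python
import numpy as np

# States of the chain (the domain D)
states = ["A", "C", "T", "G"]

# Initial distribution vector (p(A), p(C), p(T), p(G)) -- illustrative values
p0 = np.array([0.25, 0.25, 0.25, 0.25])

# Transition probability matrix: M[i, j] = probability that states[j] follows states[i].
# Each row must sum to 1 (stochastic matrix). Values are made up for illustration.
M = np.array([
    [0.30, 0.20, 0.30, 0.20],   # from A
    [0.25, 0.30, 0.25, 0.20],   # from C
    [0.30, 0.20, 0.30, 0.20],   # from T
    [0.20, 0.30, 0.25, 0.25],   # from G
])

assert np.allclose(M.sum(axis=1), 1.0), "each row of M must sum to 1"
```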
Markov Chain (cont.)
(Diagram: a chain X1 → X2 → … → Xn-1 → Xn.)
For each integer n, a Markov Chain assigns a probability to each sequence (x1,…,xn) in D^n (i.e., each xi is in D) as follows:
p(x1,…,xn) = p(x1) · mx1x2 · mx2x3 · … · mxn-1xn.
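A short sketch of this formula (my addition, reusing the illustrative states, p0, and M defined above): multiply the initial probability of the first letter by the transition probabilities along the sequence.

```python
def chain_probability(seq, states, p0, M):
    """Probability the Markov chain assigns to the sequence seq = (x1, ..., xn)."""
    idx = [states.index(x) for x in seq]
    prob = p0[idx[0]]                      # p(x1)
    for a, b in zip(idx, idx[1:]):         # multiply m(x_{i-1}, x_i) for i = 2..n
        prob *= M[a, b]
    return prob

# Example: probability of the string "ACGT" under the illustrative chain above
print(chain_probability("ACGT", states, p0, M))
```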
Markov Chain (cont.)
(Diagram: a chain X1 → X2 → … → Xn-1 → Xn.)
Similarly, each Xi is a probability distribution over D, determined by the initial distribution (p(s1),…,p(sm)) and the transition matrix M. There is a rich theory studying the properties of such “Markov sequences” (X1,…, Xi,…). A bit of this theory is presented next.
Matrix Representation
The transition probability matrix M = (mst):

       A      B      C      D
A    0.95     0    0.05     0
B    0.2    0.5      0     0.3
C     0     0.2      0     0.8
D     0      0       1      0

M is a stochastic matrix: every row sums to 1.
The initial distribution vector (u1,…,um) defines the distribution of X1 (p(X1=si) = ui).
After one move, the distribution changes to X2 = X1M.
Matrix Representation (cont.)
Using the matrix M above:
Example: if X1 = (0, 1, 0, 0) then X2 = (0.2, 0.5, 0, 0.3),
and if X1 = (0, 0, 0.5, 0.5) then X2 = (0, 0.1, 0.5, 0.4).
The i-th distribution is Xi = X1·M^(i-1).
Basic question: what can we tell about the distributions Xi?
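The two example computations can be checked numerically; here is a small sketch (my addition) that multiplies an initial distribution by the matrix M from the slide above.

```python
import numpy as np

# Transition matrix from the slide (rows/columns ordered A, B, C, D)
M = np.array([
    [0.95, 0.00, 0.05, 0.00],
    [0.20, 0.50, 0.00, 0.30],
    [0.00, 0.20, 0.00, 0.80],
    [0.00, 0.00, 1.00, 0.00],
])

X1 = np.array([0.0, 1.0, 0.0, 0.0])
print(X1 @ M)                                  # -> [0.2  0.5  0.   0.3]

X1 = np.array([0.0, 0.0, 0.5, 0.5])
print(X1 @ M)                                  # -> [0.   0.1  0.5  0.4]

# The i-th distribution: X_i = X_1 M^(i-1)
i = 10
print(X1 @ np.linalg.matrix_power(M, i - 1))
```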
Representation of a Markov Chain as a Digraph
(Figure: the states A, B, C, D drawn as a directed graph, with each edge labeled by the corresponding entry of M, e.g. 0.95 on the self-loop at A.)
Each directed edge A→B is associated with the positive transition probability from A to B.
Properties of Markov Chain states
(Figure: a digraph over A, B, C, D.)
States of Markov chains are classified by the digraph representation (omitting the actual probability values).
A, C and D are recurrent states: they lie in strongly connected components which are sinks in the graph.
B is not recurrent – it is a transient state.
Alternative definition: a state s is recurrent if it can be reached from any state reachable from s; otherwise it is transient.
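The alternative definition translates directly into code. The sketch below (my addition) classifies each state of a digraph, given as an adjacency list, as recurrent or transient by checking reachability with a breadth-first search; the example graph is hypothetical but mirrors the slide's situation.

```python
from collections import deque

def reachable(graph, start):
    """Set of states reachable from start (including start) in the digraph."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def classify_states(graph):
    """A state s is recurrent iff s can be reached back from every state reachable from s."""
    return {
        s: "recurrent" if all(s in reachable(graph, t) for t in reachable(graph, s)) else "transient"
        for s in graph
    }

# Hypothetical digraph in the spirit of the slide: {A} and {C, D} are sink
# strongly connected components, while B can leave but never return.
graph = {"A": ["A"], "B": ["A", "B", "D"], "C": ["D"], "D": ["C"]}
print(classify_states(graph))
# -> {'A': 'recurrent', 'B': 'transient', 'C': 'recurrent', 'D': 'recurrent'}
```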
Another example of Recurrent and Transient States
(Figure: a digraph over A, B, C, D.)
A and B are transient states, C and D are recurrent states. Once the process moves from B to D, it will never come back.
Irreducible Markov Chains
A Markov Chain is irreducible if the corresponding graph is strongly connected (and thus all its states are recurrent).
(Figures: an irreducible chain over A, B, C, D, E, and a non-irreducible chain over A, B, C, D.)
Periodic States
(Figure: a digraph over A, B, C, D, E in which the period of A is 2.)
A state s has period k if k is the GCD of the lengths of all the cycles that pass through s.
Exercise: all the states in the same strongly connected component have the same period.
A Markov Chain is periodic if all the states in it have a period k > 1. It is aperiodic otherwise.
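One standard way to compute the period of a strongly connected chain, sketched here as an addition: run a BFS from any state and take the GCD of d[u] + 1 - d[v] over all edges u→v, where d is the BFS distance. The example graph is hypothetical.

```python
from collections import deque
from math import gcd

def period(graph, root):
    """Period of a strongly connected digraph: GCD of d[u] + 1 - d[v] over all
    edges u -> v, where d is the BFS distance from an arbitrary root."""
    d, queue = {root: 0}, deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in d:
                d[v] = d[u] + 1
                queue.append(v)
    g = 0
    for u in graph:
        for v in graph[u]:
            g = gcd(g, d[u] + 1 - d[v])
    return g

# Hypothetical strongly connected graph in which every cycle has even length
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(period(graph, "A"))   # -> 2
```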
Ergodic Markov Chains
(Figure: an ergodic chain over A, B, C, D.)
A Markov chain is ergodic if:
• the corresponding graph is strongly connected, and
• it is not periodic.
Ergodic Markov Chains are important since they guarantee that the corresponding Markovian process converges to a unique distribution, in which each state has a strictly positive probability.
Stationary Distributions for Markov Chains
Let M be a Markov Chain over m states, and let V = (v1,…,vm) be a probability distribution over the m states.
V = (v1,…,vm) is a stationary distribution for M if VM = V (i.e., one step of the process does not change the distribution).
V is a stationary distribution ⇔ V is a non-negative left (row) eigenvector of M with eigenvalue 1.
Stationary Distributions for a Markov Chain M
Exercise: a stochastic matrix always has a real left eigenvector with eigenvalue 1. (Hint: show that a stochastic matrix has a right eigenvector with eigenvalue 1 – the all-ones vector works, since every row sums to 1. Note that the left eigenvalues of a matrix are the same as its right eigenvalues.)
It can be shown that such an eigenvector V can be chosen non-negative. Hence each Markov Chain has a stationary distribution.
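As an illustration (my addition), the stationary distribution can be computed numerically as a left eigenvector of M with eigenvalue 1, using the matrix from the earlier slide.

```python
import numpy as np

# Transition matrix from the "Matrix Representation" slide (states A, B, C, D)
M = np.array([
    [0.95, 0.00, 0.05, 0.00],
    [0.20, 0.50, 0.00, 0.30],
    [0.00, 0.20, 0.00, 0.80],
    [0.00, 0.00, 1.00, 0.00],
])

# Left eigenvectors of M are right eigenvectors of M transposed.
eigvals, eigvecs = np.linalg.eig(M.T)
k = np.argmin(np.abs(eigvals - 1.0))    # index of the eigenvalue closest to 1
V = np.real(eigvecs[:, k])
V = V / V.sum()                         # normalize to a probability distribution

print(V)                        # stationary distribution
print(np.allclose(V @ M, V))    # True: VM = V
```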
“Good” Markov chains
A Markov Chain is good if the distributions Xi, as i→∞:
(1) converge to a unique distribution, independent of the initial distribution;
(2) in that unique distribution, each state has a positive probability.
The Fundamental Theorem of Finite Markov Chains: a Markov Chain is good ⇔ the corresponding graph is ergodic.
We will prove the ⇒ direction, by showing that non-ergodic Markov Chains are not good.
Examples of “Bad” Markov Chains
A Markov chain is “bad” if either:
• it does not converge to a unique distribution (independent of the initial distribution), or
• it does converge to a unique distribution, but some states in this distribution have zero probability.
Bad case 1: Mutual Unreachability
(Figure: a digraph over A, B, C, D in which A and C are unreachable from each other.)
Consider two initial distributions:
a) p(X1=A) = 1 (p(X1=x) = 0 if x≠A);
b) p(X1=C) = 1.
In case a), the sequence will stay at A forever. In case b), it will stay in {C,D} forever.
Fact 1: if G has two states which are unreachable from each other, then {Xi} cannot converge to a distribution which is independent of the initial distribution.
Bad case 2: Transient States
(Figure: a digraph over A, B, C, D.)
Once the process moves from B to D, it will never come back.
Bad case 2: Transient States (cont.)
Fact 2: for each initial distribution, with probability 1 a transient state will be visited only a finite number of times.
Proof: let A be a transient state, and let X be the set of states from which A is unreachable. It is enough to show that, starting from any state, with probability 1 a state in X is reached after a finite number of steps. (Exercise: complete the proof.)
Bad case 3: Periodic Markov Chains
(Figure: a chain over A, B, C, D, E with period 2.)
Recall: a Markov Chain is periodic if all the states in it have a period k > 1. The above chain has period 2.
In this chain, consider the initial distribution p(B)=1. Then states {B, C} are visited (with positive probability) only in odd steps, and states {A, D, E} are visited only in even steps.
Bad case 3: Periodic States (cont.)
Fact 3: in a periodic Markov Chain (of period k > 1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions Xi does not converge as i→∞.
Corollary: a good Markov Chain is not periodic.
The Fundamental Theorem of Finite Markov Chains:
We have proved that non-ergodic Markov Chains are not good. A proof of the opposite direction, stated below, is based on the Perron-Frobenius theory of nonnegative matrices, and is beyond the scope of this course:
If a Markov Chain is ergodic, then
• it has a unique stationary distribution vector V > 0, which is a left (row) eigenvector of the transition matrix, and
• for any initial distribution, the distributions Xi converge to V as i→∞.
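The convergence claim can be observed numerically; the sketch below (my addition) iterates Xi = X1·M^(i-1) from two different initial distributions for the ergodic matrix used earlier, and both approach the same stationary vector.

```python
import numpy as np

M = np.array([
    [0.95, 0.00, 0.05, 0.00],
    [0.20, 0.50, 0.00, 0.30],
    [0.00, 0.20, 0.00, 0.80],
    [0.00, 0.00, 1.00, 0.00],
])

def distribution_after(X1, M, steps):
    """Iterate X <- X M for the given number of steps."""
    X = X1.copy()
    for _ in range(steps):
        X = X @ M
    return X

for X1 in (np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.5, 0.5])):
    print(distribution_after(X1, M, 1000))   # both runs print (approximately) the same vector
```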
Use of Markov Chains in Genome Search: Modeling CpG Islands
In human genomes the pair CG often transforms to (methyl-C)G, which often transforms to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone.
Due to biological reasons, this process is sometimes suppressed in short stretches of the genome, such as in the start regions of many genes. These areas are called CpG islands (-C-phosphate-G-).
Example: CpG Island (Cont.)
We consider two questions:
Question 1: given a short stretch of genomic data, does it come from a CpG island?
Question 2: given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We “solve” the first question by modeling strings with and without CpG islands as Markov Chains over the same states {A,C,G,T} but with different transition probabilities:
Example: CpG Island (Cont.)
The “+” model: use transition matrix M+ = (m+st), where m+st = the probability that t follows s in a CpG island.
The “-” model: use transition matrix M- = (m-st), where m-st = the probability that t follows s in a non-CpG island.
Example: CpG Island (Cont.)
With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the “+” model or from the “-” model. This is done by computing the probability of the sequence under each Markov Chain (with parameters estimated from known data) and applying the log odds-ratio test.
Question 1: Using two Markov chains
M+ (for CpG islands): we need to specify p+(xi | xi-1), where + stands for CpG island. From Durbin et al. we have:
(Table of p+(xi | xi-1): rows indexed by xi-1, columns by xi; not reproduced here.)
(Recall: rows must add up to one; columns need not.)
Question 1: Using two Markov chains
M- (for non-CpG islands): …and for p-(xi | xi-1) (where “-” stands for non-CpG island) we have:
(Table of p-(xi | xi-1): rows indexed by xi-1, columns by xi; not reproduced here.)
Discriminating between the two models
Given a string x = (x1,…,xL), we compute the ratio
RATIO = p(x | + model) / p(x | - model) = [ Π i=1..L p+(xi | xi-1) ] / [ Π i=1..L p-(xi | xi-1) ].
If RATIO > 1, a CpG island is more likely. In practice, the log of this ratio is computed.
Note: p+(x1|x0) is defined for convenience as p+(x1); p-(x1|x0) is defined for convenience as p-(x1).
Log Odds-Ratio test
Taking the logarithm yields
log Q = log [ p(x | +) / p(x | -) ] = Σ i=1..L log [ p+(xi | xi-1) / p-(xi | xi-1) ].
If log Q > 0, then + is more likely (CpG island).
If log Q < 0, then - is more likely (non-CpG island).
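Here is a minimal sketch of the test (my addition). The transition probabilities below are illustrative placeholders, not the values from Durbin et al.; in practice they would be filled in from the two tables referenced above. The first-letter term p±(x1) is omitted for simplicity.

```python
import math

# Illustrative transition probabilities p(x_i | x_{i-1}) -- placeholder values only,
# NOT the published Durbin et al. tables. Each inner dict sums to 1.
P_PLUS = {
    "A": {"A": 0.15, "C": 0.30, "G": 0.40, "T": 0.15},
    "C": {"A": 0.15, "C": 0.35, "G": 0.30, "T": 0.20},
    "G": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "T": {"A": 0.10, "C": 0.35, "G": 0.40, "T": 0.15},
}
P_MINUS = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
    "C": {"A": 0.30, "C": 0.30, "G": 0.05, "T": 0.35},
    "G": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20},
    "T": {"A": 0.20, "C": 0.25, "G": 0.30, "T": 0.25},
}

def log_odds(x, p_plus=P_PLUS, p_minus=P_MINUS):
    """log Q = sum over i of log( p+(x_i|x_{i-1}) / p-(x_i|x_{i-1}) ), first-letter term omitted."""
    return sum(math.log(p_plus[a][b] / p_minus[a][b]) for a, b in zip(x, x[1:]))

seq = "CGCGCGTATA"
print(log_odds(seq))     # > 0 suggests a CpG island, < 0 suggests a non-island
```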
Where do the parameters (transition probabilities) come from?
Learning from complete data, namely, when the label is given and every xi is measured:
Source: a collection of sequences from CpG islands, and a collection of sequences from non-CpG islands.
Input: tuples of the form (x1, …, xL, h), where h is + or -.
Output: Maximum Likelihood Estimates (MLE) of the parameters.
Count all pairs (Xi=a, Xi-1=b) with label + and with label -; say the numbers are Nba,+ and Nba,-.
Maximum Likelihood Estimate (MLE) of the parameters (using labeled data)
The needed parameters are: p+(x1), p+(xi | xi-1), p-(x1), p-(xi | xi-1).
The ML estimates are given by:
p+(xi=x | xi-1=b) = Nbx,+ / Σx' Nbx',+ ,
where Nbx,+ is the number of times letter x appears after letter b in CpG islands in the dataset;
p+(x1=x) = Nx,+ / Σx' Nx',+ ,
where Nx,+ is the number of times letter x appears in CpG islands in the dataset (and similarly for the “-” model).
Using MLE is justified when we have a large sample. The numbers appearing in our tables (taken from Durbin et al., p. 50) are based on 60,000 nucleotides.
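A small sketch of this counting scheme (my addition): given labeled training sequences, count letter pairs and normalize each row to obtain the estimated transition probabilities of the “+” model (the “-” model is estimated the same way). The training data here is hypothetical.

```python
from collections import defaultdict

ALPHABET = "ACGT"

def estimate_transitions(sequences):
    """MLE of p(x_i = x | x_{i-1} = b): counts[b][x] = N_bx, each row normalized to sum to 1."""
    counts = {b: defaultdict(int) for b in ALPHABET}
    for seq in sequences:
        for b, x in zip(seq, seq[1:]):
            counts[b][x] += 1
    probs = {}
    for b in ALPHABET:
        total = sum(counts[b][x] for x in ALPHABET)
        probs[b] = {x: counts[b][x] / total for x in ALPHABET} if total else None
    return probs

# Hypothetical training data labeled "+" (sequences taken from CpG islands)
cpg_island_seqs = ["CGCGGCGC", "GCGCGTCG", "CGGCGCGC"]
p_plus = estimate_transitions(cpg_island_seqs)
print(p_plus["C"]["G"])    # estimated probability that G follows C in a CpG island
```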
Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov Chain over 8 states, all interconnected (hence it is ergodic):
A+ C+ G+ T+
A- C- G- T-
The problem is that we don’t know the sequence of states which are traversed, but just the sequence of letters. Therefore we use here a Hidden Markov Model.
Hidden Markov Model
(Diagram: hidden states S1 → S2 → … → SL-1 → SL connected by transitions M, each Si emitting an observed symbol xi.)
A Markov chain (s1,…,sL), and for each state s and symbol x we have p(Xi=x | Si=s).
Application in communication: the message sent is (s1,…,sm) but we receive (x1,…,xm). Compute the most likely message sent.
Application in speech recognition: the word said is (s1,…,sm) but we recorded (x1,…,xm). Compute the most likely word said.
Hidden Markov Model
(Diagram: the same chain of hidden states Si emitting observations xi.)
Notation:
Markov Chain transition probabilities: p(Si+1=t | Si=s) = ast.
Emission probabilities: p(Xi=b | Si=s) = es(b).
For Markov Chains we know: p(s1,…,sL) = p(s1) · as1s2 · as2s3 · … · asL-1sL.
What is p(s,x) = p(s1,…,sL; x1,…,xL)?
Hidden Markov Model
(Diagram: the same chain of hidden states Si emitting observations xi.)
p(Xi=b | Si=s) = es(b) means that the probability of xi depends only on the value of si. Formally, this is the conditional independence assumption:
p(Xi=xi | x1,…,xi-1,xi+1,…,xL, s1,…,si,…,sL) = esi(xi).
Thus
p(s,x) = p(s1)·es1(x1) · as1s2·es2(x2) · … · asL-1sL·esL(xL).
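To make the formula concrete, here is a minimal sketch (my addition) that computes p(s,x) for a toy HMM; the state names, emission alphabet, and all probability values are hypothetical.

```python
# Toy HMM: hypothetical states "+" / "-" emitting letters from {A, C, G, T}
init  = {"+": 0.5, "-": 0.5}                                    # p(s1)
trans = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.1, "-": 0.9}}  # a_st
emit  = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},         # e_s(b)
         "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def joint_probability(states_path, observations):
    """p(s, x) = p(s1) e_{s1}(x1) * product over i>1 of a_{s_{i-1} s_i} e_{s_i}(x_i)."""
    s, x = states_path, observations
    prob = init[s[0]] * emit[s[0]][x[0]]
    for i in range(1, len(s)):
        prob *= trans[s[i-1]][s[i]] * emit[s[i]][x[i]]
    return prob

print(joint_probability("++--", "CGTA"))
```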