Models for DNA substitution
http://www.stat.rice.edu/~mathbio/Polanski/stat655/
Plan • Basics • Models in discrete time • Models in continuous time • Parameter estimation
Nucleotides • Purines: Adenine (A or a), Guanine (G or g) • Pyrimidines: Cytosine (C or c), Thymine (T or t)
Substitution • Transitions (purine → purine, pyrimidine → pyrimidine): A→G, G→A, C→T, T→C • Transversions (purine → pyrimidine, pyrimidine → purine): A→T, T→A, A→C, C→A, G→T, T→G, G→C, C→G
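The two classes above can be told apart mechanically. The following is a minimal sketch; the function name `classify_substitution` and the return labels are illustrative choices, not part of the slides.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(old, new):
    """Classify a change between two nucleotides given as single letters."""
    old, new = old.upper(), new.upper()
    for base in (old, new):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a nucleotide: {base!r}")
    if old == new:
        return "identical"
    # Both purines or both pyrimidines: the chemical class is preserved.
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "T"))  # purine to pyrimidine
```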
Other • Deletions, insertions • Insertions in reverse order
Hypothesis Substitution of nucleotides in the evolution of DNA sequences can be modeled by a Markov chain or Markov process
Other assumptions • Stationarity • Reversibility
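Transition matrix (rows and columns ordered a, g, c, t)
$$P = \begin{pmatrix} p_{aa} & p_{ag} & p_{ac} & p_{at} \\ p_{ga} & p_{gg} & p_{gc} & p_{gt} \\ p_{ca} & p_{cg} & p_{cc} & p_{ct} \\ p_{ta} & p_{tg} & p_{tc} & p_{tt} \end{pmatrix}$$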
Jukes – Cantor model All substitutions are equally probable
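As a rough illustration of the discrete-time Jukes – Cantor chain, the sketch below builds the one-step matrix and raises it to a power. The parameter name `alpha` (the probability of each specific substitution per step) and the helper names are assumptions made for this example, since the slide's own notation is not preserved here.

```python
def jukes_cantor_matrix(alpha):
    """One-step matrix: every off-diagonal entry is alpha (assumed symbol)."""
    if not 0.0 <= alpha <= 1.0 / 3.0:
        raise ValueError("alpha must lie in [0, 1/3]")
    return [[1.0 - 3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def n_step(p, n):
    """n-step transition matrix P^n by repeated multiplication."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


p10 = n_step(jukes_cantor_matrix(0.01), 10)
# Probability of seeing the same nucleotide after 10 steps.
print(p10[0][0])
```

For this matrix the diagonal of P^n has the closed form 1/4 + (3/4)(1 - 4·alpha)^n, which gives a quick check on the numerical result.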
Remark • When learning and researching Markov models for nucleotide substitution, it greatly helps to use software for symbolic computation, such as Mathematica, Maple, or Scientific Workplace.
Kimura models • - probability of a transition • - probability of a specific transversion
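A small sketch of the two-parameter Kimura matrix follows. The names `alpha` (probability of a transition) and `beta` (probability of a specific transversion) are assumed labels for the two parameters listed above, and the state order a, g, c, t follows the transition-matrix slide.

```python
ORDER = "agct"
PURINES = frozenset("ag")


def kimura_matrix(alpha, beta):
    """One-step matrix with assumed parameter names alpha and beta."""
    stay = 1.0 - alpha - 2.0 * beta  # each row must sum to 1
    if min(alpha, beta, stay) < 0.0:
        raise ValueError("parameters do not define probabilities")
    matrix = []
    for x in ORDER:
        row = []
        for y in ORDER:
            if x == y:
                row.append(stay)
            elif (x in PURINES) == (y in PURINES):
                row.append(alpha)  # transition
            else:
                row.append(beta)   # transversion
        matrix.append(row)
    return matrix


for row in kimura_matrix(0.02, 0.005):
    print(row)
```

Setting `alpha` equal to `beta` recovers a matrix in which all substitutions are equally probable.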
Kimura 3ST model • - probability of A↔G, C↔T • - probability of A↔C, G↔T • - probability of A↔T, C↔G
Generalizations of Kimura models By Ewens: - probability of A↔G, C↔T - probability of A→C, A→T, G→C, G→T - probability of C→A, T→A, C→G, T→G
By Blaisdell: - probability of A→G, C→T - probability of G→A, T→C - probability of A→C, A→T, G→C, G→T - probability of C→A, T→A, C→G, T→G
Stationary distribution where Remark: this model is not reversible
Felsenstein model Probability of substitution of any nucleotide by another is proportional to the stationary probability of the substituting nucleotide
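One way to read this definition is sketched below: off-diagonal entries are `u * pi[j]`, where `pi` is the stationary distribution and `u` is a proportionality constant. Both names are assumptions for this example. The check at the end confirms that `pi` is indeed stationary for the resulting matrix.

```python
def felsenstein_matrix(pi, u):
    """Off-diagonal p_ij = u * pi[j]; pi and u are assumed names."""
    if abs(sum(pi) - 1.0) > 1e-9 or min(pi) < 0.0:
        raise ValueError("pi must be a probability distribution")
    n = len(pi)
    matrix = [[u * pi[j] if i != j else 1.0 - u * (1.0 - pi[i])
               for j in range(n)] for i in range(n)]
    if min(min(row) for row in matrix) < 0.0:
        raise ValueError("u is too large for these frequencies")
    return matrix


pi = [0.1, 0.2, 0.3, 0.4]
p = felsenstein_matrix(pi, 0.05)
# Stationarity: the row vector pi times P should give pi back.
pi_next = [sum(pi[i] * p[i][j] for i in range(4)) for j in range(4)]
print(pi_next)
```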
HKY model (Hasegawa, Kishino, Yano) • Different rates for transitions and transversions
General 12-parameter model (Tavaré, 1986)
Reversibility • A = D, B = G, C = J, E = H, F = K, I = L • Conclusion: the most general reversible model has 12 – 6 = 6 free parameters
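Reversibility is usually stated as the detailed balance condition: with stationary distribution pi, the products pi_i · p_ij and pi_j · p_ji agree for every pair of states. The sketch below checks this numerically; the function name and the two example matrices are illustrative and do not come from the slides.

```python
def is_reversible(p, pi, tol=1e-12):
    """Check detailed balance pi[i] * p[i][j] == pi[j] * p[j][i]."""
    n = len(pi)
    return all(abs(pi[i] * p[i][j] - pi[j] * p[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


uniform = [0.25, 0.25, 0.25, 0.25]

# All substitutions equally probable: symmetric, hence reversible.
symmetric = [[0.97 if i == j else 0.01 for j in range(4)] for i in range(4)]

# A hypothetical chain that prefers one direction around a cycle.
# Its columns also sum to 1, so the uniform distribution is stationary,
# yet detailed balance fails.
cyclic = [[0.7 if i == j
           else 0.2 if j == (i + 1) % 4
           else 0.1 if j == (i - 1) % 4
           else 0.0
           for j in range(4)] for i in range(4)]

print(is_reversible(symmetric, uniform))
print(is_reversible(cyclic, uniform))
```

Six unordered pairs of nucleotides give six such equations, which matches the count of six constraints on the slide.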
Matrix of transition probabilities • Q – intensity matrix
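For a continuous-time Markov process the transition probabilities over time t are given by the matrix exponential P(t) = exp(Qt). The sketch below approximates it with a truncated power series, which is adequate for small matrices and moderate values of t but is not a production-grade method. The Jukes – Cantor intensity matrix with off-diagonal rate `alpha` is an assumed example, and all function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(q, t, terms=40):
    """Approximate exp(Q t) by summing the first `terms` series terms."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in term]
    for k in range(1, terms):
        # term now holds (Q t)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def jukes_cantor_q(alpha):
    """Intensity matrix with rate alpha (assumed name) off the diagonal."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


p_t = transition_probabilities(jukes_cantor_q(0.1), 2.0)
# Compare with the closed form 1/4 + 3/4 * exp(-4 * alpha * t).
print(p_t[0][0], 0.25 + 0.75 * math.exp(-0.8))
```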
Jukes – Cantor model • Three things are equivalent due to reversibility: an ancestor A with two descendants D1 and D2 (A → D1, A → D2); the chain D1 → A → D2; the chain D2 → A → D1
Probability that the nucleotides are different in two descendants
Estimating p • We have two DNA sequences of length N
D1: ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG
D2: ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG
$p = \frac{\text{number of differences}}{N}$
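The estimate can be computed directly from the two sequences on this slide. The sketch below also applies the standard Jukes – Cantor distance correction, d = -(3/4) ln(1 - (4/3) p); that formula is an addition for illustration, since the slide's own expression relating p to the model parameters is not preserved here, and the function names are hypothetical.

```python
import math

D1 = "ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG"
D2 = "ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG"


def proportion_different(s1, s2):
    """Fraction of aligned sites at which the two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    differences = sum(a != b for a, b in zip(s1, s2))
    return differences / len(s1)


def jukes_cantor_distance(p_hat):
    """Expected substitutions per site under the standard JC correction."""
    if not 0.0 <= p_hat < 0.75:
        raise ValueError("correction is defined only for 0 <= p < 3/4")
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


p_hat = proportion_different(D1, D2)
print(p_hat, jukes_cantor_distance(p_hat))
```

The two sequences have N = 49 sites and differ at 2 of them, so the estimate is 2/49. The corrected distance is slightly larger because it accounts for sites that may have changed more than once.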
Kimura model • p – probability that the two descendants carry two different purines or two different pyrimidines • q – probability that one descendant carries a purine and the other a pyrimidine
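Both probabilities can be estimated by counting the two kinds of differing sites separately. The sketch below reuses the sequences D1 and D2 from the estimation slide; the function name `kimura_proportions` is a hypothetical choice.

```python
PURINES = frozenset("AG")

D1 = "ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG"
D2 = "ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG"


def kimura_proportions(s1, s2):
    """Return (p_hat, q_hat): transition-type and transversion-type fractions."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    same_class = other_class = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            same_class += 1   # two purines or two pyrimidines
        else:
            other_class += 1  # one purine and one pyrimidine
    n = len(s1)
    return same_class / n, other_class / n


p_hat, q_hat = kimura_proportions(D1, D2)
print(p_hat, q_hat)
```

For these sequences one differing site is of each kind (G versus A, and A versus T), so both estimates equal 1/49.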