
Models for DNA substitution



Presentation Transcript


  1. Models for DNA substitution

  2. http://www.stat.rice.edu/~mathbio/Polanski/stat655/

  3. Plan • Basics • Models in discrete time • Models in continuous time • Parameter estimation

  4. Nucleotides • Purines: Adenine (A or a), Guanine (G or g) • Pyrimidines: Cytosine (C or c), Thymine (T or t)

  5. Substitutions • Transitions (purine → purine, pyrimidine → pyrimidine): A → G, G → A, C → T, T → C • Transversions (purine → pyrimidine, pyrimidine → purine): A → T, T → A, A → C, C → A, G → T, T → G, G → C, C → G
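A minimal Python sketch of the classification above. The function name `substitution_type` is hypothetical; letters are accepted in either case, as on the Nucleotides slide.

```python
# Purines and pyrimidines as listed on the "Nucleotides" slide.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(x: str, y: str) -> str:
    """Classify the substitution x -> y as 'none', 'transition' or 'transversion'."""
    x, y = x.upper(), y.upper()
    nucleotides = PURINES | PYRIMIDINES
    if x not in nucleotides or y not in nucleotides:
        raise ValueError(f"not a nucleotide pair: {x!r}, {y!r}")
    if x == y:
        return "none"
    # Same chemical class (purine <-> purine or pyrimidine <-> pyrimidine) is a transition.
    if (x in PURINES) == (y in PURINES):
        return "transition"
    return "transversion"


if __name__ == "__main__":
    for x, y in [("A", "G"), ("t", "c"), ("A", "T"), ("G", "C")]:
        print(x, "->", y, substitution_type(x, y))
```

Of the 12 ordered pairs of distinct nucleotides, 4 are transitions and 8 are transversions, which matches the two lists on the slide.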

  6. Other • Deletions, insertions • Insertions in reverse order

  7. Hypothesis • Substitution of nucleotides in the evolution of DNA sequences can be modeled by a Markov chain or a Markov process.

  8. Other assumptions • Stationarity • Reversibility

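  9. Transition matrix (rows and columns ordered a, g, c, t; $p_{xy}$ is the probability that a site carrying nucleotide x carries nucleotide y one time step later)
$$P = \begin{pmatrix} p_{aa} & p_{ag} & p_{ac} & p_{at} \\ p_{ga} & p_{gg} & p_{gc} & p_{gt} \\ p_{ca} & p_{cg} & p_{cc} & p_{ct} \\ p_{ta} & p_{tg} & p_{tc} & p_{tt} \end{pmatrix}$$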
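Under the Markov chain hypothesis, each site evolves independently: its next nucleotide is drawn from the row of P indexed by its current nucleotide. The sketch below assumes NumPy and uses hypothetical names (`evolve`, `ALPHABET`); it illustrates the hypothesis and is not a procedure taken from the slides.

```python
import numpy as np

ALPHABET = "agct"  # row/column order of P on the slide


def evolve(seq: str, P: np.ndarray, n_steps: int, rng: np.random.Generator) -> str:
    """Evolve every site of seq independently for n_steps steps of the chain with matrix P."""
    P = np.asarray(P, dtype=float)
    if P.shape != (4, 4) or np.any(P < 0) or not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("P must be a 4x4 stochastic matrix (non-negative, rows sum to 1)")
    states = [ALPHABET.index(ch) for ch in seq.lower()]
    for _ in range(n_steps):
        # Row P[s] is the distribution of the next nucleotide given the current one, s.
        states = [int(rng.choice(4, p=P[s])) for s in states]
    return "".join(ALPHABET[s] for s in states)


rng = np.random.default_rng(0)
P = np.full((4, 4), 0.02)
np.fill_diagonal(P, 0.94)
print(evolve("acgtacgtacgt", P, n_steps=10, rng=rng))
```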

  10. Models – discrete time

  11. Jukes – Cantor model • All substitutions are equally probable

  12. Stationary distribution
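For the discrete Jukes – Cantor model the stationary distribution is uniform, which a generic solver of $\pi P = \pi$, $\sum_x \pi_x = 1$ confirms. In this sketch the common probability of each specific substitution is called `alpha` (an assumed name), so the diagonal is $1 - 3\alpha$; the solver stacks both conditions into one least-squares system with NumPy.

```python
import numpy as np


def jukes_cantor(alpha: float) -> np.ndarray:
    """Discrete-time Jukes-Cantor matrix: each specific substitution has probability alpha."""
    if not 0.0 <= alpha <= 1.0 / 3.0:
        raise ValueError("need 0 <= alpha <= 1/3 so that 1 - 3*alpha >= 0")
    P = np.full((4, 4), alpha)
    np.fill_diagonal(P, 1.0 - 3.0 * alpha)
    return P


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Solve pi P = pi together with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])  # (P^T - I) pi = 0 and 1^T pi = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


print(stationary_distribution(jukes_cantor(0.1)))  # 0.25 in every entry, up to rounding
```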

  13. Spectral decomposition of $P^n$
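Assuming the parametrization $p_{xy} = \alpha$ for $x \ne y$ and $p_{xx} = 1 - 3\alpha$, the spectral decomposition can be written with two projections: $P = E_1 + (1 - 4\alpha) E_2$ with $E_1 = J/4$ ($J$ the all-ones matrix) and $E_2 = I - J/4$. Since $E_1 E_2 = 0$ and both are idempotent, $P^n = E_1 + (1 - 4\alpha)^n E_2$, i.e. diagonal entries $\tfrac14 + \tfrac34(1-4\alpha)^n$ and off-diagonal entries $\tfrac14 - \tfrac14(1-4\alpha)^n$. A NumPy sketch comparing this with repeated multiplication:

```python
import numpy as np


def jc_power(alpha: float, n: int) -> np.ndarray:
    """P^n for the discrete Jukes-Cantor matrix from its spectral decomposition."""
    E1 = np.full((4, 4), 0.25)   # projection onto the eigenvalue-1 eigenspace (J / 4)
    E2 = np.eye(4) - E1          # projection onto the eigenvalue (1 - 4*alpha) eigenspace
    return E1 + (1.0 - 4.0 * alpha) ** n * E2


alpha, n = 0.05, 10
P = np.full((4, 4), alpha)
np.fill_diagonal(P, 1.0 - 3.0 * alpha)
print(np.allclose(jc_power(alpha, n), np.linalg.matrix_power(P, n)))  # True
```

For $0 < \alpha \le 1/3$ we have $|1 - 4\alpha| < 1$, so every row of $P^n$ tends to the uniform stationary distribution as $n$ grows.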

  14. Remark • When learning and researching Markov models for nucleotide substitution, it helps greatly to use software for symbolic computation, such as Mathematica, Maple, or Scientific WorkPlace.
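As an illustration with a free alternative to the tools named above (SymPy, our substitution, not a tool mentioned on the slide), the eigenvalues of the Jukes – Cantor matrix can be obtained symbolically, and the closed form of $P^n$ checked for a particular $n$. The symbol name `alpha` is an assumption.

```python
import sympy as sp

alpha = sp.symbols("alpha", positive=True)
n = 3  # a concrete power for the check below

# Discrete Jukes-Cantor matrix with a symbolic alpha (order a, g, c, t).
P = sp.Matrix(4, 4, lambda i, j: 1 - 3 * alpha if i == j else alpha)

eigs = P.eigenvals()   # dict {eigenvalue: algebraic multiplicity}
print(eigs)

# Check the closed form of the diagonal entry of P^n for this n.
closed_form = sp.Rational(1, 4) + sp.Rational(3, 4) * (1 - 4 * alpha) ** n
print(sp.expand((P**n)[0, 0] - closed_form) == 0)  # True
```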

  15. Kimura models • α: probability of a transition • β: probability of a specific transversion
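A NumPy sketch of the discrete Kimura matrix, with `alpha` the probability of a transition and `beta` the probability of a specific transversion, so the diagonal is $1 - \alpha - 2\beta$. Its eigenvalues are $1$, $1 - 4\beta$ (eigenvector separating purines from pyrimidines) and $1 - 2\alpha - 2\beta$ (twice, eigenvectors $(1,-1,0,0)$ and $(0,0,1,-1)$). The function name is hypothetical.

```python
import numpy as np

ORDER = "AGCT"            # same order as a, g, c, t on the slides
PURINES = frozenset("AG")


def kimura_2p(alpha: float, beta: float) -> np.ndarray:
    """alpha = probability of a transition, beta = probability of a specific transversion."""
    P = np.empty((4, 4))
    for i, x in enumerate(ORDER):
        for j, y in enumerate(ORDER):
            if i == j:
                P[i, j] = 1.0 - alpha - 2.0 * beta   # no substitution
            elif (x in PURINES) == (y in PURINES):
                P[i, j] = alpha                      # transition
            else:
                P[i, j] = beta                       # transversion
    if np.any(P < 0):
        raise ValueError("need alpha + 2*beta <= 1")
    return P


P = kimura_2p(alpha=0.1, beta=0.03)
# P is symmetric, so eigvalsh applies (ascending order).
print(np.linalg.eigvalsh(P))   # values close to 0.74, 0.74, 0.88, 1.0
```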

  16. Kimura 3ST model • α: probability of A ↔ G, C ↔ T • β: probability of A ↔ C, G ↔ T • γ: probability of A ↔ T, C ↔ G

  17. Stationary distribution
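A sketch of the 3ST matrix in the order A, G, C, T, assuming α, β, γ apply in both directions of each listed pair. The matrix is symmetric, hence doubly stochastic, so the stationary distribution is uniform; its eigenvalues are $1$, $1 - 2\alpha - 2\beta$, $1 - 2\alpha - 2\gamma$ and $1 - 2\beta - 2\gamma$.

```python
import numpy as np


def kimura_3st(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Order A, G, C, T. alpha: A<->G, C<->T; beta: A<->C, G<->T; gamma: A<->T, C<->G."""
    d = 1.0 - alpha - beta - gamma
    if min(alpha, beta, gamma, d) < 0:
        raise ValueError("need non-negative probabilities with alpha + beta + gamma <= 1")
    return np.array([
        [d, alpha, beta, gamma],   # from A
        [alpha, d, gamma, beta],   # from G
        [beta, gamma, d, alpha],   # from C
        [gamma, beta, alpha, d],   # from T
    ])


P = kimura_3st(0.1, 0.05, 0.02)
pi = np.full(4, 0.25)
print(np.allclose(pi @ P, pi))   # True: P is symmetric, hence doubly stochastic
print(np.linalg.eigvalsh(P))     # here 1-2a-2b, 1-2a-2c, 1-2b-2c, 1 (a, b, c = alpha, beta, gamma)
```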

  18. Generalizations of Kimura models (by Ewens) • probability of A ↔ G, C ↔ T • probability of A → C, A → T, G → C, G → T • probability of C → A, T → A, C → G, T → G

  19. Stationary distribution

  20. Spectral decomposition
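For the generalization by Ewens the stationary distribution is no longer uniform. A sketch using hypothetical parameter names a (each transition), b (each purine → pyrimidine) and c (each pyrimidine → purine): lumping purines and pyrimidines gives a two-state chain with switching probabilities $2b$ and $2c$, so $\pi_A = \pi_G = c/(2(b+c))$ and $\pi_C = \pi_T = b/(2(b+c))$. For the values below the eigenvalues are distinct, so $P^n$ follows from the eigendecomposition.

```python
import numpy as np


def ewens_model(a: float, b: float, c: float) -> np.ndarray:
    """Order A, G, C, T (hypothetical parameter names).

    a: each transition A<->G, C<->T
    b: each purine -> pyrimidine substitution (A->C, A->T, G->C, G->T)
    c: each pyrimidine -> purine substitution (C->A, T->A, C->G, T->G)
    """
    P = np.array([
        [1 - a - 2 * b, a, b, b],
        [a, 1 - a - 2 * b, b, b],
        [c, c, 1 - a - 2 * c, a],
        [c, c, a, 1 - a - 2 * c],
    ])
    if np.any(P < 0):
        raise ValueError("probabilities must be non-negative")
    return P


a, b, c = 0.1, 0.03, 0.05
P = ewens_model(a, b, c)

# Stationary distribution: purines share c / (b + c), pyrimidines share b / (b + c).
pi = np.array([c, c, b, b]) / (2 * (b + c))
assert np.allclose(pi @ P, pi)

# Spectral decomposition P = R diag(lam) R^{-1}, hence P^n = R diag(lam**n) R^{-1}.
lam, R = np.linalg.eig(P)
L = np.linalg.inv(R)               # rows of L are the matching left (row) eigenvectors
n = 25
assert np.allclose((R * lam**n) @ L, np.linalg.matrix_power(P, n))
print(np.sort(lam.real))           # 1 - 2a - 2c, 1 - 2a - 2b, 1 - 2b - 2c, 1
```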

  21. By Blaisdell: • probability of A → G, C → T • probability of G → A, T → C • probability of A → C, A → T, G → C, G → T • probability of C → A, T → A, C → G, T → G

  22. Stationary distribution • Remark: this model is not reversible.
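Non-reversibility can be checked numerically through detailed balance, $\pi_x p_{xy} = \pi_y p_{yx}$. The sketch uses hypothetical names a (A → G, C → T), b (G → A, T → C), c (each purine → pyrimidine) and d (each pyrimidine → purine). By Kolmogorov's criterion the cycle A → G → C → A has probability product $a c d$ while its reverse has $c d b$, so the chain cannot be reversible when $a \ne b$.

```python
import numpy as np


def blaisdell(a: float, b: float, c: float, d: float) -> np.ndarray:
    """Order A, G, C, T (hypothetical parameter names).

    a: A->G and C->T;  b: G->A and T->C;
    c: each purine -> pyrimidine;  d: each pyrimidine -> purine.
    """
    P = np.array([
        [0, a, c, c],
        [b, 0, c, c],
        [d, d, 0, a],
        [d, d, b, 0],
    ], dtype=float)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # remaining probability: no substitution
    if np.any(P < 0):
        raise ValueError("probabilities must be non-negative")
    return P


def stationary(P: np.ndarray) -> np.ndarray:
    """Left eigenvector of P for the eigenvalue closest to 1, normalised to sum to 1."""
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()


def detailed_balance_gap(P: np.ndarray) -> float:
    """max |pi_i p_ij - pi_j p_ji|; zero (up to rounding) iff the chain is reversible."""
    flux = stationary(P)[:, None] * P
    return float(np.max(np.abs(flux - flux.T)))


print(detailed_balance_gap(blaisdell(0.1, 0.2, 0.03, 0.05)))  # clearly positive
print(detailed_balance_gap(blaisdell(0.1, 0.1, 0.03, 0.05)))  # about 0 when a == b
```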

  23. Felsenstein model • The probability of substitution of any nucleotide by another is proportional to the stationary probability of the substituting nucleotide

  24. Stationary distribution
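A sketch of the Felsenstein model, assuming $p_{xy} = \mu \pi_y$ for $y \ne x$ with a hypothetical scale parameter $\mu$, so $p_{xx} = 1 - \mu(1 - \pi_x)$. Then $(\pi P)_y = \mu\pi_y(1 - \pi_y) + \pi_y\bigl(1 - \mu(1 - \pi_y)\bigr) = \pi_y$, so $\pi$ is stationary, and $\pi_x p_{xy} = \mu \pi_x \pi_y$ is symmetric, so the model is reversible.

```python
import numpy as np


def felsenstein(pi, mu: float) -> np.ndarray:
    """p_xy = mu * pi_y for y != x (mu: hypothetical scale parameter); diagonal completes rows."""
    pi = np.asarray(pi, dtype=float)
    if pi.shape != (4,) or np.any(pi <= 0) or not np.isclose(pi.sum(), 1.0):
        raise ValueError("pi must be a positive probability vector of length 4")
    P = np.tile(mu * pi, (4, 1))              # every row starts as mu * pi
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # p_xx = 1 - mu * (1 - pi_x)
    if np.any(P < 0):
        raise ValueError("mu is too large")
    return P


pi = np.array([0.3, 0.2, 0.2, 0.3])
P = felsenstein(pi, mu=0.2)
print(np.allclose(pi @ P, pi))                             # True: pi is stationary
print(np.allclose(pi[:, None] * P, (pi[:, None] * P).T))   # True: detailed balance holds
```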

  25. HKY model (Hasegawa, Kishino, Yano) • Different rates for transitions and transversions

  26. Eigenvalues of P

  27. Left (row) eigenvectors

  28. Right (column) eigenvectors
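Because the HKY model is reversible, its left and right eigenvectors can be obtained from one symmetric eigenproblem: with $D = \mathrm{diag}(\pi)$, the matrix $S = D^{1/2} P D^{-1/2}$ is symmetric, $S = U \Lambda U^{T}$, and then $R = D^{-1/2} U$ holds the right (column) eigenvectors while $L = U^{T} D^{1/2}$ holds the left (row) eigenvectors, with $RL = I$. The discrete parametrization used here ($p_{xy} = \mu\kappa\pi_y$ for transitions, $\mu\pi_y$ for transversions, with hypothetical $\mu$ and $\kappa$) is an assumption modeled on the usual HKY form.

```python
import numpy as np


def hky(pi, mu: float, kappa: float) -> np.ndarray:
    """Assumed discrete form: p_xy = mu*kappa*pi_y (transition), mu*pi_y (transversion)."""
    pi = np.asarray(pi, dtype=float)
    purine = np.array([True, True, False, False])       # order A, G, C, T
    same_class = purine[:, None] == purine[None, :]     # True for transitions (off-diagonal)
    P = mu * np.where(same_class, kappa, 1.0) * pi[None, :]
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    if np.any(P < 0):
        raise ValueError("mu * kappa too large for a stochastic matrix")
    return P


pi = np.array([0.3, 0.2, 0.2, 0.3])
P = hky(pi, mu=0.2, kappa=3.0)

# Reversibility (pi_x p_xy = pi_y p_yx) makes S = D^{1/2} P D^{-1/2} symmetric, D = diag(pi).
d = np.sqrt(pi)
S = d[:, None] * P / d[None, :]
lam, U = np.linalg.eigh(S)        # real eigenvalues (ascending) and orthonormal U
R = U / d[:, None]                # right (column) eigenvectors: P R = R diag(lam)
L = (U * d[:, None]).T            # left (row) eigenvectors:     L P = diag(lam) L

assert np.allclose(P @ R, R * lam)
assert np.allclose(L @ P, lam[:, None] * L)
assert np.allclose(R @ L, np.eye(4))   # hence P^n = sum_k lam_k**n * outer(R[:, k], L[k])
print(lam)
```

The left eigenvector for the eigenvalue 1 is proportional to $\pi$, which ties the spectral picture back to the stationary distribution.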

  29. General 12-parameter model (Tavaré, 1986)

  30. Stationary distribution
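In the general model all 12 off-diagonal entries are free and the diagonal is fixed by the row sums. A sketch with a hypothetical helper `general_model` that fills the 12 entries row by row, plus a numerical stationary distribution taken from the left eigenvector of $P$ for the eigenvalue 1 (there is no simple closed form to compare with here).

```python
import numpy as np

OFF_DIAGONAL = ~np.eye(4, dtype=bool)   # the 12 cells a general model leaves free


def general_model(params) -> np.ndarray:
    """Build P from 12 off-diagonal probabilities given row by row (order A, G, C, T)."""
    params = np.asarray(params, dtype=float)
    if params.shape != (12,) or np.any(params < 0):
        raise ValueError("expected 12 non-negative probabilities")
    P = np.zeros((4, 4))
    P[OFF_DIAGONAL] = params                  # boolean indexing fills cells in row-major order
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # the diagonal is fixed by the row sums
    if np.any(P < 0):
        raise ValueError("each row of off-diagonal probabilities must sum to at most 1")
    return P


def stationary(P: np.ndarray) -> np.ndarray:
    """Left eigenvector of P for the eigenvalue closest to 1, normalised to sum to 1."""
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()


rng = np.random.default_rng(1986)
P = general_model(rng.uniform(0.01, 0.1, size=12))
pi = stationary(P)
print(pi, np.allclose(pi @ P, pi))
```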

  31. Reversibility • A = D, B = G, C = J, E = H, F = K, I = L • Conclusion: the most general reversible model has 12 − 6 = 6 free parameters

  32. Continuous – time models

  33. Matrix of transition probabilities $P(t) = e^{Qt}$ • $Q$: intensity matrix
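For a time-homogeneous continuous-time chain with intensity matrix $Q$ (non-negative off-diagonal entries, rows summing to 0), $P(t) = e^{Qt}$. To stay self-contained, the sketch below computes the matrix exponential by scaling and squaring with a truncated Taylor series; in practice `scipy.linalg.expm` is the robust choice. Function names and the example $Q$ are hypothetical.

```python
import numpy as np


def expm_taylor(A: np.ndarray, terms: int = 20) -> np.ndarray:
    """exp(A) by scaling and squaring with a truncated Taylor series (a simple sketch)."""
    norm = np.linalg.norm(A, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0**s                         # now ||B|| <= 1/2, so the series converges fast
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k                # B^k / k!
        E = E + term
    for _ in range(s):                     # undo the scaling: exp(A) = exp(B)^(2^s)
        E = E @ E
    return E


def transition_matrix(Q: np.ndarray, t: float) -> np.ndarray:
    """P(t) = exp(Q t) for an intensity matrix Q (non-negative off-diagonal, rows sum to 0)."""
    Q = np.asarray(Q, dtype=float)
    off = Q[~np.eye(Q.shape[0], dtype=bool)]
    if np.any(off < 0) or not np.allclose(Q.sum(axis=1), 0.0):
        raise ValueError("Q must have non-negative off-diagonal entries and zero row sums")
    return expm_taylor(Q * t)


Q = np.array([[-0.5, 0.3, 0.1, 0.1],
              [0.4, -0.6, 0.1, 0.1],
              [0.1, 0.1, -0.5, 0.3],
              [0.1, 0.1, 0.4, -0.6]])
P1, P2 = transition_matrix(Q, 1.0), transition_matrix(Q, 2.0)
print(np.allclose(P1.sum(axis=1), 1.0), np.allclose(P1 @ P1, P2))  # True True
```

The second check is the Chapman–Kolmogorov property $P(s)P(t) = P(s+t)$, which any correct $P(t) = e^{Qt}$ must satisfy.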

  34. Jukes – Cantor model

  35. Spectral decomposition of P(t)
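Assuming the continuous Jukes – Cantor model has rate $\alpha$ for each specific substitution (an assumed symbol), $Q$ has off-diagonal entries $\alpha$ and diagonal $-3\alpha$, i.e. $Q = -4\alpha(I - J/4)$. Its eigenvalues are $0$ and $-4\alpha$ (three times), so $P(t) = J/4 + e^{-4\alpha t}(I - J/4)$: diagonal $\tfrac14 + \tfrac34 e^{-4\alpha t}$, off-diagonal $\tfrac14 - \tfrac14 e^{-4\alpha t}$. A NumPy check against a numerical eigendecomposition:

```python
import numpy as np


def jc_transition(alpha: float, t: float) -> np.ndarray:
    """Closed-form P(t) for the continuous Jukes-Cantor model (rate alpha per substitution)."""
    E1 = np.full((4, 4), 0.25)           # eigenvalue 0 of Q
    E2 = np.eye(4) - E1                  # eigenvalue -4*alpha of Q (multiplicity 3)
    return E1 + np.exp(-4.0 * alpha * t) * E2


def transition_from_spectrum(Q: np.ndarray, t: float) -> np.ndarray:
    """P(t) = U diag(exp(lam t)) U^T for a symmetric intensity matrix Q."""
    lam, U = np.linalg.eigh(Q)
    return (U * np.exp(lam * t)) @ U.T


alpha, t = 0.3, 1.7
Q = np.full((4, 4), alpha)
np.fill_diagonal(Q, -3.0 * alpha)
print(np.linalg.eigvalsh(Q))                                                   # -4*alpha (x3) and 0
print(np.allclose(jc_transition(alpha, t), transition_from_spectrum(Q, t)))    # True
```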

  36. Kimura model

  37. Spectral decomposition of P(t)
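For the continuous Kimura model, assuming rate $\alpha$ for the transition and rate $\beta$ for each specific transversion, the eigenvalues of $Q$ are $0$, $-4\beta$ and $-2(\alpha+\beta)$ (twice). The spectral decomposition gives
$P_{xx}(t) = \tfrac14 + \tfrac14 e^{-4\beta t} + \tfrac12 e^{-2(\alpha+\beta)t}$, a specific transition $\tfrac14 + \tfrac14 e^{-4\beta t} - \tfrac12 e^{-2(\alpha+\beta)t}$, and a specific transversion $\tfrac14 - \tfrac14 e^{-4\beta t}$. A sketch comparing this closed form with a numerical eigendecomposition:

```python
import numpy as np


def kimura_transition(alpha: float, beta: float, t: float) -> np.ndarray:
    """Closed-form P(t), order A, G, C, T; alpha = transition rate, beta = rate per transversion."""
    e1 = np.exp(-4.0 * beta * t)
    e2 = np.exp(-2.0 * (alpha + beta) * t)
    same = 0.25 + 0.25 * e1 + 0.5 * e2
    ts = 0.25 + 0.25 * e1 - 0.5 * e2      # a specific transition
    tv = 0.25 - 0.25 * e1                 # a specific transversion
    return np.array([
        [same, ts, tv, tv],
        [ts, same, tv, tv],
        [tv, tv, same, ts],
        [tv, tv, ts, same],
    ])


alpha, beta, t = 0.4, 0.1, 0.8
Q = np.array([
    [-alpha - 2 * beta, alpha, beta, beta],
    [alpha, -alpha - 2 * beta, beta, beta],
    [beta, beta, -alpha - 2 * beta, alpha],
    [beta, beta, alpha, -alpha - 2 * beta],
])
lam, U = np.linalg.eigh(Q)                  # 0, -4*beta, -2*(alpha + beta) (twice)
P_spectral = (U * np.exp(lam * t)) @ U.T
print(np.allclose(kimura_transition(alpha, beta, t), P_spectral))  # True
```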

  38. Parameter estimation

  39. Jukes – Cantor model • Three configurations are equivalent due to reversibility: [Figure: ancestor A and descendants D1, D2 drawn in three equivalent arrangements]

  40. Probability that the nucleotides are different in two descendants
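The equivalence can be checked numerically in the discrete Jukes – Cantor model (hypothetical values for $\alpha$ and the number of steps $n$ on each branch). Computing the probability that D1 and D2 differ (1) from the ancestor, (2) along the path D1 → A → D2 of $2n$ steps that reversibility allows, and (3) from the closed form $\tfrac34\bigl(1 - (1-4\alpha)^{2n}\bigr)$ gives the same number.

```python
import numpy as np

alpha, n = 0.02, 15                     # hypothetical parameter values
P = np.full((4, 4), alpha)
np.fill_diagonal(P, 1.0 - 3.0 * alpha)
pi = np.full(4, 0.25)                   # stationary distribution of the ancestor
Pn = np.linalg.matrix_power(P, n)

# (1) Ancestor A ~ pi; D1 and D2 evolve independently for n steps from A.
p_ancestor = sum(pi[x] * (1.0 - np.sum(Pn[x] ** 2)) for x in range(4))

# (2) By reversibility, D1 can be treated as the "ancestor" of D2 along 2n steps.
P2n = np.linalg.matrix_power(P, 2 * n)
p_path = 1.0 - np.sum(pi * np.diag(P2n))

# (3) Closed form from the spectral decomposition of P^(2n).
p_closed = 0.75 * (1.0 - (1.0 - 4.0 * alpha) ** (2 * n))

print(p_ancestor, p_path, p_closed)     # three equal values (up to rounding)
```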

  41. Estimating p • We have two DNA sequences of length N:
D1: ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG
D2: ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG
• $p = \frac{\text{number of differences}}{N}$
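For the two sequences on the slide there are 2 differences among $N = 49$ sites, so $p = 2/49 \approx 0.041$. As a sketch, assuming the continuous Jukes – Cantor model with rate $\alpha$ per specific substitution and time $t$ on each branch, $p = \tfrac34(1 - e^{-8\alpha t})$ can be inverted for $\alpha t$; then $6\alpha t = -\tfrac34 \ln(1 - \tfrac43 p)$ is the expected number of substitutions per site separating D1 and D2. Function names are hypothetical.

```python
import math

D1 = "ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG"
D2 = "ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG"


def proportion_different(s1: str, s2: str) -> float:
    """Number of differing sites divided by the sequence length N."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc_alpha_t(p: float) -> float:
    """Solve p = 3/4 * (1 - exp(-8 * alpha * t)) for alpha * t (assumed continuous JC)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor formula needs 0 <= p < 3/4")
    return -math.log(1.0 - 4.0 * p / 3.0) / 8.0


p_hat = proportion_different(D1, D2)   # 2 differences among N = 49 sites
alpha_t = jc_alpha_t(p_hat)
print(p_hat, alpha_t, 6 * alpha_t)     # 6*alpha*t: expected substitutions per site between D1 and D2
```

The corrected value $6\alpha t$ is slightly larger than $p$, because some sites may have changed more than once.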

  42. Kimura model • p: probability that the two descendants carry two different purines or two different pyrimidines • q: probability that one descendant carries a purine and the other a pyrimidine
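A sketch of the corresponding estimates, assuming the continuous Kimura model with transition rate $\alpha$, rate $\beta$ per specific transversion and time $t$ on each branch. Then $p = \tfrac14 + \tfrac14 e^{-8\beta t} - \tfrac12 e^{-4(\alpha+\beta)t}$ and $q = \tfrac12 - \tfrac12 e^{-8\beta t}$, so $1 - 2q = e^{-8\beta t}$ and $1 - 2p - q = e^{-4(\alpha+\beta)t}$. On the slide's sequences the two differences are one G/A transition and one A/T transversion. Function names are hypothetical.

```python
import math

PURINES = frozenset("AG")
D1 = "ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG"
D2 = "ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG"


def kimura_pq(s1: str, s2: str) -> tuple[float, float]:
    """p: proportion of transition differences; q: proportion of transversion differences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    ts = tv = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            ts += 1        # two different purines or two different pyrimidines
        else:
            tv += 1        # a purine against a pyrimidine
    return ts / len(s1), tv / len(s1)


def kimura_rates_times_t(p: float, q: float) -> tuple[float, float]:
    """Invert q = 1/2 - 1/2 e^{-8 beta t} and 1 - 2p - q = e^{-4 (alpha + beta) t}."""
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("p and q too large for the Kimura formulas")
    beta_t = -math.log(1 - 2 * q) / 8
    alpha_t = -math.log(1 - 2 * p - q) / 4 - beta_t
    return alpha_t, beta_t


p, q = kimura_pq(D1, D2)           # one G/A (transition) and one A/T (transversion): p = q = 1/49
alpha_t, beta_t = kimura_rates_times_t(p, q)
d = 2 * (alpha_t + 2 * beta_t)     # expected substitutions per site separating D1 and D2
print(p, q, alpha_t, beta_t, d)
```

Written out, $d = -\tfrac12 \ln(1 - 2p - q) - \tfrac14 \ln(1 - 2q)$, since each branch accumulates substitutions at total rate $\alpha + 2\beta$ for time $t$.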
