
Finding Mathematics in Genes and Diseases

Ming-Ying Leung, Department of Mathematical Sciences, University of Texas at El Paso (UTEP). Outline: DNA and RNA; genome, genes, and diseases; palindromes and replication origins in viral genomes; mathematics for prediction of replication origins.


Presentation Transcript


  1. Finding Mathematics in Genes and Diseases. Ming-Ying Leung, Department of Mathematical Sciences, University of Texas at El Paso (UTEP)

  2. Outline: • DNA and RNA • Genome, genes, and diseases • Palindromes and replication origins in viral genomes • Mathematics for prediction of replication origins [Figure: Cytomegalovirus (CMV) particle]

  3. DNA and RNA • DNA is deoxyribonucleic acid, made up of 4 nucleotide bases: Adenine, Cytosine, Guanine, and Thymine. • RNA is ribonucleic acid, made up of 4 nucleotide bases: Adenine, Cytosine, Guanine, and Uracil. • For uniformity of notation, all DNA and RNA data sequences deposited in GenBank are represented as sequences of A, C, G, and T. • The bases A and T form a complementary pair, as do C and G. [Figure: DNA and RNA strands with labeled bases]

  4. Genes and Genome

  5. Genes and Diseases

  6. Virus and Eye Diseases. CMV retinitis: • inflammation of the retina • triggered by CMV particles • may lead to blindness. CMV genome size ~ 230 kbp [Figure: CMV particle]

  7. Replication Origins and Palindromes • A high concentration of palindromes exists around the replication origins of other herpesviruses • Locating clusters of palindromes (above a minimal length) on the CMV genome sequence might reveal likely locations of its replication origins.

  8. Palindromes in Letter Sequences (remove spaces and capitalize) • Odd palindrome: “A nut for a jar of tuna” becomes ANUTFORA J AROFTUNA • Even palindrome: “Step on no pets” becomes STEPON NOPETS
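The "remove spaces and capitalize" step and the odd/even distinction can be sketched in a few lines of Python. This is only an illustration of the slide's two examples; the function names `normalize`, `is_palindrome`, and `parity` are made up here, and the sketch drops every non-letter character rather than spaces alone.

```python
def normalize(text: str) -> str:
    """Drop everything that is not a letter and capitalize the rest."""
    return "".join(ch.upper() for ch in text if ch.isalpha())


def is_palindrome(text: str) -> bool:
    """True if the normalized text reads the same in both directions."""
    letters = normalize(text)
    return letters == letters[::-1]


def parity(text: str) -> str:
    """'odd' if the normalized text has a single middle letter, else 'even'."""
    return "odd" if len(normalize(text)) % 2 else "even"


for phrase in ("A nut for a jar of tuna", "Step on no pets"):
    print(normalize(phrase), is_palindrome(phrase), parity(phrase))
```

An odd palindrome has one unpaired letter in the middle (the J in the first phrase), while an even palindrome splits into two mirrored halves.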

  9. DNA Palindromes
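The transcript keeps only the title of this slide. Under the usual convention, assumed here, a DNA palindrome is a stretch that equals its own reverse complement (using the A/T and C/G pairing from slide 3), so it always has even length. A minimal Python sketch of that definition follows; the names `reverse_complement`, `is_dna_palindrome`, `find_palindromes`, and the default `min_length` are illustrative choices, not taken from the talk.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Read the sequence backwards and swap each base for its complement."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_dna_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def find_palindromes(seq: str, min_length: int = 6) -> list[tuple[int, int]]:
    """Return (start, length) of each maximal palindrome of at least min_length bases."""
    hits = []
    n = len(seq)
    # Each center sits between seq[center - 1] and seq[center].
    for center in range(1, n):
        left, right = center - 1, center
        # Extend outward while the two flanking bases are complementary.
        while left >= 0 and right < n and COMPLEMENT.get(seq[left]) == seq[right]:
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_length:
            hits.append((left + 1, length))
    return hits


print(is_dna_palindrome("GAATTC"))
print(find_palindromes("ACGAATTCTT", min_length=6))
```

The minimal length corresponds to the "above a minimal length" condition on slide 7; the value 6 is arbitrary.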

  10. Association of Palindrome Clusters with Replication Origins

  11. Computational Prediction of Replication Origins • Palindrome distribution in a random sequence model • Criterion for identifying statistically significant palindrome clusters • Evaluate prediction accuracy • Try to improve…

  12. Random Sequence Model • A mathematical model can be used to generate a DNA sequence • A DNA molecule is made up of 4 types of bases • It can be represented by a letter sequence with alphabet size = 4 [Figure: Wheel of Bases (WOB) with sectors A, C, G, T for Adenine, Cytosine, Guanine, Thymine]

  13. Random Sequence Model • Each type of base has its own chance (or probability) of being used, depending on the base composition of the DNA molecule. [Figure: Wheel of Bases (WOB) with sectors A, C, G, T for Adenine, Cytosine, Guanine, Thymine]

  14. Random Sequence Model • Each type of base has its own chance (or probability) of being used, depending on the base composition of the DNA molecule. [Figure: Wheel of Bases (WOB) with sectors A, C, G, T for Adenine, Cytosine, Guanine, Thymine; sector sizes 1/3, 1/3, 1/6, 1/6]
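Spinning the Wheel of Bases once per position can be sketched as independent weighted sampling. The sector sizes one third, one third, one sixth, and one sixth come from the slide, but which base gets which size is an assumption in this sketch, as are the names `WHEEL` and `spin_sequence`.

```python
import random

# Assumed assignment of the wheel's sector sizes to bases (hypothetical).
WHEEL = {"A": 1 / 3, "C": 1 / 3, "G": 1 / 6, "T": 1 / 6}


def spin_sequence(length, wheel=WHEEL, seed=None):
    """Generate a sequence by spinning the same wheel independently for each base."""
    rng = random.Random(seed)
    bases = list(wheel)
    weights = [wheel[base] for base in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


sequence = spin_sequence(60, seed=2024)
print(sequence)
# Observed base composition, which approaches the wheel's sector sizes as length grows.
print({base: sequence.count(base) / len(sequence) for base in WHEEL})
```

Because every spin uses the same wheel, neighboring bases are independent of each other in this model.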

  15. Poisson Process Approximation of Palindrome Distribution
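Only the slide title survives here. As a rough sketch of why a Poisson approximation is plausible under the independent-bases model of the preceding slides (not necessarily the derivation presented in the talk), assume a palindrome is a stretch equal to its own reverse complement and write $p_A, p_C, p_G, p_T$ for the base probabilities. The symbols $\theta$, $L$, and $n$ are introduced here for illustration.

```latex
% Probability that two independently drawn bases are complementary:
\theta = p_A p_T + p_T p_A + p_C p_G + p_G p_C = 2\,(p_A p_T + p_C p_G)

% A palindrome of length at least 2L around a given center needs L
% complementary pairs on disjoint positions, so by independence:
P(\text{palindrome of length} \ge 2L \text{ at a given center}) = \theta^{L}

% Over n possible centers, with \theta^{L} small, the count is roughly Poisson:
N \;\approx\; \mathrm{Poisson}(\lambda), \qquad \lambda = n\,\theta^{L}
```

With equal base probabilities of 1/4, $\theta = 1/4$, so palindromes of length at least 10 ($L = 5$) occur at about one center in 1024, which is the rare-event regime where a Poisson count is a natural approximation.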

  16. Use of the Scan Statistic to Identify Clusters of Palindromes
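Only the slide title survives here. In its basic form, a scan statistic slides a window of fixed length along the sequence and records the largest number of events (here, palindrome positions) that fall inside it. A minimal sketch follows; the function name `scan_statistic`, the sample positions, and the window length are hypothetical, and the significance criterion mentioned on slide 11 is not reproduced.

```python
def scan_statistic(positions, window):
    """Return (max_count, window_start) over all windows [start, start + window].

    The maximum is always attained by a window that starts at one of the
    points, so it is enough to try each point as a window start.
    """
    points = sorted(positions)
    best_count, best_start = 0, None
    right = 0
    for left, start in enumerate(points):
        # Advance the right edge past every point still inside the window.
        while right < len(points) and points[right] <= start + window:
            right += 1
        count = right - left
        if count > best_count:
            best_count, best_start = count, start
    return best_count, best_start


# Hypothetical palindrome positions (in bases) along a genome.
palindrome_positions = [10, 500, 510, 530, 900, 1500]
print(scan_statistic(palindrome_positions, window=50))
```

A window with an unusually large count, compared with what the random sequence model predicts, is a candidate palindrome cluster.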

  17. Measures of Prediction Accuracy. Attempts to improve prediction accuracy by: • Adopting the best possible approximation to the scan statistic distribution • Taking the lengths of palindromes into consideration when counting palindromes • Using a better random sequence model

  18. Markov Chain Sequence Models • More realistic random sequence model for DNA and RNA • It allows neighbor dependence of bases (i.e., the present base affects the selection of the next base) • A Markov chain of nucleotide bases can be generated using four WOBs in a “Sequence Generator (SG)”
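One way to read the four-wheel Sequence Generator is as a first-order Markov chain: the base just emitted decides which wheel is spun next. The sketch below follows that reading. The transition probabilities in `WHEELS`, the uniform starting distribution `INITIAL`, and the function name are all hypothetical, since the slides do not give numerical values.

```python
import random

BASES = "ACGT"

# Hypothetical wheels: WHEELS[x] lists the probabilities of A, C, G, T
# being chosen next when the present base is x.
WHEELS = {
    "A": [0.4, 0.2, 0.2, 0.2],
    "C": [0.2, 0.4, 0.2, 0.2],
    "G": [0.2, 0.2, 0.4, 0.2],
    "T": [0.2, 0.2, 0.2, 0.4],
}
# Assumed distribution for the first base, which has no predecessor.
INITIAL = [0.25, 0.25, 0.25, 0.25]


def generate_markov_sequence(length, wheels=WHEELS, initial=INITIAL, seed=None):
    """Emit bases one at a time, spinning the wheel that belongs to the present base."""
    if length <= 0:
        return ""
    rng = random.Random(seed)
    sequence = rng.choices(BASES, weights=initial, k=1)
    while len(sequence) < length:
        present = sequence[-1]
        sequence.append(rng.choices(BASES, weights=wheels[present], k=1)[0])
    return "".join(sequence)


print(generate_markov_sequence(60, seed=2024))
```

If all four wheels are identical, this reduces to the independent model with a single Wheel of Bases.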

  19-31. Sequence Generator (SG) [Animation frames: the Sequence Generator (SG) with its Wheels of Bases (WOB) for the bases A, C, G, and T, adding bases to the generated sequence step by step]

  32. Results Obtained for Markov Sequence Models • Probabilities of occurrences of single palindromes • Probabilities of occurrences of overlapping palindromes • Mean and variance of palindrome counts

  33. Related Work in Progress • Finding the palindrome distribution on Markov random sequences • Investigating other sequence patterns such as close repeats and inversions in relation to replication origins

  34. Other Mathematical Topics in Genes and Diseases • Optimization Techniques – prediction of molecular structures • Differential Equations – molecular dynamics • Matrix Theory – analyzing gene expression data • Fourier Analysis – proteomics data
