Finding Mathematics in Genes and Diseases
Ming-Ying Leung
Department of Mathematical Sciences
University of Texas at El Paso (UTEP)
Outline:
• DNA and RNA
• Genome, genes, and diseases
• Palindromes and replication origins in viral genomes
• Mathematics for prediction of replication origins
[Figure: Cytomegalovirus (CMV) particle]
DNA and RNA
• DNA is deoxyribonucleic acid, made up of 4 nucleotide bases: Adenine, Cytosine, Guanine, and Thymine.
• RNA is ribonucleic acid, made up of 4 nucleotide bases: Adenine, Cytosine, Guanine, and Uracil.
• For uniformity of notation, all DNA and RNA data sequences deposited in GenBank are represented as sequences of A, C, G, and T.
• The bases A and T form a complementary pair, as do C and G.
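The notation and pairing rules above can be tried out in a few lines of Python. This is a minimal sketch: the function names `to_dna_alphabet`, `complement`, and `reverse_complement` are illustrative choices, not part of the talk or of any GenBank tool.

```python
# Pairing rule from the slide: A pairs with T, and C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def to_dna_alphabet(seq: str) -> str:
    """Write a DNA or RNA sequence in the A, C, G, T alphabet (U becomes T)."""
    return seq.upper().replace("U", "T")

def complement(seq: str) -> str:
    """Replace every base by its complementary base."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq: str) -> str:
    """Complement the sequence and read it in the opposite direction."""
    return complement(seq)[::-1]

print(to_dna_alphabet("acgu"))
print(complement("AACCGG"))
print(reverse_complement("AACCGG"))
```

The reverse complement is the sequence read off the opposite strand, which is the operation usually used when talking about palindromes in DNA.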
Virus and Eye Diseases
CMV Particle
• genome size ~ 230 kbp
CMV Retinitis
• inflammation of the retina
• triggered by CMV particles
• may lead to blindness
Replication Origins and Palindromes
• A high concentration of palindromes exists around the replication origins of other herpesviruses.
• Locating clusters of palindromes (above a minimal length) on the CMV genome sequence might reveal likely locations of its replication origins.
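As a rough illustration of locating palindromes above a minimal length, the sketch below scans a sequence for maximal palindromes. It assumes the usual DNA convention that a palindrome is an even-length stretch identical to its own reverse complement; the slides do not spell out the definition, and the name `find_palindromes` and the default `min_len=10` are hypothetical.

```python
# Assumed definition: a DNA palindrome reads the same as its reverse complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def find_palindromes(seq, min_len=10):
    """Return (start, length) for each maximal palindrome of length >= min_len.

    Start positions are 0-based. min_len is an illustrative threshold.
    """
    hits = []
    n = len(seq)
    # Such a palindrome has even length, so its center lies between
    # two adjacent positions c-1 and c.
    for c in range(1, n):
        left, right = c - 1, c
        # Grow outward while the two ends are complementary bases.
        while left >= 0 and right < n and COMPLEMENT.get(seq[left]) == seq[right]:
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_len:
            hits.append((left + 1, length))
    return hits

print(find_palindromes("CCCGAATTCAAA", min_len=4))
```

The returned start positions are the raw material for cluster detection: regions where many hits fall close together are the candidates of interest.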
Palindromes in Letter Sequences
(remove spaces and capitalize)
• Odd palindrome: “A nut for a jar of tuna” → ANUTFORA J AROFTUNA
• Even palindrome: “Step on no pets” → STEPON NOPETS
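The two examples can be checked mechanically. The sketch below follows the recipe on the slide (remove spaces, capitalize) and then reports whether the result is an odd or even palindrome; the function names are illustrative.

```python
def normalize(phrase):
    """Keep letters only and capitalize them."""
    return "".join(ch for ch in phrase.upper() if ch.isalpha())

def classify_palindrome(phrase):
    """Return 'odd palindrome', 'even palindrome', or 'not a palindrome'."""
    s = normalize(phrase)
    if not s or s != s[::-1]:
        return "not a palindrome"
    # Odd length means a single middle letter; even length means none.
    return "odd palindrome" if len(s) % 2 else "even palindrome"

for phrase in ("A nut for a jar of tuna", "Step on no pets", "Genes and diseases"):
    print(normalize(phrase), "->", classify_palindrome(phrase))
```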
Computational Prediction of Replication Origins
• Palindrome distribution in a random sequence model
• Criterion for identifying statistically significant palindrome clusters
• Evaluate prediction accuracy
• Try to improve…
Random Sequence Model
• A mathematical model can be used to generate a DNA sequence
• A DNA molecule is made up of 4 types of bases: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T)
• It can be represented by a letter sequence with alphabet size = 4
• Each type of base has its chance (or probability) of being used, depending on the base composition of the DNA molecule
[Figure: Wheel of Bases (WOB), a wheel divided into sectors labeled A, C, G, and T, with sector sizes 1/3, 1/3, 1/6, and 1/6]
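Spinning a Wheel of Bases repeatedly can be mimicked with Python's standard `random` module. This is a sketch under stated assumptions: the sector sizes 1/3 and 1/6 echo the wheel in the figure, but which base gets which sector in `EXAMPLE_PROBS` is my own guess, and `random_sequence` is an illustrative name.

```python
import random
from collections import Counter

# Hypothetical assignment of sector sizes to bases (an assumption).
EXAMPLE_PROBS = {"A": 1 / 3, "C": 1 / 6, "G": 1 / 6, "T": 1 / 3}

def random_sequence(length, probs, seed=None):
    """Generate bases independently, each drawn with the given probabilities."""
    rng = random.Random(seed)
    bases = list(probs)
    weights = [probs[b] for b in bases]
    # Each draw is one spin of the wheel; spins do not influence each other.
    return "".join(rng.choices(bases, weights=weights, k=length))

seq = random_sequence(60000, EXAMPLE_PROBS, seed=1)
counts = Counter(seq)
freqs = {b: counts[b] / len(seq) for b in "ACGT"}
print(seq[:40])
print(freqs)
```

For a long sequence, the observed base frequencies should come out close to the sector sizes of the wheel.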
Use of the Scan Statistic to Identify Clusters of Palindromes
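One common form of the scan statistic is the largest number of events that fall inside any window of fixed length as the window slides along the sequence. The sketch below computes that quantity for a list of palindrome positions. The function name, the half-open window convention, and the example numbers are assumptions for illustration; the significance threshold, which would come from the distribution of the scan statistic under the random sequence model, is not shown.

```python
def scan_statistic(positions, window):
    """Return (max_count, window_start) over all windows [x, x + window).

    positions: palindrome locations (e.g. in base pairs).
    window: window length in the same units.
    """
    pts = sorted(positions)
    best, best_start, lo = 0, None, 0
    for hi, p in enumerate(pts):
        # Drop points from the left until pts[lo]..p fit in one window.
        while p - pts[lo] >= window:
            lo += 1
        count = hi - lo + 1
        if count > best:
            best, best_start = count, pts[lo]
    return best, best_start

# Made-up positions, purely for illustration.
print(scan_statistic([5, 100, 110, 120, 400, 410], window=50))
```

A window whose count exceeds the chosen threshold would be flagged as a statistically significant cluster of palindromes.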
Measures of Prediction Accuracy
Attempts to improve prediction accuracy by:
• Adopting the best possible approximation to the scan statistic distribution
• Taking the lengths of palindromes into consideration when counting palindromes
• Using a better random sequence model
Markov Chain Sequence Models
• A more realistic random sequence model for DNA and RNA
• Allows neighbor dependence of bases (i.e., the present base affects the selection of the next base)
• A Markov chain of nucleotide bases can be generated using four WOBs in a “Sequence Generator (SG)”
[Figure sequence: Sequence Generator (SG) with four Wheels of Bases (WOB), producing a sequence of the bases A, C, G, and T one base at a time]
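The Sequence Generator idea can be sketched as follows: keep one wheel per base, and after each draw switch to the wheel that belongs to the base just produced. The transition probabilities in `WHEELS`, the separate `first_wheel` for the opening base, and the function name are all hypothetical; in practice the probabilities would be estimated from the genome being modeled.

```python
import random

# Hypothetical wheels: WHEELS[current_base][next_base] = probability.
WHEELS = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2},
    "G": {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}
# Hypothetical wheel used only for the very first base.
FIRST_WHEEL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def markov_sequence(length, wheels, first_wheel, seed=None):
    """Generate a first-order Markov chain of bases."""
    rng = random.Random(seed)

    def spin(wheel):
        return rng.choices(list(wheel), weights=list(wheel.values()), k=1)[0]

    if length <= 0:
        return ""
    seq = [spin(first_wheel)]
    while len(seq) < length:
        # The wheel for the next base is chosen by the present base.
        seq.append(spin(wheels[seq[-1]]))
    return "".join(seq)

print(markov_sequence(40, WHEELS, FIRST_WHEEL, seed=2))
```

With the example wheels, each base is twice as likely to be followed by itself as by any other single base, so the output tends to show short runs of repeated letters. That neighbor dependence is what the independent-draw model cannot produce.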
Results Obtained for Markov Sequence Models
• Probabilities of occurrences of single palindromes
• Probabilities of occurrences of overlapping palindromes
• Mean and variance of palindrome counts
Related Work in Progress
• Finding the palindrome distribution on Markov random sequences
• Investigating other sequence patterns such as close repeats and inversions in relation to replication origins
Other Mathematical Topics in Genes and Diseases
• Optimization Techniques – prediction of molecular structures
• Differential Equations – molecular dynamics
• Matrix Theory – analyzing gene expression data
• Fourier Analysis – proteomics data