
Entropy Driven Evolution: Why DNA is Coded in 4 Bases and Reproduction Takes 2 Sexes?

This article explores the idea that evolution seeks to maximize biodiversity within the constraints of time and energy. It applies this concept to various biological systems, such as DNA replication, protein synthesis, sexual reproduction, and speciation. Claude Shannon's mathematical theory of communication is used to understand the transmission of information in these systems. The article also discusses the importance of equiprobability in maximizing the transmission rate and applies this concept to DNA replication.

philmeade

Presentation Transcript


  1. Entropy Driven Evolution: Why DNA is Coded in 4 Bases and Reproduction Takes 2 Sexes? Bo Deng, Department of Mathematics, UNL. IIT, 14 Feb. 2011. http://www.math.unl.edu/~bdeng1

  2. Working Hypothesis: Evolution is driven to maximize biodiversity against constraints in time and energy across all biological scales. • Applied to all informational systems: • DNA Replication • Protein Synthesis • Sexual Reproduction • Speciation to Phylogenetic Tree • Ecological Community • Animal Brain • Consciousness • Language • Social, Economic, Political Structures

  3. C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October, 1948. Claude E. Shannon (1916-2001). (Figure: Channel.)

  4. What Is Information, and What Matters Most? Information is all about choices. (Figure: Internet transmission speed comparison.)

  5. Mathematical Measure of Information: What is in a bit? One bit = one binary digit (bit unit: 0 or 1). • Dead channel: transmits only one kind of symbol all the time, e.g. 0000…, so each symbol carries 0 bits of information. • Live channel: transmits one of many possible symbols each time, e.g. 011101… in a binary channel. Each transmitted symbol is either 0 or 1, so each symbol contains 1 bit of information. • Pop quiz: How many bits are in a quaternary symbol (1, 2, 3, 4), or in a symbol from an alphabet of $n$ letters (1, 2, 3, …, n)? Answer: $H_4 = 2$ bits and $H_n = \log_2 n$ bits respectively, because $4 = 2^{\log_2 4}$ and $n = 2^{\log_2 n}$. In other words, the number of choices $n$ equals the number of binary sequences of length $\log_2 n$. Example: $\{a, b, c, d\} = \{00, 01, 10, 11\}$. • Key assumption: each transmitted symbol is just one of $n$ equally probable choices.
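A minimal Python sketch of the pop-quiz arithmetic, assuming equally probable symbols. It computes $H_n = \log_2 n$ and builds the fixed-length binary code from the slide's $\{a, b, c, d\}$ example. The helper names `bits_per_symbol` and `fixed_length_code` are hypothetical and used only for illustration.

```python
import math


def bits_per_symbol(n: int) -> float:
    """H_n = log2(n): bits carried by one of n equally probable symbols."""
    return math.log2(n)


def fixed_length_code(alphabet: str) -> dict[str, str]:
    """Hypothetical helper: map each letter to a binary word of length ceil(log2 n)."""
    width = max(1, math.ceil(math.log2(len(alphabet))))
    return {ch: format(i, f"0{width}b") for i, ch in enumerate(alphabet)}


print(bits_per_symbol(1))          # dead channel: 0.0 bits per symbol
print(bits_per_symbol(2))          # binary channel: 1.0 bit per symbol
print(bits_per_symbol(4))          # quaternary symbol: 2.0 bits
print(fixed_length_code("abcd"))   # {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
```

Each of the four letters needs exactly $\log_2 4 = 2$ binary digits. This is the sense in which a quaternary symbol "contains" 2 bits.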

  6. What is in the transmission rate? • Let $t_k$ be the time needed to transmit symbol $k$. • Then the average transmission time per symbol is $T_n = (t_1 + t_2 + t_3 + \cdots + t_n)/n$. • And the mean rate is $R_n = H_n / T_n = n \log_2 n / (t_1 + t_2 + t_3 + \cdots + t_n)$. • The definition implicitly assumes that all symbols occur with equal probability. • Why, and is that reasonable?
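A small sketch of the slide's two definitions. The function names `mean_time` and `mean_rate` are hypothetical. `mean_rate` takes the per-symbol transmission times $t_1, \ldots, t_n$ and returns $R_n = H_n/T_n$, assuming equiprobable symbols so that $H_n = \log_2 n$.

```python
import math
from typing import Sequence


def mean_time(times: Sequence[float]) -> float:
    """T_n = (t_1 + ... + t_n) / n, the average transmission time per symbol."""
    return sum(times) / len(times)


def mean_rate(times: Sequence[float]) -> float:
    """R_n = H_n / T_n = n log2(n) / (t_1 + ... + t_n), equiprobable symbols assumed."""
    n = len(times)
    return math.log2(n) / mean_time(times)


# A binary channel whose symbols each take 1 s carries 1 bit per second.
print(mean_rate([1.0, 1.0]))            # 1.0
# Slower symbols lower the rate: 4 symbols taking 1, 1, 2 and 2 s.
print(mean_rate([1.0, 1.0, 2.0, 2.0]))
```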

  7. All-purpose Channel • Example: pick a marble from a bag of 2 blue and 5 red marbles. The probability of picking a blue marble is $p_{\text{blue}} = 2/7$, so each blue marble picked is one choice out of $1/p_{\text{blue}} = 7/2 = 3.5$. • Example of possible non-equiprobability: if we knew all video files that have ever been transmitted over the Internet, we could build an accurate frequency table, say $p_1$ for symbol 1, $p_2$ for symbol 2, …, $p_n$ for symbol $n$. • Each transmitted symbol 1 is just one choice out of $1/p_1$ possible choices, where $1/p_1$ is the number of binary sequences of length $\log_2 (1/p_1)$. Therefore symbol 1 contains $\log_2 (1/p_1)$ bits of information, since $1/p_1 = 2^{\log_2 (1/p_1)}$. • Similarly, symbol $k$ contains $\log_2 (1/p_k)$ bits of information. • The average number of bits per symbol for our video-only source is $H(p) = p_1 \log_2 (1/p_1) + \cdots + p_n \log_2 (1/p_n)$. • Internet message types (video, audio, pictures, spam, etc.) each have a different frequency distribution over the encoding symbols. • Important fact: $H(p) = p_1 \log_2 (1/p_1) + \cdots + p_n \log_2 (1/p_n) \le H_n = \log_2 n$. • Recall: $R_n = H_n / T_n = n \log_2 n / (t_1 + t_2 + \cdots + t_n)$. • Equiprobability conclusion: for an all-purpose channel, the mean rate is calculated not for any particular source entropy but for the maximal source entropy, $H_n$. This maximum is reached when the transmitted symbols are equiprobable.
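The entropy formula and the "important fact" $H(p) \le \log_2 n$ can be checked numerically. The sketch below computes $H(p)$ for the marble example and for a few distributions over $n = 4$ symbols. The function name `entropy` is hypothetical, and the non-uniform distributions are made up for illustration; they are not measured frequencies.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """H(p) = sum_k p_k log2(1/p_k), in bits per symbol; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log2(1.0 / pk) for pk in p if pk > 0)


# Marble example: 2 blue and 5 red marbles.
p_blue = 2 / 7
print(1 / p_blue)                   # ~3.5 choices per blue pick
print(math.log2(1 / p_blue))        # ~1.807 bits in one blue pick
print(entropy([2 / 7, 5 / 7]))      # ~0.863 bits per pick, below log2(2) = 1

# Over n = 4 symbols, only the uniform distribution reaches H_4 = log2(4) = 2 bits.
for p in ([0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]):
    print(p, entropy(p), entropy(p) <= math.log2(len(p)) + 1e-12)
```

Skewing the frequencies away from uniform always lowers $H(p)$. This is why a channel that must serve every kind of source is rated at $H_n$.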

  8. Example • Encoding states: symbols $1, 2, 3, \ldots, n$ with transmission times $t_1, t_2, t_3, \ldots, t_n$. • Assume $t_1 = 1$ s, $t_2 = 2$ s, $t_3 = 3$ s, …, $t_n = n$ s. Since $t_1 + \cdots + t_n = n(n+1)/2$, this gives $R_n = H_n / T_n = n \log_2 n / (t_1 + t_2 + t_3 + \cdots + t_n) = 2 \log_2 n / (n+1)$. • Design criterion: choose $n$ so that $R_n = H_n / T_n$ is the largest.
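Under the slide's assumption $t_k = k$ s, evaluating $R_n$ for small $n$ shows which alphabet size the design criterion picks. A sketch follows. The function name `rate_linear_times` is hypothetical.

```python
import math


def rate_linear_times(n: int) -> float:
    """R_n = n log2(n) / (1 + 2 + ... + n), assuming t_k = k seconds."""
    total_time = n * (n + 1) / 2          # t_1 + ... + t_n with t_k = k
    return n * math.log2(n) / total_time


rates = {n: rate_linear_times(n) for n in range(1, 11)}
for n, r in rates.items():
    # Cross-check against the simplified formula 2 log2(n) / (n + 1).
    assert math.isclose(r, 2 * math.log2(n) / (n + 1), abs_tol=1e-12)
    print(n, round(r, 4))

best = max(rates, key=rates.get)
print("largest R_n at n =", best)   # n = 4, where R_4 = 0.8
```

With these times, four symbols narrowly beat three ($R_4 = 0.8$ versus $R_3 \approx 0.792$), and every larger alphabet does worse.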

  9. DNA Replication. James D. Watson (1928-) and Francis Crick (1916-2004), "Molecular structure of nucleic acids," Nature, 171 (1953), pp. 737-738. http://www.mun.ca/biology/scarr/An11_01_DNA_replication.mov Deoxyribonucleic acid: A (adenine), T (thymine), C (cytosine), G (guanine)?

  10. Communication Model for DNA Replication • Facts: • DNA replication is the same for all genomes. • Replication is a sequential process, one base at a time. • Observations: • Each species' genome is an information source. • A genome upon replication is a transmitted message. • Conceptual model: DNA replication is an all-purpose channel. • Question: Why 4 bases, A, T, C, G?

  11. Replication Mean Rate: $R_n = H_n / T_n$ (per-base diversity rate) • Assumptions: • Weaker chemical bonds take longer to replicate (Heisenberg's uncertainty principle: $\Delta t \, \Delta E \sim$ constant). • Pairing times of high-energy bonds are ignored (as a first attempt, i.e. a first-order approximation for the pairing time). • $t_A = t_T$ = pairing time of one H…O bond = $t_0$ • $t_G = t_C$ = pairing time of two H…O bonds = $2t_0$ • $t_5 = t_6$ = pairing time of three H…O bonds = $3t_0$, etc. (by Watson and Crick's base-pairing principle) • Time scale of a single hydrogen-bond pairing: $4 \times 10^{-15}$ s.

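  12. The Result: Let $k$ be the number of base pairs and $n$ the number of bases, so $n = 2k$. Since $t_{2m-1} = t_{2m} = m t_0$ for $m = 1, 2, \ldots, k$, we have $t_1 + t_3 + \cdots + t_{2k-1} = t_0 k(k+1)/2$. Therefore $R_n = H_n / T_n = \log_2 n / [2(t_1 + t_3 + \cdots + t_{2k-1})/n] = \log_2 n / [(n/2 + 1)\, t_0 / 2]$.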
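The closed form can be checked by building the pairing times $t_{2m-1} = t_{2m} = m t_0$ explicitly. This sketch measures time in units of $t_0$, so setting $t_0 = 1$ fixes only the scale. It then compares each $R_n$ with $R_4$. The function names are hypothetical.

```python
import math


def pairing_times(n: int, t0: float = 1.0) -> list[float]:
    """Times for n bases (n even): both bases of the m-th pair take m * t0."""
    k = n // 2
    return [m * t0 for m in range(1, k + 1) for _ in range(2)]


def replication_rate(n: int, t0: float = 1.0) -> float:
    """R_n = log2(n) / T_n, where T_n is the mean pairing time per base."""
    times = pairing_times(n, t0)
    return math.log2(n) / (sum(times) / n)


for n in (2, 4, 6, 8, 10):
    r = replication_rate(n)
    closed_form = math.log2(n) / ((n / 2 + 1) / 2)   # slide's formula with t0 = 1
    assert math.isclose(r, closed_form)
    print(n, round(r, 4), round(r / replication_rate(4), 4))
```

In this first-order model $R_4$ is the largest rate. The comparisons are $R_2/R_4 = 0.75$ and $R_6/R_4 \approx 0.97$.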

  13. (Figure; marked value: 1.8267.) A further refined model predicts $1.65 < t_{C,G}/t_{A,T} < 3$, and within this range $R_4$ is the optimal rate.

  14. The 2-Sexes Problem: Sexual reproduction is a process of information exchange.

  15. Reproduction Mean Ratio: $S_n = H_n / E_n$ • Assumptions: • Information payoff per crossover base for $n$ sexes: $H_n = \log_2 n$. • A 1:1 sex ratio, with $M$ members of each sex. • The cost of sexual reproduction in energy and time is inversely proportional to the probability that a reproductive group of $n$ members has exactly one member of each sex. • Reproductive groups are formed by random encounter.

  16. Reproductive Probability • Reproductive group in $k$ tries: • Expected tries for one reproductive group: • Expected tries for one reproductive group in a large population:
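This slide's formulas did not survive extraction. The sketch below follows one natural reading of the assumptions on slide 15: $n$ individuals are drawn at random from a population of $n$ sexes with $M$ members each. Under that reading, the probability that the group contains exactly one member of each sex is $P = M^n/\binom{nM}{n}$, which tends to $n!/n^n$ as $M \to \infty$. If tries are independent, the first complete group forms on try $k$ with geometric probability $(1-P)^{k-1}P$, so the expected number of tries is $1/P$. These expressions are our assumption, not necessarily the author's, and the function names are hypothetical.

```python
import math


def prob_one_of_each(n: int, M: int) -> float:
    """P that n individuals drawn at random (without replacement) from n sexes of
    M members each contain exactly one member of every sex: M**n / C(n*M, n)."""
    return M**n / math.comb(n * M, n)


def prob_first_group_on_try(k: int, P: float) -> float:
    """Geometric law: the first reproductive group forms on try k (independent tries)."""
    return (1 - P) ** (k - 1) * P


def expected_tries(P: float) -> float:
    """Mean of the geometric law: 1 / P tries per reproductive group."""
    return 1 / P


for n in (2, 3, 4):
    large_M_limit = math.factorial(n) / n**n        # P as M -> infinity
    print(n, prob_one_of_each(n, 10**6), large_M_limit, expected_tries(large_M_limit))
```

Under this reading, a large population with two sexes needs on average 2 tries per complete group. Three sexes need 4.5 tries, and four need about 10.7.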

  17. The Result: Entropy-to-cost ratio $S_n = H_n / E_n$ (figure for population size $M = 10^m$).
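As a hedged sketch, the slide 15 assumptions can be combined into a single ranking of $n$. This takes $H_n = \log_2 n$ and sets $E_n = 1/P_n$ with proportionality constant 1. It also assumes the large-population probability $P_n = n!/n^n$ of drawing exactly one member of each of $n$ equally common sexes; that probability is an assumption because the slide's own formulas were lost. Together these give $S_n = \log_2 n \cdot n!/n^n$. The function name is hypothetical.

```python
import math


def entropy_to_cost_large_population(n: int) -> float:
    """S_n = H_n / E_n with H_n = log2(n) and E_n = 1 / P_n, P_n = n! / n**n.
    Assumes random encounter, equal sex ratio, and a very large population."""
    p_group = math.factorial(n) / n**n
    return math.log2(n) * p_group


scores = {n: entropy_to_cost_large_population(n) for n in range(1, 8)}
for n, s in scores.items():
    print(n, round(s, 4))
print("best number of sexes:", max(scores, key=scores.get))   # 2
```

One sex gives no exchange at all ($H_1 = 0$). Beyond two sexes, the falling chance of assembling a complete group outweighs the extra $\log_2 n$ bits. Under these assumptions, $S_2 = 0.5$ is the maximum.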

  18. Genetic entropy exchange without the sexual cost but with an existential cost:

  19. Multiparous Strategy • Multiparous entropy: • Multiparous cost: • Multiparous entropy-to-cost ratio: • With mixed (random and wedlock) cost:

  20. Rate comparison with the optimal quaternary code ($a = 2$, reference $n = 4$): • $n = 2$: $R_n/R_4 < 0.75$, i.e. slower by more than 25%, an evolutionary set-back of more than 1 billion years. • $n = 6$: $R_n/R_4 < 0.98$, i.e. slower by more than 2%, an evolutionary set-back of more than 80 million years. • Evolutionary clock set-back with 3 sexes: Discussions: • Life on Earth could not have evolved faster and, at the same time, had a richer diversity. • This is consistent with the Darwinian survival-of-the-fittest theory, but at the molecular level. • Question: Was the origin of life driven by informational selection?
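The set-back figures appear to follow from scaling the fractional slowdown $1 - R_n/R_4$ by a total span of roughly 4 billion years: 0.25 × 4 billion = 1 billion, and 0.02 × 4 billion = 80 million. That span is our inference from the numbers, not something the slide states. A sketch of the arithmetic under this assumption:

```python
# Hypothetical assumption: set-back = (1 - R_n / R_4) * (about 4 billion years).
ASSUMED_SPAN_YEARS = 4e9

rate_ratio_bounds = {2: 0.75, 6: 0.98}   # R_n / R_4 bounds quoted on the slide

for n, ratio in rate_ratio_bounds.items():
    slowdown = 1 - ratio
    setback_years = slowdown * ASSUMED_SPAN_YEARS
    print(f"n = {n}: slower by {slowdown:.0%}, set-back of about {setback_years:.2g} years")
```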

  21. The Role of Mathematics • Why is per-base diversity measured by $H_n = \log_2 n$ or $H(p) = \sum_k p_k \log_2 (1/p_k)$? Because $\log_2 \frac{1}{p_1 p_2} = \log_2 \frac{1}{p_1} + \log_2 \frac{1}{p_2}$: information is additive. • Mathematics is driven by open problems. • Science is driven by existing solutions. • Mathematical modeling is to discover the mathematics to which Nature fits as a solution. • In biology, the exception to the rule is the rule.

  22. Acknowledgements • Dr. Reg Garrett, Department of Biology, University of Virginia, regarding the GC transcription elongation problem • Dr. David Ussery, Center for Biological Sequence Analysis, Technical University of Denmark, on most base frequency data • Dr. Daniel Smith, Department of Biology, Oregon State University, regarding the base frequencies of P. ubique • Dr. Tony Joern, Department of Biology, UNL, Kansas State University • Dr. Etsuko Moriyama, the Beadle Center for Genetics Research, University of Nebraska-Lincoln • Dr. Hideaki Moriyama, Dr. Xiao-Cheng Zhen, Department of Chemistry, University of Nebraska-Lincoln • Irakli Loladze, David Logan, Department of Mathematics, UNL

  23. The show of life is on your DNA channel. We are consumers of reproductive entropy.

  24. * Base frequency for chromosome 14, which has the largest d.

  25. Viruses take advantage of the replication system by having near-maximal per-base diversity entropy and having their hosts do the replication for them. To maximize stationary entropy: $H(p) = p_1 \log_2 (1/p_1) + \cdots + p_n \log_2 (1/p_n)$.

  26. (Figure; marked value: 1.8267.) * Base frequency for chromosome 14, which has the largest d.

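  27. Others have to scramble with individual and absolute channel capacities, i.e. • Objective: maximize $R(p) = H(p)/T(p)$, where $T(p) = p_1 t_1 + \cdots + p_n t_n$ is the mean transmission time per symbol, • subject to $p_1 + p_2 + \cdots + p_n = 1$, $p_k > 0$. • Optimization result: • $p_A = p_T$, $p_G = p_C$ • $p_G = p_A^{a}$, where $a = t_{G,C}/t_{A,T}$ • $K = \max R(p) = \log_2(1/p_A)/t_{A,T}$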
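The optimum can be computed with Shannon's classical result for a noiseless channel whose symbols have unequal durations. For the four-base channel, the durations are $t_A = t_T = t_{A,T}$ and $t_G = t_C = a\,t_{A,T}$. Shannon's result says that, with $T(p) = \sum_k p_k t_k$, the capacity $K$ is the root of $\sum_k 2^{-K t_k} = 1$. The optimal frequencies are then $p_k = 2^{-K t_k}$. This reproduces the slide's relations $p_G = p_A^{a}$ and $K = \log_2(1/p_A)/t_{A,T}$. The sketch below solves for $K$ by bisection, working in units where $t_{A,T} = 1$. The choice $a = 2$ is only an example, and the function names are hypothetical.

```python
import math


def capacity(times: list[float], tol: float = 1e-12) -> float:
    """Unique K > 0 with sum_k 2**(-K t_k) = 1 (needs at least two symbols), by bisection."""
    def excess(K: float) -> float:
        return sum(2 ** (-K * t) for t in times) - 1

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:          # excess decreases in K; grow hi until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rate(p: list[float], times: list[float]) -> float:
    """R(p) = H(p) / T(p) with T(p) = sum_k p_k t_k."""
    H = sum(pk * math.log2(1 / pk) for pk in p)
    T = sum(pk * tk for pk, tk in zip(p, times))
    return H / T


a = 2.0                              # example ratio t_GC / t_AT
times = [1.0, 1.0, a, a]             # A, T, G, C in units of t_AT
K = capacity(times)
p = [2 ** (-K * t) for t in times]   # optimal frequencies p_k = 2**(-K t_k)

print("p_A = p_T =", p[0], " p_G = p_C =", p[2])
print("p_G vs p_A**a:", p[2], p[0] ** a)
print("K =", K, " log2(1/p_A)/t_AT =", math.log2(1 / p[0]))
print("R at optimum:", rate(p, times), " R uniform:", rate([0.25] * 4, times))
```

For $a = 2$, the constraint $2p_A + 2p_A^2 = 1$ gives $p_A = (\sqrt{3} - 1)/2 \approx 0.366$. The resulting optimal rate is about 1.45 bits per $t_{A,T}$, compared with $4/3$ for equiprobable bases.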
