
Genome evolution


wmckinley




Presentation Transcript


  1. Genome evolution Lecture 9: Mutations and variational inference

  2. Sources of mutations • Mistakes • Replication errors (point mutations, tandem duplications/deletions) • Recombination errors (mainly indels) • Endogenous DNA damage • Spontaneous base damage: deaminations, depurinations • Byproducts of metabolism: oxygen radicals that damage DNA • Exogenous DNA damage • UV • Chemicals • All of these mechanisms cross-talk with the surrounding sequence

  3. DNA polymerases • Replicating DNA • A good polymerase domain has a misincorporation rate of $10^{-5}$ (1/100,000) • Misincorporations are clipped off with 99% efficiency by the “proofreading” activity of the polymerase • Further mismatch repair, which works in 99.9% of cases, brings the fidelity of the main polymerases to $10^{-10}$ • Some dedicated polymerases are not as accurate!
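
The three fidelity figures multiply, assuming the stages act independently: $10^{-5} \times (1 - 0.99) \times (1 - 0.999) = 10^{-10}$. Here is the same arithmetic as a small sketch. The constant and function names are illustrative only:

```python
# Per-base figures quoted on the slide; the stages are assumed to be independent.
MISINCORPORATION = 1e-5   # polymerase domain: about 1 error per 100,000 bases
PROOFREADING = 0.99       # fraction of misincorporations removed by proofreading
MISMATCH_REPAIR = 0.999   # fraction of the remaining errors fixed by mismatch repair


def residual_error_rate(misincorporation: float, *efficiencies: float) -> float:
    """Error rate left after successive correction stages with the given efficiencies."""
    rate = misincorporation
    for efficiency in efficiencies:
        rate *= 1.0 - efficiency  # only the missed fraction survives each stage
    return rate


rate = residual_error_rate(MISINCORPORATION, PROOFREADING, MISMATCH_REPAIR)
print(f"overall error rate: {rate:.1e} per base")
```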

  4. Replication slippage and recombination errors • While a strand is being processed, it disconnects and reconnects at the wrong place (example sequences from the figure: CACACACACACACACACA, CGACAGCGACAGTTACAAA) • A consequence of partial homology between different chromosomal loci • Can introduce translocations if the matching sequences are on different chromosomes • Can introduce inversions or deletions if the matching sequences are on the same chromosome • Can generate duplications or deletions if the matching sequences are in tandem
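
Slippage within a tandem repeat such as the CA run above shifts the two strands by whole repeat units. The typical outcome is therefore the gain or loss of one unit. The following is a toy sketch. `slip` is a hypothetical helper, and it ignores the underlying mispairing mechanics:

```python
import re


def slip(seq: str, unit: str, delta: int) -> str:
    """Toy replication slippage (hypothetical helper): add (delta > 0) or
    remove (delta < 0) whole copies of `unit` in its longest tandem run."""
    runs = list(re.finditer(f"(?:{re.escape(unit)})+", seq))
    if not runs:
        return seq  # no repeat, so nothing for the strands to slip along
    run = max(runs, key=lambda m: len(m.group(0)))
    copies = len(run.group(0)) // len(unit)
    return seq[: run.start()] + unit * max(copies + delta, 0) + seq[run.end():]


repeat = "CACACACACACACACACA"   # 9 copies of CA, from the slide
print(slip(repeat, "CA", +1))   # expansion by one unit
print(slip(repeat, "CA", -1))   # contraction by one unit
```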

  5. Endogenous DNA damage: Deamination of Cytosines [Figure: structures of cytosine and uracil; deamination removes the amino (NH2) group of cytosine, producing uracil. *Thymine has a CH3 at the marked position]

  6. Deamination of cytosine creates a G-U mismatch: it is easy to tell that the U is wrong • Deamination of 5-methylcytosine creates a G-T mismatch: it is not easy to tell which base is the mutation. About 50% of the time the G is “corrected” to A, resulting in a C→T mutation
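
The asymmetry can be illustrated with a toy simulation. It assumes that U is always recognized as foreign. It also assumes that repair of a G:T mismatch, which arises from a deaminated methylated cytosine (labeled "meC"), picks either strand with probability 1/2, matching the slide's “about 50%”. `repair_outcome` and `mutation_fraction` are hypothetical helpers:

```python
import random


def repair_outcome(lesion: str, rng: random.Random) -> str:
    """Base pair (partner:original strand) after repairing a deaminated cytosine.

    'C'   -> deamination gives G:U; U is recognized, so G:C is restored.
    'meC' -> deamination gives G:T; both bases look normal, so repair is
             assumed to pick a strand at random and half the time turns G into A.
    """
    if lesion == "C":
        return "G:C"
    if lesion == "meC":
        return "A:T" if rng.random() < 0.5 else "G:C"
    raise ValueError(f"unknown lesion {lesion!r}")


def mutation_fraction(lesion: str, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of repair events that end in a C:G -> T:A transition."""
    rng = random.Random(seed)
    return sum(repair_outcome(lesion, rng) == "A:T" for _ in range(trials)) / trials


print(mutation_fraction("C"))    # 0.0: G:U is always repaired correctly
print(mutation_fraction("meC"))  # close to 0.5
```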

  7. Exogenous DNA damage • UV irradiation generates primarily thymine dimers • Chemicals: • Food • Benzopyrene (smoke) • UV radiation (sunlight) • Ionizing radiation: • Radon • Cosmic rays • X-rays

  8. Repairing DNA damage: Direct repair

  9. Thymine dimers can be corrected by a direct repair mechanism driven by light [Figure: a photon powers the repair]

  10. Base excision repair (BER): deaminated bases are repaired by a base excision mechanism.

  11. BER: spontaneously occurring abasic sites are repaired by the same mechanism.

  12. NER: dimeric bases and bulky lesions, e.g., large chemical adducts, are repaired by nucleotide excision repair (NER).

  13. Evolutionary consequences of the rich mutational process • Cannot ignore dependencies among adjacent sites • Mechanisms are evolutionarily variable • Lifestyle -> environmental exposure • Germline and male/female ratio • Mechanisms are variable on the genomic scale, e.g., late vs. early replication

  14. Dynamic Bayesian Networks [Figure: variables 1, 2, 3, 4 repeated in time slices T=1 to T=5, with conditional probabilities linking each slice to the next] • Synchronous discrete time process
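
In a synchronous discrete-time process, every variable in slice T+1 is drawn from the values in slice T. The following is a minimal sketch with four binary variables. The exact edges in the figure are not recoverable, so the parent sets in `PARENTS` and the conditional probabilities in `p_on` are hypothetical:

```python
from itertools import product

# Hypothetical structure: variable i at time T+1 depends on PARENTS[i] at time T
# (indices 0..3 stand for variables 1..4 on the slide).
PARENTS = {0: (0,), 1: (0, 1), 2: (1, 2), 3: (2, 3)}
STATES = list(product((0, 1), repeat=4))  # the 16 joint states of one time slice


def p_on(i: int, prev: tuple) -> float:
    """Hypothetical CPD: P(X_i = 1 at T+1 | parents at T); more active parents, higher."""
    parents = PARENTS[i]
    return 0.1 + 0.8 * sum(prev[j] for j in parents) / len(parents)


def step(dist: dict) -> dict:
    """One synchronous update: all four variables move together from slice T to T+1."""
    new = dict.fromkeys(STATES, 0.0)
    for prev, p_prev in dist.items():
        for nxt in STATES:
            p = p_prev
            for i in range(4):  # variables are conditionally independent given slice T
                on = p_on(i, prev)
                p *= on if nxt[i] else 1.0 - on
            new[nxt] += p
    return new


dist = dict.fromkeys(STATES, 0.0)
dist[(1, 0, 0, 0)] = 1.0      # T=1: only variable 1 is on
for t in range(2, 6):         # propagate through T=2..T=5
    dist = step(dist)
    marginals = [sum(p for s, p in dist.items() if s[i]) for i in range(4)]
    print(f"T={t}", [round(m, 3) for m in marginals])
```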

  15. Context dependent Markov Processes [Figure: sites 1-4 and example sequences such as AAACAAGAA] • Context determines a Markov process rate matrix • Any dependency structure makes sense, including loops • When context is changing, computing probabilities is difficult • Think of the hidden variables as the trajectories • Continuous time Bayesian Networks (Koller-Nodelman 2002)
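
One way to simulate a context-dependent Markov process is Gillespie's algorithm, which re-derives each site's rate-matrix row from its current neighbours after every event. In this sketch the context rule is a hypothetical choice for illustration: a CpG-style boost to C->T when the next base is G. The rates `BASE_RATE` and `CPG_BOOST` are hypothetical as well. The returned list of timed substitutions is the kind of trajectory the slide treats as hidden variables:

```python
import random

BASES = "ACGT"
BASE_RATE = 1.0    # hypothetical total substitution rate per site
CPG_BOOST = 10.0   # hypothetical extra C->T rate when the next base is G


def site_rates(seq: str, i: int) -> dict:
    """Row of the rate matrix for site i, chosen by its context (right neighbour)."""
    rates = {b: BASE_RATE / 3 for b in BASES if b != seq[i]}
    if seq[i] == "C" and i + 1 < len(seq) and seq[i + 1] == "G":
        rates["T"] += CPG_BOOST
    return rates


def simulate(seq: str, t_max: float, rng: random.Random):
    """Gillespie simulation of a context-dependent substitution process.
    Rates are recomputed after every event, since a substitution can change
    its neighbour's context. Returns the final sequence and the trajectory."""
    t, trajectory = 0.0, []
    while True:
        events = [(i, b, r) for i in range(len(seq)) for b, r in site_rates(seq, i).items()]
        total = sum(r for _, _, r in events)
        t += rng.expovariate(total)  # waiting time to the next substitution
        if t > t_max:
            return seq, trajectory
        pick = rng.uniform(0.0, total)  # choose an event proportionally to its rate
        for i, b, r in events:
            pick -= r
            if pick <= 0.0:
                break
        seq = seq[:i] + b + seq[i + 1:]
        trajectory.append((t, i, b))


final, trajectory = simulate("AAACAAGAA", t_max=0.5, rng=random.Random(1))
print(final, trajectory)
```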

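  16. Modeling simple context in the tree: PhyloHMM • Heuristically approximating the Markov process? Where exactly does it fail? [Figure: hidden variables $h_i^{j-1}, h_i^{j}, h_i^{j+1}$ for node i at adjacent positions, $h_{pa_i}^{j-1}, h_{pa_i}^{j}, h_{pa_i}^{j+1}$ for its parent, and $h_k^{j-1}, h_k^{j}, h_k^{j+1}$ for node k] • Siepel-Haussler 2003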

  17. Log-likelihood to Free Energy • We have so far worked on computing the likelihood: $\log p(s) = \log \sum_h p(h,s)$ • Computing the likelihood is hard. We can reformulate the problem by adding parameters and transforming it into an optimization problem. Given a trial function q, define the free energy of the model as: $F(q) = \sum_h q(h) \log \frac{q(h)}{p(h,s)}$ • Better: when q is a distribution, the free energy bounds the likelihood: $F(q) = -\log p(s) + D(q \,\|\, p(h|s)) \ge -\log p(s)$ • The free energy is exactly the (negative log-)likelihood when q is the posterior $p(h|s)$, since then $D(q \,\|\, p(h|s)) = 0$
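
Here is a small numeric check of the bound. It assumes a hypothetical binary hidden variable h and made-up joint probabilities $p(h,s)$. For every q, the free energy equals $-\log p(s) + D(q \,\|\, p(h|s))$. It reaches $-\log p(s)$ only at the posterior, which here is $q(h{=}1) = 0.7$:

```python
import math

# Hypothetical joint p(h, s) for one observed s and a binary hidden variable h.
joint = {0: 0.12, 1: 0.28}
p_s = sum(joint.values())                       # likelihood p(s)
posterior = {h: p / p_s for h, p in joint.items()}


def free_energy(q: dict) -> float:
    """F(q) = sum_h q(h) log(q(h) / p(h, s))."""
    return sum(qh * math.log(qh / joint[h]) for h, qh in q.items() if qh > 0)


def kl(q: dict, p: dict) -> float:
    """D(q || p) for two distributions over the same values of h."""
    return sum(qh * math.log(qh / p[h]) for h, qh in q.items() if qh > 0)


for x in (0.1, 0.5, 0.7, 0.9):
    q = {0: 1 - x, 1: x}
    # both columns agree, and neither drops below -log p(s)
    print(x, round(free_energy(q), 4), round(-math.log(p_s) + kl(q, posterior), 4))
print("-log p(s) =", round(-math.log(p_s), 4))
```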

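  18. Energy?? What energy? • In statistical mechanics, a system at temperature T with states x and an energy function E(x) is characterized by Boltzmann’s law: $p(x) = \frac{1}{Z} e^{-E(x)/T}$ • Z is the partition function: $Z = \sum_x e^{-E(x)/T}$ • Given a model $p(h,s|\theta)$ (a BN), we can define the energy using Boltzmann’s law (with T = 1): $E(h) = -\log p(h,s|\theta)$ • If we think of $P(h|s,\theta)$ as a Boltzmann distribution: $P(h|s,\theta) = \frac{e^{-E(h)}}{Z}$, with $Z = \sum_h p(h,s|\theta) = p(s|\theta)$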

  19. Free Energy and Variational Free Energy • The Helmholtz free energy is defined in physics as: $F_H = -T \log Z$ • This free energy is important in statistical mechanics, but it is difficult to compute, as is our probabilistic Z (= p(s)) • The variational transformation introduces trial functions q(h) and sets the variational free energy (or Gibbs free energy) to: $F(q) = U(q) - T\,H(q)$ • The average energy is: $U(q) = \sum_h q(h) E(h)$ • The variational entropy is: $H(q) = -\sum_h q(h) \log q(h)$ • And as before (T = 1): $F(q) = F_H + D(q \,\|\, p(h|s)) \ge F_H$

  20. Solving the variational optimization problem • Minimizing U? Focus on max configurations • Maximizing H? Spread out the distribution • So instead of computing p(s), we can search for a q that optimizes (minimizes) the free energy • This is still as hard as before, but we can simplify the problem by restricting q • (this is where the additional degrees of freedom become important)
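
The trade-off between the two terms shows up on a toy example. The example assumes a hypothetical joint over three hidden configurations, with energy $E(h) = -\log p(h,s)$ and T = 1. Concentrating q on the most probable configuration gives the lowest average energy. The uniform q gives the highest entropy. The posterior balances the two and attains $F = -\log Z$:

```python
import math

# Hypothetical joint p(h, s) over three hidden configurations, s fixed.
joint = {"h1": 0.20, "h2": 0.05, "h3": 0.05}
energy = {h: -math.log(p) for h, p in joint.items()}  # E(h) = -log p(h, s), T = 1
Z = sum(joint.values())                               # partition function = p(s)


def average_energy(q: dict) -> float:
    return sum(q[h] * energy[h] for h in q)


def entropy(q: dict) -> float:
    return -sum(p * math.log(p) for p in q.values() if p > 0)


def free_energy(q: dict) -> float:
    return average_energy(q) - entropy(q)


candidates = {
    "point mass on max config": {"h1": 1.0, "h2": 0.0, "h3": 0.0},  # lowest U
    "uniform": {h: 1 / 3 for h in joint},                            # highest H
    "posterior": {h: p / Z for h, p in joint.items()},               # best trade-off
}
for name, q in candidates.items():
    print(f"{name:25s} U={average_energy(q):.3f} H={entropy(q):.3f} F={free_energy(q):.3f}")
print(f"{'-log Z':25s} {-math.log(Z):.3f}")
```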

  21. Simplest variational approximation: Mean Field • Minimizing U? Focus on max configurations • Maximizing H? Spread out the distribution • Let’s assume complete independence among the posteriors of the r.v.’s: $q(h) = \prod_i q_i(h_i)$ • Under this assumption we can try optimizing the $q_i$ (looking for minimal energy!)

  22. Mean Field Inference • We optimize iteratively: • Select i (sequentially, or using any method) • Optimize $q_i$ to minimize $F_{MF}(q_1, \ldots, q_i, \ldots, q_n)$ while fixing all other $q$s • Terminate when $F_{MF}$ cannot be improved further • Remember: $F_{MF}$ always bounds the likelihood • $q_i$ optimization can usually be done efficiently
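
The coordinate step has a closed form: the optimal $q_i$ given the others is $q_i(h_i) \propto \exp\left(E_{q_{-i}}[\log p(h,s)]\right)$. The following sketch applies it to a hypothetical chain-structured model over binary variables; the weights `A` and the coupling `J` are made up. The sketch records $F_{MF}$ after each sweep and compares the result with the exact $-\log p(s)$, computed by enumeration:

```python
import math
from itertools import product

# Hypothetical pairwise model over a chain of binary hidden variables:
# log p(h, s) = sum_i A[i] * h_i + J * sum_i h_i * h_{i+1}, up to an additive
# constant that shifts F_MF and -log p(s) equally and is dropped here.
A = [0.5, -1.0, 0.3, -0.2]   # per-site evidence from s (made up)
J = 1.2                      # coupling between adjacent sites (made up)
N = len(A)


def log_joint(h) -> float:
    return sum(a * x for a, x in zip(A, h)) + J * sum(h[i] * h[i + 1] for i in range(N - 1))


def neg_log_likelihood() -> float:
    """Exact -log p(s) = -log sum_h p(h, s), by brute-force enumeration."""
    return -math.log(sum(math.exp(log_joint(h)) for h in product((0, 1), repeat=N)))


def free_energy_mf(m) -> float:
    """F_MF for q(h) = prod_i q_i(h_i), with m[i] = q_i(h_i = 1)."""
    neg_entropy = sum(x * math.log(x) + (1 - x) * math.log(1 - x) for x in m)
    avg_log_joint = sum(a * x for a, x in zip(A, m)) + J * sum(m[i] * m[i + 1] for i in range(N - 1))
    return neg_entropy - avg_log_joint


def mean_field(tol: float = 1e-10, max_sweeps: int = 1000):
    m = [0.5] * N
    history = [free_energy_mf(m)]
    for _ in range(max_sweeps):
        for i in range(N):  # sequential coordinate updates, other q's held fixed
            left = m[i - 1] if i > 0 else 0.0
            right = m[i + 1] if i < N - 1 else 0.0
            m[i] = 1.0 / (1.0 + math.exp(-(A[i] + J * (left + right))))  # optimal q_i
        history.append(free_energy_mf(m))
        if history[-2] - history[-1] < tol:  # F_MF cannot be improved further
            break
    return m, history


m, history = mean_field()
print("q_i(h_i=1):", [round(x, 3) for x in m])
print("F_MF:", round(history[-1], 4), ">= -log p(s):", round(neg_log_likelihood(), 4))
```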

  23. Adaptive mutations: Cairns et al. 1988 • Experimental system: lacZ frameshift • The experiment suggests adaptive mutations • Luria-Delbrück’s observation
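
Luria-Delbrück’s observation rests on a fluctuation argument. If mutations arise at random during growth, before selection, early "jackpot" mutations make the number of mutants per culture far more variable than under induction. If selection itself induced the mutations, counts would instead be Poisson-like. The following is a minimal simulation of both hypotheses. The growth and mutation parameters are hypothetical, and this is not a model of the Cairns lacZ system itself:

```python
import math
import random
import statistics


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Exact Binomial(n, p) draw by jumping between successes (fast when p is small)."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_q = math.log1p(-p)
    count, pos = 0, 0
    while True:
        pos += int(math.log(1.0 - rng.random()) / log_q) + 1  # Geometric(p) gap
        if pos > n:
            return count
        count += 1


def random_mutation_culture(generations: int, mu: float, rng: random.Random) -> int:
    """Mutations arise during growth, before selection; mutant lineages keep doubling."""
    wild, mutants = 1, 0
    for _ in range(generations):
        new = binomial(wild, mu, rng)  # one daughter per division may mutate
        wild, mutants = 2 * wild - new, 2 * mutants + new
    return mutants


def induced_mutation_culture(size: int, p: float, rng: random.Random) -> int:
    """Mutations arise only on exposure to selection, independently per cell."""
    return binomial(size, p, rng)


rng = random.Random(42)
GENERATIONS, MU, CULTURES = 20, 1e-6, 500   # hypothetical settings
random_counts = [random_mutation_culture(GENERATIONS, MU, rng) for _ in range(CULTURES)]
# match the induced model's mean to the random model's mean
p_induced = statistics.mean(random_counts) / 2**GENERATIONS
induced_counts = [induced_mutation_culture(2**GENERATIONS, p_induced, rng) for _ in range(CULTURES)]

for name, counts in (("random (pre-existing)", random_counts), ("induced", induced_counts)):
    mean = statistics.mean(counts)
    print(f"{name:22s} mean={mean:.1f} variance/mean={statistics.variance(counts) / mean:.1f}")
```

A variance-to-mean ratio far above 1 for the random model, against roughly 1 for the induced model, is the signature Luria and Delbrück used.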

  24. The “Mutator” paradigm • The ability to switch to the mutator phenotype depends on particular DNA repair mechanisms (double-strand break repair in E. coli) • The mutator phenotype is suggested to be important in pathogenesis, antibiotic resistance, and cancer • Species occasionally change (adaptively or even by drift) their repair policy/efficiency • The resulting substitution landscape must be very complex
