Explore the sources of mutations in genome evolution, including replication errors, recombination errors, endogenous and exogenous DNA damage. Learn about the mechanisms involved such as DNA polymerases, replication slippage, and repair processes. Understand the evolutionary consequences and variability of mutational processes. Dive into dynamic Bayesian networks and continuous time Bayesian networks for modeling genetic processes. Discover the complexities of computing probabilities in context-dependent Markov processes. Delve into variational inference and free energy concepts in statistical mechanics for optimizing the modeling of genetic data.
Genome evolution Lecture 9: Mutations and variational inference
Sources of mutations • Mistakes • Replication errors (point mutations, tandem duplications/deletions) • Recombination errors (mainly indels) • Endogenous DNA damage • Spontaneous base damage: deaminations, depurinations • Byproducts of metabolism: oxygen radicals that damage DNA • Exogenous DNA damage • UV • Chemicals • All of these mechanisms cross-talk with the surrounding sequence
DNA polymerases • Replicating DNA • A good polymerase domain has a misincorporation rate of 10^-5 (1/100,000) • Misincorporations are clipped off with 99% efficiency by the “proofreading” activity of the polymerase • Further mismatch repair, which works in 99.9% of cases, brings the fidelity of the main polymerases to 10^-10 • Some dedicated polymerases are not as accurate!
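The fidelity figures above compose multiplicatively: each layer lets through only the fraction of errors it fails to correct. A minimal sketch of that arithmetic, using the slide's numbers (the function and variable names are illustrative, not from the lecture):

```python
# Per-base error rate after each fidelity layer, using the slide's figures.
misincorporation_rate = 1e-5         # raw polymerase error rate per base
proofreading_efficiency = 0.99       # fraction of misincorporations clipped off
mismatch_repair_efficiency = 0.999   # fraction of remaining errors repaired


def residual_error_rate(base_rate, *efficiencies):
    """Multiply the base rate by the fraction of errors escaping each layer."""
    rate = base_rate
    for eff in efficiencies:
        rate *= 1.0 - eff
    return rate


after_proofreading = residual_error_rate(
    misincorporation_rate, proofreading_efficiency)
final_rate = residual_error_rate(
    misincorporation_rate, proofreading_efficiency, mismatch_repair_efficiency)

print(f"after proofreading:   {after_proofreading:.1e}")
print(f"after mismatch repair: {final_rate:.1e}")
```

Proofreading alone brings the rate to about 10^-7, and mismatch repair accounts for the remaining three orders of magnitude.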
Replication slippage • While processing a strand, the polymerase disconnects and reconnects at the wrong place (example sequences: CACACACACACACACACA, CGACAGCGACAGTTACAAA)
Recombination errors • A consequence of partial homology between different chromosomal loci • Can introduce translocations if the matching sequences are on different chromosomes • Can introduce inversions or deletions if the matching sequences are on the same chromosome • Can generate duplications or deletions if the matching sequences are in tandem
Endogenous DNA damage: Deamination of Cytosines [Figure: chemical structures of cytosine and uracil; deamination replaces the NH2 group of cytosine with O. *Thymine has CH3 at the marked H position]
Deamination of cytosine creates a G-U mismatch • Easy to tell that U is wrong • Deamination of 5-methylcytosine creates a G-T mismatch • Not easy to tell which base is the mutation • About 50% of the time the G is “corrected” to A, resulting in a mutation
Exogenous DNA damage • Chemicals: food, benzopyrene (smoke) • UV radiation (sunlight): generates primarily thymine dimers • Ionizing radiation: radon, cosmic rays, X-rays
Repairing DNA damage Direct repair
Thymine dimers can be corrected by a direct repair mechanism [Figure: a photon drives the direct repair of a thymine dimer]
BER Deaminated bases are repaired by a base excision mechanism.
BER Spontaneously occurring abasic sites are repaired by the same mechanism
NER Dimeric bases and bulky lesions (e.g., large chemical adducts) are repaired by nucleotide excision repair
Evolutionary consequences of the rich mutational process • Cannot ignore dependencies among adjacent sites • Mechanisms are evolutionarily variable • Lifestyle -> environmental exposure • Germline and male/female ratio • Mechanisms are variable on the genomic scale: late vs. early replication
Dynamic Bayesian Networks • Synchronous discrete time process [Figure: variables 1-4 replicated in each of the time slices T=1 to T=5, with conditional probabilities linking consecutive slices]
Context dependent Markov Processes • Context determines a Markov process rate matrix • Any dependency structure makes sense, including loops • When context is changing, computing probabilities is difficult • Think of the hidden variables as the trajectories [Figure: sites 1-4 with nucleotide states such as A, C and G along their trajectories] Continuous time Bayesian Networks: Koller-Nodelman 2002
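To make "context determines the rate matrix" concrete, here is a minimal Gillespie-style simulation sketch in which the rate of a C->T substitution depends on the right-hand neighbour. The rate values, the CpG-style context rule, and all function names are hypothetical choices for illustration, not the lecture's model. Note that the rates must be recomputed after every event, which is exactly why probabilities over whole trajectories are hard to compute.

```python
import random

BASES = "ACGT"
BASE_RATE = 1.0    # hypothetical rate for any single-base substitution
CPG_BOOST = 10.0   # hypothetical multiplier for C->T when the next base is G


def substitution_rate(seq, i, new_base):
    """Rate of seq[i] -> new_base, given the current right-hand neighbour."""
    if new_base == seq[i]:
        return 0.0
    in_cpg = seq[i] == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
    if in_cpg and new_base == "T":
        return BASE_RATE * CPG_BOOST
    return BASE_RATE


def simulate(seq, total_time, rng):
    """Gillespie simulation; returns the trajectory as (time, sequence) events."""
    seq = list(seq)
    t = 0.0
    trajectory = [(0.0, "".join(seq))]
    while True:
        # Contexts change after each substitution, so rebuild the event list.
        events = [(i, b, substitution_rate(seq, i, b))
                  for i in range(len(seq)) for b in BASES]
        events = [e for e in events if e[2] > 0.0]
        total_rate = sum(rate for _, _, rate in events)
        t += rng.expovariate(total_rate)   # waiting time to the next event
        if t > total_time:
            return trajectory
        weights = [rate for _, _, rate in events]
        i, b, _ = rng.choices(events, weights=weights)[0]
        seq[i] = b
        trajectory.append((t, "".join(seq)))


trajectory = simulate("AACGAA", total_time=0.5, rng=random.Random(0))
for time, state in trajectory:
    print(f"{time:.3f} {state}")
```

Each entry of the returned trajectory differs from the previous one at exactly one site; the full list is one sample of the hidden "trajectory" variable mentioned on the slide.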
Modeling simple context in the tree: PhyloHMM • Heuristically approximating the Markov process? • Where exactly does it fail? [Figure: hidden variables h for tree nodes i, k and the parent pa(i) at adjacent positions j-1, j, j+1] Siepel-Haussler 2003
Log-likelihood to Free Energy • We have so far worked on computing the likelihood: • Computing the likelihood is hard. We can reformulate the problem by adding parameters and transforming it into an optimization problem. Given a trial function q, define the free energy of the model as: • The free energy is exactly the likelihood when q is the posterior: • Better: when q is a distribution, the free energy bounds the likelihood: [Figure labels: D(q || p(h|s)), Likelihood]
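The equations of this slide did not survive in the text. A standard identity that matches the statements above is sketched here, writing θ for the model parameters, h for the hidden variables and s for the observed sequences (this notation is assumed). Whether the lecture names the bound F(q) or -F(q) cannot be recovered from the text, so the bound is left unnamed:

```latex
\begin{align*}
\log p(s \mid \theta) &= \log \sum_h p(h, s \mid \theta) \\
  &= \underbrace{\sum_h q(h) \log \frac{p(h, s \mid \theta)}{q(h)}}_{\text{lower bound on the log-likelihood}}
   \;+\; D\bigl(q \,\|\, p(h \mid s, \theta)\bigr),
  \qquad D \ge 0 .
\end{align*}
```

The second line holds for any distribution q over h. The divergence D vanishes exactly when q is the posterior p(h|s,θ), which is when the bound equals the log-likelihood.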
Energy?? What energy? • In statistical mechanics, a system at temperature T with states x and an energy function E(x) is characterized by Boltzmann’s law: • Z is the partition function: • Given a model p(h,s|θ) (a BN), we can define the energy using Boltzmann’s law • If we think of P(h|s,θ):
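The formulas for this slide are missing from the text. A sketch of the standard correspondence, with θ for the model parameters, the temperature set to 1 and Boltzmann's constant absorbed into T (all assumptions of this sketch):

```latex
\begin{align*}
p(x) &= \frac{1}{Z}\, e^{-E(x)/T}, &
Z &= \sum_x e^{-E(x)/T} \\
E(h) &= -\log p(h, s \mid \theta) \quad (T = 1), &
Z &= \sum_h p(h, s \mid \theta) = p(s \mid \theta) \\
p(h \mid s, \theta) &= \frac{1}{Z}\, e^{-E(h)}
\end{align*}
```

Under this choice of energy the Boltzmann distribution over hidden states is the posterior, and the partition function is the likelihood.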
Free Energy and Variational Free Energy • The Helmholtz free energy is defined in physics as: • This free energy is important in statistical mechanics, but it is difficult to compute, as is our probabilistic Z (= p(s)) • The variational transformation introduces trial functions q(h), and sets the variational free energy (or Gibbs free energy) to: • The average energy is: • The variational entropy is: • And as before:
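The definitions referred to on this slide are not present in the text. The standard forms, assuming temperature 1, energy E(h) = -log p(h,s|θ), and the sign convention in which the free energy is minimized, would read:

```latex
\begin{align*}
F_{\text{Helmholtz}} &= -\log Z = -\log p(s \mid \theta) \\
F(q) &= U(q) - H(q) \\
U(q) &= \sum_h q(h)\, E(h) = -\sum_h q(h) \log p(h, s \mid \theta) \\
H(q) &= -\sum_h q(h) \log q(h) \\
F(q) &= F_{\text{Helmholtz}} + D\bigl(q \,\|\, p(h \mid s, \theta)\bigr) \;\ge\; F_{\text{Helmholtz}}
\end{align*}
```

The last line follows by substituting p(h,s|θ) = p(h|s,θ) p(s|θ) into U(q) - H(q). In this convention, minimizing F(q) over q tightens an upper bound on -log p(s|θ), which is the same as a lower bound on the log-likelihood.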
Solving the variational optimization problem • So instead of computing p(s), we can search for the q that optimizes the free energy • This is still as hard as before, but we can simplify the problem by restricting q • (this is where the additional degrees of freedom become important) • Maximizing U? Focus on max configurations • Maximizing H? Spread out the distribution
Simplest variational approximation: Mean Field • Let’s assume complete independence among the r.v.s’ posteriors: q(h) = ∏_i q_i(h_i) • Under this assumption we can try optimizing the q_i (looking for minimal energy!) • Maximizing U? Focus on max configurations • Maximizing H? Spread out the distribution
Mean Field Inference • We optimize iteratively: • Select i (sequentially, or using any method) • Optimize q_i to minimize F_MF(q_1,..,q_i,…,q_n) while fixing all other q's • Terminate when F_MF cannot be improved further • Remember: F_MF always bounds the likelihood • The q_i optimization can usually be done efficiently
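The iterative scheme above can be sketched on a toy model small enough to check by brute force. Everything below is an illustrative assumption rather than the lecture's model: four ±1 variables on a chain, an Ising-style energy, and the convention F_MF = U - H (to be minimized), so that F_MF is an upper bound on -log Z, where log Z plays the role of the log-likelihood log p(s).

```python
import itertools
import math

# Hypothetical toy model: 4 hidden spins h_i in {-1,+1} on a chain, with
# energy E(h) = -sum_i b[i]*h[i] - sum_{i<j} W[i][j]*h[i]*h[j].
b = [0.5, -0.2, 0.1, 0.3]
W = [[0.0, 0.8, 0.0, 0.0],
     [0.8, 0.0, -0.5, 0.0],
     [0.0, -0.5, 0.0, 0.6],
     [0.0, 0.0, 0.6, 0.0]]
n = len(b)


def energy(h):
    pair = sum(W[i][j] * h[i] * h[j] for i in range(n) for j in range(i + 1, n))
    return -sum(b[i] * h[i] for i in range(n)) - pair


def exact_free_energy():
    """-log Z by brute-force enumeration (only feasible for tiny models)."""
    z = sum(math.exp(-energy(h)) for h in itertools.product((-1, 1), repeat=n))
    return -math.log(z)


def binary_entropy(m):
    """Entropy of a +/-1 variable whose mean is m."""
    ent = 0.0
    for p in ((1 + m) / 2, (1 - m) / 2):
        if p > 0:
            ent -= p * math.log(p)
    return ent


def mean_field_free_energy(m):
    """F_MF = U - H for the factorized q with means m.

    The energy is multilinear in h, so under a factorized q its expectation
    is the energy evaluated at the means.
    """
    return energy(m) - sum(binary_entropy(mi) for mi in m)


def mean_field(num_sweeps=100, tol=1e-10):
    m = [0.0] * n                        # start from uniform q_i
    history = [mean_field_free_energy(m)]
    for _ in range(num_sweeps):
        for i in range(n):               # select i sequentially
            # Optimal q_i given the others: mean is tanh of the local field.
            field = b[i] + sum(W[i][j] * m[j] for j in range(n) if j != i)
            m[i] = math.tanh(field)
        history.append(mean_field_free_energy(m))
        if history[-2] - history[-1] < tol:   # no further improvement
            break
    return m, history


m, history = mean_field()
print("mean-field F:", history[-1])
print("exact -log Z:", exact_free_energy())
```

Each coordinate update is the exact minimizer of F_MF over one q_i with the others fixed, so the recorded free energy never increases, and it stays above the exact value computed by enumeration.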
Adaptive mutations: Cairns et al. 1988 • Experimental system: lacZ frameshift • The experiment suggests adaptive mutations • Luria-Delbrück’s observation
The “Mutator” paradigm: • Ability to switch to the mutator phenotype depends on particular DNA repair mechanisms (double strand break repair in E. coli) • Mutator phenotype is suggested to be important in pathogenesis, antibiotic resistance, and in cancer • Species occasionally change (adaptively or even by drift) their repair policy/efficiency • The resulting substitution landscape must be very complex