Models of Evolution

Models of Evolution Majid Kazemian

Introduction • Probabilistic Model of Indels • Model of an arbitrary distribution of indel lengths (TKF Model) • MCALIGN • We have seen the above models earlier in the course • Models of Nucleotide Substitution • Jukes Cantor model • Kimura model

Phylogeny Tree • Given a set of sequences x0=(x1,x2,…,xn), the goal is to infer the phylogeny tree • Suppose that • n = number of species • T = topology of the tree • t0 = the set of edge lengths of the tree • We want to compute pr(x0|T,t0)

A simple example • [Figure: phylogeny tree with nodes x1 to x5 and edge lengths t1 to t5] • Suppose we have the following phylogeny tree then: • So to calculate pr(x0|T,t0) we need pr(x | y, t), the probability that y evolves to x in time t

Substitution • Assume that • Indels do not occur • Each position of the sequence evolves independently • Then pr(x | y, t) = ∏j pr(xj | yj, t) • where pr(xj | yj, t) is the probability of a change from “yj” to “xj” in time t • Ancestor: y1y2…yL • Descendant: x1x2…xL
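
The site-independence assumption above can be sketched in a few lines of Python. This is only an illustration: `toy_site_prob` is a hypothetical placeholder for pr(xj | yj, t) (it ignores t and is not one of the models from the slides), and the function names are made up for this example. Working with log probabilities avoids underflow on long sequences.

```python
import math

def toy_site_prob(x, y, t):
    """Hypothetical stand-in for pr(xj | yj, t): 0.7 for no change, 0.1 for
    each of the three possible changes. It ignores t (an assumption made
    only to keep the example small)."""
    return 0.7 if x == y else 0.1

def log_sequence_prob(descendant, ancestor, t, site_prob=toy_site_prob):
    """log pr(x | y, t) as a sum of per-site log probabilities (no indels)."""
    if len(descendant) != len(ancestor):
        raise ValueError("without indels both sequences must have the same length")
    return sum(math.log(site_prob(x, y, t)) for x, y in zip(descendant, ancestor))

# Three unchanged sites and one changed site.
log_p = log_sequence_prob("ACGT", "ACGA", t=1.0)
print(math.exp(log_p))  # equals 0.7**3 * 0.1 up to rounding
```

Any real substitution model can be passed in through the `site_prob` argument.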

Substitution Matrix

The assumption of the model • [Figure: residues yj, aj and xj on a time axis, with intervals t1 and t2 spanning a total time t1+t2] • Multiplicativity requirement • S(t1)S(t2)=S(t1+t2) • This requirement holds if the transition probabilities are stationary and Markovian • Intuitively, stationarity means that a transition probability depends only on the elapsed time, e.g. (t2+t1) – t1, and not on when the interval starts
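
Written out entry by entry, the multiplicativity requirement sums over the intermediate residue aj reached after time t1. The notation below reuses pr(x | y, t) from the earlier slides; the formula holds whichever row/column convention is chosen for S.

```latex
\[
\Pr(x_j \mid y_j,\; t_1 + t_2)
  \;=\; \sum_{a_j} \Pr(a_j \mid y_j,\; t_1)\,\Pr(x_j \mid a_j,\; t_2)
\]
```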

Jukes Cantor Model

Jukes Cantor Model (cont.) • In a small amount of time ε, the probability of a substitution is linear in time. This means that within ε we cannot go from Ai to Aj and then back to Ai (at most one substitution occurs). • S(ε) ≈ I + Rε
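
The rate matrix itself did not survive in this transcript. For reference, the standard Jukes-Cantor rate matrix has a single substitution rate α for every pair of distinct nucleotides; the nucleotide order A, C, G, T below is an assumption made for this illustration.

```latex
\[
R \;=\;
\begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix},
\qquad
S(\varepsilon) \;\approx\; I + R\varepsilon .
\]
```

Each row of R sums to zero, so each row of I + Rε sums to one, as a matrix of probabilities must.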

Jukes Cantor Model (cont.) • Is S(t) similar to S(ε)?

Jukes Cantor Model (cont.) • We know that S(t) has the following form (why?)
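
The matrix on this slide is missing from the transcript. A reconstruction under the usual Jukes-Cantor assumptions: because the rate matrix treats all four nucleotides identically, S(t) can only have one value on the diagonal and one value off the diagonal. The symbols r_t and s_t are names chosen here for those two values. Combining multiplicativity with S(ε) ≈ I + Rε then gives a pair of differential equations.

```latex
\[
S(t) \;=\;
\begin{pmatrix}
r_t & s_t & s_t & s_t \\
s_t & r_t & s_t & s_t \\
s_t & s_t & r_t & s_t \\
s_t & s_t & s_t & r_t
\end{pmatrix},
\qquad r_t + 3 s_t = 1 .
\]
\[
S(t+\varepsilon) = S(t)\,S(\varepsilon) \approx S(t)\,(I + R\varepsilon)
\;\;\Longrightarrow\;\;
\frac{dS}{dt} = S(t)\,R ,
\]
\[
\frac{dr_t}{dt} = -3\alpha\, r_t + 3\alpha\, s_t ,
\qquad
\frac{ds_t}{dt} = \alpha\, r_t - \alpha\, s_t ,
\qquad r_0 = 1,\; s_0 = 0 .
\]
```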

Jukes Cantor Model (cont.)
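
The solution shown on this slide is not in the transcript. The standard closed form for the Jukes-Cantor model is r_t = (1 + 3e^(-4αt))/4 on the diagonal and s_t = (1 - e^(-4αt))/4 off the diagonal. The sketch below (function names are chosen for this example) builds that matrix and numerically checks that rows sum to one and that S(t1)S(t2) = S(t1+t2).

```python
import math

def jc_matrix(t, alpha):
    """Jukes-Cantor substitution matrix S(t) as a 4x4 list of lists."""
    decay = math.exp(-4.0 * alpha * t)
    r = (1.0 + 3.0 * decay) / 4.0   # probability of ending on the same nucleotide
    s = (1.0 - decay) / 4.0         # probability of ending on one specific other nucleotide
    return [[r if i == j else s for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Plain 4x4 matrix product, kept dependency-free on purpose."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

alpha = 1.0  # arbitrary rate chosen for the demonstration
product = matmul(jc_matrix(0.3, alpha), jc_matrix(0.5, alpha))
direct = jc_matrix(0.8, alpha)
max_gap = max(abs(product[i][j] - direct[i][j]) for i in range(4) for j in range(4))
print(max_gap)  # difference is at the level of floating-point rounding
```

For very small t the off-diagonal entries behave like αt, which matches S(ε) ≈ I + Rε.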

More advanced models • The J-C model makes highly “symmetric” assumptions in its formulation of the rate matrix R • In reality, for example, “transitions” are more common than “transversions” • What are these? Purine = A or G. Pyrimidine = C or T. A transition is a substitution within the same category; a transversion is a substitution across categories • Purines are similarly sized, and pyrimidines are similarly sized. A nucleotide is more likely to be replaced by a similarly sized nucleotide • The “Kimura” model captures this transition/transversion bias

Kimura Model • The rate matrix R is given by:
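
The matrix is missing from this transcript. The standard two-parameter Kimura rate matrix is shown below as a reference. It assumes the nucleotide order A, C, G, T, with α as the transition rate (A↔G, C↔T) and β as the transversion rate; these symbol names are an assumption, since the slide's own notation was lost.

```latex
\[
R \;=\;
\begin{pmatrix}
-2\beta-\alpha & \beta & \alpha & \beta \\
\beta & -2\beta-\alpha & \beta & \alpha \\
\alpha & \beta & -2\beta-\alpha & \beta \\
\beta & \alpha & \beta & -2\beta-\alpha
\end{pmatrix}
\]
```

Setting α = β recovers the Jukes-Cantor rate matrix with a single rate.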

Kimura Model (cont.) • We know that S(t) should look like this (why?)

Kimura Model (cont.) • Again, solving the differential equations (as we did for the JC model) gives
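
The resulting formulas are not in the transcript. The standard closed form for the two-parameter Kimura model, with α the transition rate and β the transversion rate (symbol names assumed here), is s_t = (1 - e^(-4βt))/4 for each transversion, u_t = (1 + e^(-4βt) - 2e^(-2(α+β)t))/4 for the transition, and r_t = 1 - 2s_t - u_t for no change. A minimal sketch, with the nucleotide order A, C, G, T and function names chosen for this example:

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}

def kimura_matrix(t, alpha, beta):
    """Kimura two-parameter S(t); alpha = transition rate, beta = transversion rate."""
    s = (1.0 - math.exp(-4.0 * beta * t)) / 4.0                      # one transversion
    u = (1.0 + math.exp(-4.0 * beta * t)
         - 2.0 * math.exp(-2.0 * (alpha + beta) * t)) / 4.0          # the transition
    r = 1.0 - 2.0 * s - u                                            # no change

    def entry(x, y):
        if x == y:
            return r
        same_class = (x in PURINES) == (y in PURINES)  # A<->G or C<->T
        return u if same_class else s

    return [[entry(x, y) for y in BASES] for x in BASES]

def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# With a transition rate four times the transversion rate, A->G is more
# likely than A->C over the same time span.
m = kimura_matrix(0.2, alpha=2.0, beta=0.5)
print(m[0][2] > m[0][1])  # True
```

The same multiplicativity check used for the JC model applies here, and setting alpha equal to beta reproduces the JC probabilities.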

Even More advanced models (cont.) • Get to greater levels of realism • The Kimura model still has a uniform stationary distribution, which is not true of real data • One extension: the purine-to-pyrimidine substitution probability differs from the pyrimidine-to-purine substitution probability • This leads to a non-uniform stationary distribution • The “HKY” model captures this bias

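Inferring Phylogeny for two sequences • [Figure: tree with a root and two leaves x1 and x2 on edges of length t1 and t2] • Let’s go back to the original problem: we wanted to compute pr(x0|T,t0) • In the case of two sequences without gaps we have pr(x1, x2 | T, t1, t2) = ∏u Σa qa pr(x1u | a, t1) pr(x2u | a, t2), where u indexes positions and qa is the probability of residue a at the root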

A simple example • Suppose that • x1=C C G G C C G C G C G • x2=C G G G C C G G C C G

A simple example (cont.) • Assume the JC model • Our goal is to find the tree topology and the edge lengths t1 and t2

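A simple example (cont.) • Suppose that n1 is the number of CC and GG pairs and n2 is the number of CG and GC pairs • So the likelihood depends on the data only through n1 and n2, and on the edge lengths only through t1+t2 • If α is known, we can find t1+t2 by simple maximum likelihood • α is estimated from two closely related species for which we assume t1+t2=1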
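
A worked sketch of this example, under stated assumptions: the JC model with a uniform root distribution (qa = 1/4), for which the likelihood is (1/4)^(n1+n2) · r^n1 · s^n2 with r and s evaluated at the total time t1+t2. Maximising over s gives s = n2 / (3(n1+n2)), and inverting s = (1 - e^(-4αt))/4 gives the estimate below. The value alpha=1.0 is arbitrary, and the function names are made up for this illustration.

```python
import math

def count_pairs(x1, x2):
    """n1 = number of identical aligned pairs, n2 = number of differing pairs."""
    n1 = sum(a == b for a, b in zip(x1, x2))
    return n1, len(x1) - n1

def jc_log_likelihood(n1, n2, t, alpha):
    """log pr(x1, x2 | t1+t2 = t) under JC with a uniform root distribution (t > 0)."""
    decay = math.exp(-4.0 * alpha * t)
    r = (1.0 + 3.0 * decay) / 4.0
    s = (1.0 - decay) / 4.0
    return (n1 + n2) * math.log(0.25) + n1 * math.log(r) + n2 * math.log(s)

def ml_total_time(n1, n2, alpha):
    """Closed-form maximum likelihood estimate of t1+t2."""
    p = n2 / (n1 + n2)                      # observed fraction of differing sites
    if p >= 0.75:
        raise ValueError("sequences are saturated; no finite estimate exists")
    return -math.log(1.0 - 4.0 * p / 3.0) / (4.0 * alpha)

x1 = "CCGGCCGCGCG"
x2 = "CGGGCCGGCCG"
n1, n2 = count_pairs(x1, x2)
print(n1, n2)                               # 8 3
print(ml_total_time(n1, n2, alpha=1.0))     # ln(11/7) / 4, roughly 0.113
```

Only the sum t1+t2 is identifiable here: under a reversible model with two sequences, moving the root along the path between them does not change the likelihood.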

Inferring Phylogeny for n sequences • How do we infer the topology and t0 for n sequences? • [Equation annotations: the sum runs over all possible internal node assignments; each factor conditions node i on the parent of node i] • How do we compute this probability efficiently?

Dynamic Programming • The recursion computes the probability of all leaves below node k given that the residue at k is α • [Figure: node k with residue α and child nodes with residues b and c] • How to estimate (T,t0)? ML estimation?
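
The recursion on this slide matches what is usually called Felsenstein's pruning algorithm: for an internal node, the probability of the leaves below it given residue a is the product, over the two children, of the sum over the child's residue of the substitution probability times the child's own conditional probability. A minimal sketch under assumptions not taken from the slides: the JC model with rate ALPHA = 1, a uniform root distribution, and a tree encoded as nested tuples (a format invented for this example).

```python
import math

BASES = "ACGT"
ALPHA = 1.0  # assumed JC rate, chosen only for the demonstration

def jc_prob(x, y, t):
    """JC probability that residue y becomes residue x in time t."""
    decay = math.exp(-4.0 * ALPHA * t)
    return (1.0 + 3.0 * decay) / 4.0 if x == y else (1.0 - decay) / 4.0

def conditional(node, site):
    """Return {a: pr(leaves below node | residue a at node)} for one column.

    A leaf is a string (its sequence); an internal node is a tuple
    (left_child, left_edge_length, right_child, right_edge_length).
    """
    if isinstance(node, str):
        return {a: 1.0 if node[site] == a else 0.0 for a in BASES}
    left, t_left, right, t_right = node
    below_left = conditional(left, site)
    below_right = conditional(right, site)
    result = {}
    for a in BASES:
        sum_left = sum(jc_prob(b, a, t_left) * below_left[b] for b in BASES)
        sum_right = sum(jc_prob(c, a, t_right) * below_right[c] for c in BASES)
        result[a] = sum_left * sum_right
    return result

def likelihood(tree, length):
    """pr(all leaf sequences | tree), multiplying independent columns."""
    total = 1.0
    for site in range(length):
        below_root = conditional(tree, site)
        total *= sum(0.25 * below_root[a] for a in BASES)  # uniform root prior
    return total

# Three leaves: the first two share an internal node, the third joins at the root.
tree = (("ACGT", 0.1, "ACGA", 0.1), 0.2, "ACTT", 0.3)
print(likelihood(tree, 4))
```

The cost is linear in the number of nodes and positions, instead of exponential in the number of internal nodes as in the naive sum. A production version would work in log space and cache per-node results.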

How to infer topology? • The naïve way is to enumerate all topologies and solve the ML estimation for each topology with numerical methods (such as Newton's method) • This is infeasible when there are many species • The idea is to explore topologies with a sampling technique instead

Metropolis Sampling • We have • We must propose a rejection and acceptance mechanism to go

Proposal distribution • Accept with the following probability
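
The proposal and acceptance formulas from these slides are not in the transcript. A common choice, assumed here, is a symmetric proposal with acceptance probability min(1, P_new / P_old), where P is the posterior of the proposed and current states. The sketch below is deliberately reduced: it samples only the total edge length t for the two-sequence example (n1 = 8, n2 = 3) under the JC model, with a uniform prior on (0, t_max] chosen purely for illustration. It does not propose topology changes, which a full tree sampler would need.

```python
import math
import random

def log_posterior(t, n1=8, n2=3, alpha=1.0, t_max=2.0):
    """Unnormalised log posterior of t = t1+t2 (JC likelihood, uniform prior)."""
    if not 0.0 < t <= t_max:
        return -math.inf
    r = (1.0 + 3.0 * math.exp(-4.0 * alpha * t)) / 4.0
    s = -math.expm1(-4.0 * alpha * t) / 4.0   # (1 - e^(-4*alpha*t)) / 4, stable near 0
    return n1 * math.log(r) + n2 * math.log(s)

def acceptance_probability(log_p_new, log_p_old):
    """Metropolis rule min(1, P_new / P_old) for a symmetric proposal."""
    if log_p_new >= log_p_old:
        return 1.0
    return math.exp(log_p_new - log_p_old)

def metropolis(n_steps, t_start=0.5, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns the visited values of t."""
    rng = random.Random(seed)
    t = t_start
    samples = []
    for _ in range(n_steps):
        proposal = t + rng.gauss(0.0, step)   # symmetric Gaussian proposal
        if rng.random() < acceptance_probability(log_posterior(proposal),
                                                 log_posterior(t)):
            t = proposal                      # accept; otherwise keep the current t
        samples.append(t)
    return samples

samples = metropolis(5000)
print(sum(samples) / len(samples))  # rough posterior mean of t1+t2
```

Proposals outside the prior's support have zero posterior and are always rejected, so the chain stays inside (0, t_max].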

Two comments • We assumed that the columns of the alignment evolve independently and at the same rate, but some regions evolve faster and some slower • We assumed that there are no gaps • We need to consider gaps (e.g. with a pair HMM)

Reference • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison) • Sections 8.1, 8.2, 8.3, 8.4, 8.5
