Genetic Distance
Definition: a measure of how different two sequences are; more precisely, the number of evolutionary events (substitutions) that have occurred since the two sequences diverged.
Simplest: p-distance = proportion of sites that are different (but p-distance is not a good metric: it is not additive).
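A minimal Python sketch of the p-distance, assuming two already-aligned, equal-length, gap-free sequences. The function name `p_distance` and the example sequences are illustrative only.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    # Count sites where the two sequences carry different characters
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


# Two hypothetical 10-site sequences that differ at 2 sites
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```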
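Correction for Multiple Substitution
[Figure: example aligned sequences contrasting the observed differences with the substitutions that produced them]
When a site is hit more than once, the observed differences underestimate the number of substitutions that actually occurred.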
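To see why observed differences undercount substitutions, here is a toy simulation. It assumes every substitution is equally likely (as in Jukes and Cantor) and compares a single descendant lineage with its ancestor; the names `evolve` and `BASES` are illustrative, and the seed just makes the run reproducible.

```python
import random

BASES = "ACGT"


def evolve(seq: str, n_substitutions: int, rng: random.Random) -> str:
    """Apply n substitutions, each at a uniformly chosen site.

    The same site can be hit more than once, which is exactly what
    makes observed differences undercount substitutions.
    """
    sites = list(seq)
    for _ in range(n_substitutions):
        i = rng.randrange(len(sites))
        # Replace with one of the three other bases, all equally likely
        sites[i] = rng.choice([b for b in BASES if b != sites[i]])
    return "".join(sites)


rng = random.Random(42)  # fixed seed so the run is reproducible
ancestor = "".join(rng.choice(BASES) for _ in range(1000))

results = []
for n in (50, 200, 500, 1000, 2000):
    descendant = evolve(ancestor, n, rng)
    differences = sum(a != b for a, b in zip(ancestor, descendant))
    results.append((n, differences))
    print(f"substitutions = {n:4d}   observed differences = {differences:4d}")
```

As substitutions accumulate, the observed differences fall further behind and level off toward roughly three quarters of the sites, because later hits often land on already-changed sites or revert them.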
Correcting for multiple substitutions
Jukes and Cantor: all substitutions are treated as equally likely
Jukes and Cantor: $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, where $p$ is the p-distance
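A minimal Python sketch of the Jukes and Cantor correction, taking a p-distance as input. The function name `jukes_cantor` is illustrative; it refuses $p \ge 0.75$, where the logarithm's argument is no longer positive.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4/3 * p) from a p-distance."""
    if not 0.0 <= p < 0.75:
        # At p >= 0.75 the log argument is <= 0: the sequences look saturated
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


for p in (0.05, 0.1, 0.3, 0.5, 0.7):
    print(f"p = {p:.2f}   d = {jukes_cantor(p):.4f}")
```

Tabulating a few values shows the shape of the curve: $d$ stays close to $p$ when $p$ is small, then grows rapidly and exceeds 1 well before $p$ reaches 0.75.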
Jukes and Cantor: graph
[Figure: genetic distance plotted against the proportion of sites that are different]
Correcting for multiple substitutions
Jukes and Cantor: all substitutions are treated as equally likely
Kimura 2-parameter: distinguishes between transitions and transversions
Models of nucleotide substitution correct for two things:
- biases in the rate at which different mutations occur
- biases in the equilibrium frequencies of different nucleotides
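A minimal Python sketch of the Kimura 2-parameter distance, using the standard formula $d = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)$, where $P$ and $Q$ are the proportions of sites showing transitional and transversional differences. It assumes aligned, gap-free sequences containing only A, C, G and T; the function names are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """True for A<->G or C<->T changes (purine<->purine, pyrimidine<->pyrimidine)."""
    pair = {a, b}
    return a != b and (pair <= PURINES or pair <= PYRIMIDINES)


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance for two aligned, gap-free DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("need two non-empty aligned sequences")
    n = len(seq1)
    pairs = list(zip(seq1.upper(), seq2.upper()))
    transitions = sum(is_transition(a, b) for a, b in pairs)
    transversions = sum(a != b for a, b in pairs) - transitions
    P = transitions / n    # proportion of transitional differences
    Q = transversions / n  # proportion of transversional differences
    w1 = 1.0 - 2.0 * P - Q
    w2 = 1.0 - 2.0 * Q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("K2P distance is undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition out of 10 sites
```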
Kimura 2P model: allows for transition/transversion bias
HKY model: allows for nucleotide (base-frequency) bias and transition/transversion bias
General Time Reversible (GTR): allows for nucleotide bias as well as a different substitution rate for every pair of nucleotides
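A sketch of how the HKY model combines the two biases in its instantaneous rate matrix: the rate from base $i$ to base $j$ is proportional to the equilibrium frequency $\pi_j$, multiplied by $\kappa$ when the change is a transition. The function name, the A, C, G, T ordering, the example frequencies and the scaling to one expected substitution per unit time are choices made for this illustration.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(freqs: dict[str, float], kappa: float) -> list[list[float]]:
    """HKY instantaneous rate matrix, rows/columns ordered A, C, G, T.

    Off-diagonal rate i -> j is pi_j, multiplied by kappa for transitions.
    The matrix is scaled so one unit of time = one expected substitution/site.
    """
    if abs(sum(freqs[b] for b in BASES) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = freqs[b] * (kappa if (a, b) in TRANSITIONS else 1.0)
        q[i][i] = -sum(q[i])  # each row sums to zero
    # Mean substitution rate at equilibrium: -sum_i pi_i * q_ii
    mean_rate = -sum(freqs[a] * q[i][i] for i, a in enumerate(BASES))
    return [[x / mean_rate for x in row] for row in q]


# Hypothetical AT-rich frequencies and kappa = 4
m = hky_rate_matrix({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}, kappa=4.0)
for base, row in zip(BASES, m):
    print(base, " ".join(f"{x:7.3f}" for x in row))
```

Setting all frequencies to 0.25 gives the Kimura 2P rate matrix, and additionally setting `kappa` to 1 gives Jukes and Cantor. A GTR matrix would replace the single `kappa` with a separate rate parameter for each of the six nucleotide pairs.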
Note: be careful not to confuse the transition/transversion rate ratio (k) with the transition/transversion ratio (R). In the case of the Kimura 2P model, R = k/2.
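A short derivation of the relationship between the two ratios under the Kimura 2P model, writing $\alpha$ for the rate of each transition and $\beta$ for the rate of each transversion (labels introduced here for the derivation). Each base has one possible transition partner and two possible transversion partners, and base frequencies are equal, so:

$$
k = \frac{\alpha}{\beta}, \qquad
R = \frac{\text{expected transitions}}{\text{expected transversions}} = \frac{\alpha}{2\beta} = \frac{k}{2}
$$

So with no transition/transversion rate bias ($k = 1$), the expected ratio $R$ is 0.5, not 1.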
Additional corrections for different rates of evolution at different sequence positions can also be applied:
- gamma distribution (incorporated analytically)
- by codon position
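One simple way to look at rates by codon position is to compute the p-distance separately for first, second and third positions. This sketch assumes aligned, gap-free coding sequences that start at codon position 1; the function name and example sequences are hypothetical.

```python
def p_distance_by_codon_position(seq1: str, seq2: str) -> dict[int, float]:
    """p-distance computed separately for 1st, 2nd and 3rd codon positions.

    Assumes aligned, gap-free coding sequences starting at codon position 1.
    """
    if len(seq1) != len(seq2) or not seq1 or len(seq1) % 3 != 0:
        raise ValueError("need aligned, in-frame coding sequences")
    result = {}
    for pos in (1, 2, 3):
        # Every third character, starting at this codon position
        s1, s2 = seq1[pos - 1::3], seq2[pos - 1::3]
        result[pos] = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    return result


# Hypothetical example: all three differences fall at third positions
print(p_distance_by_codon_position("ATGAAACCC", "ATAAAGCCT"))
# {1: 0.0, 2: 0.0, 3: 1.0}
```

In real coding sequences third positions usually change fastest, since many third-position changes are synonymous; each position's p-distance could then be corrected separately with one of the substitution models.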
To perform a gamma correction for site-specific rates, you need to know the shape of the gamma distribution.
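To get a feel for what the shape parameter controls, this sketch draws relative site rates from a gamma distribution with shape alpha and mean 1 (a common convention, assumed here), using Python's `random.gammavariate`. Small alpha means strong rate variation among sites; large alpha means nearly equal rates. The function name `sample_site_rates` is illustrative.

```python
import random
import statistics


def sample_site_rates(alpha: float, n_sites: int, seed: int = 1) -> list[float]:
    """Draw relative site rates from a gamma distribution with shape alpha and mean 1.

    random.gammavariate(alpha, beta) has mean alpha * beta, so beta = 1 / alpha.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


for alpha in (0.2, 1.0, 5.0):
    rates = sample_site_rates(alpha, 10_000)
    print(
        f"alpha = {alpha:>3}: mean = {statistics.fmean(rates):.2f}, "
        f"variance = {statistics.pvariance(rates):.2f} (theory: {1 / alpha:.2f})"
    )
```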
You need several sequences to estimate the shape of the gamma distribution. You normally don't apply a gamma correction to just a pair of sequences unless you know the shape (alpha) of the gamma distribution in advance.
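Once alpha is known (here simply supplied as a number), the gamma correction can be folded analytically into the Jukes and Cantor formula. This sketch uses the commonly cited gamma version of the JC distance, $d = \frac{3}{4}\alpha\left[\left(1 - \frac{4}{3}p\right)^{-1/\alpha} - 1\right]$; the function name is illustrative.

```python
import math


def jukes_cantor_gamma(p: float, alpha: float) -> float:
    """Jukes-Cantor distance with gamma-distributed site rates (shape alpha).

    d = 3/4 * alpha * ((1 - 4/3 p) ** (-1/alpha) - 1)
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - (4.0 / 3.0) * p) ** (-1.0 / alpha) - 1.0)


p = 0.3
plain_jc = -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
print(f"plain JC: {plain_jc:.4f}")
for alpha in (0.5, 1.0, 2.0, 100.0):
    print(f"alpha = {alpha:>5}: d = {jukes_cantor_gamma(p, alpha):.4f}")
```

The smaller alpha is (the more rates vary among sites), the larger the corrected distance; as alpha grows large the result approaches the plain Jukes and Cantor value.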
Some general points:
- genetic distances can be far greater than 1
- smaller genetic distances are more reliable
- model choice has a bigger impact for distantly related sequences
- positions with gaps are normally ignored (complete deletion)
- if you know the rate of evolution for a pair of sequences (and the rate has remained more or less constant), you can estimate the date at which they diverged
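A small end-to-end sketch of the last two points: complete deletion of gapped columns, a Jukes and Cantor distance, and a divergence date. It assumes `-` is the only gap character and that the known rate is per lineage, so that both lineages together accumulate $d = 2rt$ (if a rate is quoted per pair of lineages, drop the factor of 2). The alignment, the rate and all function names are hypothetical.

```python
import math


def complete_deletion(alignment: list[str]) -> list[str]:
    """Remove every alignment column in which any sequence has a gap ('-')."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    kept_columns = [col for col in zip(*alignment) if "-" not in col]
    return ["".join(col[i] for col in kept_columns) for i in range(len(alignment))]


def jukes_cantor_distance(seq1: str, seq2: str) -> float:
    """JC distance from the p-distance of two gap-free aligned sequences."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def divergence_time(d: float, rate_per_lineage: float) -> float:
    """Both lineages accumulate change after the split, so d = 2 * r * t."""
    return d / (2.0 * rate_per_lineage)


# Hypothetical three-sequence alignment; columns 3 and 5 contain gaps
alignment = ["ACGT-ACGTA", "ACGTTACGAA", "AC-TTACGTA"]
cleaned = complete_deletion(alignment)
print(cleaned)

d = jukes_cantor_distance(cleaned[0], cleaned[1])
# Assumed rate: 1e-9 substitutions per site per year along each lineage
print(f"d = {d:.4f}, estimated divergence = {divergence_time(d, 1e-9):.3g} years")
```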