Phylogenetic Analysis • Greek: phylon = race, genetic = birth
Phylogenetic Analysis • The evolutionary relationship among a set of species is called a phylogeny, represented by a phylogenetic tree. • Infer the phylogenetic tree (reconstruction) from observations of existing organisms • Then: morphological characters • Now: molecular sequences! Zuckerkandl & Pauling [1962]
Relationship to MSA • Multiple alignment of sequences should take account of their evolutionary relationship. (Some multiple alignment algorithms do use a “guide tree”) • Alignment and tree-building can proceed simultaneously
Phylogeny of … • … Orthologues: divergence from a common ancestor by speciation; found in different species • … Paralogues: divergence by gene duplication; found within the same species/organism [Figure: tree with events labeled "Gene Duplication" and "Speciation"; leaves 1A, 2A, 3A, 1B, 2B, 3B]
Elements of a Tree • Leaves/Nodes: sequences • Taxa (singular: taxon): outer leaves • Edges: edge lengths correspond to evolutionary time periods • Roots:
Molecular Clock Theory I (Zuckerkandl & Pauling, early 1960s) • For any given protein, accepted mutations in the amino acid sequence occur at a constant rate • Implications: • The number of accepted mutations is proportional to the length of the time interval • All proteins/species observed today have the same “molecular age” • Works well for closely related species
Molecular Clock Theory II • The rate of accepted mutations may be different for different proteins (depending on their tolerance for mutations) • Different parts of a protein may evolve at different rates [Figure: "Counting mutations"; two diagrams with leaves labeled 1 to 4]
Distance-based Methods We assume that the “distance” between each pair of sequences is proportional to the evolutionary time between them.
How to Collect Distance Data • Lab methods: mix single strands of DNA from different species, measure how tightly they associate. • Sequence analysis methods: estimate number of mutations based on sequence comparisons
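The sequence-analysis route can be sketched as follows. This is a minimal illustration, not a method named on the slide: it assumes the sequences are already aligned to equal length, uses the hypothetical helper names `p_distance` and `distance_matrix`, and simply counts the fraction of ungapped columns that differ. That fraction underestimates the true number of mutations, because repeated substitutions at the same site are invisible.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances, with zeros on the diagonal."""
    n = len(seqs)
    return [
        [0.0 if i == j else p_distance(seqs[i], seqs[j]) for j in range(n)]
        for i in range(n)
    ]
```

For example, `distance_matrix(["ACGT", "ACGA", "TCGA"])` gives a 3 × 3 matrix that could serve as input to a distance-based method.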
Ultrametric Distance Matrices • D is an ultrametric distance matrix if and only if, for every three indices i, j and k, there is a tie for the maximum of D(i,j), D(i,k) and D(j,k). That is, the maximum is not unique.
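The three-point condition translates directly into a check over all triples of indices. The sketch below assumes D is a symmetric list-of-lists matrix; the function name `is_ultrametric` and the tolerance parameter `tol` (for distances estimated as floating-point values) are illustrative choices, not part of the slides.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Return True if every triple of indices has a tie for its maximum distance."""
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances; the two largest must be equal.
        _, middle, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if largest - middle > tol:
            return False  # the maximum is unique, so the condition fails
    return True
```

The check examines every triple, so it costs on the order of n³ comparisons for n sequences.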
Test if the data is ultrametric • If it is, Molecular Clock Theory I is valid for this group of sequences
Ultrametric vs. not ultrametric [Figure: two trees with leaves labeled 1 to 4, one ultrametric and one not; caption: "Constant mutation rate"]
When Molecular Clock Theory I fails • The distance matrix is no longer ultrametric
When distance is additive • Inferring an inner node [Figure: tree with nodes labeled i, j, k and m]
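A short derivation of how an inner node can be inferred, under an assumed reading of the figure labels: i and j are neighboring leaves, k is the inner node joining them, and m is any other leaf. Writing d(x,y) for the additive distance, every path from i or j to m passes through k, so:

$$
\begin{aligned}
d(i,m) &= d(i,k) + d(k,m) \\
d(j,m) &= d(j,k) + d(k,m) \\
d(i,j) &= d(i,k) + d(k,j)
\end{aligned}
\quad\Longrightarrow\quad
d(k,m) = \tfrac{1}{2}\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)
$$

Adding the first two lines and subtracting the third leaves 2·d(k,m), which is why the distance from the unobserved node k to any other leaf can be computed from observed leaf-to-leaf distances alone.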
Neighbor Joining • Can we use this fact to construct trees? • Infer inner nodes • Gradually strip off leaves (outer nodes)
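Finding Neighboring Leaves • Let $D(i,j) = d(i,j) - (r_i + r_j)$, where d(i,j) is the additive distance between leaves i and j and $r_i = \frac{1}{|L|-2} \sum_{k \in L} d(i,k)$ • Theorem: if D(i,j) is minimal (among all pairs of leaves), then i and j are neighbors in the tree [Figure: tree with leaves g, h, i, j]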
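The sketch below illustrates why a correction is needed before picking the closest pair. It assumes the usual neighbor-joining correction, D(i,j) = d(i,j) − (r_i + r_j) with r_i equal to the sum of distances from i divided by |L| − 2. The example matrix is made up for illustration: it comes from a four-leaf tree in which A and B hang off one inner node with branch lengths 1 and 4, C and D hang off another with branch lengths 1 and 4, and the inner edge has length 1. The function names are hypothetical.

```python
from itertools import combinations


def corrected_distances(d):
    """Apply D(i,j) = d(i,j) - (r_i + r_j) to a symmetric matrix with n > 2 rows."""
    n = len(d)
    r = [sum(row) / (n - 2) for row in d]  # diagonal entries are zero
    return [
        [0.0 if i == j else d[i][j] - (r[i] + r[j]) for j in range(n)]
        for i in range(n)
    ]


def closest_pair(m):
    """Index pair (i, j), i < j, with the smallest off-diagonal entry."""
    return min(combinations(range(len(m)), 2), key=lambda p: m[p[0]][p[1]])


# Leaves in order A, B, C, D (illustrative additive distances).
d = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]
raw_pair = closest_pair(d)                             # A and C: close, but not neighbors
corrected_pair = closest_pair(corrected_distances(d))  # A and B: true neighbors
```

Here the smallest raw distance is between A and C, which sit on opposite sides of the inner edge, while the corrected matrix is minimal for the true neighbor pairs (A, B) and (C, D).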
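Neighbor Joining • Set L to contain all leaves • Iteration: • Choose i, j such that D(i,j) is minimal (neighbors) • Create new node k, and set $d(k,m) = \tfrac{1}{2}\bigl(d(i,m) + d(j,m) - d(i,j)\bigr)$ for all m in L, where d is the additive distance • Remove i, j from L, and add k • Terminate: when |L| = 2, connect the two remaining nodes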
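The iteration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: distances are stored in a dict keyed by `frozenset` pairs, new inner nodes get hypothetical names `k0`, `k1`, …, the pair is chosen by the usual corrected distance d(i,j) − (r_i + r_j), and the branch lengths d(i,k) = ½(d(i,j) + r_i − r_j) and d(j,k) = d(i,j) − d(i,k) follow the standard neighbor-joining formulas, which the slide itself does not state.

```python
from itertools import combinations


def neighbor_joining(names, d):
    """Build an unrooted tree from additive distances.

    names: list of leaf labels.
    d: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a list of edges (node, node, length).
    """
    dist = dict(d)
    L = list(names)
    edges = []
    next_id = 0
    while len(L) > 2:
        n = len(L)
        # r[i]: sum of distances from i to all other nodes, divided by |L| - 2.
        r = {i: sum(dist[frozenset((i, m))] for m in L if m != i) / (n - 2) for i in L}
        # Choose the pair minimizing the corrected distance.
        i, j = min(
            combinations(L, 2),
            key=lambda p: dist[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        k = f"k{next_id}"  # hypothetical name for the new inner node
        next_id += 1
        d_ij = dist[frozenset((i, j))]
        d_ik = 0.5 * (d_ij + r[i] - r[j])
        edges.append((i, k, d_ik))
        edges.append((j, k, d_ij - d_ik))
        # Distance from the new node to every remaining node.
        for m in L:
            if m != i and m != j:
                dist[frozenset((k, m))] = 0.5 * (
                    dist[frozenset((i, m))] + dist[frozenset((j, m))] - d_ij
                )
        L = [m for m in L if m != i and m != j] + [k]
    a, b = L
    edges.append((a, b, dist[frozenset((a, b))]))  # connect the last two nodes
    return edges
```

Each pass through the loop removes two nodes and adds one, so a tree on n leaves is finished after n − 2 iterations plus the final connecting edge.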