160 likes | 375 Views
Phylogenetic Trees: Assumptions. All existing species have a common ancestor Each species is descended from a single ancestor Each speciation gives rise to two derived species This leads to a ‘tree’ topology for geneology. Tree structure. ‘Leaves’ are existing species
E N D
Phylogenetic Trees: Assumptions • All existing species have a common ancestor • Each species is descended from a single ancestor • Each speciation gives rise to two derived species • This leads to a ‘tree’ topology for geneology
Tree structure • ‘Leaves’ are existing species • Interior nodes are hypothetical common ancestors • Tree may be rooted or unrooted • Root (if it exists) is universal ancesestor gorilla human bonobo chimp
Limits of these assumptions • Early life (probably wasn’t nicely ‘packaged’) • Merger of symbiotes (origin of eukaryotes, plants) • Lateral gene transfer
Characteristics • Relatedness is inferred from features, or characters • discrete character data • example: has feathers, number of fingers • data forms a n x m matrix • distance data • example: sequence edit distance • data forms an n x n triangular matrix
Ordered or unordered characters • Differences in characters are assumed to be the result of transitions from a previous uniform state • (unordered) Such transitions may occur between any two states • (ordered) They may occur only in a fixed sequence
Reversals • Reversal - mutation into the primitive (ancestral) state • Reversals are possible but unlikely • Examples • nucleic acid sequence • toes of a mammal
Perfect phylogeny • Perfect phylogeny • Each edge in tree = one state transition • That is, the entire subtree must share this state • Central problem of character state phylogeny: • input: set O of n objects. Set c of m characters. Set r of states. • Output (y/n): is there a perfect phylogeny for O?
Solve Perfect Phylogeny problem • Problem 2 • aagtt • atgtc • atgta • atgtt • aaatt • ttgta • Problem 3 • aagtt • ttgtt • tcctt • tcaat • tcagt • acagt • Problem 1 • acgac • accag • acgtt • tccag • acgat • aagtt
Complexity of character problem • Ordered states • the problem is polynomial • Unordered states • If the number of states (r) is unbounded, PPP is NP-complete. • If the number of states is small (2, 3, 4) an algorithm which runs in reasonable time is known
Relaxing our assumptions • Expecting perfect phylogeny is unrealistic, because • biological data contains errors • reversals to occur • Two approaches to relaxation • minimize reversals (parsimony) • throw away minimum of offending characters (compatibility criterion)
Ideal distance matrix problem • Input: a set O of objects and a (triangular) matrix M of pairwise distances between them • Output: a tree in which the nodes are O and the paths have weights M
Ideal distance matrix problem • Input: a set O of objects and a (triangular) matrix M of pairwise distances between them • Output: a tree in which the nodes are O and the paths have weights M • Nonideal considerations: reversals, convergence, error
Relaxed distance matrix problem • Input: a set O of objects and a pair of (triangular) matrix Mminand Mmax of pairwise distances between them • Output: a tree in which the nodes are O and the path weights are bounded by Mminand Mmax
Relaxed distance matrix problem • Input: a set O of objects and a pair of (triangular) matrix Mminand Mmax of pairwise distances between them • Output: a tree in which the nodes are O and the path weights are bounded by Mminand Mmax • Tractable only for ultrametric trees (all leaves equidistant from root)
Neighbor joining A simple and commonly used heuristic while (|nodes| > 2) find nearest neighbors, A, B substitute with interior node at their midpoint