Building Phylogenies
Phylogenetic (evolutionary) trees • Describe evolutionary relationships between species • Cannot be known with certainty! • Nevertheless, phylogenies can be useful
[Figure: two alternative trees, joined by "or", over the leaves Human, Chimp, Gorilla, Orangutan, Gibbon, with a "??" label]
Applications of Phylogenetic Analysis • Inferring function • Closely related sequences occupy neighboring branches of tree • Tracking changes in rapidly evolving populations (e.g., viruses) • Which genes are under selection?
Phyloinformatics • Comparative analysis through phylogenies helps to understand biological function • Exploit phylogenies for data mining
Methods • Distance-based • Parsimony • Maximum likelihood
Distance Matrices
     a   b   c   d
a    0
b    6   0
c    7   3   0
d   14  10   9   0
[Figure: tree with leaves a, b, c, d drawn against a scale from 0 to 8]
Ultrametric Matrices
     a   b   c   d
a    0
b    2   0
c    6   6   0
d   10  10  10   0
[Figure: tree with leaves a, b, c, d drawn against a scale from 0 to 5]
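What separates the two matrices can be checked mechanically. The sketch below uses the standard three-point condition (in every triple of leaves, the two largest pairwise distances are equal) as the test for ultrametricity. The slides do not state this condition, and the helper names `full_matrix` and `is_ultrametric` are made up for this illustration.

```python
from itertools import combinations

def full_matrix(labels, lower_rows):
    """Expand lower-triangular rows (diagonal included) into a symmetric dict of dicts."""
    dist = {x: {} for x in labels}
    for i, row in enumerate(lower_rows):
        for j, value in enumerate(row):
            dist[labels[i]][labels[j]] = value
            dist[labels[j]][labels[i]] = value
    return dist

def is_ultrametric(labels, dist, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for x, y, z in combinations(labels, 3):
        _, middle, largest = sorted([dist[x][y], dist[x][z], dist[y][z]])
        if abs(largest - middle) > tol:
            return False
    return True

labels = ["a", "b", "c", "d"]
# Matrix from the "Distance Matrices" slide.
general = full_matrix(labels, [[0], [6, 0], [7, 3, 0], [14, 10, 9, 0]])
# Matrix from the "Ultrametric Matrices" slide.
ultra = full_matrix(labels, [[0], [2, 0], [6, 6, 0], [10, 10, 10, 0]])

print(is_ultrametric(labels, general))  # False: the triple (a, b, c) has distances 6, 7, 3
print(is_ultrametric(labels, ultra))    # True
```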
Least Squares • $D_{ij}$ = distance between i and j in the matrix • $d_{ij}$ = distance between i and j in the tree • Objective: find the tree that minimizes the sum of squared differences between $D_{ij}$ and $d_{ij}$
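As a rough illustration of the objective, the sketch below scores a candidate tree against the four-taxon distance matrix by summing $(D_{ij} - d_{ij})^2$ over all leaf pairs. The unweighted form of the sum is an assumption, since the formula itself did not survive on the slide. The internal node names `u` and `v` and the edge lengths are hypothetical values chosen for this example, and the function names are illustrative only.

```python
from itertools import combinations

def tree_distances(edges, leaves):
    """Path lengths between leaves of a tree given as weighted edges (u, v, length)."""
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))

    def distances_from(start):
        # Depth-first walk; in a tree there is exactly one path to each node.
        found = {start: 0.0}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor, length in adjacency[node]:
                if neighbor not in found:
                    found[neighbor] = found[node] + length
                    stack.append(neighbor)
        return found

    return {leaf: distances_from(leaf) for leaf in leaves}

def least_squares_error(D, edges, leaves):
    """Unweighted sum over leaf pairs of (D_ij - d_ij)^2 (assumed form of the objective)."""
    d = tree_distances(edges, leaves)
    return sum((D[(i, j)] - d[i][j]) ** 2 for i, j in combinations(leaves, 2))

leaves = ["a", "b", "c", "d"]
# D_ij: the matrix from the "Distance Matrices" slide.
D = {("a", "b"): 6, ("a", "c"): 7, ("a", "d"): 14,
     ("b", "c"): 3, ("b", "d"): 10, ("c", "d"): 9}

# Hypothetical candidate trees; "u" and "v" are internal nodes.
exact_fit = [("a", "u", 5), ("b", "u", 1), ("u", "v", 1), ("c", "v", 1), ("d", "v", 8)]
worse_fit = [("a", "u", 5), ("b", "u", 1), ("u", "v", 1), ("c", "v", 1), ("d", "v", 7)]

print(least_squares_error(D, exact_fit, leaves))  # 0.0
print(least_squares_error(D, worse_fit, leaves))  # 3.0
```

The first tree reproduces every entry of the matrix, so its error is zero. Shortening the edge to d by one unit makes three pairwise distances each off by one.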
Characters • A character can be a morphological trait or a letter in a column of an alignment • Characters are represented using matrices (rows: characters a to f; columns: taxa A to D)
     A  B  C  D
a    0  1  1  1
b    0  1  1  1
c    0  0  1  1
d    0  1  1  0
e    0  0  0  1
f    1  0  0  0
Parsimony • Goal: Find the tree with the least number of evolutionary changes
     A  B  C  D
a    0  1  1  1
b    0  1  1  1
c    0  0  1  1
d    0  1  1  0
e    0  0  0  1
f    1  0  0  0
[Figure: tree with leaves D, A, B, C; character changes marked on its edges: a, b, f, c, d, d, e]
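One way to count the changes a given tree requires is Fitch's algorithm, a standard small-parsimony method that the slides do not name. The sketch below applies it to the character matrix. The tree shapes are assumptions, because the figure's topology cannot be recovered from the flattened text, and the function names are made up for this example.

```python
def fitch_changes(tree, states):
    """Return (candidate state set, change count) for one character.

    `tree` is a leaf name or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    common = left_set & right_set
    if common:  # children can agree: no change needed at this node
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

TAXA = ["A", "B", "C", "D"]
# Character matrix from the slide: one row per character, one column per taxon.
CHARACTERS = {
    "a": [0, 1, 1, 1],
    "b": [0, 1, 1, 1],
    "c": [0, 0, 1, 1],
    "d": [0, 1, 1, 0],
    "e": [0, 0, 0, 1],
    "f": [1, 0, 0, 0],
}

def parsimony_score(tree):
    """Total number of changes over all characters on the given tree."""
    total = 0
    for values in CHARACTERS.values():
        total += fitch_changes(tree, dict(zip(TAXA, values)))[1]
    return total

# Assumed topologies, written as nested pairs.
print(parsimony_score(("A", ("B", ("C", "D")))))    # 7
print(parsimony_score((("A", "C"), ("B", "D"))))    # 8
```

The first tree needs seven changes, with character d changing twice. That matches the seven edge labels listed for the slide's figure, although it does not prove the figure shows this exact topology.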
Markov models on trees • Observed: The species labeling the leaves • Hidden: The ancestral states • Transition probabilities: The mutation probabilities • Assumptions: • Only mutations are allowed • Sites are independent • Evolution at each site occurs according to a Markov process
Models of evolution at a site • Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$, where $m_{ij}$ = Prob($i \to j$ mutation in 1 time unit) • Different branches of the tree may have different lengths
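A small sketch of how branch length can enter the model: under the Markov assumption, the transition probabilities over a branch of t whole time units are the entries of the matrix power $M^t$. The uniform matrix and the value 0.01 below are hypothetical choices for illustration and do not come from the slides.

```python
BASES = ["A", "C", "T", "G"]

def uniform_mutation_matrix(p):
    """Hypothetical M: each base mutates to each other base with probability p per time unit."""
    return [[1.0 - 3.0 * p if i == j else p for j in range(4)] for i in range(4)]

def multiply(X, Y):
    """Plain matrix product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def branch_matrix(M, t):
    """Transition probabilities over t time units (t a non-negative integer): M to the power t."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    for _ in range(t):
        result = multiply(result, M)
    return result

M = uniform_mutation_matrix(0.01)  # illustrative value only
M5 = branch_matrix(M, 5)

a, c = BASES.index("A"), BASES.index("C")
print(M5[a][a])                  # probability that A is still A after 5 time units
print(M5[a][c])                  # probability that A has become C after 5 time units
print([sum(row) for row in M5])  # every row still sums to 1 (up to rounding)
```

A longer branch moves probability off the diagonal, so distant relatives are more likely to differ at a site.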
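The probability of an assignment • [Figure: tree with root T, internal node G above leaves A and G, and internal node T above leaves C and T] • Probability = $m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$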
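The product on this slide can be evaluated directly once a transition matrix is chosen. The sketch below uses a hypothetical uniform matrix with mutation probability 0.1, a value not taken from the slides, and multiplies one factor per edge of the four-leaf tree. The function names are illustrative only.

```python
def uniform_matrix(p):
    """Hypothetical m: each base mutates to each other base with probability p."""
    bases = "ACTG"
    return {i: {j: (1.0 - 3.0 * p if i == j else p) for j in bases} for i in bases}

def assignment_probability(m, x, y, z, leaves):
    """One factor per edge: root x, internal nodes y and z, leaves (l1, l2) under y and (l3, l4) under z."""
    l1, l2, l3, l4 = leaves
    return m[x][y] * m[y][l1] * m[y][l2] * m[x][z] * m[z][l3] * m[z][l4]

m = uniform_matrix(0.1)  # illustrative value only
# The slide's assignment: root T, internal nodes G and T, leaves A, G, C, T.
p = assignment_probability(m, "T", "G", "T", ("A", "G", "C", "T"))
print(p)  # about 0.000343: three factors of 0.1 and three factors of 0.7
```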
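Ancestral reconstruction: most likely assignment • [Figure: tree with root X, internal node Y above leaves A and G, and internal node Z above leaves C and T] • $L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$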
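For a tree this small, the maximization can be done by brute force over all $4^3 = 64$ choices of X, Y and Z. The sketch below does that with a hypothetical uniform matrix (mutation probability 0.1, not a value from the slides). It is an exhaustive search for illustration, not necessarily the procedure the author had in mind, and the function names are made up for this example.

```python
from itertools import product

BASES = "ACTG"

def uniform_matrix(p):
    """Hypothetical m: each base mutates to each other base with probability p."""
    return {i: {j: (1.0 - 3.0 * p if i == j else p) for j in BASES} for i in BASES}

def most_likely_assignment(m, leaves):
    """Try every (X, Y, Z) and keep the assignment with the largest edge product."""
    l1, l2, l3, l4 = leaves
    best_states, best_value = None, -1.0
    for x, y, z in product(BASES, repeat=3):
        value = m[x][y] * m[y][l1] * m[y][l2] * m[x][z] * m[z][l3] * m[z][l4]
        if value > best_value:
            best_states, best_value = (x, y, z), value
    return best_states, best_value

m = uniform_matrix(0.1)  # illustrative value only
states, likelihood = most_likely_assignment(m, ("A", "G", "C", "T"))
print(states, likelihood)
```

With a uniform matrix and four different leaf bases, several assignments tie for the maximum, so the returned states are just the first best assignment found.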