Phylogeny
Presented by Dr. Shazzad Hosain, Asst. Prof., EECS, NSU

What is phylogenetics?
Phylogenetics is the study of evolutionary relationships among and within species.
[Figure: an example of a phylogenetic tree, with tips crocodiles, birds, lizards, snakes, rodents, primates and marsupials]

Applications of phylogenetics
• Forensics: Did a patient's HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?
• Medicine: What are the evolutionary relationships among the various prion-related diseases?
To be continued…

Phylogenetic concepts: Interpreting a Phylogeny
[Figure: tree with tips Sequence A to Sequence E drawn along a time axis]
Which sequence is most closely related to B? A, because B diverged from A more recently than from any other sequence.
Physical position in the tree is not meaningful! Only tree structure matters.

Phylogenetic concepts: Rooted and Unrooted Trees
[Figure: trees over tips A, B, C and D drawn along a time axis, with the root and candidate root positions ("?", "X") marked]

Rooting and Tree Interpretation
[Figure: alternative rootings of a tree of chicken, human, fruit fly, oak, bacteria and archaebacteria, with the characters bones (+/-) and cell nuclei (+/-) marked]

Rooting Methods: Outgroup
Rooting a network of relationships
Given an unrooted network of relationships among four species of Carnivora [left], outgroup rooting uses an additional taxon (the outgroup) known from independent evidence to be less closely related to each of the other species (the ingroup) than they are to each other. The root is then placed on the branch between the outgroup and the ingroup. In this case, Lynx is a feloid carnivore in a separate superfamily from the four canoid carnivores. Including Lynx in the network analysis places it on the internode. This method requires accurate information about ingroup/outgroup relationships.

How Many Trees? (assuming bifurcation only)
                          Unrooted trees              Rooted trees
# sequences  # pairwise   # trees        # branches   # trees        # branches
             distances                   per tree                    per tree
3            3            1              3            3              4
4            6            3              5            15             6
5            10           15             7            105            8
6            15           105            9            945            10
10           45           2,027,025      17           34,459,425     18
30           435          8.69 × 10^36   57           4.95 × 10^38   58
In general, for N sequences: $N(N-1)/2$ pairwise distances; $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$ unrooted trees with $2N-3$ branches each; $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$ rooted trees with $2N-2$ branches each.

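The general-N formulas in the table can be checked numerically. The following is a minimal Python sketch; the function names are illustrative and not part of the slides.

```python
from math import factorial


def num_pairwise_distances(n: int) -> int:
    """Number of pairwise distances among n sequences: N(N-1)/2."""
    return n * (n - 1) // 2


def num_unrooted_trees(n: int) -> int:
    """Bifurcating unrooted trees: (2N-5)! / (2^(N-3) * (N-3)!), for N >= 3."""
    if n < 3:
        raise ValueError("formula applies to n >= 3")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def num_rooted_trees(n: int) -> int:
    """Bifurcating rooted trees: (2N-3)! / (2^(N-2) * (N-2)!), for N >= 2."""
    if n < 2:
        raise ValueError("formula applies to n >= 2")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10):
    print(n, num_pairwise_distances(n), num_unrooted_trees(n), num_rooted_trees(n))
```

The counts grow so quickly that exhaustive enumeration of all trees is only feasible for a handful of sequences.
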
Tree Properties
• Ultrametricity: all tips are an equal distance from the root.
• Additivity: the distance between any two tips equals the total branch length between them.
[Figure: two rooted trees with tips X and Y and branches a, b, c, d, e. Ultrametric tree: a = b + c + d + e. Additive tree: XY = a + b + c + d + e.]
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.

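As a rough illustration of the ultrametricity property, the sketch below computes root-to-tip distances and checks that they are all equal. The nested-list tree encoding and both example trees are assumptions made for this sketch, not data from the slides.

```python
def tip_depths(node, depth=0.0):
    """Return {tip name: distance from the root}.

    A tip is a string; an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {node: depth}
    depths = {}
    for child, length in node:
        depths.update(tip_depths(child, depth + length))
    return depths


def is_ultrametric(tree, tol=1e-9):
    """True if every tip is the same distance from the root (within tol)."""
    depths = list(tip_depths(tree).values())
    return max(depths) - min(depths) <= tol


# Hypothetical trees: in the first, every tip is 3.0 from the root.
clock_like = [("X", 3.0), ([("Y", 1.0), ("Z", 1.0)], 2.0)]
# In the second, tip Z sits on a longer branch, so the tree is not ultrametric.
non_clock = [("X", 3.0), ([("Y", 1.0), ("Z", 4.0)], 2.0)]

print(is_ultrametric(clock_like), is_ultrametric(non_clock))
```
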
Terminology
• External nodes: the things under comparison; operational taxonomic units (OTUs)
• Internal nodes: hypothetical ancestral units; the goal is to group current-day units
• Root: common ancestor of all OTUs under study; the path from the root to a node defines an evolutionary path
• Unrooted tree: specifies relationships but not the evolutionary path
• Given an outgroup (an external reason to believe a certain OTU branched off first), the tree can be rooted
• Topology: branching pattern of a tree
• Branch length: amount of difference that occurred along a branch

Phylogeny Applications
• Tree of Life: analyzing changes that have occurred in the evolution of different organisms http://tolweb.org/tree/phylogeny.html
• Phylogenetic relationships among genes can help predict which ones might have similar functions (e.g., ortholog detection)
• Follow changes occurring in rapidly changing species (e.g., HIV)

Phylogeny Packages
• PHYLIP, Phylogenetic Inference Package
  • evolution.genetics.washington.edu/phylip.html
  • Felsenstein
  • Free!
• PAUP, Phylogenetic Analysis Using Parsimony
  • paup.csit.fsu.edu
  • Swofford

Similarity vs. Homology
• Similar: sequences resemble one another
• Homolog: sequences derived from a common ancestor
• Ortholog: homologous sequences in different species, separated by a speciation event
• Paralog: homologous sequences separated by a gene duplication event, often within the same species

Ortholog vs. Paralog
• Ortholog
  • sequence divergence occurs after speciation
  • hence can be used for the phylogeny of organisms
• Paralog
  • gene duplication occurs before speciation
  • hence not suitable for the phylogeny of organisms

Homoplasy
• Sequence similarity NOT due to common ancestry
• May arise due to parallelism or convergent evolution
• Parallelism or parallel evolution
  • the development of a similar trait in related, but distinct, species descending from the same ancestor, but from different clades
• Convergent evolution

Parallel evolution
Parallel evolution occurs when two species that have descended from the same ancestor remain similar over long periods of time because they independently acquire the same evolutionary adaptations. Parallel evolution occurs because genetically related species adapt to similar environmental changes in similar ways. After many years, the organisms may still resemble each other, even though they speciated in the distant past.

Convergent evolution
When species from different ancestors colonize the same environment, they may independently acquire the same adaptations. Convergent evolution is the process by which species descended from different ancestors become superficially similar because they are adapting to the same environment.

Phylogeny of what?
• Organisms
• Whole genome phylogeny
• Ribosomal RNA (surrogate for whole genome)
• Strains (closely related microbes)
• Individual genes (or gene families)
• Repetitive DNA sequences
• Metabolic pathways
• Secondary structures
• Any discrete character(s)
• Human languages
• Microbial communities

Why compute phylogenetic trees?
• Understand evolutionary history
• Map pathogen strain diversity for vaccines
• Assist in epidemiology
  • Of infectious diseases
  • Of genetic defects
• Aid in prediction of function of novel genes
• Biodiversity studies
• Understanding microbial ecologies

Computational Approaches to Phylogenetic Tree Computation
• Distance Based Methods
  • UPGMA
  • Neighbor joining
• Character State Methods
  • Maximum Parsimony Method
  • Maximum Likelihood Methods
• Tree merging
  • Consensus trees, super-trees

What data is used to build trees?
• Traditionally: morphological features (e.g., number of legs, beak shape, etc.)
• Today: mostly molecular data (e.g., DNA and protein sequences)

Data for Phylogeny
• Can be classified into two categories:
• Numerical data
  • Distance between objects
  • e.g., distance(man, mouse) = 500, distance(man, chimp) = 100
  • Usually derived from sequence data
• Discrete characters
  • Each character has a finite number of states
  • e.g., number of legs = 1, 2, 4
  • DNA = {A, C, T, G}

2. Determine the evolutionary distances and build the distance matrix: a simple example
• Sequence 1: AGGCCATGAATTAAGAATAA
• Sequence 2: AGCCCATGGATAAAGAGTAA
• Sequence 3: AGGACATGAATTAAGAATAA
• Sequence 4: AAGCCAAGAATTACGAATAA
[Figure: distance matrix]
In this example the evolutionary distance is expressed as the number of nucleotide differences for each sequence pair, divided by the sequence length. For example, sequences 1 and 2 are 20 nucleotides in length and have four differences, corresponding to an evolutionary distance of 4/20 = 0.2.

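The distance calculation described above can be sketched in a few lines of Python. The four sequences are the ones on the slide; the function names are illustrative, and the sketch assumes the sequences are already aligned and of equal length.

```python
from itertools import combinations

sequences = [
    "AGGCCATGAATTAAGAATAA",
    "AGCCCATGGATAAAGAGTAA",
    "AGGACATGAATTAAGAATAA",
    "AAGCCAAGAATTACGAATAA",
]


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances with zeros on the diagonal."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = p_distance(seqs[i], seqs[j])
    return matrix


matrix = distance_matrix(sequences)
for row in matrix:
    print("  ".join(f"{value:.2f}" for value in row))
```

A raw difference count like this ignores multiple substitutions at the same site, so it is only a reasonable proxy for evolutionary distance when sequences are closely related.
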
3. Phylogenetic Tree Construction example (UPGMA algorithm)
UPGMA (Michener & Sokal 1957)
1. Pick the smallest entry Dij in the distance matrix.
2. Join the two intersecting species and assign branch length Dij/2 to each of the two nodes.
3. Compute new distances from the joined cluster to the other species using arithmetic means.
4. Repeat steps 1 to 3 until all species are joined.
Worked example:
• Round 1: Bear and Raccoon are joined, with Dij/2 = 0.13.
• Round 2: Seal is joined to (Bear, Raccoon), with Dij/2 = 0.1825.
• Round 3: Weasel is joined to (Bear, Raccoon, Seal), with Dij/2 = 0.2. Done!

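The UPGMA steps above can be sketched as follows. The distance matrix is not preserved in this transcript, so the values below are an assumption, chosen only because they reproduce the values 0.13, 0.1825 and 0.2 shown on the slides. The function name and data layout are likewise illustrative.

```python
from itertools import combinations


def upgma(names, dist):
    """Minimal UPGMA sketch.

    dist maps frozenset({a, b}) -> distance.
    Returns a list of (cluster_a, cluster_b, node_height) in merge order.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of tips
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Step 1: pick the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        # Step 2: the new node sits at half the distance between them.
        height = d[frozenset((a, b))] / 2
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Step 3: size-weighted arithmetic mean of distances to every other cluster.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
        merges.append((a, b, height))
    return merges


# ASSUMED distances (not recoverable from the slides).
names = ["bear", "raccoon", "weasel", "seal"]
dist = {
    frozenset(("bear", "raccoon")): 0.26,
    frozenset(("bear", "weasel")): 0.34,
    frozenset(("bear", "seal")): 0.29,
    frozenset(("raccoon", "weasel")): 0.42,
    frozenset(("raccoon", "seal")): 0.44,
    frozenset(("weasel", "seal")): 0.44,
}

merges = upgma(names, dist)
for a, b, height in merges:
    print(f"join {a} + {b} at height {height:.4f}")
```

Weighting by cluster size keeps each new distance equal to the plain average over all tip pairs, which is what the "arithmetic means" step amounts to once clusters contain more than one species.
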
Downside of UPGMA
• Assumes a molecular clock (the evolutionary rate is approximately constant)
• Generates only rooted trees
• Trees are ultrametric
• Doesn't work in the following case: [figure not preserved]

Neighbor-joining method
• Developed in 1987 by Saitou and Nei
• Works in a similar fashion to UPGMA
• Still fast; works well for large datasets
• Doesn't require the data to be ultrametric
• Well suited to data with widely varying evolutionary rates

How to construct a tree with the neighbor-joining method?
• Step 1: For each leaf x, sum all distances from x and divide by (leaves - 2)
  • Sx = (sum of all Dx) / (leaves - 2)
• Step 2: Find the pair with the smallest M
  • Mij = Dij - Si - Sj
• Step 3: Create a node U that joins the pair (i, j) with the lowest Mij, with branch lengths
  • SiU = (Dij / 2) + (Si - Sj) / 2
  • SjU = (Dij / 2) + (Sj - Si) / 2

How to construct a tree with the neighbor-joining method? (continued)
• Step 4: Join i and j according to S and arrange all other taxa in the form of a star
• Step 5: Recalculate the distance matrix, with the distance from every other taxon x to U given by
  • DxU = (Dix + Djx - Dij) / 2

Example of Neighbor-joining
• Step 1: S calculation: Sx = (sum of all Dx) / (leaves - 2)
  • S(A) = (5 + 4 + 7 + 6 + 8) / 4 = 7.5
  • S(B) = (5 + 7 + 10 + 9 + 11) / 4 = 10.5
  • S(C) = (4 + 7 + 7 + 6 + 8) / 4 = 8
  • S(D) = (7 + 10 + 7 + 5 + 9) / 4 = 9.5
  • S(E) = (6 + 9 + 6 + 5 + 8) / 4 = 8.5
  • S(F) = (8 + 11 + 8 + 9 + 8) / 4 = 11

Example of Neighbor-joining cont 1
• Step 2: Find the pair with the smallest M
  • Mij = Dij - Si - Sj
• The smallest values are
  • M(AB) = d(AB) - S(A) - S(B) = 5 - 7.5 - 10.5 = -13
  • M(DE) = d(DE) - S(D) - S(E) = 5 - 9.5 - 8.5 = -13

Example of Neighbor-joining cont 2
• Step 3: Create a node U
  • SiU = (Dij / 2) + (Si - Sj) / 2
• U1 joins A and B:
  • S(AU1) = d(AB) / 2 + (S(A) - S(B)) / 2 = 5 / 2 + (7.5 - 10.5) / 2 = 1
  • S(BU1) = d(AB) / 2 + (S(B) - S(A)) / 2 = 5 / 2 + (10.5 - 7.5) / 2 = 4

Example of Neighbor-joining cont 3
• Step 4: Join A and B according to S, and arrange all other taxa in the form of a star. Branches in black are of unknown length; branches in red are of known length.

Example of Neighbor-joining cont 4
• Step 5: Calculate the new distance matrix
  • DxU = (Dix + Djx - Dij) / 2
  • d(CU) = (d(AC) + d(BC) - d(AB)) / 2 = (4 + 7 - 5) / 2 = 3
  • d(DU) = (d(AD) + d(BD) - d(AB)) / 2 = (7 + 10 - 5) / 2 = 6
  • d(EU) = (6 + 9 - 5) / 2 = 5 and d(FU) = (8 + 11 - 5) / 2 = 7 are computed the same way
• Then we get the new distance matrix

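One round of neighbor-joining (steps 1 to 5) can be reproduced with the short Python sketch below. The pairwise distances are read off the sums in step 1 of the example; the function name and dictionary layout are assumptions made for illustration.

```python
from itertools import combinations

taxa = ["A", "B", "C", "D", "E", "F"]
# Pairwise distances, as they appear in the step 1 sums of the example.
raw = {
    ("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 7, ("A", "E"): 6, ("A", "F"): 8,
    ("B", "C"): 7, ("B", "D"): 10, ("B", "E"): 9, ("B", "F"): 11,
    ("C", "D"): 7, ("C", "E"): 6, ("C", "F"): 8,
    ("D", "E"): 5, ("D", "F"): 9, ("E", "F"): 8,
}
D = {frozenset(pair): value for pair, value in raw.items()}


def nj_round(taxa, D):
    """Run one neighbor-joining round; return S, M, joined pair, branch lengths, new distances."""
    n = len(taxa)
    # Step 1: S_x = (sum of distances from x) / (leaves - 2)
    S = {x: sum(D[frozenset((x, y))] for y in taxa if y != x) / (n - 2) for x in taxa}
    # Step 2: M_ij = D_ij - S_i - S_j; choose the smallest (first one wins a tie)
    M = {(i, j): D[frozenset((i, j))] - S[i] - S[j] for i, j in combinations(taxa, 2)}
    i, j = min(M, key=M.get)
    # Step 3: branch lengths from i and j to the new node U
    d_ij = D[frozenset((i, j))]
    branch_i = d_ij / 2 + (S[i] - S[j]) / 2
    branch_j = d_ij - branch_i
    # Step 5: distances from every remaining taxon x to U
    to_u = {
        x: (D[frozenset((i, x))] + D[frozenset((j, x))] - d_ij) / 2
        for x in taxa
        if x not in (i, j)
    }
    return S, M, (i, j), (branch_i, branch_j), to_u


S, M, joined, branches, to_u = nj_round(taxa, D)
print("S:", S)
print("joined:", joined, "branch lengths:", branches)
print("distances to U:", to_u)
```

M(AB) and M(DE) tie at -13 in this example; the sketch simply takes the first pair encountered, which matches the slides' choice of A and B.
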
Example of Neighbor-joining cont 5
• Repeat steps 1 to 5 until all branches are resolved
• In this example, we will get this at the end: [figure not preserved]

Downside of Neighbor-joining
• Generates only one possible tree
• Generates only an unrooted tree

Maximum Parsimony Method
• Parsimony score: the number of character changes (mutations) along the evolutionary tree (a tree with labels on its internal vertices)
• Example: [Figure: two labeled trees over three-letter sequences such as AAA, AGA, GGA and AAG, with change counts on the branches; one tree has score 3, the other score 4]
• Most parsimonious tree: the tree with the minimal parsimony score
• Minimal Evolution Principle

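The parsimony score of a fully labeled tree can be computed by summing the character changes along every branch. The sketch below does this in Python. The two trees are a plausible reconstruction from the sequence labels on the slide rather than a faithful copy of the figure, and the tuple-based tree encoding is an assumption.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length labels differ."""
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(tree) -> int:
    """Total number of changes over all branches of a labeled tree.

    A node is (label, children); a leaf has an empty list of children.
    """
    label, children = tree
    return sum(hamming(label, child[0]) + parsimony_score(child) for child in children)


# Hypothetical labeled trees over the leaves AAA, AAG, AGA and GGA.
tree_a = ("AAA", [
    ("AAA", [("AAA", []), ("AAG", [])]),
    ("AGA", [("AGA", []), ("GGA", [])]),
])
tree_b = ("AAA", [
    ("AAA", [("AAA", []), ("GGA", [])]),
    ("AAA", [("AGA", []), ("AAG", [])]),
])

print(parsimony_score(tree_a), parsimony_score(tree_b))
```

This only scores a tree whose internal labels are already given. Finding the best internal labels for a fixed topology, or the best topology overall, requires a further search step that is not shown here.
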