What is phylogenetic analysis and why should we perform it?
Phylogenetic analysis has two major components:
(1) Phylogeny inference or “tree building”: the inference of the branching orders, and ultimately the evolutionary relationships, between “taxa” (entities such as genes, populations, species, etc.)
(2) Analyzing change in traits (phenotypes, genes), using phylogenies as analytical frameworks for a rigorous understanding of the evolution of traits or conditions of interest
Germline and somatic evolution are both included!
Uses of Phylogenetics in the Study of Health & Disease
• Evolutionary history of humans, between and within species
• Analysis of evolution of phenotypic and genetic traits in humans, especially human-specific traits: when, where, why, and how they evolved
• Evolution of parasites and pathogens, in relation to their hosts (us)
• Evolution of cancer cell lineages, and somatic evolution more generally
• Study of adaptation in humans and other taxa
What you will learn in this lecture
(1) About phylogenies: terminology, what they are, how they work, ‘tree thinking’
(2) How to infer phylogenies
(3) How we can use phylogenies to answer questions related to human adaptation, health and disease
Common Phylogenetic Tree Terminology
[Figure: rooted tree with terminal taxa A, B, C, D, E]
• Terminal nodes: represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
• Branches or lineages
• Internal nodes or divergence points: represent hypothetical ancestors of the taxa
• Ancestral node or ROOT of the tree
Phylogenetic trees diagram the evolutionary relationships between the taxa
[Figure: tree with tips, from top to bottom, Taxon B, Taxon C, Taxon A, Taxon D, Taxon E]
• There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
• The dimension along the branches either can have no scale (for ‘cladograms’), can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time (for ‘ultrametric trees’ or true evolutionary trees).
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses
This says that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
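The nested-parentheses notation can be read mechanically. The following is a minimal sketch, assuming taxon names contain no spaces, commas, or parentheses and that no branch lengths are present; the function names `parse` and `leaves` are illustrative and not part of the lecture.

```python
def parse(text):
    """Parse a nested-parentheses tree such as "((A,(B,C)),(D,E))" into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos                       # otherwise read a taxon name
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    return node()


def leaves(tree):
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


tree = parse("((A,(B,C)),(D,E))")
left_clade, right_clade = tree            # the two sister groups at the root
print(sorted(leaves(left_clade)), "is sister to", sorted(leaves(right_clade)))
```

Each pair of parentheses corresponds to one clade, so the two children of the outermost pair are the sister groups described on the slide.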
Three types of trees
[Figure: the same tree of Taxon A, Taxon B, Taxon C, Taxon D drawn three ways]
• Cladogram: branch lengths have no meaning
• Phylogram: branch lengths are proportional to genetic change (branch-length labels in the figure: 6, 1, 1, 3, 1, 5)
• Ultrametric tree: branch lengths are proportional to time
All show the same evolutionary relationships, or branching orders, between the taxa.
A major goal of phylogeny inference is to resolve the branching orders of lineages in evolutionary trees:
[Figure: three trees of taxa A, B, C, D, E]
• Completely unresolved or "star" phylogeny (a polytomy or multifurcation)
• Partially resolved phylogeny
• Fully resolved, bifurcating phylogeny (every internal node is a bifurcation)
RESOLUTION AND SUPPORT for nodes
There are three possible unrooted trees for four taxa (A, B, C, D)
[Figure: Tree 1, Tree 2, and Tree 3, the three unrooted topologies for taxa A, B, C, D]
Phylogenetic tree building (or inference) methods are aimed at discovering which of the possible unrooted trees is "correct". We would like this to be the “true” biological tree, that is, one that accurately represents the evolutionary history of the taxa. However, we must settle for discovering the computationally correct or optimal tree for the phylogenetic method of choice.
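The number of unrooted trees increases in a greater than exponential manner with the number of taxa:
(2N - 5)!! = # unrooted trees for N taxa
Here !! is the double factorial (the product of every second integer down to 1), which gives 3 trees for N = 4 and 15 trees for N = 5.
[Figure: example unrooted trees with increasing numbers of taxa (A to F)]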
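To get a feel for how quickly this number grows, the formula can be evaluated directly. This is a small sketch; the function name `unrooted_tree_count` is illustrative, and it assumes fully resolved (bifurcating) trees with at least three taxa.

```python
def unrooted_tree_count(n_taxa):
    """Number of fully resolved unrooted trees for n_taxa taxa: (2N - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Double factorial: multiply the odd numbers 3, 5, ..., 2N - 5.
    for k in range(3, 2 * n_taxa - 5 + 1, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, "taxa:", unrooted_tree_count(n), "unrooted trees")
```

Already at a few dozen taxa the count is far too large to examine every tree, which is why tree-building methods rely on search heuristics or clustering.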
Inferring evolutionary relationships between the taxa requires rooting the tree:
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: an unrooted tree of taxa A, B, C, D with the root position marked, and the resulting rooted tree drawn along a TIME axis]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.
Now, try it again with the root at another position:
[Figure: the same unrooted tree of taxa A, B, C, D with the root at a different position, and the resulting rooted tree drawn along a TIME axis]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.
An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees
[Figure: unrooted tree 1 (taxa A, B, C, D) with root positions 1 to 5 marked on its branches, and the resulting rooted trees 1a, 1b, 1c, 1d, 1e]
These trees show five different evolutionary relationships among the taxa!
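The five rootings can be enumerated by placing the root on each branch in turn. The sketch below assumes the unrooted tree groups A with B and C with D (consistent with the earlier rooting examples, but the figure itself is not recoverable); the internal node names `u` and `v` and the function names are hypothetical.

```python
# Unrooted four-taxon tree as a list of branches; "u" and "v" are internal nodes.
EDGES = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]


def adjacency(edges):
    """Build a neighbor list for every node."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def subtree(adj, node, parent):
    """Nested-parentheses string for the part of the tree hanging off `node`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node  # a terminal taxon
    return "(" + ",".join(sorted(subtree(adj, c, node) for c in children)) + ")"


def root_on_branch(adj, a, b):
    """Place the root in the middle of branch a-b."""
    return "(" + ",".join(sorted([subtree(adj, a, b), subtree(adj, b, a)])) + ")"


adj = adjacency(EDGES)
rooted = [root_on_branch(adj, a, b) for a, b in EDGES]
for tree in rooted:
    print(tree)
```

Sorting the children only standardizes how each tree is written; the five printed strings differ in their grouping of taxa, not merely in drawing order.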
[Figure: several rearranged drawings of rooted tree 1a (taxa A, B, C, D)]
All of these rearrangements show the same evolutionary relationships between the taxa
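One way to check that two drawings show the same relationships is to compare the sets of clades they contain, ignoring the order in which branches are drawn. This is a minimal sketch; the example topology (((A,B),C),D) is hypothetical, since the actual shape of rooted tree 1a is not recoverable from the slide, and the function names are illustrative.

```python
def leaves_and_clades(tree):
    """Return (taxa below this node, set of all clades at or below this node)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    taxa = frozenset()
    clades = set()
    for child in tree:
        child_taxa, child_clades = leaves_and_clades(child)
        taxa |= child_taxa
        clades |= child_clades
    clades.add(taxa)  # every internal node defines one clade
    return taxa, clades


def same_relationships(tree1, tree2):
    """True if both rooted trees contain exactly the same clades."""
    return leaves_and_clades(tree1)[1] == leaves_and_clades(tree2)[1]


drawing_1 = ((("A", "B"), "C"), "D")
drawing_2 = ("D", ("C", ("B", "A")))      # same tree with every node flipped
different = ((("A", "C"), "B"), "D")      # A now groups with C instead of B

print(same_relationships(drawing_1, drawing_2))
print(same_relationships(drawing_1, different))
```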
Main way to root trees:
By outgroup: Uses taxa (the “outgroup”) that are known to fall outside of the group of interest (the “ingroup”). Requires some prior knowledge about the relationships among the taxa.
[Figure: tree with the outgroup labeled]
Molecular phylogenetic tree building methods:
Are mathematical and/or statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them. There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified as follows:

                                 COMPUTATIONAL METHOD
DATA TYPE     Optimality criterion                  Clustering algorithm
Characters    PARSIMONY, MAXIMUM LIKELIHOOD         -
Distances     MINIMUM EVOLUTION, LEAST SQUARES      UPGMA, NEIGHBOR-JOINING
Types of data used in phylogenetic inference:
Character-based methods: Use the aligned characters, such as DNA or protein sequences, directly during tree inference.

Taxa         Characters
Species A    ATCGCTAGTCCTATAGTGCA
Species B    ATCGCTAGTCCTATATTGCA
Species C    TTCGCTAGACCTGTGGTCCA
Species D    TTGACCAGACCTGTGGTCCG
Species E    TTGACCAGTTCTGTGGTCCG
ETC          ETC
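Distance methods, by contrast, first reduce the alignment to a table of pairwise distances. The sketch below computes the simplest such measure, the uncorrected proportion of differing sites (often called the p-distance), for the sequences above. Real analyses usually apply a substitution-model correction, which is omitted here; the function name `p_distance` is illustrative.

```python
from itertools import combinations

ALIGNMENT = {
    "Species A": "ATCGCTAGTCCTATAGTGCA",
    "Species B": "ATCGCTAGTCCTATATTGCA",
    "Species C": "TTCGCTAGACCTGTGGTCCA",
    "Species D": "TTGACCAGACCTGTGGTCCG",
    "Species E": "TTGACCAGTTCTGTGGTCCG",
}


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


for name1, name2 in combinations(ALIGNMENT, 2):
    d = p_distance(ALIGNMENT[name1], ALIGNMENT[name2])
    print(f"{name1} vs {name2}: {d:.2f}")
```

Note that this step discards information: the distance table records how many sites differ, but no longer which ones.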
Similarity vs. Evolutionary Relationship:
Similarity and relationship are not the same thing, even though evolutionary relationship is inferred from certain types of similarity.
Similar: having likeness or resemblance (an observation)
Related: genetically connected (a historical fact)
Two taxa can be most similar without being most closely related:
[Figure: phylogram of Taxon B (e.g., HUMANS!), Taxon C, Taxon A, Taxon D, with branch-length labels 6, 1, 1, 3, 1, 5]
C is more similar in sequence to A (d = 3) than to B (d = 7), but C and B are most closely related (that is, C and B shared a common ancestor more recently than either did with A).
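The two distances quoted above can be reproduced by summing branch lengths along the path between taxa. The sketch below assumes one assignment of the figure's branch-length labels that is consistent with d = 3 and d = 7 (B = 6, C = 1, the branch above B and C = 1, A = 1, the branch above A, B, C = 3, D = 5); the node names `n1`, `n2`, `root` and the function names are hypothetical.

```python
# child -> (parent, branch length); an assumed reading of the phylogram.
PARENT = {
    "B": ("n1", 6),
    "C": ("n1", 1),
    "n1": ("n2", 1),
    "A": ("n2", 1),
    "n2": ("root", 3),
    "D": ("root", 5),
}


def distances_to_ancestors(node):
    """Map the node and each of its ancestors to the path length from the node."""
    dist = {node: 0}
    total = 0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def path_distance(taxon1, taxon2):
    """Sum of branch lengths on the path joining two taxa."""
    d1 = distances_to_ancestors(taxon1)
    d2 = distances_to_ancestors(taxon2)
    # The most recent common ancestor gives the shortest combined path.
    return min(d1[n] + d2[n] for n in d1 if n in d2)


print("d(C, A) =", path_distance("C", "A"))
print("d(C, B) =", path_distance("C", "B"))
```

C sits closer to A along the branches even though C and B share the more recent ancestor (`n1`); the long branch leading to B accounts for the difference.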
Main computational approach:
Optimality approaches: Use either character or distance data. First define an optimality criterion (minimum branch lengths, fewest number of events, highest likelihood), and then use a specific algorithm for finding trees with the best value for the objective function. Can identify many equally optimal trees, if such exist.
Warning: Finding an optimal tree is not necessarily the same as finding the “true” tree. Random data will give you an ‘optimal’ (best) tree!
Parsimony methods:
Optimality criterion: The ‘most-parsimonious’ tree is the one that requires the fewest number of evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
Advantages:
• Are simple, intuitive, and logical (many possible by ‘pencil-and-paper’).
• Can be used on molecular and non-molecular (e.g., morphological) data.
• Can be used for character (can infer the exact substitutions) and rate analysis.
• Can be used to infer the sequences of the extinct (hypothetical) ancestors.
Disadvantages:
• Not explicitly statistical
• Can be fooled by high levels of parallel evolution
Use parsimony to infer the optimal (best) tree
Character-based methods: Use the aligned characters, such as DNA or protein sequences, directly during tree inference.

Taxa         Characters
Species A    ATCG CTAGACCTATAGTGCA
Species B    ATCG CTAGACCTATATTGCA
Species C    TTCG CTAGACCTGTGGTCCA
Species D    TTGA CCAGACCTGTGGTCCG
Species E    TTGA CCAGTTGTGTGGTCCG
OUTGROUP     TTAC CCATTTGTGTCCTCCG

Infer the maximum parsimony tree using the first four characters.
Quality of trees (how likely it is that they reflect the one True Tree) can be evaluated in various ways (random data will give you a low-quality ‘best’ tree)
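The slide does not name an algorithm for counting changes on a tree; Fitch's algorithm is one standard choice for scoring a fixed, bifurcating tree, and is sketched here on the first four characters. The two candidate topologies are hypothetical examples chosen for comparison, not trees taken from the lecture, and a full analysis would score every possible tree rather than two.

```python
SITES = {
    "A": "ATCG", "B": "ATCG", "C": "TTCG",
    "D": "TTGA", "E": "TTGA", "OUTGROUP": "TTAC",
}


def fitch(tree, site):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):
        return {SITES[tree][site]}, 0
    left, right = tree                    # assumes a bifurcating tree
    left_states, left_cost = fitch(left, site)
    right_states, right_cost = fitch(right, site)
    shared = left_states & right_states
    if shared:                            # children agree: no extra change
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, n_sites=4):
    """Total number of changes the tree requires across all sites."""
    return sum(fitch(tree, site)[1] for site in range(n_sites))


# Two hypothetical candidate trees, both rooted with the outgroup.
tree_1 = ("OUTGROUP", (("D", "E"), ("C", ("A", "B"))))
tree_2 = ("OUTGROUP", (("A", "D"), ("B", ("C", "E"))))

print("tree 1 requires", parsimony_score(tree_1), "changes")
print("tree 2 requires", parsimony_score(tree_2), "changes")
```

Under the parsimony criterion, the tree requiring fewer changes is preferred.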
We can statistically compare alternative trees, corresponding to specific biological hypotheses of the history of some set of lineages
Timescales on trees: molecular clocks
[Figure: % genetic divergence (25% to 100%) plotted against time since divergence (300 to 1500 Myr) for fibrinopeptides, hemoglobin, cytochrome c, and histone IV]
Why such different profiles? Variation in mutation rate? Variation in selection: genes coding for some molecules are under very strong stabilizing selection.
Dates for calibrating molecular clocks can come from geology, fossils, or historical data
[Figure: calibration from known ages of islands, for two genes]
Calibrating using fossil data
[Figure: chimps and humans separated by 6 substitutions; whales and hippos separated by 60 substitutions, with a fossil date of 56 mya]
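The arithmetic behind such a calibration can be sketched as follows. This assumes one reading of the figure: that 56 mya is the fossil date for the whale-hippo split, that the substitution counts are for the same gene in both pairs, and that the rate is constant across lineages (a strict clock). The function names are illustrative.

```python
def clock_rate(substitutions, divergence_mya):
    """Substitutions accumulated per million years since a dated split."""
    return substitutions / divergence_mya


def estimate_divergence(substitutions, rate):
    """Time (in millions of years) needed to accumulate the substitutions."""
    return substitutions / rate


# Calibrate on the dated pair, then apply the rate to the undated pair.
rate = clock_rate(substitutions=60, divergence_mya=56)
human_chimp_mya = estimate_divergence(substitutions=6, rate=rate)

print(f"rate: {rate:.2f} substitutions per million years")
print(f"estimated chimp-human divergence: {human_chimp_mya:.1f} mya")
```

Under these assumptions the chimp-human split comes out at roughly a tenth of the whale-hippo date, because it shows a tenth as many substitutions.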
Calibrating from the known dates of samples: for very fast-evolving taxa such as HIV
Uses of Phylogenetics in the Study of Health & Disease
• Evolutionary history of humans, between and within species
• Analysis of evolution of phenotypic and genetic traits in humans, especially human-specific traits: when, where, why, and how they evolved
• Taxonomy and evolution of parasites and pathogens, and evolution in relation to their hosts
• Evolution of cancer cell lineages, and somatic evolution more generally
• Study of adaptation in humans and other taxa, via analysis of divergence and convergence
EMERGING VIRUSES - THE GREATEST KNOWN HEALTH THREAT TO HUMANITY
VIRUS - what IS it? Sequence its genome and relate the sequence to known viruses
Evolution of SIV and HIV viruses: multiple transfers to humans, from chimps and from sooty mangabeys
SARS (severe acute respiratory syndrome): what causes it and where did it come from?
HIV phylogeny within humans in different regions: Haiti as stepping stone to North America
HIV evolves very rapidly WITHIN hosts, as a result of interactions with the immune system
Can do phylogenetics of:
• Pathogens within individuals
• Pathogens between individuals (e.g., in different or the same regions)
How do they originate? From other species? How do they spread?
How does resistance to antibiotics evolve in pathogens, and resistance to chemotherapeutic agents evolve in cancer?
Cancer evolves genetically in the body during carcinogenesis, allowing the inference of ‘oncogenetic trees’
Cytogenetic data: gains and losses of chromosomal regions during the evolution of cancers; tumor suppressor gene copies are lost and oncogene copies are gained
Involves losses of heterozygosity and losses of imprinting
Cancer Evolutionary Phylogenomics Compare primary cancer with metastatic tumors
What you learned in this lecture
(1) About phylogenies: terminology, what they are, how they work, ‘tree thinking’
(2) How to infer and evaluate phylogenies
(3) How to use phylogenies to answer questions related to human adaptation, health and disease (viruses, cancer, etc.)
(4) How to THINK in terms of evolutionary trees (historical patterns of evolution), within and between species