
Building Trees


dafydd

Presentation Transcript


  1. Building Trees

  2. What is a Tree? • A tree is a visualization of the mathematical analysis of a comparison of characteristics in multiple individuals or species. The multiples can also be tissues or developmental stages in the case of microarrays. • The closer branches share more similarities and the more distant branches are less similar.

  3. Phylogeny (phylo = tribe + genesis) 1. Phylogeny inference or “tree building”: the inference of the branching orders, and ultimately the evolutionary relationships, between “taxa” (entities such as genes, populations, species, etc.) 2. Character and rate analysis: using phylogenies as analytical frameworks for rigorous understanding of the evolution of various traits or conditions of interest

  4. Start with a group of species and establish relationships based on measurements: birds, snakes, rodents, primates, crocodiles, marsupials, lizards

  5. This is an example of a phylogenetic tree. [Figure: tree with tips crocodiles, birds, lizards, snakes, rodents, primates, marsupials]

  6. Homology & Similarity • Homology • Conserved sequences arising from a common ancestor • Orthologs: homologous genes that share a common ancestor in the absence of any gene duplication (Mouse and Human Hemoglobin) • Paralogs: genes related through gene duplication (one gene is a copy of another - Fetal and Adult Hemoglobin) • Similarity • Genes that share common sequences but are not necessarily related

  7. Sequences As Modules • Proteins are derived from a limited number of basic building blocks (modules) • Evolution has shuffled these modules, giving rise to a diverse repertoire of protein sequences • Proteins can share a global relationship, or a local relationship specific to a single domain [Figure: global vs. local relationship]

  8. Sequence Domains • Modules define functional/structural domains

  9. Defining A Sequence Family [Figure: sequences grouped into Family A, Family B, Family C, Family D, Family E]

  10. Global vs. Local Alignments • Global • Search for alignments, matching over entire sequences • Local • Examine regions of sequence for conserved segments • Both Consider: Matches, Mismatches, Gaps
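The difference between the two alignment modes can be sketched with a small dynamic-programming scorer. This is a minimal illustration, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) that is not taken from the slides; the function name `alignment_score` and the test strings are hypothetical. With `local=False` it follows the Needleman-Wunsch recurrence (whole sequences must be aligned); with `local=True` it follows the Smith-Waterman recurrence (scores are floored at zero, so only the best conserved segment counts).

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score by dynamic programming with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised along the first row and column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local: a segment may start anywhere, so never go below zero.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Two made-up sequences that share only the segment "CREA".
print(alignment_score("XXXCREAYYY", "ZZCREAZZ", local=False))
print(alignment_score("XXXCREAYYY", "ZZCREAZZ", local=True))
```

The local score rewards only the shared segment, while the global score is pulled down by the unrelated flanking residues and the length difference.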

  11. Global Sequence Alignments: Yeast Prion-Like Proteins

  12. How To Make A Global MSA • On The Web • http://pir.georgetown.edu/pirwww/search/multaln.html • On Your Computer • ClustalX: http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/

  13. MSA Example Sequences (Standard FASTA Sequence Format)
      >KSYK_HUMAN
      FFFGNITREEAEDYLVQGGMSDGLYLLRQSRNYLGGFALSVAHGRKAHHYTIERELNGTYAIAGGRTHASPADLCHYH
      >ZA70_HUMAN
      WYHSSLTREEAERKLYSGAQTDGKFLLRPRKEQGTYALSLIYGKTVYHYLISQDKAGKYCIPEGTKFDTLWQLVEYL
      >KSYK_PIG
      WFHGKISRDESEQIVLIGSKTNGKFLIRARDNGSYALGLLHEGKVLHYRIDKDKTGKLSIPGGKNFDTLWQLVEHY
      >MATK_HUMAN
      WFHGKISGQEAVQQLQPPEDGLFLVRESARHPGDYVLCVSFGRDVIHYRVLHRDGHLTIDEAVFFCNLMDMVEHY
      >CSK_CHICK
      WFHGKITREQAERLLYPPETGLFLVRESTNYPGDYTLCVSCEGKVEHYRIIYSSSKLSIDEEVYFENLMQLVEHY
      >CRKL_HUMAN
      WYMGPVSRQEAQTRLQGQRHGMFLVRDSSTCPGDYVLSVSENSRVSHYIINSLPNRRFKIGDQEFDHLPALLEFY
      >YES_XIPHE
      WYFGKLSRKDTERLLLLPGNERGTFLIRESETTKGAYSLSLRDWDETKGDNCKHYKIRKLDNGGYYITTRTQFMSLQMLVKHY
      >FGR_HUMAN
      WYFGKIGRKDAERQLLSPGNPQGAFLIRESETTKGAYSLSIRDWDQTRGDHVKHYKIRKLDMGGYYITTRVQFNSVQELVQHY
      >SRC_RSVP
      WYFGKITRRESERLLLNPENPRGTFLVRKSETAKGAYCLSVSDFDNAKGPNVKHYKIYKLYSGGFYITSRTQFGSLQQLVAYY
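As a rough sketch of how FASTA input like this is read before alignment, the function below splits FASTA text into named sequences. The name `parse_fasta` is hypothetical, and the parser assumes the plain layout used on the slide: a `>` header line whose first word is the sequence name, followed by one or more sequence lines. The two records are copied from the slide, wrapped over several lines to show that wrapped sequence lines are joined.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the first word after '>' is used as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence line: may be one of several wrapped lines.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = """
>KSYK_HUMAN
FFFGNITREEAEDYLVQGGMSDGLYLLRQSRNYLGGFALS
VAHGRKAHHYTIERELNGTYAIAGGRTHASPADLCHYH
>ZA70_HUMAN
WYHSSLTREEAERKLYSGAQTDGKFLLRPRKEQGTYALS
LIYGKTVYHYLISQDKAGKYCIPEGTKFDTLWQLVEYL
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```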

  14. MSA Example Result
      YES_XIPHE   WYFGKLSRKDTERLLLLPGNERGTFLIRESETTKGAYSLSLRDWDETKGDNCKHYKIRKL
      FGR_HUMAN   WYFGKIGRKDAERQLLSPGNPQGAFLIRESETTKGAYSLSIRDWDQTRGDHVKHYKIRKL
      SRC_RSVP    WYFGKITRRESERLLLNPENPRGTFLVRKSETAKGAYCLSVSDFDNAKGPNVKHYKIYKL
      MATK_HUMAN  WFHGKISGQEAVQQLQPPED--GLFLVRESARHPGDYVLCVS-----FGRDVIHYRVLHR
      CSK_CHICK   WFHGKITREQAERLLYPPET--GLFLVRESTNYPGDYTLCVS-----CEGKVEHYRIIYS
      CRKL_HUMAN  WYMGPVSRQEAQTRLQGQRH--GMFLVRDSSTCPGDYVLSVS-----ENSRVSHYIINSL
      ZA70_HUMAN  WYHSSLTREEAERKLYSGAQTDGKFLLRPRK-EQGTYALSLI-----YGKTVYHYLISQD
      KSYK_PIG    WFHGKISRDESEQIVLIGSKTNGKFLIRAR--DNGSYALGLL-----HEGKVLHYRIDKD
      KSYK_HUMAN  FFFGNITREEAEDYLVQGGMSDGLYLLRQSRNYLGGFALSVA-----HGRKAHHYTIERE
      Conservation symbols (as transcribed): :: . : :: : * :*:* * : * : ** :

      YES_XIPHE   DNGGYYITTRTQFMSLQMLVKHY
      FGR_HUMAN   DMGGYYITTRVQFNSVQELVQHY
      SRC_RSVP    YSGGFYITSRTQFGSLQQLVAYY
      MATK_HUMAN  -DGHLTIDEAVFFCNLMDMVEHY
      CSK_CHICK   -SSKLSIDEEVYFENLMQLVEHY
      CRKL_HUMAN  PNRRFKIGDQE-FDHLPALLEFY
      ZA70_HUMAN  KAGKYCIPEGTKFDTLWQLVEYL
      KSYK_PIG    KTGKLSIPGGKNFDTLWQLVEHY
      KSYK_HUMAN  LNGTYAIAGGRTHASPADLCHYH
      Conservation symbols (as transcribed): * . : .

  15. Steps to Build Trees from MSA 1) identify taxa to be considered 2) choose characters (independent, “unit”) 3) construct character matrix for each taxon 4) after performing the alignment, use a mathematical formula to describe the degree of similarity for each pair of taxa, e.g. the simple matching coefficient: $S = \frac{\text{number of matches}}{\text{total number of characters}}$
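The simple matching coefficient can be computed directly from two rows of the character matrix. The sketch below is illustrative only: the function name `simple_matching` and the two presence/absence vectors are hypothetical, not data from the presentation.

```python
def simple_matching(a, b):
    """Fraction of character positions at which two taxa have the same state."""
    if len(a) != len(b) or not a:
        raise ValueError("character vectors must be non-empty and equal in length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Hypothetical presence/absence characters for two taxa (illustration only).
taxon_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
taxon_d = [1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
print(simple_matching(taxon_a, taxon_d))  # 7 matches out of 10 characters
```

Every character contributes equally here, which corresponds to the unweighted (equal weighting) case.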

  16. Steps to Build Trees 5) construct matrix with pairwise S values 6) use clustering technique to produce a tree (dendrogram) • Unweighted/Equal weighting = all characters given equal consideration • UPGMA (Unweighted Pair Group Method with Arithmetic Averaging) • Neighbour-joining • Unweighting is a form of weighting

  17. Building Matrices [Figure: Character Matrix and S-value Matrix]

  18. Joining Clusters into a Tree • Closest: A&D = 0.7 • 2nd closest: B&C = 0.5 • When does A&D join B&C? $\frac{S_{AB} + S_{AC} + S_{DB} + S_{DC}}{4} = \frac{0.3 + 0.4 + 0.4 + 0.3}{4} = 0.35$
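The joining procedure on slide 18 can be sketched as average-linkage (UPGMA-style) clustering on similarities: repeatedly merge the two clusters with the highest average pairwise S value. The function name `upgma` and the data layout (a dict keyed by `frozenset` pairs) are choices made for this illustration; the six S values are the ones quoted on the slide.

```python
from itertools import combinations


def upgma(sim):
    """Average-linkage clustering on a dict {frozenset({x, y}): similarity}."""
    leaves = sorted({taxon for pair in sim for taxon in pair})
    clusters = [(leaf,) for leaf in leaves]      # each cluster = tuple of leaves
    trees = {c: c[0] for c in clusters}          # nested-tuple tree per cluster
    joins = []

    def avg(c1, c2):
        # Arithmetic mean of all leaf-to-leaf similarities between two clusters.
        vals = [sim[frozenset((x, y))] for x in c1 for y in c2]
        return sum(vals) / len(vals)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: avg(*p))
        level = avg(c1, c2)
        merged = tuple(sorted(c1 + c2))
        trees[merged] = (trees[c1], trees[c2])
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        joins.append((merged, round(level, 3)))
    return trees[clusters[0]], joins


S = {
    frozenset("AD"): 0.7, frozenset("BC"): 0.5,
    frozenset("AB"): 0.3, frozenset("AC"): 0.4,
    frozenset("DB"): 0.4, frozenset("DC"): 0.3,
}
tree, joins = upgma(S)
print(tree)
for members, level in joins:
    print(members, level)
```

The last join reported is the A&D cluster meeting the B&C cluster at the average of the four cross-cluster S values.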

  19. Problems • Different methods or characters = different dendrograms • If we use all possible characteristics this would be a natural classification • The tree is an accurate phylogeny if differences in characters between taxa are proportional to the time elapsed since their common ancestor

  20. Convergent Evolution • Similar phenotypic response to similar ecological conditions • Different developmental pathways

  21. Reversal of Evolution • An altered character reverts to the ancestral form. • In a DNA molecule, a nucleotide position may change from a C to a T and then back to a C. This frog reverted to teeth.

  22. Trees are hypotheses about evolutionary history • Different methods may result in different trees. • How to choose between the different models? • One way is to compare different types of character data and see if the trees make sense.

  23. Haplotype Network in 3 Elephant Species with 3 DNA sequences

  24. Parsimonious choices reflect fewer changes • The assumptions of parsimony • Reversals and convergence require more changes • Parsimonious trees represent best estimates of phylogenetic relationships
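One way to make "fewer changes" concrete is to count the minimum number of state changes a single character needs on each candidate tree. The sketch below uses Fitch's counting rule for one character on a binary tree written as nested tuples; this is a named technique swapped in for illustration, not a method stated in the slides. The function name `fitch`, the tip states, and the second (alternative) tree are all hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical nucleotide observed at one alignment column in five taxa.
states = {"A": "C", "B": "T", "C": "T", "D": "C", "E": "C"}
tree_1 = (("A", ("B", "C")), ("D", "E"))
tree_2 = (("A", "B"), ("C", ("D", "E")))
print(fitch(tree_1, states)[1])  # changes required on tree_1
print(fitch(tree_2, states)[1])  # changes required on tree_2
```

For this character, tree_1 needs one change and tree_2 needs two, so parsimony prefers tree_1; a real analysis would sum the counts over all characters.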

  25. Use of DNA, RNA, or Protein • For phylogeny, DNA can be more informative. • The protein-coding portion of DNA has synonymous and nonsynonymous substitutions. • Some DNA changes do not have corresponding protein changes • See arrows 14, 21, 25, 27, 29 in the retinol-binding protein figure.
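The point that some DNA changes leave the protein unchanged can be sketched by comparing two aligned coding sequences codon by codon. This is a simplified illustration: it only labels each differing codon as synonymous or nonsynonymous using the standard genetic code, and it does not estimate substitution rates. The function name `classify_codon_changes` and the two short sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(seq1, seq2):
    """Count differing codons that keep (synonymous) or change the amino acid."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be in-frame and equal in length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        codon1, codon2 = seq1[i:i + 3], seq2[i:i + 3]
        if codon1 == codon2:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# CTT -> CTC keeps leucine; GAA -> GAC changes glutamate to aspartate.
print(classify_codon_changes("CTTGAAAAA", "CTCGACAAA"))
```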

  26. For phylogeny, DNA can be more informative. • If the synonymous substitution rate (dS) is greater than the nonsynonymous substitution rate (dN), the DNA sequence is under negative (purifying) selection. • This limits change in the sequence. • If dS < dN, positive selection occurs. • For example, a duplicated gene may evolve rapidly to assume new functions.

  27. Models of nucleotide substitution: Transitions > Transversions • Transitions: A ↔ G and C ↔ T • Transversions: all other changes (A ↔ C, A ↔ T, G ↔ C, G ↔ T)

  28. Some substitutions in a DNA sequence alignment can be directly observed: • single nucleotide substitutions • sequential substitutions • coincidental substitutions

  29. Additional mutational events can be inferred by analysis of ancestral sequences. These changes include • parallel substitutions • convergent substitutions • back substitutions

  30. Advantages of DNA • Noncoding regions (such as 5’ and 3’ untranslated regions) may be analyzed using molecular phylogeny. • See Figure 11.10 (arrows 4-10 and 35-38) • Pseudogenes (nonfunctional genes) are studied by molecular phylogeny • Rates of transitions and transversions can be measured. • Transitions: purine to purine (A ↔ G) or pyrimidine to pyrimidine (C ↔ T) substitutions • Transversions: purine to pyrimidine substitutions, or vice versa
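Counting transitions and transversions between two aligned DNA sequences is straightforward to sketch. The function name `count_ts_tv` and the example sequences are hypothetical; positions containing gaps or ambiguous bases are simply skipped, which is an assumption made for this illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Return (transitions, transversions) between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
        if same_class:
            transitions += 1      # A <-> G or C <-> T
        else:
            transversions += 1    # purine <-> pyrimidine
    return transitions, transversions


# A->G and C->T are transitions; G->T is a transversion.
print(count_ts_tv("ACGTAC", "GCTTAT"))
```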

  31. Protein sequences are also used for phylogeny • Proteins have 20 states (amino acids) instead of only four for DNA, so there is more phylogenetic information. • Nucleotides are unordered characters: any one nucleotide can change to any other in one step. • An ordered character must pass through one or more intermediate states before reaching the final state. • Amino acid sequences are partially ordered character states: there is a variable number of states between the starting value and the final value.

  32. Amino acid sequences • From the standpoint of the genetic code, some amino acid changes can be made by a single DNA mutation while others require two or even three changes in the DNA sequence • Some amino acids can replace one another with relatively little effect on the structure and function of the final protein while other replacements can be functionally devastating • Tables of frequencies of all amino acid replacements within families of related protein sequences in the databanks are used: PAM and BLOSUM

  33. Sequence-Based Comparisons • Identify sequences within an organism that are related to each other and/or across different species • Within: Fetal and adult hemoglobin • Across: Human and chimpanzee hemoglobin • Generate an evolutionary history of related genes • Locate insertions, deletions, and substitutions that have occurred during evolution • Amino acid codes used in the example: (C) Cysteine, (R) Arginine, (E) Glutamate, (A) Alanine, (T) Threonine, (S) Serine, (L) Leucine, (P) Proline, (G) Glycine • [Figure: example sequences CREATE, CREASE, -RELAPSE and GREASER, with labels [Ancestor] and [Progenitors]]

  34. Multiple Sequence Alignments • Place residues in columns that are derived from a common ancestral residue • Identify Matches, Mismatches, and Gaps • MSA can reveal sequence patterns • Demonstration of homology between >2 sequences • Identification of functionally important sites • Protein function prediction • Structure prediction • Example: the sequences CREASE, CREATE, RELAPSE and GREASER aligned as
      SeqA  CRE-A-TE-
      SeqB  CRE-A-SE-
      SeqC  GRE-A-SER
      SeqD  -RELAPSE-
            123456789

  35. MSA and Tree Relationship • “The optimal alignment of several sequences can be thought of as minimizing the number of mutational steps in an evolutionary tree for which the sequences are the leaves” (Mount, 2001) • [Figure: tree with leaves SeqA CRE-A-TE- (CREATE), SeqB CRE-A-SE- (CREASE), SeqC GRE-A-SER and SeqD -RELAPSE-; intermediate sequences CREATE, CREASE and GREASE; branch labels T to S, C to G, +R, +L, +P, -G]

  36. Multiple Sequence Alignments • Confirm that all sequences are homologous • Adjust gap creation and extension penalties as needed to optimize the alignment • Restrict phylogenetic analysis to regions of the multiple sequence alignment for which data are available for all taxa (delete columns having incomplete data). • Many experts recommend that you delete any column of an alignment that contains gaps (even if the gap occurs in only one taxon)
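The recommendation to delete every column that contains a gap can be sketched in a few lines. The function name `drop_gap_columns` is hypothetical; the input is the four-sequence example alignment from slide 34.

```python
def drop_gap_columns(alignment, gap="-"):
    """Remove every alignment column in which at least one sequence has a gap."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*rows) walks the alignment column by column.
    kept = [column for column in zip(*rows) if gap not in column]
    return {name: "".join(column[i] for column in kept)
            for i, name in enumerate(names)}


alignment = {
    "SeqA": "CRE-A-TE-",
    "SeqB": "CRE-A-SE-",
    "SeqC": "GRE-A-SER",
    "SeqD": "-RELAPSE-",
}
trimmed = drop_gap_columns(alignment)
for name, seq in trimmed.items():
    print(name, seq)
```

Only columns 2, 3, 5, 7 and 8 survive here, which shows how aggressively this rule can shrink a short alignment.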

  37. Problems in Reconstructing Phylogeny • Characters sometimes conflict • It is sometimes difficult to tell homology from homoplasy • Analogy: characters are similar because of convergent evolution • Reversal: a character reverts to the ancestral form • With morphological characters, careful examination may distinguish homoplasy (analogy) from homology • With molecular characters (DNA/protein sequences), orthologs are sometimes impossible to distinguish from paralogs.

  38. A Phylogenetic Tree • Taxon -- Any named group of organisms – evolutionary theory not necessarily involved. • Clade -- A monophyletic taxon (evolutionary theory utilized)

  39. A phylogenetic tree with branch lengths • Branch length can be significant. • In this case it is: mouse is slightly more similar to fly than human is to fly (the sum of branches 1+2+3 is less than the sum of branches 1+2+4)

  40. Common Phylogenetic Tree Terminology • Terminal nodes (A, B, C, D, E): represent the taxa (genes, populations, species, etc.) used to infer the phylogeny • Branches or lineages • Internal nodes or divergence points: represent hypothetical ancestors of the taxa • Ancestral node or root of the tree

  41. Phylogenetic trees diagram the evolutionary relationships between the taxa • [Figure: tree with tips Taxon A, Taxon B, Taxon C, Taxon D, Taxon E] • There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom. • The branch-length dimension either can have no scale (for ‘cladograms’), can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time (for ‘ultrametric trees’ or true evolutionary trees). • ((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses • These say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. • If the tree has a time scale, then D and E are the most closely related.

  42. Three types of trees • Cladogram (axis has no meaning) • Phylogram (axis = genetic change) • Ultrametric tree (axis = time) • All show the same evolutionary relationships, or branching orders, between the taxa. • [Figure: the same tree of Taxon A, Taxon B, Taxon C and Taxon D drawn three ways, with branch-length labels 6, 1, 1, 3, 1, 5]
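The nested-parentheses notation (essentially the Newick format without branch lengths) maps naturally onto nested tuples. The sketch below parses the string from slide 41 and lists the clades it implies. The function names `parse_nested`, `leaves` and `clades` are hypothetical, and the parser assumes well-formed input with no branch lengths or internal node labels.

```python
def parse_nested(text):
    """Parse a string such as '((A,(B,C)),(D,E))' into nested tuples."""
    s = text.replace(" ", "").rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume '('
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ','
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ')'
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos]               # a leaf name

    tree = node()
    if pos != len(s):
        raise ValueError("unexpected trailing characters")
    return tree


def leaves(tree):
    """All tip names below a node."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def clades(tree):
    """Leaf sets of every internal node; each one is a clade."""
    if isinstance(tree, str):
        return []
    found = [frozenset(leaves(tree))]
    for child in tree:
        found.extend(clades(child))
    return found


tree = parse_nested("((A,(B,C)),(D,E))")
print(tree)
for clade in clades(tree):
    print(sorted(clade))
```

The output lists {B, C}, {A, B, C} and {D, E} as clades, matching the reading given on the slide, while a grouping such as {A, B} never appears.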

  43. Types of trees: Cladogram (no time scale) • Shows relative recency of common descent. • Does not imply that ancestors on the same line necessarily speciated at the same time. • t1 can be before or after t2 but not before t3. • [Figure: cladogram with nodes t1, t2, t3]

  44. Types of trees: Phylogram (additive tree: branch lengths can be summed) • Shows relative recency of common descent, and • branch lengths = amount of change

  45. Types of trees: Ultrametric tree (linearized tree) • All tree tips are equidistant from the root • Amount of change (divergence) can be scaled to time • Scale = time

  46. The goal of phylogeny inference is to resolve the branching orders of lineages in evolutionary trees • Completely unresolved or "star" phylogeny • Partially resolved phylogeny • Fully resolved, bifurcating phylogeny • [Figure: taxa A to E drawn as each of the three trees; node labels: polytomy or multifurcation, a bifurcation]
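The defining property of an ultrametric tree, that every tip is the same distance from the root, can be checked by summing branch lengths along each root-to-tip path. In this sketch a tree is a leaf name or a list of `(child, branch_length)` pairs; that representation, the function names `tip_depths` and `is_ultrametric`, and the branch lengths are hypothetical choices for illustration.

```python
def tip_depths(node, depth=0.0):
    """Map each tip name to the summed branch length from the root."""
    if isinstance(node, str):
        return {node: depth}
    depths = {}
    for child, length in node:
        depths.update(tip_depths(child, depth + length))
    return depths


def is_ultrametric(tree, tol=1e-9):
    """True if all tips are equidistant from the root (within a tolerance)."""
    values = list(tip_depths(tree).values())
    return max(values) - min(values) <= tol


# Same branching order, different branch lengths (made-up values).
ultrametric_tree = [([("B", 1.0), ("C", 1.0)], 2.0), ("A", 3.0)]
phylogram_tree = [([("B", 1.0), ("C", 3.0)], 2.0), ("A", 1.0)]

print(tip_depths(ultrametric_tree), is_ultrametric(ultrametric_tree))
print(tip_depths(phylogram_tree), is_ultrametric(phylogram_tree))
```

Because the branch lengths of a phylogram are additive, the same `tip_depths` sums also give root-to-tip amounts of change for a tree that is not ultrametric.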
