BCB 444/544 Lecture 30 Phylogenetics – Distance-Based Methods #30_Nov02 BCB 444/544 F07 ISU Terribilini #30- Phylogenetics - Distance-Based Methods
Required Reading (before lecture)
Wed Oct 31 - Lecture 29: Phylogenetics Basics
• Chp 10 - pp 127-141
Thurs Nov 1 - Lab 9: Gene & Regulatory Element Prediction
Fri Nov 2 - Lecture 30: Phylogenetics – Distance-Based Methods
• Chp 11 - pp 142-169
Mon Nov 5 - Lecture 31: Phylogenetics – Parsimony and ML
• Chp 11 - pp 142-169

Assignments & Announcements
Mon Oct 29 - HW#5
• HW#5 = Hands-on exercises with phylogenetics and tree-building software
• Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 "Team" Projects
• Last week of classes will be devoted to Projects
• Written reports due: Mon Dec 3 (no class that day)
• Oral presentations (20-30') will be: Wed-Fri Dec 5, 6, 7
• 1 or 2 teams will present during each class period
• See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment
544 Extra#2
Due: √ PART 1 - ASAP; PART 2 - meeting prior to 5 PM Fri Nov 2
Part 1 - Brief outline of Project, email to Drena & Michael; after response/approval, then:
Part 2 - More detailed outline of project
• Read a few papers and summarize status of problem
• Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Nov 2 Fri - BCB Faculty Seminar, 2:10 in 102 ScI
• Bob Jernigan, BBMB, ISU
• Control of Protein Motions by Structure

Chp 10 - Phylogenetics
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 10 Phylogenetics Basics
• Evolution and Phylogenetics
• Terminology
• Gene Phylogeny vs. Species Phylogeny
• Forms of Tree Representation
• Why Finding a True Tree is Difficult
• Procedure of Building a Phylogenetic Tree

Tree Building Procedure
• Choose molecular markers
• Perform MSA
• Choose a model of evolution
• Determine tree building method
• Assess tree reliability

Choice of Molecular Markers
• Very closely related organisms - use nucleic acid sequences, which show more differences than protein sequences
• Individuals within a species - use noncoding regions of mtDNA, which have a faster mutation rate
• More distantly related species - use slowly evolving nucleic acid sequences such as ribosomal RNA, or protein sequences
• Very distantly related species - use highly conserved protein sequences

Multiple Sequence Alignment
• Most critical step in tree building - cannot build a correct tree without a correct alignment
• Build alignments with multiple programs, then inspect and compare them to identify the most reasonable one
• Most alignments need manual editing
• Make sure important functional residues align
• Align secondary structure elements
• Decide whether to use the full alignment or just parts of it

Automatic Editing of Alignments
• Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences
• Gblocks – detect and eliminate poorly aligned positions and divergent regions

How do we measure divergence between sequences?
• Simple measure – just count the number of substitutions observed between the sequences in the MSA
• Problem – the number of observed substitutions may not represent the number of evolutionary events that actually occurred

Multiple Substitutions
[Figure: diagram of nucleotide substitutions (C, A, A, T, G)]
Just because we see only one difference does not mean that there was only one evolutionary event

Multiple Substitutions
[Figure: diagram of nucleotide substitutions (A, A, A, T, G)]
Just because we see no difference does not mean that there were no evolutionary events

Choosing Substitution Models
• Statistical models of evolution are used to correct for the multiple substitution problem
• Focus on DNA models

Jukes-Cantor Model
• Jukes-Cantor model assumes all nucleotides are substituted with equal probability
• Can be used to correct for multiple substitutions
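
The Jukes-Cantor correction can be sketched in a few lines. Under this model, if p is the observed fraction of differing sites between two aligned sequences, the corrected distance is d = -(3/4) ln(1 - (4/3) p). The function name, the gap handling, and the test sequences below are illustrative assumptions, not part of the lecture.

```python
import math


def jukes_cantor_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor corrected distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    # p = observed proportion of differing sites
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        # The logarithm is undefined here: the sequences look saturated.
        raise ValueError("p >= 0.75: corrected distance is undefined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical sequences: 1 difference in 10 sites, so p = 0.1
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
print(round(d, 4))
```

The corrected distance is always at least as large as p, and the gap between the two grows as sequences diverge, which is the point of the correction.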

Many Other Models

Evolutionary Models for Protein Sequences
• PAM and JTT substitution matrices already take into account multiple substitutions
• There are also models similar to Jukes-Cantor for protein sequences

What about differences in mutation rates between positions within a sequence?
• One of our assumptions was that all positions in a sequence are evolving at the same rate
• Bad assumption
• Third position in a codon changes with higher frequency
• In proteins, some amino acids can change and others cannot
• This variation is called among-site rate heterogeneity
• Many tree building programs have parameters meant to deal with this problem – adds to the complexity of getting the correct tree

Chp 11 – Phylogenetic Tree Construction Methods and Programs
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs
• Distance-Based Methods
• Character-Based Methods
• Phylogenetic Tree Evaluation
• Phylogenetic Programs

Tree Construction
• Two main categories of tree building methods
• Distance-based: use the overall similarity between sequences
• Character-based: consider the entire MSA

Distance-Based Methods
• Given an MSA and an evolutionary model, calculate the distance between all pairs of sequences
• Construct distance matrix
• Construct phylogenetic tree based on the distance matrix
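
As a minimal sketch of the first two steps, the code below builds a symmetric distance matrix from an alignment. It uses the uncorrected p-distance (fraction of differing ungapped columns) as a stand-in for whatever evolutionary model is chosen; the alignment, the taxon names, and the function names are all made up for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of ungapped alignment columns where a and b differ."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in cols) / len(cols)


def distance_matrix(msa: dict, dist=p_distance) -> dict:
    """Return D[name1][name2] for every pair of sequences in the MSA."""
    names = list(msa)
    D = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        # The matrix is symmetric, so fill both triangles at once.
        D[n][m] = D[m][n] = dist(msa[n], msa[m])
    return D


# Hypothetical four-sequence alignment
msa = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAA",
    "c": "ACGAACGTAA",
    "d": "TCGAACGGAA",
}
D = distance_matrix(msa)
for name, row in D.items():
    print(name, [row[m] for m in msa])
```

Passing a different `dist` function (for example, a model-corrected distance) changes the matrix without touching the rest of the pipeline.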

Distance Matrices
     a    b    c    d
a    0
b    6    0
c    7    3    0
d   14   10    9    0
[Figure: taxa a, b, c, d drawn against a distance scale from 0 to 8]

Distance-Based Methods
• Two ways to construct a tree based on a distance matrix
• Clustering
• Optimality

Clustering-Based Methods
• E.g., UPGMA and Neighbor-Joining
• A cluster is a set of taxa
• Interspecies distances translate into intercluster distances
• Clusters are repeatedly merged
• “Closest” clusters merged first
• Distances are recomputed after merging

UPGMA
• UPGMA – Unweighted Pair Group Method Using Arithmetic Average
• Uses molecular clock assumption – all taxa evolve at a constant rate and are equally distant from the root (ultrametric tree)
• This assumption is usually wrong
• So why use UPGMA?
• Very fast

UPGMA Example
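
The UPGMA procedure can be sketched as follows. The input is the four-taxon matrix from the Distance Matrices slide; whether the lecture's own worked example used that matrix is not known, so treat this as an independent illustration. The function name and the Newick-like cluster labels are assumptions.

```python
def upgma(names, dist):
    """Return a list of (cluster_label, node_height) merges, in order.

    dist maps a pair of taxon names, e.g. ("a", "b"), to their distance.
    """
    size = {n: 1 for n in names}            # number of taxa in each cluster
    d = {frozenset(k): v for k, v in dist.items()}
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        x, y = sorted(pair)
        new = f"({x},{y})"
        # Molecular clock: the new node sits halfway between x and y.
        merges.append((new, d[pair] / 2.0))
        for z in size:
            if z not in pair:
                dxz = d.pop(frozenset((x, z)))
                dyz = d.pop(frozenset((y, z)))
                # Size-weighted mean = plain average over all original pairs.
                d[frozenset((new, z))] = (
                    (size[x] * dxz + size[y] * dyz) / (size[x] + size[y])
                )
        del d[pair]
        size[new] = size.pop(x) + size.pop(y)
    return merges


dist = {
    ("a", "b"): 6, ("a", "c"): 7, ("b", "c"): 3,
    ("a", "d"): 14, ("b", "d"): 10, ("c", "d"): 9,
}
merges = upgma(["a", "b", "c", "d"], dist)
for label, height in merges:
    print(label, height)
```

On this matrix b and c merge first (distance 3), then a joins them at an average distance of 6.5, and d joins last at an average distance of 11. Each node height is half the distance at which its clusters merged, which is what makes the resulting tree ultrametric.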

Neighbor Joining
• Idea: Find a pair of taxa that are close to each other but far from other taxa
• Implicitly finds a pair of neighboring taxa
• No molecular clock assumption

Neighbor Joining
• NJ corrects for unequal evolutionary rates between sequences by using a conversion step
• The conversion step requires calculation of “r-values” and “transformed r-values”

Neighbor Joining
The r-value for a sequence i is the sum of the distances between sequence i and all other sequences:
$r_i = \sum_{j \neq i} d_{ij}$

Neighbor Joining
The transformed r-value for a sequence is:
$r_i' = \frac{r_i}{n - 2}$
where n is the number of taxa.
Transformed r-values are used to determine the distance of a taxon to the nearest node.

Neighbor Joining
The converted distance between two sequences is:
$d_{ij}' = d_{ij} - (r_i' + r_j')$
These converted distances are used in building the tree.

Neighbor Joining
The final equation we need is for computing the distance from a new cluster to each taxon. Assume taxa i and j were merged into a cluster u. The distance from taxon i to cluster u is:
$d_{iu} = \frac{d_{ij} + (r_i' - r_j')}{2}$

Neighbor Joining Example
• Initialize tree into a star shape with all taxa connected to the center
• Step 1: Compute r-values and transformed r-values for all taxa

Neighbor Joining Example
• Step 2: Compute converted distances

Neighbor Joining Example
• Step 3: Fill out converted distance matrix

Neighbor Joining Example
• Step 4: Create a node by merging closest taxa
• In this example, the converted distance between A and B is the same as the converted distance between C and D
• We can pick either pair to start with
• Let’s pick A and B and create a node called U
[Figure: star tree with taxa A, B, C, D; A and B are joined at new node U, branch lengths still unknown]

Neighbor Joining Example
• Step 5: Compute branch lengths
• Use the equation for computing the distance from a taxon to a node
[Figure: A joined to U by a branch of length 0.15; B joined to U by a branch of length 0.25]

Neighbor Joining Example
• Step 6: Construct reduced distance matrix by computing converted distances from each remaining taxon to the new node U
• In UPGMA, we simply calculated the average

Neighbor Joining Example
Our reduced distance matrix:

Neighbor Joining Example
• From here, we go back to step 1
• Continue until all taxa have been decomposed from the star tree
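
The six steps can be sketched in Python as below. This follows the standard neighbor-joining formulation (r-values, transformed r-values r/(n-2), converted distances d_ij - (r_i' + r_j'), and the usual reduced-matrix update), which may differ in notation from the slides. The distance matrix is hypothetical: the lecture's matrix is not reproduced in this transcript, so these values were chosen so that the A-U and B-U branch lengths come out to 0.15 and 0.25 as in the example. Node names U1, U2 are also assumptions.

```python
def neighbor_joining(names, dist):
    """Return a list of (node, neighbor, branch_length) for the NJ tree."""
    d = {frozenset(k): v for k, v in dist.items()}
    nodes = list(names)
    branches = []
    k = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: r-values and transformed r-values
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i)
             for i in nodes}
        rt = {i: r[i] / (n - 2) for i in nodes}
        # Steps 2-4: converted distances; join the pair with the smallest.
        # Rounding makes exact ties break by input order, not float noise.
        pairs = [(i, j) for a, i in enumerate(nodes) for j in nodes[a + 1:]]
        i, j = min(
            pairs,
            key=lambda p: round(d[frozenset(p)] - (rt[p[0]] + rt[p[1]]), 9),
        )
        dij = d[frozenset((i, j))]
        k += 1
        u = f"U{k}"
        # Step 5: branch lengths from i and j to the new node u
        diu = (dij + rt[i] - rt[j]) / 2
        branches += [(i, u, diu), (j, u, dij - diu)]
        # Step 6: reduced matrix, distances from u to every remaining node
        for m in nodes:
            if m not in (i, j):
                d[frozenset((u, m))] = (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - dij
                ) / 2
        nodes = [m for m in nodes if m not in (i, j)] + [u]
    a, b = nodes
    branches.append((a, b, d[frozenset((a, b))]))  # last remaining branch
    return branches


# Hypothetical distances (not taken from the lecture)
dist = {
    ("A", "B"): 0.40, ("A", "C"): 0.35, ("A", "D"): 0.60,
    ("B", "C"): 0.45, ("B", "D"): 0.70, ("C", "D"): 0.55,
}
branches = neighbor_joining(["A", "B", "C", "D"], dist)
for child, parent, length in branches:
    print(child, parent, round(length, 3))
```

Unlike UPGMA, the two branches leading into a new node can have different lengths (0.15 and 0.25 here), which is how NJ avoids the molecular clock assumption. With four taxa, the A-B and C-D converted distances always tie, so either pair can be joined first and the same unrooted tree results.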

Optimality-Based Methods
• Clustering methods produce a single tree with no ability to judge how good it is compared to alternative tree topologies
• Optimality-based methods compare all possible tree topologies and select a tree that best fits the distance matrix
• Two algorithms:
• Fitch-Margoliash
• Minimum evolution

Fitch-Margoliash
• Selects best tree among all possible trees based on minimum deviation between distances calculated in the tree and distances in the distance matrix
• Basically, a least squares method
• Dij = distance between i and j in matrix
• dij = distance between i and j in tree
• Objective: Find the tree that minimizes the weighted squared deviation $E = \sum_{i<j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}^2}$
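
A minimal sketch of scoring one candidate tree under this criterion, assuming the usual Fitch-Margoliash weighting in which each squared deviation (Dij - dij)^2 is divided by Dij^2. The matrix distances are those of the four-taxon matrix from the Distance Matrices slide; the tree distances are the path lengths in an ultrametric tree built by UPGMA on that matrix, used here only as an example candidate. The function name is an assumption.

```python
def fitch_margoliash_error(D, d):
    """Weighted least-squares misfit between matrix (D) and tree (d) distances.

    Both arguments map a pair of taxa, e.g. ("a", "b"), to a distance.
    """
    return sum((D[p] - d[p]) ** 2 / D[p] ** 2 for p in D)


# Dij: distances from the matrix
D = {
    ("a", "b"): 6, ("a", "c"): 7, ("b", "c"): 3,
    ("a", "d"): 14, ("b", "d"): 10, ("c", "d"): 9,
}
# dij: leaf-to-leaf path lengths in one candidate tree
# (UPGMA tree: b,c join at 3; a joins at 6.5; d joins at 11)
d_tree = {
    ("a", "b"): 6.5, ("a", "c"): 6.5, ("b", "c"): 3.0,
    ("a", "d"): 11.0, ("b", "d"): 11.0, ("c", "d"): 11.0,
}
error = fitch_margoliash_error(D, d_tree)
print(round(error, 4))
```

A full Fitch-Margoliash search would evaluate this score for many topologies (optimizing branch lengths for each) and keep the tree with the smallest value; the sketch covers only the scoring step.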

Minimum Evolution
• Similar to Fitch-Margoliash, but uses a different optimality criterion
• Searches for a tree with the minimum total branch length
• This is an indirect way of achieving the best fit of the branch lengths with the original data

Summary of Distance-Based Methods
• Clustering-based methods:
• Computationally very fast and can handle large datasets that other methods cannot
• Not guaranteed to find the best tree
• Optimality-based methods:
• Better overall accuracies
• Computationally slow
• All distance-based methods discard the sequence information once distances are computed, so they cannot infer the most likely state at an internal node