
Phylogenetic Inference

Phylogenetic Inference: data, optimality criteria, algorithms, results, practicalities. Reading: Ch8. BIO520 Bioinformatics, Jim Lund. Our goals: infer phylogeny (optimality criteria, algorithm) and determine the sequence of branching events that reflects the history of a group of organisms.


Presentation Transcript


  1. Phylogenetic Inference Data Optimality Criteria Algorithms Results Practicalities Reading: Ch8 BIO520 Bioinformatics Jim Lund

  2. Our Goals • Infer Phylogeny • Optimality criteria • Algorithm • Determine the sequence of branching events that reflects the history of a group of organisms.

  3. Phylogenetic Model Assumptions • No transfer of genetic information by hybridization • All sequences are homologous (orthologous, really) • Each position in alignment homologous • Observed variation is valid sample from included group • Positions evolve independently

  4. Steps in Analysis • Data Model (Alignment) • alignment method • “trimming” to a phylogenetic set • DNA base substitution model • Build Trees • Algorithm based vs Criterion based • Distance based vs Character-based • Assess tree quality.

  5. Choice of Input Data • Data Type • Aligned sequences, RFLP, morphological data… • Molecule of interest • rRNA (general purpose) • Mitochondrial DNA • Selected genes • Number/type of taxa • ingroup and outgroup

  6. rRNA Genes • Conserved across kingdoms • Varies within species • Widely sequenced, easy • Long, lots of characters • Duplication?

  7. Multiple Alignment Method • Phylogenetic Assumptions • Alignment parameters • (substitution matrix, gap cost) • Aligned features • primary sequence, structure • Optimization • statistical, non-statistical

  8. Typical Alignment Method • CLUSTAL, then manual editing • Manual editing for phylogeny • phylogenetic assumption in guide tree • parameters a priori and dynamic • Optimization • Non-statistical • Remove poorly aligned regions • Test several gap penalties
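The "remove poorly aligned regions" step can be approximated by dropping gap-rich columns. This is a minimal sketch, not the procedure used in the course: the function name `trim_gappy_columns`, the 0.5 gap threshold, and the toy alignment are all assumptions made for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds the threshold.

    `alignment` is a list of equal-length strings; "-" marks a gap.
    The threshold is an illustrative assumption, not a recommended value.
    """
    if not alignment:
        return []
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column only if its share of gap characters is small enough.
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Hypothetical toy alignment: the third column is mostly gaps.
toy = ["CC-AGG", "CC-AAG", "CCCAAA", "CC-AAC"]
trimmed = trim_gappy_columns(toy, max_gap_fraction=0.5)
print(trimmed)
```

Rerunning the tree building with several thresholds is one way to check whether the result depends on the trimming.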

  9. Substitution Models • G to A, C to T versus N to N • Amino acid substitution • Forwards and backwards weights identical? • Site-to-site variation • Simpler model is better • Estimate from "quick" tree building and observed variation
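To make the transition-versus-any-change distinction concrete, here is a sketch of two standard corrected distances: Jukes-Cantor (all substitutions equally likely) and Kimura 2-parameter (transitions and transversions treated separately). The slides do not say which models were used in the course, so the choice of these two, and all function names, are assumptions.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def ts_tv_fractions(seq_a, seq_b):
    """Return (P, Q): fractions of compared sites with transitions / transversions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor distance from the observed proportion of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p_distance(P, Q):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) fractions."""
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# Two of the six-base sequences used later in the slides.
P, Q = ts_tv_fractions("CCCAGG", "CCCAAG")
d_jc = jukes_cantor_distance(P + Q)
d_k2p = kimura_2p_distance(P, Q)
print(P, Q, d_jc, d_k2p)
```

Both corrections return a distance larger than the raw proportion of differences, because they allow for multiple changes at the same site.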

  10. Tree-Building Methods • Distance-based methods • NJ, FM, ME, UPGMA • Character-based methods • Maximum Parsimony (PAUP) • Maximum Likelihood (PHYLIP) Algorithm choice is a contested, active research field.

  11. Molecular phylogenetic tree building methods are mathematical and/or statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them. There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified by data type and computational method: • Characters (bp, aa), optimality criterion: parsimony, maximum likelihood • Distances, optimality criterion: minimum evolution, least squares • Distances, clustering algorithm: UPGMA, neighbor-joining

  12. Distance Methods • Measure distance (dissimilarity) • Accurate if distances are all additive; UPGMA further requires ultrametric (clock-like) distances • NEVER true over large distances • Methods • NJ (Neighbor joining) • FM (Fitch-Margoliash) • ME (Minimum Evolution) • UPGMA (Unweighted pair group method with Arithmetic Mean)

  13. Which Distance Method? • UPGMA (Unweighted pair group method with Arithmetic Mean) • Least accurate, still commonly used • NJ (Neighbor joining) • EXTREMELY RAPID • GIVES ONLY 1 TREE • ME (Minimum Evolution) and FM (Fitch-Margoliash) seem best • Minimize tree path lengths
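As an illustration of the simplest clustering approach listed above, here is a sketch of UPGMA that returns only the topology as nested tuples (branch lengths are omitted). The distance matrix is hypothetical and was chosen to be ultrametric, which is the case where UPGMA is expected to behave well; the function name and input format are assumptions.

```python
from itertools import combinations


def upgma(labels, distances):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    `distances` maps (label_a, label_b) pairs to distances.
    """
    sizes = {label: 1 for label in labels}  # current clusters and their sizes
    d = {frozenset(pair): dist for pair, dist in distances.items()}
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted mean of the old distances.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical ultrametric distances for four taxa.
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
tree = upgma(["A", "B", "C", "D"], example)
print(tree)
```

With these distances A and B join first, then C, then D. When rates differ between lineages the distances stop being ultrametric and UPGMA can join the wrong pair, which is one reason it is described above as the least accurate option.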

  14. Inferring Trees and Ancestors CCCAGG CCCAAG-> CCCAAG CCCAAA-> CCCAAA CCCAAA-> CCCAAC

  15. Different Criteria 1 CCCAGG 2 CCCAAG 3 CCCAAA 4 CCCAAC 1,2 can be sister taxa AND 3,4 can be sister taxa Infer ancestor of 1,2 and 3,4 Distance from 1/2, 3/4 equal
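The pairwise differences among the four sequences can be tabulated directly. This sketch uses raw mismatch counts (Hamming distance) with no substitution model; the helper name `hamming` is an assumption.

```python
from itertools import combinations

# The four sequences from the slide, keyed by taxon number.
taxa = {"1": "CCCAGG", "2": "CCCAAG", "3": "CCCAAA", "4": "CCCAAC"}


def hamming(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


pairwise = {
    (i, j): hamming(taxa[i], taxa[j])
    for i, j in combinations(sorted(taxa), 2)
}
for (i, j), dist in pairwise.items():
    print(f"{i}-{j}: {dist}")
```

Taxa 1 and 2 differ at one site, as do taxa 3 and 4, which fits the sister groupings on the slide. Pairs 2-3 and 2-4 also differ at only one site, so the raw counts alone do not single out one grouping.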

  16. Character Methods • Maximum Parsimony • minimal changes to produce data • can use different substitution models • Maximum Likelihood • turns problem “inside out”, single most likely tree that explains data • coin flip analogy • increasingly popular • Bayesian • Searches for Best Set of trees that explains data AND fits evolutionary model

  17. Parsimony CCCAGG CCCAAG-> CCCAAG CCCAAA-> CCCAAA CCCAAA-> CCCAAC 4 TAXA, 3 changes minimum Search for shortest tree, the one with the fewest changes.
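The "3 changes minimum" count can be checked with Fitch's small-parsimony algorithm, a standard way to count the fewest changes a given tree needs. This is a sketch under assumptions: trees are written as nested tuples of taxon labels, all changes have equal weight, and the function names are made up for illustration.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is a taxon label or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed here.
        return shared, left_changes + right_changes
    # No shared state: one change is required on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_length(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"1": "CCCAGG", "2": "CCCAAG", "3": "CCCAAA", "4": "CCCAAC"}
topologies = [
    (("1", "2"), ("3", "4")),
    (("1", "3"), ("2", "4")),
    (("1", "4"), ("2", "3")),
]
lengths = {tree: parsimony_length(tree, alignment) for tree in topologies}
for tree, length in lengths.items():
    print(tree, length)
```

For this toy data every one of the three possible groupings needs 3 changes, because neither variable site is parsimony-informative: one has a single deviating taxon and the other has three different bases. Longer alignments normally separate the topologies.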

  18. Likelihood Models Hypothesis 1: All 3 teams are equally good. Hypothesis 2: The Yankees are the best team. Hypothesis 3: The Tigers are the worst team
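Likelihood asks how probable the observed data are under each hypothesis and prefers the hypothesis that makes the data most probable. The sketch below uses the coin-flip analogy rather than the team example, since the slide gives no win-loss records. The 7 heads in 10 flips and the three candidate values of p are hypothetical.

```python
from math import comb


def binomial_likelihood(p_heads, heads, flips):
    """Probability of observing `heads` heads in `flips` flips if P(heads) = p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


# Hypothetical data: 7 heads in 10 flips.
heads, flips = 7, 10

# Three competing hypotheses about the coin.
candidates = [0.3, 0.5, 0.7]
likelihoods = {p: binomial_likelihood(p, heads, flips) for p in candidates}
best = max(likelihoods, key=likelihoods.get)

for p, value in likelihoods.items():
    print(f"p = {p}: likelihood = {value:.4f}")
print("most likely hypothesis:", best)
```

In phylogenetics the hypotheses are trees (with branch lengths and a substitution model) and the data are the alignment columns, but the comparison works the same way.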

  19. Searching for Trees

  20. Tree Search Algorithms • Exhaustive • VERY INTENSIVE • Branch and Bound • Compromise • Heuristic • FAST (usually start with NJ)
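Why exhaustive search is so intensive: the number of distinct unrooted, fully bifurcating trees for n taxa is (2n-5)!!, the product of the odd numbers from 3 to 2n-5. A small sketch (the function name is an assumption):

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating tree topologies for n labeled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

Four taxa give 3 trees and ten taxa already give 2,027,025, which is why branch-and-bound and heuristic searches are used for anything beyond small data sets.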

  21. Evaluating Trees • Consensus Tree • Randomized Trees • Skewness tests • Randomized Character Data • Permutation tests (permuted by column) • Bootstrap, Jackknife • resampling techniques • Counts how often each clade appears in test data. • >70% probably correct; 50% overestimates accuracy
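Bootstrap resampling draws alignment columns with replacement to build pseudo-replicate data sets, rebuilds a tree from each, and counts how often a clade reappears. The sketch below covers the resampling and the counting; tree building is left out. The function names, the toy alignment, and the replicate clade sets are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping each column intact."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


def clade_support(replicate_clades, clade):
    """Fraction of replicate trees that contain the given clade.

    `replicate_clades` holds one set of clades (frozensets of taxa) per replicate tree.
    """
    clade = frozenset(clade)
    hits = sum(clade in clades for clades in replicate_clades)
    return hits / len(replicate_clades)


toy = {"1": "CCCAGG", "2": "CCCAAG", "3": "CCCAAA", "4": "CCCAAC"}
rng = random.Random(0)  # fixed seed so the run is repeatable
replicate = bootstrap_alignment(toy, rng)
print(replicate)

# Hypothetical clade sets from four replicate trees.
replicates = [
    {frozenset({"1", "2"})},
    set(),
    {frozenset({"1", "2"})},
    {frozenset({"1", "2"})},
]
support = clade_support(replicates, {"1", "2"})
print(support)
```

A jackknife differs only in the resampling step: it deletes a fraction of the columns instead of drawing with replacement.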

  22. Tree Congruence • Tree-to-Tree Comparison • 2 different characters/same groups • Important for evaluating biological hypotheses • Examples: • Did lentiviruses diverge within their current hosts only? • Has plant pathogenicity arisen many times in fungi?

  23. Inferring evolutionary relationships between the taxa requires rooting the tree. To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root. [Figure: unrooted tree of taxa A, B, C, D with the root marked, and the resulting rooted tree.] Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

  24. Now, try it again with the root at another position. [Figure: the same unrooted tree of taxa A, B, C, D with the root marked elsewhere, and the resulting rooted tree.] Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

  25. Rooting Trees • Molecular Clock • Root=midpoint of longest span • Unreliable, often wrong. • Extrinsic Evidence • e.g., select a fungus as root for plants • long branch attraction can be a problem • Paralog rooting • long branch problems

  26. Phylogenetic Software • PHYLIP • http://evolution.genetics.washington.edu/phylip.html • http://saf.bio.caltech.edu/www/saf_manuals/phylip/phylip.html • PAUP: Pileup, Lineup, Paupsearch, Paupdisplay • http://paup.csit.fsu.edu/versions.html • MrBayes • Bayesian trees • http://mrbayes.csit.fsu.edu/ • Treeview • Several programs going by this name have been written. • Draw/format phylogenetic trees • Java TreeView: http://jtreeview.sourceforge.net/

  27. Phylogenetic Stories • HIV • complete genome accessible • evolution rapid • selection, neutralism? • Primate evolution • Which primate is the closest relative to modern humans?

  28. HIV Genome Diversity • Error prone (RT) replication • High rate of replication • 10^10 virions/day • In vivo selection pressure • And in vivo recombination!

  29. HIV tree ENV GAG AIDS 1996, 10:S13 Recombinants?

  30. Subtype E ENV=A “Bootscanning” AIDS 1996, 10:S13

  31. Which species are the closest living relatives of modern humans? Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either is to gorillas. [Figure: two trees relating humans, chimpanzees, bonobos, gorillas, and orangutans, with time scales in MYA (14 to 0 and 15-30 to 0).] The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least 15-30 MYA.

  32. Phylogenetic Resources • NCBI Taxonomy Browser • http://www.ncbi.nlm.nih.gov/Taxonomy/ • RDP database (Ribosomal Database Project) • http://rdp.cme.msu.edu/index.jsp • “Tree of Life” • http://tolweb.org/tree/phylogeny.html

  33. Practicalities • Quality of input alignment critical • Examine data from all possible angles • distance, parsimony, likelihood, Bayes • Outgroup taxon critical • problem if outgroup shares a selective property with a subset of ingroup • Order of input can be problematic • Jumble them!
