
Computing the Tree of Life

Tandy Warnow, Department of Computer Sciences, The University of Texas at Austin


Presentation Transcript


  1. Computing the Tree of Life. Tandy Warnow, Department of Computer Sciences, The University of Texas at Austin

  2. Phylogeny (figure: evolutionary tree of Orangutan, Human, Gorilla, and Chimpanzee; from the Tree of Life Website, University of Arizona)

  3. DNA Sequence Evolution (figure: sequences evolving down a tree over time)
     -3 mil yrs: AAGACTT
     -2 mil yrs: AAGGCCT, TGGACTT
     -1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT
     today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT

  4. Molecular Phylogenetics
     Sequences: U = AGGGCAT, V = TAGCCCA, W = TAGACTT, X = TGCACAA, Y = TGCGCTT
     (figure: tree with leaves X, U, Y, V, W; tree is unrooted)
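A common first step when going from sequences to a tree is to compare the sequences pairwise. The sketch below computes Hamming distances (number of differing positions) for the five sequences on this slide. It assumes the sequences are already aligned and that the labels U to Y pair with the sequences in listing order; the slide does not say which distance measure, if any, the talk used.

```python
from itertools import combinations

# Sequences from the slide; pairing labels with sequences in listing
# order is an assumption.
sequences = {
    "U": "AGGGCAT",
    "V": "TAGCCCA",
    "W": "TAGACTT",
    "X": "TGCACAA",
    "Y": "TGCGCTT",
}


def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Distance for every unordered pair of taxa.
distances = {
    (s, t): hamming(sequences[s], sequences[t])
    for s, t in combinations(sequences, 2)
}

for (s, t), d in distances.items():
    print(f"{s}-{t}: {d}")
```

Distance-based methods take a matrix like this as input. Raw Hamming distances undercount change when a site mutates more than once, so in practice they are usually corrected under a model of sequence evolution.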

  5. Evolutionary trees and the pharmaceutical industry
     • Big genome sequencing projects just produce data. So what? Evolutionary history relates all organisms and genes, and evolutionary trees are used to make important biological discoveries.
     • The pharmaceutical industry uses phylogenies for many applications, such as the development of influenza vaccines.
     • Inaccuracies in the phylogenies lead to inaccurate predictions (e.g., vaccines that don't work, drugs that don't have the required properties). Current software isn't accurate enough or fast enough.
     • This means $$$!

  6. We are world leaders in research in Computational Phylogenetics
     • "DCM-boosting" for phylogeny reconstruction: improves accuracy and speeds up heuristics for NP-hard problems (Warnow, UT-Austin)
     • GRAPPA: software for whole genome phylogeny (Moret, UNM)
     • Visualization of large trees and sets of trees (Amenta, UC Davis)
     • Phylogenetic databases (Miranker)

  7. DCM-boosting phylogenetic reconstruction methods [Nakhleh et al. ISMB 2001]
     • DCM-boosting makes fast methods more accurate
     • DCM-boosting speeds up heuristics for hard optimization problems
     (figure: error rate, 0 to 0.8, versus number of taxa, 0 to 1600, for NJ and DCM-NJ)
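NJ in the figure is neighbor joining, the fast distance-based base method that DCM-NJ builds on. As a rough illustration of that base method only, here is a minimal sketch of plain neighbor joining that returns the tree topology without branch lengths. It does not implement DCM-boosting, and the four-taxon distance matrix is made up for the example.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree topology by neighbor joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns nested tuples; the top-level tuple holds the last three subtrees.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to every other node.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[frozenset((a, b))] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)  # the joined pair becomes a new internal node
        for c in nodes:
            if c != a and c != b:
                # Distance from the new node to each remaining node.
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))]
                    + d[frozenset((b, c))]
                    - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)


# Hypothetical additive distances for the tree ((A,B),(C,D)).
pairs = {
    ("A", "B"): 3, ("A", "C"): 9, ("A", "D"): 10,
    ("B", "C"): 10, ("B", "D"): 11, ("C", "D"): 7,
}
dist = {frozenset(p): v for p, v in pairs.items()}
tree = neighbor_joining(["A", "B", "C", "D"], dist)
print(tree)
```

On this input A and B are joined first, which matches the tree the distances were generated from. The sketch scans all pairs on every iteration, so it is meant for reading rather than for inputs with hundreds of taxa.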

  8. Whole-Genome Phylogenetics (figure: tree diagrams with nodes labeled A to F and W, X, Y, Z)

  9. Benchmark gene order dataset: Campanulaceae
     • 12 genomes + 1 outgroup (Tobacco), 105 gene segments
     • NP-hard optimization problems: breakpoint and inversion phylogenies
     1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)

  10. Benchmark gene order dataset: Campanulaceae
     • 12 genomes + 1 outgroup (Tobacco), 105 gene segments
     • NP-hard optimization problems: breakpoint and inversion phylogenies
     1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)
     2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)

  11. Benchmark gene order dataset: Campanulaceae
     • 12 genomes + 1 outgroup (Tobacco), 105 gene segments
     • NP-hard optimization problems: breakpoint and inversion phylogenies
     1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)
     2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)
     2003: Using latest version of GRAPPA: 2 minutes on a single processor (1-billion-fold speedup per processor)
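The breakpoint phylogeny problem mentioned above scores trees using the breakpoint distance between gene orders. The sketch below computes that distance for a single pair of signed gene orders, treating genomes as circular by default. It covers only this pairwise building block, not the tree search that BPAnalysis or GRAPPA perform, and the two five-gene orders are invented for illustration.

```python
def adjacencies(order, circular=True):
    """Return the set of signed adjacencies in a gene order.

    Each adjacency (a, b) is normalized so that it compares equal to its
    reverse-strand reading (-b, -a).
    """
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))  # close the circle
    return {min((a, b), (-b, -a)) for a, b in pairs}


def breakpoint_distance(g1, g2, circular=True):
    """Count adjacencies of g1 that are not preserved in g2."""
    if sorted(map(abs, g1)) != sorted(map(abs, g2)):
        raise ValueError("genomes must contain the same set of genes")
    return len(adjacencies(g1, circular) - adjacencies(g2, circular))


# Hypothetical gene orders: g2 is g1 with the segment (2, 3) inverted.
g1 = [1, 2, 3, 4, 5]
g2 = [1, -3, -2, 4, 5]
print(breakpoint_distance(g1, g2))
```

A single inversion breaks at most two adjacencies, which is why breakpoint and inversion distances are closely related, although they are not the same quantity.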

  12. GRAPPA (Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms) http://www.cs.unm.edu/~moret/GRAPPA/
     • Heuristics for NP-hard optimization problems
     • Fast polynomial-time distance-based methods
     • Contributors: U. New Mexico, U. Texas at Austin, Università di Bologna, Italy
     • Fastest and most accurate software for whole genome phylogeny worldwide

  13. Opportunities
     • New phylogenetic reconstruction software can improve pharmaceutical R&D by making more accurate solutions achievable in hours or days rather than months or years.
     • Software for researchers is freely available (open source), but users need the latest tools now, with proper interfaces: a business opportunity.

  14. Participants and Funding
     • University of Texas computer scientists: Warnow, Dhillon, Hunt, and Miranker
     • University of Texas biologists: Jansen, Linder, and Hillis
     • Other institutions: UNM, UC Davis, Central Washington, CUNY, JGI
     • Funding: Three NSF ITR grants, NSF Biocomplexity, David and Lucile Packard Foundation

  15. Phylolab, U. Texas. Please visit us at http://www.cs.utexas.edu/users/phylo/
