Christian Arnold Bioinformatics Group, University of Leipzig Bioinformatics Herbstseminar

Three Weeks of Experience at the formatics Institute Christian Arnold Bioinformatics Group, University of Leipzig Bioinformatics Herbstseminar October 23rd, 2009

Content • The 10kTrees Project • Phylogenetic Targeting • Acknowledgements

1. The 10kTrees Project

Goals • Produce an updated primate phylogeny that includes phylogenetic uncertainty • Use the newest available sequence data, include as many primate species as possible, and update regularly • Use Bayesian methods to produce a set of >=10,000 primate-wide trees (with branch lengths) that are appropriate for taxonomically broad comparative research on primate behavior, ecology and morphology • Make the trees accessible to other researchers

Methodology

Version 1 vs. Version 2

Preliminary consensus tree Green: Cercopithecines Blue: Hominoids Red: Platyrrhines Yellow: Tarsiers Brown: Strepsirrhines Rooted with Galeopterus variegatus

The 10kTrees Website http://10ktrees.fas.harvard.edu/

Current Progress • Submitted to Evolutionary Anthropology, in press. • Will be presented at the AAPA conference (April 2010) in Albuquerque, New Mexico • Version 2 is almost finished • Available at http://10kTrees.fas.harvard.edu

Summary • The Bayesian approach is time-consuming but works well, even though the data matrix is very sparse • The increased number of sequences in Version 2 dramatically reduces the need for constraints and improves the quality of the tree and branch length estimates • Ongoing project • Total number of downloaded trees since June 2009: 95800

2. Phylogenetic Targeting

Which species should we study?

Goals For which species should we collect data in order to increase the size of comparative data sets?

Example 1/2 • Hypothesis: Two characters (x and y) show correlated evolution • Goal: Test this hypothesis comparatively (e.g. by using phylogenetically independent contrasts and correlation tests) • Problem 1: Data have been collected only for x, not for y • Solution 1: Collect data for y and test the hypothesis • Problem 2: From which species should we collect data for y? • Solution 2: Phylogenetic targeting!?
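The comparative test named in the goal above can be sketched as follows. This is a minimal illustration of Felsenstein-style phylogenetically independent contrasts followed by a correlation through the origin, not the analysis used in the talk. The tree encoding (nested tuples with branch lengths), the function names `contrasts` and `origin_correlation`, and all trait values are assumptions made for the example.

```python
import math

def contrasts(node, trait):
    """Return (node value, adjusted branch length, list of standardized contrasts).

    A leaf is ("name", branch_length); an internal node is
    ((left, right), branch_length). This encoding is an assumption of the sketch.
    """
    label, length = node
    if isinstance(label, str):          # leaf: observed trait value, no contrast
        return trait[label], length, []
    left, right = label
    x1, v1, c1 = contrasts(left, trait)
    x2, v2, c2 = contrasts(right, trait)
    # Contrast between the two daughters, standardized by its expected variance.
    standardized = (x1 - x2) / math.sqrt(v1 + v2)
    # Ancestral value: average weighted by inverse branch lengths.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch above this node to account for estimation error.
    adjusted = length + v1 * v2 / (v1 + v2)
    return value, adjusted, c1 + c2 + [standardized]

def origin_correlation(cx, cy):
    """Correlation of two contrast vectors, forced through the origin."""
    num = sum(a * b for a, b in zip(cx, cy))
    den = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return num / den

# Hypothetical three-species tree ((A:1, B:1):1, C:2) with made-up trait values.
tree = ((((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0)), 0.0)
x = {"A": 1.0, "B": 3.0, "C": 5.0}
y = {name: 2.0 * value for name, value in x.items()}   # perfectly correlated with x

cx = contrasts(tree, x)[2]
cy = contrasts(tree, y)[2]
r = origin_correlation(cx, cy)
```

Because both characters are contrasted over the same tree in the same order, the signs of the contrasts are consistent, which is what makes the correlation through the origin meaningful.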

Example 2/2 [Figure: brain size and cognitive data per species, with "?" marking missing values; extracted values: 4 ? 9 7 10 ? 3 ? 2 ?] Collecting new data is time-consuming and expensive…

Methods • Systematically generate all possible pairwise comparisons • For every pairwise comparison, calculate character differences for the two species that form the pair and assign a score • Determine set of phylogenetically independent pairs that maximizes the sum of all selected pair scores (maximal pairing)
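The three steps above can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions, not the implementation behind the phylogenetic targeting website: the tree is a rooted binary tree written as nested 2-tuples, two pairs count as phylogenetically independent when the paths connecting them share no branch, and the pair score is simply the absolute difference of one character. The function name `maximal_pairing` and the trait values are hypothetical.

```python
from itertools import product

def maximal_pairing(tree, score):
    """Return (best total score, list of pairs) for a rooted binary tree.

    Leaves are strings, internal nodes are 2-tuples. Selected pairs are
    connected by paths that share no branch of the tree.
    """
    def solve(node):
        # closed: best (total, pairs) when the branch above `node` is unused.
        # open_:  for each leaf below `node`, best (total, pairs) when a path
        #         starting at that leaf leaves through the branch above `node`.
        if not isinstance(node, tuple):
            return (0.0, []), {node: (0.0, [])}
        left, right = node
        (ls, lp), lopen = solve(left)
        (rs, rp), ropen = solve(right)
        # Option 1: no pair is joined at this node.
        best = (ls + rs, lp + rp)
        # Option 2: join one open leaf from each side at this node.
        for (x, (xs, xp)), (y, (ys, yp)) in product(lopen.items(), ropen.items()):
            total = xs + ys + score(x, y)
            if total > best[0]:
                best = (total, xp + yp + [(x, y)])
        # A path continuing upward uses one child branch; the other side is closed.
        open_ = {x: (xs + rs, xp + rp) for x, (xs, xp) in lopen.items()}
        open_.update({y: (ys + ls, yp + lp) for y, (ys, yp) in ropen.items()})
        return best, open_

    return solve(tree)[0]

# Hypothetical five-species tree and made-up values for one character.
tree = ((("A", "B"), "C"), ("D", "E"))
brain = {"A": 4, "B": 9, "C": 10, "D": 3, "E": 2}

total, pairs = maximal_pairing(tree, lambda a, b: abs(brain[a] - brain[b]))
```

In this toy example the highest-scoring single pair across the two smaller clades (C and E) is kept, and A and B remain available as a second independent pair because their connecting path uses no branch already taken.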

Maximal pairing: Example

Decomposition of the maximal pairing Time complexity: , for balanced trees:

Simulation results 1/2 Detecting correlated character evolution, based on a selection of 12 species • Random (Rnd) selection of species: Type 1 errors close to nominal level; power ~40%, independent of number of taxa; uses 67% of available variation • Species selection by phylogenetic targeting (PT): Type 1 errors close to nominal level; power 67-81%, increasing with number of taxa; uses 89% of available variation

Simulation results 2/2 [Figure: fraction of available variation after sampling (y-axis) against number of selected species (12, 18, 24; x-axis), comparing PT and Rnd]

Current Progress • A revised version will be resubmitted to American Naturalist in the not too distant future • TODO: Extend simulations and clarify some issues • Available at http://phylotargeting.fas.harvard.edu

Summary • A focused selection of species can save valuable time and money • Phylogenetic targeting provides a very flexible approach and can address different questions in the context of limited resources • Dynamic programming algorithms are everywhere

3. Acknowledgements

Thanks! • Harvard University • Max-Planck Institute for Evolutionary Anthropology • University of Leipzig • Charlie Nunn • Luke Matthews • Peter F. Stadler

Any Questions? Thank you for your attention! Questions? If not: Cheers (it’s early, but not too early…)
