Phylogeny: Consensus Trees. Alex Wipf
Consensus Trees: Context • Phylogeny is hard • Biologists don’t often agree on phylogeny • Despite disagreement, focus on areas of agreement • Can help direct future research • As long as input trees have merit, we can (hopefully) find areas that match history, even if we can’t find the whole tree • Goal of consensus: aggregate other studies’ results into one tree that captures agreement
Consensus Trees: Intro • Big Cats example • From Bioinformatics for Biologists (Pevzner and Shamir) • Difficulties with Big Cats and Phylogeny • Input trees • Differences in data, processes, and results • Algorithm for reaching consensus • Majority
Big Cats: Snow Leopard, Tiger, Lion, Leopard, Jaguar, Clouded Leopard
Big Cats: Why Phylogeny is Hard • Poor fossil record • A good fossil record can help trace roots • Rapid radiation during the Pliocene • Speciation events took place within a short timeframe • Recent speciation events • Less than 1 million years • Interbreeding after divergence • Consequence: disagreement on phylogeny for the big cats • 14 different studies, 14 different trees • Four used in this example
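Consensus Goal • Majority vs Strict • Majority: keep groupings that appear in more than half of the input trees • Strict: keep only groupings that appear in every input tree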
Reaching Consensus: Step 1. Finding Bipartitions
Bipartitions • We want to split each tree in two • Each half is a possible association of taxa • Represent as bitstrings • 1 = left half • 0 = right half • Order of taxa doesn’t matter, only separation • T1 has bipartitions • B1 {snow leopard, tiger | jaguar, lion, leopard} • 11000 • B2 {snow leopard, tiger, jaguar | lion, leopard} • 11100
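The bitstring encoding above can be sketched in a few lines of Python. This is only an illustration: the `TAXA` ordering is inferred from the B1 and B2 examples, and the function name `encode` is made up for this sketch.

```python
# Fixed taxon order, inferred from the B1/B2 examples on the slide.
TAXA = ["snow leopard", "tiger", "jaguar", "lion", "leopard"]

def encode(left_half, taxa=TAXA):
    """Return a bitstring: 1 for taxa in the left half, 0 for the right half."""
    left = set(left_half)
    return "".join("1" if taxon in left else "0" for taxon in taxa)

b1 = encode({"snow leopard", "tiger"})            # B1
b2 = encode({"snow leopard", "tiger", "jaguar"})  # B2
print(b1, b2)  # 11000 11100
```

Because the input is a set, the order in which the taxa of a half are listed has no effect on the bitstring; only the separation matters.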
Collecting Bipartitions: T1 • Root T1 arbitrarily • Initialize bitstrings for taxa • Leaves: each gets a bitstring marking its position in T1 • All trees use the same leaf bitstrings as T1, regardless of location differences • Bitstrings are fixed per taxon • Regardless of location, Snow Leopard is always 10000 • Put bitstrings on ancestor nodes • Non-leaf nodes: bitwise OR of the children’s bitstrings
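A minimal sketch of this collection step, assuming the tree is stored as nested tuples. The rooted shape of `t1` below is an assumption (one rooting consistent with B1 = 11000 and B2 = 11100), and the helper names are hypothetical.

```python
TAXA = ["snow leopard", "tiger", "jaguar", "lion", "leopard"]
# One fixed bit per taxon: snow leopard -> 10000, tiger -> 01000, ...
LEAF_BITS = {t: 1 << (len(TAXA) - 1 - i) for i, t in enumerate(TAXA)}

def collect(node, found):
    """Post-order walk: return the node's bitmask, recording internal nodes."""
    if isinstance(node, str):        # leaf: fixed per-taxon bitstring
        return LEAF_BITS[node]
    mask = 0
    for child in node:               # internal node: OR of its children
        mask |= collect(child, found)
    found.append(mask)
    return mask

def to_bits(mask):
    """Format a bitmask as a zero-padded bitstring."""
    return f"{mask:0{len(TAXA)}b}"

# Assumed rooting of T1 (not taken from the slides).
t1 = ((("snow leopard", "tiger"), "jaguar"), ("lion", "leopard"))
found = []
collect(t1, found)
labels = [to_bits(m) for m in found]
print(labels)  # ['11000', '11100', '00011', '11111']
```

With this rooting, 00011 is the complement of 11100 and so describes the same split, and 11111 at the root is the trivial bitstring containing every taxon.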
Reaching Consensus: Step 2. Selecting Consensus Bipartitions
Finding Majority Bitstrings • Count the frequency of each unique bitstring across the input trees • The book compares sorting and hashing for this count • Find majority bitstrings • Those that appear in more than 50% of the input trees
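The hashing variant of this count might look like the sketch below, using Python's `collections.Counter` as the hash table. The four bipartition sets are invented purely for illustration; they are not the trees from the studies.

```python
from collections import Counter

# Hypothetical bipartition sets for four input trees (invented for
# illustration; not the actual trees from the studies).
trees = [
    {"11000", "11100"},
    {"11000", "11010"},
    {"11000", "11001"},
    {"10100", "11100"},
]

def majority_bipartitions(trees):
    """Return bitstrings found in more than half of the input trees."""
    counts = Counter(b for tree in trees for b in tree)  # hash-based tally
    return sorted(b for b, n in counts.items() if 2 * n > len(trees))

majority = majority_bipartitions(trees)
print(majority)  # ['11000']
```

In this made-up input, 11100 appears in exactly two of four trees, which is not over 50%, so it is left out. A strict consensus would instead keep only bitstrings whose count equals `len(trees)`.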
Reaching Consensus: Step 3. Constructing Consensus Trees from Consensus Bipartitions
Combining into a tree • Start with a simple tree: add bitstring 11111 • A star tree, with all species descending from one ancestor • Preserve the taxon order of T1 • Add each majority bipartition • Convert back to an unrooted tree
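One way to sketch this construction, assuming trees are nested Python lists and that the bipartitions being added are mutually compatible (which holds for majority bipartitions). All function names here are hypothetical, and the output is a parenthesized, Newick-like string rather than strict Newick.

```python
TAXA = ["snow leopard", "tiger", "jaguar", "lion", "leopard"]

def leaves(node):
    """Set of taxa under a node (a taxon name or a list of child nodes)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(c) for c in node))

def add_bipartition(node, cluster):
    """Group the children of `node` covered by `cluster` under a new node."""
    for child in node:
        if not isinstance(child, str) and cluster <= leaves(child):
            add_bipartition(child, cluster)      # cluster belongs deeper
            return
    inside = [c for c in node if leaves(c) <= cluster]
    covered = set().union(*(leaves(c) for c in inside))
    if covered != cluster:
        raise ValueError("bipartition conflicts with the current tree")
    if 1 < len(inside) < len(node):
        first = node.index(inside[0])            # keep T1's taxon order
        node[:] = [c for c in node if c not in inside]
        node.insert(first, inside)

def render(node):
    """Parenthesized (Newick-like) text for a nested-list tree."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(render(c) for c in node) + ")"

def consensus_tree(bitstrings):
    tree = list(TAXA)                            # star tree for 11111
    # Larger clusters first, so nested ones land inside them.
    for bits in sorted(bitstrings, key=lambda b: -b.count("1")):
        cluster = {t for t, bit in zip(TAXA, bits) if bit == "1"}
        add_bipartition(tree, cluster)
    return render(tree)

result = consensus_tree(["11000"])
print(result)  # ((snow leopard,tiger),jaguar,lion,leopard)
```

Reading the result as an unrooted tree gives the consensus: snow leopard and tiger are grouped, while the remaining relationships stay unresolved.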
Adding Majority Bipartition (11000) • No other majority bipartitions remain, so we’re done!
What if the data agreed more? • For example, let’s say we had two majority bipartitions • 11100 • 11000
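To see why these two bipartitions can sit in the same tree, one can compare their bitstrings directly. The sketch below treats the 1s of each bitstring as a cluster in the rooted tree built earlier; the function name `relation` and its three labels are assumptions made for illustration.

```python
def relation(a, b):
    """Classify two equal-length bitstrings as nested, disjoint, or conflict."""
    x, y = int(a, 2), int(b, 2)
    shared = x & y                      # taxa marked 1 in both
    if shared == 0:
        return "disjoint"
    if shared == x or shared == y:      # one cluster contains the other
        return "nested"
    return "conflict"

print(relation("11000", "11100"))  # nested
print(relation("11000", "00011"))  # disjoint
print(relation("11000", "01100"))  # conflict
```

Since 11000 nests inside 11100, both can be added to the star tree, giving (((snow leopard,tiger),jaguar),lion,leopard). That tree has the same bipartitions as T1, so with this much agreement the consensus would be fully resolved.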
Conclusion • What do we learn from the consensus tree we built? • We don’t get “the answer” • We learn what the studies agreed upon • We learn what the studies didn’t agree upon • Focus future research efforts on those areas
References • Pevzner, Pavel, and Ron Shamir. Bioinformatics for Biologists. Cambridge: Cambridge UP, 2011. Print. • Davis, Brian W., Gang Li, and William J. Murphy. Supermatrix and species tree methods resolve phylogenetic relationships within the big cats, Panthera (Carnivora: Felidae). Molecular Phylogenetics and Evolution 56(1): 64-76, 2010. <http://www.sciencedirect.com/science/article/pii/S1055790310000473>