Learn about new scalable coalescent-based methods for species tree estimation: BBCA, ASTRAL, and ASTRID. These methods provide accurate analysis of large datasets with high efficiency and statistical consistency in the presence of ILS.
New Scalable Coalescent-Based Species Tree Estimation Methods: BBCA, ASTRAL, and ASTRID
Tandy Warnow
The University of Illinois
BBCA, ASTRAL, and ASTRID
• BBCA is a simple way of making *BEAST scalable to large numbers of genes (but doesn’t address large numbers of species)
• ASTRAL and ASTRID:
  • Summary methods that are statistically consistent in the presence of ILS, and that run in polynomial time.
  • Both can analyze very large datasets (1000 species and 1000 genes – or more) with high accuracy.
  • The relative accuracy depends on the model condition – sometimes ASTRAL is better, sometimes ASTRID is better.
Main competing approaches
[Figure: genes 1, 2, ..., k are either analyzed separately, with the resulting gene trees combined by a summary method, or concatenated, to estimate the species tree.]
Incomplete Lineage Sorting (ILS) is a dominant cause of gene tree heterogeneity
*BEAST
• Heled and Drummond, MBE 2010
• Input: a set of multiple sequence alignments, one per gene
• Technique: uses MCMC to co-estimate gene trees and species trees
• Highly accurate
• Limited in practice to small numbers of genes and species, due to convergence issues
BBCA: improving *BEAST
Zimmermann, Mirarab, and Warnow, BMC Genomics 2014:
• Randomly partition the genes into bins of at most 25 genes
• Apply *BEAST to each bin, and take the gene trees it computes
• Apply a preferred summary method to the gene trees
• Matches the accuracy of *BEAST
• Improves scalability to large numbers of genes
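The first BBCA step, random binning, can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the function name `random_bins`, its parameters, and the gene labels are hypothetical, and the later steps (running *BEAST on each bin and applying a summary method) are external tools that are not shown.

```python
import random


def random_bins(genes, max_bin_size=25, seed=None):
    """Randomly partition `genes` into bins of at most `max_bin_size` items.

    Hypothetical helper for illustration; the bin size of 25 follows the slide.
    """
    rng = random.Random(seed)      # local RNG so the partition is reproducible
    shuffled = list(genes)         # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    # Consecutive slices of the shuffled list form the random bins.
    return [shuffled[i:i + max_bin_size]
            for i in range(0, len(shuffled), max_bin_size)]


# Example with made-up gene labels: 100 genes fall into 4 bins of 25.
genes = [f"gene{i}" for i in range(1, 101)]
bins = random_bins(genes, max_bin_size=25, seed=0)
```

Each bin would then be handed to *BEAST as a separate co-estimation problem, and the gene trees from all bins pooled before the summary method is run.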
ASTRAL
• Mirarab and Warnow, Bioinformatics 2014
• https://github.com/smirarab/ASTRAL
• Tutorial in Species Tree Workshop
ASTRID
• ASTRID: Accurate species trees using internode distances, Vachaspati and Warnow, RECOMB-CG 2015 and BMC Genomics 2015
• Algorithmic design: computes a matrix of average leaf-to-leaf topological distances, and then computes a tree from that matrix using FastME (which is both more accurate and faster than neighbor joining).
• Related to NJst (Liu and Yu, 2010), which computes the same matrix but then computes the tree using neighbor joining (NJ).
• Statistically consistent under the multi-species coalescent (MSC) model
• O(kn^2 + n^3) time, where there are k gene trees and n species
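The distance-matrix step can be illustrated with a small Python sketch. It is not the ASTRID code: it assumes gene trees are given as nested tuples of leaf names (written with a three-way split at the top so the tuple reads as an unrooted tree), it counts edges on the path between two leaves as the topological distance, and it averages each pair over the gene trees that contain both leaves. All function names are hypothetical, and the tree-building step (FastME in ASTRID, neighbor joining in NJst) is not shown.

```python
from collections import defaultdict
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf name to the tuple of child indices leading to it from the top node."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def topological_distances(tree):
    """Number of edges on the path between every pair of leaves in one gene tree."""
    paths = leaf_paths(tree)
    dist = {}
    for a, b in combinations(sorted(paths), 2):
        pa, pb = paths[a], paths[b]
        shared = 0                      # length of the common prefix = depth of the LCA
        for x, y in zip(pa, pb):
            if x != y:
                break
            shared += 1
        dist[(a, b)] = len(pa) + len(pb) - 2 * shared
    return dist


def average_distance_matrix(gene_trees):
    """Average the per-tree distances over the trees that contain both leaves."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in gene_trees:
        for pair, d in topological_distances(tree).items():
            totals[pair] += d
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two toy gene trees on the species A, B, C, D (made-up data).
t1 = (("A", "B"), "C", "D")   # unrooted quartet AB|CD
t2 = (("A", "C"), "B", "D")   # unrooted quartet AC|BD
avg = average_distance_matrix([t1, t2])
```

In this toy input, A and B are 2 edges apart in the first tree and 3 in the second, so their averaged entry is 2.5. Filling the matrix touches every pair of species in every gene tree, which corresponds to the kn^2 term in the running time on the slide.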
Acknowledgments
• Software
  • ASTRAL: Available at https://github.com/smirarab
  • ASTRID: Available at https://github.com/pranjalv123
  • Others at http://tandy.cs.illinois.edu/software.html
• NSF grant DBI-1461364 (joint with Noah Rosenberg at Stanford and Luay Nakhleh at Rice): http://tandy.cs.illinois.edu/PhylogenomicsProject.html
• NSF graduate fellowship to Pranjal Vachaspati
• HHMI graduate fellowship to Siavash Mirarab
• Papers available at http://tandy.cs.illinois.edu/papers.html