High-throughput comparative genomics • 24th October 2013 • Joe Parker, Queen Mary University of London
Topics • Introduction • Background: why phylogenomics? • Examples • Practice • Case study • On the horizon • Over the horizon
Aims • Context of phylogenomics: Next-generation sequencing (NGS) • Why phylogenomics? • Practical analyses • Future developments
Lab Interests • Ecology and evolution of traits • Echolocation, sociality • NGS data for population genetics and phylogenomics
Activities • Phylogeny estimation/comparison • Molecular correlates of evolution: site substitutions, dN/dS, composition • Simulation • Dataset limitations • (R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
Why phylogenomics, not -genetics? • Causes of discordant signal • Incomplete lineage sorting • Lateral transfer • Recombination • Introgression
Quantitative biology • Multiple configurations • Hyperparameters empirically investigated • Determine sensitivity of results
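A minimal sketch of the sensitivity idea on this slide, assuming a toy analysis: every combination of two hyperparameters is run and the results are tabulated so their spread can be inspected. The function names, the two hyperparameters (`threshold`, `min_length`) and the data are hypothetical and are not taken from the talk.

```python
from itertools import product


def count_hits(scores, lengths, threshold, min_length):
    """Toy analysis (hypothetical): count loci that pass a length filter
    and whose score exceeds the threshold."""
    return sum(
        1
        for score, length in zip(scores, lengths)
        if length >= min_length and score > threshold
    )


def sensitivity(scores, lengths, thresholds, min_lengths):
    """Run the analysis under every hyperparameter configuration.

    Returns a dict mapping (threshold, min_length) to the result, so the
    effect of each setting on the conclusion can be compared directly.
    """
    return {
        (t, m): count_hits(scores, lengths, t, m)
        for t, m in product(thresholds, min_lengths)
    }


if __name__ == "__main__":
    scores = [0.1, 0.5, 0.9, 1.5, 2.0]
    lengths = [100, 300, 500, 200, 800]
    for config, hits in sensitivity(scores, lengths, (0.4, 1.0), (0, 250)).items():
        print(config, hits)
```

If the result changes sharply between neighbouring configurations, the conclusion is sensitive to that setting and is worth reporting as such.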
Distributions • Genome-scale data provides context • Identify outliers: genes / taxa / trees • Compare values across biological systems
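One way to identify outliers against a genome-wide distribution is a robust z-score based on the median absolute deviation (MAD). This is a sketch under stated assumptions, not the method used in the talk: the per-locus statistic, the locus names and the 3.5 cutoff are all illustrative choices.

```python
from statistics import median


def mad_outliers(value_by_locus, cutoff=3.5):
    """Flag loci whose statistic is extreme relative to the whole set.

    Uses the modified z-score 0.6745 * (x - median) / MAD. The cutoff of
    3.5 is a common rule of thumb, assumed here for illustration.
    Returns a dict mapping outlier locus name to its modified z-score.
    """
    values = list(value_by_locus.values())
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        # No spread around the median: the score is undefined, flag nothing.
        return {}
    scores = {
        name: 0.6745 * (v - centre) / mad for name, v in value_by_locus.items()
    }
    return {name: z for name, z in scores.items() if abs(z) > cutoff}


if __name__ == "__main__":
    # Hypothetical per-locus values (for example, a per-gene support statistic).
    per_locus = {"g1": 0.05, "g2": 0.04, "g3": 0.06, "g4": 0.05, "g5": 0.90}
    print(mad_outliers(per_locus))
```

The same function can be applied per taxon or per tree by changing what the dictionary keys represent.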
Integration with ‘Omics • Multiple databases • Functional data • Bibliographic information
Source material • Samples • Storage • Purification • Library prep
Sequencing • Genome • Sanger • Illumina • Pyro / 454 • SOLiD • PacBio • Transcriptome / RNA-seq • MyBAITS • HiSeq / MiSeq • IonTorrent
Infrastructure • Desktop machines • Computing clusters • Grid systems • Cloud-based computation
Assembly, Annotation • Assembly • To reference (mapping) • De novo • Annotation • By homology • De novo • SOAPdenovo • MAKER • Velvet • Bowtie / Cufflinks / Tophat • Trinity
Alignment • PRANK • MUSCLE • MAFFT • Clustal
Phylogeny inference • MrBayes • RAxML • BEAST • MP-EST • STAR
Phylogenetic analysis • BEAST • HYPHY • PAML • Pipelines • LRT
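The likelihood-ratio test (LRT) listed above compares two nested models using their maximised log-likelihoods, as reported by tools such as PAML or HYPHY. The sketch below assumes the standard chi-square approximation, with degrees of freedom equal to the number of extra free parameters in the alternative model. The function names and the example log-likelihoods are hypothetical, and some tests (those with parameters on a boundary) call for a different null distribution.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate for integer df >= 1,
    using the closed-form series for even and odd degrees of freedom."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 0.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
        for i in range(df // 2):
            if i > 0:
                term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2k+1:
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i<k} x^i / (1*3*...*(2i+1))
    for i in range(df // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (test statistic, p-value) for two nested models.

    lnl_null and lnl_alt are maximised log-likelihoods; extra_params is
    the number of additional free parameters in the alternative model.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf(stat, extra_params)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one locus.
    stat, p = likelihood_ratio_test(-1234.5, -1230.0, extra_params=1)
    print(f"2*deltaLnL = {stat:.2f}, p = {p:.4f}")
```

At genome scale the test is repeated per locus, so the resulting p-values would normally need a multiple-testing correction before interpretation.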
Parker et al. (2013) • De novo genomes: • four taxa • 2,321 protein-coding loci • 801,301 codons • Published: • 18 genomes • ~69,000 simulated datasets • ~3,500 cluster cores
[Figure: three panels, with means of 0.05, -0.01 and -0.08]
Development cycle • Design • Wireframe & specify tests • Implement • Review, refine & refactor • Example classes and methods: Alignment: loadSequences(), getSubstitutions() • Phylogeny: trimTaxa(), getMRCA() • DataSeries: calculateECDF(), randomise() • Regression: getResiduals(), predictInterval()
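To make the "specify tests, then implement" step concrete, here is a hypothetical Python sketch of the DataSeries class named on the slide. Only the method names come from the slide; the behaviour shown (an empirical cumulative distribution function and a seeded permutation) is an assumption about what they might do, not the lab's implementation.

```python
import random
from bisect import bisect_right


class DataSeries:
    """A series of numeric observations (illustrative sketch only)."""

    def __init__(self, values):
        self.values = list(values)
        self._sorted = sorted(self.values)

    def calculate_ecdf(self, x):
        """Empirical CDF: the fraction of observations <= x."""
        if not self._sorted:
            raise ValueError("empty series")
        return bisect_right(self._sorted, x) / len(self._sorted)

    def randomise(self, seed=None):
        """Return a shuffled copy of the observations.

        The original order is left untouched; pass a seed for a
        reproducible permutation.
        """
        rng = random.Random(seed)
        shuffled = list(self.values)
        rng.shuffle(shuffled)
        return shuffled


if __name__ == "__main__":
    series = DataSeries([3, 1, 2, 4])
    print(series.calculate_ecdf(2))
    print(series.randomise(seed=1))
```

Writing the expected values first (for example, that the ECDF of the largest observation is 1.0) gives the tests that the implementation then has to satisfy.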
Models of computation • Cloud resources: unlimited flexibility, finite time • Development trade-off: off-the-shelf vs. bespoke • Exploratory work • Real-time genomic transects? • Essential fundamental data missing from nearly every system: diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer
Serialisation • Process data remotely • Freeze-dry objects, download to desktop • Implement new methods directly on previously-analysed data
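The freeze-dry workflow above can be sketched with Python's standard `pickle` module: results computed remotely are written to a file, copied to the desktop, reloaded, and then passed to a function written after the original run. The record layout (`locus`, `site_lnl`) and all function names are hypothetical. Note that pickle files should only be loaded from sources you trust.

```python
import pickle


def freeze(obj, path):
    """Serialise an object to disk (run where the data were processed)."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def thaw(path):
    """Restore a previously serialised object (run on the desktop)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def mean_site_lnl(result):
    """A 'new method' written after the original analysis: the mean
    per-site log-likelihood of one stored result."""
    return sum(result["site_lnl"]) / len(result["site_lnl"])


if __name__ == "__main__":
    # Hypothetical stored results: one record per locus.
    results = [{"locus": "geneA", "site_lnl": [-1.0, -2.0, -3.0]}]
    freeze(results, "results.pkl")
    for record in thaw("results.pkl"):
        print(record["locus"], mean_site_lnl(record))
```

Plain dictionaries and lists are used here so the file can be reloaded without the original class definitions; custom classes would need to be importable wherever the file is thawed.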
Over the horizon • Real-time phylogenetics • Field phylogenetics • Alignment-free analyses
Conclusions • Why phylogenomics? • Practice • Comparative approach • Statistical context
Thanks • Steve Rossiter¹, James Cotton², Elia Stupka³ & Georgia Tsagkogeorga¹ • ¹School of Biological and Chemical Sciences, Queen Mary, University of London • ²Wellcome Trust Sanger Institute • ³Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan • Chris Walker & Dan Traynor: Queen Mary GridPP High-throughput Cluster • Chaz Mein & Anna Terry: Barts and The London Genome Centre • Mahesh Pancholi: School of Biological and Chemical Sciences • BBSRC (UK); Queen Mary, University of London
Resources • My email: Joe Parker (Queen Mary University of London): j.d.parker@qmul.ac.uk • Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231. doi:10.1038/nature12511 • Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol. in the press. • Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331. doi:10.1038/nature12130 • Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) spleen transcriptome for adaptive evolution and biased gene conversion in passerine birds. MBE 30(5):1046-50. doi:10.1093/molbev/mst033 • Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476-482. doi:10.1038/nature10530 • Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340. doi:10.1016/j.tree.2009.01.009 • The Tree Of Life: http://phylogenomics.blogspot.co.uk/ • RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html • Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/ • OpenHelix: http://blog.openhelix.eu/ • Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)