The Deciphering of Ctenophore Genomes: How to make a fully automatic transcriptome annotation for electrophysiologists & field biologists? David Orion Girardo, Worcester Polytechnic Institute, Bioinformatics and Computational Biology / Mathematics (CS minor), Moroz Lab.
Photo courtesy of Mat Citarella
How did the Nervous System Evolve? • Still poorly understood • Applications for fundamental neuroscience & regenerative medicine
Model organism: the ctenophore Pleurobrachia bachei • The most basally branching animal lineage with ‘true’ neurons and muscles • No intercellular signaling molecules identified
Goals for a first-year summer student in the Moroz lab: • Build a one-click transcriptome analysis pipeline that runs within one day • Develop a secretory signaling peptide prediction system • Integrate transcriptome annotation & neuropeptide predictions into the pipeline
“Automatic Genome-wide annotation pipelines are hallmarks of large sequencing centers” • Manual analysis costs a lot of time and money • Few centers have fully automated analysis • UF does not have such a pipeline • Still manual: costs ~$3,000 per few months, without visualization
Supercomputer • 1 day • Never been done before
Basic Definitions • A read is a single DNA sequence fragment produced by the sequencer. • A contig (from “contiguous”) is a set of overlapping DNA segments that together represent a consensus region derived from a single genetic source.
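The following toy greedy overlap merger in Python shows how reads become a contig. It is only an illustration. It assumes error-free reads from one strand, with no read fully contained in another, and it is not how Newbler, CAP3 or MIRA work internally. `overlap` and `greedy_assemble` are hypothetical helper names.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (0 if shorter than min_len)."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if the rest of a, from this position on, is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap (toy greedy assembly)."""
    seqs = list(reads)
    while len(seqs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps any more: the remaining sequences are separate contigs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)] + [merged]
    return seqs
```

For example, the four reads "ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG" and "GCCGGAATAC" merge into the single contig "ATTAGACCTGCCGGAATAC".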
Assembly requires multiple steps • Reads → Newbler, CAP3, MIRA → Contigs
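The slide does not say whether the three assemblers run in sequence or in parallel. The minimal sketch below shows one common pattern and assumes each stage is a plain Python function over a list of sequences. It chains the stages and includes a redundancy filter for pooled contigs, which drops exact duplicates and sequences contained in another kept contig on either strand. `Stage`, `run_pipeline` and `normalize` are hypothetical placeholders, and no real assembler is invoked.

```python
from typing import Callable, Iterable

# A stage takes a list of sequences and returns a new list (hypothetical interface).
Stage = Callable[[list[str]], list[str]]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_contained(contigs: list[str]) -> list[str]:
    """Drop duplicates and contigs found, on either strand, inside an already kept contig."""
    kept: list[str] = []
    # Longest first (ties broken alphabetically) so containers are seen before their contents.
    for c in sorted(set(contigs), key=lambda s: (-len(s), s)):
        rc = revcomp(c)
        if not any(c in k or rc in k for k in kept):
            kept.append(c)
    return kept


def run_pipeline(reads: list[str], stages: Iterable[Stage]) -> list[str]:
    """Feed the output of each stage into the next."""
    seqs = reads
    for stage in stages:
        seqs = stage(seqs)
    return seqs


# Hypothetical stand-in for an assembler stage: here it only normalizes case.
def normalize(seqs: list[str]) -> list[str]:
    return [s.upper() for s in seqs]
```

Keeping stages as interchangeable functions lets an assembler be swapped or reordered without touching the rest of the chain.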
Annotation Pipeline • Contigs → mpiBLAST (BLASTX) searches against Pfam, NR and Swiss-Prot (SP) → annot8r → GO, KEGG → Database
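A recurring step in an annotation pipeline like this is reducing BLASTX output to one best hit per contig. The sketch below assumes the results are in BLAST+ tabular format (`-outfmt 6`, the 12 default columns). It also assumes an e-value threshold of 1e-4, borrowed from the homolog cutoff shown later in this talk, because the actual annotation threshold is not given. `Hit` and `best_hits` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def best_hits(tsv_text: str, max_evalue: float = 1e-4) -> dict[str, Hit]:
    """Best (highest bit score) hit per query, ignoring hits with e-value >= max_evalue.

    Expects the 12 default columns of BLAST+ -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The 1e-4 default threshold is an assumption.
    """
    best: dict[str, Hit] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = Hit(query=row[0], subject=row[1], pident=float(row[2]),
                  evalue=float(row[10]), bitscore=float(row[11]))
        if hit.evalue >= max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best
```

The same function can be run once per database (Pfam, NR, Swiss-Prot) and the three results merged per contig afterwards.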
What are Signaling Peptides? • Small secreted proteins • Affect the nervous system in many different ways • Hypothesis: older than classical neurotransmitters
Neuropeptide Precursor • Signal peptide • Internal repeats • Basic cleavage sites • No transmembrane domain
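The precursor features above suggest a simple screen. The rough sketch below assumes the signal-peptide length is already known (for example, from SignalP). It treats every K/R pair as a cleavage site, which is the classic prohormone-convertase motif. Real predictors such as NeuroPred use richer models, and this sketch ignores monobasic sites and tribasic runs such as RKR. `cleave` and `repeated_peptides` are hypothetical names.

```python
import re
from collections import Counter

# K/R pairs: the canonical prohormone-convertase cleavage motif (simplified;
# monobasic sites are ignored, and an RKR run leaves a trailing R on the next peptide).
DIBASIC = re.compile(r"[KR][KR]")


def cleave(protein: str, signal_len: int) -> list[str]:
    """Remove the signal peptide, then cut the rest at non-overlapping K/R pairs."""
    mature = protein[signal_len:]
    return [p for p in DIBASIC.split(mature) if p]


def repeated_peptides(protein: str, signal_len: int, min_len: int = 3) -> dict[str, int]:
    """Candidate peptides (at least min_len residues) that occur more than once."""
    counts = Counter(p for p in cleave(protein, signal_len) if len(p) >= min_len)
    return {p: n for p, n in counts.items() if n > 1}
```

Take the made-up precursor "MASWLLLLAQDFLRFGKRFLRFGKRSEEPRRFLRFG" with a 9-residue signal peptide. Cleaving it yields "QDFLRFG", "FLRFG", "SEEP" and "FLRFG". The repeated peptide "FLRFG" then counts as an internal repeat.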
Implementation of the Pipeline • Contigs → SignalP, TMHMM, Phobius, TargetP, NeuroPred → Database • Need to computationally integrate all predictions
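One way to integrate the five predictors is a rule-based vote for each sequence. The rules below are hypothetical and are not the criteria behind the counts reported in this talk.

- "Stringent" requires all three secretion predictors to agree, zero TM helices from both TMHMM and Phobius, and at least one NeuroPred cleavage site.
- "Relaxed" accepts a single secretion call and tolerates one tool reporting a helix, because TMHMM can mistake a hydrophobic signal peptide for a TM helix.

`Predictions`, `classify` and `summarize` are hypothetical names, and parsing each tool's output is out of scope.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Predictions:
    signalp_sp: bool        # SignalP predicts a signal peptide
    phobius_sp: bool        # Phobius predicts a signal peptide
    targetp_secreted: bool  # TargetP assigns the secretory pathway
    tmhmm_helices: int      # transmembrane helices predicted by TMHMM
    phobius_tm: int         # transmembrane helices predicted by Phobius
    neuropred_sites: int    # cleavage sites predicted by NeuroPred


def classify(p: Predictions) -> str:
    """Hypothetical integration rules; thresholds are illustrative only."""
    votes = sum([p.signalp_sp, p.phobius_sp, p.targetp_secreted])
    no_tm = p.tmhmm_helices == 0 and p.phobius_tm == 0
    if votes == 3 and no_tm and p.neuropred_sites > 0:
        return "stringent"
    # Relaxed: one secretion call suffices, and one TM predictor may report a helix.
    if votes >= 1 and (p.tmhmm_helices == 0 or p.phobius_tm == 0):
        return "relaxed"
    return "not_secreted"


def summarize(preds: dict[str, Predictions]) -> Counter:
    """Count sequences per class (the classes are mutually exclusive)."""
    return Counter(classify(p) for p in preds.values())
```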
Predicted Secretory Products • 38 products (most stringent criteria): secreted cell guidance molecules, secreted proteolytic enzymes, toxins, neuropeptides • An additional 453 predicted products under less stringent criteria, out of 19,573
Predicted prohormones are differentially expressed [Figure: % expression; Tentacles]
Homologs Across Phyla • Cutoff at e-value < 10^-4
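Finding which phyla contain homologs comes down to filtering hits by e-value and grouping them by the subject's phylum. The sketch below makes two assumptions. Hits are (query, subject, e-value) tuples, and a subject-to-phylum lookup already exists; building that lookup is out of scope. The strict `<` comparison mirrors the < 10^-4 cutoff. `phyla_with_homologs` is a hypothetical name.

```python
from collections import defaultdict


def phyla_with_homologs(
    hits: list[tuple[str, str, float]],
    subject_phylum: dict[str, str],
    cutoff: float = 1e-4,
) -> dict[str, list[str]]:
    """Map each query to the sorted phyla where it has a hit with e-value < cutoff."""
    found: dict[str, set[str]] = defaultdict(set)
    for query, subject, evalue in hits:
        # Skip weak hits and subjects whose phylum is unknown.
        if evalue < cutoff and subject in subject_phylum:
            found[query].add(subject_phylum[subject])
    return {q: sorted(phyla) for q, phyla in found.items()}
```

A hit with an e-value of exactly 1e-4 is excluded, because the cutoff is strict.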
“2,000 lines of Haskell” leads to…
‘0-Click’ transcriptome analysis pipeline • De novo computational predictions of secretory signaling peptides yield good results • Neuropeptide annotation has been integrated into the transcriptome analysis pipeline
Acknowledgements • Dr Leonid Moroz • Mat Citarella • Dr Andrea Kohn • Yelena Bobkova • Jim Netherton • Josh Swore • NSF, NIH