e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat Newcastle University
Outline • Computational challenges of bioinformatics • Secretion in Bacillus • Classification and analysis workflows • Results and discussion
Computational Challenges of Bioinformatics • New requirements from bioinformatics • 3 major problems • Heterogeneity • Distribution • Autonomy • Experiments - series of workflows
myGrid and Taverna • Taverna - writing, running workflows & examining results • Scufl - Simple Conceptual Unified Flow Language • Freefluo - workflow engine to run workflows • SOAPLAB - makes applications available • Diagram: SOAPLAB Web Service (any application); Web Service (e.g. DDBJ BLAST)
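The idea of an experiment as a series of workflow steps, each backed by a service, can be sketched in a few lines. This is a minimal illustration only, not Taverna, Scufl or Freefluo: `run_workflow`, `fetch_proteins` and `count_proteins` are hypothetical names, and the two step functions are local stand-ins for remote web services (no network calls are made).

```python
from typing import Callable, Dict, List

# A step reads the shared data dictionary and returns new entries to add to it.
Step = Callable[[Dict[str, object]], Dict[str, object]]


def run_workflow(steps: List[Step], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run the steps in order, feeding each one the accumulated results."""
    data = dict(inputs)
    for step in steps:
        data.update(step(data))
    return data


# Hypothetical stand-ins for remote services; the sequences are made up.
def fetch_proteins(data: Dict[str, object]) -> Dict[str, object]:
    return {"proteins": ["MKKLLA", "MSTNPK"]}


def count_proteins(data: Dict[str, object]) -> Dict[str, object]:
    return {"count": len(data["proteins"])}


result = run_workflow([fetch_proteins, count_proteins], {"genome": "example"})
print(result["count"])
```

Keeping every intermediate value in one dictionary is a simplification; a real workflow engine wires named outputs of one service to named inputs of the next.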
Microbase • Grid-based system for microbial genome comparison and analysis • Information repository (and execution environment) • Pre-computed data
Secretion in Bacillus • Identify secreted proteins • Predict characteristics & behaviour of bacteria • Bacillus species show diverse behaviour • Soil inhabitants • Harmful bacteria
Importance of Secretion • Mechanism of interaction with environment • Reveal capabilities of an organism • Pathogens are of great interest
Secretory Proteins • Figure label: signal peptide • Locations: cytoplasm, membrane, cell wall, medium • Protein classes: transmembrane, lipoprotein, cell wall binding, LPXTG
Bioinformatic Tools • SignalP - signal peptide • TMHMM, tmap, MEMSAT - transmembrane • LipoP - lipoprotein • ps_scan - cell wall binding, LPXTG • Locations in figure: cytoplasm, membrane, cell wall, medium
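One way to combine the predictions of these tools into the protein classes named on the slide is a small decision function. The sketch below is an assumption, not the authors' workflow: the `Predictions` record, the `classify` function and the order of the checks are hypothetical, the tool outputs are assumed to have been parsed already, and the LPXTG motif is matched with a plain regular expression rather than ps_scan. The cell wall binding class is left out because it needs a domain scan that is not modelled here.

```python
import re
from dataclasses import dataclass

# LPXTG motif, X = any residue (illustrative regex, not a PROSITE scan).
LPXTG = re.compile(r"LP.TG")


@dataclass
class Predictions:
    """Hypothetical, already-parsed tool results for one protein."""
    has_signal_peptide: bool  # e.g. from SignalP
    tm_helices: int           # e.g. from TMHMM / tmap / MEMSAT
    is_lipoprotein: bool      # e.g. from LipoP


def classify(sequence: str, p: Predictions) -> str:
    """Assign one class; the order of the checks is an assumption."""
    if not p.has_signal_peptide:
        return "no signal peptide"
    if p.is_lipoprotein:
        return "lipoprotein"
    if p.tm_helices > 0:
        return "transmembrane"
    if LPXTG.search(sequence):
        return "LPXTG"
    return "released to medium"


# Made-up sequence containing the motif LPETG.
print(classify("MKKAALPETGEE", Predictions(True, 0, False)))
```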
Process of Analysis • Putative secreted proteins • Protein families • Relations • Functional classification
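The step from putative secreted proteins to protein families can be illustrated with a simple grouping routine. The slides do not say how families were formed, so this sketch assumes single-linkage grouping: any two proteins joined by a chain of pairwise similarity hits land in the same family. The function name `group_into_families` and the input format are hypothetical, and every protein named in a pair is assumed to appear in the protein list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_into_families(
    proteins: Iterable[str],
    similar_pairs: Iterable[Tuple[str, str]],
) -> List[List[str]]:
    """Group proteins so that any chain of similar pairs shares one family."""
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        # Follow parent links to the family representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families: Dict[str, List[str]] = defaultdict(list)
    for p in parent:
        families[find(p)].append(p)
    # Largest families first, ties broken alphabetically.
    return sorted((sorted(m) for m in families.values()), key=lambda m: (-len(m), m))


print(group_into_families(["a", "b", "c", "d"], [("a", "b"), ("b", "c")]))
```

Single linkage is the simplest choice and tends to merge families through one spurious hit; a production analysis would more likely use a dedicated clustering method.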
Architecture • Custom-designed database • Provenance tracking • Analysis - computationally intensive • Architecture differs from other systems
Functions of the Clusters • Chart (axis: number of families)
Biologist’s Outlook • Results available for subsequent analysis • Data and results are of great interest
eScientist’s Outlook • Microbase simplified data analysis • But … • Autonomy - most services were originally provided by external parties • Licensing - limits exposure of services • Distribution - difficulty came from the relatively large datasets
Future Enhancements • Use notification to automatically analyse recently annotated genomes • Migrate workflows to a remote enclosed environment?
Acknowledgments • Phillip Lord, Colin Harwood, Anil Wipat • myGrid: Carole Goble, Tom Oinn … and the rest of the myGrid team • Microbase: Yudong Sun, Anil Wipat, Matthew Pocock, Pete A. Lee, Paul Watson, Keith Flanagan, James T. Worthington