SysMO-SEEK: Sharing Data and Models in Systems Biology. Katy Wolstencroft, Stuart Owen, Jacky Snoep, University of Manchester
SysMO-DB Project: A data access, model handling and data integration platform for Systems Biology • To support and manage the diversity of data, models and experimental protocols from a consortium • Web based • Standards compliant
Systems Biology of Microorganisms (SysMO) http://www.sysmo.net • Pan-European collaboration: 13 individual projects, >100 institutes • Different research outcomes • A cross-section of microorganisms, incl. bacteria, archaea and yeast • Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way • Present these processes in the form of computerized mathematical models • Pool research capacities and know-how • Running since April 2007, for 3-5 years • This year, 2 new projects join and 6 leave
Types of data • Multiple omics • genomics, transcriptomics • proteomics, metabolomics • fluxomics, reactomics • Images • Molecular biology • Reaction Kinetics • Models • Metabolic, gene network, kinetic • Relationships between data sets/experiments • Procedures, experiments, data, results and models • Analysis of data
Challenges • Heterogeneous data and models • Distributed groups of researchers • Modellers and experimentalists have different skills, training and experience • Scientists want to remain in control • Scientists are reluctant to share • Both social and technical challenges
SysMO-DB Dev Team: Carole Goble, Sergejs Aleksejevs, Wolfgang Müller, Olga Krebs, Katy Wolstencroft, Stuart Owen, Franco du Preez, Jacky Snoep, Finn Bacall • Heidelberg Institute for Theoretical Studies, Germany • University of Manchester, UK • University of Stellenbosch, South Africa
Social Challenge: a Focus Group (the SysMO PALs) links the DB team and the Projects • DB team to Focus Group: show what is there, suggest what is possible, ask for requirements, double check • Focus Group to Projects: transmit, disseminate, collect answers • Focus Group to DB team: give requirements, tell priorities, rate outcomes, suggest improvements
Technical Challenge • Rapid and incremental development • Driven by the PALs • Just enough and just in time, not just in case • No reinvention • Sustainable and extensible • Migrate to standards • Fit in with normal lab practices
What do we share? Methods + Models + Data + Results: all SysMO assets • Protocols for Models, with the fields Title, Authors, Keywords, Description, Assumptions, Equations, Numerical Methods/Algorithms, Computational Tools, Parameter Estimation Techniques, Limitations, References
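One way to picture the "Protocols for Models" template is as a record with one slot per field listed on the slide. The sketch below is a minimal illustration under that assumption; the class name `ModelProtocol` and the `missing_fields` helper are hypothetical and not part of SysMO-SEEK. It reports which template fields a submitter has not yet filled in.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelProtocol:
    """Hypothetical record for the 'protocol for models' template.

    Field names follow the slide; the class itself is illustrative only.
    """
    title: str
    authors: list[str]
    keywords: list[str] = field(default_factory=list)
    description: str = ""
    assumptions: str = ""
    equations: list[str] = field(default_factory=list)
    numerical_methods: str = ""
    computational_tools: list[str] = field(default_factory=list)
    parameter_estimation: str = ""
    limitations: str = ""
    references: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of template fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Usage with made-up values: only title and authors are filled in so far.
example = ModelProtocol(title="Glycolysis kinetic model", authors=["A. Modeller"])
print(example.missing_fields())
```

Keeping every field optional except title and authors mirrors the "just enough" idea: a protocol can be shared early and completed later.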
A Tree View of Assets: the ISA (Investigation, Study, Assay) infrastructure provides a directory structure for experiments (http://isatab.sourceforge.net/). [Diagram: an Investigation branching into Studies (e.g. Construction, Validation) and Assays, with SOPs attached.]
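The Investigation, Study, Assay hierarchy behaves like a small tree of nested containers. The sketch below models it with plain dataclasses and renders an indented tree view. It is an illustration only: the class and function names, the example investigation, the assay names and the SOP labels are hypothetical, and the real ISA-Tab format stores this structure in tab-delimited files rather than in objects like these.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    name: str
    sops: list[str] = field(default_factory=list)  # SOPs used by this assay


@dataclass
class Study:
    name: str
    assays: list[Assay] = field(default_factory=list)
    sops: list[str] = field(default_factory=list)  # SOPs attached at study level


@dataclass
class Investigation:
    name: str
    studies: list[Study] = field(default_factory=list)


def tree_lines(inv: Investigation) -> list[str]:
    """Render Investigation > Study > Assay as an indented tree, listing linked SOPs."""
    lines = [f"Investigation: {inv.name}"]
    for study in inv.studies:
        lines.append(f"  Study: {study.name}")
        for sop in study.sops:
            lines.append(f"    SOP: {sop}")
        for assay in study.assays:
            lines.append(f"    Assay: {assay.name}")
            for sop in assay.sops:
                lines.append(f"      SOP: {sop}")
    return lines


# Hypothetical example echoing the Construction/Validation studies on the slide.
example = Investigation(
    "Example investigation",
    studies=[
        Study("Construction", assays=[Assay("PCR check", sops=["SOP-1"])]),
        Study("Validation", assays=[Assay("Growth curve", sops=["SOP-2"])]),
    ],
)
print("\n".join(tree_lines(example)))
```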
Incentives for sharing • Safe haven for data • Credit and attribution • Help with exporting to public repositories (e.g. one-click export to ArrayExpress, PRIDE, etc.) • A repository for “supplementary materials” in publications • Linking publications and data • Access other resources through a SEEK gateway
Just Enough Sharing: Access Permissions (...we don’t talk about security)
Just Enough Sharing: [Diagram: assets described with JERM and SOPs reach SEEK either by fetch on request from project stores (SysMOLab Wiki, COSMIC Alfresco, MOSES Wiki, or another data store) or by direct upload.]
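The two routes in the diagram can be sketched as a catalogue whose entries hold either the uploaded bytes themselves or only a pointer back to the project's own store. The code below is a minimal sketch under that assumption; `AssetRef`, `Catalogue`, the injected `fetch` function and the example URL are all hypothetical, and a real deployment would fetch over HTTP with authentication. It is not the SysMO-SEEK implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AssetRef:
    """Hypothetical catalogue entry: stored content OR a pointer to a project store."""
    title: str
    content: Optional[bytes] = None   # set for direct uploads
    source_url: Optional[str] = None  # set for assets that stay in a project store

    def __post_init__(self) -> None:
        # Exactly one of the two routes must be chosen.
        if (self.content is None) == (self.source_url is None):
            raise ValueError("give exactly one of content or source_url")


class Catalogue:
    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch           # hypothetical fetcher, e.g. an HTTP GET to a wiki
        self._assets: dict[str, AssetRef] = {}
        self.fetch_count = 0          # how many times a remote store was contacted

    def register(self, asset: AssetRef) -> None:
        self._assets[asset.title] = asset

    def download(self, title: str) -> bytes:
        asset = self._assets[title]
        if asset.content is not None:
            return asset.content      # direct upload: served from the catalogue itself
        self.fetch_count += 1
        return self._fetch(asset.source_url)  # fetch on request: data stays at the source


# Usage with an in-memory stand-in for a project wiki (no network access).
remote = {"https://wiki.example.org/growth.csv": b"time,od\n0,0.05\n"}
cat = Catalogue(fetch=lambda url: remote[url])
cat.register(AssetRef("growth curve", source_url="https://wiki.example.org/growth.csv"))
cat.register(AssetRef("kinetics", content=b"enzyme,vmax\nPFK,1.2\n"))
```

This version fetches fresh data on every request and caches nothing, so the project store stays the single source of truth; caching would be a separate design decision.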
How do we share? The “Just Enough Results Model” (JERM) asks: • What type of data is it? Microarray, growth curve, enzyme activity… • What was measured? Gene expression, OD, metabolite concentration… • What do the values in the datasets mean? Units, time series, repeats… JERM is based on: • Minimum information models, e.g. MIAME, MIAPE, MIRIAM • Biological ontologies, e.g. Gene Ontology, MGED, SBO • The BioPortal web service is used in SysMO-SEEK for concept lookup and visualisation
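The three JERM questions can be read as a minimal annotation that must accompany a dataset. The sketch below encodes them as a record with a check for missing answers. The class name, field names and the rule "at least one replicate" are assumptions for illustration, not the published JERM schema; in practice, term identifiers in `ontology_terms` would come from an ontology lookup service such as BioPortal.

```python
from dataclasses import dataclass, field


@dataclass
class JermDescription:
    """Hypothetical minimal JERM-style annotation answering the slide's three questions."""
    data_type: str              # what type of data is it, e.g. "growth curve"
    measured: str               # what was measured, e.g. "OD"
    units: str                  # what the values mean: their units...
    time_series: bool = False   # ...whether the values form a time series...
    replicates: int = 1         # ...and how many repeats there are
    ontology_terms: dict[str, str] = field(default_factory=dict)  # label -> ontology term ID

    def problems(self) -> list[str]:
        """List what is still missing before the dataset is 'just enough' described."""
        issues = []
        if not self.data_type:
            issues.append("data type not given")
        if not self.measured:
            issues.append("measured quantity not given")
        if not self.units:
            issues.append("units not given")
        if self.replicates < 1:
            issues.append("replicates must be at least 1")
        return issues


# Usage with made-up values.
growth = JermDescription("growth curve", "OD", units="absorbance units",
                         time_series=True, replicates=3)
print(growth.problems())  # []
```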
How do we share? • Share JERM templates developed by SysMO-DB, the PALs and the consortium, as spreadsheet templates and database schemas • Encourage uptake throughout SysMO, for transcriptomics, metabolomics, proteomics, etc.
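A spreadsheet template is only useful if uploaded sheets actually follow it. The sketch below compares the header row of a CSV export with a template's expected columns and reports what is missing or extra. The column names in `METABOLOMICS_TEMPLATE` are a hypothetical example, not an actual SysMO-DB template.

```python
import csv
import io

# Hypothetical column layout for a metabolomics template; real JERM templates differ.
METABOLOMICS_TEMPLATE = ["sample_id", "metabolite", "concentration", "unit", "time_min"]


def check_header(csv_text: str, template: list[str]) -> dict[str, list[str]]:
    """Compare the first row of a CSV export against a template's expected columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [h.strip() for h in next(reader, [])]  # empty file -> empty header
    return {
        "missing": [c for c in template if c not in header],
        "extra": [c for c in header if c not in template],
    }


# Usage: a sheet that forgot the unit column and added an operator column.
sheet = "sample_id,metabolite,concentration,time_min,operator\nS1,glucose,5.2,0,KW\n"
print(check_header(sheet, METABOLOMICS_TEMPLATE))
# {'missing': ['unit'], 'extra': ['operator']}
```

Reporting differences rather than rejecting the sheet fits the "recommend, don't enforce" approach described for standards.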
Identifying Biological Objects • What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites • Where/how do these objects interact? Pathways, flux, experimental conditions • What models describe these interactions? • This is possible when using common frameworks, naming schemes and controlled vocabularies
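Why controlled vocabularies matter can be shown in a few lines: two datasets that name the same metabolite differently only become linkable once both names map to one term. The sketch below uses a tiny synonym table; the table, its preferred labels and the `TERM:` identifiers are hypothetical placeholders, not real ontology IDs.

```python
from typing import Iterable, Optional

# Hypothetical mini-vocabulary: synonym (lower case) -> (preferred label, made-up term ID).
VOCAB: dict[str, tuple[str, str]] = {
    "glucose": ("D-glucose", "TERM:0001"),
    "d-glucose": ("D-glucose", "TERM:0001"),
    "glc": ("D-glucose", "TERM:0001"),
    "pyruvate": ("pyruvate", "TERM:0002"),
    "pyruvic acid": ("pyruvate", "TERM:0002"),
}


def normalise(name: str) -> Optional[tuple[str, str]]:
    """Map a free-text name to (preferred label, term ID), or None if unknown."""
    return VOCAB.get(name.strip().lower())


def shared_objects(names_a: Iterable[str], names_b: Iterable[str]) -> set[str]:
    """Term IDs mentioned in both datasets, even when they use different names."""
    ids_a = {hit[1] for n in names_a if (hit := normalise(n))}
    ids_b = {hit[1] for n in names_b if (hit := normalise(n))}
    return ids_a & ids_b


# "Glc" and "D-glucose" resolve to the same term; unknown names are skipped.
print(shared_objects(["Glc", "Pyruvic acid", "ATP"], ["D-glucose", "lactate"]))
```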
Following Standards: we recommend formats but we do not enforce them • Protocols and SOPs: Nature Protocols • Data: JERM models and community minimum information models • Models: SBML and related standards • Publications: PubMed and DOI • If you follow the prescribed formats, you get more out, but if you don’t, you can still participate • Lowering the adoption barrier
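"Recommend but do not enforce" translates naturally into an upload check that always accepts and only returns advice. The sketch below assumes two simplified checks: a model is treated as SBML if its XML root element is named `sbml`, and a publication reference counts if it looks like a DOI or a numeric PubMed ID. Both patterns are simplifications chosen for illustration, not official validators, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified patterns (assumptions, not official validators).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
PMID_PATTERN = re.compile(r"^\d{1,8}$")


def model_warnings(model_xml: str) -> list[str]:
    """Warn (but do not reject) when a model file is not SBML."""
    try:
        root = ET.fromstring(model_xml)
    except ET.ParseError:
        return ["model is not well-formed XML; SBML is recommended"]
    local_name = root.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix if present
    if local_name != "sbml":
        return [f"root element <{local_name}> is not <sbml>; SBML is recommended"]
    return []


def publication_warnings(ref: str) -> list[str]:
    """Warn when a publication reference is neither a DOI nor a PubMed ID."""
    ref = ref.strip()
    if DOI_PATTERN.match(ref) or PMID_PATTERN.match(ref):
        return []
    return ["publication reference is neither a DOI nor a PubMed ID"]


def accept_upload(model_xml: str, publication: str) -> tuple[bool, list[str]]:
    """Always accept the upload; return advice on how to get more out of it."""
    warnings = model_warnings(model_xml) + publication_warnings(publication)
    return True, warnings


sbml = ('<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">'
        '<model id="m"/></sbml>')
print(accept_upload(sbml, "10.1000/xyz123"))  # (True, [])
```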
SEEK, the eLaboratory: a dynamic resource for analysis as well as browsing • Automatic comparison of data from inside files • Understanding where and how data and models are linked • Running simulations with new experimental data • Running analyses and workflows over the data and models
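"Automatic comparison of data from inside files" means reading values out of the uploaded files rather than comparing the files as opaque attachments. The sketch below assumes a hypothetical two-column CSV layout (`metabolite,concentration`) and reports, for metabolites present in both files, the difference in concentration. Names are matched case-insensitively; a fuller version would map them to controlled-vocabulary terms first.

```python
import csv
import io


def read_concentrations(csv_text: str) -> dict[str, float]:
    """Parse 'metabolite,concentration' rows into a dict (hypothetical two-column layout)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {row["metabolite"].strip().lower(): float(row["concentration"]) for row in rows}


def compare(csv_a: str, csv_b: str) -> dict[str, float]:
    """For metabolites present in both files, return concentration in B minus A."""
    a, b = read_concentrations(csv_a), read_concentrations(csv_b)
    return {m: b[m] - a[m] for m in sorted(a.keys() & b.keys())}


# Usage with made-up data from two experiments.
exp_a = "metabolite,concentration\nglucose,5.0\npyruvate,0.2\n"
exp_b = "metabolite,concentration\nGlucose,3.5\nlactate,1.1\n"
print(compare(exp_a, exp_b))  # {'glucose': -1.5}
```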
Workflows from myExperiment • Data preparation, annotation and analysis • Systems Biology workflow pack on myExperiment: microarray analysis and text mining, created by Afsaneh Maleki-Dizaji from SUMO, University of Sheffield, based on previous work by Paul Fisher, University of Manchester (http://www.myexperiment.org/workflows/187)
SEEK as a data analysis and meta analysis service (Peter Li) • SBML model construction and population • Calibration workflow • Data requirements: a parameterised SBML model and experimental data (metabolite concentrations from the key results database) • Calibration by COPASI web service
Data analysis and meta analysis: the SEEK Analysis Service with pre-cooked analysis tools, such as the calibration workflow (Peter Li). [Screenshot: load model, load data, GO]
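The calibration workflow delegates parameter estimation to a COPASI web service; the sketch below does not use COPASI. It only illustrates what calibration means on a toy scale: choose the parameter value that makes a model best match measured metabolite concentrations. It assumes a hypothetical one-parameter first-order decay model, C(t) = c0 * exp(-k * t), and swaps in golden-section search on the sum of squared residuals. That search is valid here because, for noise-free decay data, the error is unimodal in k.

```python
import math


def sse(k: float, times: list[float], conc: list[float], c0: float) -> float:
    """Sum of squared residuals for the model C(t) = c0 * exp(-k * t)."""
    return sum((c - c0 * math.exp(-k * t)) ** 2 for t, c in zip(times, conc))


def calibrate_k(times: list[float], conc: list[float], c0: float,
                lo: float = 0.0, hi: float = 10.0, tol: float = 1e-8) -> float:
    """Golden-section search for the rate constant k minimising the SSE on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        # Keep the sub-interval that contains the smaller error.
        if sse(c, times, conc, c0) < sse(d, times, conc, c0):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


# Usage with synthetic "measurements" generated from k = 0.5.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
measured = [2.0 * math.exp(-0.5 * t) for t in times]
k_hat = calibrate_k(times, measured, c0=2.0)
print(round(k_hat, 4))  # 0.5
```

Real calibration of an SBML model usually fits many parameters at once from noisy data, which is why dedicated tools such as COPASI offer a range of optimisation methods.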
Why it works for us • A solution that fits in with current practices • Start simple, show benefits, add more • Engage with the people actually doing the work: PhD students, post-docs • Build to the PALs’ requirements • Respect publication cycles • Respect cultural differences • Scientists stay in control
SysMO Methods Spreading • Virtual Liver • Mueller, via HITS • Lungsys • SBCancer • EraSysBio+ • Eukaryotic organisms • Interactions between host and pathogen • Human disease • Multi-scale modelling
Acknowledgements: SysMO-DB Team, SysMO PALs, myGrid, HITS and JWS Online, EMBL-EBI, MCISB. http://www.sysmo-db.org