SysMO-DB: A Community-Based Approach to Data Sharing • Dr Katy Wolstencroft, University of Manchester
SysMO-DB • A data access, model handling and data integration platform for Systems Biology • A web-based resource • That promotes shared understanding • Using a common platform and common technologies • Started July 2008
SysMO-DB Dev Team • University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Finn Bacall, Sergejs Aleksejevs • Heidelberg Institute for Theoretical Studies, Germany: Wolfgang Müller, Olga Krebs • University of Stellenbosch, South Africa and University of Manchester, UK: Jacky Snoep, Franco B du Preez
SysMO: Systems Biology of Microorganisms (http://www.sysmo.net) • Pan-European collaboration • Eleven individual projects, 89 institutes • Different research outcomes • A cross-section of microorganisms, incl. bacteria, archaea and yeast • Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way • Present these processes in the form of computerized mathematical models • Pool research capacities and know-how • Running since April 2007 • Runs for 3-5 years • This year, 2 new projects join and 6 leave
Challenges • Heterogeneous data and models • Distributed groups of researchers • Modellers and experimentalists have different skills, training and experience • Scientists want to remain in control • Social and technical challenges
Social Challenge: Focus Group • Diagram of exchanges between the DB team, the Focus Group and the Projects • Show what is there • Suggest what is possible • Ask for requirements • Double check • Transmit • Disseminate • Give requirements • Tell priorities • Rate outcomes • Suggest improvements • Collect answers
Focus Group: SysMO-DB PALs • 21 postdocs and PhD students • Modellers, experimentalists and bioinformaticians • Design and technical collaboration team • Intense collaboration • UK and Continental PALs Chapters • Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs... • 20 questions • Deployment into Projects
Technical Challenge • Rapid and incremental development • Just enough and just in time, not just in case • No reinvention • Driven by the PALs • Sustainable and extensible • Migrate to standards • Fitting in with normal lab practices
What do we share • Protocols follow the Nature Protocols layout: Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References • All SysMO Assets: Methods + Data + Results
What do we share • Protocols for Models: Title, Authors, Keywords, Description, Assumptions, Equations, Numerical Methods/Algorithms, Computational Tools, Parameter Estimation Techniques, Limitations, References • All SysMO Assets: Methods + Models + Data + Results
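The two protocol layouts above can be treated as simple checklists. The following is a minimal sketch, not part of SysMO-SEEK: the section names are copied from the slides, while the function name `missing_sections` and the dictionary representation of a draft are assumptions made for illustration.

```python
# Section names are taken from the two "What do we share" slides.
# The checking logic and the dict-based draft format are illustrative assumptions.
SOP_SECTIONS = [
    "Title", "Authors", "Keywords", "Abstract", "Materials", "Reagents",
    "Reagent Set Up", "Equipment", "Time Taken", "Procedure",
    "Troubleshooting", "Critical Steps", "Anticipated Results", "References",
]

MODEL_SECTIONS = [
    "Title", "Authors", "Keywords", "Description", "Assumptions", "Equations",
    "Numerical Methods/Algorithms", "Computational Tools",
    "Parameter Estimation Techniques", "Limitations", "References",
]


def missing_sections(draft, template):
    """Return the template sections that are absent or blank in the draft."""
    missing = []
    for section in template:
        value = draft.get(section)
        if value is None or not str(value).strip():
            missing.append(section)
    return missing


# Hypothetical draft of a model description with only two sections filled in.
draft_model = {"Title": "Example model", "Equations": "dS/dt = -v1"}
print(missing_sections(draft_model, MODEL_SECTIONS))
```

A check like this reports gaps without blocking the upload, which fits the "start simple" approach described later in the talk.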
A Tree View of Assets • ISA infrastructure provides a directory structure for experiments (http://isatab.sourceforge.net/) • Diagram: Investigation, Studies, Assay, with SOPs • Construction • Validation
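The Investigation / Study / Assay hierarchy can be sketched as nested records. This is an illustrative data structure only, not the ISA-TAB format or the SEEK schema. Attaching SOPs to assays is an assumption read from the diagram labels, and all titles in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    sops: List[str] = field(default_factory=list)  # assumption: SOPs hang off assays


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def render_tree(investigation):
    """Return the hierarchy as indented lines, one per node."""
    lines = [f"Investigation: {investigation.title}"]
    for study in investigation.studies:
        lines.append(f"  Study: {study.title}")
        for assay in study.assays:
            lines.append(f"    Assay: {assay.title}")
            for sop in assay.sops:
                lines.append(f"      SOP: {sop}")
    return lines


# Hypothetical example content.
example = Investigation(
    "Example investigation",
    [Study("Example study", [Assay("Example assay", ["Example SOP"])])],
)
print("\n".join(render_tree(example)))
```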
Expertise, tools • Coordinates, data
How do we share • JERM: “Just Enough Results Model” • What type of data is it? Microarray, growth curve, enzyme activity... • What was measured? Gene expression, OD, metabolite concentration... • What do the values in the datasets mean? Units, time series, repeats... • Based on minimum information models (e.g. MIAME, MIAPE, MIRIAM) and biological ontologies (e.g. Gene Ontology, MGED, SBO) • BioPortal web service used in SysMO-SEEK for concept lookup and visualisation
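The three JERM questions map naturally onto a small metadata record. The sketch below is an assumption about how such a record could look: the class name `JermRecord` and its field names are hypothetical and are not taken from the SysMO-DB templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JermRecord:
    """Hypothetical 'just enough' description of one dataset."""
    data_type: str        # What type of data is it? e.g. "growth curve"
    measured: str         # What was measured? e.g. "metabolite concentration"
    units: str            # What do the values mean?
    is_time_series: bool
    repeats: int

    def __post_init__(self):
        # A dataset with no repeats at all would carry no measurements.
        if self.repeats < 1:
            raise ValueError("repeats must be at least 1")

    def summary(self):
        kind = "time series" if self.is_time_series else "single time point"
        return (f"{self.data_type}: {self.measured} [{self.units}], "
                f"{kind}, {self.repeats} repeat(s)")


# Hypothetical example values.
record = JermRecord("enzyme activity", "metabolite concentration", "mM", True, 3)
print(record.summary())
```

In practice the free-text values would be replaced by terms from the ontologies named on the slide, which is what makes records comparable across projects.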
How do we share • Share JERM templates developed by SysMO-DB, PALs and consortium • Spreadsheet templates • Database schemas • Encourage uptake throughout SysMO • transcriptomics • metabolomics • proteomics, etc.
Tools to help manage data: Annotation standards by stealth • Controlled vocabulary plug-in • BioPortal
JERM Model • SysMO JERM: a ‘MIBBI’ for the SysMO-SEEK • What do we need to help you find stuff? Title, person, filename, class • What is experiment specific? • What is experiment specific, but helps us map between them? • Common biological elements: chemicals, genes, proteins, organisms, strains
Identifying Biological Objects • What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites • Where/how do these objects interact? Pathways, flux, experimental conditions • What models describe these interactions? • Possible when using common frameworks, naming schemes and controlled vocabularies
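Once datasets and models are annotated with the same names, linking them reduces to set intersection. The sketch below illustrates that idea only. The asset names and identifier strings are hypothetical placeholders, not real database accessions, and `link_assets` is not a SEEK function.

```python
from itertools import combinations


def link_assets(assets):
    """Map each pair of asset names to the biological objects they share.

    `assets` maps an asset name to the set of controlled-vocabulary
    identifiers used to annotate it. Pairs with nothing in common are omitted.
    """
    links = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(sorted(assets.items()), 2):
        common = ids_a & ids_b
        if common:
            links[(name_a, name_b)] = common
    return links


# Hypothetical annotations; the identifier strings are placeholders.
annotated = {
    "growth_curve.xls": {"organism:yeast", "metabolite:glucose"},
    "glycolysis_model.xml": {"metabolite:glucose", "enzyme:hexokinase"},
    "microarray.xls": {"gene:example"},
}
print(link_assets(annotated))
```

The same lookup fails silently if one group writes "glucose" and another "Glc", which is why the slide stresses common naming schemes and controlled vocabularies.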
Following Standards • We recommend formats but we do not enforce them • Protocols and SOPs: Nature Protocols • Data: JERM models and community minimum information models • Models: SBML and related standards • Publications: PubMed and DOI • If you follow the prescribed formats, you get more out, but if you don’t, you can still participate • Lowering the adoption barrier
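"Recommend but do not enforce" can be expressed as a registration step that always accepts an asset and returns advice instead of an error. This is a sketch of the policy only. The function `register` and its arguments are hypothetical, and the recommended formats are the ones listed on the slide.

```python
# Recommended formats as listed on the slide, keyed by a hypothetical asset type.
RECOMMENDED = {
    "sop": "the Nature Protocols layout",
    "data": "a JERM template",
    "model": "SBML and related standards",
}


def register(asset_type, follows_recommended_format):
    """Accept every asset; return (accepted, advice) rather than rejecting."""
    advice = []
    recommended = RECOMMENDED.get(asset_type)
    if recommended and not follows_recommended_format:
        advice.append(f"Consider {recommended} to get more out of SEEK.")
    return True, advice


print(register("model", False))
print(register("data", True))
```

The design choice is that a non-conforming upload still succeeds, so the barrier to participating stays low.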
Just Enough Sharing • Access Permissions • ...we don’t talk about security
Just Enough Sharing • Diagram labels: JERM, SOP • Project stores: SysMOLab Wiki, COSMIC Alfresco, MOSES Wiki, another • Fetch on Request • Direct Upload • A data store
When do People Share • SysMO aims: sharing sooner • Suspicion and fear of scooping • Reputation
Incentives for sharing • Safe haven for data • Credit and attribution • Help with exporting to public repositories (e.g. one-click export to ArrayExpress, PRIDE, etc.) • A repository for “supplementary materials” in publications • Linking publications and data • Access other resources through a SEEK gateway
SEEK as a Gateway • JWS Online Plugin • online simulator, runs in SysMO-SEEK • upload models in SBML format • SBGN schemas, with annotations and external links
Incentives for sharing • Credit and attribution • SEEK records who owns what. If data, models, or protocols are reused, scientists get recognition • Accountability • SEEK records who owns what. If you take credit for others’ work, they will see • Data citation: formal credit for data published in SEEK
Data Citation • Persistent identifiers and URLs for the data • Linking people to the data • Safe haven for the data • Guarantees of sustainability • Data MUST be uploaded and archived • If cited, it must be public
SEEK as a Safe Haven • HITS can archive SysMO data for 10 years • All SysMO software is open source and available • Distinction between sustaining the service and the software
Governance and Policy • What is required by SysMO members? • When should they share during their projects? • How long after the project can they keep data private to finish publications? • If their data is stored locally, what is the archive process? • Policy from DMG and funding agencies and NOT SysMO-DB
Governance and Policy • Proposals under discussion: • All data registered in SEEK should be uploaded and archived at the end of a SysMO project • All data from finished projects should be shared • How long after the end? 1 day, 6 months, 1 year? • Scientists can invoke “creator’s privilege” on SysMO assets produced near the end of the project • Extra time to write up and publish before release to the general public, respecting publication cycles
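The release proposals amount to date arithmetic on the project end date. The sketch below makes that concrete under stated assumptions. The embargo length is still an open question in the talk, so it is a parameter here, and the function names, the extension for “creator’s privilege” and the example dates are all hypothetical.

```python
from datetime import date, timedelta


def release_date(project_end, embargo_days, creators_privilege_days=0):
    """Date on which an asset becomes public under a given embargo.

    embargo_days: policy-wide delay after the project ends (undecided in the talk).
    creators_privilege_days: optional extra time for late assets (assumption).
    """
    return project_end + timedelta(days=embargo_days + creators_privilege_days)


def is_public(today, project_end, embargo_days, creators_privilege_days=0):
    """True once the release date has been reached."""
    return today >= release_date(project_end, embargo_days, creators_privilege_days)


# Hypothetical project end date, with a one-year embargo as one of the options raised.
end = date(2010, 3, 31)
print(release_date(end, embargo_days=365))
print(is_public(date(2010, 12, 1), end, embargo_days=365))
```

Keeping both delays as parameters reflects that the policy is set by the DMG and funding agencies, not by SysMO-DB.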
SysMO So Far… • People ARE sharing • Over 300 assets in SEEK • SOPs: 102, Models: 17, DataFiles: 95, Investigations: 13, Studies: 26, Assays: 53 • PALs: a network of young SysBio researchers • Training and education in data and metadata management spreading through the consortium • Modellers and experimentalists communicating
SysMO Methods Spreading • Virtual Liver • Mueller, via HITS • Lungsys • SBCancer • EraSysBio+ • Eukaryotic organisms • Interactions between host and pathogen • Human disease • Multi scale modelling
Why it works for us • A solution that fits in with current practices • Start simple, show benefits, add more • Engage with the people actually doing the work • PhD students, postdocs • Build to the PALs’ requirements • Respect publication cycles • Respect cultural differences • Scientists stay in control
Acknowledgements • SysMO-DB Team • SysMO-PALs • myGrid, HITS and JWS Online • EMBL-EBI, MCISB • http://www.sysmo-db.org