SysMO-DB: A pragmatic approach to sharing information amongst Systems Biology projects in Europe http://www.sysmo-db.org. Carole Goble, University of Manchester, UK. Pan European collaboration. Systems Biology of Microorganisms.
SysMO-DB: A pragmatic approach to sharing information amongst Systems Biology projects in Europe http://www.sysmo-db.org Carole Goble, University of Manchester, UK
Pan European collaboration. • Systems Biology of Microorganisms. • The transition from growing to non-growing Bacillus subtilis cells • Energy and Saccharomyces cerevisiae • Biology of Clostridium acetobutylicum • Gene interaction networks and models of cation homeostasis in Saccharomyces cerevisiae http://www.sysmo.net
Eleven individual projects, 91 institutes • Different research outcomes • A cross-section of microorganisms, incl. bacteria, archaea and yeast. • Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way • Present these processes in the form of computerized mathematical models. • Pool research capacities and know-how. • Running since April 2007. • Two phases – more later! http://www.sysmo.net BaCell-SysMO COSMIC SUMO KOSMOBAC SysMO-LAB PSYSMO Valla MOSES TRANSLUCENT STREAM SulfoSYS
Types of stuff • Multiple ‘omics: genomics, transcriptomics, proteomics, metabolomics • Images • Reaction Kinetics • Models • Relationships between data sets/experiments • Procedures, experiments, data, results and models • Analysis of data • The same across many Systems Biology projects
The Problem (1) • No one concept of experimentation or modelling • No planned, shared infrastructure for pooling
SysMO-DB • Started July 2008, 3 years + 3 years • 4 people, 3 teams over 3 sites • Sensitively retrofit a data access, model handling and data integration platform. • Support and manage the diversity of data, models and competencies. • Web-based solution: exchange of data, models and processes; search across the initiative’s assets; dissemination of results.
The Problem (2) • Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial …), files and spreadsheets. • Suspicion: suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians. • Data issues: many do not have data, do not follow the standards that exist, or do not know who is doing what. Much of the data cannot be compared. Different organisms, different strains. • Resource issues: no extra resources for the consortiums. 91 institutes, 11 consortiums, some overlapping.
Principles… • A series of small victories • Realistic • Don’t reinvent • Sustainable and extensible • Migrate to community standards • Provide instant gratification • Address doubt and anxiety • Keep barriers low.
Social Approach • PALS - Power Contributors! • 18 Postdocs and PhD students • All three kinds of people • Design and technical collaboration team • Very intense collaboration • UK and Continental PALS Chapters • Audits and Sharing • Methods, data, models, standards, software, schemas, spreadsheets, SOPs… • 20 questions they want answered • Summer Schools
Picking Pain Points. Keeping it Real. • Project Directors • Data remains with us. • We control who sees what. • Just enough exchange. • Responsibility • PALs • Spreadsheets. • Yellow Pages. • Standard Operating Procedures.
SysMO SEEK • Assets Catalogue. Archive. Social Network. Sharing Space. Gateway. • Yellow Pages: People. Expertise. Projects. Institutions. Facilities. Studies. • Data: Experimental data sets and analysed results. Gateway to data stores – SABIO-RK, ‘omics • Models: Store. Stimulate. Publish. Curate. Gateway to COPASI, JWS Online, BioModels • Processes: Laboratory protocols – Standard Operating Procedures. Bioinformatics analyses – computational workflows – Taverna. Model population and validation – workflows – Taverna. Gateway to myExperiment, MolMeth, OpenWetWare… • Interlinking
SysMO SEEK • Is there any group generating kinetic data? • Is this data available? • Who is working with which organism? • What methods are being used to determine enzyme activity? • Under which experimental conditions are my partners working for the measurement of glucose concentration?
Protect: Just Enough Sharing Access Permissions Reusing myExperiment
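The "just enough sharing" idea on this slide can be illustrated with a small sketch. This is not the myExperiment or SysMO SEEK permission model; the class names, access levels and the rule "most permissive applicable grant wins" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels, ordered from least to most permissive."""
    NONE = 0
    VIEW = 1      # may see the catalogue entry and its metadata
    DOWNLOAD = 2  # may also fetch the underlying file
    EDIT = 3      # may also change the asset


@dataclass
class SharingPolicy:
    """Hypothetical per-asset policy: the owner decides who sees what."""
    owner: str
    public: Access = Access.NONE
    by_project: dict = field(default_factory=dict)  # project name -> Access
    by_person: dict = field(default_factory=dict)   # person name -> Access

    def access_for(self, person, projects=()):
        """Return the most permissive level that applies to this person."""
        if person == self.owner:
            return Access.EDIT
        levels = [self.public]
        levels += [self.by_project.get(p, Access.NONE) for p in projects]
        levels.append(self.by_person.get(person, Access.NONE))
        return max(levels)


# Usage: visible to one project, downloadable by one named collaborator.
policy = SharingPolicy(
    owner="alice",
    by_project={"COSMIC": Access.VIEW},
    by_person={"bob": Access.DOWNLOAD},
)
```

With this policy an outsider sees nothing, project members see the entry, and only the named collaborator can download, which is one way to read "we control who sees what".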
Reward and Provenance Attribution Credit Reusing myExperiment
Just Enough Results Model • Harvest • standards e.g. MIAME (MIBBI.org) • consortium schemas and spreadsheets • JERMs for each data type – microarray, metabolomics, proteomics • Map to projects • Distribute as spreadsheet templates “I only want to collect and share just enough results”
Keeping data safe at home • [Diagram: Project X, JERM, Harvester (harvest), Extractor, Register, Fetch, Search, Assets Catalogue, Content Management System]
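The harvest, extract, register and search steps named on this slide can be sketched as follows. This is a minimal illustration under assumptions, not the SysMO-DB implementation: the field set in `JERM_FIELDS`, the class and function names, and the example URL are hypothetical. The point it shows is that the catalogue stores only "just enough" metadata plus a pointer, while the data itself stays in the project's own store.

```python
from dataclasses import dataclass

# Hypothetical minimal field set; real JERMs are defined per data type.
JERM_FIELDS = ("title", "data_type")


@dataclass(frozen=True)
class CatalogueEntry:
    project: str
    title: str
    data_type: str
    location: str  # pointer back to the project's own store


def extract(project, location, raw):
    """Extractor: keep only the JERM fields and drop everything else."""
    missing = [f for f in JERM_FIELDS if f not in raw]
    if missing:
        raise ValueError(f"{location}: missing JERM fields {missing}")
    return CatalogueEntry(project, raw["title"], raw["data_type"], location)


class AssetsCatalogue:
    """Central catalogue holding metadata only, never the data values."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def search(self, term):
        term = term.lower()
        return [e for e in self._entries
                if term in e.title.lower() or term in e.data_type.lower()]


def harvest(project, store, catalogue):
    """Harvester: walk a project's store (location -> raw record) and
    register one catalogue entry per record. Returns the number harvested."""
    for location, raw in store.items():
        catalogue.register(extract(project, location, raw))
    return len(store)


# Usage with a made-up project store.
store = {
    "https://example.org/project-x/run1.xls": {
        "title": "Glucose pulse experiment",
        "data_type": "microarray",
        "values": [0.1, 0.4, 0.9],  # stays at home, never copied
    },
}
catalogue = AssetsCatalogue()
harvested = harvest("Project X", store, catalogue)
hits = catalogue.search("microarray")
```

A search hit carries only the location, so fetching the actual data remains a request to the owning project and can be subject to that project's own access rules.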
Quality of Data – Reliable Interpretation • Publication standards by stealth • Controlled vocabulary plug-in • BioPortal
Observations - PALs • Dissemination of standards • Debunking myths • Tools exchange • Modeller – Experimentalist Trust • Like, talking together • Transcended the projects • Project power politics • PALs did their jobs….
Observations - Sharing • Methods sharing. • Protective of models. • in progress vs published models. • Access and Version management. • Curator-Rival conflict • Reluctant to share data. • Even within their own projects. • Legacy spreadsheets dominate. • Curation practices vary. • Centralised archive take-up. • Point to Point Exchange. Nature 461, 145 (10 Sept 09)
SysMO2 Musical Chairs • Incentive Model for Sharing • Future Funding • Phase 2 - SysMO2 • Projects dropped and added • People dropped and added • Institutions dropped and added • Others reconstituted and added • Incentive Model for Sharing? • Convenience, Added Value? • Personal benefit? Consortium Policies?
A Platform for Systems Biology Exchange • Preservation and archiving. • Widen Participation of mothership • Community Exchange Bazaar • Widen adoption of platform and enable exchange. • Accelerant to standards • Adoption of JERM. • Curation tools • CMS + JERM bundling • Widen access to External Resources, incl. publication • Added value and convenience • Preparation for publishing. • EMBL-EBI ‘omics datasets • Public Model repositories • ISA-Tab, SBML
Research Objects and e-Laboratories • Packaged Assets • Workflows linked to models linked to data linked to SOPs • Community standards • Mixed resources • External and central • Trust • Spreadsheets • Integration via RDF linked data. • myExperiment, MethodBox, NEMA, BioCatalogue
Summary http://www.sysmo-db.org • Reality is messy. • Extreme Technology Determinism vs Voluntarist Sociocultural shaping • Extreme and continuous partnership with users. • Act Local Think Global • Agile development environment facilitated stream of features to tackle pain points. • Leverage other e-Laboratories. • Maintaining scientists’ buy-in. • Socio-Political Axis dominates the Technical Axis. • Collaboration evolutions. Confidence in exchange • Consortium Policies.
SysMO-DB Team • University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Finn Bacall, Sergejs Aleksejevs • EML Research gGmbH, Germany: Wolfgang Müller, Isabel Rojas, Olga Krebs • University of Stellenbosch, South Africa and University of Manchester, UK: Jacky Snoep
Acknowledgements • myExperiment: http://www.myexperiment.org • Taverna: http://www.mygrid.org.uk • JWS Online: http://jjj.biochem.sun.ac.za/ • SABIO-RK: http://sabio.villa-bosch.de/