
Designing an IT infrastructure for data-intensive collaborative -omics projects

This presentation discusses the principles and challenges of designing an IT infrastructure for data-intensive collaborative -omics projects. It also introduces a software suite for cross-disciplinary collaborative studies and highlights its results and conclusions.


Presentation Transcript


  1. Designing an IT infrastructure for data-intensive collaborative -omics projects. Stathis Kanterakis (kanterae@ebi.ac.uk), European Bioinformatics Institute, Cambridge, UK. ICTA 2011

  2. Outline • Introduction • Why design at all? • Principles of collaborative design • A software suite for cross-disciplinary collaborative studies • Results • Conclusions

  3. Introduction

  4. The “central dogma” of information flow in molecular biology: DNA → RNA → Protein • Replication (DNA synthesis) • Transcription (RNA synthesis) • Translation (protein synthesis) Source: http://www.rsc.org/chemistryworld/Issues/2009/November/BiologysNobelMoleculeFactory.asp
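The three arrows of the dogma can be sketched in a few lines of Python. This is a deliberately simplified illustration, not part of the presentation: it assumes the input is the coding strand of an intron-free gene, uses the standard genetic code (NCBI translation table 1), and the function names `replicate`, `transcribe` and `translate` are hypothetical.

```python
# Standard genetic code (NCBI translation table 1): codons are enumerated with
# the bases U, C, A, G at each of the three positions; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def replicate(strand: str) -> str:
    """DNA -> DNA: return the complementary strand, written 5'->3' (reverse complement)."""
    return strand.upper().translate(_COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("ATGTTTGGCTAA"))` yields `"MFG"`: methionine, phenylalanine and glycine, followed by the UAA stop codon.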

  5. The -omics cascade • GENOMICS: what CAN happen • TRANSCRIPTOMICS: what APPEARS to happen • PROTEOME: what MAKES it happen • METABOLOME: what HAS happened • PHENOTYPE Source: Systems Biology and the Omics Cascade, Karolinska Institutet, June 9-13, 2008

  6. http://xkcd.com/793/

  7. • 3B: size of the human genome in bases • 330: genomes sequenced to date [2] • 407: -omes and -omics terms [1] • 30k: interdisciplinary bachelor's degrees awarded in the USA in 2005 [4] • $10k: cost to sequence a single human genome [3] Sources: [1] http://omics.org/index.php/Alphabetically_ordered_list_of_omes_and_omics [2] http://www.ensemblgenomes.org/ [3] http://www.genome.gov/sequencingcosts/ [4] http://en.wikipedia.org/wiki/Interdisciplinarity

  8. Challenges in -omics research • Expensive studies • Small number of replicates, n (microarrays, subjects, ...) • Large number of variables (genes, proteins, etc.) • This results in: • Inflated type I error (false positives), because so many variables are tested at once • Poor statistical power (true effects go undetected)
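The statistical consequences of "many variables, few replicates" can be made concrete with a short calculation. The sketch below is illustrative only: it assumes independent tests, uniformly distributed p-values under the null, a hypothetical 20,000 genes tested at α = 0.05, and a normal-approximation two-sample test for power. Bonferroni correction is shown as one standard remedy, not as a method the slides prescribe, and all function names are hypothetical.

```python
import random
from statistics import NormalDist


def expected_false_positives(m: int, alpha: float) -> float:
    """Expected rejections when m true-null hypotheses are each tested at level alpha."""
    return m * alpha


def family_wise_error_rate(m: int, alpha: float) -> float:
    """P(at least one false positive) over m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(m: int, alpha: float) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


def simulate_null_rejections(m: int, alpha: float, seed: int = 0) -> int:
    """Count rejections when every p-value comes from a true null (uniform on [0, 1))."""
    rng = random.Random(seed)
    return sum(1 for _ in range(m) if rng.random() < alpha)


def approx_power(effect_size: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-sample z-test with standardized effect size.

    Uses the normal approximation and ignores the negligible opposite tail.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return 1 - z.cdf(z_crit - noncentrality)


if __name__ == "__main__":
    m, alpha = 20_000, 0.05  # hypothetical: 20,000 genes at the usual 5% level
    print("expected false positives:", expected_false_positives(m, alpha))
    print("family-wise error rate:", family_wise_error_rate(m, alpha))
    print("simulated null rejections:", simulate_null_rejections(m, alpha))
    threshold = bonferroni_threshold(m, alpha)
    print("Bonferroni per-test threshold:", threshold)
    # Five replicates per group, effect of one standard deviation.
    print("power, uncorrected:", approx_power(1.0, 5, alpha))
    print("power, Bonferroni:", approx_power(1.0, 5, threshold))
```

With 20,000 true-null tests at 0.05, about 1,000 false positives are expected and the family-wise error rate is effectively 1. The Bonferroni threshold of 2.5 × 10⁻⁶ controls that, but with only five replicates per group the approximate power to detect a one-standard-deviation effect falls from roughly 35% to well under 1%, which is the type I / power trade-off listed on the slide.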

  9. Why design at all? http://xkcd.com/970/

  10. Volume vs. complexity cost model [Plot of volume (V) against complexity (C) with two regions: • Growth of complexity is slower than volume • Both volume and complexity grow fast] • Complexity C ~ data types × user roles × scripts (Maria Krestyaninova, 2009)
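The proportionality C ~ data types × user roles × scripts can be turned into a toy calculator that contrasts the two regimes on the slide. Everything below is hypothetical: the `Project` fields, the proportionality constant of 1 and the example numbers are illustrative assumptions, not figures from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical data-management footprint of an -omics project."""
    name: str
    data_types: int    # e.g. genotypes, expression arrays, clinical records
    user_roles: int    # e.g. data provider, analyst, curator
    scripts: int       # analysis and conversion scripts to maintain
    volume_gb: float   # raw data volume (illustrative unit)

    def complexity(self) -> int:
        # C ~ data types * user roles * scripts, with the constant assumed to be 1.
        return self.data_types * self.user_roles * self.scripts


def growth(before: Project, after: Project) -> tuple[float, float]:
    """Return (volume growth factor, complexity growth factor) between two projects."""
    return (after.volume_gb / before.volume_gb,
            after.complexity() / before.complexity())


if __name__ == "__main__":
    lab = Project("single lab", data_types=2, user_roles=1, scripts=5, volume_gb=500.0)
    # More samples of the same kind: volume grows, complexity does not.
    bigger_lab = Project("bigger lab", data_types=2, user_roles=1, scripts=5, volume_gb=5000.0)
    # A consortium adds data types, roles and scripts: complexity outgrows volume.
    consortium = Project("consortium", data_types=6, user_roles=4, scripts=20, volume_gb=5000.0)
    for project in (bigger_lab, consortium):
        v, c = growth(lab, project)
        print(f"{project.name}: volume x{v:g}, complexity x{c:g}")
```

In this toy setting, ten times the volume leaves complexity unchanged for the bigger lab but multiplies it by 48 for the consortium. The multiplicative form is the point: each new data type, role or script scales the cost of everything already there, which is why the design effort is driven by complexity rather than by volume alone.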

  11. Ome vs. Omics [Graph of cost over time, with year labels 2003, 2010 and 2016, cost labels $3,000,000,000, $10,000, ~$0 and $50,000 per person, and a marked "Ome and Omics balance point"] Source: http://omics.org/index.php/File:Ome_versus_omics_graph_by_Jong_Bhak_openfree.gif

  12. Reporting requirements for publication: Bioconductor, DataShaper, OBO, ISA-Tab, MAGE-TAB, MIBBI

  13. Nobody wants a cellphone that makes calls! Make your application: • Contextualized • Usable • Enjoyable • Visible (increases reputation) • Sociable • Valuable • Explorable • Flexible • In a participatory way • …

  14. OPEN-SOURCE collaborative design

  15. Maxims of the post-information era • “If the news is important, it will find me” • “Information wants to be free” • “It's not information overload, it's filter failure” • “The people formerly known as the audience” • “The sources go direct” • and finally… Source: http://markcoddington.com/2010/01/30/a-quick-guide-to-the-maxims-of-new-media/

  16. “Do what you do best, link the rest” http://xkcd.com/974/

  17. Agile development • Individuals and interactions over processes and tools • Working software over comprehensive documentation • Customer collaboration over contract negotiation • Responding to change over following a plan • In practice: frequent iterations driven by customer feedback; trust

  18. Metadesign Courtesy of Massimo Menichinelli http://www.openp2pdesign.org/

  19. SIMBioMS: software for cross-disciplinary collaborative studies

  20. The big picture [Diagram] • Stand-alone researchers and large consortia, served by tools such as ISA, OBIBA, SIMBioMS, QURETEC, METABAR, etc.: dynamic storage, project hosting, fast exchange; support for collaborative discovery • Central data archives: permanent deposition, large volumes, open access; knowledge access and sustainability (Maria Krestyaninova, 2009)

  21. System overview [Diagram. Data providers: biobanks and -omics labs, with submission to a Sample DB and an Experiment DB; a Public Index; users with controlled access or open access] (Maria Krestyaninova, 2009)

  22. Current infrastructural volume • 12 installations in 3 countries • 100 user organisations • >50,000 samples • >50,000 assays and studies • 4 large federated R&D projects across Europe and Russia Sources: Krestyaninova et al., Bioinformatics, 2009; Viksna et al., BMC Bioinformatics, 2007

  23. SIMBioMS in collaborative biomedical research initiatives

  24. Anton Enright, 2011

  25. Conclusions

  26. Complex interactions [Diagram of stakeholders: FDA, pharma, academia, industry, big data, Ministry of Health, Ministry of Education, research institutions, biobanks, state] • Who has a say in the knowledge extracted from information? • Research subjects: consent to particular research being conducted • Scientists: protective of their vision for their data • Funding sources: expect publications from grantees (Yulia Tammisto, 2011)

  27. Complex software • TIME is the scarcest resource • Software adoption due to: • Requirements • No other way to do things • Usefulness • Use = 1 – Reuse

  28. One goal Search for the truth

  29. Thank you! Acknowledgements: • Maria Krestyaninova • Ugis Sarkans • Anton Enright • Mat Davis • Yulia Tammisto • Massimo Menichinelli • Teemu Perheentupa • Jani Heikkinen • Balaji Rajashekar • Raivo Kolde • Jaak Vilo Uniquer www.simbioms.org
