This presentation discusses the principles and challenges of designing an IT infrastructure for data-intensive collaborative -omics projects. It also introduces a software suite for cross-disciplinary collaborative studies and highlights its results and conclusions.
Designing an IT infrastructure for data-intensive collaborative -omics projects
Stathis Kanterakis (kanterae@ebi.ac.uk), European Bioinformatics Institute, Cambridge, UK
ICTA 2011
Outline
• Introduction
• Why design at all?
• Principles of collaborative design
• A software suite for cross-disciplinary collaborative studies
• Results
• Conclusions
The “central dogma” of information flow in molecular biology: DNA → RNA → Protein. Replication (DNA synthesis) copies DNA, transcription (RNA synthesis) produces RNA from DNA, and translation (protein synthesis) produces protein from RNA. Source: http://www.rsc.org/chemistryworld/Issues/2009/November/BiologysNobelMoleculeFactory.asp
The -omics cascade: genomics (what CAN happen) → transcriptomics (what APPEARS to happen) → proteomics (what MAKES it happen) → metabolomics (what HAS happened) → phenotype. Source: Systems Biology and the Omics Cascade, Karolinska Institutet, June 9-13, 2008
• 3B: size of the human genome in bases
• 330: genomes sequenced to date [2]
• 407: -omes and -omics terms [1]
• 30k: interdisciplinary bachelor's degrees awarded in the USA in 2005 [4]
• $10k: cost to sequence a single human [3]
Sources:
[1] http://omics.org/index.php/Alphabetically_ordered_list_of_omes_and_omics
[2] http://www.ensemblgenomes.org/
[3] http://www.genome.gov/sequencingcosts/
[4] http://en.wikipedia.org/wiki/Interdisciplinarity
Challenges in -omics research
• Expensive studies
• Small number of replicates, n (microarrays, subjects, ...)
• Large number of variables (genes, proteins, etc.)
• Together these result in:
  • Inflated type I error (false positives), because many variables are tested at once
  • Low statistical power (true effects go undetected), because n is small
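Both consequences can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the talk: it assumes 10,000 independent tests (a hypothetical array size), draws null p-values as uniform on [0, 1] instead of simulating expression data, and approximates the power of a two-sample test with a normal approximation and a hypothetical effect size of one standard deviation. Bonferroni correction is used here only as one common way to control false positives; the slide does not prescribe it.

```python
import random
from statistics import NormalDist


def count_false_positives(m: int, alpha: float, seed: int = 0) -> int:
    """Count 'significant' results among m tests where every null hypothesis is true.

    Under a true null, a valid test's p-value is uniform on [0, 1], so p-values
    are drawn directly rather than simulated from expression data (assumption).
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(m))


def approx_power(effect_size: float, n_per_group: int, alpha: float) -> float:
    """Normal-approximation power of a two-sided, two-sample test of means.

    effect_size is Cohen's d (difference in means / common SD). The small
    probability of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return 1 - z.cdf(z_crit - noncentrality)


if __name__ == "__main__":
    m, alpha = 10_000, 0.05  # hypothetical: 10,000 genes tested at 5%
    print("false positives, unadjusted:", count_false_positives(m, alpha))
    print("false positives, Bonferroni:", count_false_positives(m, alpha / m))
    for n in (5, 20, 100):  # replicates per group
        unadjusted = approx_power(1.0, n, alpha)
        corrected = approx_power(1.0, n, alpha / m)
        print(f"n={n:>3}  power={unadjusted:.3f}  power (Bonferroni)={corrected:.3f}")
```

Without correction, roughly 5% of the null tests come out "significant"; once the threshold is corrected for the number of tests, power at small n all but vanishes, which is the trade-off the slide describes.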
Why design at all? http://xkcd.com/970/
Volume vs Complexity cost model (figure: volume V against complexity C, contrasting a regime where growth of complexity is slower than volume with one where both volume and complexity grow fast)
Complexity: C ~ data types × user roles × scripts
Maria Krestyaninova, 2009
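To make the proportionality concrete, here is a toy calculation with hypothetical numbers. It treats the "~" as a plain product with a constant of 1, which is an assumption, since the slide only states a proportionality. The field and class names are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Project:
    samples: int     # data volume V (does not enter the complexity proxy)
    data_types: int  # e.g. genotypes, expression arrays, clinical records
    user_roles: int  # roles with different permissions or views
    scripts: int     # processing/analysis scripts to maintain

    def complexity(self) -> int:
        """Toy complexity proxy C = data_types * user_roles * scripts (constant assumed to be 1)."""
        return self.data_types * self.user_roles * self.scripts


if __name__ == "__main__":
    base = Project(samples=1_000, data_types=3, user_roles=4, scripts=10)
    more_volume = replace(base, samples=50_000)                 # 50x the data
    more_kinds = replace(base, data_types=base.data_types + 1)  # one new data type
    for name, p in [("base", base), ("more volume", more_volume), ("new data type", more_kinds)]:
        print(f"{name:>13}: V={p.samples:>6}  C={p.complexity()}")
```

With these numbers C stays at 120 when the sample count grows fifty-fold, but rises to 160 when a fourth data type is added: volume alone is cheap, while each new kind of data, role or script multiplies the cost.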
Ome vs Omics (figure: cost over time, axis marked 2003, 2010 and 2016, for the ome and for omics, with labelled values of $3,000,000,000, $10,000, ~$0 and $50,000 per person, and a marked balance point). Source: http://omics.org/index.php/File:Ome_versus_omics_graph_by_Jong_Bhak_openfree.gif
Reporting requirements for publication: Bioconductor, DataShaper, OBO, ISA-TAB, MAGE-TAB, MIBBI
Nobody wants a cellphone that just makes calls! Make your application:
• Contextualized
• Usable
• Enjoyable
• Visible (increases reputation)
• Sociable
• Valuable
• Explorable
• Flexible
• Built in a participatory way
• …
Maxims of the post-information era
• “If the news is important, it will find me”
• “Information wants to be free”
• “It's not information overload, it's filter failure”
• “The people formerly known as the audience”
• “The sources go direct”
• and finally…
Source: http://markcoddington.com/2010/01/30/a-quick-guide-to-the-maxims-of-new-media/
“Do what you do best, link the rest” http://xkcd.com/974/
Agile development
• Individuals and interactions over processes and tools
• Working software over comprehensive documentation
• Customer collaboration over contract negotiation
• Responding to change over following a plan
• In practice: frequent iterations driven by customer feedback, and trust
Metadesign (courtesy of Massimo Menichinelli, http://www.openp2pdesign.org/)
SIMBioMS Software for cross-disciplinary collaborative studies
The big picture (figure)
• Dynamic storage, project hosting, fast exchange: support for collaborative discovery
• Permanent deposition, large volumes, open access: knowledge access and sustainability
Labelled in the figure: stand-alone researchers, large consortia, SIMBioMS, ISA, OBIBA, QURETEC etc., METABAR, central data archives.
Maria Krestyaninova, 2009
System overview (figure)
• Data providers: biobanks submit to the Sample DB; -omics providers submit to the Experiment DB
• Public Index
• Users: controlled access and open access
Maria Krestyaninova, 2009
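A minimal sketch of the access pattern in this overview, assuming that each submitted record is flagged as either open or controlled, and that controlled records are readable only by users granted access to that provider's data. The class and method names (Repository, submit, grant, can_read, public_index) and the store names are hypothetical and are not the SIMBioMS API. Treating the Public Index as a listing of every record's identifier and access level is likewise an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


@dataclass(frozen=True)
class Record:
    record_id: str
    store: str      # hypothetical: "sample_db" (biobanks) or "experiment_db" (-omics)
    provider: str   # submitting data provider
    access: Access


@dataclass
class Repository:
    records: dict[str, Record] = field(default_factory=dict)
    # user -> providers whose controlled records that user may read
    grants: dict[str, set[str]] = field(default_factory=dict)

    def submit(self, record: Record) -> None:
        """A data provider submits a record to the Sample DB or the Experiment DB."""
        if record.store not in ("sample_db", "experiment_db"):
            raise ValueError(f"unknown store: {record.store}")
        self.records[record.record_id] = record

    def grant(self, user: str, provider: str) -> None:
        """Give a user controlled access to one provider's records."""
        self.grants.setdefault(user, set()).add(provider)

    def can_read(self, user: str, record_id: str) -> bool:
        """Open records are readable by anyone; controlled ones need a grant."""
        record = self.records[record_id]
        if record.access is Access.OPEN:
            return True
        return record.provider in self.grants.get(user, set())

    def public_index(self) -> list[tuple[str, str]]:
        """Assumed role of the Public Index: every record is discoverable
        (id and access level), even when its data is controlled."""
        return sorted((r.record_id, r.access.value) for r in self.records.values())


if __name__ == "__main__":
    repo = Repository()
    repo.submit(Record("S1", "sample_db", "biobank_a", Access.CONTROLLED))
    repo.submit(Record("E1", "experiment_db", "omics_lab", Access.OPEN))
    print(repo.public_index())
    print(repo.can_read("alice", "S1"), repo.can_read("alice", "E1"))
    repo.grant("alice", "biobank_a")
    print(repo.can_read("alice", "S1"))
```

Keeping the grant per provider rather than per record mirrors the idea that data owners, such as biobanks, decide who sees their data; a real deployment would need finer-grained policies.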
Current infrastructural volume
• 12 installations in 3 countries
• 100 user organisations
• >50,000 samples
• >50,000 assays and studies
• 4 large federated R&D projects across Europe and Russia
Krestyaninova et al., Bioinformatics, 2009; Viksna et al., BMC Bioinformatics, 2007
Complex interactions
• Who has a say in knowledge extracted from information?
  • Research subjects: consent to particular research being conducted
  • Scientists: protective of their vision for their data
  • Funding sources: expect publications from grantees
Labelled in the figure: FDA, Pharma, academia, industry, big data, Ministry of Health, Ministry of Education, research institutions, biobanks, state.
Yulia Tammisto, 2011
Complex software
• TIME is the scarcest resource
• Software is adopted because of:
  • Requirements
  • No other way to do things
  • Usefulness
• Use = 1 – Reuse
One goal Search for the truth
Thank you!
Acknowledgements:
• Maria Krestyaninova
• Ugis Sarkans
• Anton Enright
• Mat Davis
• Yulia Tammisto
• Massimo Menichinelli
• Teemu Perheentupa
• Jani Heikkinen
• Balaji Rajashekar
• Raivo Kolde
• Jaak Vilo
Uniquer
www.simbioms.org