Making It Happen: Sustainable Data Preservation and Use
March 19, 2013
Anita de Waard, VP Research Data Collaborations, Elsevier RDS
a.dewaard@elsevier.com
“What aspects/tools/capabilities/frameworks are related to this idea?”
• There are many different research databases, both generic (Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)
• There are many systems for creating and sharing workflows (Taverna, myExperiment, VisTrails, Workflow4Ever, etc.)
• There are many e-lab notebooks (LabGuru, LabArchives, LaBlog, etc.)
• There are scores of projects, committees, standards, bodies, grants, initiatives and conferences for discussing and connecting all of this (KEfED, Pegasus, PROV, RDA, Science Gateways, CODATA, BRDI, EarthCube, etc.)
• You can make a living out of this ;-)! (and many of us do…)
…but this is what scientists do:
Using antibodies and squishy bits, grad students experiment and enter the details into their lab notebook. The PI then tries to make sense of this and writes a paper. End of story.
Why save research data?
• Data Preservation:
  • Preserve the record of the scientific process and its provenance
  • Enable reproducible research
• Data Use:
  • Use results obtained by others
  • Do better science!
  • Improve interdisciplinary work
• Sustainable Models:
  • Technology transfer; societal/industrial development
  • Reward scientists for data creation (credit/attribution)
  • Long-term archiving
Where the Data Goes Now:
Context: > 50 M papers; 2 M scientists; 2 M papers/year
• The majority of data (90%?) is stored on local hard drives
• Some data (8%?) is stored in large, generic data repositories: Dataverse: 0.6 M; Dryad: 7,631 files; DataCite: 1.5 M
• A small portion of data (1-2%?) is stored in small, topic-focused data repositories: PDB: 88.3 k; PetDB: 1.5 k; SedDB: 0.6 k; MiRB: 25 k; TAIR: 72.1 k
Key Needs:
• Develop sustainable models
• Improve data use
• Increase data preservation
[Figure: the data-distribution overview from the previous slide, annotated with these three needs]
From insular ‘CoSI-Factories’…
[Figure: separate lab cycles, each with the stages Prepare, Ponder, Observe, Communicate, Analyze]
…to shared experimental repositories: across labs and experiments, track reagents and how they are used.
[Figure: lab cycles with the stages Prepare, Observations, Analyze, Communicate, connected through shared observations]
…to shared experimental repositories: compare the outcomes of interactions with these entities.
[Figure: the same lab cycles as on the previous slide]
…to shared experimental repositories: build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments.
[Figure: the same lab cycles, with an added Think stage drawing on the shared observations]
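The shared-repository idea on these slides can be pictured as a simple data structure. The Python below is a minimal, hypothetical sketch, not a system described in the talk: the `Observation` record, its fields, and the helpers `reagent_profiles` and `conflicting_entities` are all assumptions. It groups observations from different labs by reagent and by the entity the reagent interacted with (one rough way to approximate a ‘virtual reagent spectrogram’) and flags entities where experiments disagree.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One hypothetical record in a shared experimental repository."""
    lab: str
    experiment: str
    reagent: str   # e.g. an antibody name (illustrative placeholder)
    entity: str    # the entity the reagent interacted with
    outcome: str   # observed result as a short label


def reagent_profiles(observations):
    """Map reagent -> entity -> list of (lab, experiment, outcome) tuples.

    Collecting every observed interaction per reagent across labs and
    experiments is a simple stand-in for a 'virtual reagent spectrogram'.
    """
    profiles = defaultdict(lambda: defaultdict(list))
    for obs in observations:
        profiles[obs.reagent][obs.entity].append(
            (obs.lab, obs.experiment, obs.outcome)
        )
    return profiles


def conflicting_entities(profiles, reagent):
    """Return the sorted entities for which a reagent gave differing outcomes."""
    by_entity = profiles.get(reagent, {})
    return sorted(
        entity
        for entity, results in by_entity.items()
        if len({outcome for _, _, outcome in results}) > 1
    )


if __name__ == "__main__":
    # Made-up example records from two hypothetical labs.
    shared = [
        Observation("lab-A", "exp-1", "anti-X", "tissue-1", "binds"),
        Observation("lab-B", "exp-7", "anti-X", "tissue-1", "no binding"),
        Observation("lab-B", "exp-8", "anti-X", "tissue-2", "binds"),
    ]
    profiles = reagent_profiles(shared)
    print(conflicting_entities(profiles, "anti-X"))  # ['tissue-1']
```

For records from different labs to be comparable in practice, reagents, entities and outcomes would need shared controlled vocabularies; the free-text labels here are only placeholders.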
Some examples:
• Grafting tools onto the workflow: create tailored metadata collection tools on mini-tablets in labs to replace the paper notebook
• Direct rewards through a ‘PI-Dashboard’: allow immediate access to and analysis of shared data, leading to new science!
• Data sharing rewards: the Data Rescue Challenge collects and rewards stories and practices of data preservation and use in Earth and Lunar Science
• Improve data use: with NIF/Eagle-I, add antibodies as key ‘entities’ to the paper and link them to an antibody (AB) repository
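The antibody example can be illustrated with a small text-linking sketch. This Python is hypothetical: the registry dictionary, its record IDs (`AB-0001`, `AB-0002`) and the function `link_antibodies` are placeholders, and it does not use any real NIF or Eagle-I interface. It finds registry antibody names in a paper's methods text and returns each mention together with its registry record, which is the kind of entity-to-repository link the slide describes.

```python
import re

# Hypothetical local copy of an antibody repository: antibody name -> record ID.
# Names and IDs are placeholders, not entries from any real registry.
ANTIBODY_REGISTRY = {
    "anti-GFP": "AB-0001",
    "anti-NeuN": "AB-0002",
}


def link_antibodies(methods_text, registry):
    """Return (name, record_id, offset) for each registry antibody mentioned.

    A mention must not be directly preceded or followed by a word character
    or hyphen, so 'anti-GFP' does not match inside 'anti-GFP2'.
    Results are ordered by their position in the text.
    """
    links = []
    for name, record_id in registry.items():
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        for match in re.finditer(pattern, methods_text):
            links.append((name, record_id, match.start()))
    return sorted(links, key=lambda link: link[2])


if __name__ == "__main__":
    text = "Sections were stained with anti-NeuN and anti-GFP; anti-GFP2 was not used."
    for name, record_id, offset in link_antibodies(text, ANTIBODY_REGISTRY):
        print(f"{name} -> {record_id} (offset {offset})")
```

Exact-name matching is deliberately simple and will miss spelling variants or ambiguous names; a real annotation tool would need a more robust matching strategy.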
How do we make data use happen?
• We are creating repositories of shared experiments: you are part of a greater whole!
• Collect and share stories and practices about data use and sustainable systems: “What gets to them?”
• Develop a system of rewards for data sharing: enable demonstrably better science!
• Work with grant agencies and repositories (generic/specific, institutional, cross-national) to integrate and annotate existing datasets and enable cross-use
• Collectively pioneer long-term funding options; support and develop ‘shared mission’ funding challenges