
The Evolution of Research Data: Some Thoughts, and a Few Pilots in Data Curation

Anita de Waard, VP Research Data Collaborations, a.dewaard@elsevier.com, http://researchdata.elsevier.com/


Presentation Transcript


  1. The Evolution of Research Data: Some Thoughts, and a Few Pilots in Data Curation. Anita de Waard, VP Research Data Collaborations, a.dewaard@elsevier.com, http://researchdata.elsevier.com/

  2. Outline:
     • Elsevier Research Data Services: goals and principles of this new group
     • Four aspects of research data management:
       • Data digitisation: to enable preservation
       • Data curation: to enable interoperability
       • Repository integration: and overlapping roles
       • Sustainable models in a credit economy
     • Next steps

  3. Elsevier Research Data Services: Goals
     • Increase Data Preservation: help increase the amount and quality of data preserved and shared
     • Improve Data Use: help increase the value and usability of the data shared by increasing annotation, normalization, and provenance, enabling enhanced interoperability
     • Develop Sustainable Models: help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms

  4. Elsevier RDS: Guiding Principles
     • In principle, all data stays open
     • Work with existing repositories: URLs, front end, etc. stay where they are
     • Collaboration is tailored to the partner’s unique needs:
       • Aspects where collaboration is needed are discussed
       • A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
     • Working with domain-specific and institutional repositories
     • 2013: series of pilots to enable a feasibility study:
       • What are the key needs?
       • Can Elsevier play a role: skillsets, partnerships?
       • Is there a (transparent) business model for this?

  5. Where The Data Goes Now (> 50 My papers; 2 M scientists; 2 My papers/year):
     • A small portion of data (1-2%?) is stored in small, topic-focused data repositories: PDB: 88.3 k; PetDB: 1.5 k; SedDB: 0.6 k; MiRB: 25 k; TAIR: 72.1 k
     • Some data (8%?) is stored in large, generic data repositories: Dataverse: 0.6 My; Dryad: 7,631 files; Datacite: 1.5 My
     • The majority of data (90%?) is stored on local hard drives

  6. Key Needs (shown against the same data landscape as slide 5):
     1. Increase data digitisation
     2. Improve data usability
     3. Improve repository interoperability
     4. Develop sustainable models

  7. 1. Data Digitisation
     • Goal: enable access, reproduction
     • Issue: much of the research data is simply not digitized!
     • Example: Magellan Observatory’s paper records
     • Example: CMU Electrophysiology Lab: lab notebooks are kept on paper

  8. 1. Data Digitisation
     • Pilot: CMU: app-driven metadata recorded during measurement. NB: storage is not sharing!
     • Example: marine geology suggests: convince instruments, not researchers! http://www.marine-geo.org/
     • Prize: Data Rescue Award: $5000 award for the best ‘dark data’ data rescue attempt (The 2013 International Data Rescue Award in the Geosciences, organised by IEDA and Elsevier Research Data Services)

  9. 2. Data Curation
     • To allow reuse, data needs to be enriched: why and how was it created?
     • Issue: Dropbox and Figshare are the most popular tools
     • Example: moon rock data is stored as PDFs with tables from different papers
     • Pilot: lunar samples: curate geochemistry to allow data use

  10. 2. Data Curation
      • Issue: hard to find the right antibodies in papers (NIF)
      • Pilot: Properly Annotated Data Sets (PADS) for biology: a shared ‘cloud’ of metadata that describes the why, what, and how of the experimental procedure.

  11. 3. An embarrassment of repositories: Fine, I’ll share my data, but where do I put it?
      • Repositories: Local Data Repository; Domain-Specific Data Repository; Generic Data Repository; Institutional Data Repository
      • Driven by: Collaborators; Domain of study; Funding Agency; University
      • AND THEY ALL WANT DIFFERENT METADATA!!!!

  12. 3. Repository Interoperability
      • Pilot: find metabolomic compounds from mass spectrometry data: need biological understanding of chemical results
      • Issue: battle between domain-specific and ‘domain-agnostic’ repositories: who is a better data curator? (Example: geochemistry)

  13. 3. Repository Integration
      • Counterexample: MERRITT system: CDL builds generic infrastructure, with domain-specific content curators
      • Planned report with UCL, UCSD, MIT Libraries and CNI: best practices re. role of libraries?
        • How does a researcher decide where to place his/her content?
        • How does a library decide what digital data curation/preservation efforts to invest in?
      • Report on criteria/examples to help weigh these questions: write draft, invite comments, discuss.

  14. 4. Sustainable Models: Interdependency in a credit economy
      • Diagram actors: Scholars; Funding agencies; Institutes; Domain/Task-Specific Groups (EarthCube, DataONE, etc.)
      • Diagram arrow labels: Request funding from; Report back; Evaluate; Fund; Report to; Policies and evaluations
      • Open question: University as a Digital Enterprise? Roles, responsibilities?

  15. 4. Sustainable Models: Many Questions!
      • Cost:
        • Who pays for hosting the data?
        • Who pays for data curation?
        • Who pays for long-term preservation?
        • Who pays for data integration?
      • Infrastructure:
        • Where does the metadata live?
        • What is the entry point to the metadata cloud: the paper, the data?
        • Who is responsible for fulfilling DMP requirements?
        • Who decides on the data storage requirements?
      • Usage:
        • Who wants to know where/what data is stored?
        • Who needs to know how data was accessed/used?
        • Who gets credit for data storage, data use?
        • Who needs/pays for credit-metric reporting?

  16. RDS Next Steps:
      • 2013:
        • Perform a series of pilots to assess the scope/breadth of issues re. data digitisation, curation, repository interoperability, credit/attribution.
        • Coauthor/support CNI report on the role of libraries with respect to research data curation.
        • Work with a number of institutions (UCSD, OHSU, UVM, ..) on requirements for ‘university as a Digital Enterprise’.
      • 2014:
        • Selection of key areas to focus on for Elsevier RDS.
        • Business plans and decisions.

  17. In Summary:
      • Data digitisation:
        • Multiplicity of content, how to reach ‘small data’ creators?
        • Working with equipment could be key to success?
        • Pilots with CMU, Lunar Samples, Data Rescue Award.
      • Data curation:
        • Essential for reuse, but who does the work?
        • Each use case/user has its own metadata requirements
        • Pilots with Metabolomics MS, Properly Annotated Data Sets.
      • Repository integration:
        • Domain-specific vs. domain-agnostic?
        • Various domains have different requirements
        • Study and report re. best practices for libraries.
      • Sustainable models in a credit economy:
        • Cost, infrastructure, usage: who needs what?
        • Who pays for what?
        • Interviews/discussions with a number of institutions.

  18. Thank you! Collaborations and discussions gratefully acknowledged with:
      • UCSD: Phil Bourne, Brian Schottlaender, David Minor, Declan Ilya Zaslavsky
      • NIF: Maryann Martone, Anita Bandrowski
      • MSU: Brian Bothner
      • OHSU: Melissa Haendel, Nicole Vasilevsky
      • CDL: Carly Strasser, John Kunze, Stephen Abrams
      • Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
      • CNI: Clifford Lynch
      • Harvard Dept of Astronomy: Michael Kurtz, Chris Erdmann
      • MIT: Micah Altman
      • UVM: Mara Saule
      http://researchdata.elsevier.com/

  19. Linking papers to research data:
