Research Data Introduction

Research Data Introduction. Mark Scott, Richard Boardman, Philippa Reed and Simon Cox, Microsoft Institute for HPC and Engineering Materials Group. Talk outline: Five Ways to Think About Research Data; Case Studies; Data Management.


Presentation Transcript


  1. Research Data Introduction • Mark Scott, Richard Boardman, Philippa Reed and Simon Cox • Microsoft Institute for HPC and Engineering Materials Group

  2. Talk Outline • Five Ways to Think About Research Data • Case Studies • Data Management

  3. Five Ways to Think About Research Data • How it is created • Forms of research • Electronic representation of research • Size of datasets • The data life cycle

  4. 1. Research Data Creation • Scientific experiment • Models or simulation • Observation • Derived data • Reference data

  5. 2. Forms of Research • Notebooks and diaries • Questionnaires, transcripts and codebooks • Audiotapes and videotapes • Photographs and films • Specimens, samples and artefacts • Methodologies, workflows, procedures and protocols • Experimental results • Metadata (data describing data)

  6. 3. Electronic Storage of Research Data • Textual: text files, Microsoft Word, PDF, RTF • Numerical: Excel, CSV • Multimedia: TIFF image, AVI movie, MP3 audio • Structured: CSV, database, multi-purpose (XML) • Software code: Java, C, Matlab • Software specific: 3D CAD, statistical model • Discipline specific: Chemistry’s CIF (for crystallography) • Instrument specific: Archaeology’s laser scanner files

  7. 4. Size of Electronic Datasets • Individual large file (raw CT data; movie) • Set of small files collectively large (individual frames of movie) • Set of small files collectively small (source code files) • Individual small file (photograph) • Combinations of the above • Subjective

  8. 5. Data Life Cycle • Categories • Stages

  9. Case Studies

  10. Human Genetics Case Study

  11. Materials Engineering Case Study

  12. Aerodynamics Case Study

  13. Chemistry Case Study

  14. Archaeology Case Study

  15. Data Management Best Practices

  16. How do you find your data again? • Choose sensible file names. Include: • something meaningful to you (what you are doing) • something meaningful to someone else (experiment number or project name)
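The naming advice on this slide could be applied with a small helper. The following is a minimal sketch, not something from the talk: the function name `make_file_name`, the field order, and the underscore-separated pattern are all assumptions chosen for illustration.

```python
import re
from datetime import date


def make_file_name(project, experiment, description, when, extension):
    """Build a name such as 'fatigue_exp012_crack-growth_2012-03-05.csv'.

    The pattern (project, experiment number, description, date) is an
    illustrative assumption, not a convention prescribed by the slides.
    """
    def slug(text):
        # Lower-case the text and replace runs of other characters with '-'
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return "{}_exp{:03d}_{}_{}.{}".format(
        slug(project),            # meaningful to someone else
        experiment,               # zero-padded so names sort correctly
        slug(description),        # meaningful to you
        when.isoformat(),         # ISO dates also sort correctly
        extension.lstrip("."),
    )


if __name__ == "__main__":
    print(make_file_name("Fatigue", 12, "Crack growth", date(2012, 3, 5), "csv"))
```

Zero-padding the experiment number and using ISO dates keeps files in a sensible order when a folder is sorted by name.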

  17. How do you find your data again? • Use a sensible folder structure (one big flat folder versus a hierarchical tree) • Use file metadata (tagging) • Consider keeping a record of your data sets in: • a spreadsheet, • a database, or • a logbook
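One way to keep a record of data sets in a spreadsheet-friendly form is a CSV catalogue. The sketch below assumes a hypothetical set of columns (`file_name`, `project`, `experiment`, `description`, `tags`) and semicolon-separated tags; none of these names come from the talk.

```python
import csv
from pathlib import Path

# Hypothetical catalogue columns, chosen for illustration only
FIELDS = ["file_name", "project", "experiment", "description", "tags"]


def add_record(catalogue, record):
    """Append one data set record to a CSV catalogue, writing the header once."""
    catalogue = Path(catalogue)
    is_new = not catalogue.exists()
    with catalogue.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(record)


def find_records(catalogue, tag):
    """Return the records whose semicolon-separated tags include `tag`."""
    with Path(catalogue).open(newline="", encoding="utf-8") as handle:
        return [row for row in csv.DictReader(handle)
                if tag in row["tags"].split(";")]
```

Because the catalogue is plain CSV, it can be opened directly in a spreadsheet application as well as searched from a script.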

  18. Protecting Your Data • Back up regularly, and keep a copy offsite! • Follow a process that allows you to cope with versions of files (or use specialist software such as Mercurial) • Link to a publication or suitable write-up if there is one, to help others understand the data • Upload to a discipline-specific data repository if available
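Where a version control tool such as Mercurial is not in use, a simple process for coping with file versions might look like the sketch below. It is an illustration under stated assumptions: the function name `save_version` and the `_v001` suffix scheme are hypothetical, and it is no substitute for proper version control or an offsite backup.

```python
import shutil
from pathlib import Path


def save_version(source, archive_dir):
    """Copy `source` into `archive_dir` as name_v001.ext, name_v002.ext, ...

    The numbering scheme is an assumption made for this example.
    """
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Find the first version number that has not been used yet
    version = 1
    while True:
        target = archive_dir / "{}_v{:03d}{}".format(
            source.stem, version, source.suffix)
        if not target.exists():
            break
        version += 1

    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Each call leaves earlier versions untouched, so a mistaken change can be undone by going back to a previous numbered copy.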

  19. Protecting Your Data • Can you still access your file in 20 years? • Try to use text files. Formatting might be lost but the data will remain usable. • So, consider exporting to CSV, XML or free-form text • Otherwise, use file formats with openly published specifications to provide some protection: • DOCX or ODT for textual data • XLSX or ODS for spreadsheet data • SVG for figures (an open-standard vector format) • PDF/A for PDFs (a standardised archival version of PDF)
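Exporting tabular data to CSV and XML can be sketched with Python's standard library alone. The function names, the `dataset` and `record` element names, and the example column names below are assumptions for illustration; the XML export also assumes that every column name is a valid XML element name.

```python
import csv
import xml.etree.ElementTree as ET


def export_csv(rows, columns, path):
    """Write a header row followed by the data rows as plain-text CSV."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)


def export_xml(rows, columns, path):
    """Write each row as a <record> element with one child per column.

    Assumes every column name is a valid XML element name.
    """
    root = ET.Element("dataset")
    for row in rows:
        record = ET.SubElement(root, "record")
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = str(value)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    columns = ["cycles", "crack_length_mm"]   # hypothetical example columns
    rows = [(1000, 0.12), (2000, 0.31)]       # made-up illustrative values
    export_csv(rows, columns, "results.csv")
    export_xml(rows, columns, "results.xml")
```

Both outputs can be read in any text editor, which is the property the slide relies on for long-term access.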

  20. Summary • Ways of thinking about research data: • Its source • Forms of research • Electronic research data • Data volume • Data life cycle • Case studies illustrating these categories • Manage your data and consider the long-term view • More information in the accompanying guide

  21. Acknowledgements • The categorisation of research data collection was defined by the Research Information Network (2008). • The forms of research data and the categorisation of electronic storage of research data were adapted from The University of Edinburgh (2011). • The following people helped with the preparation of this document: • Andy Collins (Human Genetics case study). • Thomas Mbuya and Kath Soady (Materials Fatigue Test case study). • Gregory Jasion (CFD case study). • Simon Coles (Chemistry case study). • Graeme Earl (Archaeology case study). • Mark Scott, Richard Boardman, Philippa Reed and Simon Cox (overall content). • We acknowledge ongoing support from the University of Southampton, Roberts funding, Microsoft, EPSRC, BBSRC, JISC, AHRC and MRC.

  22. References • Digital Curation Centre (2010), ‘DCC Curation Lifecycle Model’. URL: http://www.dcc.ac.uk/resources/curation-lifecycle-model • Humphrey, C. (2006), ‘e-Science and the Life Cycle of Research’. URL: http://datalib.library.ualberta.ca/ humphrey/lifecycle-science060308.doc • Research Information Network (2008), ‘Stewardship of digital research data: a framework of principles and guidelines’. • The University of Edinburgh (2011), ‘Defining research data’. URL: http://www.ed.ac.uk/schools-departments/information-services/services/research-support/data-library/research-data-mgmt/data-mgmt/research-data-definition • University of York (2012), ‘Archaeology Data Service’. URL: http://archaeologydataservice.ac.uk/
