Research Data Introduction Mark Scott, Richard Boardman, Philippa Reed and Simon Cox Microsoft Institute for HPC and Engineering Materials Group
Talk Outline • Five Ways to Think About Research Data • Case Studies • Data Management
Five Ways to Think About Research Data • How it is created • Forms of research • Electronic representation of research • Size of datasets • The data life cycle
1. Research Data Creation • Scientific experiment • Models or simulation • Observation • Derived data • Reference data
2. Forms of Research • Notebooks and diaries • Questionnaires, transcripts and codebooks • Audiotapes and videotapes • Photographs and films • Specimens, samples and artefacts • Methodologies, workflows, procedures and protocols • Experimental results • Metadata (data describing data)
3. Electronic Storage of Research Data • Textual: text files, Microsoft Word, PDF, RTF • Numerical: Excel, CSV • Multimedia: TIFF image, AVI movie, MP3 audio • Structured: CSV, database, multi-purpose (XML) • Software code: Java, C, Matlab • Software specific: 3D CAD, statistical model • Discipline specific: Chemistry’s CIF (for crystallography) • Instrument specific: Archaeology’s laser scanner files
4. Size of Electronic Datasets • Individual large file (e.g. raw CT data; movie) • Set of small files collectively large (e.g. individual frames of movie) • Set of small files collectively small (e.g. source code files) • Individual small file (e.g. photograph) • Combinations of the above • Subjective
5. Data Life Cycle [Figure: the data life cycle, organised by categories and stages]
How do you find your data again? • Choose sensible file names. Include: • something meaningful to you (what you are doing) • something meaningful to someone else (experiment number or project name)
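One way to apply the naming advice above consistently is to build file names from their parts rather than typing them by hand. The sketch below is only an illustration: the function name `make_filename`, the order of the parts, and the example project and description are assumptions, not a convention prescribed by the talk.

```python
import re
from datetime import date


def make_filename(project, description, when, extension):
    """Combine a project name, a short description and a date into one file name."""

    def slug(text):
        # Lowercase the text and replace each run of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # Project name first (meaningful to others), then the description
    # (meaningful to you), then an ISO date so names sort chronologically.
    return f"{slug(project)}_{slug(description)}_{when.isoformat()}.{extension.lstrip('.')}"


# Hypothetical example values, chosen only for illustration.
name = make_filename("Fatigue Test", "crack growth run", date(2012, 3, 5), "csv")
print(name)  # fatigue-test_crack-growth-run_2012-03-05.csv
```

Putting the date in ISO order (year, month, day) means an alphabetical listing of the folder is also a chronological one.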
How do you find your data again? • Use a sensible folder structure (one big flat folder versus a hierarchical tree) • Use file metadata (tagging) • Consider keeping a record of your data sets in: • a spreadsheet, or • a database, or • a logbook
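A record of data sets kept as a spreadsheet can be as simple as a CSV file with one row per file. The following is a minimal sketch under stated assumptions: the column names in `FIELDS`, the semicolon-separated tag format, and the helper names `add_record` and `find_by_tag` are all hypothetical choices made for this example.

```python
import csv
from pathlib import Path

# Hypothetical column layout for the catalogue; adapt it to your own project.
FIELDS = ["file", "project", "description", "tags"]


def add_record(catalogue, file, project, description, tags):
    """Append one row describing a data file, writing the header on first use."""
    catalogue = Path(catalogue)
    is_new = not catalogue.exists()
    with catalogue.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "file": file,
            "project": project,
            "description": description,
            "tags": ";".join(tags),  # tags are stored as one semicolon-separated cell
        })


def find_by_tag(catalogue, tag):
    """Return the names of all catalogued files that carry the given tag."""
    with Path(catalogue).open(newline="", encoding="utf-8") as handle:
        return [row["file"] for row in csv.DictReader(handle)
                if tag in row["tags"].split(";")]
```

Because the catalogue is plain CSV, it opens directly in Excel or any other spreadsheet program, and it can be moved into a database later if the project outgrows it.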
Protecting Your Data • Back up regularly – offsite! • Follow a process that allows you to cope with versions of files (or use specialist software such as Mercurial) • Link to a publication or suitable write-up if there is one, to help others understand the data • Upload to a discipline-specific data repository if one is available
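For those not using version-control software, one simple process for coping with versions of files is to keep a timestamped copy each time a file changes significantly. The sketch below assumes a hypothetical helper `save_version` and a hypothetical naming pattern; it is neither a replacement for an offsite backup nor a description of how Mercurial works.

```python
import shutil
from datetime import datetime
from pathlib import Path


def save_version(source, versions_dir, now=None):
    """Copy `source` into `versions_dir` under a timestamped name and return the copy's path."""
    source = Path(source)
    versions_dir = Path(versions_dir)
    versions_dir.mkdir(parents=True, exist_ok=True)

    # `now` can be passed in explicitly (for example in tests);
    # otherwise the current local time is used.
    stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    target = versions_dir / f"{source.stem}_{stamp}{source.suffix}"

    # copy2 also preserves the file's modification time where the platform allows it.
    shutil.copy2(source, target)
    return target
```

Calling `save_version("results.csv", "versions")` would leave the working file untouched and add a dated copy alongside the earlier ones, so older states remain recoverable.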
Protecting Your Data • Will you still be able to access your files in 20 years? • Try to use text files. Formatting might be lost, but the data will still be usable. • So, consider exporting to CSV, XML or free-form text • Otherwise, use file formats with openly published specifications to provide some protection: • DOCX or ODT for textual data • XLSX or ODS for spreadsheet data • SVG for figures (an open-standard vector format) • PDF/A for PDFs (a standardised version of PDF)
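As an illustration of the CSV export suggested above, tabular data held in a program can be written out as plain text with Python's standard `csv` module. This is a sketch only: the helper name `to_csv_text`, the column names and the numbers are made up for the example and do not come from any of the case studies.

```python
import csv
import io


def to_csv_text(header, rows):
    """Render a header and a list of rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example values, for illustration only.
text = to_csv_text(["cycle", "crack_length_mm"], [[1000, 0.12], [2000, 0.31]])
print(text)
```

The result is ordinary text that any editor can open, which is what gives the data a good chance of remaining readable even after the software that produced it has gone.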
Summary • Ways of thinking about research data: • Its source • Forms of research • Electronic research data • Data volume • Data life cycle • Case studies illustrating these categories • Manage your data and consider the long-term view • More information in the accompanying guide
Acknowledgements • The categorisation of research data collection was defined in Research Information Network (2008). • The forms of research data and the categorisation of electronic storage of research data were adapted from The University of Edinburgh (2011). • The following people helped with the preparation of this document: • Andy Collins (Human Genetics case study). • Thomas Mbuya and Kath Soady (Materials Fatigue Test case study). • Gregory Jasion (CFD case study). • Simon Coles (Chemistry case study). • Graeme Earl (Archaeology case study). • Mark Scott, Richard Boardman, Philippa Reed and Simon Cox (overall content). • We acknowledge ongoing support from the University of Southampton, Roberts funding, Microsoft, EPSRC, BBSRC, JISC, AHRC and MRC.
References • Digital Curation Centre (2010), ‘DCC Curation Lifecycle Model’. URL: http://www.dcc.ac.uk/resources/curation-lifecycle-model • Humphrey, C. (2006), ‘e-Science and the Life Cycle of Research’. URL: http://datalib.library.ualberta.ca/ humphrey/lifecycle-science060308.doc • Research Information Network (2008), ‘Stewardship of digital research data: a framework of principles and guidelines’. • The University of Edinburgh (2011), ‘Defining research data’. URL: http://www.ed.ac.uk/schools-departments/information-services/services/research-support/data-library/research-data-mgmt/data-mgmt/research-data-definition • University of York (2012), ‘Archaeology Data Service’. URL: http://archaeologydataservice.ac.uk/