Research Data Management for Researchers. Dr Joanna Goodger, Information Hertfordshire, with Bill Worthington, Sara Hajnassiri, and Mohamed Hansraj.
Research Data Management: Let’s get going
Research Data Management: Decision making
In this module, we’ll discuss how best to set up your research:
• Filing systems: naming, formats, and versioning
• Metadata: what to include and how
• Software: longevity and stability
• Documentation: logs, instructions, and records
• Coding for the future
Getting Started with Research Data Management
Research Data Management: Why do it now?
The end point of all projects involves making the data publicly available. Many data will be deposited in national archives, which have regulations for files and metadata. Thinking about these requirements at the beginning of the project will limit the transformations needed at the end of the project.
If your file formats have a low risk of obsolescence and are free and openly available, then you’re on the path to long-lived files, but you should also consider degradation, compression, and the fidelity of your data.
Research Data Management: Filing systems
Research Data Management: Filing Systems
Filing is more than saving files; it’s making sure you can find them later in your project.
• Naming
• Directory structure
• File types
• Versioning
All these help to keep your data safe and accessible.
Research Data Management: Activity
What is data? What does data mean to you?
Spend a couple of minutes thinking about what data you will be working with throughout your project. Then we’ll combine your ideas and compare them between disciplines.
Research Data Management: Naming Conventions
What’s in a name?
Creating systematic names can be as simple as assigning a prefix or a number to each object, in which case the names are a type of numbering scheme. Using a naming convention means that you can distinguish similar records from one another at a glance. You can combine information to form logical file names, changing sections of the name to reflect the differences between the files.
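One way to picture such a convention is the minimal Python sketch below, which combines a project code, a short description, a date, and a version number into one file name. The field order, the underscore separator, and the function name `build_filename` are illustrative assumptions, not a scheme prescribed by this module.

```python
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Combine fields into one systematic file name (hypothetical convention)."""
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted alphabetically.
    stamp = created.isoformat()
    # Zero-padded version numbers keep v02 ahead of v10 in a directory listing.
    return f"{project}_{description}_{stamp}_v{version:02d}.{extension}"


name = build_filename("rdm", "survey-results", date(2013, 5, 1), 3, "csv")
print(name)  # rdm_survey-results_2013-05-01_v03.csv
```

Changing only one field (the version, say) gives a new name that still sits next to its siblings when the directory is sorted.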
Research Data Management: File formats
The formats most likely to be accessible in the future are:
• non-proprietary
• in an open, documented standard
• commonly used by the research community
• in a standard representation, e.g. ASCII or Unicode
• unencrypted
• uncompressed
Research Data Management: File formats
[Figure: types of data to consider: images and photos, code, plots, tables, audio-visual material, and transcripts]
Research Data Management: File formats
[Table with columns: Formats, Uses, Considerations]
Research Data Management: File formats
Examples of preferred format choices:
• PDF/A, not Word
• ASCII, not Excel
• MPEG-4, not QuickTime
• TIFF or JPEG2000, not GIF or JPG
• XML or RDF, not RDBMS
When considering the best file formats for your data, think about cross-platform formats and the simplest forms.
Research Data Management: File sizes
The format you choose will also affect the compression of your data and how much storage space you will need to keep your data safe and accessible.
Consider a 5 megapixel image. The table below gives the size of that file in different standard formats.
[Table: size of a 5 megapixel image in different standard formats]
You can see what a difference your format makes to your storage requirements. You should think about which is best for your outputs. For the RDM website, resizing the image saves space and prevents the image from being distorted when the browser compresses it.
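As a rough worked example of where such figures come from, the sketch below estimates the uncompressed size of the 5 megapixel image. It assumes 24-bit colour (3 bytes per pixel) and ignores file headers and embedded metadata; the function name is hypothetical. Compressed sizes depend on the image content and the format settings, so they are not estimated here.

```python
def uncompressed_size_bytes(pixels: int, bits_per_pixel: int = 24) -> int:
    """Raw pixel data only: no header, no metadata, no compression."""
    return pixels * bits_per_pixel // 8


size = uncompressed_size_bytes(5_000_000)
print(size / 1_000_000)  # 15.0 (megabytes, counting 1 MB as 1,000,000 bytes)
```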
Research Data Management: Versioning
Keep editing under control
Whether you’re developing software or writing a document, keeping track of the changes made by you and your collaborators is useful: you can check that issues have been addressed, and mistakes can be undone.
Some software will control your versions automatically, while other software requires you to ‘Save As’ a new version, every day or every time changes are made.
Cloud storage facilities such as LiveDrive and RackSpace, as well as the UH Document Management System (DMS), lock documents while they are being edited so that you cannot work on the same file at the same time as others, which prevents overwriting.
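Where software does not version files for you, the manual ‘Save As’ step can be scripted. The sketch below is one possible approach, assuming file names end in a `_vNN` suffix (for example `thesis_v01.txt`); the suffix pattern and the function name `save_new_version` are assumptions made for illustration.

```python
import re
import shutil
from pathlib import Path


def save_new_version(path: Path) -> Path:
    """Copy `path` to a sibling file whose _vNN suffix is one higher."""
    match = re.fullmatch(r"(.*)_v(\d+)", path.stem)
    if match is None:
        raise ValueError(f"{path.name} has no _vNN version suffix")
    base, number = match.group(1), int(match.group(2))
    new_path = path.with_name(f"{base}_v{number + 1:02d}{path.suffix}")
    if new_path.exists():
        # Refuse to overwrite a version that someone has already saved.
        raise FileExistsError(f"{new_path.name} already exists")
    shutil.copy2(path, new_path)  # copy2 also tries to preserve timestamps
    return new_path
```

Calling `save_new_version(Path("thesis_v01.txt"))` would leave the original untouched and return the path of the new `thesis_v02.txt`.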
Research Data Management: Metadata
Research Data Management: Data metadata
What is metadata?
• Metadata is the additional information that is required to make sense of your files: it’s data about data.
• This is not a new idea. Consider your music or film collection:
  • At least the title, authors, release date, producers, directors, etc.
  • Maybe the artwork, the studio, or the format it was released in, such as LP (shown left), tape, CD, MD, video, Super 8, DVD, Blu-ray, 3D, etc.
• All this information is metadata. It allows you to make sense of the data and search the collection for the track that you’re looking for.
Research Data Management: Data metadata
How will you capture additional information?
Music and video files embed a lot of information.
[Figure: file info displayed using WinAmp]
Research Data Management: Data metadata
You need to consider:
• What contextual details are needed?
  • e.g. a description of the capture methods and data analysis.
• How will you capture additional information?
  • e.g. in papers, in a database, in a ‘readme’ text file, in file properties or headers.
• Which standards will you use and why?
  • Data centre recommendations for metadata, controlled vocabularies, and required documentation.
• Are there any encoding guidelines you should follow?
Research Data Management: Data metadata
What contextual details are needed?
Without additional information we do not know:
• Who is in this picture?
• When was it taken?
• Where are they?
• Who took this photo?
• How was this picture taken?
All this information puts the image in context. Without it, the image could be a photo taken in the 1800s of Mr and Mrs Straus, who died on the Titanic, or a Photoshop-adjusted image of a young couple dressing up at Brighton Pier in 2005.
Without additional information we just don’t know.
Research Data Management: Data metadata
How will you capture additional information?
• Many of the analysis and development details will be in your published work (journal papers, conference proceedings, or articles, for example), but if your data is separated from the publication, can others make sense of it?
• If you have a results table or database, you should ensure that metadata is provided for each column and/or row.
• You need to record instructions for use for any software you develop.
• Your images need to have the required properties; these can be attached automatically, or you can add more information manually.
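For the results-table case, one lightweight option is a data dictionary that describes every column, plus a check that no column has been left undocumented. The sketch below is a minimal illustration; the table, the column names, and the function `undocumented_columns` are all hypothetical.

```python
import csv
import io

# Hypothetical results table and its data dictionary (column -> description).
table_csv = "sample_id,temp_c,mass_g\nS1,21.5,0.82\nS2,22.1,0.79\n"
data_dictionary = {
    "sample_id": "Unique identifier assigned to each sample",
    "temp_c": "Air temperature at measurement, degrees Celsius",
    "mass_g": "Sample mass after drying, grams",
}


def undocumented_columns(csv_text: str, dictionary: dict) -> list:
    """Return the column names in the CSV header that have no description."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in dictionary]


missing = undocumented_columns(table_csv, data_dictionary)
print(missing)  # []
```

Putting the unit in both the column name and the description means the table stays interpretable even if it is separated from the dictionary.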
Research Data Management: Data metadata
Which standards will you use and why?
• Many data centres recommend particular metadata for the formats that they support.
  • This may be controlled vocabularies or required documentation.
  • Are you required to deposit in a particular data centre?
• Are there any encoding guidelines you should follow?
• Across the board, the standard set of metadata for data files is generally of the form:
  • Title, author, file type, size, format, version, date created, date modified, and software.
• Datasets also have standard metadata that describes the data collection.
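The standard fields listed above can be captured in a small ‘sidecar’ file saved next to each data file. The sketch below is one possible layout, not a data-centre standard: the field names, the `.metadata.json` suffix, and both function names are assumptions. The creation date is left out because it cannot be read reliably on every operating system; it would need to be supplied by hand.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: Path, title: str, author: str,
                  version: str, software: str) -> dict:
    """Collect a basic metadata record for one data file."""
    stats = path.stat()
    modified = datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc)
    return {
        "title": title,
        "author": author,
        "file_type": path.suffix.lstrip("."),
        "size_bytes": stats.st_size,
        "version": version,
        "date_modified": modified.isoformat(),
        "software": software,
    }


def write_sidecar(path: Path, record: dict) -> Path:
    """Save the record next to the data file as <name>.metadata.json."""
    sidecar = path.with_name(path.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Because the sidecar is plain JSON text, it remains readable without the software that produced the data.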
Research Data Management: Software
Research Data Management: Software
When choosing software, ask:
• Is it unique to your equipment?
• Is it stable or under development?
• Is it free to use?
• Is it available on multiple operating systems?
• Is it licensed?
• Does it produce isolated formats?
• Is it backwards compatible?
Research Data Management: Software Obsolescence
Whether planned or not, obsolescence affects software, and this will affect the longevity of your data if it is produced or stored in a format specific to that software.
Technical or functional obsolescence
• If your equipment has a limited life expectancy, the software may be short-lived.
  • Store your data in the native format AND in a reusable, standardised format.
  • Use stable, open software for your analysis where possible.
Research Data Management: Software Obsolescence
Whether planned or not, obsolescence affects software, and this will affect the longevity of your data if it is produced or stored in a format specific to that software.
Systematic obsolescence
• Technology evolves, the demands on software increase, and new editions are released.
  • Previous documents may not be compatible with new editions.
  • Save data in an open format.
  • Use free, stable software for your analysis.
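The advice to keep both a native copy and an open copy can be illustrated with a short sketch. Here Python’s own `pickle` format stands in for a software-specific ‘native’ format and CSV for the open one; that pairing, and the function name `save_both`, are assumptions chosen only to make the example self-contained.

```python
import csv
import pickle
from pathlib import Path


def save_both(rows: list, columns: list, stem: Path) -> tuple:
    """Write the same table as <stem>.pkl (tool-specific) and <stem>.csv (open)."""
    native = stem.with_suffix(".pkl")
    with native.open("wb") as handle:
        pickle.dump(rows, handle)  # readable only by Python

    open_copy = stem.with_suffix(".csv")
    with open_copy.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)   # header row documents the columns
        writer.writerows(rows)     # plain text, readable by any tool
    return native, open_copy
```

If the native format later becomes unreadable, the CSV copy still carries the values and the column names.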
Research Data Management: Software
• Your collaborators may use different operating systems from you.
• Just because it works on Windows doesn’t mean it works on Linux.
• Check that suitable software is available for your colleagues to access your data.
• Try to use free, open source options where possible.
Windows, Linux, Apple Mac.
Research Data Management: Documentation
Research Data Management: Lab Books
Why keep a lab book?
Records are important for the development and writing up of your research. You should keep a lab book of your research so that:
• a complete reconstruction of the experiment or measurement can be made later
• the work can be repeated for re-evaluation of the reported results
• the steps that led to the success or failure of a large project can be extracted
• patent lawyers have the properly documented evidence of inventions that they need
Research Data Management: Lab Books
• Paper lab books are at risk of loss or damage, and cannot be easily searched.
• An electronic lab notebook (ELN) is a computer program designed to replace paper lab books. ELNs:
  • are easier to search,
  • simplify data copying and backups,
  • and support collaboration.
Research Data Management: Lab Books
A good log should include:
• Steps, procedures, and precautions that are not obvious
• References to other people’s work, ideas, hints, and inputs
• Parameters that might affect the outcome of the experiment
• Equipment used, type numbers, serial numbers, and any calibration steps taken
• Sketches of the experimental layout and traces on recorders, oscilloscopes, etc.
• The date and time, and the names of other people observing
• Rough error analyses made during the experiment, repeat observations of doubtful readings, and calibration errors allowed for
Research Data Management: Software Documentation
A piece of code without adequate documentation cannot be efficiently or effectively developed, nor can it be understood by users in the future.
Documentation comes in many forms:
• Requirements: statements that identify attributes, capabilities, characteristics, or qualities of a system
• Architecture: an overview of the software, its purpose, and its relations to an environment
• Technical: the algorithms, interfaces, and APIs
• End User: manuals for end users, system administrators, and support staff
• Marketing: how to market the product, and analysis of the market demand
Research Data Management: Software Documentation
In a research project lifecycle, these forms of documentation are appropriate to different stages: the initial development, using the software for analysis, publishing the development and results of your research, and reuse by others later.
• Requirements: statements that identify attributes, capabilities, characteristics, or qualities of a system. Stage: Using
• Architecture: an overview of the software, its purpose, and its relations to an environment. Stage: Using and Writing Up
• Technical: the algorithms, interfaces, and APIs. Stage: Writing Up
• End User: manuals for end users, system administrators, and support staff. Stage: Using
• Marketing: how to market the product, and analysis of the market demand. Stage: Reuse
Research Data Management: Coding
Research Data Management: Coding
When writing software or analytical code, it is important that others, and your future self, can understand what the code is doing. Wilson et al. (2013) published 10 steps that they regard as the “Best Practices for Scientific Computing”, and we agree.
“As scientists are never taught how to build software many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.”
http://arxiv.org/pdf/1210.0530v3.pdf
Research Data Management: Best Practice Coding, Wilson et al. (2013)
1. Write programs for people, not computers
• A program should not require its readers to hold more than a handful of facts in memory at once.
• Names should be consistent, distinctive, and meaningful.
• Code style and formatting should be consistent.
• All aspects of software development should be broken down into tasks roughly an hour long (50-200 lines of code).
Research Data Management: Best Practice Coding, Wilson et al. (2013)
2. Automate repetitive tasks
• Rely on the computer to repeat tasks.
• Save recent commands in a file for reuse.
• Use a build tool, such as make, to automate your scientific workflows.
3. Use the computer to record history
• Software tools should be used to track computational work automatically.
• It is already possible to record the:
  • Unique identifiers and version numbers for raw data records, programs, and libraries
  • Names and version numbers of programs, and the values of parameters used to generate any given output
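A minimal sketch of recording history automatically is shown below: each run builds a small provenance record containing the script name, the interpreter version, and the parameter values. The record layout, the function name `provenance_record`, and the example script name and parameters are assumptions for illustration; Wilson et al. do not prescribe a specific format.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def provenance_record(script: str, parameters: dict) -> dict:
    """Describe one run: what was executed, with which versions and settings."""
    return {
        "script": script,
        "python_version": platform.python_version(),
        "platform": sys.platform,
        "parameters": parameters,
        "run_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical analysis run.
record = provenance_record("fit_model.py", {"threshold": 0.5, "iterations": 100})
print(json.dumps(record, indent=2))
```

Saving this record alongside each output file makes it possible to say later which settings produced which result.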
Research Data Management: Best Practice Coding, Wilson et al. (2013)
4. Make incremental changes
• Work in small steps with frequent feedback and course correction.
• At each stage, check that the still-incomplete code is working correctly.
5. Use version control
• Keeping alterations in successive versions means that changes can be reverted and the work can be developed collaboratively.
• Use a standard version control system (VCS).
• Everything that has been created manually should be put in version control.
Research Data Management: Best Practice Coding, Wilson et al. (2013)
6. Don’t repeat yourself (or others)
• Programmers use the DRY principle to avoid repeating data analysis and rewriting code:
  • Every piece of data must have a single authoritative representation in the system.
  • At small scales, code should be modularized rather than copied and pasted.
  • At large scales, re-use code instead of rewriting it.
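The small-scale version of this principle can be sketched as follows: a value is defined once, and a conversion is written once as a function rather than pasted wherever it is needed. The temperature example, the constant name, and the function name are hypothetical.

```python
# Single authoritative representation: the constant is defined once.
ABSOLUTE_ZERO_C = -273.15


def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    if temp_c < ABSOLUTE_ZERO_C:
        raise ValueError(f"{temp_c} C is below absolute zero")
    return temp_c - ABSOLUTE_ZERO_C


# The same function serves every dataset, so a fix is made in one place.
morning_k = [celsius_to_kelvin(t) for t in (12.0, 14.5)]
evening_k = [celsius_to_kelvin(t) for t in (9.0, 7.5)]
```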
Research Data Management: Best Practice Coding, Wilson et al. (2013)
7. Plan for mistakes - they’re inevitable
• Defensive programming: add assertions to programs to check their operation.
  • Assertions ensure that if something goes wrong, the program halts immediately, which aids debugging. They are also executable documentation, i.e. they explain the program as well as checking its behaviour.
• Automated testing: check that a single unit of code is returning correct results, or that the behaviour of a program hasn’t changed.
  • Use an off-the-shelf unit testing library to initialize inputs, run tests, and report their results in a uniform way.
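Both ideas can be shown in a few lines. The sketch below uses an assertion inside a small function and tests it with Python’s standard `unittest` library; the function `mean` and the test names are illustrative, and any off-the-shelf testing library would serve equally well.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Assertion: halt immediately, with a message, on invalid input.
    assert len(values) > 0, "mean() needs at least one value"
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_known_values(self):
        self.assertAlmostEqual(mean([1.0, 2.0, 3.0]), 2.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(AssertionError):
            mean([])


# Load and run the tests, reporting the results in a uniform way.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that Python skips `assert` statements when run with the `-O` flag, so checks that must always run are better written as explicit `if` tests that raise an exception.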
Research Data Management: Best Practice Coding, Wilson et al. (2013)
7. Plan for mistakes (they’re inevitable)
• Use a variety of oracles: an oracle tells a developer how a program should behave or what its output should be.
  • In research, this includes analytical results, experimental results, and previous results from other tried and tested software.
• Turn bugs into test cases: write tests that trigger the bug and will prevent that bug from reappearing later.
• Use a symbolic debugger, which allows you to pause a program, inspect the variable values, and move up and down the code to find the problem.
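As an illustration of turning a bug into a test case and of using an oracle, the sketch below imagines a hand-written `median` function that once returned the wrong value for even-length input. The bug, the function, and the test names are hypothetical; the oracle is the tried and tested `statistics.median` from Python’s standard library.

```python
import statistics
import unittest


def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    # Fix for the hypothetical bug: even-length input used to return
    # ordered[middle] instead of averaging the two central values.
    return (ordered[middle - 1] + ordered[middle]) / 2


class TestMedianRegression(unittest.TestCase):
    def test_even_length_input(self):
        # This input triggered the original bug; keeping the test
        # prevents the bug from reappearing unnoticed.
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_against_oracle(self):
        for data in ([5], [3, 1, 2], [10, 2, 8, 4]):
            self.assertEqual(median(data), statistics.median(data))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMedianRegression)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```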
Research Data Management: Best Practice Coding, Wilson et al. (2013)
8. Optimize software only after it works correctly
• In most cases, the most productive way of optimizing code is to get it working correctly, then identify areas that can be sped up.
• Use a profiler to identify bottlenecks in your code.
• Write code in the highest-level language possible; you can always shift to a low-level language (like C or Fortran) if the performance boost is needed.
9. Document design and purpose, not mechanics
• Refactor code instead of explaining how it works, i.e. rather than write a paragraph to explain a complex piece of code, reorganize it so that it’s self-explanatory.
• Embed the documentation for a piece of software in that software.
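A short sketch of both practices, using Python’s built-in `cProfile` and `pstats` modules: the function carries its own documentation as a docstring, and the profiler reports where the time is spent once the code already works. The function `slow_sum_of_squares` is a deliberately simple, hypothetical stand-in for real analysis code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Return the sum of i*i for every integer i from 0 to n - 1.

    The docstring states the purpose; the loop itself needs no commentary.
    """
    total = 0
    for i in range(n):
        total += i * i
    return total


# Profile one call once the function is known to be correct.
profiler = cProfile.Profile()
profiler.enable()
answer = slow_sum_of_squares(100_000)
profiler.disable()

# Print the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists each function with its call count and time, which shows whether a rewrite in a lower-level language would be worth the effort.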
Research Data Management: Best Practice Coding, Wilson et al. (2013)
10. Collaborate
• Code reviews are the most cost-effective way of finding bugs in code.
• Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems: one developer writes the code while the other provides real-time feedback.
• In larger teams of developers, use an issue tracking tool to maintain a list of tasks to be performed and bugs to be fixed.