Learn how to create a well-documented file naming system and directory structure to decrease versioning issues and data duplication. Plan and share your research documentation to assist other researchers in understanding your data.
Organizing Your Data GEO 802, Data Information Literacy Winter 2019 – Lecture 4 Gary Seitz, MA
Lesson 4 Outline • File-naming conventions • Data organization • Documenting your process • Keeping a [lab] notebook Luis Prado from The Noun Project
Objectives of today’s hands-on session • Plan an organizational structure for your data using a file naming system and directory structure that is well-documented and interoperable with other data sets in order to decrease versioning issues and data duplication. • Articulate a plan to collect and share the documentation of your research and methodologies in order to assist other researchers in making sense of your data.
Can you find your data? If not, have you considered… • Clear directory structure & file naming conventions https://www.jisc.ac.uk/guides/managing-information/good-file-name (File naming conventions for specific disciplines) • File renaming • File version control • For this to be successful, a consistent and disciplined approach is required. • This is easier to do as and when data files are generated than to implement retrospectively. • When organization methods become too time consuming, consider automated methods.
Organising your data • Research data files and folders need to be organised in a systematic way to be: • identifiable and accessible for yourself, • identifiable and accessible for colleagues, and for future users. • Thus it is important to plan the organisation of your data before a research project begins. • Doing so will prevent any confusion while research is underway or when multiple individuals will be editing and / or analysing the data.
The benefits of consistent data file labelling • Data files are distinguishable from each other within their containing folder • Data file naming prevents confusion when multiple people are working on shared files • Data files are easier to locate and browse • Data files can be retrieved not only by the creator but by other users
The benefits of consistent data file labelling • Data files can be sorted in logical sequence • Data files are not accidentally overwritten or deleted • Different versions of data files can be identified • If data files are moved to another storage platform their names will retain useful context
File-naming strategies • Be consistent • Have conventions for naming: • (1) Directory structure • (2) Folder names • (3) File names • Naming datasets according to agreed conventions should make file naming easier for colleagues because they will not have to ‘re-think’ the process each time. • Always include the same information • (e.g. date and time) • Retain the order of information • (e.g. YYYYMMDD, not MMDDYYYY)
File-naming strategies • Be descriptive • Try to keep file and folder names under 32 characters • Within reason, include relevant information such as: • Unique identifier (i.e. Project Name or Grant # in folder name) • Project or research data name • Conditions (Lab instrument, Solvent, Temperature, etc.) • Run of experiment (sequential) • Date (in file properties too) • When using sequential numbering, use leading zeros to allow for multi-digit versions. For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001, …, 010, …, 100. • No special characters: & , * % # ; ( ) ! @ $ ^ ~ ' { } [ ] ? < > - + / • Use only one period, placed before the file extension • (e.g. name_paper.doc NOT name.paper.doc OR name_paper..doc) • Many files are used independently of their file or directory structure, so provide sufficient description in the file name.
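The rules on this slide can be checked automatically. The sketch below is one possible reading of them, not an official tool: the function names `check_name` and `numbered` are made up for illustration, and the 32-character limit and the character list are taken directly from the slide.

```python
# Characters the slide lists as "special" (the hyphen is included because the slide lists it).
FORBIDDEN = set("&,*%#;()!@$^~'{}[]?<>-+/")
MAX_LENGTH = 32  # limit suggested on the slide


def check_name(name: str) -> list[str]:
    """Return the slide's rules that a file name breaks (empty list = OK)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 32 characters")
    if any(ch in FORBIDDEN for ch in name):
        problems.append("contains a special character")
    if name.count(".") != 1:
        problems.append("must contain exactly one period")
    return problems


def numbered(n: int, highest: int) -> str:
    """Pad n with leading zeros so it is as wide as the highest number expected."""
    width = len(str(highest))
    return f"{n:0{width}d}"


print(check_name("name_paper.doc"))   # []
print(check_name("name.paper.doc"))   # ['must contain exactly one period']
print(numbered(1, 10), numbered(10, 100))  # 01 010
```

Note that a descriptive name can easily exceed 32 characters, so treat the length check as a prompt to reconsider rather than a hard error.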
File-naming best practices • Be descriptive: 75092238.txt is not helpful. Instead: 20120814_instrument8_rainyday_raw.txt (up to 255 characters) • Don’t rely on nesting in folders: 2012/august/instrument8/day14/raw.txt • Use a consistent structure that falls into a useful order (for sorting) and decide on shared terminology • List versions alphanumerically, e.g. v1, v2, v3 rather than last, final, finalfinal, useTHISone • Use numerical dates, e.g. YYYYMMDD rather than Dec09 • Use underscores instead of full stops or spaces because, like special characters, these are parsed differently on different systems. • File names should provide context for the contents of the file, making it distinguishable from files with similar subjects or different versions of the same file.
File naming conventions s/n, variable Retain order Project_instrument_location_YYYYMMDDhhmmss_extra.ext Index/grant conditions Leading zero!
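To keep the order of the parts fixed, the name can be assembled by a small helper instead of typed by hand. This is a minimal sketch of the pattern Project_instrument_location_YYYYMMDDhhmmss_extra.ext; the function name `build_name` and all example values are hypothetical.

```python
from datetime import datetime


def build_name(project, instrument, location, when, extra, ext):
    """Assemble a file name with the parts always in the same order."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # YYYYMMDDhhmmss, zero-padded
    return f"{project}_{instrument}_{location}_{stamp}_{extra}.{ext}"


# Example values are invented for illustration.
name = build_name("geo802", "instrument08", "zurich",
                  datetime(2012, 8, 14, 9, 30, 0), "raw", "txt")
print(name)  # geo802_instrument08_zurich_20120814093000_raw.txt
```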
File-naming strategies • Order by date: 19550412_notes_MassObs.docx 19550412_questionnaire_MassObs.pdf 19631215_notes_Gorer.docx 19631215_questionnaire_Gorer.pdf • Order by subject: Gorer_notes_19631215.docx Gorer_questionnaire_19631215.pdf MassObs_notes_19550412.docx MassObs_questionnaire_19550412.pdf • Order by type: Notes_Gorer_19631215.docx Notes_MassObs_19550412.docx Questionnaire_Gorer_19631215.pdf Questionnaire_MassObs_19550412.pdf • Forced order with numbering: 01_MassObs_questionnaire_19550412.pdf 02_MassObs_notes_19550412.docx 03_Gorer_questionnaire_19631215.pdf 04_Gorer_notes_19631215.docx
On using number order in file names… Dates listed in order of collection
On using number order in file names… If we sort by MM/DD/YY, dates are out of order.
On using number order in file names… If we sort by DD/MM/YY, dates are out of order.
On using number order in file names… If we sort by YY/MM/DD, dates are in order.
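The effect is easy to reproduce. The sketch below writes three collection dates in each format and checks whether plain alphabetical sorting keeps them in chronological order. The first two dates come from the earlier example file names; the third is added for illustration, and the helper name `sorts_chronologically` is made up.

```python
from datetime import date

# Dates in order of collection (the third one is a made-up example).
collected = [date(1955, 4, 12), date(1963, 12, 15), date(1964, 1, 3)]


def sorts_chronologically(fmt: str) -> bool:
    """True if sorting the text labels keeps the dates in collection order."""
    labels = [d.strftime(fmt) for d in collected]
    return sorted(labels) == labels


print(sorts_chronologically("%m%d%y"))  # False (MMDDYY)
print(sorts_chronologically("%d%m%y"))  # False (DDMMYY)
print(sorts_chronologically("%y%m%d"))  # True  (YYMMDD)
```

Two-digit years only sort correctly within one century, which is one reason to prefer the four-digit form YYYYMMDD.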
Version control It is important to consistently identify and distinguish versions of data files. This ensures that a clear audit trail exists for tracking the development of a data file and identifying earlier versions, especially if data is frequently updated by multiple users. Suggested strategies: • Use a sequential numbered system: v1, v2, v3, etc. • Don't use confusing labels: revision, final, final2, etc. • Record all changes, no matter how small • Discard obsolete versions (but never the raw copy) • Use auto-backup instead of self-archiving, if possible University of Leicester: Good Practice Document Version Control
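Sequential version labels can be generated rather than guessed. This is a minimal sketch, assuming versions are written as a `_vN` suffix at the end of the file stem; that suffix convention and the function name `next_version` are assumptions for illustration, not part of the slide.

```python
import re

VERSION = re.compile(r"_v(\d+)$")  # assumed convention: stem ends in _v1, _v2, ...


def next_version(stems):
    """Return the next label (e.g. 'v3') given existing file stems."""
    numbers = [int(m.group(1)) for s in stems if (m := VERSION.search(s))]
    return f"v{max(numbers, default=0) + 1}"


print(next_version(["survey_v1", "survey_v2", "survey_final"]))  # v3
print(next_version([]))  # v1
```

Stems with labels such as "final" are simply ignored, which shows why they are no help in establishing an order.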
Example United States Census Bureau 2010 Census Reference Maps File Naming Conventions https://www.census.gov/programs-surveys/geography/technical-documentation/naming-convention/reference-maps.html Have a look at these File Naming Conventions Make a list of what you would take from this for your thesis
Exercise 1 2
File organization AGU presentation Class presentation OS presentation Presentations Ocean Sciences AGU Class
File structure Where to put stuff so you won’t lose it • Logical to you – and easily understandable to others • Ease of sharing / exchange of data • Once you develop a naming scheme for your folders, stick to it • Defining the ‘end product’ of a project helps maintain file structure
Good directory structure should be predictable and help you easily identify which folders hold which information. • Ways to organize your files: • Data type (text, images, models, etc.) • Time (year, month, session, etc.) • Project title • Experimental run • Subject under investigation • Step in the research process • ...
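A predictable structure is easier to keep if every project starts from the same skeleton. The sketch below creates one organized by step in the research process. The folder names in `LAYOUT` and the function name `make_skeleton` are hypothetical, so replace them with whatever scheme suits your project.

```python
from pathlib import Path

# Hypothetical layout, organized by step in the research process.
LAYOUT = ["01_raw", "02_processed", "03_analysis", "04_output"]


def make_skeleton(root, layout=LAYOUT):
    """Create the same set of sub-folders under a project root."""
    created = []
    for rel in layout:
        folder = Path(root) / rel
        folder.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(folder)
    return created
```

Calling `make_skeleton("my_project")` creates the four folders under `my_project` and leaves existing ones untouched.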
Which primary data defines your research? Material Type e.g. Pottery Geographical Location Site A Material A Site B Material B Site C Material C Archaeological material or Location (site based) • Distinguish between projects. • Distinguish between sub-folders. • Define ‘end-product’ of research – and keep clean of temporary folders and files. • Research designs change and so must file structure. • Avoid overuse of folders – easier said than done though.
File structure http://www.wur.nl/en/Expertise-Services/Data-Management-Support-Hub/Browse-by-Subject/Organising-files-and-folders.htm
- Consistency inside and outside of project? - Problems of multiple computers. “Dummy Project” File structure and naming What are these files doing here?
When naming & organizing your files and folders… • be thoughtful • be consistent • document your approach
Exercise 1 2
Exercise 3
Further Reading Frazer, M. (14 January 2013). An Elevator Pitch for File Naming Conventions. ACRL TechConnect blog. Retrieved from http://acrl.ala.org/techconnect/?p=2607 MIT Libraries. Data management and publishing: Organize your files. Massachusetts Institute of Technology. Retrieved 8 August 2014 from http://libraries.mit.edu/data-management/store/organize
Batch (or bulk) renaming • Software tools exist that can organise data files and folders in a consistent and automated way through batch renaming. • There are many situations where batch renaming may be useful, such as: • where images from digital cameras are automatically assigned filenames consisting of sequential numbers • where proprietary software or instrumentation generate crude, default or multiple filenames • where files are transferred from a system that supports spaces and/or non-English characters in filenames to one that doesn't (or vice versa). Batch renaming software can be used to substitute such characters with acceptable ones.
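For simple cases a short script can do the same job as a renaming tool. This is a minimal sketch, not a replacement for the tools listed in this lecture: `clean_name` and `batch_rename` are hypothetical helpers that replace spaces and other problem characters with underscores, and the script defaults to a dry run so you can review the plan before anything is renamed.

```python
import re
import unicodedata
from pathlib import Path


def clean_name(name: str) -> str:
    """Replace spaces, accents and special characters in a file name."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # Drop accents (e.g. ü becomes u), then replace anything else unusual with "_".
    plain = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    safe = re.sub(r"[^A-Za-z0-9_]+", "_", plain).strip("_")
    return f"{safe}.{ext}" if ext else safe


def batch_rename(folder, dry_run=True):
    """Return (old, new) pairs; rename the files only when dry_run is False."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        new_name = clean_name(path.name)
        if not path.is_file() or new_name == path.name:
            continue
        plan.append((path.name, new_name))
        target = path.with_name(new_name)
        if not dry_run and not target.exists():  # never overwrite an existing file
            path.rename(target)
    return plan


print(clean_name("Messung Zürich (2).csv"))  # Messung_Zurich_2.csv
```

Always test on a copy of the data first, and keep the list of old and new names as part of your documentation.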
Tool Box: Batch renaming Use free bulk renaming tools: • Windows: • AntRenamer (www.antp.be/software/renamer) • RenameIT (sourceforge.net/projects/renameit) • Bulk Rename Utility (www.bulkrenameutility.co.uk/) • Mac: • Renamer 6 (for Mac) (renamer.com/) • Name Changer (mrrsoftware.com/namechanger/) • Linux: • GNOME Commander (www.nongnu.org/gcmd/) • GPRename (http://gprename.sourceforge.net/)
Benefits of consistent data file labelling • Data files are not accidentally overwritten or deleted • Data files are distinguishable from each other within their containing folder • Data file naming prevents confusion when multiple people are working on shared files • Data files are easier to locate and browse • Data files can be retrieved both by the creator and by other users • Data files can be sorted in logical sequence • Different versions of data files can be identified • If data files are moved to another storage platform their names will retain useful context