GEO4012 Data Storage and Research Methodology
Some aspects of USE AND STORAGE OF DATA AT THE DEPARTMENT
Data storage during Master thesis project
• Every Master's student at the institute will receive a user-specific directory for storing temporary data, results, documents, etc.
• Do not store Master-project data in your home directory (M:\ on Windows or ~<username> on Linux).
• Reasons:
  • More disk space
  • Easier to share data with your supervisor and/or other students working on the same project
  • Jurisdiction
Data storage during Master thesis project
• Master-project directory:
  • K:\section- or project-disk\username (Windows)
  • /felles/section- or project-disk/username (Linux)
• Linux example:
• Windows example:
• You will receive an e-mail when the directory has been created.
Data storage at IG@UIO
• Store data and results in open formats (e.g. ASCII) or global standards (e.g. SEG-Y, bitmaps).
• Do not use proprietary formats (e.g. Excel) to store final results.
• Raw data that are not confidential should preferably be stored at K:\data\<data-type>. The IT group can help copy data to that location.
• Always send IT-related questions to drift@geo.uio.no with a cc to your supervisor. It is recommended to talk to your supervisor first.
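One way to act on the advice about proprietary formats is to scan a results directory for spreadsheet files that still need exporting to an open format. The following is a minimal sketch only: the function name list_proprietary_files, the extension list (.xls, .xlsx) and the demo file names are assumptions made for illustration, not a department tool.

```shell
#!/bin/sh
# Sketch: list spreadsheet files that are still in a proprietary format so
# they can be exported to an open format (e.g. ASCII) before archiving.
# The function name and the extension list are assumptions for illustration.

list_proprietary_files() {
    # $1: directory to scan (defaults to the current directory)
    find "${1:-.}" -type f \( -iname '*.xls' -o -iname '*.xlsx' \) | sort
}

# Demo on a throwaway directory with made-up file names.
scan_dir=$(mktemp -d)
touch "$scan_dir/results.txt" "$scan_dir/table.xlsx" "$scan_dir/old.XLS"
list_proprietary_files "$scan_dir"
```

The match is case-insensitive, so old.XLS is reported as well; results.txt is not.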
Snapshot and backup
• Backup of your files usually runs nightly; for some data areas it runs weekly.
• Snapshots of your home area are a continuous service. See https://www.uio.no/tjenester/it/maskin/filer/backup/hjelp/snapshot.html
• In addition to the snapshot mechanism, your home area is backed up. This is especially useful for older data.
System overview [diagram of department machines: vann, jern, ice, ekman, abel, rossby, sverdrup, kant; labelled areas: DATA HANDLING, HOME DIRECTORIES]
Some aspects of RESEARCH METHODOLOGY
Research Documentation
• Why document your research?
  • To allow other researchers to understand the methods you used;
  • To be able to replicate your results;
  • To determine if your findings are reliable;
  • To make it easier for those who come after you;
  • To avoid suspicion of fraud or plagiarism;
  • To receive credit for the research you've done on a project and eventually write scientific papers.
Research Documentation
• How to document your research?
  • Keep track of all the methods and models used to conduct your research
  • Keep information that describes all aspects of your data (metadata)
  • Keep a list of all the scientific papers you read or consult
  • Draft your research report as you go (don't wait until the very end)
Models/Methods
• Simulation is an important tool in engineering and research.
• But be careful with its use:
  • How well does the simulation model reflect reality?
  • You might be drawing conclusions based on "artificial worlds" ...
• So:
  • Always keep track of the model version you used and any changes you have made to it
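Keeping track of the model version can be partly automated. The sketch below stores SHA-256 checksums of the model files next to the results, so a later check shows whether the model has changed since the run. It assumes GNU coreutils (sha256sum); the function names, the manifest name MANIFEST.sha256 and the parameter file are hypothetical.

```shell
#!/bin/sh
# Sketch: pin the exact model files used for a run by storing their SHA-256
# checksums next to the results. Assumes GNU coreutils (sha256sum).
# The function names and MANIFEST.sha256 are hypothetical.

record_model_version() {
    # $1: model source directory, $2: results directory
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 ) > "$2/MANIFEST.sha256"
}

verify_model_version() {
    # Returns 0 only if every file listed in the manifest is unchanged.
    ( cd "$1" && sha256sum -c - ) < "$2/MANIFEST.sha256"
}

# Demo with throwaway directories and a made-up parameter file.
model_dir=$(mktemp -d)
run_dir=$(mktemp -d)
echo 'k = 0.5' > "$model_dir/params.txt"
record_model_version "$model_dir" "$run_dir"

echo 'k = 0.7' > "$model_dir/params.txt"    # the model is edited after the run
if verify_model_version "$model_dir" "$run_dir"; then
    echo "model unchanged since the run"
else
    echo "model changed since the run"
fi
```

The check detects changed or missing files but not files added later. A version control system records the same information together with the history of each change.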
Metadata
• Metadata (metacontent) are data that provide information about one or more aspects of other data, such as:
  • Means of creation of the data
  • Purpose of the data
  • Time and date of creation
  • Creator or author of the data
  • Location on a computer network where the data were created
  • Standards used
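The aspects listed above can be captured in a small plain-text file stored next to each data file. This is a minimal sketch under stated assumptions: the .meta suffix, the field names and the function name write_metadata are illustrative choices, not a department convention, and the creation time is taken to be the moment the metadata file is written.

```shell
#!/bin/sh
# Sketch: write a plain-text metadata "sidecar" next to a data file.
# The .meta suffix, field names and function name are assumptions.

write_metadata() {
    # $1: data file, $2: purpose, $3: means of creation, $4: standard used
    data_file=$1
    {
        printf 'file: %s\n'     "$(basename "$data_file")"
        printf 'creator: %s\n'  "$(id -un)"
        # Assumes the metadata are written right after the data are created.
        printf 'created: %s\n'  "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        printf 'location: %s:%s\n' "$(uname -n)" "$(cd "$(dirname "$data_file")" && pwd)"
        printf 'purpose: %s\n'  "$2"
        printf 'method: %s\n'   "$3"
        printf 'standard: %s\n' "$4"
    } > "$data_file.meta"
}

# Demo with a throwaway directory and a made-up data file.
meta_dir=$(mktemp -d)
printf '1.0 2.0\n' > "$meta_dir/temperature.txt"
write_metadata "$meta_dir/temperature.txt" "thesis chapter 3" "model run" "ASCII"
cat "$meta_dir/temperature.txt.meta"
```

Because the sidecar is plain ASCII, it stays readable without any special software and can be kept under version control together with the scripts.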
Metadata - Examples

Good documentation:

    #!/bin/bash
    #
    ### Script for matlab code CryoGrid2 on Abel
    ### on 8 tasks
    #
    ### Mandatory parameters to run a job via SLURM
    #SBATCH --job-name=PFNNorway
    #SBATCH --account=geofag
    #SBATCH --time=10:10:00
    #SBATCH --mem-per-cpu=2000M
    #SBATCH --nodes=1 --ntasks-per-node=8

    ## Set up job environment on abel
    source /cluster/bin/jobsetup
    module load matlab

    ## Start matlab
    matlab -nodisplay -nodesktop -nosplash < cryo.m

Bad documentation:

    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --account=geofag
    #SBATCH --time=10:10:00
    #SBATCH --mem-per-cpu=2000M
    #SBATCH --nodes=1 --ntasks-per-node=8
    source /cluster/bin/jobsetup
    module load matlab
    matlab -nodisplay -nodesktop -nosplash < cryo.m
Metadata - Examples: Petrel
Smart data storage by humans [diagram of what to save for an article or Master thesis: input data (TB), program/code and its source, libraries, compilers, hardware/time, script(s), changes, machine, metadata, Internet, visualization data, output data (TB)]
Version control
• Why do we need version control?
• What are the basic operations for version control?
• Example with SVN
Why do we need version control?
• A version control system keeps track of all work and all changes in a set of files, and allows several developers (potentially widely separated in space and time) to collaborate.
• It helps you manage a larger programming or text project, including file locking, versioning and conflict handling.
Other tools for managing projects
• rcs - UNIX command: rcs creates new RCS files or changes attributes of existing ones. An RCS file contains multiple revisions of text, an access list, a change log, descriptive text, and some control attributes.
• CVS - Concurrent Versions System, http://en.wikipedia.org/wiki/CVS_(software)
• Git/GitHub - GitHub offers both paid plans for private repositories and free accounts for open source projects. http://en.wikipedia.org/wiki/GitHub
Basic operations for version control
• Checkout
• Update
• Commit
• Tag
• Branch
• Merge
http://en.wikipedia.org/wiki/Revision_control
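The six operations can be tried safely on a throwaway local Subversion repository before working on a shared one. This is a sketch only: it assumes the svn and svnadmin commands are installed, and the trunk/branches/tags layout, the branch name test, the tag name v1.0 and the file notes.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: the six basic operations on a throwaway local Subversion repository.
# Assumes svn and svnadmin are installed; the layout and all names are made up.

run_svn_demo() (
    work=$(mktemp -d)
    svnadmin create "$work/repo"
    url="file://$work/repo"
    svn mkdir -q -m "Create layout" "$url/trunk" "$url/branches" "$url/tags"

    svn checkout -q "$url/trunk" "$work/wc"                  # Checkout
    cd "$work/wc" || exit 1
    echo "first line" > notes.txt
    svn add -q notes.txt
    svn commit -q -m "Add notes" notes.txt                   # Commit
    svn update -q                                            # Update

    svn copy -q -m "Tag v1.0" "$url/trunk" "$url/tags/v1.0"          # Tag
    svn copy -q -m "Start branch" "$url/trunk" "$url/branches/test"  # Branch

    # Change the file on the branch, then merge the branch back into trunk.
    svn checkout -q "$url/branches/test" "$work/branch"
    echo "second line" >> "$work/branch/notes.txt"
    svn commit -q -m "Edit on branch" "$work/branch"
    svn update -q
    svn merge -q "$url/branches/test" .                      # Merge
    svn commit -q -m "Merge branch test into trunk"
    cat notes.txt
)

if command -v svn >/dev/null 2>&1 && command -v svnadmin >/dev/null 2>&1; then
    run_svn_demo
else
    echo "svn not installed; skipping demo" >&2
fi
```

In Subversion, tags and branches are both made with svn copy; the difference is only a convention about where the copy is placed and whether anyone commits to it afterwards.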
SVN - Initial copy of the repository
• Finding the repository
  • Ask a team member where to find it, or check the local repository server:
    $ ssh svn.uio.no
    $ cd /svnroot
    /usit/vcs-uio/svnroot
    $ ls
    osloctm3
    ...
• Getting the repository (in your Master-project directory):
    $ mkdir svn
    $ cd svn
    $ mkdir osloctm3
    $ svn checkout svn+ssh://svn.uio.no/svnroot/osloctm3
SVN - Normal usage of existing repo
• Going there
    $ cd osloctm3
    $ svn update [FILE]
    U fc/fc-switches.html
    ..
• Editing the file(s)
    $ emacs -nw fc-switches.html
• Checking the updates (optional)
    $ svn diff fc-switches.html
• Sending the change upstream
    $ svn commit fc-switches.html
Note: Why or why not send the changes upstream?
• Yes: you did something for the project
  • Found an error/bug
  • Added new functionality
• No: //Think!//
  • Changes are for your interest only
  • It may break the idea of the project
• NB! *When* you find your changes missing in the original it is way too late, and you must either
  • drop them (the cost on either side may be quite big), or
  • choose to make a branch/new repo (who will help you?)
References
• USIT (Norwegian): http://www.uio.no/tjenester/it/maskin/filer/versjonskontroll/svn.html
• Internet, e.g. http://www.abbeyworkshop.com/howto/misc/svn01/
• Subversion's own project website: http://subversion.tigris.org/
• Wikipedia: http://en.wikipedia.org/wiki/Subversion_%28software%29