Data Organization: Quality Assurance and Transformations
Data Validation
Hook, et al. 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online: http://daac.ornl.gov/PI/BestPractices-2010.pdf.
• Check for missing, impossible, anomalous values
  • Plotting
  • Mapping
• Examine summary statistics
• Verify data transfers from notebooks to digital files
• Verify data conversion from one file format to another
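The checks listed on this slide can be scripted. The following is a minimal sketch, not the method from Hook et al.: it is written in Python for illustration, and the function name `validate`, the temperature limits, the sample readings, and the median-absolute-deviation rule for "anomalous" are all assumptions made for the example.

```python
import math
import statistics


def validate(values, lower, upper, k=5.0):
    """Classify readings as missing, impossible, or anomalous.

    lower/upper bound the physically possible range; k scales the median
    absolute deviation (MAD) used to flag anomalies. All are assumptions.
    """
    missing, impossible, plausible = [], [], []
    for i, v in enumerate(values):
        if v is None or (isinstance(v, float) and math.isnan(v)):
            missing.append(i)          # no reading recorded
        elif not lower <= v <= upper:
            impossible.append(i)       # outside the possible range
        else:
            plausible.append((i, v))

    nums = [v for _, v in plausible]
    if not nums:
        return {"missing": missing, "impossible": impossible,
                "anomalous": [], "summary": None}

    median = statistics.median(nums)
    mad = statistics.median(abs(v - median) for v in nums)
    # Possible but unusual: far from the median relative to the MAD.
    anomalous = [i for i, v in plausible
                 if mad > 0 and abs(v - median) > k * mad]

    # Summary statistics over all in-range readings.
    summary = {"n": len(nums), "min": min(nums), "max": max(nums),
               "mean": statistics.mean(nums), "median": median}
    return {"missing": missing, "impossible": impossible,
            "anomalous": anomalous, "summary": summary}


# Hypothetical water temperatures in degrees C; -999.0 is a sentinel value.
readings = [12.1, None, 12.4, -999.0, 11.9, 12.3, 12.0, 25.0]
report = validate(readings, lower=-2.0, upper=40.0)
print(report)
```

The function returns positions rather than deleting anything, so flagged readings can still be inspected by plotting or mapping before a decision is made.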
Preserve & Record Information
Keep Original (Raw) File
• Do not include transformations, interpolations, etc.
• Make the raw data “read-only”
[Diagram labels: Processing Script (R); Save as a new file]
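The slide's diagram names an R processing script. The sketch below shows the same pattern in Python purely for illustration: the raw file is made read-only and every transformation is written to a new file. The helper names `protect_raw` and `process_to_new_file`, the file names, and the sample content are hypothetical.

```python
import os
import stat
import tempfile
from pathlib import Path


def protect_raw(raw_path):
    """Clear the write-permission bits so the raw file is read-only."""
    mode = os.stat(raw_path).st_mode
    os.chmod(raw_path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def process_to_new_file(raw_path, out_path, transform):
    """Apply transform to each line of the raw file and save a new file."""
    raw_path, out_path = Path(raw_path), Path(out_path)
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw file")
    with raw_path.open() as src, out_path.open("w") as dst:
        for line in src:
            dst.write(transform(line))


# Demonstration in a temporary directory with made-up content.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw.csv"
raw.write_text("station,temp\na,12.1\n")
protect_raw(raw)

clean = workdir / "clean.csv"
process_to_new_file(raw, clean, str.upper)  # stand-in for real cleaning
print(clean.read_text())
```

On POSIX systems an administrator account can still write to a read-only file, so the permission change guards against accidents rather than enforcing anything.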
Data Manipulation
• You will need to repeat reduction and analysis procedures many times
• You need to have a workflow that recognizes this
• Scripted languages can help capture the workflow
• You could just document all steps by hand
• After the 20th iteration through your data set, however, you may feel more fondly towards scripted languages
• Learn the analytical tools of your field
• Talk to colleagues, etc., and choose at least one tool to master
Preserve Processing Information
[Workflow diagram: Temperature data (T) and Salinity data (S) → Data import into R → Data in R format → Quality control & data cleaning → “Clean” T & S data → Analysis → Summary statistics, Graph production]
• Scripts used in file cleaning
• Programs / algorithms
• Document workflows or data file transformations
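The workflow on this slide (import, quality control and cleaning, summary statistics) can be captured as one script whose steps are recorded as it runs. The slide describes the workflow in R; what follows is a rough Python sketch for illustration only. The column names, the inline sample data, the plausibility limits, and all function names are assumptions, and graph production is left out.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for the temperature and salinity files.
RAW_CSV = """station,temperature_c,salinity_psu
A,12.1,35.0
B,,34.8
C,12.4,-5.0
D,11.9,35.2
"""

# Assumed plausible ranges used for quality control.
LIMITS = {"temperature_c": (-2.0, 40.0), "salinity_psu": (0.0, 42.0)}


def import_data(text):
    """Step 1: parse CSV text; blank cells become None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        row = {"station": rec["station"]}
        for col in LIMITS:
            row[col] = float(rec[col]) if rec[col].strip() else None
        rows.append(row)
    return rows


def quality_control(rows):
    """Step 2: keep rows whose T and S are present and within LIMITS."""
    def ok(row):
        return all(row[c] is not None and lo <= row[c] <= hi
                   for c, (lo, hi) in LIMITS.items())
    return [r for r in rows if ok(r)]


def summarize(rows):
    """Step 3: mean of each variable over the cleaned rows."""
    return {c: statistics.mean(r[c] for r in rows) for c in LIMITS}


def run_workflow(text):
    """Run all steps in order and keep a plain-text record of each."""
    log = []
    raw = import_data(text)
    log.append(f"import: {len(raw)} rows")
    clean = quality_control(raw)
    log.append(f"quality control: kept {len(clean)}, dropped {len(raw) - len(clean)}")
    summary = summarize(clean)
    log.append(f"summary: {summary}")
    return clean, summary, log


clean_rows, summary, log = run_workflow(RAW_CSV)
print("\n".join(log))
```

Because the whole sequence lives in `run_workflow`, rerunning the analysis after a correction to the raw data is a single call, and the log documents what was done to the file.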
Preserving: Scripted Notes
• Use a scripted language to process data
  • R Statistical package (free, powerful)
  • SAS
  • MATLAB
• Processing scripts record the processing
  • Steps are recorded in textual format
  • Can be easily revised and re-executed
  • Easy to document
• GUI-based analysis may be easier, but is harder to reproduce
Reproducibility Methods
• Do use version control
• Do document software environment
• Only save what cannot be reconstructed from original data + code
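One way to "document software environment" is to have the analysis script write a small snapshot alongside its outputs. The sketch below is an illustration in Python using only the standard library. The function name `describe_environment` and the package names passed to it are assumptions, and an R workflow would use that language's own tools instead.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Collect interpreter, platform, and package versions as a dict."""
    info = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = None  # package is not installed
    return info


# The package names here are placeholders for whatever the analysis imports.
env = describe_environment(["pip", "a-package-that-does-not-exist"])
snapshot = json.dumps(env, indent=2, sort_keys=True)
print(snapshot)
```

Saving `snapshot` to a text file under version control keeps the environment record next to the code that produced the results.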