Data Organization



Presentation Transcript


  1. Data Organization: Quality Assurance and Transformations

  2. Data Validation
     Source: Hook et al. 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online: http://daac.ornl.gov/PI/BestPractices-2010.pdf
     • Check for missing, impossible, anomalous values
     • Plotting
     • Mapping
     • Examine summary statistics
     • Verify data transfers from notebooks to digital files
     • Verify data conversion from one file format to another
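The first and fourth checks on this slide can be sketched in a few lines. This is an illustration only: the slides name R, SAS and MATLAB, and Python is used here purely as a stand-in. The function name `validate`, the sample values, the `-999.0` sentinel and the plausible range are all assumptions, not part of the presentation.

```python
import math
import statistics

def validate(values, lower, upper):
    """Flag missing and out-of-range values, then summarize the rest."""
    # A value counts as missing if it is None or a float NaN.
    missing = [i for i, v in enumerate(values)
               if v is None or (isinstance(v, float) and math.isnan(v))]
    present = [(i, v) for i, v in enumerate(values) if i not in missing]
    # "Impossible" here means outside the caller-supplied plausible range.
    impossible = [i for i, v in present if not lower <= v <= upper]
    valid = [v for _, v in present if lower <= v <= upper]
    summary = None
    if valid:
        summary = {
            "n": len(valid),
            "min": min(valid),
            "max": max(valid),
            "mean": statistics.mean(valid),
        }
    return missing, impossible, summary

# Hypothetical water temperatures in degrees C; -999.0 is a made-up sentinel.
temps = [12.1, 12.4, None, -999.0, 13.0, 55.2]
missing, impossible, summary = validate(temps, lower=-2.0, upper=40.0)
print("missing at:", missing)
print("impossible at:", impossible)
print("summary:", summary)
```

The function returns positions rather than deleting values, so the flagged records can be inspected (or plotted and mapped, as the slide suggests) before any decision is made about them.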

  3. Preserve & Record Information
     Keep Original (Raw) File
     • Do not include transformations, interpolations, etc.
     • Make the raw data “read-only”
     [Diagram labels: Processing Script (R); Save as a new file]
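A minimal sketch of this slide's advice, written in Python rather than the R script the slide shows. The file names `site_raw.csv` and `site_processed.csv`, the helper names, and the whitespace-stripping "transformation" are hypothetical and chosen only to keep the example short.

```python
import stat
import tempfile
from pathlib import Path

def protect_raw(raw_path):
    """Remove write permission so the raw file is not edited by accident."""
    raw_path.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

def process_to_new_file(raw_path, out_path):
    """Read the raw file, apply a transformation, write the result elsewhere."""
    lines = raw_path.read_text().splitlines()
    # Example transformation: drop blank lines and surrounding whitespace.
    cleaned = [line.strip() for line in lines if line.strip()]
    out_path.write_text("\n".join(cleaned) + "\n")

# Hypothetical raw file created in a temporary directory for the demo.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "site_raw.csv"
raw.write_text("temp_c\n12.1\n\n 12.4 \n")
protect_raw(raw)

processed = workdir / "site_processed.csv"
process_to_new_file(raw, processed)
print(processed.read_text())
```

The raw file is only ever read; every change lands in a separate output file, so the processing can be rerun from the untouched original at any time.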

  4. Data Manipulation
     • You will need to repeat reduction and analysis procedures many times
     • You need to have a workflow that recognizes this
     • Scripted languages can help capture the workflow
     • You could just document all steps by hand; after the 20th iteration through your data set, however, you may feel more fondly towards scripted languages
     • Learn the analytical tools of your field
     • Talk to colleagues, etc., and choose at least one tool to master

  5. Preserve Processing Information
     [Workflow diagram labels: Temperature data (T); Salinity data (S); Data import into R; Data in R format; Quality control & data cleaning; “Clean” T & S data; Analysis; Summary statistics; Graph Production]
     • Scripts used in file cleaning
     • Programs / algorithms
     • Document workflows or data file transformations
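The workflow on this slide can be captured as one script in which each stage is a named function. The sketch below uses Python instead of R, and everything in it is assumed for illustration: the inline CSV text, the station labels, the column names and the plausible ranges. Graph production is left out because it would need a plotting library.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for the temperature and salinity files.
TEMPERATURE_CSV = "station,temp_c\nA,12.1\nB,\nC,13.0\n"
SALINITY_CSV = "station,psu\nA,35.1\nB,34.9\nC,-1.0\n"

def import_data(text, column):
    """Step 1: import one CSV column as {station: float or None}."""
    rows = csv.DictReader(io.StringIO(text))
    return {r["station"]: float(r[column]) if r[column] else None for r in rows}

def clean(data, lower, upper):
    """Step 2: quality control; keep only present, in-range values."""
    return {k: v for k, v in data.items()
            if v is not None and lower <= v <= upper}

def summarize(data):
    """Step 3: summary statistics on the cleaned values."""
    values = list(data.values())
    return {"n": len(values), "mean": statistics.mean(values)}

def run_workflow():
    """Run import, cleaning and analysis for both variables."""
    temp = clean(import_data(TEMPERATURE_CSV, "temp_c"), -2.0, 40.0)
    sal = clean(import_data(SALINITY_CSV, "psu"), 0.0, 45.0)
    return {"temperature": summarize(temp), "salinity": summarize(sal)}

results = run_workflow()
print(results)
```

Because the script itself is the record of what was done, archiving it next to the data preserves the cleaning rules and the order of the steps.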

  6. Preserving: Scripted Notes
     • Use a scripted language to process data
        • R statistical package (free, powerful)
        • SAS
        • MATLAB
     • Processing scripts record the processing
        • Steps are recorded in textual format
        • Can be easily revised and re-executed
        • Easy to document
     • GUI-based analysis may be easier, but is harder to reproduce

  7. Reproducibility Methods
     • Do use version control
     • Do document software environment
     • Only save what cannot be reconstructed from original data + code
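One way to act on "document software environment" is to have the analysis script write out a description of the interpreter and platform it ran on. The sketch below is a Python stand-in for whatever tool the analysis actually uses; the function name `describe_environment` and the choice of fields are assumptions.

```python
import json
import platform
import sys

def describe_environment():
    """Collect basic facts about the interpreter and operating system."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }

env = describe_environment()
# Printing as JSON gives a small text record that can go under version control.
print(json.dumps(env, indent=2))
```

A fuller record would also list the versions of the packages the analysis depends on; this sketch covers only the interpreter and platform.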
