Learn fundamental data management practices for improved usability, efficiency, and understandability of your research data. Workshop sponsored by ORNL Distributed Active Archive Center and CC&E Joint Science Workshop.
Please take the Workshop Survey https://www.surveymonkey.com/s/update
Data Management Practices for Early Career Scientists: Closing
Robert Cook
ORNL Distributed Active Archive Center
Environmental Sciences Division
Oak Ridge National Laboratory
Oak Ridge, TN
cookrb@ornl.gov
CC&E Joint Science Workshop
College Park, MD
April 19, 2015
Plan for archiving data
“Begin with the end in mind”
• Identified the Data Center
• Collaborated with data center during project
• Communicated:
  • Volume and Number of Files
  • Special needs
  • Delivery dates
Followed Fundamental Data Practices
• Define the contents of your data files
• Define the variables
• Use consistent data organization
• Use stable file formats
• Assign descriptive file names
• Preserve processing information
• Perform basic quality assurance
• Provide documentation
• Protect your data
• Preserve your data
What to submit to the archive?
• Well-structured data files, with variables, units, and fill values well-defined
• Document that describes the data set
• Additional information
  • Article written with the data set
  • Files that describe project, protocols, or field sites (photographs)
  • Material from Project Web site or Wiki
• Basic description of the data (15 questions)
  • http://daac.ornl.gov/PI/questions.shtml
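One way to check that variables, units, and fill values are well-defined before submission is to compare each file header against a data dictionary. The sketch below is only an illustration: the dictionary layout, the column names, the `-9999` fill value, and the `undocumented_columns` helper are all assumptions, not an ORNL DAAC requirement or tool.

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable in the data file.
DATA_DICTIONARY = {
    "site_id": {"description": "Field site identifier", "units": "none", "fill_value": "NA"},
    "date": {"description": "Sampling date (YYYY-MM-DD)", "units": "none", "fill_value": "NA"},
    "canopy_height": {"description": "Canopy height", "units": "m", "fill_value": "-9999"},
}

# Every variable is expected to carry these three pieces of information.
REQUIRED_KEYS = ("description", "units", "fill_value")


def undocumented_columns(csv_text, dictionary):
    """Return header names whose dictionary entry is missing or incomplete."""
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    for name in header:
        entry = dictionary.get(name, {})
        if any(not entry.get(key) for key in REQUIRED_KEYS):
            problems.append(name)
    return problems


# Made-up sample file: "height" has no dictionary entry, so it is flagged.
sample = "site_id,date,canopy_height,height\nA1,2014-06-01,12.5,-9999\n"
print(undocumented_columns(sample, DATA_DICTIONARY))  # ['height']
```

Running a check like this on every file before delivery catches undefined variables while the person who can define them is still on the project.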
Issues with data sets received
• Descriptive information about data files and content is incomplete
  • Data description and collection method
  • Field sites
  • Quality / uncertainty of data
• Inconsistencies with publication
• Files uploaded are not identified / described
• Variable names are undefined or vague
  • “Height” unclear, change to “canopy_height”
  • Perhaps append the method/sensor for added clarity
Information about Data (15 questions)
Information About Your Data Set
• Have you looked at our Best Data Management Practices?
• Who produced this data set?
• What agency and program funded the project? What awards funded this project? (comma-separate multiple awards)
Data Set Description
• Provide a title for your data set. (maximum 84 characters) What type of data does your data set contain? What does the data set describe? (2-3 sentences)
• What parameters did you measure, derive, or generate? (comma separated, limit to ten)
• Have you analyzed the uncertainty in your data? Briefly describe your uncertainty analysis. (2-3 sentences) Will the uncertainty estimates be included with your data set?
Information about Data (cont)
Temporal and Spatial Characteristics
• What date range does the data cover? (YYYY-MM-DD) What is a representative sampling frequency or temporal resolution for your data?
• Where were the data collected/generated?
• Which of the following best describes the spatial nature of your data? (single point, multiple points, transect, grid, polygon, n/a)
• What is a representative spatial resolution for these data?
• Provide a bounding box around your data.
Data Preparation and Delivery
• What are the formats of your data files? How many data files does your product contain? What is the total disk volume of your data set? (MB)
• Is this data set final, unrestricted, and available for release? What are the reasons to restrict access to the data set?
• Has this data set been described and used in a published paper? If so, provide a DOI or upload a digital copy of the manuscript with the data set.
• Are the data and documentation posted on a public server? If so, provide the URL.
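Several of these answers (date range, bounding box, number of files, total volume) can be computed directly from the data rather than estimated. The following is a minimal sketch under stated assumptions: coordinates are (longitude, latitude) pairs in decimal degrees that do not cross the antimeridian, dates are already written as YYYY-MM-DD, and 1 MB is taken as 1,000,000 bytes. The function names and sample values are hypothetical.

```python
from pathlib import Path


def date_range(iso_dates):
    """Return (first, last) for dates written as YYYY-MM-DD strings."""
    # ISO 8601 dates sort chronologically as plain strings.
    return min(iso_dates), max(iso_dates)


def bounding_box(points):
    """Return (west, south, east, north) for (lon, lat) pairs.

    Assumes the points do not straddle the antimeridian (180 degrees).
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return min(lons), min(lats), max(lons), max(lats)


def file_summary(directory):
    """Return (number of files, total size in MB) under a directory."""
    files = [p for p in Path(directory).rglob("*") if p.is_file()]
    total_mb = sum(p.stat().st_size for p in files) / 1_000_000
    return len(files), total_mb


# Made-up sampling locations and dates, for illustration only.
points = [(-84.31, 35.93), (-76.94, 38.99), (-80.00, 37.50)]
print(bounding_box(points))  # (-84.31, 35.93, -76.94, 38.99)
print(date_range(["2014-06-01", "2013-09-15", "2014-01-20"]))  # ('2013-09-15', '2014-06-01')
```

Calling `file_summary` on the delivery directory gives the file count and disk volume asked for under Data Preparation and Delivery.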
Data Center: Stewardship and Archive Functions
• Ingest
  • perform QA checks
  • compile project-provided metadata
  • generate additional metadata
  • convert to archival file formats
• Metadata / Documentation
  • prepare final metadata record and documentation
• Archive / Release
  • generate citation and DOI (digital object identifier)
• Exploration and Distribution
  • provide tools to explore, access, and extract data
• Post-Project Data Support
  • provide long-term secure archiving
  • serve as a buffer between end users and PIs
  • provide usage statistics
• Stewardship
  • security, disaster recovery
  • migration to new computer systems
Workshop Goal
Provide fundamental data management practices that investigators should perform during the course of data collection.
• To improve the usability of data sets for:
  • You
  • Collaborators
  • People outside your project
• By following the practices taught in this workshop, your data will be
  • less prone to error,
  • more efficiently structured for analysis, and
  • more readily understandable for any future research.
Please take the Workshop Survey • https://www.surveymonkey.com/r/72MJWGF
Thank you!
Workshop Sponsors