The Data Lifecycle Flow: For Me, This Time
Erica Johns, Bob Dattore, & Sam Levis

Future Lifecycle Components (figure 2)

12. PRESERVATION
• Format Conversion
• Historical data: still in usable format, but a less updated version
• Active Migration System: every 2-3 years to keep data usable
• Often conversion or translation of content to keep file readable

13. ACCESS, USE, AND REUSE
• Free Access through RDA
• Must Register: acts as a form of tracking data use

Steps for Curating This Data (figure 2)

1. RESEARCH
• Find Free Level 4 Ameriflux Data + Variables Scientist Wants
• http://public.ornl.gov/ameriflux/index.html

2. DATA APPRAISAL
• Pervasive throughout Lifecycle
• Are the contents the scientist asked for within the data?

3. DATA ACQUISITION
• Sign User Agreement with ORNL
• Download CSV files for Ameriflux Sites
• Should have asked ORNL for Permission for Ingestion here

4. DATA ACCUMULATION
• Compile CSV files into multi-year lists of Hourly, Daily, Weekly, Monthly

5. DATA APPRAISAL
• Decide against Weekly data: week not defined as 7 days
• Decide to gap fill missing years with -9999
• Decide on ascending year order

6. DATA REFORMAT
• CSV to Excel
• Fill in missing data with -9999
• Excel to CSV
• CSV to NetCDF
• Computer program written in C++ to do the conversion
• Standardized CF conventions added when producing the NetCDF to increase usability

7. DATA APPRAISAL
• Does the NetCDF work with our scientist's script?
• This is when I asked ORNL for Permission for Ingestion
• Review for Metadata Creation

8. METADATA CREATION
• First step towards Data Ingestion into Archive
• Helps User find Data
• Some documentation visible with dataset, some used only for faceted browsing in RDA

9. DATA INGESTION
• Through the dsarch program, data files are ingested into the archive and information about the files is recorded in a database called RDA DB, within RDAMS
• Requires scripting in the C shell language to ingest

10. ARCHIVE
• RDA = CISL Research Data Archive
• http://rda.ucar.edu/datasets/ds387.0/

11. DATA APPRAISAL: User Feedback
• Affects EVERY stage of Lifecycle
• Data Scientists depend on User Feedback to understand Data Integrity
• Users
  • Find Mistakes
  • Report Errors in Data
• Effects
  • Data manager checks error: is it in the original or the ingested data?
  • Data Reappraisal & Reacquisition
  • Update Data Accumulated, Reformat, Amend Metadata
  • Reingest & Rearchive

Figure 1: DataONE Idealized Data Lifecycle

Considerations within Data Lifecycle

DATA INTEGRITY
• From a curator's perspective, based largely on user feedback
• However, being familiar with the type of data and variables can allow the curator to notice if values are off
• Depends on the reliability of the data provider

ANCILLARY DATA
• Know the audience, but keep data accessible for interdisciplinary science
• Unless data is from NCAR, links are provided to ancillary data
• Located within Documentation tab if ReadMe file provided
• Problems with dataset reported in Documentation section
• Goal: anticipate user needs, not lead users with metadata

REPRODUCIBILITY
• Must be able to track original data requests
• Easiest to reproduce if derived product is computer based
• Disaster program ensures data exists in 2 geographic locations

Figure 2: Actual Lifecycle for Level 4 Ameriflux Dataset
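
The appraisal and reformat decisions in steps 5 and 6 (gap fill missing years with -9999, keep ascending year order) can be illustrated with a short sketch. This is not the C++ program mentioned in step 6; it is a minimal Python illustration under stated assumptions: one row per year, a `year` column, and the example value columns `NEE` and `GPP` are all hypothetical, as is the helper name `gap_fill_by_year`.

```python
import csv
import io

FILL_VALUE = -9999  # fill value chosen in steps 5 and 6 of the poster


def gap_fill_by_year(rows, value_columns):
    """Return one row per year in ascending order, filling gaps with FILL_VALUE.

    `rows` is a list of dicts with a "year" key (hypothetical layout).
    Years absent from the input and blank cells both receive FILL_VALUE.
    """
    by_year = {int(r["year"]): r for r in rows}
    first, last = min(by_year), max(by_year)
    filled = []
    for year in range(first, last + 1):  # ascending year order
        src = by_year.get(year, {})
        out = {"year": year}
        for col in value_columns:
            raw = src.get(col)
            if raw is None or str(raw).strip() == "":
                out[col] = FILL_VALUE  # missing year or blank cell
            else:
                out[col] = float(raw)
        filled.append(out)
    return filled


# Hypothetical sample: 2002 is missing entirely, 2001 has a blank GPP cell,
# and the rows arrive out of order.
sample = "year,NEE,GPP\n2003,-1.2,3.4\n2001,-0.8,\n"
result = gap_fill_by_year(list(csv.DictReader(io.StringIO(sample))), ["NEE", "GPP"])
for row in result:
    print(row)
```

Using a single sentinel for both missing years and blank cells keeps every yearly block the same shape, which simplifies a later fixed-layout conversion such as CSV to NetCDF.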