Better Data, Better Science!
[ Better Science through Better Data Management ]
Todd D. O’Brien
NOAA – NMFS – COPEPOD
“BETTER DATA” is …
• Easily Accessible
• Well Documented
• Integrated / Interlinked
• The Best Quality possible
WHY QC?
• To find errors in the data …
• To detect instrument failure or sampling problems
• To detect phenomena of scientific interest
    • Natural physical or biological events
    • Something “new”
WHY QC?
• To find errors in the data … that were not present in the original data?!
• Data Pathway errors
    • human error
    • computer error
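
One way to catch data pathway errors is to compare the values before and after a transfer or reformatting step. The following is a minimal sketch, assuming both versions are available as equal-ordered lists of numbers; the function name `compare_versions` and the `tolerance` value are illustrative assumptions, not part of the original talk.

```python
def compare_versions(original, processed, tolerance=1e-9):
    """Report differences between two versions of the same numeric column.

    Returns a list of tuples: ("length", n_original, n_processed) if the
    record counts differ, and (index, original_value, processed_value)
    for every pair of values that differs by more than `tolerance`.
    """
    differences = []
    if len(original) != len(processed):
        differences.append(("length", len(original), len(processed)))
    # zip stops at the shorter list, so only shared positions are compared
    for i, (a, b) in enumerate(zip(original, processed)):
        if abs(a - b) > tolerance:
            differences.append((i, a, b))
    return differences


# Example: a decimal point lost during reformatting (2.5 became 25.0)
print(compare_versions([1.0, 2.5, 3.0], [1.0, 25.0, 3.0]))
```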
WHAT TO QC?
• Individual values (the measurements)?
• Profile of multiple values?
• Cruise of multiple profiles?
• Project of multiple cruises?
• Region or Ocean of multiple Projects?
• Entire World of multiple Regions?
QC OF THE “WHAT & HOW”
• Need to first understand the methods, variables, and units of the data before trying to QC the data
    • Are all labels clear and unambiguous?
    • Are methods provided (or a reference)?
    • What are the value units?
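
The label / method / units questions above can be turned into a simple completeness check. This is only a sketch: the field names in `REQUIRED_FIELDS` and the dictionary layout are hypothetical, chosen for illustration rather than taken from any metadata standard.

```python
# Hypothetical metadata fields that every variable is expected to carry
REQUIRED_FIELDS = ("label", "units", "method")


def missing_metadata(variables):
    """Return {variable_name: [missing fields]} for incomplete entries.

    `variables` maps a variable name to a dict of its metadata.
    A field counts as missing if it is absent, None, or blank.
    """
    report = {}
    for name, meta in variables.items():
        missing = []
        for field in REQUIRED_FIELDS:
            value = meta.get(field)
            if value is None or not str(value).strip():
                missing.append(field)
        if missing:
            report[name] = missing
    return report


variables = {
    "TEMP": {"label": "Temperature", "units": "degC", "method": "CTD"},
    "CHL": {"label": "Chlorophyll a", "units": ""},
}
print(missing_metadata(variables))  # {'CHL': ['units', 'method']}
```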
QC OF THE “WHEN & WHERE”
• Primary Data:
    • First, check the master ship record
    • Then check PI files
• Simple Range Checks
    • Time (0-23? 1-24?)
    • What is the time zone?
    • Lat within +/- 90, Lon within +/- 180
    • Are hemisphere signs present (E/W) or described?
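
The simple range checks above can be sketched as follows. This assumes signed decimal degrees (south and west negative) and an hour field whose convention (0-23 or 1-24) is known from the documentation; the function name and the `hour_convention` parameter are illustrative assumptions.

```python
def check_position_time(lat, lon, hour, hour_convention="0-23"):
    """Return a list of problems found; an empty list means all checks passed.

    lat, lon: signed decimal degrees (assumed: S and W are negative).
    hour_convention: "0-23" or "1-24", whichever the data provider uses.
    """
    problems = []
    if not -90.0 <= lat <= 90.0:
        problems.append(f"latitude {lat} outside -90..+90")
    if not -180.0 <= lon <= 180.0:
        problems.append(f"longitude {lon} outside -180..+180")
    lo, hi = (0, 23) if hour_convention == "0-23" else (1, 24)
    if not lo <= hour <= hi:
        problems.append(f"hour {hour} outside {lo}..{hi}")
    return problems


print(check_position_time(45.0, -70.0, 12))   # []
print(check_position_time(95.0, 200.0, 24))   # three problems reported
```

A range check cannot tell whether a missing hemisphere sign turned 70 W into 70 E, since both values are in range; that kind of error usually shows up only on a map or in a speed check.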
QC OF THE “WHEN & WHERE”
• Map the Cruise Track
    • sorted by station sequence
    • sorted by sampling time
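
Alongside the map, the two sort orders can be compared directly: if stations sorted by sequence number are not also in time order, either a station number or a date/time is suspect. A minimal sketch, assuming each station is a `(station_number, datetime)` pair (a hypothetical layout):

```python
from datetime import datetime


def out_of_order_stations(stations):
    """Return station numbers sampled earlier than the preceding station.

    `stations` is a list of (station_number, sampling_datetime) pairs.
    Stations are sorted by station number, then each is compared with
    the one before it.
    """
    ordered = sorted(stations, key=lambda s: s[0])
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] < prev[1]:
            flagged.append(cur[0])
    return flagged


stations = [
    (1, datetime(2010, 1, 1, 0)),
    (2, datetime(2010, 1, 1, 6)),
    (3, datetime(2010, 1, 1, 3)),   # earlier than station 2
    (4, datetime(2010, 1, 1, 12)),
]
print(out_of_order_stations(stations))  # [3]
```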
QC OF THE “WHEN & WHERE”
• Calculate ship speed (distance/time) between stations
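
The speed check can be sketched with a great-circle (haversine) distance. This assumes a spherical Earth with a mean radius of 6371 km and times expressed in hours since a common reference; the function names are illustrative, and what counts as an implausible speed depends on the vessel.

```python
import math

EARTH_RADIUS_KM = 6371.0        # mean Earth radius (spherical approximation)
KM_PER_NAUTICAL_MILE = 1.852


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_knots(lat1, lon1, t1_hours, lat2, lon2, t2_hours):
    """Average speed in knots between two stations.

    Returns None when the elapsed time is zero or negative, which is
    itself a sign of a date/time problem.
    """
    elapsed = t2_hours - t1_hours
    if elapsed <= 0:
        return None
    distance_nm = haversine_km(lat1, lon1, lat2, lon2) / KM_PER_NAUTICAL_MILE
    return distance_nm / elapsed


# One degree of latitude in six hours is about 10 knots: plausible.
print(round(speed_knots(0.0, 0.0, 0.0, 1.0, 0.0, 6.0), 1))
# A dropped hemisphere sign (70 W recorded as 70 E) gives an impossible speed.
print(speed_knots(40.0, -70.0, 0.0, 40.0, 70.0, 6.0) > 100)
```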
QC OF THE “HOW MUCH”
• First, look at the background environment
    • Check for depth inversions
    • Check for density inversions
    • Look at T vs. S plot
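
Depth and density inversion checks both ask the same question: does a value that should increase down the profile ever decrease? A minimal sketch, assuming the profile is ordered from the surface down and that density has already been computed from temperature and salinity (no equation of state is implemented here); the `tolerance` values are illustrative assumptions.

```python
def inversions(values, tolerance=0.0):
    """Return indices i where values[i] drops below values[i-1] by more than `tolerance`.

    Intended for quantities expected to increase with position in the
    profile, such as depth or density.
    """
    return [
        i for i in range(1, len(values))
        if values[i] < values[i - 1] - tolerance
    ]


depths = [0, 10, 20, 15, 30]                   # metres
density = [1024.0, 1024.5, 1024.4, 1025.0]     # kg/m^3, precomputed

print(inversions(depths))                      # [3]
print(inversions(density, tolerance=0.05))     # [2]
```

A small tolerance on density avoids flagging differences that are within measurement noise.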
QC OF THE “HOW MUCH”
• Look at the variable vs. depth
QC OF THE “HOW MUCH”
• Check against basic value ranges
• Check for excessive gradients (spikes) between values at adjacent depths
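
The range and gradient checks can be sketched as two small functions. The limits used in the example (a temperature range of -2.5 to 32 and a maximum gradient of 0.5 per metre) are placeholder assumptions; real limits depend on the variable, region, and depth.

```python
def range_flags(values, vmin, vmax):
    """Return indices of values outside the inclusive range [vmin, vmax]."""
    return [i for i, v in enumerate(values) if not vmin <= v <= vmax]


def gradient_flags(depths, values, max_gradient):
    """Return indices i where |value change| / depth change from level i-1 exceeds max_gradient.

    Pairs with a zero or negative depth step are skipped; those are
    depth inversions and belong to a separate check.
    """
    flagged = []
    for i in range(1, len(values)):
        dz = depths[i] - depths[i - 1]
        if dz <= 0:
            continue
        if abs(values[i] - values[i - 1]) / dz > max_gradient:
            flagged.append(i)
    return flagged


depths = [0, 10, 20, 30, 40]
temps = [20.0, 19.5, 35.0, 18.5, 18.0]   # 35.0 is a spike

print(range_flags(temps, -2.5, 32.0))        # [2]
print(gradient_flags(depths, temps, 0.5))    # [2, 3]
```

A single spike produces two steep gradients, one into it and one out of it, so the gradient check flags both the spike and the level after it; combining it with the range check helps identify which value is the bad one.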
Expert / Specialist Data Centers
• Can provide guidance on
    • Metadata (standards, minimum requirements)
    • Data Formats (format suggestions / review)
    • Tools and Methods
• May have advanced visualization or QC methods available for your data.
Expert / Specialist Data Centers (just a few examples)
• CCHDO - CLIVAR and Carbon Hydrographic Data Office
• BCO-DMO - Biological and Chemical Oceanography Data Management Office
• BODC - British Oceanographic Data Centre
• COPEPOD - Coastal & Oceanic Plankton Ecology, Production & Observation Database
Some Conclusions
• Each additional layer of QC and examination may highlight issues that were previously undetected.
• Each instance of transfer or reformatting the data has a chance of introducing new errors (or data loss).
• The comprehensiveness of the co-stored metadata will determine the extent to which the data are still usable/understandable 10+ years after the project.
“BETTER DATA” is …
• Easily Accessible
• Well Documented
• Integrated / Interlinked
• The Best Quality possible