
Better Data, Better Science! [ Better Science through Better Data Management ]

Todd D. O’Brien, NOAA – NMFS – COPEPOD

Presentation Transcript


  1. Better Data, Better Science! [ Better Science through Better Data Management ] Todd D. O’Brien, NOAA – NMFS – COPEPOD

  2. “BETTER DATA” is … • Easily Accessible • Well Documented • Integrated / Interlinked • The Best Quality possible

  3. Oops! (When Data Management Fails)

  4. “BETTER DATA” is … • Easily Accessible • Well Documented • Integrated / Interlinked • The Best Quality possible

  5. “BETTER DATA” is … • Easily Accessible • Well Documented • Integrated / Interlinked • The Best Quality possible

  6. WHY QC? • To find errors in the data …

  7. WHY QC? • To find errors in the data … • To detect instrument failure or sampling problems

  8. WHY QC? • To find errors in the data … • To detect instrument failure or sampling problems • To detect phenomena of scientific interest • Natural physical or biological events • Something “new”

  9. WHY QC? • To find errors in the data … that were not present in the original data ?!

  10. WHY QC? • To find errors in the data … that were not present in the original data ?! • Data Pathway errors • human error • computer error

  11. WHAT TO QC? • Individual values (the measurements)? • Profile of multiple values? • Cruise of multiple profiles? • Project of multiple cruises? • Region or Ocean of multiple Projects? • Entire World of multiple Regions?

  12. What software, tools, and skills are available?

  13. What software, tools, and skills are available?

  14. What software, tools, and skills are available?

  15. What software, tools, and skills are available?

  16. What software, tools, and skills are available?

  17. What software, tools, and skills are available?

  18. What software, tools, and skills are available?

  19. Let’s get started …

  20. QC OF THE “WHAT & HOW”

  21. QC OF THE “WHAT & HOW” • Need to first understand the methods, variables, and units of the data before trying to QC the data

  22. QC OF THE “WHAT & HOW” • Need to first understand the methods, variables, and units of the data before trying to QC the data • Are all labels clear and unambiguous? • Are methods provided (or a reference)? • What are the value units?

  23. QC OF THE “WHEN & WHERE”

  24. QC OF THE “WHEN & WHERE” • Primary Data: • First, check the master ship record • Then check PI files

  25. QC OF THE “WHEN & WHERE” • Primary Data: • First, check the master ship record • Then check PI files • Simple Range Checks • Time (0-23? 1-24?) • What is the time zone? • Lat +/- 90, Lon +/- 180 • Are hemisphere signs present (E/W) or described?
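
The simple range checks on slide 25 can be sketched as follows. This is a minimal illustration, assuming each station record supplies an integer hour and decimal-degree latitude and longitude. The function name `range_check` and the `hour_convention` parameter are hypothetical and do not come from the presentation.

```python
def range_check(hour, lat, lon, hour_convention="0-23"):
    """Return a list of human-readable problems found in one station record.

    hour_convention is an assumed switch for the "0-23? 1-24?" question:
    the data provider's documentation must say which one applies.
    """
    problems = []

    # Pick the valid hour interval for the stated convention.
    lo, hi = (0, 23) if hour_convention == "0-23" else (1, 24)
    if not lo <= hour <= hi:
        problems.append(f"hour {hour} outside {lo}-{hi}")

    # Latitude must lie within +/- 90 degrees.
    if not -90.0 <= lat <= 90.0:
        problems.append(f"latitude {lat} outside +/-90")

    # Longitude must lie within +/- 180 degrees.
    if not -180.0 <= lon <= 180.0:
        problems.append(f"longitude {lon} outside +/-180")

    return problems


ok_record = range_check(12, 45.0, -70.0)
bad_record = range_check(24, 95.0, 200.0)
```

A check like this cannot detect a missing hemisphere sign or an undeclared time zone, because a wrong-hemisphere longitude is still inside the valid range. Those questions have to be answered from the metadata.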

  26. QC OF THE “WHEN & WHERE” • Map the Cruise Track • sorted by station sequence • sorted by sampling time
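
As a non-graphical companion to the two cruise-track maps on slide 26, the two sort orders can also be compared directly. This sketch assumes each station is a `(station_number, sampling_time)` pair. The function name `order_mismatches` is hypothetical, and the comparison is an illustration rather than a method described in the presentation.

```python
from datetime import datetime


def order_mismatches(stations):
    """Compare station order by sequence number with order by sampling time.

    stations: list of (station_number, datetime) pairs.
    Returns (position, station_by_sequence, station_by_time) for every
    position where the two orderings disagree.
    """
    by_sequence = [num for num, _ in sorted(stations, key=lambda s: s[0])]
    by_time = [num for num, _ in sorted(stations, key=lambda s: s[1])]
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(by_sequence, by_time))
        if a != b
    ]


cruise = [
    (1, datetime(2020, 1, 1, 8)),
    (2, datetime(2020, 1, 1, 14)),  # recorded later than station 3
    (3, datetime(2020, 1, 1, 11)),
    (4, datetime(2020, 1, 1, 17)),
]
mismatches = order_mismatches(cruise)
```

A disagreement does not say which field is wrong. It only marks the stations whose number or timestamp should be checked against the master ship record.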

  27. QC OF THE “WHEN & WHERE” • Calculate ship speed (distance/time) between stations
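
The ship-speed check on slide 27 can be sketched as follows. Assumptions that are not in the presentation: stations are `(time, lat, lon)` tuples in sampling order, distance is the great-circle (haversine) distance on a spherical Earth, and the 20-knot threshold is an arbitrary illustrative limit that should be set per vessel. The names `haversine_km` and `flag_fast_legs` are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0       # mean Earth radius (spherical approximation)
KM_PER_NAUTICAL_MILE = 1.852


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_fast_legs(stations, max_knots=20.0):
    """Return (index, speed_in_knots) for each leg faster than max_knots.

    stations: list of (datetime, lat, lon) in sampling order.
    The index is that of the station at the END of the suspicious leg.
    A leg with zero or negative elapsed time is reported with infinite speed.
    """
    flagged = []
    for i in range(1, len(stations)):
        t0, lat0, lon0 = stations[i - 1]
        t1, lat1, lon1 = stations[i]
        hours = (t1 - t0).total_seconds() / 3600.0
        if hours <= 0:
            flagged.append((i, float("inf")))
            continue
        knots = haversine_km(lat0, lon0, lat1, lon1) / KM_PER_NAUTICAL_MILE / hours
        if knots > max_knots:
            flagged.append((i, knots))
    return flagged


cruise = [
    (datetime(2020, 1, 1, 0), 40.0, -70.0),
    (datetime(2020, 1, 1, 6), 41.0, -70.0),   # about 60 nautical miles in 6 h
    (datetime(2020, 1, 1, 12), 41.5, 70.0),   # longitude sign dropped
]
suspect = flag_fast_legs(cruise)
```

An implausible speed usually points to a position or time error at one end of the leg, such as a dropped hemisphere sign or a time-zone mix-up, rather than to the ship itself.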

  28. QC OF THE “HOW MUCH”

  29. QC OF THE “HOW MUCH” • First, look at the background environment • Check for depth inversions • Check for density inversions • Look at T vs. S plot
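
The depth and density inversion checks on slide 29 can be sketched with one helper. This assumes a profile is stored surface-to-bottom as parallel lists and that density has already been computed from temperature and salinity. The function name `find_inversions` and the 0.03 kg/m^3 tolerance are illustrative assumptions, not values from the presentation.

```python
def find_inversions(values, tolerance=0.0):
    """Return indices where a value is smaller than the one above it.

    values: depths or densities ordered from the first to the last level.
    tolerance: decrease that is still accepted (e.g. instrument noise).
    """
    return [
        i
        for i in range(1, len(values))
        if values[i] < values[i - 1] - tolerance
    ]


depths = [0.0, 10.0, 20.0, 15.0, 30.0]           # metres
densities = [1025.00, 1025.50, 1025.49, 1026.00]  # kg/m^3

depth_inversions = find_inversions(depths)
density_inversions = find_inversions(densities, tolerance=0.03)
```

A depth inversion usually means levels were entered or sorted incorrectly. A density inversion larger than the tolerance suggests a bad temperature or salinity value at that level, which the T vs. S plot can help confirm.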

  30. QC OF THE “HOW MUCH” • Look at the variable vs. depth

  31. QC OF THE “HOW MUCH” • Check against basic value ranges

  32. QC OF THE “HOW MUCH” • Check against basic value ranges • Check for excessive gradients (spikes) between values at adjacent depths
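
The value-range and gradient checks on slide 32 can be sketched as follows. The function names, the temperature limits of -2.5 to 40 degrees C, and the 0.5 degrees C per metre gradient limit are illustrative assumptions. Real limits depend on the variable, region, and depth.

```python
def out_of_range(values, vmin, vmax):
    """Return indices of values outside the closed interval [vmin, vmax]."""
    return [i for i, v in enumerate(values) if not vmin <= v <= vmax]


def steep_gradients(depths, values, max_gradient):
    """Return indices i where |change in value / change in depth| between
    level i-1 and level i exceeds max_gradient.

    Pairs with identical depths are skipped because the gradient is
    undefined there; a depth check should catch those separately.
    """
    flagged = []
    for i in range(1, len(values)):
        dz = depths[i] - depths[i - 1]
        if dz == 0:
            continue
        if abs((values[i] - values[i - 1]) / dz) > max_gradient:
            flagged.append(i)
    return flagged


depths = [0.0, 10.0, 20.0, 30.0]   # metres
temps = [20.0, 19.5, 5.0, 18.5]    # degrees C, with a spike at 20 m

range_hits = out_of_range(temps, -2.5, 40.0)
spike_hits = steep_gradients(depths, temps, max_gradient=0.5)
```

The spike at 20 m passes the range check but is caught by the gradient check, which is why the two are applied together. A single bad value flags both the pair above it and the pair below it, so two consecutive flagged indices often point to one suspect level.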

  33. QC OF THE “HOW MUCH”

  34. Expert / Specialist Data Centers

  35. Expert / Specialist Data Centers • Can provide guidance on • Metadata (standards, minimum requirements) • Data Formats (format suggestions / review) • Tools and Methods

  36. Expert / Specialist Data Centers • Can provide guidance on • Metadata (standards, minimum requirements) • Data Formats (format suggestions / review) • Tools and Methods • May have advanced visualization or QC methods available for your data.

  37. Empirical Comparisons with Historical Observations (ECHO)

  38. Expert / Specialist Data Centers (just a few examples) • CCHDO - CLIVAR and Carbon Hydrographic Data Office • BCO-DMO - Biological and Chemical Oceanography Data Management Office • BODC - British Oceanographic Data Centre • COPEPOD - Coastal & Oceanic Plankton Ecology, Production & Observation Database

  39. The Conclusions

  40. Some Conclusions • Each additional layer of QC and examination may highlight issues that were previously undetected.

  41. Some Conclusions • Each additional layer of QC and examination may highlight issues that were previously undetected. • Each instance of transfer or reformatting the data has a chance of introducing new errors (or data loss).

  42. Some Conclusions • Each additional layer of QC and examination may highlight issues that were previously undetected. • Each instance of transfer or reformatting the data has a chance of introducing new errors (or data loss). • The comprehensiveness of the co-stored metadata will determine the extent to which the data are still usable/understandable 10+ years after the project.

  43. “BETTER DATA” is … • Easily Accessible • Well Documented • Integrated / Interlinked • The Best Quality possible
