Data Pipeline & Workflow

Data Pipeline & Workflow. Ed Chapman OOI Chief Systems Engineer. Steve Gaul OOI Systems Engineer/Architect. Goal. Address Areas for Recommendations #1 “Data Pipeline and workflow” and #5 “Data Products and Data Product Algorithms” Specific topics:


Presentation Transcript


  1. Data Pipeline & Workflow
     Ed Chapman, OOI Chief Systems Engineer
     Steve Gaul, OOI Systems Engineer/Architect

  2. Goal
     Address Areas for Recommendations #1 “Data Pipeline and workflow” and #5 “Data Products and Data Product Algorithms”
     Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc.), data versioning, data schema/storage, and data product delivery”

  3. By the Numbers
     • Number of deployed platforms = 89***
     • Number of deployed instruments = 814***
     • Number of instrument types = 47
     • Number of instrument models = 77
     • Number of uncabled dataset agent drivers = 71
     • Number of cabled instrument agent drivers = 42
     • Number of algorithms = 89 (52 L1 and 37 L2)
     • Number of data product types = 203
     • Number of unique products = 3928 (L0, L1, L2)
     • Number of unique L0 products = 1640
     • Number of unique L1 products = 1533
     • Number of unique L2 products = 755
     *** As of ECR 1300-00419
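The totals on this slide can be cross-checked against their own breakdowns. A minimal sketch, using only the counts quoted above (the variable names are illustrative):

```python
# Counts copied from the "By the Numbers" slide.
unique_products = {"L0": 1640, "L1": 1533, "L2": 755}
algorithms = {"L1": 52, "L2": 37}

total_products = sum(unique_products.values())
total_algorithms = sum(algorithms.values())

print(total_products)    # 3928, matching "Number of unique products"
print(total_algorithms)  # 89, matching "Number of algorithms"
```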

  4. Sense and Ingest Data
     • Several classes of data
       • Instrument samples/profiles
       • Platform engineering
       • Specialized data streams: video, tier 1
       • Physical samples
       • Logs, photos
       • Metadata: calibration sheets, as-built lists
     • Several data acquisition paths
       • Live streaming data to shore processing (RSN)
       • Remote automated collection; data telemetered to shore (CG/EA)
         • Generally sub-sampled or otherwise simplified
       • Post-recovery data collection from recovered platforms
       • Physical samples processed post cruise/recovery
       • Manual collection of logs, photos, etc.; associated via metadata

  5. Physical Flow: Sensing and Ingest

  6. System Hardware Summary

  7. Generic Data Flows

  8. Functional Flow: Sensing and Ingest

  9. (Diagram labels: Permanent storage; Instrument; Driver and Agent; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; QC algorithms (range, spike, stuck, gradient, trend, combined); Lookup Tables; Human in the loop)
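The diagram names the QC algorithms without describing them. As a rough illustration of what a range test and a stuck-value test typically do, here is a generic sketch. It is not the OOI implementation: the function names, the parameters, and the flag convention (1 = pass, 0 = fail) are all assumptions.

```python
def range_flags(values, lower, upper):
    """Flag 1 where lower <= value <= upper, else 0 (assumed convention)."""
    return [1 if lower <= v <= upper else 0 for v in values]


def stuck_flags(values, resolution, run_length):
    """Flag 0 for every sample in a run of at least run_length consecutive
    values that stay within resolution of the run's first value; 1 otherwise."""
    flags = [1] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the data or when a value moves away.
        if i == len(values) or abs(values[i] - values[start]) >= resolution:
            if i - start >= run_length:
                for j in range(start, i):
                    flags[j] = 0
            start = i
    return flags


# Hypothetical samples and thresholds.
samples = [1.0, 2.0, 2.0, 2.0, 3.0, 11.0]
in_range = range_flags(samples, lower=0.0, upper=10.0)
not_stuck = stuck_flags(samples, resolution=0.01, run_length=3)
```

Here `in_range` fails only the last sample, and `not_stuck` fails the three repeated readings in the middle.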

  10. Ingest for Instruments, Including Tier 1 and HD Video (diagram labels: Permanent storage; Driver and Agent; Instrument; User)

  11. Ingest for Engineering Data (diagram labels: Permanent storage; Driver and Agent; Engineering System; User)

  12. Ingest for other items
     • Cruise documents
     • Algorithms

  13. Calibration

  14. Uncalibrated Raw Instrument Data (diagram labels: L0; L0; Instrument; Driver and Agent; Permanent storage; User)

  15. Internally Calibrated Raw Instrument Data (diagram labels: L1a; Permanent storage; Instrument; Driver and Agent; Calibration Values; User)

  16. Primary Calibration of Uncalibrated Data (diagram labels: L0; Permanent storage; Instrument; Driver and Agent; L1a; Data Product Algorithm; Calibration Values; User)

  17. Secondary Calibration (diagram labels: L0; Permanent storage; Instrument; Driver and Agent; L1a; Data Product Algorithm; Calibration Values; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; L1b (Post Deployment))

  18. Secondary Calibration (diagram labels: L0; Permanent storage; Instrument; Driver and Agent; L1a; Data Product Algorithm; Calibration Values; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b (PD); User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b (PR); L1b (Intrp); Interpolation)
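The secondary calibration chain on the slide above (L1a, then POLYVAL with post-deployment and post-recovery values, then interpolation) can be sketched as follows. This is an illustration under stated assumptions, not the OOI algorithm: it assumes POLYVAL evaluates a polynomial in the L1a value with coefficients ordered highest power first, and that "Interpolation" means a linear-in-time blend between the two corrected values. All function names, parameters, and numbers are hypothetical.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial at x with Horner's rule.

    coeffs are ordered highest power first (assumed convention).
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def secondary_calibration(l1a, t, t_deploy, t_recover, pd_coeffs, pr_coeffs):
    """Return (L1b_PD, L1b_PR, L1b_Intrp) for one L1a sample taken at time t."""
    l1b_pd = polyval(pd_coeffs, l1a)
    l1b_pr = polyval(pr_coeffs, l1a)
    # Assumed: the weight moves linearly from 0 at deployment to 1 at recovery.
    w = (t - t_deploy) / (t_recover - t_deploy)
    w = min(1.0, max(0.0, w))
    l1b_intrp = (1.0 - w) * l1b_pd + w * l1b_pr
    return l1b_pd, l1b_pr, l1b_intrp


# Hypothetical values: identity correction after deployment, a 2% gain and
# 0.1 offset after recovery, and a sample taken halfway through deployment.
l1b_pd, l1b_pr, l1b_intrp = secondary_calibration(
    l1a=10.0, t=50.0, t_deploy=0.0, t_recover=100.0,
    pd_coeffs=[1.0, 0.0], pr_coeffs=[1.02, 0.1],
)
```

With these inputs the interpolated value falls midway between the post-deployment and post-recovery results, which is the behavior the linear-blend assumption implies.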

  19. Calibration actions
     • PS or Marine Operator creates Primary Calibration Values
     • PS or Marine Operator creates Secondary Post-Deployment Calibration Values
     • PS or Marine Operator creates Secondary Post-Recovery Calibration Values
     • Values are uploaded through the UI as CSV files
     Calibration Values are associated with a specific instrument for a specific period of time.
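To make the association between calibration values, instrument, and time period concrete, here is a minimal sketch of parsing an uploaded CSV and looking up the values that apply to a sample. The slides do not give the file layout, so the column names, the `kind` labels, the semicolon-separated coefficient field, and the instrument ID are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationRecord:
    instrument_id: str
    kind: str            # "primary", "post_deployment", or "post_recovery"
    start: str           # ISO 8601 UTC; same-format strings sort by time
    end: str
    coefficients: tuple


def parse_calibration_csv(text):
    """Parse a calibration upload. The column names are hypothetical."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        coeffs = tuple(float(c) for c in row["coefficients"].split(";"))
        records.append(CalibrationRecord(
            row["instrument_id"], row["kind"], row["start"], row["end"], coeffs))
    return records


def find_calibration(records, instrument_id, kind, timestamp):
    """Return the record whose period covers timestamp, or None."""
    for rec in records:
        if (rec.instrument_id == instrument_id and rec.kind == kind
                and rec.start <= timestamp < rec.end):
            return rec
    return None


upload = """instrument_id,kind,start,end,coefficients
CTD-0001,primary,2014-01-01T00:00:00Z,2014-07-01T00:00:00Z,1.0;0.0
CTD-0001,primary,2014-07-01T00:00:00Z,2015-01-01T00:00:00Z,1.01;0.05
"""
records = parse_calibration_csv(upload)
match = find_calibration(records, "CTD-0001", "primary", "2014-08-15T12:00:00Z")
```

Treating each period as half-open (`start <= t < end`) keeps adjacent periods from both claiming a sample that falls exactly on the boundary.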

  20. Calibration Updates
     • If new values are uploaded for any of the three calibration sets, the new values overwrite the prior values.
     • The assumption is that new values are uploaded only to correct a mistake in the old ones. To keep errors from propagating, the old values are deleted.
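The overwrite behavior described above can be sketched as a store that keeps exactly one set of values per key. The class, its method names, and the choice of key (instrument, calibration kind, period) are assumptions made for illustration.

```python
class CalibrationStore:
    """Keeps at most one set of values per (instrument, kind, period)."""

    def __init__(self):
        self._values = {}

    def upload(self, instrument_id, kind, period, coefficients):
        """Store new values; return True if prior values were replaced."""
        key = (instrument_id, kind, period)
        replaced = key in self._values
        # Overwrite: the prior values are discarded, not versioned.
        self._values[key] = tuple(coefficients)
        return replaced

    def get(self, instrument_id, kind, period):
        return self._values[(instrument_id, kind, period)]


store = CalibrationStore()
first = store.upload("CTD-0001", "primary", "2014-deployment", [1.0, 0.0])
second = store.upload("CTD-0001", "primary", "2014-deployment", [1.01, 0.05])
current = store.get("CTD-0001", "primary", "2014-deployment")
```

After the second upload only the corrected coefficients remain; nothing in the store can reproduce a product computed from the earlier values.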

  21. Ingest for “etc”
     Is there anything you want to know about?

  22. Versioning

  23. (Diagram labels: Permanent storage; Instrument; Driver and Agent; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; QC algorithms (range, spike, stuck, gradient, trend, combined); Lookup Tables; Human in the loop)

  24. Storage

  25. Storage
     Intent: for most instruments, all science and engineering data are retained in OOI storage for the life of the program. (External archiving will be covered in a later presentation.)
     Planned on an order of magnitude difference between:
     • Video camera (1)
     • Hydrophones (11), still cameras (10), seismometers (13)
     • Everything else (779)

  26. Data Volume Per Year 138 And Seismometer 60 HD Video Kept for 27

  27. Balancing intent & cost
     HD Video Camera
     • L2-SR-RQ-3402: “Buffering for not less than six months of all video imagery shall be provided”
     • NSF approved Data Use Policy (DCN 1102-00010)--

  28. Data Product Delivery

  29. (Diagram labels: Permanent storage; Instrument; Driver and Agent; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values; POLYVAL Algorithm; User; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; QC algorithms (range, spike, stuck, gradient, trend, combined); Lookup Tables; Human in the loop)

  30. Database L0 L0 L1 Data Product Algorithm L2 Data Product Algorithm Primary Calibration Function L1a L2b Secondary Calibration Functions L1b QC Algorithms QC Algorithms Human In The Loop Human In The Loop L1a L1b and QC flags L1c L0 L2c QC flags L2b GUI User

  31. Output Data Product Variables
     • Single L1 data product, with the following variables (i.e., columns in the time series):
       • <measurement>_L1a (e.g., Conductivity_L1a)
       • <measurement>_L1b_Post_Deployment_Cal
       • <measurement>_L1b_Post_Recovery_Cal
       • <measurement>_L1b_Interpolated
       • <measurement>_L1c
       • QC_Flag_GlobalRange
       • QC_Flag_LocalRange
       • <additional QC flags>
     • Single L2 data product, similar to above
     • Single “Parsed” (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and other variables.
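The naming pattern above is regular enough to generate. A small sketch that builds the column list for one L1 data product from a measurement name; the suffixes and QC flag names are taken from the slide, while the function and its `extra_qc_flags` parameter are illustrative.

```python
L1_SUFFIXES = [
    "L1a",
    "L1b_Post_Deployment_Cal",
    "L1b_Post_Recovery_Cal",
    "L1b_Interpolated",
    "L1c",
]
QC_FLAGS = ["QC_Flag_GlobalRange", "QC_Flag_LocalRange"]


def l1_columns(measurement, extra_qc_flags=()):
    """Column names for a single L1 product of the given measurement."""
    columns = [f"{measurement}_{suffix}" for suffix in L1_SUFFIXES]
    return columns + QC_FLAGS + list(extra_qc_flags)


conductivity_columns = l1_columns("Conductivity")
for name in conductivity_columns:
    print(name)
```

Each calibration stage stays available as its own column, so a user can compare, for example, `Conductivity_L1a` against `Conductivity_L1b_Interpolated` in the same time series.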

  32. Output Data Product Metadata
     • In the metadata (i.e., ‘Metadata’ link from ERDDAP page, AND metadata on Data Product facepage on OOINet UI):
       • Calibration coefficients (as a comma-separated list)
       • QC Look Up Table (as a URL, or possibly as values in a TBD format)
       • Data Product Algorithm (as a URL)
       • DPS for Data Product Algorithm (as a URL)
       • QC Algorithms (as URLs)
       • DPSs for QC Algorithms (as URLs)
       • POLYVAL Algorithm (as a URL)

  33. OOI is about getting data to the users!
     Must maintain a balance between data quality, data quantity, and budget.

  34. Questions?
     Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc.), data versioning, data schema/storage, and data product delivery”