Data Pipeline & Workflow
Ed Chapman, OOI Chief Systems Engineer
Steve Gaul, OOI Systems Engineer/Architect
Goal
• Address Areas for Recommendations #1 “Data Pipeline and workflow” and #5 “Data Products and Data Product Algorithms”
• Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc.), data versioning, data schema/storage, and data product delivery”
By the Numbers
• Number of Deployed Platforms = 89***
• Number of Deployed Instruments = 814***
• Number of Instrument Types = 47
• Number of Instrument Models = 77
• Number of Uncabled Dataset Agent Drivers = 71
• Number of Cabled Instrument Agent Drivers = 42
• Number of Algorithms = 89 (52 L1 and 37 L2)
• Number of Data Product Types = 203
• Number of Unique Products = 3928 (L0, L1, L2)
  • Number of Unique L0 Products = 1640
  • Number of Unique L1 Products = 1533
  • Number of Unique L2 Products = 755
*** As of ECR 1300-00419
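The product and algorithm counts above can be cross-checked with simple arithmetic. A minimal sketch; the numbers are copied from the slide and the variable names are illustrative only:

```python
# Counts copied from the "By the Numbers" slide.
unique_products = {"L0": 1640, "L1": 1533, "L2": 755}
algorithms = {"L1": 52, "L2": 37}

total_products = sum(unique_products.values())
total_algorithms = sum(algorithms.values())

print(total_products)    # 3928, matches "Number of Unique Products"
print(total_algorithms)  # 89, matches "Number of Algorithms"
```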
Sense and Ingest Data
• Several classes of data
  • Instrument samples/profiles
  • Platform engineering
  • Specialized data streams: video, tier 1
  • Physical samples
  • Logs, photos
  • Metadata: calibration sheets, as-built lists
• Several data acquisition paths
  • Live streaming data to shore processing (RSN)
  • Remote automated collection; data telemetered to shore (CG/EA)
    • Generally sub-sampled or otherwise simplified
  • Post-recovery data collection from recovered platforms
  • Physical samples processed post cruise/recovery
  • Manual collection of logs, photos, etc.; associated via metadata
[Diagram: data pipeline overview. Labels: Instrument; Driver and Agent; Permanent storage; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values; Secondary Post-Recovery calibration values; POLYVAL Algorithm; Interpolation; QC algorithms (range, spike, stuck, gradient, trend, combined); Lookup Tables; Human in the loop; User]
Ingest for Instruments (including Tier 1 and HD video)
[Diagram labels: Instrument; Driver and Agent; Permanent storage; User]
Ingest for Engineering Data
[Diagram labels: Engineering System; Driver and Agent; Permanent storage; User]
Ingest for Other Items
• Cruise documents
• Algorithms
L0: Uncalibrated Raw Instrument Data
[Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; User]
L1a: Internally Calibrated Raw Instrument Data
[Diagram labels: Instrument; Driver and Agent; Calibration Values; L1a; Permanent storage; User]
Primary Calibration of Uncalibrated Data
[Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; User]
Secondary Calibration: L1b (Post Deployment)
[Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b (Post Deployment); User]
Secondary Calibration: L1b (PD), L1b (PR), and L1b (Intrp)
[Diagram labels: Instrument; Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b(PD); Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b(PR); Interpolation; L1b(Intrp); User]
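The secondary calibration chain in the diagram (POLYVAL applied with post-deployment and post-recovery values, followed by interpolation) might be sketched as below. This is an illustration under stated assumptions, not the OOI implementation: the slides do not give the coefficient ordering (assumed highest power first, as in MATLAB and NumPy `polyval`), and the interpolation is assumed to be a linear blend in time between the two results. All function names and sample values are hypothetical.

```python
from typing import Sequence


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial at x by Horner's method.

    ASSUMPTION: coefficients are ordered highest power first.
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def secondary_calibration(l1a, t, t_deploy, t_recover, pd_coeffs, pr_coeffs):
    """Return (L1b_PD, L1b_PR, L1b_Intrp) for one L1a sample taken at time t.

    ASSUMPTION: the interpolated value is a linear blend in time between the
    post-deployment and post-recovery results.
    """
    l1b_pd = polyval(pd_coeffs, l1a)
    l1b_pr = polyval(pr_coeffs, l1a)
    # Fraction of the deployment elapsed at time t, clamped to [0, 1].
    w = (t - t_deploy) / (t_recover - t_deploy)
    w = min(1.0, max(0.0, w))
    l1b_intrp = (1.0 - w) * l1b_pd + w * l1b_pr
    return l1b_pd, l1b_pr, l1b_intrp


# Hypothetical values: identity correction after deployment, a 2% gain change
# plus a 0.5 offset after recovery, sample taken halfway through.
l1b_pd, l1b_pr, l1b_intrp = secondary_calibration(
    l1a=10.0, t=50.0, t_deploy=0.0, t_recover=100.0,
    pd_coeffs=[1.0, 0.0], pr_coeffs=[1.02, 0.5],
)
```

With these hypothetical inputs the interpolated value falls midway between the post-deployment and post-recovery results, because the sample sits at the midpoint of the deployment.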
Calibration Actions
• PS or Marine Operator creates Primary Calibration Values
• PS or Marine Operator creates Secondary Post-Deployment Calibration Values
• PS or Marine Operator creates Secondary Post-Recovery Calibration Values
• Values are uploaded through the UI as CSV files
• Calibration Values are associated with a specific instrument for a specific period of time
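Associating uploaded values with an instrument and a validity period could look like the sketch below. The slides say only that values are uploaded as CSV, so the column layout, the instrument ID, the `kind` labels, and both function names are hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV layout; the actual upload format is not given in the slides.
CSV_TEXT = """instrument_id,kind,start,end,coefficients
CTD-001,primary,2015-01-01,2015-07-01,1.0;0.0
CTD-001,post_deployment,2015-01-01,2015-07-01,1.01;0.2
"""


def load_calibrations(text):
    """Parse CSV rows into dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "instrument_id": row["instrument_id"],
            "kind": row["kind"],
            "start": datetime.fromisoformat(row["start"]),
            "end": datetime.fromisoformat(row["end"]),
            "coefficients": [float(c) for c in row["coefficients"].split(";")],
        })
    return rows


def find_calibration(rows, instrument_id, kind, when):
    """Return the coefficients valid for this instrument at `when`, or None."""
    for r in rows:
        if (r["instrument_id"] == instrument_id and r["kind"] == kind
                and r["start"] <= when < r["end"]):
            return r["coefficients"]
    return None


rows = load_calibrations(CSV_TEXT)
coeffs = find_calibration(rows, "CTD-001", "post_deployment", datetime(2015, 3, 1))
```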
Calibration Updates
• If new values are uploaded for any of the three calibration sets, the new values overwrite the prior values.
• The assumption is that new values will only be uploaded if there was a mistake in the old ones. To keep errors from propagating, the old values are deleted.
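The overwrite behavior described here can be illustrated with a small in-memory stand-in for the calibration table. The class, its methods, and the key structure are hypothetical; the slides do not describe how the table is actually stored.

```python
class CalibrationStore:
    """In-memory stand-in for the calibration table (hypothetical)."""

    def __init__(self):
        self._values = {}

    def upload(self, instrument_id, kind, period, coefficients):
        # Same key: the prior values are replaced and no history is kept.
        self._values[(instrument_id, kind, period)] = list(coefficients)

    def get(self, instrument_id, kind, period):
        return self._values.get((instrument_id, kind, period))


store = CalibrationStore()
period = ("2015-01-01", "2015-07-01")
store.upload("CTD-001", "primary", period, [1.0, 0.0])
store.upload("CTD-001", "primary", period, [1.0, 0.1])  # corrected values
current = store.get("CTD-001", "primary", period)
```

Because the key is unchanged, the store holds a single entry after both uploads, and only the corrected coefficients can be read back.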
Storage
• Intent: for most instruments, all science and engineering data is retained in OOI storage for the life of the program (external archiving will be covered in a later presentation).
• Planned on an order-of-magnitude difference between:
  • Video camera (1)
  • Hydrophones (11), still cameras (10), seismometers (13)
  • Everything else (779)
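The instrument counts in the storage plan add up to the 814 deployed instruments listed under By the Numbers. A minimal check, with the counts copied from the slide; treating the four named classes as one group is an assumption made here for illustration:

```python
# Instrument counts per class, copied from the Storage slide.
instrument_counts = {
    "video camera": 1,
    "hydrophones": 11,
    "still cameras": 10,
    "seismometers": 13,
    "everything else": 779,
}

total = sum(instrument_counts.values())
# ASSUMPTION: the four named classes are grouped together for this check.
named_classes = total - instrument_counts["everything else"]

print(total)          # 814
print(named_classes)  # 35
```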
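Data Volume Per Year
[Chart; only fragments of its labels survive: “138”, “And Seismometer”, “60”, “HD Video”, “Kept for”, “27”. Units and series names are not recoverable.]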
Balancing Intent & Cost
HD Video Camera
• L2-SR-RQ-3402: “Buffering for not less than six months of all video imagery shall be provided”
• NSF approved Data Use Policy (DCN 1102-00010)--
[Diagram: database data flow. Labels: Database; L0; L1 Data Product Algorithm; L2 Data Product Algorithm; Primary Calibration Function; Secondary Calibration Functions; QC Algorithms; Human In The Loop; L1a; L1b and QC flags; L1c; L2b; L2c; QC flags; GUI; User]
Output Data Product Variables
• Single L1 data product, with the following variables (i.e., columns in the time series):
  • <measurement>_L1a (e.g., Conductivity_L1a)
  • <measurement>_L1b_Post_Deployment_Cal
  • <measurement>_L1b_Post_Recovery_Cal
  • <measurement>_L1b_Interpolated
  • <measurement>_L1c
  • QC_Flag_GlobalRange
  • QC_Flag_LocalRange
  • <additional QC flags>
• Single L2 data product, similar to above
• Single “Parsed” (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and other variables.
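The naming pattern for the L1 variables, together with one of the listed QC flags, can be sketched as follows. The variable names come from the slide; the function names, the range limits, and the flag encoding (1 = pass, 0 = fail) are assumptions, since the slides do not define flag values.

```python
def l1_variable_names(measurement, qc_tests=("GlobalRange", "LocalRange")):
    """Build the column names listed on the slide for one L1 measurement."""
    names = [
        f"{measurement}_L1a",
        f"{measurement}_L1b_Post_Deployment_Cal",
        f"{measurement}_L1b_Post_Recovery_Cal",
        f"{measurement}_L1b_Interpolated",
        f"{measurement}_L1c",
    ]
    names += [f"QC_Flag_{test}" for test in qc_tests]
    return names


def global_range_flags(values, lo, hi):
    """Flag each sample: 1 if lo <= value <= hi, else 0.

    ASSUMPTION: 1 = pass, 0 = fail.
    """
    return [1 if lo <= v <= hi else 0 for v in values]


columns = l1_variable_names("Conductivity")
# Hypothetical limits and samples, for illustration only.
flags = global_range_flags([3.2, 9.9, -1.0], lo=0.0, hi=9.0)
```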
Output Data Product Metadata
• In the metadata (i.e., the ‘Metadata’ link from the ERDDAP page, and the metadata on the Data Product facepage in the OOINet UI):
  • Calibration coefficients (as a comma-separated list)
  • QC Look Up Table (as a URL, or possibly as values in a TBD format)
  • Data Product Algorithm (as a URL)
  • DPS for Data Product Algorithm (as a URL)
  • QC Algorithms (as URLs)
  • DPSs for QC Algorithms (as URLs)
  • POLYVAL Algorithm (as a URL)
OOI is about getting data to the users! Must maintain a balance between data quality, data quantity, and budget.
Questions?
Specific topics: “Sensing, ingest (all data types – instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc.), data versioning, data schema/storage, and data product delivery”