This presentation discusses when hard-wiring an oceanographic observing system is efficient, why a common language becomes necessary as the number of assets and derived products increases, and how that common foundation supports quality control and data discovery.
Systems Oceanography: Observing System Design
Why not hard-wire the system? • Efficiency of interface management • Hard-wire when component number small, connections well defined & static (connections could go as N!) • Common ‘language’ necessary as number of assets and derived products increases. • Stable foundation for derived data processes • Allows wider participation for folks working on software elements – e.g. control, decision aids, QC, derived products, etc. • Ability to work across data sets • Critical for QC • Search functions enabled • Enables discovery
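The scaling argument on this slide can be illustrated with a rough count. The sketch below is only an assumption-laden illustration: it counts one hard-wired interface per pair of components, which is a conservative estimate (the slide notes the growth could be as bad as N!), and it assumes a common 'language' needs one adapter per component. The function names are hypothetical.

```python
from math import comb

def pairwise_interfaces(n: int) -> int:
    """Hard-wired case: one dedicated interface per pair of components."""
    return comb(n, 2)

def common_language_adapters(n: int) -> int:
    """Common-language case: each component needs one adapter to the shared format."""
    return n

# Compare the two counts as the system grows.
for n in (3, 10, 30):
    print(f"{n} components: {pairwise_interfaces(n)} hard-wired interfaces, "
          f"{common_language_adapters(n)} adapters")
```

For a handful of components the two counts are comparable, which matches the slide's point that hard-wiring is reasonable when the component number is small and the connections are static.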
System Research Issues • Platforms/sensor development • Opportunity cost for comms • QC: an expert-only task, automated at multiple levels • Performance metrics for observation systems • Assimilation tools for all observations; methods to mix and match; understanding of consequence • Relation of performance to component systems? Need to build up this area… • Where the rubber meets the road: leads to domain-specific performance requirements • How do we compare sampling strategies? • Deployment strategy
[Diagram: Observation Elements 1, 2, 3; Quality control (level 1); Observation Skill Assessment; Observation Product QC; Assimilation into Models 1, 2, 3; Archive; Skill Assessment for each model; Ensemble Analysis; Observation Sensitivity Analysis; Nowcast/Forecast Products; Skill Assessment; QC]
Good News: • Observations were generally assimilated into real-time model forecasts within 24 hours of appearing on the data server (after the first few days). • Periodic polling of other servers by the MBARI server was very effective at getting data. • Graphical data products were released on web sites in real time during the experiment. • Connectivity issues: • MBARI had a slow connection to the Internet as of summer 2003. • FTP connections were given the lowest-priority bandwidth allocation. • Problems keeping the IP-based firewall up to date. • Users without fixed IP addresses had a tough time getting through the firewall. • A major virus attack during the experiment (Welchia worm). • The east coast blackout.
No clear plan for how and when data would be quality controlled, so data users often had to apply their own quality checks to the data. • Researchers often needed prodding to upload their data to the centralized MBARI server. • In some cases, PIs overwrote their data with revised numbers, which led to everyone needing to refresh their entire copy of the data. • The data stored on the server had inadequate descriptive metadata. • Only a few researchers generated COARDS-compliant NetCDF files, and none used the specified format for variable names and units. • Modelers were not initially required to provide their data to the central data server, and attempted to provide their own access to their model data. However, that access was limited to graphical outputs. • When model data was provided to the central server, decision-makers were not prepared to use it. • No public access to data was possible, other than pre-defined graphical outputs.
Fixes implemented thus far: • MBARI Internet connection upgraded to a higher bandwidth (> 10x?) • FTP bandwidth allocations have higher priority. • Retrieval of data from remote servers more strongly emphasized, rather than waiting for uploads (pull vs. push) • Data management policy established: • Data centralized • Classes of accessibility established • Citing and collaboration rules specified • Missing data, including model outputs, added to the central data server. • Data on the central data server was converted into a common format (retaining data in old formats), with consistent descriptive metadata. • Publicly accessible data access server and visualization tool online: • Provides public read-only access to graphics and the converted data • Researchers' wishes for data access embargoes and usage requirements incorporated directly.
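The pull-versus-push idea can be sketched as a polling step that compares a remote file listing against what the central server already holds and fetches only new or changed files. This is a minimal illustration, not the MBARI implementation: the function name, the file names, and the use of a version tag (which could be a modification time or checksum) are all assumptions.

```python
def pull_updates(remote_listing, local_index):
    """Fetch only files that are new or changed on a remote server.

    remote_listing: {filename: version tag} reported by the remote server.
    local_index:    {filename: version tag} already held centrally (updated in place).
    Returns the sorted names of the files that were pulled.
    """
    changed = sorted(
        name for name, version in remote_listing.items()
        if local_index.get(name) != version
    )
    for name in changed:
        # Stand-in for the real transfer (FTP, HTTP, ...), which is out of scope here.
        local_index[name] = remote_listing[name]
    return changed

# Hypothetical polling cycles against one originator's server.
local_index = {}
first_pull = pull_updates({"ctd_001.nc": "v1", "adcp_001.nc": "v1"}, local_index)
second_pull = pull_updates({"ctd_001.nc": "v2", "adcp_001.nc": "v1"}, local_index)
print(first_pull)   # both files are new
print(second_pull)  # only the revised file is pulled again
```

Tracking a version tag per file also means a revised file can be refreshed on its own, rather than forcing a full re-download of the data.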
Data Flow: Assumptions • Multiple data originators • Data originators must provide data descriptions, including usage guidelines • Data is quality controlled at multiple levels: • At instrument level (pre-deployment, post-recovery) • At instrument class level • Across observation elements • Both original and quality-controlled data must be archived • All (raw and derived) data preserved • Data archived on a community data server • Central archive allows querying across data sets
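One way to read the multi-level QC assumption is that each level adds a flag to a record while the raw value is never modified, so both original and quality-controlled views can be archived. The sketch below assumes two simple checks (a valid-range test at instrument level and a deviation-from-median test at instrument class level); the checks, thresholds, names, and sample values are illustrative assumptions, not the procedures used in the experiment.

```python
from dataclasses import dataclass, field
from statistics import median

@dataclass
class Record:
    instrument: str
    value: float                               # raw value, never overwritten
    flags: dict = field(default_factory=dict)  # QC level -> "pass" / "fail"

def instrument_level_qc(rec, valid_range):
    """Level 1 (assumed): flag values outside the sensor's valid range."""
    low, high = valid_range
    rec.flags["instrument"] = "pass" if low <= rec.value <= high else "fail"

def class_level_qc(records, max_dev):
    """Level 2 (assumed): flag values far from the median of the instrument class."""
    good = [r.value for r in records if r.flags.get("instrument") == "pass"]
    if not good:
        return
    centre = median(good)
    for r in records:
        r.flags["class"] = "pass" if abs(r.value - centre) <= max_dev else "fail"

# Hypothetical temperature readings (deg C) from four instruments of one class.
records = [Record("ctd_a", 12.1), Record("ctd_b", 12.3),
           Record("ctd_c", 18.9), Record("ctd_d", -99.0)]
for rec in records:
    instrument_level_qc(rec, valid_range=(-2.0, 35.0))
class_level_qc(records, max_dev=2.0)

for rec in records:
    print(rec.instrument, rec.value, rec.flags)
```

Because flags accumulate per level, a downstream user can choose how strict to be without losing access to the raw values.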
Observation Campaign Data Flow: AOSNII • Data (raw and/or quality controlled) is transferred by researchers into a central repository • Archive maintainers responsible for converting data into a common format and adding descriptive information to data • Archive interface allows for querying against latitude, longitude, depth and time within one data set at a time.
Observation Campaign Data Flow: The Future • Data (raw and quality controlled) is transferred by researchers into a central repository, in a defined format, along with descriptive data • Archive interface allows for querying across data sets, where users can modify “canned” queries, or build their own original queries. • Based on query history, archive maintainers continually enhance data indices to improve cross-dataset queries
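A cross-dataset query with a modifiable "canned" query could look roughly like the sketch below. It assumes every data set in the archive exposes the same latitude, longitude, depth, and time fields (the common format); the data set names, record values, and function names are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    lat: tuple    # (min, max) degrees north
    lon: tuple    # (min, max) degrees east
    depth: tuple  # (min, max) metres
    time: tuple   # (start, end) ISO 8601 UTC strings, which sort correctly as text

def matches(rec, q):
    """True if a record falls inside the query's space-time box."""
    return (q.lat[0] <= rec["lat"] <= q.lat[1]
            and q.lon[0] <= rec["lon"] <= q.lon[1]
            and q.depth[0] <= rec["depth"] <= q.depth[1]
            and q.time[0] <= rec["time"] <= q.time[1])

def query_archive(archive, q):
    """Run one query across every data set; return (data set name, record) pairs."""
    return [(name, rec) for name, records in archive.items()
            for rec in records if matches(rec, q)]

# Hypothetical archive with two data sets in a shared record layout.
archive = {
    "glider_ctd": [
        {"lat": 36.80, "lon": -122.00, "depth": 10.0,
         "time": "2003-08-10T12:00:00Z", "temp": 13.2},
        {"lat": 36.90, "lon": -122.05, "depth": 150.0,
         "time": "2003-08-10T15:00:00Z", "temp": 9.1},
    ],
    "mooring": [
        {"lat": 36.75, "lon": -122.03, "depth": 20.0,
         "time": "2003-08-10T18:00:00Z", "temp": 12.8},
    ],
}

# A "canned" query, and a user's modified copy that reaches deeper.
canned = Query(lat=(36.7, 37.0), lon=(-122.1, -121.9), depth=(0.0, 50.0),
               time=("2003-08-10T00:00:00Z", "2003-08-10T23:59:59Z"))
deeper = replace(canned, depth=(0.0, 200.0))

print(len(query_archive(archive, canned)), "hits for the canned query")
print(len(query_archive(archive, deeper)), "hits for the modified query")
```

A linear scan like this would not scale to large archives; it only shows why a defined format and shared descriptive fields are what make a single query applicable to every data set.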
Observation Campaign Data Flow: Getting from AOSNII to the Future • Data originators need incentives to supply their data to a central repository • Need to anticipate some of the kinds of cross-dataset queries that users will make, and design system to facilitate those queries • Need to understand how best to store four- to five-dimensional, multi-terabyte model outputs, to facilitate querying • Can test future systems with existing data
Survey Design • Observation performance • Prediction performance
Down-Sample and Interpolate Model Field at t = 0 to Simulate Assimilation of AUV Samples
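The down-sample-and-interpolate idea can be sketched in one dimension: take a model field at t = 0, keep only the points a vehicle would have sampled, interpolate back to the full grid, and measure how much of the field is recovered. Everything specific here is an assumption made for illustration: the synthetic sinusoidal field, the 0.5 km grid, the regular sampling strides, linear interpolation, and RMS error as the performance measure.

```python
import math
from bisect import bisect_right

def linear_interp(x, xs, fs):
    """Linearly interpolate samples (xs, fs) at x; xs must be strictly increasing."""
    if x <= xs[0]:
        return fs[0]
    if x >= xs[-1]:
        return fs[-1]
    i = bisect_right(xs, x) - 1
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1.0 - w) * fs[i] + w * fs[i + 1]

def simulate_sampling(x, field, stride):
    """Keep every `stride`-th grid point, interpolate back, and return (reconstruction, RMSE)."""
    xs, fs = x[::stride], field[::stride]
    recon = [linear_interp(xi, xs, fs) for xi in x]
    rmse = math.sqrt(sum((r - f) ** 2 for r, f in zip(recon, field)) / len(field))
    return recon, rmse

# Synthetic model field at t = 0 along a 50 km transect (hypothetical values).
x = [0.5 * k for k in range(101)]  # along-track distance, km
model_field = [12.0 + 2.0 * math.sin(2.0 * math.pi * xi / 25.0) for xi in x]

recon_dense, rmse_dense = simulate_sampling(x, model_field, stride=5)     # 2.5 km spacing
recon_sparse, rmse_sparse = simulate_sampling(x, model_field, stride=20)  # 10 km spacing

print(f"RMSE with 2.5 km sampling: {rmse_dense:.3f}")
print(f"RMSE with 10 km sampling:  {rmse_sparse:.3f}")
```

Comparing the error for different strides gives a simple way to rank candidate sampling strategies against the same model field before any vehicle is deployed.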