Developing a National Plan for Glider Operations: Data Management. IOOS Glider Workshop August 2012 Jim Potemra, UH. Goal. Develop a national plan for glider operations. Part of this plan should address the management of glider data.
Developing a National Plan for Glider Operations: Data Management IOOS Glider Workshop August 2012 Jim Potemra, UH
Goal Develop a national plan for glider operations. Part of this plan should address the management of glider data. • Start discussion on developing a common plan for glider-to-archive data streams • Motivation is to promote data interoperability and to make data easier to access and use
National Plans Wave Plan: “…all data will flow through IOOS DAC operated by NDBC and CDIP using IOOS-DIF standards and metadata…” HFR Plan: “…data management principles…have been coordinated with IOOS DIF…”
Components of DMP • Data formats and standards • Include vocabulary, conventions, metadata • Data services • How to get data to users • Implementation • How will this get done and by whom • Issues include incorporation of QA/QC
1. Data formats and standards • Assumption 1: Essentially three types of gliders, roughly measuring similar variables • Assumption 2: Raw formats may differ, but getting to an initial (near-real-time) netCDF file would be possible • Proposal: netCDF data model with CF conventions (standardized vocabulary and units)
1. Data formats and standards (cont’d) • Advantages: • Easy to implement (hopefully) • In line with Argo, OceanSITES, IOOS • Large user community and tool set • Disadvantages: • Even within netCDF there are permutations • New featureType attribute might be slow to catch on
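As a rough illustration of the netCDF/CF proposal above, the sketch below describes a glider profile file as plain Python dictionaries and checks each variable for two attributes the CF conventions rely on. The variable names, the `REQUIRED_VAR_ATTRS` tuple, and the `missing_attributes` helper are assumptions made for this example, not part of any agreed glider format; a real file would be written with a netCDF library.

```python
# Sketch only: a glider profile described as plain dictionaries.
# Variable names and the checker below are hypothetical.

GLOBAL_ATTRS = {
    "Conventions": "CF-1.6",      # CF version that introduced featureType
    "featureType": "profile",     # one vertical profile per time/position
}

VARIABLES = {
    "time": {"standard_name": "time",
             "units": "seconds since 1970-01-01 00:00:00"},
    "lat": {"standard_name": "latitude", "units": "degrees_north"},
    "lon": {"standard_name": "longitude", "units": "degrees_east"},
    "depth": {"standard_name": "depth", "units": "m", "positive": "down"},
    "temperature": {"standard_name": "sea_water_temperature",
                    "units": "degree_Celsius"},
    "salinity": {"standard_name": "sea_water_practical_salinity",
                 "units": "1"},
}

REQUIRED_VAR_ATTRS = ("standard_name", "units")


def missing_attributes(variables, required=REQUIRED_VAR_ATTRS):
    """Return {variable: [missing attribute names]} for incomplete variables."""
    problems = {}
    for name, attrs in variables.items():
        missing = [key for key in required if key not in attrs]
        if missing:
            problems[name] = missing
    return problems
```

A provider could run a check of this kind before submitting files, so that the "permutations within netCDF" noted above are caught early rather than by downstream users.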
2. Data services • Assumption 1: there will not be a single solution • Assumption 2: the user community is well known • Assumption 3: interoperability and open access are goals • Proposal: distribute data through four main mechanisms: • Direct access: pilots and science PI’s will likely access data via ssh to server machine and/or NFS mounted disk • GTS: data to GTS either directly off modem or via DAC/WMO center • ftp: local (and maybe remote) research use may want data transfer • OPeNDAP: remote users, automatic harvesting done via OPeNDAP
2. Data services (cont’d) • Advantages: • Easy to implement (hopefully) • In line with IOOS and Argo • Will cover almost all possible user requests • Disadvantages: • Versioning will be difficult • Several access logs • Maintenance of different servers
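To make the OPeNDAP mechanism above more concrete, here is a minimal sketch of how a remote user or harvester might build a subset request. It uses the DAP2 index-constraint form `variable[start:stride:stop]`; the server address, dataset path, and variable name are hypothetical, and the helper functions are written for this example only.

```python
# Sketch only: build an OPeNDAP (DAP2) subset request URL.
# The server address and variable names are hypothetical.

def hyperslab(start, stop, stride=1):
    """Format one dimension of a DAP2 index constraint: [start:stride:stop]."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def opendap_url(dataset_url, variable, slabs, suffix="ascii"):
    """Return a URL requesting a subset of one variable.

    `slabs` is a list of (start, stop) or (start, stop, stride) tuples,
    one per dimension of the variable; indices are inclusive.
    """
    constraint = variable + "".join(hyperslab(*s) for s in slabs)
    return f"{dataset_url}.{suffix}?{constraint}"


url = opendap_url(
    "http://example.org/opendap/glider/mission01.nc",  # hypothetical dataset
    "temperature",
    [(0, 0), (0, 99, 2)],   # first profile, every second depth level
)
print(url)
```

Because the subset is expressed in the URL itself, the same request works from a browser, a script, or an automated harvester, which is part of why OPeNDAP suits the remote-user case.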
3. Implementation • Data archives • Distributed centers (data assembly centers) that provide equivalent services for seamless integration • Central assembly center where all data providers submit data (GDAC) NODC/NDBC/NCDC? • Data files (QC issue…) • Single data stream with flags marking raw, real-time QC, delayed-mode data (e.g., Argo) • Two data streams (e.g., tide gauge)
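The single-data-stream option above can be sketched as follows: every record carries a mode flag, and a user picks the most processed version available for each profile. The field names, the mode labels (`raw`, `realtime_qc`, `delayed`), and the ranking are assumptions for illustration and do not reproduce Argo's actual codes.

```python
# Sketch only: one data stream in which every record carries a mode flag.
# Field names and mode labels are hypothetical.

MODE_RANK = {"raw": 0, "realtime_qc": 1, "delayed": 2}


def best_available(records):
    """Keep, for each profile_id, the record with the highest-ranked mode."""
    best = {}
    for rec in records:
        current = best.get(rec["profile_id"])
        if current is None or MODE_RANK[rec["mode"]] > MODE_RANK[current["mode"]]:
            best[rec["profile_id"]] = rec
    return best


stream = [
    {"profile_id": 1, "mode": "raw", "temperature": 18.42},
    {"profile_id": 1, "mode": "realtime_qc", "temperature": 18.42},
    {"profile_id": 2, "mode": "raw", "temperature": 17.90},
    {"profile_id": 1, "mode": "delayed", "temperature": 18.39},
]
selected = best_available(stream)
```

With two data streams, the same choice would instead be made by deciding which file or server to read, so the selection logic moves from the data to the access point.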
QA/QC considerations • Different layers to this: • Different data streams or over-write • Done by provider, aggregator or separate team • Either way, a documented plan would be helpful • QA/QC extends to data and dimensions (e.g., how accurate are time/location; is this important?) • Impacts data file w.r.t. vocabulary and flags (so not just an issue of what tests are run)
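The point that QA/QC extends to dimensions as well as data can be illustrated with a simple range test that flags position alongside the measurement. The flag values and the temperature limits below are assumptions chosen for this sketch, not an agreed glider QC standard.

```python
# Sketch only: a range test that flags both a measurement and its position.
# Flag values and temperature limits are hypothetical, not an agreed standard.

GOOD, BAD, MISSING = 1, 4, 9


def range_flag(value, low, high):
    """Return a QC flag for one value tested against [low, high]."""
    if value is None:
        return MISSING
    return GOOD if low <= value <= high else BAD


def qc_record(rec):
    """Attach flags for temperature, latitude and longitude to a copy of rec."""
    flagged = dict(rec)
    flagged["temperature_qc"] = range_flag(rec.get("temperature"), -2.5, 40.0)
    flagged["lat_qc"] = range_flag(rec.get("lat"), -90.0, 90.0)
    flagged["lon_qc"] = range_flag(rec.get("lon"), -180.0, 180.0)
    return flagged


result = qc_record({"temperature": 55.0, "lat": 21.3, "lon": -158.0})
```

Storing the flags next to the values, rather than overwriting them, keeps the raw data recoverable and means the file vocabulary has to name each flag variable, which is the file-format impact noted above.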
3. Implementation (cont’d) • Data governance • Data management team (real-time operators, delayed-mode QC) • Ad hoc, standards-based (articulate best practices and leave to providers) • Misc • Provide service to users? • Others?
Suggested Approach User-driven: Issue one is to identify users • Scientific PI • Researcher (non-PI) • Operational modeling centers • Re-analysis modeling • Pilots • General (non-scientific) users
Complete Picture [Data-flow diagram. Nodes: GLIDER, Shore Station, Operational Center, Pilot console, Science PI, Data processing (conversion to netCDF; QA/QC applied), Reanalysis modeling, Data Service (OPeNDAP), Researcher, Non-science user, Archive Center. Link labels: Iridium modem, GTS, ssh, ftp/ssh, ftp/cp, http, and one link marked “?”.]
Suggested Approach (cont’d) Data providers at other end: Issue one is to document existing practices • Inventory of gliders? • Three main types; data formats for these? • Role of manufacturer?
Areas to consider: • Starting point is glider • What variables and formats need to be addressed? • All transmitting via Iridium? • Multiple ending points • Pilots, scientists (PI’s), scientists (research), modeling centers (real-time), model reanalysis studies (historical), other users (?) • Added aspect of regional and national viewers and/or aggregation centers • Implementation • Federated (all groups carry on), or centralized (e.g., Argo) with “data assembly centers” (DACs) • Maintenance of two data streams • Sea level (real-time and delayed mode) • Argo (combination)
Based on this goal, discuss development of standard format(s) and possible standard transport mechanism(s) • Depending on time and interest, discussion on data format could extend to terminology • Based on this goal, discuss an implementation plan • How to execute data plan, e.g., distributed system, federated system, DAC’s, maintenance of real-time and delayed mode, etc.
Data Management Issues • If goal is discovery • Need a central catalog (service) • If goal is availability • Need to provide a service (ftp, OPeNDAP, etc.) • If goal is interoperability • Need to settle on common data model and/or service (netCDF with ftp) • All sorts of other stuff • Central vs. distributed archive
IOOS model thus far • All data available asap • All data available via “standard” service • OPeNDAP/THREDDS • SOS, ftp • ERDDAP/vis tools • Data service more or less dictates format/model: • netCDF
Data availability: IOOS RA IOOS has 11 Regional Associations. The availability of glider data via these RA’s falls into three broad categories: • No obvious link to glider data or plots • AOOS (Alaska) • CaRA (Caribbean) • GLOS (Great Lakes) • GCOOS (Gulf Coast) • NERACOOS (Northeast Atlantic) • SECOORA (Southeast Atlantic) • Some data available via OPeNDAP, limited plots/maps • CeNCOOS (Central California) single mission • PacIOOS • Data, maps and viewer • MARACOOS (Mid-Atlantic) Rutgers • NANOOS (Pacific Northwest) APL/UW • SCCOOS (Southern California) Scripps
Data availability: NOAA/NODC • GTSPP: • http://www.nodc.noaa.gov/GTSPP/index.html • Data by name (e.g., pacific/2012/06): gtspp_14239088_te_111.nc, gtspp_14239470_te_111.nc, gtspp_14239585_te_111.nc, gtspp_14239643_te_111.nc • Files have featureType: profile • Deepwater Horizon • http://www.nodc.noaa.gov/General/DeepwaterHorizon/glider_float.html#glider • Single lat/lon/time per profile: • Temp(time,depth,lat,lon) for a single time,lat,lon
Data availability: NOAA/NDBC • Data list and pre-made profile plots
Data availability: other • UW/APL • Scripps • Rutgers • C-MORE/HOT