Towards a common data model and a more efficient ocean data Archive
Steve Rutz, NOAA/NESDIS National Oceanographic Data Center
NODC Observing Systems Team Leader
June 21, 2011
Overview • Background and Terminology • Past Data Models • New Data Models • Recent activities toward a more efficient archive • Archive automation • Archive access services • NetCDF templates • Outreach
So, what does NODC do? Scientific Stewardship • Acquire: receive ocean and coastal data from U.S. and foreign sources • Archive: preserve those data assets for the long term • Access: provide access to archival data and data products for business, federal, science, and many other customers • Add Value: assemble easy-to-use, long-term collections for science and applications
The OAIS Reference Model • Open Archival Information System (OAIS) • ISO standard (ISO 14721) for digital archives • Applies to all organizations that need to preserve digital information for the long term • Does NOT specify any particular implementation • An organization conforms to the OAIS RM by discharging a minimal set of responsibilities and supporting basic information concepts
The OAIS Environment from 30,000 ft • Producer provides information to be preserved • Management sets overall policy • Consumer seeks and acquires preserved information [Figure: the OAIS Archive with its Producer, Consumer, and Management entities]
Data Model (NODC-centric): an abstract model that documents and organizes environmental data for communication between data producers, the Archive (NODC), and data consumers, so that applications can be written to access and store the data. Data Producer ↔ NODC ↔ Data Consumer
Past Data Model: MULDARS • MULti-disciplinary Data Archive and Retrieval System (aka, NODC Master Data Files) • NODC developed MULDARS in the 1970s • Dozens of file formats • One for each type of data • ASCII/text, 80- or 120-character records • Example: High-resolution CTD/STD Data (F022) • Example: Benthic Organism Data (F132) • Most data stored/accessed in one or more of these formats • Ended in 1994 – difficult to maintain • Converting new data • Adding new data types
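To make "ASCII/text, 80 or 120 character records" concrete, here is a minimal sketch of how a fixed-width record is decoded by slicing fixed column ranges. The column layout, field names, and station ID below are hypothetical. They are not the real F022 layout, which is not reproduced here.

```python
from dataclasses import dataclass

RECORD_LENGTH = 80  # MULDARS records were 80 or 120 characters

# HYPOTHETICAL column layout (start, end) in characters. This is NOT the real
# F022 layout; it only illustrates how a fixed-width format is decoded.
FIELDS = {
    "station_id": (0, 8),
    "depth_m": (8, 16),
    "temperature_c": (16, 24),
    "salinity_psu": (24, 32),
}


@dataclass
class Observation:
    station_id: str
    depth_m: float
    temperature_c: float
    salinity_psu: float


def parse_record(line: str) -> Observation:
    """Decode one fixed-width ASCII record by slicing fixed column ranges."""
    line = line.rstrip("\r\n")
    if len(line) > RECORD_LENGTH:
        raise ValueError(f"record longer than {RECORD_LENGTH} characters")
    line = line.ljust(RECORD_LENGTH)  # treat short lines as blank-padded
    raw = {name: line[a:b].strip() for name, (a, b) in FIELDS.items()}
    return Observation(
        station_id=raw["station_id"],
        depth_m=float(raw["depth_m"]),
        temperature_c=float(raw["temperature_c"]),
        salinity_psu=float(raw["salinity_psu"]),
    )
```

Every format needs its own layout table and parser. With dozens of formats, one per data type, this helps explain why converting new data and adding new data types became hard to maintain.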
New Data Models: netCDF • Network Common Data Form (netCDF) • Developed by the Unidata Program at the University Corporation for Atmospheric Research (UCAR) • Depending on context: a data model, a file format, or an API • NODC started receiving data in netCDF in the 1990s • Two data models • Classic • Enhanced
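The practical difference between the two data models is which features a dataset uses. The classic model has only dimensions, variables, and attributes, six atomic types, and at most one unlimited dimension. The enhanced (netCDF-4) model adds nested groups, more atomic types (unsigned and 64-bit integers, strings), user-defined types, and multiple unlimited dimensions.

The sketch below is stdlib-only and does not read real files. The `Group` and `Variable` classes are hypothetical stand-ins, not the netCDF API. It reports which enhanced-only features a dataset description uses.

```python
from dataclasses import dataclass, field

# Atomic types of the netCDF classic data model. Anything else (unsigned and
# 64-bit integers, string, or user-defined compound/vlen/enum/opaque types)
# is only available in the enhanced (netCDF-4) data model.
CLASSIC_TYPES = {"byte", "char", "short", "int", "float", "double"}


@dataclass
class Variable:
    name: str
    dtype: str        # e.g. "float", "string", "compound"
    dims: tuple = ()


@dataclass
class Group:
    name: str
    variables: list = field(default_factory=list)
    groups: list = field(default_factory=list)          # nested groups: enhanced only
    unlimited_dims: list = field(default_factory=list)  # classic allows at most one per file


def enhanced_features_used(root: Group) -> list:
    """Return reasons this (hypothetical) dataset description needs the enhanced model."""
    reasons = []
    unlimited = 0

    def walk(group: Group, path: str) -> None:
        nonlocal unlimited
        unlimited += len(group.unlimited_dims)
        if group.groups:
            reasons.append(f"{path} contains nested groups")
        for var in group.variables:
            if var.dtype not in CLASSIC_TYPES:
                reasons.append(f"{path}{var.name} uses non-classic type {var.dtype!r}")
        for sub in group.groups:
            walk(sub, f"{path}{sub.name}/")

    walk(root, "/")
    if unlimited > 1:
        reasons.append("more than one unlimited dimension")
    return reasons
```

An empty result means the description fits the classic model. That matters when a file must stay readable by netCDF-3 era software.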
Archive Automation • Submission Information Form – a Submission Agreement reached between NODC and the Producer that specifies a data model for the Data Submission Sessions • Identifies points of contact (POCs), environmental data types, file formats, etc. • Submission Information Package (SIP) – data and metadata files packaged by the Producer and acquired by NODC • Archival Information Package (AIP): • Accession Tracking Database – tracks metadata relevant to data discovery and records version control for long-term stewardship • Producer's SIP – data are preserved as submitted to NODC • Archive's supplementary files – browse graphic of geographic coordinates, FGDC metadata records, and more interoperable file formats of the data for long-term stewardship
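The SIP-to-AIP flow above can be sketched as follows: keep the Producer's files byte-for-byte, record a checksum per file in a tracking record, and add an archive-generated supplementary file. This is an illustration under stated assumptions, not NODC's implementation. The directory layout (`original/`, `about/`), the manifest, and the `AccessionRecord` fields are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AccessionRecord:
    """Hypothetical row in an accession tracking database."""
    accession_id: int
    producer: str
    version: int
    received_utc: str
    file_checksums: dict  # file name -> SHA-256 of the bytes as submitted


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ingest_sip(sip_files: dict, producer: str, accession_id: int, version: int = 1):
    """Wrap a SIP (file name -> bytes) in a minimal, hypothetical AIP layout."""
    record = AccessionRecord(
        accession_id=accession_id,
        producer=producer,
        version=version,
        received_utc=datetime.now(timezone.utc).isoformat(),
        file_checksums={name: sha256_hex(data) for name, data in sorted(sip_files.items())},
    )
    prefix = f"{accession_id}/{version}"
    # The Producer's files are kept byte-for-byte, exactly as submitted.
    aip = {f"{prefix}/original/{name}": data for name, data in sip_files.items()}
    # Archive-added supplementary file: a manifest mirroring the tracking record.
    aip[f"{prefix}/about/manifest.json"] = json.dumps(asdict(record), indent=2).encode("utf-8")
    return record, aip


def verify_aip(record: AccessionRecord, aip: dict) -> bool:
    """Re-hash the preserved originals and compare with the tracked checksums."""
    prefix = f"{record.accession_id}/{record.version}"
    return all(
        sha256_hex(aip.get(f"{prefix}/original/{name}", b"")) == digest
        for name, digest in record.file_checksums.items()
    )
```

The version number is part of the path here. A resubmission therefore becomes a new version alongside the old one rather than overwriting it. This is one simple way to support the version control mentioned above.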
Example: NDBC Data • Archival buoy data from NDBC in F291 format (Meteorology, Oceanography, and Wave Spectra Data from Buoys) • Archived “manually” by NODC since 1970 • Difficult for NDBC to add new data types (the last one was added in 2005) and to maintain the code • NODC-NDBC Modernization Project – 1st phase completed in the 2nd quarter of 2011 • NODC automated acquisition and archiving of data from NDBC’s moored (weather) buoys and Coastal-Marine Automated Network (C-MAN) stations • Data in netCDF-4, using the Enhanced Data Model • Next phase – Tropical Atmosphere-Ocean (TAO) buoys and CTD casts taken during TAO maintenance cruises
Example: GHRSST Project • NODC serves as GHRSST Long Term Stewardship and Reanalysis Facility • First data stream automatically ingested into NODC Ocean Archive System • Over 1.6 million netCDF files and 33 TB of SST data • Transitioning from netCDF-3 to netCDF-4 Classic by 2013 • http://ghrsst.nodc.noaa.gov
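A back-of-the-envelope reading of the GHRSST figures gives the average file size. This assumes decimal units (1 TB = 10^12 bytes) and treats 33 TB as the total volume. Because the file count is "over" 1.6 million, the result is roughly an upper bound on the mean:

$$\bar{s} \lesssim \frac{33 \times 10^{12}\ \text{bytes}}{1.6 \times 10^{6}\ \text{files}} \approx 2.06 \times 10^{7}\ \text{bytes/file} \approx 20.6\ \text{MB/file}$$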
Discovery • Discovery is enabled through numerous interfaces designed for both humans and their machine clients. • Human-to-machine interfaces include government-mandated generalized portals like Data.gov and Geospatial One-Stop (GOS), as well as Google and the OAS Geoportal Server Web App • Machine-to-machine interfaces: Geoportal Server CSW, Geoportal Server REST API, OpenSearch, SRU/ISO 23950 • The NODC Ocean Archive Discovery services are available for ALL of the NODC Archive holdings, but better metadata supports better discovery!
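As an illustration of a machine-to-machine discovery interface, here is a sketch of building an OGC CSW 2.0.2 `GetRecords` request in key-value-pair (HTTP GET) form, with a free-text CQL filter. The endpoint URL is hypothetical, and no request is sent. The parameter names follow the CSW 2.0.2 KVP encoding, but a given server may support only some constraint languages or element sets.

```python
from urllib.parse import urlencode


def csw_getrecords_url(endpoint: str, search_text: str,
                       max_records: int = 10, start_position: int = 1) -> str:
    """Build a CSW 2.0.2 GetRecords request using key-value-pair (HTTP GET) encoding."""
    literal = search_text.replace("'", "''")  # escape quotes inside the CQL string literal
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "summary",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{literal}%'",  # % is the CQL LIKE wildcard
        "startPosition": start_position,
        "maxRecords": max_records,
    }
    return f"{endpoint}?{urlencode(params)}"
```

For example, `csw_getrecords_url("https://example.org/geoportal/csw", "sst")` yields a URL that a harvester or script could fetch to page through summary records mentioning "sst". How useful those records are depends on the quality of the underlying metadata.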
Access and Use: Get your DIPs (Dissemination Information Packages)! • LAS, GIS, KML, WCS, WMS, SOS – enhanced online access, visualization, and analysis tools: these capabilities require more structured metadata and standardized file formats, so they are available for the fewest archive holdings • DAP – Data Access Protocol: requires standard file formats, so it is available for fewer archive holdings • FTP and HTTP – basic FTP/HTTP access for all Archival Information Packages (AIPs) in the NODC Ocean Archive: these distribution methods have no format or metadata requirements, so they work for all archive holdings, but they provide only basic download capability [Figure: access tiers stacked above the AIPs in the NODC Ocean Archive]
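The tiering rule on this slide can be written as a small decision function. Every holding gets FTP/HTTP. A standard file format adds DAP, and standard formats plus structured metadata add the enhanced services. The `Holding` class and its two boolean flags are hypothetical simplifications of what "standard format" and "structured metadata" mean in practice.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """Hypothetical summary of one archive holding (AIP)."""
    accession_id: int
    standard_format: bool      # assumption: e.g. CF-compliant netCDF
    structured_metadata: bool  # assumption: metadata rich enough for map/visualization services


def access_methods(h: Holding) -> list:
    """Return access tiers a holding qualifies for, from most basic to most capable."""
    methods = ["FTP", "HTTP"]                 # basic download works for every AIP
    if h.standard_format:
        methods.append("DAP")                 # DAP needs a standard file format
        if h.structured_metadata:
            methods += ["WMS", "WCS", "LAS"]  # enhanced tools need format and metadata
    return methods
```

Good metadata without a standard format earns nothing beyond FTP/HTTP in this sketch. That mirrors the slide's point that the upper tiers need both.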
Access Services • Ocean Archive System • http://www.nodc.noaa.gov/Archive/Search • HTTP and FTP • ftp://data.nodc.noaa.gov/ • http://data.nodc.noaa.gov/ • OPeNDAP’s Data Access Protocol via Hyrax • http://data.nodc.noaa.gov/opendap • OGC’s WMS and WCS via THREDDS Data Server • http://data.nodc.noaa.gov/thredds • Web Accessible Folder of metadata harvested by Google, geodata.gov (aka, Geospatial One Stop), and Data.gov • http://data.nodc.noaa.gov/NESDIS_DataCenters/metadata/index.html • Live Access Server • http://data.nodc.noaa.gov/las
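To show what DAP access adds over plain HTTP, here is a sketch of building an OPeNDAP (DAP2) request that subsets a variable on the server using the `[start:stride:stop]` hyperslab syntax. The Hyrax base URL is the one listed above. The dataset path `example/sst.nc` and the variable `sst` are hypothetical placeholders, and no request is made.

```python
from urllib.parse import quote


def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """DAP2 index range; start and stop are both inclusive."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stop}]" if stride == 1 else f"[{start}:{stride}:{stop}]"


def dap2_url(base: str, dataset_path: str, response: str, projections: dict) -> str:
    """Build a DAP2 request URL.

    response: DAP2 response suffix such as "dds", "das", "dods", or "ascii".
    projections: variable name -> list of (start, stop, stride) tuples, one per dimension.
    """
    parts = [
        var + "".join(hyperslab(*r) for r in ranges)
        for var, ranges in projections.items()
    ]
    # Percent-encode the brackets; some servers reject them unencoded.
    ce = quote(",".join(parts), safe=",:")
    url = f"{base.rstrip('/')}/{dataset_path.lstrip('/')}.{response}"
    return f"{url}?{ce}" if ce else url
```

For instance, `dap2_url("http://data.nodc.noaa.gov/opendap/", "example/sst.nc", "ascii", {"sst": [(0, 0, 1), (10, 20, 2)]})` asks for the first time step and every second index from 10 to 20 of the hypothetical `sst` variable. Only that slice crosses the network, not the whole file.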
Access Services Stay tuned … More to Come! • OGC’s Catalog Service for the Web (CSW) • Search/Retrieval via URL (SRU) • Geoportal Server • ArcGIS Server • and someday Sensor Observation Service (SOS) In other words, this is not your father’s NODC! (or your adviser’s, or the NODC you knew while in grad school…)
NODC’s netCDF templates • Follow standard conventions • NetCDF Climate and Forecast (CF) Metadata Conventions • Unidata’s NetCDF Attribute Convention for Dataset Discovery (ACDD) • Recommend best practices for variables and attributes – e.g., include uuid, platform, and instrument, and expand acronyms
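A template's attribute recommendations lend themselves to an automated pre-submission check. The sketch below flags recommended global attributes that are missing or blank and fills in a random `uuid` when none exists. The attribute list is a hypothetical subset: it mixes a few CF/ACDD discovery attributes with the uuid, platform, and instrument items named on the slide. It is not NODC's actual template list.

```python
import uuid

# HYPOTHETICAL subset of global attributes a template might recommend; the
# actual NODC template lists are longer and are not reproduced here.
RECOMMENDED_GLOBALS = (
    "title", "summary", "keywords", "Conventions",  # CF / ACDD-style discovery attributes
    "uuid", "platform", "instrument",               # named on the slide as best practices
)


def missing_globals(attrs: dict) -> list:
    """List recommended global attributes that are absent or blank."""
    return [k for k in RECOMMENDED_GLOBALS if not str(attrs.get(k, "")).strip()]


def with_uuid(attrs: dict) -> dict:
    """Return a copy of attrs with a random (version 4) UUID added if none is present."""
    out = dict(attrs)
    if not str(out.get("uuid", "")).strip():
        out["uuid"] = str(uuid.uuid4())
    return out
```

Returning a copy rather than editing in place keeps the Producer's original attribute set intact. That fits the principle of preserving data as submitted.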
NODC’s netCDF templates • Based on the "feature types" defined by Unidata and CF • trajectory (“done”) • profile (“done”) • grid (“done”) • point (started) • trajectory profile (started) • time series • time series profile • swath
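In the CF discrete sampling geometry conventions, a file declares its feature type with the global attribute `featureType` (values such as `point`, `timeSeries`, `trajectory`, `profile`, `timeSeriesProfile`, `trajectoryProfile`). Grid and swath come from Unidata's feature-type vocabulary rather than that CF attribute.

The sketch below emits a CDL header for a single vertical profile, loosely following the CF single-profile layout. The variable names, units, and defaults are illustrative assumptions, not NODC's actual profile template.

```python
def profile_cdl(n_levels: int, data_var: str = "temperature",
                standard_name: str = "sea_water_temperature",
                units: str = "degree_Celsius") -> str:
    """Return CDL for a single vertical profile (CF featureType = "profile").

    Illustrative only: loosely modeled on the CF single-profile layout,
    not copied from NODC's actual profile template.
    """
    if n_levels < 1:
        raise ValueError("a profile needs at least one level")
    lines = [
        "netcdf profile_example {",
        "dimensions:",
        f"\tz = {n_levels} ;",
        "variables:",
        "\tint profile ;",
        '\t\tprofile:cf_role = "profile_id" ;',  # identifies the feature instance
        "\tdouble time ;",
        '\t\ttime:standard_name = "time" ;',
        '\t\ttime:units = "seconds since 1970-01-01 00:00:00" ;',
        "\tfloat lat ;",
        '\t\tlat:standard_name = "latitude" ;',
        '\t\tlat:units = "degrees_north" ;',
        "\tfloat lon ;",
        '\t\tlon:standard_name = "longitude" ;',
        '\t\tlon:units = "degrees_east" ;',
        "\tfloat z(z) ;",
        '\t\tz:standard_name = "depth" ;',
        '\t\tz:units = "m" ;',
        '\t\tz:positive = "down" ;',
        '\t\tz:axis = "Z" ;',
        f"\tfloat {data_var}(z) ;",
        f'\t\t{data_var}:standard_name = "{standard_name}" ;',
        f'\t\t{data_var}:units = "{units}" ;',
        f'\t\t{data_var}:coordinates = "time lat lon z" ;',
        "",
        "// global attributes:",
        '\t\t:featureType = "profile" ;',
        "}",
    ]
    return "\n".join(lines)
```

Saved to a `.cdl` file, the output can be turned into an empty netCDF file with the standard `ncgen` utility (for example, `ncgen -o profile_example.nc profile_example.cdl`) and then filled with data.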
NetCDF and Conventions Stack (standards stack | software stack, top to bottom) • Extended Community Conventions: GHRSST, WOD, GTSPP, etc. standards | custom software • NODC Best Practices • Community Conventions: CF, ACDD, and CF Feature Types (point, time series, trajectory, profile, time series profile, trajectory profile, grid, swath) | libcf, nc-ISO, GIS software (e.g., GDAL), LAS • Data Model: netCDF-classic, netCDF-enhanced | netCDF API, OPeNDAP servers (e.g., THREDDS), Matlab • File Format: netCDF-3, netCDF-4 (/HDF-5) | low-level netCDF and HDF APIs • Slide callout: "This is now an OGC standard" • Acronyms: ERDDAP – Environmental Research Division’s Data Access Program; netCDF – network Common Data Form; HDF – Hierarchical Data Format; GIS – Geographic Information Systems; DAP – Data Access Protocol; API – Application Programming Interface; NODC – National Oceanographic Data Center; CF – Climate and Forecast; ACDD – Attribute Convention for Dataset Discovery; GDAL – Geospatial Data Abstraction Library; ISO – International Organization for Standardization; LAS – Live Access Server
Outreach • NODC Archive Training Session for IOOS Regional Association Data Managers (April 2011) • NOAA EDMC (June 2011) • Earth Science Information Partner (ESIP) Federation meeting (August 2011) • Next steps with netCDF templates • Post to NOAA's Global Earth Observation - Integrated Data Environment (GEO-IDE) for comment • Propose to CF
Information? Collaboration? Send us an e-mail! To: NODC.DataOfficer@noaa.gov From: Stakeholder Subject: NODC’s netCDF templates