THREDDS Status

THREDDS Status John Caron Unidata 5/7/2013

Outline • Release schedule • Aggregations -> featureCollections / NCSS • GRIB refactor • Discrete Sampling Geometry (point data) • ncstream/cdmRemote/cdmrFeature

THREDDS/CDM personnel • John Caron (1.0) • head cook and bottle washer • Ethan Davis (0.25) • Architecture, standards, catalogs • Marcos Hermida (1.0) • NCSS, WMS, maven, spring, javascript • Lansing Madry (1.0) • support, testing, domain expertise • Sean Arms (0.25) • IDV/CDM interface, NCEP models, rosetta, python, domain expertise • Dennis Heimbigner (0.5) • OpenDAP, HTTPClient • Julius Chastang (0.0) • IDV, python interface • Yuan Ho (0.0) • IDV, radar IOSPs

KillCat Release 4.3.17 (1 June 2013) • Last 4.3 Feature Release • motherlode.ucar.edu -> thredds.ucar.edu • All Unidata servers upgrade to 4.3 • 4.3 Major Features • GRIB complete rewrite, FeatureCollection scaling • netCDF-4 writing with netCDF C library (JNA) • CF 1.6 Discrete Sampling Convention • NetCDF Subset Service and WMS improvements • Software Engineering: GitHub, Maven

BlackCat Release 4.4 (31 Oct 2013) • Improvements for ESG – millions of catalogs • Refactor/harmonize NCSS, CdmRemote, and RadarServer APIs • Extend NCSS for use in WRF initialization • Release THREDDS Data Manager (TDM) for outside use • Migrate HttpClient from 3.x (EOL) to 4.x • Possible • OpenID authentication • WaterML from NCSS Grid as Point (where does extra metadata come from?) • Experiments with Async

SantaCat Release 4.5 (25 Dec 2013) • DAP4 server and client • Grid Feature Collection, replace FMRC • GRIB FeatureCollection: Constant Forecast Offset/Hour options • CdmRemoteFeature service implemented for all CF-1.6 DSG feature types

SchrödingersCat Release 5.x • Require Java 7 (nio2) and Tomcat 7 • Java 6 reached end of life Feb 2013 • API changes allowed • TDS configuration refactor • Refactor GridDatatype to Coverage • Swath/Image • Cross-seam lat/lon data requests • Unstructured Grid? • Time-dependent coordinate system? • Better dataset classification • Refactor Catalog reading/writing package • Improved metadata harvesting support

SchrödingersCat Release 5.x • Search/discovery service? • Asynchronous requests – client and server? • TDS-lite? • on demand trusted local server • Access from C, python

Aggregation -> FeatureCollections • Aggregation is associated with virtual datasets defined with NcML • NcML is seriously overloaded with semantics • Originally a client-side configuration, done on-the-fly • Adapted to play seamlessly with TDS configuration catalogs • Server side aggregations more complicated • Hard to do all 3 at once: Very large, updating, performance • <featureCollection> has new set of configuration elements that make it easier for both users and implementors • Performance comes from storing the result dataset info (“.ncx” files) • Aggregation is being phased out in favor of “feature collections” • GRIB, POINT collections are the first real implementation • GRID, FMRC will be refactored in 4.5 • NcML can still be used to modify the datasets • http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.3/reference/collections/FeatureCollections.html

NetCDF Subset Service • REST web service for coordinate based subsetting on GRID datasets • On our thredds.ucar.edu server, datasets are NCEP model runs as GRIB collections • TDS 4.3: Much improved interface and reliability – needs more performance • Can return netCDF-3 or netCDF-4 files • Net effect is a subset / transformation service
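A coordinate-based NCSS request is an ordinary HTTP GET with the subset expressed as query parameters. The sketch below only builds such a URL and does not contact a server. The parameter names (`var`, `north`, `south`, `east`, `west`, `time_start`, `time_end`, `accept`) follow the NCSS grid interface as I understand it and should be checked against the TDS version in use; the dataset path and variable name are hypothetical placeholders.

```python
from urllib.parse import urlencode


def ncss_grid_url(base, dataset_path, variables, bbox, time_start, time_end,
                  accept="netcdf"):
    """Build an NCSS grid subset URL (string only, no network access).

    bbox is (north, south, east, west) in degrees.
    """
    north, south, east, west = bbox
    query = [
        ("var", ",".join(variables)),
        ("north", north), ("south", south),
        ("east", east), ("west", west),
        ("time_start", time_start), ("time_end", time_end),
        ("accept", accept),
    ]
    # Keep commas and colons readable in the query string.
    return f"{base}/ncss/{dataset_path}?{urlencode(query, safe=',:')}"


if __name__ == "__main__":
    print(ncss_grid_url(
        "http://thredds.ucar.edu/thredds",   # server named in the slides
        "some/grib/collection",              # hypothetical dataset path
        ["Temperature_isobaric"],            # hypothetical variable name
        (50, 30, -90, -110),
        "2013-05-07T00:00:00Z", "2013-05-08T00:00:00Z",
    ))
```

Changing `accept` is how a client would ask for a different return format, for example netCDF-4 instead of netCDF-3, where the server supports it.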

GRIB – CDM/TDS 4.3 • Complete rewrite of GRIB1, GRIB2 IOSPs • Table Handling • Multifile collections – eliminate user configuration • Automatically figure out coord systems • User Groups for multiple horizontal domains • Indexing (.gbx9) and cache metadata (.ncx) • User configuration passed into the IOSP • TDS featureCollection=GRIB • Time Partitions (performance) • User Configuration for changing datasets • Motivated by NCDC/NOMADS issues and $

netCDF storage

GRIB storage

GRIB Rectilyzationologicment • Turn unordered collection of 2D slices into 3-6D multidimensional array • Each GRIB record (2D slice) is independent • There is no overall schema to describe what it's supposed to be • there is, but not able to be encoded in GRIB
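The idea of turning independent 2D slices into one multidimensional array can be sketched in a few lines. This is a minimal illustration, not the CDM GRIB implementation: it assumes each record is reduced to a hypothetical `(time, level, slice2d)` tuple, discovers the coordinate axes from the records themselves (there is no schema to consult), and lets a later duplicate overwrite an earlier one.

```python
def rectilinearize(records):
    """Arrange unordered (time, level, slice2d) records on a time x level grid.

    Returns (times, levels, grid, duplicates), where grid[i][j] holds the
    slice for times[i] and levels[j], or None if no record supplied it.
    """
    records = list(records)
    # The axes are whatever distinct coordinate values show up in the records.
    times = sorted({t for t, _, _ in records})
    levels = sorted({z for _, z, _ in records})
    t_index = {t: i for i, t in enumerate(times)}
    z_index = {z: j for j, z in enumerate(levels)}

    grid = [[None] * len(levels) for _ in times]
    duplicates = 0
    for t, z, slice2d in records:
        i, j = t_index[t], z_index[z]
        if grid[i][j] is not None:
            duplicates += 1          # assumption: the later record wins
        grid[i][j] = slice2d
    return times, levels, grid, duplicates


if __name__ == "__main__":
    recs = [(6, 500, "c"), (0, 850, "a"), (0, 500, "b"), (0, 500, "b2")]
    times, levels, grid, dups = rectilinearize(recs)
    print(times, levels, grid, dups)
```

Cells left as `None` are the missing records that a real collection has to tolerate; a full implementation would also handle further dimensions such as run time and ensemble member.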

GRIB collection indexing [diagram] • GRIB files -> one index file each (name.gbx9), 1000x smaller • Create Collection Index: gbx9 files -> collectionName.ncx (CDM metadata), 1000x smaller • TDS

GRIB time partitioning [diagram] • GRIB files -> gbx9 index files • One ncx index per time partition (1983, 1984, 1985, …) • Partition index Collection.ncx • TDS
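Time partitioning can be pictured as grouping files by a time key and opening only the partitions that a request overlaps. The sketch below assumes yearly partitions, as in the diagram, and a hypothetical list of `(filename, date)` pairs; it is an illustration of the idea, not the TDS partition index format.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(files):
    """Group (filename, date) pairs into {year: [filenames]}."""
    partitions = defaultdict(list)
    for name, day in files:
        partitions[day.year].append(name)
    return dict(partitions)


def partitions_for_range(partitions, start, end):
    """Return the partition keys (years) that overlap [start, end]."""
    return [year for year in sorted(partitions)
            if start.year <= year <= end.year]


if __name__ == "__main__":
    files = [                        # hypothetical file names
        ("a.grb2", date(1983, 1, 1)),
        ("b.grb2", date(1983, 7, 1)),
        ("c.grb2", date(1984, 2, 1)),
        ("d.grb2", date(1985, 3, 1)),
    ]
    parts = partition_by_year(files)
    print(partitions_for_range(parts, date(1984, 1, 1), date(1984, 12, 31)))
```

A request confined to one year touches one partition index, however many years the whole collection spans.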

NCEP GFS half degree • All data for one run in one file • 3.65 Gbytes/run, 4 runs/day, 22 days • Total 321 Gbytes, 88 files • Partition by day (mostly for testing) • Index files • Gbx9: 2.67 Mbytes each • Ncx: 240 Kbytes each • Daily partition indexes : 260K each • Overall index is about 50K (CDM metadata) • Index overhead = grib file sizes / 1000
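The totals on this slide can be checked with a little arithmetic. The sketch below uses only the numbers quoted above and assumes 1 Gbyte = 1000 Mbytes; it compares each GRIB file with its gbx9 index and leaves out the much smaller ncx and partition indexes.

```python
def index_overhead(run_gbytes, runs_per_day, days, gbx9_mbytes):
    """Return (file count, total Gbytes, data-to-index size ratio)."""
    files = runs_per_day * days               # one file per model run
    total_gbytes = run_gbytes * files
    # Assumption: 1 Gbyte = 1000 Mbytes.
    ratio = (run_gbytes * 1000.0) / gbx9_mbytes
    return files, total_gbytes, ratio


if __name__ == "__main__":
    files, total, ratio = index_overhead(3.65, 4, 22, 2.67)
    print(f"{files} files, {total:.0f} Gbytes, data/index ratio about {ratio:.0f}")
```

The ratio comes out somewhat above 1000, consistent with the rule of thumb that index overhead is about one thousandth of the GRIB file size.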

CFSR timeseries data at NCDC • Climate Forecast System Reanalysis • 1979 - 2009 (31 years, 372 months) • Total 5.6 Tbytes, 56K files • analyze one month (198909) • 151 files, approx 15 Gbytes, 15 Mbytes of gbx9 indexes • 101 variables, 721 - 840 time steps • 144600 records, of which 21493 are duplicates (15%) • 1.1M collection index, 60K needs to be read by TDS when opening.

Big Data • cfsr-hpr-ts9 • 9-month (~275-day) runs, 4x / day, at 5-day intervals • run from 1982 to present! • ~22 million files

GRIB - summary • Fast indexing allows you to find the subsets that you want in under a second • Time partitioning should scale up as long as your data is time partitioned • No pixie dust: still have to read the data! • GRIB2 stores compressed horizontal slices, must decompress entire slice to get one value • Experimenting with storing in netcdf-4 • Chunk to get timeseries data at a single point • Still getting the bugs out on changing/updating datasets (4.3.17) • featureCollections will (eventually) solve many of the problems of Aggregations
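The cost of pulling a time series at a single point depends on how the data is chunked, which is the motivation for the netCDF-4 experiments. The sketch below estimates how many chunks must be read, and how many values decompressed, for a `(time, y, x)` array under a given chunk shape. The grid size and chunk shapes are illustrative assumptions, and partial chunks at the edges are counted as full, so the value count is an upper bound.

```python
import math


def point_timeseries_cost(shape, chunk):
    """Estimate the work to read every time step at one (y, x) point.

    shape and chunk are (nt, ny, nx) tuples. Returns
    (chunks_read, values_decompressed); whole chunks are always decompressed.
    """
    nt, _, _ = shape
    ct, cy, cx = chunk
    chunks_read = math.ceil(nt / ct)      # one chunk per block of ct time steps
    values_decompressed = chunks_read * ct * cy * cx
    return chunks_read, values_decompressed


if __name__ == "__main__":
    shape = (840, 361, 720)               # illustrative: 840 steps, half-degree grid
    # GRIB-like layout: each horizontal slice is one compressed unit.
    print(point_timeseries_cost(shape, (1, 361, 720)))
    # Time-oriented chunks: a small horizontal tile spanning all time steps.
    print(point_timeseries_cost(shape, (840, 19, 20)))
```

With one chunk per horizontal slice, every time step costs a full-slice decompression; with chunks that are long in time and small in space, the same request touches far fewer values.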

Discrete Sampling Geometries (aka Point Data) • Conventions added to CF 1.6 • CDM 4.3 has complete implementation • ucar.nc2.ft.point package • TDS 4.3 featureCollection • type = POINT, STATION • Creates a cdmrFeature web service

Discrete Sample Feature Types • point: a collection of data points with no connection in time and space • timeSeries: a series of data points at the same location, with varying time • trajectory: a series of data points along a curve in time and space • profile: a set of data points along a vertical line • timeSeriesProfile: a series of profiles at the same location, with varying time • trajectoryProfile: a set of profiles which originate from points along a trajectory

ucar.nc2.ft.point • Subset by lat/lon box, time range • Iterate over rows (result set) • Not arrays (netCDF classic data model) • Scales to large collections • Allows streaming • Similar to/compatible with RDBMS • Nested tables – hierarchical data model • TODO: arbitrary predicates (filter)
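The row-oriented access pattern described above (subset by bounding box and time range, then iterate over the result set) can be sketched with a generator. This is an illustration in Python and does not reproduce the `ucar.nc2.ft.point` Java API: the `PointObs` class and `subset` function are hypothetical names, and the box test ignores longitude wrap-around at the dateline.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PointObs:
    """One row of point data: a location, a time, and named values."""
    lat: float
    lon: float
    time: datetime
    values: dict = field(default_factory=dict)


def subset(rows, bbox, time_range):
    """Lazily yield rows inside bbox=(south, north, west, east) and time_range.

    Rows are consumed one at a time, so the source can be a stream and the
    full collection never has to fit in memory.
    """
    south, north, west, east = bbox
    start, end = time_range
    for row in rows:
        if (south <= row.lat <= north and west <= row.lon <= east
                and start <= row.time <= end):
            yield row


if __name__ == "__main__":
    rows = [
        PointObs(40.0, -105.0, datetime(2013, 5, 7, 0), {"temp": 12.0}),
        PointObs(10.0, -105.0, datetime(2013, 5, 7, 0), {"temp": 28.0}),
        PointObs(40.0, -105.0, datetime(2013, 6, 1, 0), {"temp": 20.0}),
    ]
    box = (30.0, 50.0, -110.0, -100.0)
    window = (datetime(2013, 5, 1), datetime(2013, 5, 31))
    for obs in subset(rows, box, window):
        print(obs.lat, obs.lon, obs.time, obs.values)
```

An arbitrary-predicate filter, listed as a TODO on the slide, would amount to passing a function in place of the fixed box and time tests.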

ncstream • NetCDF files (almost always) have to be written, then copied to network • Assumes random access, not stream • “read optimized” : data layout is known • ncstream explores what “streaming netcdf” might look like • “write-optimized”: append only • Efficient conversion to netCDF files on the client • Ncstream data model == CDM data model • Binary encoding using Google's Protobuf • Binary object serialization, cross language, transport neutral, extensible • Very fast: some tests show >10x OPeNDAP • Have experimental versions in CDM and TDS since 4.1
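The "write-optimized, append only" idea can be illustrated with a stream of length-prefixed messages: the writer only ever appends, and the reader consumes messages in order without seeking. This is a sketch of the general technique and not the ncstream wire format; the fixed 4-byte length prefix and the opaque byte payloads are assumptions standing in for the Protobuf-encoded messages that ncstream uses.

```python
import io
import struct


def append_message(stream, payload: bytes) -> None:
    """Append one message: a 4-byte big-endian length, then the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_messages(stream):
    """Yield payloads in the order written, reading strictly forward."""
    while True:
        header = stream.read(4)
        if not header:
            return                          # clean end of stream
        if len(header) < 4:
            raise EOFError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated message")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    append_message(buf, b"header: variables and dimensions")
    append_message(buf, b"data: first block")
    append_message(buf, b"data: second block")
    buf.seek(0)
    for msg in read_messages(buf):
        print(msg)
```

Because each message carries its own length, a client can start processing before the writer has finished, which a random-access file layout does not allow.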

cdmRemote • Replacement for OPeNDAP 2.0 that can handle the full CDM data model • In 4.3, CDM/TDS uses cdmRemote in preference to OPeNDAP, for remote access to CDM datasets • User can configure this • Index based – just like netCDF • Not currently promoting outside of CDM/TDS stack

cdmrFeature • TDS 4.3 web service for coordinate based subsetting • REST API • Harmonize / merge with NCSS (4.4 Oct 2013) • Extend to all CF-DSG feature types (4.5 Dec 2013) • Intended to be used on collections of DSG (point) data • Needs time partition date in the filename • Output • netCDF-3/CF • XML, CSV • ncstream for CDM clients • Clients • HTML form, like NCSS • CDM / ToolsUI • Python scripts

featureCollection configuration
<featureCollection name="Metar Station Data" featureType="Station" path="urlpath/station/data">
  <collection spec="Q:/cdmUnitTest/ft/station/metar/.*nc$" dateFormatMark="#Surface_METAR_#yyyyMMdd_HHmm"/>
  <update startup="true" rescan="0 5 3 * * ? *" trigger="allow"/>
  <pointConfig datasetTypes="cdmrFeature Files"/>
</featureCollection>
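The `dateFormatMark` attribute is what lets the collection extract a date from each filename, which the time partitioning relies on. The sketch below shows one simplified reading of it: the text between the `#` marks is located in the filename, and the pattern after the second `#` is applied to the characters that follow. The function name, the example filename, and the partial mapping from Java date fields to `strptime` codes are assumptions for illustration; this is not the TDS parser.

```python
import re
from datetime import datetime

# Partial, hypothetical mapping from Java date-pattern fields to strptime codes.
_JAVA_TO_STRPTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M"}


def date_from_filename(filename, date_format_mark):
    """Extract a datetime from filename using a '#literal#pattern' mark."""
    _, literal, java_pattern = date_format_mark.split("#")
    # The date text starts right after the literal marker in the filename.
    start = filename.index(literal) + len(literal)
    text = filename[start:start + len(java_pattern)]
    # Translate each Java field in a single pass so replacements cannot collide.
    fmt = re.sub("yyyy|MM|dd|HH|mm",
                 lambda m: _JAVA_TO_STRPTIME[m.group()], java_pattern)
    return datetime.strptime(text, fmt)


if __name__ == "__main__":
    mark = "#Surface_METAR_#yyyyMMdd_HHmm"
    # Assumed filename shape, inferred from the mark above.
    print(date_from_filename("Surface_METAR_20130507_0000.nc", mark))
```

The sketch relies on each pattern field having the same width as the digits it matches, which holds for the fixed-width fields used here.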

Why not OPeNDAP? • Reasonable data model using sequences • Incomplete coordinate system data model • can't make requests in lat/lon or time in a standard way • Server side processing not standardized • Client can't discover what's possible • Semantics not standardized • Waiting for DAP4 • See what we have when that’s ready
