THREDDS Data Server. Unidata’s Common Data Model. Background / Summary. John Caron, Unidata/UCAR, Mar 2007.
THREDDS Data Server
[Diagram: Application; HTTP Tomcat Server (motherlode.ucar.edu); catalog.xml; THREDDS Server with services WCS, OPeNDAP, HTTPServer, NetcdfSubset; NetCDF-Java library; Datasets; IDD Data]
THREDDS Catalogs • XML over HTTP • Hierarchical listing of online resources (datasets) • Container for arbitrary search metadata • Standard set maps to DC, GCMD, ADN • Unidata/CDP • Metadata can be inherited • Design goal: Make it easy for data providers • TDS uses for configuration • Client view vs. server view • Data Access URLs • “Crossing the protocol boundary”
THREDDS WCS 1.0 Server • Each (gridded) Dataset is WCS • Each Grid is a Coverage • Return formats • GeoTIFF: floating point, greyscale • NetCDF / CF-1.0 (same as NetcdfSubset Service) • No reprojections, resampling • GALEON 2 • upgrade to WCS 1.1 • Try returning point datasets
THREDDS OPeNDAP Server • Current version 2.0; NASA ESE standard • Working on new 4.0 protocol spec • Based on Java-OPeNDAP library • shared development by Unidata/opendap.org • Any CDM dataset can be served • Server4 (Hyrax): • latest version of opendap.org C++ library • uses THREDDS catalog generation code • THREDDS Catalogs replace dods_dir
Common Data Model
[Diagram: the same server stack (Application; HTTP Tomcat Server on hostname.edu; catalog.xml; THREDDS Server with WCS, OPeNDAP, HTTPServer, NetcdfSubset; NetCDF-Java library; Datasets; IDD Data), annotated “Then a miracle happens”]
NetCDF-Java version 2.2 architecture
[Diagram components: Application; THREDDS Catalog.xml; Scientific Datatypes; Datatype Adapter; NetcdfDataset; CoordSystem Builder; NcML; NetcdfFile; ADDE; OPeNDAP; I/O service provider; formats: NetCDF-3, NetCDF-4, HDF5, GRIB, GINI, NIDS, Nexrad, DMSP, …]
I/O Service Provider Implementations • General: NetCDF, HDF5, OPeNDAP • Gridded: GRIB-1, GRIB-2 • Radar: NEXRAD level 2 and 3, DORADE, Chinese NEXRAD • Point: BUFR, ASCII • Satellite: DMSP, GINI, McIDAS AREA • In development / tentative • NOAA CLASS legacy files • Barrodale DataBlade
Common Data Model Layers
[Diagram layers: Scientific Datatypes (Point, Trajectory, Station, Profile, Radial, Grid, Swath); Coordinate Systems; Data Access]
NetCDF-4 and Common Data Model (Data Access Layer)
NetCDF-4 C library • 4.0 Beta implements CDM access layer • complete, but waiting for HDF5 release 1.8 to finalize file format (Maybe this month, 1.5 years late!) • Persistence format for complete CDM • 4.1: adding Coordinate Systems • Optional layer, focus on CF-1 (libcf) • 4.?: merge OPeNDAP access (pending funding)
NcML: NetCDF Markup Language XML representation of netCDF metadata • Core: netCDF data access model • Coordinate System: general and georeferencing coordinate system • Dataset: redefine, aggregate, subset Luca Cinquini (NCAR/SCD/ESG), John Caron, Ethan Davis, Bob Drach (LLNL), Stefano Nativi (Florence), Russ Rew
NcML • NcML Coordinate Systems further developed into NcML-G by Stefano et al. • NcML Core and Dataset combined into single schema to allow dataset modification • Aggregation: • Union • Syntactic join on (existing or new) outer dimension • Semantic aggregation of (runtime, forecast time) = Forecast Model Run Collection
NcML example
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2"
        location="/data/nids/N0R_20041119_2147">
  <attribute name="cdm_datatype" value="Radial" />
  <remove type="attribute" name="password" />
  <variable name="Reflectivity" orgName="R34768">
    <attribute name="units" value="dBZ" />
  </variable>
</netcdf>
TDS / NcML example
<datasetScan name="Ocean Satellite Data" path="ocean/sat"
             dirLocation="R:/tds/netcdf/">
  <netcdf>
    <attribute name="Conventions" value="CF-1.0"/>
  </netcdf>
</datasetScan>
TDS / NcML aggregation
<dataset name="WEST-CONUS_4km Aggregation"
         urlPath="satellite/3.9/WEST-CONUS_4km">
  <netcdf>
    <aggregation dimName="time" type="joinNew">
      <scan location="/data/ldm/pub/satellite/3.9/WEST-CONUS_4km/"
            suffix=".gini" />
    </aggregation>
  </netcdf>
</dataset>
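The effect of a joinNew aggregation can be pictured with plain arrays: each scanned file contributes one 2D field, and the aggregation stacks them along a new outer "time" dimension. The following is a minimal sketch under that assumption only. The class `JoinNewSketch` and its method are hypothetical illustrations, not part of NetCDF-Java or the TDS, and the real aggregation also creates the time coordinate variable, which is omitted here.

```java
// Hypothetical sketch of joinNew semantics; not the NetCDF-Java implementation.
final class JoinNewSketch {

    // Stack equally shaped 2D fields (one per file) along a new outer dimension.
    static float[][][] joinNew(float[][]... files) {
        if (files.length == 0) {
            throw new IllegalArgumentException("no files to aggregate");
        }
        int ny = files[0].length;
        int nx = (ny == 0) ? 0 : files[0][0].length;
        float[][][] out = new float[files.length][][];
        for (int t = 0; t < files.length; t++) {
            if (files[t].length != ny) {
                throw new IllegalArgumentException("file " + t + " has a different y size");
            }
            out[t] = new float[ny][];
            for (int y = 0; y < ny; y++) {
                if (files[t][y].length != nx) {
                    throw new IllegalArgumentException("file " + t + " has a different x size");
                }
                out[t][y] = files[t][y].clone(); // copy so the inputs stay untouched
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] first = {{1f, 2f}, {3f, 4f}};
        float[][] second = {{5f, 6f}, {7f, 8f}};
        float[][][] aggregated = joinNew(first, second);
        System.out.println("new outer dimension length: " + aggregated.length);
    }
}
```

The shape check mirrors the requirement that all files in the scan be homogeneous; anything beyond that (ordering by file name or date, caching) is left out of the sketch.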
Datasets vs. Files • Must hide actual location of data files on your server • Would like to hide actual file format • Must encapsulate collections of files into logical datasets • Homogeneous metadata • Hide arbitrary storage decisions • Minimize number of datasets
Data Model: Sampled Functions
Our phenomena are continuous functions:
$F: \mathrm{Domain} \to \mathrm{Range}$
where Domain = subset of space-time (3 spatial, time) ($E^4$)
Range = $\mathbb{R}^n$ (product set of real numbers)
Our measurements are sampled functions
Domain is a point subset = $\{p,\ p \in E^4\}$
$M: E^4 \to \mathbb{R}^n$
Variables
Variable is a container for an Array of values
  dimensions:
    lat = 64;
    lon = 128;
  variables:
    float temperature(lat, lon);
Domain is a set of points in Index space:
Temperature: $\{[0..63] \times [0..127]\} \to \mathbb{R}$
Temperature: $I^2 \to \mathbb{R}$
Variable: $I^m \to \mathbb{R}^n$
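A variable viewed as a function on index space can be sketched in a few lines. The class below is a hypothetical illustration, not the NetCDF-Java `Variable` or `Array` API: it stores the values in a flat array and maps an index tuple to an offset, assuming row-major order (last dimension varying fastest), which is how netCDF classic files lay out data.

```java
// Hypothetical sketch: a variable as a function I^m -> R over a flat array.
final class IndexSpaceVariable {
    private final int[] shape;   // e.g. {64, 128} for (lat, lon)
    private final float[] data;  // row-major storage, last dimension varies fastest

    IndexSpaceVariable(int[] shape) {
        this.shape = shape.clone();
        int size = 1;
        for (int n : shape) {
            size *= n;
        }
        this.data = new float[size];
    }

    // Map an index tuple such as (lat, lon) to its offset in the flat array.
    int offset(int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("rank mismatch");
        }
        int off = 0;
        for (int d = 0; d < shape.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + d);
            }
            off = off * shape[d] + index[d];
        }
        return off;
    }

    float get(int... index) {
        return data[offset(index)];
    }

    void set(float value, int... index) {
        data[offset(index)] = value;
    }

    public static void main(String[] args) {
        // Matches the slide: float temperature(lat, lon) with lat = 64, lon = 128.
        IndexSpaceVariable temperature = new IndexSpaceVariable(new int[] {64, 128});
        temperature.set(281.5f, 10, 20);
        System.out.println(temperature.get(10, 20));
    }
}
```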
Coordinate Systems
Coordinate Axis: $I^m \to \mathbb{R}$
{Axis} = Coordinate System: $I^m \to E^4$
$V: I^m \to \mathbb{R}^n$
$CS: I^m \to E^4$
$V \circ CS^{-1}: E^4 \to \mathbb{R}^n$
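The composition of V with the inverse coordinate system can be illustrated for the simplest case: a 2D variable whose lat and lon axes are separate 1D arrays. The sketch below assumes nearest-neighbour lookup as the way to invert each axis, which is a choice made here for illustration and not something the slides specify. `CoordinateLookup` and its methods are hypothetical names, not NetCDF-Java calls.

```java
// Hypothetical sketch of V o CS^-1 for a 2D variable with separable 1D axes.
final class CoordinateLookup {

    // Invert one coordinate axis: index of the axis value nearest to coord.
    // Ties go to the lower index.
    static int nearestIndex(double[] axis, double coord) {
        int best = 0;
        for (int i = 1; i < axis.length; i++) {
            if (Math.abs(axis[i] - coord) < Math.abs(axis[best] - coord)) {
                best = i;
            }
        }
        return best;
    }

    // (lat, lon) in coordinate space -> (i, j) in index space -> value.
    static float valueAt(float[][] v, double[] latAxis, double[] lonAxis,
                         double lat, double lon) {
        int i = nearestIndex(latAxis, lat);
        int j = nearestIndex(lonAxis, lon);
        return v[i][j];
    }

    public static void main(String[] args) {
        double[] lat = {10.0, 20.0, 30.0};
        double[] lon = {100.0, 110.0};
        float[][] temperature = {{1f, 2f}, {3f, 4f}, {5f, 6f}};
        System.out.println(valueAt(temperature, lat, lon, 21.0, 108.0));
    }
}
```

For 2D coordinate variables such as lat(y, x) and lon(y, x), the inverse is no longer a per-axis search, which is one reason the specialized datatype APIs exist.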
Scientific Data Types • Trying to go beyond index-space subsetting • Trying to satisfy $V \circ CS^{-1}: E^4 \to \mathbb{R}^n$ • I.e. support subsetting using Space, Time “queries” • Based on datasets Unidata is familiar with • APIs are evolving • Intended to scale to large, multifile collections • Corresponding “standard” NetCDF file format conventions
Implementations
[Table residue. Datatype: Grid, PointObs, RadialSweep, Swath. Dataset: GridDataset, FMRCDataset, CollectionOfPointObs, StationCollectionOfPointObs, StationCollectionOfRadialSweep]
Conclusions • CDM is our implementation data model • Map to data access models such as OGC • Current work is to serve collections instead of individual files. • Dataset is desired level of granularity • Scientific data types are implementations with specialized access
Datatype Collection • GridDataset collection of GridDatatype
Gridded Datatype
• Cartesian coordinates
• All dimensions are connected
• horizontal: lat, lon or projection x, y
• time(time) orthogonal 1D
• separable: (x, y) × time × z
  float gridData(t, z, y, x);
  float time(t);
  float y(y);
  float x(x);
  float lat(y, x);
  float lon(y, x);
  float z(z);
  float height(t, z, y, x);
GridDatatype methods
  CoordinateAxis getTaxis();
  CoordinateAxis getXaxis();
  CoordinateAxis getYaxis();
  CoordinateAxis getZaxis();
  Projection getProjection();
  int[] findXYindexFromCoord(double x_coord, double y_coord);
  LatLonRect getLatLonBoundingBox();
  Array getDataSlice(Range[] …)
  GridDatatype makeSubset(Range[] …)
Radial Data
• Polar coordinates
• All dimensions are connected
• No separate time dimension
  radialData(radial, gate):
    distance(gate)
    azimuth(radial)
    elevation(radial)
    time(radial)
Swath
• lat/lon coordinates
• no separate time dimension
• all dimensions are connected
  swathData(line, cell)
  lat(line, cell)
  lon(line, cell)
  time(line)
  z(line, cell) ??
Unstructured Grid
• Pt dimension not connected
• Looks the same as point data
• Need to specify the connectivity explicitly
  float unstructGrid(t, z, pt);
  float lat(pt);
  float lon(pt);
  float time(t);
  float height(z);
Point Observation Data
• Set of measurements at the same point in space and time
• Point dimension not connected
  float obs1(pt);
  float obs2(pt);
  float lat(pt);
  float lon(pt);
  float z(pt);
  float time(pt);

  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(pt);
PointObsDataset Methods
  // Iterator<StructureData>
  Iterator getData(LatLonRect boundingBox, Date start, Date end);
Time series Station Data
  Structure {
    name;
    lat, lon, z;
    Structure {
      time;
      v1, v2, ...
    } obs(*);   // connected
  } stn(stn);   // not connected
StationObs Methods
  // List<Station>
  List getStations(LatLonRect boundingBox);
  // Iterator<StructureData>
  Iterator getData(Station s, Date start, Date end);
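The two-step access pattern in these methods (select stations by bounding box, then pull one station's observations for a time range) can be sketched with plain collections. Everything below is a hypothetical stand-in: `Station` and `Obs` are simplified local classes, the bounding box is passed as four numbers instead of a `LatLonRect`, times are epoch milliseconds instead of `Date`, and boxes that cross the dateline are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the StationObs query pattern; not the NetCDF-Java API.
final class StationQuerySketch {

    static final class Station {
        final String name;
        final double lat;
        final double lon;

        Station(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    static final class Obs {
        final Station station;
        final long time;     // epoch milliseconds
        final double value;

        Obs(Station station, long time, double value) {
            this.station = station;
            this.time = time;
            this.value = value;
        }
    }

    // Stations inside [latMin, latMax] x [lonMin, lonMax], bounds inclusive.
    static List<Station> getStations(List<Station> all, double latMin, double latMax,
                                     double lonMin, double lonMax) {
        List<Station> hits = new ArrayList<>();
        for (Station s : all) {
            if (s.lat >= latMin && s.lat <= latMax && s.lon >= lonMin && s.lon <= lonMax) {
                hits.add(s);
            }
        }
        return hits;
    }

    // Observations of one station with start <= time <= end.
    static List<Obs> getData(List<Obs> all, Station s, long start, long end) {
        List<Obs> hits = new ArrayList<>();
        for (Obs o : all) {
            if (o.station == s && o.time >= start && o.time <= end) {
                hits.add(o);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Station den = new Station("DEN", 39.8, -104.7);
        Station sfo = new Station("SFO", 37.6, -122.4);
        List<Station> inBox = getStations(Arrays.asList(den, sfo), 35.0, 45.0, -110.0, -100.0);
        System.out.println(inBox.size() + " station(s) in the box");
    }
}
```

A linear scan is enough to show the query semantics; a server handling large collections would presumably index stations and observations rather than filter lists.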
Trajectory Data
• pt dimension is connected
• Collection dimension not connected
  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(pt);      // connected

  Structure {
    name;
    Structure {
      lat, lon, z, time;
      v1, v2, ...
    } obs(*);     // connected
  } traj(traj)    // not connected
Profiler/Sounding Station Data
  Structure {
    name;
    lat, lon, time;
    Structure {
      z;
      v1, v2, ...
    } obs(*);     // connected
  } loc(nloc);    // not connected

  Structure {
    name;
    lat, lon;
    Structure {
      time;
      Structure {
        z;
        v1, v2, ...
      } obs(*);   // connected
    } time(*);    // connected
  } stn(stn);     // not connected
Data Types Summary • Data access through a standard API • Convenient georeferencing • Specialized subsetting methods • Efficiency for large datasets
CDM Payoff: N + M instead of N * M things on your TODO List!
[Diagram: File Format #1, File Format #2, File Format #N; Visualization & Analysis, NetCDF file, OPeNDAP Server, WCS Service, Web Service]
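The N + M argument is the usual one for a common intermediate model: each of the N formats needs one reader into the model, and each of the M outputs needs one adapter out of it, so any reader pairs with any output. A minimal sketch, with hypothetical `Dataset`, `Reader` and `Service` types standing in for the CDM and its clients:

```java
// Hypothetical sketch of the N + M argument; the types are illustrative only.
final class CommonModelSketch {

    // Stand-in for the common data model: a named array of values.
    static final class Dataset {
        final String name;
        final double[] values;

        Dataset(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    // One implementation per file format (N of these).
    interface Reader {
        Dataset read(String location);
    }

    // One implementation per output or service (M of these).
    interface Service {
        String serve(Dataset dataset);
    }

    // Work items when every format is wired directly to every output.
    static int pairwiseConverters(int formats, int outputs) {
        return formats * outputs;
    }

    // Work items when everything goes through the common model.
    static int commonModelAdapters(int formats, int outputs) {
        return formats + outputs;
    }

    public static void main(String[] args) {
        Reader reader = location -> new Dataset(location, new double[] {1.0, 2.0});
        Service summary = dataset -> dataset.name + " has " + dataset.values.length + " values";
        System.out.println(summary.serve(reader.read("example")));
        System.out.println(commonModelAdapters(10, 5) + " adapters instead of "
                + pairwiseConverters(10, 5));
    }
}
```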
Next: DataType Aggregation • Work at the CDM DataType level, know (some) data semantics • Forecast Model Collection • Combine multiple model forecasts into single dataset with two time dimensions • With NOAA/IOOS (Steve Hankin) • Point/Station/Trajectory/Profile Data • Allow space/time queries, return nested sequences • Start from / standardize “Dapper conventions”
Forecast Model Collections
Coordinate Systems: implicit/explicit • NetCDF, OPeNDAP, HDF data models do not have explicit coordinate systems • so georeferencing not part of API • Need conventions to specify (eg CF-1, COARDS, etc) • GRIB, HDF-EOS (eg) are explicit • But no uniform API
NetCDF-4 C Library
[Diagram: netCDF-3 Interface; netCDF-4 Library; HDF5 Library]
Conclusion • Standardized Data Access in good shape • HDF5, NetCDF, OPeNDAP • Write an IOSP for proprietary formats (Java) • But that’s not good enough! • To do: • Standard representations of coordinate systems • Classifications of data types, standard services for them