Unidata’s Common Data Model and the THREDDS Data Server. John Caron, Unidata/UCAR, Boulder CO. Jan 6, 2006, ESIP Winter 2006
Outline • Definitions • Creating a Common Data (Access) Model from NetCDF, HDF5, OPeNDAP • CDM Coordinate Systems, Data Types • CDM implementation • NetCDF Markup Language (NcML) • The THREDDS Data Server
NetCDF-3 • Machine- and OS-independent file format for “self-describing” scientific data • C library (with Fortran, C++, Perl, IDL, MATLAB, Python, and Ruby interfaces) and a Java library • Efficient subsetting of multidimensional arrays • > 20,000 downloads last year
HDF5 • Machine- and OS-independent file format for “self-describing” scientific data • C library (with Fortran and Java interfaces; PyTables for Python) • Evolved from HDF4, but a different format and data model • HDF-EOS, HDF5-EOS: standard formats for EOSDIS, ASCI, NPOESS • Parallel I/O, chunked storage, compression filters, many data types • Developed at NCSA, now independent
NetCDF-4 • Project funded by NASA to create new version of netCDF using the HDF5 file format. • “Extend and merge” netCDF and HDF5 • Widespread use and simplicity of netCDF • Generality and performance of HDF5
NetCDF-Java 2.2 (nj22) • 100% Java library • Prototype implementation of CDM • File formats: • General: NetCDF, HDF5, OPeNDAP • Grids: GRIB1, GRIB2 • Radar: NEXRAD, NIDS, DORADE • Satellite: DMSP, GINI • Access to THREDDS catalogs
OPeNDAP • Client-server protocol for scientific data access • C++ client and server, Java client and server libraries. • Current version 2.0; NASA ESE standard • Working on new 4.0 protocol spec
THREDDS • Originally funded by NSDL • “discovery and use of scientific data” • Middleware between data providers and users • Dataset Inventory Catalogs (XML) • Now part of Unidata core funding • Data Serving (pull)
What’s a Data Model? • It’s about scientific data: storing and accessing it • It’s an abstraction • Equivalent to an abstract object model in OOP • An Abstract Data Model describes data objects and the methods you can use on them
What’s a Data Model? (continued) • An API is the interface to the Data Model for a specific programming language • A file format is a way to persist the objects in the Data Model • A data access protocol plays the role of a file format • The Abstract Data Model abstracts away the details of any particular API and persistence format
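To make the split between abstract model, API, and persistence format concrete, here is a hedged Java sketch: one interface plays the abstract data model, two invented storage "formats" sit behind it, and client code is written only against the interface. ArrayDataset, InMemoryDataset, TextDataset, and the "name: v1, v2" text layout are all hypothetical, chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstract data model: a dataset is a set of named 1-D numeric arrays.
interface ArrayDataset {
    double[] read(String varName);
}

// Persistence "format" 1 (hypothetical): values held in an in-memory map.
final class InMemoryDataset implements ArrayDataset {
    private final Map<String, double[]> vars = new HashMap<>();

    InMemoryDataset put(String name, double... values) {
        vars.put(name, values.clone());
        return this;
    }

    @Override
    public double[] read(String varName) {
        double[] v = vars.get(varName);
        if (v == null) throw new IllegalArgumentException("no variable " + varName);
        return v.clone();
    }
}

// Persistence "format" 2 (hypothetical): text lines of the form "name: v1, v2, ...".
final class TextDataset implements ArrayDataset {
    private final InMemoryDataset parsed = new InMemoryDataset();

    TextDataset(String text) {
        for (String line : text.split("\n")) {
            if (line.isBlank()) continue;
            String[] parts = line.split(":", 2);
            double[] values = Arrays.stream(parts[1].split(","))
                    .map(String::trim)
                    .mapToDouble(Double::parseDouble)
                    .toArray();
            parsed.put(parts[0].trim(), values);
        }
    }

    @Override
    public double[] read(String varName) { return parsed.read(varName); }
}

// Client code depends only on the abstract model, never on a specific format.
final class ModelClient {
    static double mean(ArrayDataset ds, String varName) {
        return Arrays.stream(ds.read(varName)).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        ArrayDataset a = new InMemoryDataset().put("temp", 1.0, 2.0, 3.0);
        ArrayDataset b = new TextDataset("temp: 1.0, 2.0, 3.0\n");
        System.out.println(mean(a, "temp") + " " + mean(b, "temp"));
    }
}
```

Swapping the storage behind `ArrayDataset` does not touch `ModelClient`, which is the point of keeping the abstract model separate from any one format.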
Creating a Common Data Access Model from NetCDF, HDF5, OPeNDAP
NetCDF-3 Data Model
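The NetCDF-3 Data Model slide was a diagram that did not survive extraction. As a rough textual stand-in, here is a minimal Java sketch of the classic model: a dataset holds shared dimensions (at most one unlimited), variables whose shapes are ordered lists of those dimensions, and attributes. The record and class names are hypothetical and are not the netCDF-Java API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the netCDF-3 classic data model; not netCDF-Java's API.
enum NcType { BYTE, CHAR, SHORT, INT, FLOAT, DOUBLE }

record Dimension(String name, int length, boolean unlimited) {}

// An attribute holds a name, a type, and a scalar or 1-D list of values.
record Attribute(String name, NcType type, List<?> values) {}

// A variable's shape is an ordered list of shared dimensions.
record Variable(String name, NcType type, List<Dimension> shape, List<Attribute> attributes) {
    // Total number of values; the unlimited (record) dimension uses numRecords.
    long size(int numRecords) {
        long n = 1;
        for (Dimension d : shape) n *= d.unlimited() ? numRecords : d.length();
        return n;
    }
}

final class ClassicDataset {
    final List<Dimension> dimensions = new ArrayList<>();
    final List<Variable> variables = new ArrayList<>();
    final List<Attribute> globalAttributes = new ArrayList<>();

    void addDimension(Dimension d) {
        // The classic format allows at most one unlimited dimension.
        if (d.unlimited() && dimensions.stream().anyMatch(Dimension::unlimited))
            throw new IllegalArgumentException("only one unlimited dimension allowed");
        dimensions.add(d);
    }

    void addVariable(Variable v) {
        // If a variable uses the unlimited dimension, it must be the first (outermost) one.
        for (int i = 1; i < v.shape().size(); i++)
            if (v.shape().get(i).unlimited())
                throw new IllegalArgumentException("unlimited dimension must come first");
        variables.add(v);
    }
}

final class ClassicDemo {
    public static void main(String[] args) {
        ClassicDataset ds = new ClassicDataset();
        Dimension time = new Dimension("time", 0, true);
        Dimension lat = new Dimension("lat", 10, false);
        Dimension lon = new Dimension("lon", 20, false);
        ds.addDimension(time);
        ds.addDimension(lat);
        ds.addDimension(lon);
        Variable t = new Variable("T", NcType.FLOAT, List.of(time, lat, lon),
                List.of(new Attribute("units", NcType.CHAR, List.of("K"))));
        ds.addVariable(t);
        System.out.println(t.size(5));
    }
}
```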
HDF5 Data Model
[Figure: Common Data Model Layers. Scientific Datatypes (Point, Trajectory, Station, Radial, Grid, Swath) on top of Coordinate Systems, on top of Data Access]
Coordinate Systems needed • NetCDF, OPeNDAP, and HDF data models do not have integrated coordinate systems • so georeferencing is not part of their APIs • Conventions are needed to specify it (e.g., CF-1, COARDS) • Contrast GRIB, HDF-EOS, and other specialized formats, which build georeferencing in • Must be done in a general way
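As an example of what "conventions" means in practice, here is a minimal Java sketch of two COARDS/CF-style rules: a coordinate variable is a 1-D variable with the same name as its dimension, and its "units" attribute hints at the axis type. VarInfo, GuessedAxis, and ConventionsSketch are hypothetical names, and only a small subset of the CF rules is covered.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical minimal view of a variable: its name, dimension names, and string attributes.
record VarInfo(String name, List<String> dims, Map<String, String> attrs) {}

enum GuessedAxis { LAT, LON, TIME, OTHER }

final class ConventionsSketch {
    // COARDS/CF: a coordinate variable is 1-D and has the same name as its dimension.
    static boolean isCoordinateVariable(VarInfo v) {
        return v.dims().size() == 1 && v.dims().get(0).equals(v.name());
    }

    // Guess the axis from the "units" attribute. Only a small subset of the
    // CF rules is covered here (CF also accepts variants such as "degree_north").
    static GuessedAxis guessAxis(VarInfo v) {
        String units = v.attrs().getOrDefault("units", "").trim().toLowerCase(Locale.ROOT);
        if (units.equals("degrees_north")) return GuessedAxis.LAT;
        if (units.equals("degrees_east")) return GuessedAxis.LON;
        if (units.contains(" since ")) return GuessedAxis.TIME; // e.g. "hours since 2006-01-06"
        return GuessedAxis.OTHER;
    }
}
```

A general coordinate system layer would apply rules like these for each supported convention instead of hard-coding one.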
Coordinate Systems • Same underlying mathematics as VisAD, ASCII
Scientific Datatypes • Based on datasets Unidata is familiar with • APIs are evolving • How are data points connected? • Intended to scale to large, multi-file collections • Intended to support “specialized queries” • Space, Time • Corresponding “standard” netCDF file conventions
PointObsDataset Methods
```java
// returns a Collection of StructureData: the observations inside
// boundingBox and between start and end
Collection getData(LatLonRect boundingBox, Date start, Date end);
```
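The point-observation query combines a spatial and a temporal filter. Here is a minimal in-memory Java sketch of that query, assuming simplified hypothetical types: LatLonBox stands in for nj22's LatLonRect and the PointOb record for StructureData. Longitude wrap-around at the dateline is ignored.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.List;

// Hypothetical stand-in for LatLonRect; no dateline wrap-around handling.
record LatLonBox(double latMin, double latMax, double lonMin, double lonMax) {
    boolean contains(double lat, double lon) {
        return lat >= latMin && lat <= latMax && lon >= lonMin && lon <= lonMax;
    }
}

// Hypothetical stand-in for one observation's StructureData.
record PointOb(double lat, double lon, Date time, double value) {}

final class PointObsSketch {
    private final List<PointOb> obs;

    PointObsSketch(List<PointOb> obs) { this.obs = List.copyOf(obs); }

    // Same shape of query as getData(boundingBox, start, end):
    // keep observations inside the box and within [start, end].
    Collection<PointOb> getData(LatLonBox box, Date start, Date end) {
        List<PointOb> result = new ArrayList<>();
        for (PointOb ob : obs) {
            boolean inTime = !ob.time().before(start) && !ob.time().after(end);
            if (inTime && box.contains(ob.lat(), ob.lon())) result.add(ob);
        }
        return result;
    }
}
```

This linear scan only shows the query semantics; to "scale to large, multifile collections" a real implementation would need some spatial or time index.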
TrajectoryObs Methods
```java
int getNumPoints();
StructureData getData(int point);
```
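The trajectory interface is just "how many points" plus "get point i". Here is a hedged Java sketch of an array-backed implementation and a client loop, assuming a hypothetical TrajPoint record in place of StructureData and assuming points are stored in time order (the slide does not say how points are ordered).

```java
// Hypothetical record standing in for nj22's StructureData (one trajectory point).
record TrajPoint(double lat, double lon, double altitude, long timeMillis) {}

// The slide's two methods, re-declared with the simplified point type.
interface SimpleTrajectory {
    int getNumPoints();
    TrajPoint getData(int point);
}

final class ArrayTrajectory implements SimpleTrajectory {
    private final TrajPoint[] points;

    ArrayTrajectory(TrajPoint... points) { this.points = points.clone(); }

    @Override public int getNumPoints() { return points.length; }

    @Override public TrajPoint getData(int point) { return points[point]; }

    // Example client loop over all points, assumed to be stored in time order.
    static double maxAltitude(SimpleTrajectory traj) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < traj.getNumPoints(); i++)
            max = Math.max(max, traj.getData(i).altitude());
        return max;
    }
}
```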
StationObs Methods
```java
// return List of Station
List getStations();

// return List of StructureData
List getData(Station s, Date start, Date end);
```
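Station data are naturally grouped by station, then queried by time. Here is a minimal Java sketch of that organization, assuming hypothetical Station and StationOb records (nj22's Station and StructureData carry much more).

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified types, for illustration only.
record Station(String id, double lat, double lon) {}
record StationOb(Station station, Date time, double value) {}

final class StationObsSketch {
    // Observations grouped by station, stations kept in insertion order.
    private final Map<Station, List<StationOb>> byStation = new LinkedHashMap<>();

    void add(StationOb ob) {
        byStation.computeIfAbsent(ob.station(), s -> new ArrayList<>()).add(ob);
    }

    List<Station> getStations() { return new ArrayList<>(byStation.keySet()); }

    // Observations for one station within [start, end].
    List<StationOb> getData(Station s, Date start, Date end) {
        List<StationOb> result = new ArrayList<>();
        for (StationOb ob : byStation.getOrDefault(s, List.of()))
            if (!ob.time().before(start) && !ob.time().after(end)) result.add(ob);
        return result;
    }
}
```

Grouping by station first makes the per-station time query touch only that station's records.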
Radial methods
```java
interface Radial {
  int getNumGates();
  float getData(int gate);
  float getStartingGate();
  float getGateSize();
  float getElevation();
  float getAzimuth();
  double getTime();
}
```
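The Radial methods are enough to place each gate in space. Here is a hedged Java sketch, with inputs that correspond to getStartingGate(), getGateSize(), getElevation(), and getAzimuth(). Assumptions not stated on the slide: distances in meters, angles in degrees, azimuth clockwise from north, and a flat earth with straight-line beam propagation. Real radar software usually corrects for earth curvature and refraction (for example with a 4/3 effective earth radius).

```java
// Hypothetical helper; units and angle conventions are assumptions (see above).
final class RadialGeometry {
    // Distance along the beam to the given gate.
    static double gateRange(float startingGate, float gateSize, int gate) {
        return startingGate + (double) gate * gateSize;
    }

    // Returns {east, north, height} of the gate relative to the radar, flat-earth model.
    static double[] gateLocation(float startingGate, float gateSize, int gate,
                                 float elevationDeg, float azimuthDeg) {
        double r = gateRange(startingGate, gateSize, gate);
        double el = Math.toRadians(elevationDeg);
        double az = Math.toRadians(azimuthDeg);
        double ground = r * Math.cos(el); // horizontal distance from the radar
        return new double[] { ground * Math.sin(az), ground * Math.cos(az), r * Math.sin(el) };
    }

    public static void main(String[] args) {
        double[] p = gateLocation(2000f, 250f, 4, 0.5f, 90f);
        System.out.printf("east=%.1f north=%.1f height=%.1f%n", p[0], p[1], p[2]);
    }
}
```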
Grid methods
```java
interface GridCoordSys {
  CoordinateAxis getTaxis();
  CoordinateAxis getXaxis();
  CoordinateAxis getYaxis();
  CoordinateAxis getZaxis();
  Projection getProjection();
}

Array getDataCube(Range time, Range z, Range y, Range x);
```
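To illustrate what a getDataCube(time, z, y, x) subset involves, here is a minimal Java sketch. It assumes, hypothetically, that the whole grid sits in one row-major (C-order) float array of shape [nt][nz][ny][nx]. IndexRange is an invented stand-in for nj22's Range, limited to first..last inclusive with stride 1.

```java
// Hypothetical stand-in for Range: first..last inclusive, stride 1.
record IndexRange(int first, int last) {
    IndexRange {
        if (first < 0 || last < first) throw new IllegalArgumentException("bad range");
    }
    int length() { return last - first + 1; }
}

final class GridSubset {
    static float[] getDataCube(float[] data, int[] shape,
                               IndexRange t, IndexRange z, IndexRange y, IndexRange x) {
        IndexRange[] ranges = {t, z, y, x};
        for (int d = 0; d < 4; d++)
            if (ranges[d].last() >= shape[d])
                throw new IllegalArgumentException("range exceeds dimension " + d);
        int nz = shape[1], ny = shape[2], nx = shape[3];
        float[] out = new float[t.length() * z.length() * y.length() * x.length()];
        int k = 0;
        for (int it = t.first(); it <= t.last(); it++)
            for (int iz = z.first(); iz <= z.last(); iz++)
                for (int iy = y.first(); iy <= y.last(); iy++) {
                    // Row-major offset of element [it][iz][iy][x.first()];
                    // the requested x run is contiguous, so copy it in one call.
                    int offset = ((it * nz + iz) * ny + iy) * nx + x.first();
                    System.arraycopy(data, offset, out, k, x.length());
                    k += x.length();
                }
        return out;
    }
}
```

Copying whole contiguous runs along the fastest-varying dimension is the usual reason such subsets are efficient in row-major storage.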
Standardizing NetCDF Formats • Grid: CF-1 Convention • Needs improvements for regional models (WRF) and GIS information • Radar: “Radar Exchange Format” • With the radar community (led by NCAR ATD) • Point Observations • Unidata Observation Dataset Conventions
[Figure: NetCDF-4 C Library, layered as the netCDF-3 Interface, the netCDF-4 Library, and the HDF5 Library]
NetCDF-4 Status • 4.0 Beta implements CDM access layer • complete, but waiting for HDF5 release 1.8 to finalize file format • 4.1: adding Coordinate Systems • 4.?: merge OPeNDAP access (pending funding)
NetCDF-Java 2.2 (nj22) • Prototype implementation of CDM • File formats: • General: NetCDF, HDF5, OPeNDAP • Grids: GRIB1, GRIB2 • Radar: NEXRAD, NIDS, DORADE • Satellite: DMSP, GINI • Access to THREDDS catalogs • Implements NcML
[Figure: Common Data Model layers (repeated): Scientific Datatypes (Point, Trajectory, Station, Radial, Grid, Swath), Coordinate Systems, Data Access]
[Figure: NetCDF-Java version 2.2 architecture. Components: Application; THREDDS Catalog.xml; Scientific Datatypes with a Datatype Adapter; NetcdfDataset with a CoordSystem Builder; NetcdfFile; I/O service providers for ADDE, OPeNDAP, NetCDF-3, NIDS, NetCDF-4, GRIB, HDF5, GINI, Nexrad, DMSP, …]
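The architecture puts one I/O service provider per format beneath NetcdfFile. Here is a hedged Java sketch of that plug-in idea: each provider says whether it recognizes a file, and the library uses the first one that does. FormatProvider, MagicProvider, and ProviderRegistry are hypothetical simplifications, not netCDF-Java's actual I/O service provider interface, and only leading "magic" bytes are checked.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical plug-in interface: a provider recognizes its own format.
interface FormatProvider {
    String name();
    boolean isValidFile(byte[] header);
}

// Recognizes a format by the bytes at the very start of the file.
record MagicProvider(String name, byte[] magic) implements FormatProvider {
    public boolean isValidFile(byte[] header) {
        return header.length >= magic.length
                && Arrays.equals(Arrays.copyOf(header, magic.length), magic);
    }
}

final class ProviderRegistry {
    static final List<FormatProvider> PROVIDERS = List.of(
            // netCDF-3 classic: 'C' 'D' 'F' 0x01 (the 64-bit offset variant uses 0x02).
            new MagicProvider("NetCDF-3", new byte[] {'C', 'D', 'F', 1}),
            // HDF5 signature \211 H D F \r \n \032 \n; checked only at offset 0 here,
            // although HDF5 also allows the superblock at 512, 1024, 2048, ...
            new MagicProvider("HDF5", new byte[] {(byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'}),
            // GRIB messages begin with "GRIB"; real readers often scan past leading headers.
            new MagicProvider("GRIB", "GRIB".getBytes(StandardCharsets.US_ASCII)));

    // Name of the first provider that claims the file, if any.
    static Optional<String> identify(byte[] header) {
        return PROVIDERS.stream()
                .filter(p -> p.isValidFile(header))
                .map(FormatProvider::name)
                .findFirst();
    }
}
```

New formats can then be supported by registering another provider, without changing the layers above NetcdfFile.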
NetCDF-Java 2.2 Status • Data Access layer: Beta quality • also waiting for the HDF5 release to finish NetCDF-4 support and commit to the API • Coordinate Systems: early Beta • Finishing docs, runtime pluggability • Datatypes: Alpha, still experimenting with APIs
NetCDF Markup Language (NcML) • XML representation of netCDF metadata (like ncdump -h) • Create new netCDF files (like ncgen) • Modify existing datasets • Add, delete, or rename variables, dimensions, and attributes • Create logical sections of existing variables • Create unions and aggregations of multiple existing datasets
NcML example
```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2"
        location="/data/nids/N0R_20041119_2147">
  <attribute name="DataType" value="Radar"/>
  <remove type="attribute" name="password"/>
  <variable name="Reflectivity" orgName="R34768">
    <attribute name="units" value="dBZ"/>
  </variable>
</netcdf>
```
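The NcML example makes four edits to the underlying dataset: add a global attribute, remove one, rename a variable, and add a variable attribute. Here is a hedged Java sketch that applies the same four edits to a toy in-memory metadata model. NcmlEffectSketch and its methods are hypothetical; this is not how netCDF-Java implements NcML, and the pre-existing "password" attribute is assumed only because the NcML removes it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory metadata: global attributes plus per-variable attributes.
final class NcmlEffectSketch {
    final Map<String, String> globalAttrs = new LinkedHashMap<>();
    final Map<String, Map<String, String>> varAttrs = new LinkedHashMap<>();

    // Top-level <attribute name=... value=.../>
    void addGlobalAttribute(String name, String value) { globalAttrs.put(name, value); }

    // <remove type="attribute" name=.../>
    void removeGlobalAttribute(String name) { globalAttrs.remove(name); }

    // <variable name=... orgName=...>: rename, keeping existing attributes
    void renameVariable(String orgName, String newName) {
        Map<String, String> attrs = varAttrs.remove(orgName);
        if (attrs == null) throw new IllegalArgumentException("no variable " + orgName);
        varAttrs.put(newName, attrs);
    }

    // <attribute> nested inside <variable>
    void addVariableAttribute(String var, String name, String value) {
        Map<String, String> attrs = varAttrs.get(var);
        if (attrs == null) throw new IllegalArgumentException("no variable " + var);
        attrs.put(name, value);
    }

    public static void main(String[] args) {
        NcmlEffectSketch ds = new NcmlEffectSketch();
        ds.globalAttrs.put("password", "secret");     // assumed original metadata
        ds.varAttrs.put("R34768", new LinkedHashMap<>());
        ds.addGlobalAttribute("DataType", "Radar");    // same steps as the NcML above
        ds.removeGlobalAttribute("password");
        ds.renameVariable("R34768", "Reflectivity");
        ds.addVariableAttribute("Reflectivity", "units", "dBZ");
        System.out.println(ds.globalAttrs + " " + ds.varAttrs); // {DataType=Radar} {Reflectivity={units=dBZ}}
    }
}
```

The original file is never rewritten: the edits live in the wrapper, which is what lets a server "fix" metadata without touching the data.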
NcML Aggregation • Union • Join Existing • Join New • Forecast Model Run
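The two join types differ mainly in how the aggregated variable's shape is formed. Here is a minimal Java sketch of just those shape rules, assuming each file's variable shape is an int[] with the outermost dimension first. AggregationShapes is a hypothetical helper; Union and Forecast Model Run are not covered.

```java
import java.util.Arrays;
import java.util.List;

final class AggregationShapes {
    // Join Existing: files are concatenated along their existing outer dimension
    // (e.g. time), so outer lengths add and all inner dimensions must match.
    static int[] joinExisting(List<int[]> shapes) {
        int[] first = shapes.get(0);
        if (first.length == 0) throw new IllegalArgumentException("need at least one dimension");
        int[] result = first.clone();
        result[0] = 0;
        for (int[] s : shapes) {
            if (s.length != first.length
                    || !Arrays.equals(s, 1, s.length, first, 1, first.length))
                throw new IllegalArgumentException("inner dimensions differ");
            result[0] += s[0];
        }
        return result;
    }

    // Join New: each file contributes one slice along a new outer dimension,
    // so all shapes must be identical and the rank grows by one.
    static int[] joinNew(List<int[]> shapes) {
        int[] first = shapes.get(0);
        for (int[] s : shapes)
            if (!Arrays.equals(s, first)) throw new IllegalArgumentException("shapes differ");
        int[] result = new int[first.length + 1];
        result[0] = shapes.size();
        System.arraycopy(first, 0, result, 1, first.length);
        return result;
    }
}
```

For example, three 2-D images of shape {ny, nx} joined with Join New along a new time dimension give {3, ny, nx}.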
NcML Aggregation Example
```xml
<netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinNew">
    <variableAgg name="Temperature"/>
    <variableAgg name="Pressure"/>
    <scan location="C:/data/goes/" suffix=".gini"/>
  </aggregation>
</netcdf>
```
THREDDS Data Server • Integrates data access with THREDDS catalogs and services • Tomcat servlet, 100% Java, single WAR file • Reads data through the NetCDF-Java 2.2 library • Data output: • OPeNDAP • HTTP Server • OGC Web Coverage Service (WCS), for gridded data
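On the OPeNDAP side, a client asks the server for a subset by appending a constraint expression to the dataset URL. Here is a hedged Java sketch that builds a DAP2 request URL, where each hyperslab is written [start:stride:stop] with stop inclusive. The host and dataset path are hypothetical, and "/thredds/dodsC/" is the usual TDS OPeNDAP path but depends on server configuration. Some servers and HTTP clients require the brackets to be percent-encoded.

```java
// Hypothetical URL builder for a DAP2 subset request.
final class DapUrl {
    // suffix: ".dds" (structure), ".das" (attributes), ".dods" (binary data) or ".ascii"
    static String subsetUrl(String base, String suffix, String var, int[][] hyperslab) {
        StringBuilder sb = new StringBuilder(base).append(suffix).append('?').append(var);
        for (int[] r : hyperslab) // each r = {start, stride, stop}, stop inclusive
            sb.append('[').append(r[0]).append(':').append(r[1]).append(':').append(r[2]).append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        String base = "http://hostname.edu/thredds/dodsC/model/example.nc"; // hypothetical
        System.out.println(subsetUrl(base, ".ascii", "Temperature",
                new int[][] {{0, 1, 0}, {0, 2, 10}, {0, 2, 10}}));
    }
}
```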
THREDDS Data Server [Figure: on hostname.edu, the THREDDS Server (OPeNDAP, HTTPServer, WCS) runs in a Tomcat Server with Catalog.xml and the NetCDF-Java library, serving local Datasets and IDD Data to an Application over HTTP]
TDS as WCS Gateway [Figure: on hostname.edu, the THREDDS Server (OPeNDAP, HTTPServer, WCS) runs in a Tomcat Server with Catalog.xml and the NetCDF-Java library, reading from an OPeNDAP Server on anotherHost.org and serving an Application over HTTP]
TDS and NcML [Figure: on hostname.edu, the THREDDS Server (OPeNDAP, WCS) runs in a Tomcat Server with Catalog.xml, NcML, and NetCDF-Java, serving local Datasets to an Application over HTTP]
TDS and NcML • Server serves the dataset “wrapped” by the NcML • Client sees OPeNDAP or WCS, not NcML • Can “fix” metadata problems • Can augment metadata • Use NcML aggregation on the TDS • replaces the old “Aggregation Server”
TDS and Digital Libraries [Figure: on hostname.edu, the THREDDS Server (OPeNDAP, HTTPServer, WCS) runs in a Tomcat Server with Catalog.xml, the NetCDF-Java library, and local Datasets, alongside an OPeNDAP Server on otherhost.gov; an OAI Harvester collects DL Records, and an Application connects over HTTP]
TDS and Digital Libraries • Framework to add metadata • By hand (collection level) • Automatic extraction from datasets • Send records to existing DLs • No search in the TDS itself (left to the DLs) • Both collection and inventory level