Enabling technologies for facilitating access and use of data. Russ Rew and John Caron, Unidata Workshop on Ensuring Access and Trustworthiness of Climate Observations and Models for Society NCDC, Asheville, 2010-03-09. CDM. Goal: N + M instead of N * M things on your TODO List.
Enabling technologies for facilitating access and use of data Russ Rew and John Caron, Unidata Workshop on Ensuring Access and Trustworthiness of Climate Observations and Models for Society NCDC, Asheville, 2010-03-09
CDM Goal: N + M instead of N * M things on your TODO List • [Diagram: N inputs (File Format #1, File Format #2, ..., File Format #N) connect through the CDM to M outputs (Visualization & Analysis, NetCDF file, Data Server, Web Service)]
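The arithmetic behind this goal can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the format and client counts are made-up inputs, not figures from the talk.

```python
def adapters_without_common_model(n_formats: int, m_clients: int) -> int:
    """Every client needs its own reader for every format: N * M pieces of work."""
    return n_formats * m_clients


def adapters_with_common_model(n_formats: int, m_clients: int) -> int:
    """Each format is mapped to the common model once (N), and each
    client is written against the common model once (M): N + M."""
    return n_formats + m_clients


if __name__ == "__main__":
    # Illustrative counts only.
    for n, m in [(3, 3), (10, 5), (20, 10)]:
        print(f"N={n:2d} M={m:2d}  "
              f"N*M={adapters_without_common_model(n, m):3d}  "
              f"N+M={adapters_with_common_model(n, m):3d}")
```

The gap widens quickly: adding one more format costs one new mapping with a common model, but M new readers without one.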
Common Data Model • What is it? • Capabilities for observational data • Current status
What is it? • Abstract Data Model for scientific data • Implemented by Netcdf-Java library • Core of the THREDDS Data Server • Co-evolving with the CF Conventions
Abstract Data Model, aka Object Model • Data Access Layer: NetCDF / HDF5 / OPeNDAP; subset in index space • Coordinate System Layer: CF, VisAD, HDF-EOS, GRIB; georeferencing • Feature Type Layer: OGC WxS, ISO, CSML; subset in coordinate space
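The difference between subsetting in index space (Data Access Layer) and in coordinate space (Feature Type Layer) can be sketched with plain Python lists. This is a minimal illustration under stated assumptions, not the Netcdf-Java API: the function names and sample values are hypothetical, and the coordinate is assumed to be one-dimensional and sorted ascending.

```python
from bisect import bisect_left, bisect_right

# Hypothetical 1-D data: a latitude coordinate and a variable sampled along it.
lat = [30.0, 35.0, 40.0, 45.0, 50.0]
temperature = [290.1, 287.4, 284.0, 279.8, 275.2]


def subset_by_index(values, first, last):
    """Index-space subset: the caller supplies array indices (inclusive)."""
    return values[first:last + 1]


def subset_by_coordinate(values, coords, lo, hi):
    """Coordinate-space subset: the caller supplies physical bounds.

    The coordinate values translate the bounds into an index range,
    which is then handed to the index-space layer.
    """
    first = bisect_left(coords, lo)        # first index with coord >= lo
    last = bisect_right(coords, hi) - 1    # last index with coord <= hi
    if first > last:
        return []                          # no samples inside the bounds
    return subset_by_index(values, first, last)


if __name__ == "__main__":
    print(subset_by_index(temperature, 1, 3))
    print(subset_by_coordinate(temperature, lat, 34.0, 46.0))
```

Both calls select the same three samples; the second one lets the user think in degrees of latitude instead of array positions.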
Abstract Data Model • Turns a collection of bytes into a collection of objects called features • E.g., grids, swaths, profiles, radial sweeps • These objects play the same role as a schema does in a database • Defines the things (nouns) and which operations (verbs) are possible on them
Netcdf-Java library implementation • 100% pure Java, open source, developed and maintained by Unidata • Object oriented, strongly typed, garbage collected, huge open-source libraries, runtime configurable == highly productive • Many different file formats • Many different coordinate system conventions • Library is used by many other software packages
Netcdf-Java File Formats • General: NetCDF-3, NetCDF-4, HDF5, HDF4, OPeNDAP • Gridded: GRIB-1, GRIB-2, GEMPAK, McIDAS, UAMIV CAMx • Point: BUFR, GEMPAK • Radar: NEXRAD 2&3, DORADE, CINRAD, UF • Satellite: DMSP, GINI, McIDAS, FYSAT, HDF-EOS • Misc: GTOPO, NLDN, USPLN, etc • Write your own IOServiceProvider Java class
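The plug-in idea behind "write your own IOServiceProvider" can be sketched as a registry of providers, each of which recognizes its own format from the first bytes of a file. The sketch below is a Python analogue, not the Java interface: the class and function names are hypothetical, and only the magic numbers (netCDF classic `CDF\x01`/`CDF\x02`, the HDF5 signature, and the GRIB `GRIB` indicator) are real.

```python
class ProviderSketch:
    """Hypothetical stand-in for a format provider."""
    name = "unknown"

    def is_valid_file(self, header: bytes) -> bool:
        raise NotImplementedError


class NetCDF3Provider(ProviderSketch):
    name = "NetCDF-3"

    def is_valid_file(self, header: bytes) -> bool:
        # Classic format and 64-bit offset variant.
        return header.startswith((b"CDF\x01", b"CDF\x02"))


class HDF5Provider(ProviderSketch):
    name = "HDF5"

    def is_valid_file(self, header: bytes) -> bool:
        return header.startswith(b"\x89HDF\r\n\x1a\n")


class GribProvider(ProviderSketch):
    name = "GRIB"

    def is_valid_file(self, header: bytes) -> bool:
        # Assumes the message starts at byte 0 (no leading bulletin header).
        return header.startswith(b"GRIB")


def find_provider(header: bytes, providers):
    """Return the first registered provider that claims the file, else None."""
    for provider in providers:
        if provider.is_valid_file(header):
            return provider
    return None


if __name__ == "__main__":
    registry = [NetCDF3Provider(), HDF5Provider(), GribProvider()]
    for sample in (b"CDF\x01\x00\x00", b"\x89HDF\r\n\x1a\n\x00", b"GRIB....", b"????"):
        match = find_provider(sample, registry)
        print(sample[:8], "->", match.name if match else "no provider")
```

Supporting a new format then means adding one provider to the registry; client code that calls `find_provider` does not change.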
Transforms (CF) • Projections: albers_conical_equal_area, lambert_azimuthal_equal_area, lambert_conformal_conic, mcidas_area, mercator, orthographic, rotated_pole, stereographic (including polar), transverse_mercator, UTM (ellipsoidal), vertical_perspective • Vertical Transforms: atmosphere_sigma, atmosphere_hybrid_sigma_pressure, ocean_s, ocean_sigma, existing3DField • Write your own CoordTransBuilderIF Java class
Used by other applications • Integrated Data Viewer, ToolsUI (Unidata) • Panoply (NASA) • ncBrowse (EPIC/NOAA) • Java NEXRAD Viewer (NCDC/NOAA) • MyWorld GIS (Northwestern) • EDC for ArcGIS, ERDDAP (SWFSC/NOAA) • Live Access Server (PMEL/NOAA) • ncWMS (Reading) • Matlab plug-in (USGS)
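Core of the THREDDS Data Server • [Diagram: a Remote Access Client reads catalog.xml and requests data from the THREDDS Server, which runs in a Servlet Container (motherlode.ucar.edu) and exposes WCS, OPeNDAP, HTTPServer and WMS services; the server uses the NetCDF-Java library and configCatalog.xml to read Datasets, including IDD Data]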
THREDDS Data Server (TDS) • Web server for scientific data • 100% Java servlet • Provides remote data access: OPeNDAP; Open Geospatial Consortium (OGC) WMS and WCS; HTTP file transfer; experimental data access protocols • Infrastructure, not a portal
TDS and NcML • Embed NcML into the TDS configuration catalog • Server serves a virtual dataset defined by NcML • NcML hidden from the client • Can "fix" metadata problems • Can augment metadata • General aggregations: joinNew, joinExisting, union • Specialized aggregations: Forecast Model Run Collection (FMRC), Point Feature Collections (version 4.2)
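The three general aggregation types can be sketched on toy data. This is a conceptual illustration only, not how the server implements them: each dataset is modeled as a hypothetical dict mapping a variable name to a list of values along its outer dimension, and the function names are made up for the example.

```python
def join_existing(datasets, var):
    """joinExisting: concatenate along a dimension the files already share,
    e.g. each file holds a few time steps of the same variable."""
    out = []
    for ds in datasets:
        out.extend(ds[var])
    return out


def join_new(datasets, var):
    """joinNew: stack the files along a new outer dimension,
    e.g. each file holds one 2-D field and the result gains a time axis."""
    return [list(ds[var]) for ds in datasets]


def union(datasets):
    """union: merge the variables of several datasets into one.
    If a name is already present, the later one is skipped."""
    merged = {}
    for ds in datasets:
        for name, data in ds.items():
            merged.setdefault(name, data)
    return merged


if __name__ == "__main__":
    file_a = {"temp": [1.0, 2.0]}
    file_b = {"temp": [3.0, 4.0]}
    file_c = {"wind": [9.0, 8.0]}

    print(join_existing([file_a, file_b], "temp"))  # one longer axis
    print(join_new([file_a, file_b], "temp"))       # new outer axis of length 2
    print(sorted(union([file_a, file_c])))          # variables from both files
```

In every case the client sees a single virtual dataset; which files contributed which values stays hidden on the server.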
TDS / NcML: Modify all files in datasetScan
<datasetScan name="Ocean Satellite Data" path="/data/ocean/sat/"
    location="/data/ncdc/impacts/scenario4b/run1234">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="NCS:Provenence"
        value="NCDC assimilation prog4gd from GOES-10"/>
  </netcdf>
</datasetScan>
TDS / NcML aggregation
<dataset name="WEST-CONUS_4km Aggregation"
    urlPath="satellite/3.9/WEST-CONUS_4km">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinNew">
      <scan location="/data/satellite/WEST-CONUS_4km/" suffix=".gini"/>
    </aggregation>
  </netcdf>
</dataset>
Co-evolving with the CF Conventions • Implementation of the CF Conventions • Strong feedback (in both directions) between CF and CDM • CF is the recommended way to write datasets • CDM also deals with legacy datasets and other file formats besides netCDF
CF • CF has mostly focused on model gridded data • Driven by IPCC work • Has a general coordinate system model • :coordinates = "lat lon alt time"; • Sufficient for swath, some in-situ data • Current efforts • Radial data (NCAR/EOL) • Discrete Sample data (aka point, in-situ data)
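How a `coordinates` attribute ties a data variable to its coordinate variables can be sketched as a simple name lookup. The dataset layout below is a hypothetical in-memory stand-in (a dict of variables, each with an `attributes` dict), not a real file or library structure; only the attribute name and its space-separated format come from CF.

```python
# Hypothetical in-memory dataset: variable name -> metadata.
dataset = {
    "lat":  {"attributes": {"units": "degrees_north"}},
    "lon":  {"attributes": {"units": "degrees_east"}},
    "alt":  {"attributes": {"units": "m"}},
    "time": {"attributes": {"units": "seconds since 1970-01-01"}},
    "humidity": {"attributes": {"coordinates": "lat lon alt time"}},
}


def coordinate_variables(ds, var_name):
    """Resolve the space-separated names in a variable's 'coordinates'
    attribute to variables that actually exist in the dataset."""
    names = ds[var_name]["attributes"].get("coordinates", "").split()
    missing = [name for name in names if name not in ds]
    if missing:
        raise KeyError(f"{var_name}: coordinate variables not found: {missing}")
    return names


if __name__ == "__main__":
    for name in coordinate_variables(dataset, "humidity"):
        print(name, dataset[name]["attributes"]["units"])
```

Because the attribute only names other variables, the same mechanism works whether the coordinates are 1-D axes of a grid or per-sample values for swath and in-situ data.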
Discrete Sample Data Categorization • Point: measured at one point in time and space • Station: time-series of points at the same location • Profile: points along a vertical line • TimeSeries of Profiles: a time-series of profiles at the same location • Trajectory: points along a 1D curve in time/space • Trajectory of Profiles: a collection of profile features which originate along a trajectory
Proposed Encoding Variations • Rectangular Array • Multidimensional • Single : one feature in the file • Ragged Array – different length features • Contiguous • Non-Contiguous • Flattened
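The two ragged-array variations can be sketched as follows. This is a conceptual illustration with hypothetical names, not the proposal's exact variable layout: the contiguous form is assumed to store a per-feature count, and the non-contiguous form a per-observation index pointing at the owning feature.

```python
def split_contiguous(obs, row_sizes):
    """Contiguous ragged array: each feature's observations are stored
    back to back, and row_sizes[i] says how many belong to feature i."""
    if sum(row_sizes) != len(obs):
        raise ValueError("row sizes do not add up to the number of observations")
    features, start = [], 0
    for size in row_sizes:
        features.append(obs[start:start + size])
        start += size
    return features


def split_indexed(obs, parent_index, n_features):
    """Non-contiguous ragged array: observations may be interleaved
    (e.g. written in arrival order), and parent_index[j] names the
    feature that observation j belongs to."""
    features = [[] for _ in range(n_features)]
    for value, parent in zip(obs, parent_index):
        features[parent].append(value)
    return features


if __name__ == "__main__":
    # Three profiles with 3, 1 and 2 samples respectively.
    print(split_contiguous([1, 2, 3, 4, 5, 6], [3, 1, 2]))
    print(split_indexed([1, 4, 2, 5, 3, 6], [0, 1, 0, 2, 0, 2], 3))
```

Both calls recover the same three features. The contiguous form reads faster per feature, while the indexed form lets a writer append observations as they arrive.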
Current CDM Status • Discrete Sample Data proposal • Almost finalized (Caron/Gregory/Hankin) • CDM implementation now in 4.1 • Collections of files to be in 4.2 • Forecast Model Run Collection refactor • Also using Collection • Caching on the server • Scale to much larger collections (NCDC/NOMADS) • Scheduled for 4.2
CDM funding status • CDM/THREDDS work competes with many other priorities at Unidata • THREDDS is most used by large data centers (NOAA/NASA/USGS/EPS, EU) • Important (but indirect) benefits to NSF ATM constituency (US academic meteorology) • Unidata is fully committed but not much chance of expanded base funding from NSF
CDM funding status (cont) • Have a proposal in to NSF Cyber-Infrastructure solicitation • Integration of TDS and IDD/LDM data streams • Explore use of Hadoop (Map/Reduce) for very large collections • Need commitment of resources from you • ($$) Custom work when compatible • In-kind contribution == time and attention for CF/CDM from domain experts and engineers