Unidata Infrastructure for Data Services • Russ Rew • GO-ESSP Workshop, LLNL • 2006-06-19
Some Current Unidata Infrastructure Projects
• LDM for distributing and processing near real-time data
• Integrated Data Viewer (IDV) for testing infrastructure in platform-independent data visualization and analysis
• NetCDF C-based interfaces for data access
• CFIOlib for a CF conventions API (tomorrow)
• NetCDF Java for advanced data access infrastructure
• Common Data Model for improving interoperability
• NcML for metadata annotation and data aggregation
• THREDDS Data Server (TDS) for remote access to archives
• GALEON for serving netCDF data through OGC Web Coverage Services (WCS)
LDM-6 for Internet Data Distribution
• Implements a peer-to-peer system for reliable, event-driven data distribution
• Supports subscriptions to many near real-time data feeds; no data center needed
• Data product abstraction is general: model output, observations, text products, satellite data, radar, …
• Protocols use persistent connections to achieve low latency
• Highly configurable: inject, distribute, capture, filter, and process arbitrary data products
• In continuous use by over 160 universities, NOAA, USGS, NASA, internationally, THORPEX global ensembles (TIGGE), …
• Candidate for use in new WMO weather information system
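The subscribe, filter, and process steps listed above can be pictured with a small in-process sketch. It is not LDM's protocol or API: the `Product`, `Subscription`, and `Relay` names, the feed labels, and the identifier strings are all hypothetical, and a real deployment moves products between hosts over persistent connections rather than calling functions directly.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Product:
    feedtype: str   # hypothetical feed label, e.g. "MODEL" or "RADAR"
    ident: str      # hypothetical product identifier
    data: bytes     # opaque payload; the relay never inspects it


@dataclass
class Subscription:
    feedtype: str
    pattern: re.Pattern
    action: Callable[[Product], None]

    def matches(self, product: Product) -> bool:
        # A product is wanted when the feed matches and the pattern
        # is found in the product identifier.
        return (product.feedtype == self.feedtype
                and self.pattern.search(product.ident) is not None)


class Relay:
    """Event-driven dispatch: actions run when a product arrives."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []

    def subscribe(self, feedtype: str, regex: str,
                  action: Callable[[Product], None]) -> None:
        self.subscriptions.append(
            Subscription(feedtype, re.compile(regex), action))

    def inject(self, product: Product) -> int:
        # Deliver the product to every matching subscription and
        # report how many deliveries were made.
        delivered = 0
        for sub in self.subscriptions:
            if sub.matches(product):
                sub.action(product)
                delivered += 1
        return delivered


received: List[str] = []
relay = Relay()
relay.subscribe("MODEL", r"^gfs/", lambda p: received.append(p.ident))
relay.subscribe("RADAR", r".*", lambda p: received.append(p.ident))

n_gfs = relay.inject(Product("MODEL", "gfs/t850", b"\x00"))
n_nam = relay.inject(Product("MODEL", "nam/t850", b"\x00"))
n_radar = relay.inject(Product("RADAR", "site42/reflectivity", b"\x00"))
```

Because the payload is treated as opaque bytes, the same dispatch path serves model output, text products, or radar data, which is the point of the general data product abstraction.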
IDV (Integrated Data Viewer)
• Freely available 100% Java reference application and framework for visualization and analysis of geoscience data
• Provides integrated, time-synchronized 2-D and 3-D visualizations of model output, observations, and remotely sensed data, using the University of Wisconsin's VisAD
• Handles diverse formats and protocols for local and remote access: GRIB, netCDF, OPeNDAP, ADDE, HTTP, GIS, …
• Serves as end-to-end test for many Unidata technologies: THREDDS services, Java netCDF, XML bundles, plug-in architecture, interactive collaboration, …
NetCDF’s Niche
• Simple data model for scientific datasets
• Portable, self-describing data
• Appendable, sharable, archivable
• Direct access for efficient subsetting
• Metadata via attribute conventions such as CF
• Flexible remote access via OPeNDAP, HTTP, WCS
• Lots of applications: NCO, ncbrowse, ncview, IDV, IDL, MATLAB, ArcGIS, ...
• Language interfaces include C, Java, Fortran, C++, Perl, Python, Ruby, ...
NetCDF-3 Data Model
File
  location: Filename
  create( ), open( ), …
Dimension
  name: String
  length: int
  isUnlimited( )
Variable
  name: String
  shape: Dimension[ ]
  type: DataType
  array: read( ), …
Attribute
  name: String
  type: DataType
  values: 1D array
DataType
  char, byte, short, int, float, double
Variables and attributes have one of six primitive data types. A file has named variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One dimension may be of unlimited length.
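The data model above maps naturally onto a handful of small classes. The following is a minimal sketch of the model only, not the netCDF library interface: the class and method names (`File.add_dimension`, `File.add_variable`) and the example names and sizes are assumptions chosen for illustration, and no data is read or written.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

# The six primitive types named in the model.
PRIMITIVE_TYPES = ("char", "byte", "short", "int", "float", "double")


@dataclass
class Dimension:
    name: str
    length: int
    unlimited: bool = False

    def is_unlimited(self) -> bool:
        return self.unlimited


@dataclass
class Attribute:
    name: str
    dtype: str
    values: Sequence   # always a 1D array of values


@dataclass
class Variable:
    name: str
    dtype: str
    shape: List[Dimension]
    attributes: Dict[str, Attribute] = field(default_factory=dict)


@dataclass
class File:
    location: str
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add_dimension(self, name: str, length: int,
                      unlimited: bool = False) -> Dimension:
        # The model permits at most one unlimited dimension per file.
        if unlimited and any(d.is_unlimited()
                             for d in self.dimensions.values()):
            raise ValueError("only one dimension may be unlimited")
        dim = Dimension(name, length, unlimited)
        self.dimensions[name] = dim
        return dim

    def add_variable(self, name: str, dtype: str,
                     dim_names: Sequence[str]) -> Variable:
        if dtype not in PRIMITIVE_TYPES:
            raise ValueError(f"unsupported type: {dtype}")
        # Variables refer to the file's Dimension objects, so two
        # variables on the same grid share the same dimensions.
        var = Variable(name, dtype, [self.dimensions[d] for d in dim_names])
        self.variables[name] = var
        return var


nc = File("example.nc")
nc.add_dimension("time", 0, unlimited=True)
nc.add_dimension("lat", 73)
nc.add_dimension("lon", 144)
temperature = nc.add_variable("temperature", "float", ["time", "lat", "lon"])
pressure = nc.add_variable("pressure", "float", ["time", "lat", "lon"])
temperature.attributes["units"] = Attribute("units", "char", "K")
```

Here `temperature` and `pressure` hold references to the same three `Dimension` objects, which is how the model expresses that they lie on a common grid.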
Some NetCDF-3 Limitations
• Only one shared unlimited dimension
• No structures, just scalars and multidimensional arrays
• No strings, just arrays of characters
• Limited numeric types
• No ragged arrays or nested structures
• Only ASCII characters in names
• Changes to file schema can be expensive
• Efficient access requires reads in same order as writes
• No built-in compression
• Only serial I/O
• Flat name space limits scalability
NetCDF-4 Features to Address Limitations
• Multiple unlimited dimensions
• Portable structured types
• String type
• Additional numeric types
• Variable-length types for ragged arrays
• Unicode names
• Efficient dynamic schema changes
• Multidimensional tiling (chunking)
• Per variable compression
• Parallel I/O
• Nested scopes using Groups
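To see why multidimensional tiling helps with reads that do not follow the write order, it is enough to count how many chunks a request touches. The sketch below does only that arithmetic; it does not model HDF5's storage layer, and the array size, chunk shapes, and the `chunks_touched` helper are assumptions chosen for illustration.

```python
from typing import Sequence


def chunks_touched(start: Sequence[int], count: Sequence[int],
                   chunk_shape: Sequence[int]) -> int:
    """Number of chunks intersected by a hyperslab request."""
    total = 1
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c              # index of first chunk along this axis
        last = (s + n - 1) // c     # index of last chunk along this axis
        total *= last - first + 1
    return total


# Assumed variable shape: (time=1000, lat=100, lon=100).
by_record = (1, 100, 100)    # one chunk per time step, like record order
tiled = (100, 10, 10)        # chunks spanning all three dimensions

# Request A: a full time series at a single grid point.
series_start, series_count = (0, 50, 50), (1000, 1, 1)
# Request B: a full horizontal slice at a single time.
slice_start, slice_count = (0, 0, 0), (1, 100, 100)

series_by_record = chunks_touched(series_start, series_count, by_record)
series_tiled = chunks_touched(series_start, series_count, tiled)
slice_by_record = chunks_touched(slice_start, slice_count, by_record)
slice_tiled = chunks_touched(slice_start, slice_count, tiled)

print(series_by_record, series_tiled)   # 1000 10
print(slice_by_record, slice_tiled)     # 1 100
```

With one chunk per time step, the time series touches 1000 chunks while the horizontal slice touches one; the tiled layout trades some slice efficiency (100 chunks) for a much cheaper time series (10 chunks), so neither access order is badly penalized.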
NetCDF-4 Data Model (Common Data Access Model)
File
  location: Filename
  create( ), open( ), …
Group
  name: String
Dimension
  name: String
  length: int
  isUnlimited( )
Variable
  name: String
  shape: Dimension[ ]
  type: DataType
  array: read( ), …
Attribute
  name: String
  type: DataType
  values: 1D array
DataType
  PrimitiveType: char, byte, short, int, int64, float, double, unsigned byte, unsigned short, unsigned int, unsigned int64, string
  UserDefinedType (typename: String): Enum, Opaque, Compound, VariableLength
Variables and attributes have one of twelve primitive data types or one of four user-defined types. A file has a top-level unnamed group. Each group may contain one or more named subgroups, variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One or more dimensions may be of unlimited length.
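The group hierarchy can be sketched as a tree in which name lookup walks from a group toward the root. This sketch assumes that a dimension defined in a group is visible in its subgroups and can be shadowed there, which is one reading of "nested scopes"; the `Group` class, its methods, and the example names and lengths are hypothetical and are not the netCDF-4 API.

```python
from typing import Dict, Optional


class Group:
    def __init__(self, name: str = "",
                 parent: Optional["Group"] = None) -> None:
        self.name = name                      # the root group is unnamed
        self.parent = parent
        self.groups: Dict[str, "Group"] = {}
        self.dimensions: Dict[str, int] = {}  # dimension name -> length

    def create_group(self, name: str) -> "Group":
        child = Group(name, self)
        self.groups[name] = child
        return child

    def path(self) -> str:
        # Build a slash-separated path, with "/" standing for the root.
        if self.parent is None:
            return "/"
        prefix = self.parent.path()
        return prefix + self.name if prefix == "/" else prefix + "/" + self.name

    def find_dimension(self, name: str) -> Optional[int]:
        # Search this group first, then each enclosing group in turn.
        group: Optional[Group] = self
        while group is not None:
            if name in group.dimensions:
                return group.dimensions[name]
            group = group.parent
        return None


root = Group()
root.dimensions["lat"] = 73
model = root.create_group("model")
run = model.create_group("run1")
run.dimensions["lat"] = 181   # shadows the root-level "lat" inside run1

print(run.path())                     # /model/run1
print(model.find_dimension("lat"))    # 73
print(run.find_dimension("lat"))      # 181
```

A separate name space per group is what removes the flat-name-space limitation: two subgroups can each define a variable or dimension with the same name without colliding.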
NetCDF-4 Architecture
• NetCDF-4 uses HDF5 for storage, high performance
• Parallel I/O
• Chunking for efficient access in different orders, efficient use of compression
• Conversion using “reader makes right” approach
• Provides simple netCDF interface to subset of HDF5
• Also supports netCDF classic and 64-bit formats
[Architecture diagram. Applications: NetCDF Java application, NetCDF-3 application, NetCDF-4 application, HDF5 application. Libraries: netCDF Java, netCDF-4, netCDF-3, HDF5. Underlying layers: Java VM, POSIX I/O, MPI I/O, …]
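"Reader makes right" generally means the writer stores values in whatever representation is convenient for it and records what that representation is, leaving any conversion to the reader. The sketch below illustrates the idea for byte order only. The one-byte order tag and the `write_ints` and `read_ints` helpers are invented for this example and have nothing to do with the actual HDF5 or netCDF-4 file format.

```python
import struct
import sys
from typing import List, Sequence


def write_ints(values: Sequence[int], byteorder: str = sys.byteorder) -> bytes:
    """Store 32-bit ints in the writer's byte order, prefixed by a tag."""
    tag = "<" if byteorder == "little" else ">"
    return tag.encode("ascii") + struct.pack(f"{tag}{len(values)}i", *values)


def read_ints(blob: bytes) -> List[int]:
    """Decode using the recorded byte order, whatever the reader's own is."""
    tag = blob[:1].decode("ascii")
    n = (len(blob) - 1) // 4        # 4 bytes per 32-bit int
    return list(struct.unpack(f"{tag}{n}i", blob[1:]))


little = write_ints([1, 2, 3], "little")
big = write_ints([1, 2, 3], "big")

print(little != big)        # True: the stored bytes differ
print(read_ints(little))    # [1, 2, 3]
print(read_ints(big))       # [1, 2, 3]
```

The writer never pays a conversion cost, and a reader on the same kind of machine does not either; only a reader with a different native representation does extra work.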
Status of NetCDF-4
• NetCDF-4.0-alpha14 currently available for testing
• Files created with alpha release use unsupported artifacts
• We’re seeking feedback on performance and functionality
• NetCDF-4.0-beta waiting for HDF5 1.8-beta
• Will finalize file format, eliminate necessity for artifacts
• Expected within a few weeks of HDF5 1.8-beta release, maybe by August 2006
• HDF5 1.8 currently expected by November 2006
• Has enhancements specifically for netCDF-4: variable creation order, Unicode names, dimension scales, on-the-fly numeric conversions
• Plans for netCDF-4.1 and beyond on netCDF-4 web site
Summary
• Unidata’s LDM-6 implements an event-driven architecture for low-latency data distribution
• Unidata’s IDV provides a platform-independent visualization and analysis framework and reference application for integrating data from diverse sources
• Unidata’s netCDF-4 software preserves backward compatibility and eliminates many limitations of netCDF-3 with only a modest increase in complexity