Easily Serving and Accessing HDF-EOS2 Datasets Using DODS Technologies
Richard Chinman, UCAR-IITA, DODS Project Manager
http://www.unidata.ucar.edu/packages/dods
Imagine transforming your favorite data analysis and visualization tool:
• into a network-savvy client
• that can make a data request
• in the form of a fully constrained URL
• sent out over http
• received and evaluated (e.g., subsetted) by a local or remote data archive/server
• that stores the data in “native” format
• and the data delivered directly into the client
• in the format that the client expects
• That’s what DODS does (to a certain extent)!
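As a rough illustration of what a "fully constrained URL" might look like, the sketch below assembles one from its parts. It assumes the usual DODS convention of a dataset URL followed by a suffix (das, dds, or dods) and, after a question mark, a variable name with one [start:stride:stop] index range per dimension. The host, path, and variable name are hypothetical.

```python
def build_dods_url(dataset_url, suffix, variable=None, ranges=()):
    """Return a DODS-style request URL (a sketch, not the official client code).

    dataset_url: base URL of the dataset on the server (hypothetical here).
    suffix: 'das', 'dds', or 'dods', selecting which object to request.
    variable: optional variable name to constrain the request to.
    ranges: (start, stride, stop) index triples, one per dimension.
    """
    if suffix not in ("das", "dds", "dods"):
        raise ValueError("unknown suffix: %r" % suffix)
    url = "%s.%s" % (dataset_url, suffix)
    if variable is None:
        return url  # unconstrained request for the whole object
    hyperslab = "".join("[%d:%d:%d]" % tuple(r) for r in ranges)
    return url + "?" + variable + hyperslab


# Hypothetical server and dataset, for illustration only.
example = build_dods_url(
    "http://example.org/cgi-bin/nph-hdf/data/sst.hdf",
    "dods",
    "sst",
    [(0, 1, 9), (0, 2, 19)],
)
print(example)
```

The server evaluates the part after the question mark, so only the requested subset travels over the network.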
DODS is a software package that helps users provide and access data over the net in a consistent fashion.
DODS is a highly distributed system because of two fundamental considerations that went into its design:
• data are often most appropriately distributed by the individual or group that has developed them
• users generally prefer to access data from the application software with which they are most familiar
The DODS tools for developing network-savvy versions of popular data access APIs and data analysis packages extend the scope of an application's search for data. A DODS-enabled application can:
• Get any data anywhere on the Internet that is served by a DODS server
• Use data from any DODS server, pretty much regardless of its native format
• Still perform all its original functions for accessing data locally
Freely Available DODS Software
The source code and executables for UNIX platforms are available from: http://www.unidata.ucar.edu/packages/dods/
• The Windows port of the software has JUST been finished
• The Macintosh port is coming “soon”
DODS Server Technologies
• CEDAR
• DSP
• (FITS)
• FreeForm
• (GRIB)
• HDF-EOS2
• (HDF5)
• JGOFS
• Matlab
• netCDF
A DODS server is a Web server with a set of CGI scripts that are specific to the format of the dataset it serves. When the server receives a URL that corresponds to a script in the DODS server's cgi-bin directory, the server executes the script. A typical DODS script fetches a selection of data from the dataset, converts it to a binary format, packages it with some descriptive information, and sends it to the client.
A DODS server must have read access to the datasets it serves, and to the DODS scripts. Setting up a DODS server is not much harder than setting up a normal Web server. Using a secure Web server secures the data in DODS as well.
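The work of a typical server script described above (fetch a selection, convert it to binary, package it with descriptive information) can be sketched roughly as follows. This is not the actual DODS CGI code: the in-memory dataset, the function name, and the exact response layout are assumptions made for illustration, and big-endian packing stands in for whatever binary encoding a real server uses.

```python
import struct

# Hypothetical in-memory "dataset"; a real server would read a native-format file.
DATASET = {"sst": [10.5, 11.0, 11.5, 12.0, 12.5, 13.0]}


def serve_selection(variable, start, stride, stop):
    """Fetch a subset, encode it as binary, and prepend a short text description."""
    # The stop index is inclusive, matching a [start:stride:stop] request.
    values = DATASET[variable][start:stop + 1:stride]
    # Descriptive information: the type and size of what follows.
    header = "Dataset {\n    Float32 %s[%d];\n} demo;\nData:\n" % (variable, len(values))
    # Binary payload: big-endian 32-bit floats.
    payload = struct.pack(">%df" % len(values), *values)
    return header.encode("ascii") + payload


response = serve_selection("sst", 1, 2, 5)
text_part, binary_part = response.split(b"Data:\n", 1)
print(text_part.decode("ascii"))
print(struct.unpack(">3f", binary_part))
```

A client that knows the description can decode the binary part without knowing anything about the server's native file format.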
HDF and DODS
• The HDF-EOS2 DODS server is accessible now by any DODS-enabled client.
• When the HDF5 DODS server development is finished, all DODS clients will be able to access HDF-EOS2 and HDF5 datasets.
DODS Client Technologies
• netCDF community of clients: Ferret, LAS, GrADS, nco, ncview
• IDL community of clients: IDL, “LAS-lite”
• Matlab
There is no set appearance or functionality for a DODS client; it can be implemented in a variety of ways, and perform any functions that its users require. The DODS client uses DODS functions to request data from the DODS server, and to interpret the results received from the server into a particular data format.
The data request functions use the http protocol, sending an enhanced URL to the server. The data interpretation functions translate the data sent by the server into the data format expected by the rest of the application. Because the expected data formats vary, there are different kinds of DODS clients; for example, a JGOFS DODS client furnishes its data in JGOFS format.
You can create a DODS client by relinking an existing application, or by writing a new one. To be eligible for conversion to a DODS client, an application must make use of one of the data access APIs that DODS supports.
DODS Functions
The DODS software is composed of a core library, and a variety of libraries that each support a different data access API. The DODS core is a set of C++ classes for building DODS servers and DODS clients. The individual libraries for each data access API specialize these classes into a set of data-handling functions, specific to that data access API.
When you relink an existing application with the DODS libraries, you are essentially replacing the application's data-handling functions with same-named ones in the DODS libraries. The calling application is unaware that the data access API has changed, since the function names haven't changed and it still gets its data in the format it expects. From the user's point of view, however, the application suddenly has access to datasets in remote locations.
If you choose to write a new application, you just use DODS data access functions instead of the data access API's functions. Your choice of API is flexible, since DODS includes a variety of libraries for linking data access APIs.
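The relinking idea can be imitated in a few lines: a replacement function with the same call signature is swapped in, and the application code does not change. This is only an analogy in Python for what the C++ libraries do at link time. Every name below is hypothetical, and the "remote" fetch is a stub rather than a real network request.

```python
def local_read(name):
    """Stand-in for the original API's reader: only knows local datasets."""
    local_store = {"local.nc": [1, 2, 3]}  # hypothetical local file contents
    return local_store[name]


def fake_remote_fetch(url):
    """Stub for a network request; a real client would contact a DODS server."""
    remote_store = {"http://example.org/data/remote.nc": [4, 5, 6]}
    return remote_store[url]


def networked_read(name):
    """Same call signature as local_read, but also accepts URLs."""
    if name.startswith("http://"):
        return fake_remote_fetch(name)
    return local_read(name)  # original local behaviour is preserved


def total(read, name):
    """Application code: identical whichever reader it is given."""
    return sum(read(name))


print(total(local_read, "local.nc"))
print(total(networked_read, "local.nc"))
print(total(networked_read, "http://example.org/data/remote.nc"))
```

The application function never inspects where the data came from, which is the property the relinking approach relies on.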
The DAS and DDS Objects
The dataset attribute structure (DAS) and dataset descriptor structure (DDS) objects are used to store information about a dataset's variables. These objects are used on both the client and server sides, although some class features pertain to only one of the two roles. They can be thought of as metadata objects. In this documentation (http://www.unidata.ucar.edu/packages/dods/api/pguide-html/), however, we will avoid the term metadata because, to many users, this information is itself data. It might be said that neither the DAS nor the DDS contains actual science data: the DAS contains attribute information from the dataset, while the DDS contains structural information about the dataset and the variables in it. Since the boundary between data and metadata (or data attributes) is often a blurry one, this is not a distinction we will insist on.
To build both the DAS and DDS, the server reads information either directly from the dataset or from DODS-specific ancillary data files, depending on the capabilities of the data access API used to access the data. The DAS and DDS server filter programs do this and then transmit the resulting object to the client.
On the client side, the DODS client uses information in the DAS and DDS to satisfy API calls issued by the user program requesting information about variables, their type, shape, and attributes. The client requests both of these objects when it first contacts the remote dataset. The DAS and DDS objects are then stored as part of a virtual connection to that dataset and can be used repeatedly by the client library without retransmission.
The DAS and DDS objects have both an internal and an external representation. Internal to the DODS client or server, these structures are stored as C++ objects, while their external representation is as text. The object is sent from the server to the client using this text representation.
Each of the two classes contains a parser which can read the text representation and recreate the object's internal representation. In addition, it is possible to write the text representation for either object by hand (using a text editor) and then use the parser to create the internal C++ object. Furthermore, the text representation is a type of persistence and can be used to build a flexible object caching mechanism.
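To make the text-to-object round trip concrete, here is a minimal sketch of parsing a hand-written DDS-like text into an in-memory structure. The sample text follows the general shape of a DDS as an assumption, and the parser handles only flat scalar and array declarations. It is not the DODS parser, which is part of the C++ classes and handles the full syntax.

```python
import re

# Hand-written text representation (simplified; dataset and variables are hypothetical).
DDS_TEXT = """Dataset {
    Float32 lat[lat = 10];
    Float32 lon[lon = 20];
    Int16 sst[time = 12][lat = 10][lon = 20];
} sst_demo;
"""

# One declaration per line: a type, a name, then zero or more [dim = size] parts.
DECL = re.compile(r"^\s*(\w+)\s+(\w+)((?:\[\s*\w+\s*=\s*\d+\s*\])*)\s*;\s*$")
DIM = re.compile(r"\[\s*(\w+)\s*=\s*(\d+)\s*\]")


def parse_dds(text):
    """Build a dict of variable name -> type and shape from simplified DDS text."""
    variables = {}
    for line in text.splitlines():
        match = DECL.match(line)
        if match:  # the "Dataset {" and "} name;" lines do not match and are skipped
            dtype, name, dims = match.groups()
            shape = [(dim, int(size)) for dim, size in DIM.findall(dims)]
            variables[name] = {"type": dtype, "shape": shape}
    return variables


dds = parse_dds(DDS_TEXT)
print(list(dds))          # variable names, in declaration order
print(dds["sst"])
```

Once parsed, the structure can answer questions such as "which variables exist?" or "what shape is sst?" without contacting the server again.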
Communicating Dataset Structure and Attributes (1)
In order to translate from the user program's API to the data set's API, the translator process must have some knowledge about the types of the variables that make up the dataset, and their semantics. It must also know something about the relations of those variables, even those relations which are only implicit in the data set's own API. This knowledge about the dataset's structure is contained in a text description of the dataset called the Dataset Descriptor Structure.
The dataset descriptor structure (DDS) does not describe how the information in the dataset is physically stored, nor does it describe how the dataset's API is used to access that data. Those pieces of information are contained in the dataset's API and in the translating server, respectively. The translating server uses the DDS to describe the structure of a particular dataset to a translator: the DDS contains knowledge about the dataset variables and the interrelations of those variables.
In addition, the DDS can be used to satisfy some of the dataset description calls of the DODS-supported APIs. For example, netCDF has a function which returns the names of all the variables in a netCDF data file. The DDS can be used to get that information.
Communicating Dataset Structure and Attributes (2)
The Dataset Attribute Structure (DAS) is used to store attributes for variables in the dataset. An attribute is any piece of information about a variable that the creator wants to bind with that variable, excluding the characteristics type, shape, and units. The characteristics type, shape, and units are always defined for every variable; they are data type information about the variable. Attributes, on the other hand, are intended to store extra information about the data, such as a paragraph describing how it was collected or processed.
In principle, attributes are not processed by software other than to be displayed. However, many systems rely on attributes to store extra information that is necessary to perform certain manipulations on data. In effect, attributes are used to store information that is used “by convention” rather than “by design”. DODS can effectively support these conventions by passing the attributes from dataset to user program via the DAS. Of course, DODS cannot enforce conventions in datasets where they were not followed in the first place.
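One way to picture "by convention" use of attributes is a client that rescales packed values when scaling attributes happen to be present. The sketch below assumes the attribute names scale_factor and add_offset, a convention familiar from netCDF files. The attribute table, the values, and the function are illustrative assumptions, not part of DODS.

```python
# Hypothetical attributes, as a client might hold them after reading a DAS.
DAS = {
    "sst": {
        "long_name": "Sea Surface Temperature",
        "scale_factor": 0.5,
        "add_offset": 10.0,
    }
}


def unpack(variable, raw_values, das):
    """Apply scale/offset attributes if the dataset followed that convention."""
    attrs = das.get(variable, {})
    scale = attrs.get("scale_factor", 1.0)   # default: leave values unscaled
    offset = attrs.get("add_offset", 0.0)    # default: no offset
    return [value * scale + offset for value in raw_values]


print(unpack("sst", [0, 2, 4], DAS))
print(unpack("other", [1, 2], DAS))   # no attributes, so values pass through
```

If the dataset's creator did not follow the convention, the attributes are simply absent and the values pass through unchanged, which mirrors the point that DODS can carry conventions but cannot enforce them.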
Two DODS-enabled “Web” applications
LAS (GUI interface, Ferret engine):
• easily serves N-dimensional datasets via http to web browsers regardless of the native format of the datasets
• large number of analysis/delivery options with Ferret engine
“LAS-lite” (GUI interface, IDL engine):
• easily serves N-file datasets via http to web browsers regardless of the native format of the datasets
• highly adaptable/configurable GUI with IDL engine
(Easily) Serving and Accessing HDF-EOS Data
Installing a DODS HDF-EOS server takes about an hour. As soon as the server is installed, “LAS-lite”, for example, is available as a highly configurable (tunable to your specific community’s needs) web page generator. All users of DODS-enabled clients can also make data requests and receive your data.
An HDF-EOS server makes your data available (and subsettable) over the web to all DODS clients, though none of those clients are HDF clients.
One scenario: many (most?) swath datasets are stored as HDF files, but few point and grid datasets are stored in HDF format. There may be (great?) benefits for users of HDF clients in being able to read the data formats in which point and grid data are stored. Development of an HDF/HDF-EOS DODS client library would give this capability to HDF applications.
Real-time network demonstration of DODS technologies
Short demo now:
• NOAA PMEL Live Access Server (LAS)
Day 3, Thursday, 21 Sep:
• Matlab GUI
• IDL client inside LAS-lite
• More LAS