The OPeNDAP project at HAO: applications in solar and solar-terrestrial physics and technology development for HPC applications. Peter Fox (HAO/NCAR)
HAO and interdisciplinary data • Coupling, Energetics and Dynamics of Atmospheric Regions (CEDAR - NSF) • Community of over 300 users • Heterogeneous datasets dating back to 1966, at NCAR since 1983 • Over 1000 database files between 1MB and 140MB • Millions of records • 13 different instrument classes • Includes models and indices • Stored in a record-style format using Cray™ blocking • Multiple data types stored in alternating records • No ‘independent’ variable
Looking for a solution ~ 1996 • Required an end-to-end system • Deliver datasets to end-user application • Adopted DODS • Had to implement a catalog and interface • Two new data formats supported, output filters, server side functions • Handle larger files • Security - authentication
CEDARWEB 3.X ~ 2001 • IDL On the Net (ION) script • Integrated navigation • Interactive plotting and data retrieval • OPeNDAP URLs are fundamental to these operations • They are used both to retrieve data and by the plotting programs
Interoperability • The ability to exchange content between non-identical computer systems, usually achieved by defining a common interface or protocol that is fully, or mostly, independent of the content. • Example: the ability to exchange email between Unix™ and Windows™ systems, achieved using SMTP. • Requires support of the protocol on both sides • True/full interoperability in a distributed environment (n-tier) requires attention to housekeeping beyond the content (authorization, resource management, sessions, graceful failure, etc.)
DODS vs OPeNDAP Architecture • DODS: simple and easy to install; one CGI process per URL request; limited memory management (external); limited scalability; limited status reporting to web server; returns data stream from one format • OPeNDAP: standalone server or httpd module; can manage multiple daemon processes; strong memory management (internal); reuses processes, scales; coupled to OPeNDAP server for status; returns multiple formats in a single stream, multiple protocols
URLs • http://www-server/cgi-bin/nph-fits/data/image19990101.fits.gz • http://www-server/cgi-bin/nph-fits/data/image.fits?date("1999/01/01") • http://www-server/cgi-bin/nph-fits/data/image.fits?image[200:800][400:600] • http://www-server/opendap/images/?date("1999/01/01","1999/01/31")&helio_grid("-70<lat<-30", "110<lon<180") • One function of OPeNDAP clients is to help the user build and manage these URLs
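The URL patterns on this slide can be sketched in a few lines of Python. This is only an illustration of how a client might assemble such URLs, not the OPeNDAP client API: the helper names `hyperslab`, `function_call`, and `build_url` are hypothetical, and `www-server` is the placeholder host used on the slide. A real client would probably also percent-encode the constraint before sending it.

```python
def hyperslab(variable, *ranges):
    """Return an array subset clause such as image[200:800][400:600]."""
    return variable + "".join(f"[{start}:{stop}]" for start, stop in ranges)


def function_call(name, *args):
    """Return a server-side function clause such as date("1999/01/01")."""
    quoted = ",".join(f'"{arg}"' for arg in args)
    return f"{name}({quoted})"


def build_url(base, *clauses):
    """Append constraint clauses to a dataset URL using '?' and '&'."""
    if not clauses:
        return base
    return base + "?" + "&".join(clauses)


# Placeholder dataset URL taken from the slide (assumption: not a real host).
BASE = "http://www-server/cgi-bin/nph-fits/data/image.fits"

subset_url = build_url(BASE, hyperslab("image", (200, 800), (400, 600)))
date_url = build_url(BASE, function_call("date", "1999/01/01"))
multi_url = build_url(
    "http://www-server/opendap/images/",
    function_call("date", "1999/01/01", "1999/01/31"),
    function_call("helio_grid", "-70<lat<-30", "110<lon<180"),
)

print(subset_url)
print(date_url)
print(multi_url)
```

Keeping each clause as a separate string makes it easy for a client to add or drop a selection without re-parsing the whole URL.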
OPeNDAP • Since ~ 1995, DODS has been based on HTTP and a CGI-style architecture • Two concerns • Application support and performance of HTTP • Housekeeping abilities of the CGI architecture • Now evolving OPeNDAP, the discipline-neutral aspect of DODS
OPeNDAP (continued) • Data transport protocol and access protocol separated • Revise server architecture • Address authentication • Memory management • Exception handling • Make all these changes while retaining interoperation with HTTP and CGI • Advanced requirement: a URL should support more than one dataset, or object.
Tasks/status • Refactor core classes to remove http/libwww, etc. • Have beta release of standalone OPeNDAP server (no dependence on web server) • Simple command line client • Multi-protocol support: file, http, ftp, and now GridFTP, ... • Run OPeNDAP server as a client to GridFTP server • First application client is likely to be IDL • Authentication is handled outside OPeNDAP server (CAS for GridFTP) • OPeNDAP URLs are evolving
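One way to picture the multi-protocol support listed above is a table that maps a URL scheme to a transport handler, so the data-access code never needs to know how the bytes arrive. The sketch below is an assumption-laden illustration, not the OPeNDAP server's design: the handler functions and the `TRANSPORTS` table are hypothetical, the handlers only describe what they would do, and `gsiftp` is assumed as the scheme for GridFTP URLs.

```python
from urllib.parse import urlparse


# Hypothetical transport handlers. A real handler would open the source and
# return data; these only report the action so the sketch runs anywhere.
def open_file(parts):
    return f"read local file {parts.path}"


def open_http(parts):
    return f"HTTP GET {parts.netloc}{parts.path}"


def open_ftp(parts):
    return f"FTP RETR {parts.netloc}{parts.path}"


def open_gridftp(parts):
    return f"GridFTP transfer {parts.netloc}{parts.path}"


# Scheme-to-handler table (assumption: GridFTP URLs use the 'gsiftp' scheme).
TRANSPORTS = {
    "file": open_file,
    "http": open_http,
    "ftp": open_ftp,
    "gsiftp": open_gridftp,
}


def fetch(url):
    """Pick a transport from the URL scheme and delegate to it."""
    parts = urlparse(url)
    try:
        handler = TRANSPORTS[parts.scheme]
    except KeyError:
        raise ValueError(f"unsupported protocol: {parts.scheme!r}") from None
    return handler(parts)


print(fetch("file:///data/image.fits"))
print(fetch("gsiftp://host.example/data/image.fits"))
```

Adding a protocol in this arrangement means registering one more handler, which is in the spirit of separating the transport protocol from the access protocol.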
Server Architecture • within a web server (servlet), or standalone • better load balancing • better scaling • better process creation and management • multi-file containers (any format, any server) • retain backward compatibility • client modifications for some of the above • new OPeNDAP URLs and semantics • has been in operation for CEDAR for about one year
Interoperability in action • Thermosphere-Ionosphere Mesosphere Energetics and Dynamics (TIMED) • netCDF data format • send them a Data Product Form and Product Availability Notice with the CEDAR URLs
Collaboratories • Space Physics and Aeronomy Research Collaboratory (SPARC - NSF) (http://intel.si.umich.edu/sparc) • Worktools (http://worktools.si.umich.edu) • Even more dataset diversity and thousands of registered users • Strong integrated catalog requirement - using our catalog as-is • Ingests data using CEDAR URLs. • Future: further integration using the OPeNDAP Java server/servlet
Solar Physics • Non-georeferenced (mostly) datasets, images (mainly solar) • Support for Flexible Image Transport System (FITS), and CDF • General server-side functions for data selection • Many have specific catalog requirements
NOTE • None of these applications ‘know’ they are using OPeNDAP!
Concluding remarks • OPeNDAP infrastructure ready for wide deployment • Data format support: netCDF, HDF4, HDF5, HDFEOS, Tables, Matlab, SQL, DSP, flat binary, CEDAR, FITS, CDF • Application support: IDL, Matlab, Excel, netCDF library calls, Ferret, GrADS, ODV, web browser, ncBrowse, VisAD
More concluding remarks • User support via Unidata • Questions?