OPeNDAP and THREDDS: Access and Discovery of Distributed Scientific Data. Yuan Ho Ethan Davis UCAR Unidata. Access and Discovery of Distributed Scientific Data. OPeNDAP – access to scientific data but no standard inventory or discovery mechanisms
OPeNDAP and THREDDS: Access and Discovery of Distributed Scientific Data. Yuan Ho, Ethan Davis, UCAR Unidata
Access and Discovery of Distributed Scientific Data • OPeNDAP – access to scientific data but no standard inventory or discovery mechanisms • THREDDS – cataloging, describing, and discovery of scientific data
What is OPeNDAP? • OPeNDAP (Open source Project for a Network Data Access Protocol) is a protocol for accessing distributed scientific data (aka DODS DAP). • OPeNDAP is a generic data exchange mechanism that lies at the core of a variety of discipline-specific data systems. • OPeNDAP is two reference implementations of the protocol (C++ and Java). • OPeNDAP is a software framework that simplifies all aspects of scientific data networking, allowing simple access to remote data. • OPeNDAP is a community of users and developers. • OPeNDAP is a non-profit corporation called OPeNDAP Inc.
Design Principles • Users should be able to share their data via OPeNDAP over the network (server). • Users should be able to use their own application package to examine or analyze the data of interest (client).
Client/Server Interaction • Data access (client): access to remote data from the user's normal application (IDL (win32), Matlab, Ferret, GrADS, any netCDF application, Excel); no need to know the format in which the data is stored; can access data subsets. • Data publishing (server): network interface via HTTP; DAP provides a common network representation for data; can serve data in various formats (netCDF, HDF, SQL, FreeForm, JGOFS, DSP); allows subsetting of data.
OPeNDAP Status • OPeNDAP/DODS 3.4 release • OPeNDAP Java 1.1.3 • OPeNDAP Data Connector 2.3X • OPeNDAP DAP Specification 4.0
OPeNDAP Data Objects • Three important OPeNDAP data objects: • DDX: an XML representation of the structure of all or part of a dataset, as well as a description of the variables within that dataset. • Blob: the binary data transferred from the data source to the client. The Blob contains the serialized data represented by the DDX. • ErrorX: an XML document containing information about any errors the server encountered while processing a request.
DDX Example
<Dataset name="fnoc1.nc"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://www.opendap.org/ns/OPeNDAP"
         xsi:schemaLocation="http://www.opendap.org/ns/OPeNDAP http://dods.coas.oregonstate.edu:8080/opendap/opendap.xsd">
  <Attribute name="Description" type="String">
    <value>Fleet Numerical Wind Data</value>
  </Attribute>
  <Array name="u">
    <Attribute name="long_name" type="String">
      <value>U_Wind_Vector</value>
    </Attribute>
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
  <Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</Dataset>
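A DDX is plain XML, so a client can inspect a dataset's structure before requesting any data. The following is a minimal sketch, assuming a well-formed copy of the slide's example (the schema location attributes are left out for brevity) and Python's standard xml.etree.ElementTree module; the function names describe_arrays and blob_url are hypothetical, not part of any OPeNDAP library.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the slide's DDX example.
DAP_NS = "http://www.opendap.org/ns/OPeNDAP"

# Well-formed copy of the slide's example (schema location attributes omitted).
DDX_TEXT = """\
<Dataset name="fnoc1.nc" xmlns="http://www.opendap.org/ns/OPeNDAP">
  <Attribute name="Description" type="String">
    <value>Fleet Numerical Wind Data</value>
  </Attribute>
  <Array name="u">
    <Attribute name="long_name" type="String">
      <value>U_Wind_Vector</value>
    </Attribute>
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
  <Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</Dataset>
"""


def describe_arrays(ddx_text):
    """Return {array name: [(dimension name, size), ...]} for a DDX string."""
    root = ET.fromstring(ddx_text)
    ns = {"dap": DAP_NS}
    arrays = {}
    for array in root.findall("dap:Array", ns):
        dims = [(d.get("name"), int(d.get("size")))
                for d in array.findall("dap:dimension", ns)]
        arrays[array.get("name")] = dims
    return arrays


def blob_url(ddx_text):
    """Return the URL attribute of the first Blob element, or None."""
    root = ET.fromstring(ddx_text)
    blob = root.find("dap:Blob", {"dap": DAP_NS})
    return None if blob is None else blob.get("URL")


arrays = describe_arrays(DDX_TEXT)
url = blob_url(DDX_TEXT)
print(arrays)
print(url)
```

The sketch only reads top-level Array elements; a fuller client would also walk nested constructor types.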
Variables and Attributes • Each variable consists of a name, a type, a value, and a collection of attributes. • Atomic variables: atomic data types are indivisible (integer, floating-point, string, and binary images). Example:
<Float64 name="Depth"/>
<Binary name="sound_sample" size="17623"/>
• Constructor variables: a constructor variable is assembled from collections of other variables, including both atomic and constructor types (array, structure, grid, and sequence). Example:
<Array name="temp">
  <Byte/>
  <dimension size="5" name="lon"/>
  <dimension size="3" name="lat"/>
</Array>
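The split between atomic and constructor variables can be mirrored in a small data model. This is an illustrative sketch only: the class names Atomic, Array, and Structure and the "sample" structure are assumptions made for the example, not types from the OPeNDAP reference implementations.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Tuple, Union


@dataclass
class Atomic:
    """An indivisible variable such as <Float64 name="Depth"/>."""
    name: str
    type_name: str


@dataclass
class Array:
    """A constructor variable: one element type plus named dimensions."""
    name: str
    element_type: str
    dimensions: List[Tuple[str, int]] = field(default_factory=list)

    def element_count(self) -> int:
        # Total number of elements is the product of the dimension sizes.
        return prod(size for _, size in self.dimensions)


@dataclass
class Structure:
    """A constructor variable assembled from other variables."""
    name: str
    members: List[Union[Atomic, Array, "Structure"]] = field(default_factory=list)


# The two examples from the slide, plus a hypothetical structure holding both.
depth = Atomic("Depth", "Float64")
temp = Array("temp", "Byte", [("lon", 5), ("lat", 3)])
sample = Structure("sample", [depth, temp])

print(temp.element_count())
```

For the slide's "temp" array, the element count is 5 × 3 = 15.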
Variables and Attributes • An attribute is composed of a name, a type, and a value. • Each variable may have zero or more attributes. • Types: Boolean, Byte, IntXX, UIntXX, FloatXX, String, URL. • Example:
<Dataset name="test">
  <Structure name="measurement">
    <Attribute name="data" type="String">
      <value>18 Mar 03</value>
    </Attribute>
    <Attribute name="other" type="Structure">
      <Attribute name="satellite_name" type="String">
        <value>GOES</value>
      </Attribute>
      <Attribute name="experiment number" type="Int32">
        <value>898976</value>
      </Attribute>
    </Attribute>
    <Float64 name="value"/>
    <Array name="time_series">
      <dimension size="32"/>
    </Array>
  </Structure>
</Dataset>
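Because an attribute of type Structure can hold further attributes, reading them is naturally recursive. The sketch below assumes a well-formed fragment modeled on the slide's example and uses only the standard library; read_attributes is a hypothetical helper, and values are kept as strings rather than converted according to their declared type.

```python
import xml.etree.ElementTree as ET

# Well-formed fragment modeled on the slide's example.
ATTR_XML = """\
<Structure name="measurement">
  <Attribute name="other" type="Structure">
    <Attribute name="satellite_name" type="String">
      <value>GOES</value>
    </Attribute>
    <Attribute name="experiment number" type="Int32">
      <value>898976</value>
    </Attribute>
  </Attribute>
</Structure>
"""


def read_attributes(element):
    """Return the attributes directly under element as a nested dict."""
    result = {}
    for attr in element.findall("Attribute"):
        if attr.get("type") == "Structure":
            # A Structure attribute contains further attributes.
            result[attr.get("name")] = read_attributes(attr)
        else:
            values = [(v.text or "").strip() for v in attr.findall("value")]
            # Single values are unwrapped; multiple values stay a list.
            result[attr.get("name")] = values[0] if len(values) == 1 else values
    return result


attrs = read_attributes(ET.fromstring(ATTR_XML))
print(attrs)
```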
Requests/Responses • Responses: four categories of information pass from the server to the client: • Information about the data: DDX • The data: Blob • Error messages: ErrorX object • Information about the server: version messages and the server capabilities document • Requests: a constraint expression lets a client request specific information from a dataset, such as certain variables or parts of certain variables. • Projection clause: a collection of one or more Project elements • Selection clause: one or more Select elements • Example:
<Constraint>
  <Project variable="/sample/temp"/>
  <Project variable="/sample/salt"/>
  <Select condition="/sample/salt>34.0" target="sample"/>
</Constraint>
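A client could assemble such a constraint document programmatically. The following sketch builds the slide's example with Python's standard xml.etree.ElementTree; build_constraint is a hypothetical helper, and the element and attribute names are simply those shown on the slide, not a verified part of the final specification.

```python
import xml.etree.ElementTree as ET


def build_constraint(projections, selections=()):
    """Build a <Constraint> document as a string.

    projections: iterable of variable paths to project.
    selections: iterable of (condition, target) pairs to select on.
    """
    constraint = ET.Element("Constraint")
    for variable in projections:
        ET.SubElement(constraint, "Project", {"variable": variable})
    for condition, target in selections:
        ET.SubElement(constraint, "Select",
                      {"condition": condition, "target": target})
    return ET.tostring(constraint, encoding="unicode")


constraint_xml = build_constraint(
    ["/sample/temp", "/sample/salt"],
    [("/sample/salt>34.0", "sample")],
)
print(constraint_xml)
```

Note that the serializer escapes the ">" in the condition as "&gt;"; an XML parser reading the document back recovers the original condition string.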
Problems of searching and retrieving datasets from an OPeNDAP server • Metadata • Use metadata: metadata at the data level • Search metadata: metadata at the directory level • OPeNDAP has been built up from the data level and offers high functionality at the data acquisition level. • The OPeNDAP AIS (Ancillary Information Service) adds metadata to the OPeNDAP data stream. The role of ancillary data is to support translation of and access to data. • The ODC (OPeNDAP Data Connector) is more of a directory service, with limited data searching functionality.
Summary of OPeNDAP • The OPeNDAP data delivery architecture provides remote access to data via the Internet. • OPeNDAP uses HTTP (FTP, GridFTP, Telnet, etc.) to transport its data objects. • OPeNDAP has proved very versatile. • XML is the persistent form of the data objects. • OPeNDAP is a data access tool; it needs a data discovery tool to complement it.
THREDDS Project • Develop a framework to bridge the gap between data providers and data users, to make scientific data discoverable and usable as well as referencable from scientific publications and educational materials. • The framework should be: • Scalable for large and small projects • Easy to use yet powerful and flexible • Capable of supporting various user interfaces
THREDDS Catalogs • Hierarchical structure of datasets • Dataset access methods • Structure on which to hang (reference) metadata • THREDDS catalogs are for communicating information about a collection of datasets.
THREDDS Catalogs
<catalog version="0.6">
  <dataset name="Unidata IDD Model Data">
    <dataset name="NCEP Eta 80km CONUS model data">
      <metadata metadataType="DublinCore" xlink:href="http://server/dods/eta.xml" />
      <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
        <access serviceType="DODS" urlPath="http://server/dods/2003092412_eta.nc" />
      </dataset>
      …
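Since a catalog is a hierarchy of datasets, a client can walk it recursively to find every access point. The sketch below assumes a closed-off version of the slide's fragment (the elided part is simply dropped and the open elements are closed), adds the standard XLink namespace declaration so the fragment parses, and uses a hypothetical helper named list_access_points.

```python
import xml.etree.ElementTree as ET

# Closed-off copy of the slide's catalog fragment; the xlink namespace
# declaration is added here so that the document parses on its own.
CATALOG_TEXT = """\
<catalog version="0.6" xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="Unidata IDD Model Data">
    <dataset name="NCEP Eta 80km CONUS model data">
      <metadata metadataType="DublinCore" xlink:href="http://server/dods/eta.xml"/>
      <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
        <access serviceType="DODS" urlPath="http://server/dods/2003092412_eta.nc"/>
      </dataset>
    </dataset>
  </dataset>
</catalog>
"""


def list_access_points(element, path=()):
    """Yield (dataset path, service type, URL path) for each access element."""
    for dataset in element.findall("dataset"):
        here = path + (dataset.get("name"),)
        for access in dataset.findall("access"):
            yield here, access.get("serviceType"), access.get("urlPath")
        # Datasets nest, so recurse into child datasets.
        yield from list_access_points(dataset, here)


catalog = ET.fromstring(CATALOG_TEXT)
access_points = list(list_access_points(catalog))
for names, service, url in access_points:
    print(" / ".join(names), service, url)
```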
THREDDS Catalogs
<dc:title>NCEP Eta 80km CONUS model data</dc:title>
<dc:creator>NOAA/NCEP</dc:creator>
<dc:subject>NCEP Eta Model data; Real-time data</dc:subject>
<dc:description>
  This collection of real-time NOAA/NCEP Eta model data contains five days' worth of data. The data is on an 80km CONUS grid (GRIB grid 211). Daily 00Z and 12Z runs are available, where each dataset includes analysis data and forecast data from a single Eta run. Each dataset contains forecasts for every 6 hours going out two and a half days (60hrs) from the run time.
</dc:description>
…
THREDDS DQC (Dataset Query Capabilities) • THREDDS DQC documents describe how a subset of a data collection can be requested. • Large and time-varying data collections are cumbersome to view as a hierarchical structure. • THREDDS DQC documents describe the set of requests that can be made to one or more DQC services and the form of those requests. • THREDDS DQC documents are an abstract representation of a collection of datasets.
THREDDS DQC
<?xml version="1.0" encoding="UTF-8"?>
<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl" construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations:" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
    …
  </selectStation>
  <selectList id="product" title="Products:" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R" description=".5 reflectivity .54nm res 16 levels id 19/r"/>
    …
  </selectList>
  <selectList id="time" title="Times:" required="true">
    <choice name="Latest" value="latest"/>
    …
  </selectList>
</queryCapability>
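A client reading a DQC document first needs to know which selectors exist and what values each one offers. The sketch below lists them from a trimmed copy of the slide's example (elided entries are omitted, not filled in); list_selectors is a hypothetical helper, and the sketch deliberately stops short of building a request URL because the slide does not spell out how construct="append" combines the selections.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's DQC example; elided entries are left out.
DQC_TEXT = """\
<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl"
         construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations:" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
  </selectStation>
  <selectList id="product" title="Products:" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R"/>
  </selectList>
  <selectList id="time" title="Times:" required="true">
    <choice name="Latest" value="latest"/>
  </selectList>
</queryCapability>
"""


def list_selectors(dqc_text):
    """Return {selector id: {"required", "multiple", "values"}}."""
    root = ET.fromstring(dqc_text)
    selectors = {}
    for selector in root:
        if selector.tag == "selectStation":
            options = selector.findall("station")
        elif selector.tag == "selectList":
            options = selector.findall("choice")
        else:
            continue  # e.g. the <query> element
        selectors[selector.get("id")] = {
            "required": selector.get("required") == "true",
            "multiple": selector.get("multiple") == "true",
            "values": [o.get("value") for o in options],
        }
    return selectors


selectors = list_selectors(DQC_TEXT)
for selector_id, info in selectors.items():
    print(selector_id, info)
```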
THREDDS Services • THREDDS catalogs are sources of information about a collection of data on top of which complex services can be built. For instance, tools that: • Provide interoperability with GIS systems • Supply external discovery systems with needed information (e.g., Dublin Core, DIF, FGDC) • Supply information to improve data display and analysis, e.g., geolocation information
THREDDS and Discovery Systems • To supply external discovery services with the information they require, we need: • The proper information added to a catalog, e.g., title and description of a dataset, spatial and temporal ranges, parameters, dataset ID. • Service to provide metadata in desired encoding • Service to feed information to discovery system • Use discovery systems to search for data
THREDDS and Discovery Systems: Communicate with Discovery Systems • [Diagram. Components: THREDDS services with data server, Discovery System (e.g., DLESE), Dublin Core Generator, Metadata Harvester, Catalog, Metadata Repository, Data server. Arrow labels: Searches, Reads, Writes, References.]
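A service that provides metadata in a desired encoding could be as small as a function that turns catalog-level information into Dublin Core elements. The sketch below is one possible shape for such a generator, not the THREDDS implementation: to_dublin_core and the wrapping "metadata" element are hypothetical, while the namespace URI is the standard Dublin Core elements namespace and the sample values come from the Dublin Core slide earlier in the deck.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Fields this sketch knows how to emit, in output order.
DC_FIELDS = ("title", "creator", "subject", "description")


def to_dublin_core(info):
    """Serialize a dict of dataset information as Dublin Core XML."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for key in DC_FIELDS:
        if key in info:
            child = ET.SubElement(record, f"{{{DC_NS}}}{key}")
            child.text = info[key]
    return ET.tostring(record, encoding="unicode")


record_text = to_dublin_core({
    "title": "NCEP Eta 80km CONUS model data",
    "creator": "NOAA/NCEP",
    "subject": "NCEP Eta Model data; Real-time data",
})
print(record_text)
```

A harvester on the discovery-system side could then read such records without knowing anything about the catalog format itself.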
THREDDS Status • Working on new versions of the catalog and DQC schemas • Working on updating existing tools to use new schemas • Working with UCAR DMWG and NCAR CDP on enhancing descriptive metadata • Working with OPeNDAP developers on integrating THREDDS and OPeNDAP
OPeNDAP and THREDDS • Enhance the OPeNDAP C++ implementation to serve THREDDS catalogs • THREDDS DQC to replace OPeNDAP File Servers
OPeNDAP and THREDDS: More Information • OPeNDAP Web page: http://www.unidata.ucar.edu/packages/dods/ • OPeNDAP Email list: dods@unidata.ucar.edu, subscribe at http://www.unidata.ucar.edu/packages/dods/home/mailLists/ • THREDDS Email list: thredds@unidata.ucar.edu, subscribe at http://www.unidata.ucar.edu/projects/THREDDS/maillists/ • THREDDS Web page: http://www.unidata.ucar.edu/projects/THREDDS/ • Support questions: support@unidata.ucar.edu