DAP Clients and Services. Section 3 APAC ‘07 OPeNDAP Workshop 12 Oct 2007 James Gallagher. Outline. Browsing a Server - jump right in DAP Requests and Responses - background on using DAP Finding Data Types of Clients Graphical Command line Custom. Browsing a Server.
DAP Clients and Services Section 3 APAC ‘07 OPeNDAP Workshop 12 Oct 2007 James Gallagher
Outline • Browsing a Server - jump right in • DAP Requests and Responses - background on using DAP • Finding Data • Types of Clients • Graphical • Command line • Custom
Browsing a Server • Type the server’s URL into the browser • Hyrax (and most other DAP servers) provides a way to browse data • Choose a data set using THREDDS catalogs and/or common directory traversal • Choose one or more variables within a data set using the HTML form interface
Open a server… Type the server’s URL; the URL could be an entry in a catalog or HTML page. Contents at the top level. These links become active when a dataset is listed; for a directory, they don’t apply.
Browse its directory structure Follow the Pathfinder links down to …
…and traverse all the way down to a file. … this point. Now we see a listing of datasets. Descend into a dataset.
Open a file. Note that the URL is duplicated here.
Supply a constraint; get ASCII data. Use the form elements to build a constraint. Note that the constraint is visible here, appended to the URL.
The ASCII data view. Note the constraint, and the ‘.asc’ suffix appended to the URL before it.
Spreadsheets can often read URLs, and they can parse the CSV output of Hyrax (and most other DAP servers). Paste a DAP URL with the ‘.ascii’ extension into the Location box.
Data read into the spreadsheet. Sometimes you have to tell the spreadsheet how to ‘import’ the data
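What the spreadsheet does on import can be sketched in a few lines of Python. This is only an illustration: the sample text is invented and merely imitates the "label, value, value, ..." rows of an ASCII response, and the function name is hypothetical. A real response may carry different header lines, so inspect it before relying on this.

```python
import csv
import io

# Invented sample that imitates the comma-separated rows of an ASCII response.
# The dataset name, variable name and numbers are assumptions for illustration.
SAMPLE = """\
Dataset: example.nc
sst[0][0], 271.5, 272.0, 272.4
sst[0][1], 273.1, 273.6, 274.0
"""

def parse_ascii_rows(text):
    """Return {row_label: [floats]} for every line that has data columns."""
    rows = {}
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) < 2:
            continue  # header or blank line: no comma-separated values
        label, *values = fields
        rows[label] = [float(v) for v in values]  # assumes numeric data
    return rows

if __name__ == "__main__":
    for label, values in parse_ascii_rows(SAMPLE).items():
        print(label, values)
```

The sketch assumes numeric values; string variables would need the `float` conversion removed.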
Browsing summary • Directory hierarchy browsing • Data files open to an HTML form which enables choosing variables • The form supports interactive construction of constraint expressions and ASCII data returns • The form interface has many limitations but it can be used in many different situations
DAP background information • Data are referenced by a URL • DAP responses with metadata or data are requested using tokens appended to the URL • Within a data granule, elements are accessed using a Constraint Expression
URLs Reference Data • As we’ve seen, URLs reference data granules (usually files). • DAP, version 2 defines three responses • DDS - syntactic metadata - information about the structure of the data • DAS - semantic metadata - background information about the data • DODS - data - actual data values, bundled with syntactic metadata to form a self-contained response.
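To make "syntactic metadata" concrete, here is a minimal sketch that reads variable types and shapes out of DDS text. The sample DDS is hand-written in the usual DAP2 layout; the dataset and variable names are made up, and the parser only handles simple declarations (it skips Grid, Structure and Sequence blocks), so treat it as an illustration rather than a DDS parser.

```python
import re

# Hand-written DDS text; names and sizes are assumptions for illustration.
SAMPLE_DDS = """\
Dataset {
    Float32 lat[lat = 3];
    Float32 lon[lon = 4];
    Int16 sst[time = 2][lat = 3][lon = 4];
} example.nc;
"""

# One simple declaration per line: type, name, then zero or more dimensions.
DECL = re.compile(r"^\s*(\w+)\s+(\w+)((?:\[\w+ = \d+\])*);", re.MULTILINE)
DIM = re.compile(r"\[(\w+) = (\d+)\]")

def array_shapes(dds_text):
    """Map each simple variable to (type, [(dimension_name, size), ...])."""
    shapes = {}
    for type_name, var_name, dims in DECL.findall(dds_text):
        shapes[var_name] = (type_name,
                            [(n, int(s)) for n, s in DIM.findall(dims)])
    return shapes

if __name__ == "__main__":
    for name, (type_name, dims) in array_shapes(SAMPLE_DDS).items():
        print(name, type_name, dims)
```

Knowing the dimension sizes from the DDS is what lets a client build a sensible constraint before asking for any data.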
DAP Data Model • A Dataset is a collection of variables (tuples of type-name-value) • Each variable has attributes which are also type-name-value tuples • The Dataset may also have ‘global’ attributes
Data Model Types • Types of variables: • Scalars: Byte, Integer, Float, String, URL • Array: N-dimensional • Structure: Simple aggregate type • Sequence: hierarchical table data • Grid: Array with map vectors (establishes a mapping between array indices and independent variable values)
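The Grid type is the least obvious of these, so here is a small sketch of the idea in Python. It is not how any DAP library represents a Grid; the class, field names and numbers are all invented. It only shows how map vectors tie array indices to independent variable values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Grid:
    """A 2-D array plus one map vector per dimension (illustrative only)."""
    name: str
    dims: Tuple[str, str]
    maps: Dict[str, List[float]]
    values: List[List[float]]

    def at(self, i: int, j: int):
        """Return the coordinates for indices [i][j] and the value stored there."""
        coords = {
            self.dims[0]: self.maps[self.dims[0]][i],
            self.dims[1]: self.maps[self.dims[1]][j],
        }
        return coords, self.values[i][j]

# Made-up example: 3 latitudes by 2 longitudes.
sst = Grid(
    name="sst",
    dims=("lat", "lon"),
    maps={"lat": [-10.0, 0.0, 10.0], "lon": [100.0, 110.0]},
    values=[[28.1, 28.4], [29.0, 29.3], [28.7, 28.9]],
)

if __name__ == "__main__":
    print(sst.at(2, 1))
```

A plain Array would carry only `values`; the maps are what make the Grid self-describing.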
Attributes • Scalars • Vectors • Structures • No Grids or Sequences.
Accessing those responses • For each of the responses, add the extension .dds, .das or .dods at the end of the URL ‘file name.’
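As a sketch, the three request URLs can be derived from one dataset URL like this. The dataset URL is a placeholder and the function names are hypothetical; only the three suffixes come from the slide. The fetch helper uses the standard library and is not called here.

```python
from urllib.request import urlopen

# Suffixes for the three DAP2 responses named above.
DAP2_SUFFIXES = {"dds": ".dds", "das": ".das", "data": ".dods"}

def response_url(dataset_url, response):
    """Append the suffix for the requested DAP2 response to the dataset URL."""
    if response not in DAP2_SUFFIXES:
        raise ValueError("unknown response: %r" % response)
    return dataset_url + DAP2_SUFFIXES[response]

def fetch_text(url):
    """Fetch a text response (DDS or DAS); the .dods response is binary."""
    with urlopen(url) as reply:
        return reply.read().decode("utf-8", errors="replace")

if __name__ == "__main__":
    base = "http://example.org/opendap/data/sst.nc"  # placeholder URL
    for kind in DAP2_SUFFIXES:
        print(kind, response_url(base, kind))
```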
Other response types • DAP4 will use XML to encode metadata and replace the two metadata objects (DDS and DAS) with a single response accessed using .ddx • Virtually all servers support: • Info (.info): An HTML page built using all the metadata • HTML (.html): The HTML form interface we’ve seen • ASCII (.asc, .ascii): The ASCII data dump, also already seen
Aggregation • There are several different servers which can perform aggregation • TDS: Array data • GDS, Hyrax/JGOFS: Sequences (table data) • BES (but not when used in Hyrax): Any collection of data types aggregated to a Structure • Aggregation turns searching and selecting from an inventory into writing a constraint expression • Aggregation can eliminate the dichotomy between inventory searching/access and data access
An example aggregation: http://satdat1.gso.uri.edu/thredds/dodsC/NWAtlanticDec_1km.html
THREDDS responses • Use THREDDS to define a logical hierarchy that’s distinct from the set of directories that actually hold the data. • We can request THREDDS catalog XML files using ‘catalog.xml’ or HTML pages using ‘catalog.html’ after a directory name. • While the directory browser works for any directory, THREDDS catalogs are valid only for the logical hierarchy they define • Files/Directories not included in that hierarchy have no catalogs
THREDDS examples • Switch Hyrax to the THREDDS HTML view: Choose the HTML view
The THREDDS HTML view • The top-level THREDDS catalog on our test server defines a single data root directory (SVN Test Data Archive) • This illustrates how THREDDS can be used to control the view of data presented by the server • Use ‘catalog.xml’ in place of ‘catalog.html’ to get the catalog data in an XML document.
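A client that prefers the XML view could walk a catalog roughly as follows. This is a sketch under assumptions: the sample catalog is hand-written (names, IDs and paths are invented), though it uses the THREDDS InvCatalog 1.0 namespace and the `urlPath` attribute that such catalogs use for accessible datasets. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hand-written catalog for illustration; not copied from any real server.
SAMPLE_CATALOG = """\
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink" name="Example catalog">
  <service name="dap" serviceType="OPeNDAP" base="/opendap/"/>
  <dataset name="data" ID="/data/">
    <dataset name="sst.nc" ID="/data/sst.nc" urlPath="data/sst.nc"/>
    <dataset name="salt.nc" ID="/data/salt.nc" urlPath="data/salt.nc"/>
    <catalogRef xlink:href="nc/catalog.xml" xlink:title="nc" name="nc"/>
  </dataset>
</catalog>
"""

def catalog_url(directory_url, view="xml"):
    """Build the catalog URL for a directory: 'xml' or 'html' view."""
    return directory_url.rstrip("/") + "/catalog." + view

def dataset_paths(catalog_xml):
    """Return the urlPath of every dataset element that has one."""
    root = ET.fromstring(catalog_xml)
    tag = "{%s}dataset" % THREDDS_NS
    return [d.get("urlPath") for d in root.iter(tag) if d.get("urlPath")]

if __name__ == "__main__":
    print(catalog_url("http://example.org/opendap/data/"))  # placeholder URL
    print(dataset_paths(SAMPLE_CATALOG))
```

Container datasets without a `urlPath` are skipped, and following `catalogRef` links to nested catalogs is left out to keep the sketch short.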
THREDDS data set page • THREDDS catalogs can list more than one access mechanism - here we see only DAP, but WCS, WMS, etc., are other possibilities
DAP Summary • DAP requests are made using a token appended to the filename part of the URL • Responses defined by DAP2 and (in progress) DAP4 are: DDS, DAS, DODS and DDX. These return metadata and data • Other responses are used to access ASCII data values, HTML metadata pages and data access interfaces • Constraint expressions are used to limit (subset, projection, selection) the data returned
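The pieces of a request fit together as in the sketch below: a response suffix after the file name, then "?" and a constraint made of comma-separated projections followed by "&"-prefixed selection clauses. The URL, the variable names and the function names are invented for illustration, and no percent-encoding is applied, which some HTTP libraries may require.

```python
def constraint(projection=(), selection=()):
    """Join projection and selection clauses into a DAP2 constraint expression."""
    ce = ",".join(projection)       # which variables to return
    for clause in selection:        # which rows/values to keep
        ce += "&" + clause
    return ce

def constrained_url(dataset_url, suffix, projection=(), selection=()):
    """Dataset URL + response suffix + optional '?constraint'."""
    ce = constraint(projection, selection)
    return dataset_url + suffix + ("?" + ce if ce else "")

if __name__ == "__main__":
    # Hypothetical table (Sequence) dataset with 'temp' and 'depth' columns.
    print(constrained_url(
        "http://example.org/opendap/data/stations.dat",
        ".asc",
        projection=["station.temp", "station.depth"],
        selection=["station.depth<100"],
    ))
```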
DAP Summary, cont. • THREDDS • Is a distinct protocol • Complements DAP • As Hyrax implements it, supports both HTML and XML views of the catalogs • Defines a logical hierarchy that is distinct from the way the data are actually stored
Finding Data • Ways to find data: • The OPeNDAP Data Set List • GCMD • TPAC • Google • THREDDS • We maintain a page with links to dataset searching sites: • http://www.opendap.org/data/index.html
Common Features • All of these data location features except Google depend on active community involvement in building catalogs of data • The solutions can be described as static documents or crawlers • Google and TPAC are crawlers • Crawlers can discover datasets without human intervention • They can make mistakes that seem silly • The Dataset List, GCMD and THREDDS are static documents or collections of static documents • Static lists can be tailored by hand • They can go out of date quickly
Differentiating Features • Google & TPAC: • Google is just crawling HTML. If a server is not linked to an HTML page, it won’t be found. • TPAC is preset with server locations and picks up changes at those sites
Differentiating Features, cont. • The Static Lists: • The Dataset List has a very low metadata requirement • Not maintained as actively as either GCMD or THREDDS catalogs • GCMD: • The GCMD has a fairly high entry level threshold • Professional staff maintain the GCMD as their sole job • THREDDS • THREDDS catalogs are, or can be, located at the data - locality distributes maintenance • Quality varies from site to site
Finding Data Summary • Locating data seems like it would be the place to start building a system, but it’s far more varied than the one-size-fits-all approach that most people tried in the 1990s • Crawlers and hierarchical lists show the most promise, but maintained centralized lists are also useful
Accessing Data with DAP • Web Browser • Already discussed… • Graphical clients • ncBrowse, ODC, Ferret, GrADS • Command-line clients • getdap (UNIX, win32), loaddap (Matlab, IDL), nco (UNIX, win32) • Custom clients • C++, C, Java, Python • netCDF
Using a Graphical Client • Example: The OPeNDAP Data Connector • Combines data location with retrieval and display • Shows the built URL, including constraint expression • The URL can be transferred to another application
The ODC opens to the search pane. Five different panes. Choices within a pane.
Use the dataset list to find the TPAC climatologies. Choose the Antarctic Cooperative Research Centre TPAC/CSIRO Climatologies… then hit ‘To Retrieve’ to move the selection to the next pane.
The Retrieve pane. Double-click ‘levitus_annual_97.nc’ to see the contents of the file in the area on the right.
The ODC shows the URL as it builds it. Click the checkboxes for SALT and O2. For both, set the range of z_index to ‘0 to 0’. Make sure to hit tab/return in the boxes. …then hit ‘Output to’ to move to the View pane.
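The URL the ODC builds here is a projection with one index range per array dimension. The sketch below shows the general shape of such a projection. Only the variable names and the z_index range of 0 to 0 come from the slide; the dimension order and the latitude and longitude sizes are assumptions, and the helper name is hypothetical, so check the file's DDS for the real shape.

```python
def hyperslab(variable, ranges):
    """Build 'VAR[start:stop]...' from (start, stop) index pairs, one per dimension."""
    return variable + "".join("[%d:%d]" % (start, stop) for start, stop in ranges)

# Assumed shape: z_index first, then 180 latitudes and 360 longitudes.
ASSUMED_RANGES = [(0, 0), (0, 179), (0, 359)]

if __name__ == "__main__":
    projection = ",".join(hyperslab(v, ASSUMED_RANGES) for v in ("SALT", "O2"))
    print(projection)
```

Setting the first range to 0:0 keeps a single level, which is why the plot in the next step is a flat map rather than a volume.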
There are a number of ways to view the data. Here the plotter has been chosen (the default). Hit ‘Plot to’ to generate a plot using the default settings.
When the plot is made, the interface switches to the ‘Preview’ tab. Switch back to the ‘Variables’ tab to plot O2.
Choose ‘O2’ from the menu, then hit ‘Plot to.’
Now that the data have been read and cached, you can switch back and forth between variables quickly without any additional data transfers. When ready, go back to the ‘Retrieve’ pane.
Choose ‘TEMP’. Set the constraint… then plot.
ODC Summary • The ODC provides a way to search for, access and plot data • Acts as a ‘URL builder;’ the URLs can be pasted into other applications • We didn’t need to know anything about DAP, its Request or Response objects or how a URL is used to request data • The data set list often contains stale entries • Also supports using the GCMD for data location - more on this when we cover searching