GADS: A Web Service for accessing large environmental data sets
Jon Blower, Keith Haines, Adit Santokhee
Reading e-Science Centre, University of Reading
http://www.resc.rdg.ac.uk
resc@rdg.ac.uk
Background
• At Reading we hold copies of various datasets (~2TB)
  • Mainly from models of oceans and atmosphere
  • Also some observational data (e.g. satellite data)
  • From Met Office, SOC, ECMWF, more
• We serve these datasets to many end users
  • Scientists (1000s of hits per year)
  • Industry (e.g. British Maritime Technology)
• Datasets are in a variety of formats
  • netCDF, GRIB, HDF, HDF5 …
• Data do not conform to naming conventions
  • E.g. “temp” instead of “sea_water_potential_temperature”
Background (2)
• There is a clear need to make access to these datasets easier
  • Users shouldn’t have to know details of how data are stored
• Hence development of GADS (Grid Access Data Service)
  • Developed as part of GODIVA project
    • Grid for Ocean Diagnostics, Interactive Visualisation and Analysis
    • NERC e-Science pilot project
  • Originally developed by Woolf et al (2003)
  • Allows richer queries and more flexibility than DODS standard
    • Although we plan to implement a DODS translation layer
GODIVA Web Portal
• Allows users to interactively select data for download using a GUI
• Users can create movies on the fly
• cf. Live Access Server
Advantages of GADS
• Users don’t need to know anything about storage details
  • Can expose data with conventional names without changing data files
  • Users can choose their preferred data format, irrespective of how data are stored
• Behaves as an aggregation server
  • Delivers a single file, even if the original data spanned several files
• Deployed as a Web Service
  • Can be called from any platform/language
  • Can be called programmatically (easily incorporated into larger systems and workflows)
  • Java / Apache Axis / Tomcat
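The point about exposing conventional names without touching the data files can be pictured as a lookup table held alongside the metadata. The following is only a minimal sketch in Python, not the GADS implementation: the table layout, the dataset name EXAMPLE_DATASET and both function names are hypothetical. Only the “temp” / “sea_water_potential_temperature” pair comes from the slides.

```python
# Hypothetical alias table: conventional name -> name used inside the stored files.
# Only the "temp" example comes from the slides; the layout is an assumption.
ALIASES = {
    "EXAMPLE_DATASET": {
        "sea_water_potential_temperature": "temp",
    },
}


def internal_name(dataset: str, conventional_name: str) -> str:
    """Return the variable name as it is stored in the files of a dataset."""
    try:
        return ALIASES[dataset][conventional_name]
    except KeyError:
        raise KeyError(
            f"{conventional_name!r} is not exposed by {dataset!r}"
        ) from None


def exposed_names(dataset: str) -> list[str]:
    """List the conventional names a client would see for a dataset."""
    return sorted(ALIASES.get(dataset, {}))
```

A client would ask for “sea_water_potential_temperature” and the service would read “temp” from the files, so the files themselves never need renaming.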
Architecture
[Diagram; labelled components: Client, GADS Web Service (dataQuery, dataRequest), MetadataInterface, Metadata Manager Utility, metadata, data files]
GADS Methods
• dataQuery() is used for querying the data holdings
  • “What datasets are there?”
  • “What variables are there in the dataset X?”
• dataRequest() is used for downloading data
  • User can choose the data format
  • Can easily download subsets of data
  • Uses start-stride-count semantics (familiar in community)
• dataRequestNatural()
  • Same as dataRequest() but in natural units (degrees, metres …)
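For readers unfamiliar with start-stride-count subsetting, the idea is that each axis is described by a first index, a step between indices, and the number of points to take. The sketch below is a generic Python illustration of that convention, not code from GADS; the function name and the validation rules are assumptions.

```python
def select_indices(start: int, stride: int, count: int, axis_length: int) -> list[int]:
    """Return the grid indices picked out by a (start, stride, count) triple."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if not 0 <= start < axis_length:
        raise ValueError("start is outside the axis")
    # Number of points reachable from `start` when stepping by `stride`.
    available = (axis_length - start + stride - 1) // stride
    if not 0 <= count <= available:
        raise ValueError("count runs past the end of the axis")
    return [start + i * stride for i in range(count)]
```

For example, `select_indices(100, 4, 3, 1000)` picks indices 100, 104 and 108, i.e. every fourth grid point starting at index 100.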
dataQuery – examples of use
• dataQuery(dataset, variable, axis) – general form
• dataQuery(“”, “”, “”) – gets all dataset names in the catalogue
• dataQuery(“FOAM_NINTH”, “”, “”) – gets all the variable names in the FOAM_NINTH dataset
• dataQuery(“FOAM_NINTH”, “temperature”, “”) – gets the details of the grid for the temperature variable
• dataQuery(“FOAM_NINTH”, “temperature”, “z”) – gets all values that the z coordinate can take
• dataQuery(“”, “temperature”, “”) – gets all datasets that contain the “temperature” variable
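The way empty strings act as wildcards in dataQuery can be mimicked with a small in-memory catalogue. This Python sketch only illustrates the dispatch logic described above. The catalogue contents, the OTHER_MODEL dataset, the axis values and the return types (lists and a dict of axis sizes) are all made up, since the slides do not say what the real service returns.

```python
# Made-up catalogue: dataset -> variable -> axis -> coordinate values.
CATALOGUE = {
    "FOAM_NINTH": {
        "temperature": {"t": [0, 1, 2], "z": [5.0, 15.0, 30.0]},
        "salinity": {"t": [0, 1, 2], "z": [5.0, 15.0, 30.0]},
    },
    "OTHER_MODEL": {
        "temperature": {"t": [0, 1], "z": [10.0, 20.0]},
    },
}


def data_query(dataset: str = "", variable: str = "", axis: str = ""):
    """Mimic the five query forms; an empty string means 'not specified'."""
    if dataset and variable and axis:
        # All values that one coordinate can take.
        return list(CATALOGUE[dataset][variable][axis])
    if dataset and variable:
        # Grid details, reduced here to the number of points on each axis.
        return {name: len(vals) for name, vals in CATALOGUE[dataset][variable].items()}
    if dataset:
        # All variable names in one dataset.
        return sorted(CATALOGUE[dataset])
    if variable:
        # All datasets that contain the variable.
        return sorted(ds for ds, variables in CATALOGUE.items() if variable in variables)
    # Nothing specified: all dataset names.
    return sorted(CATALOGUE)
```

Calling `data_query("", "temperature", "")` on this toy catalogue lists both datasets, while `data_query("FOAM_NINTH", "temperature", "z")` returns the three depth values.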
dataRequest – example of use
• dataRequest(“FOAM_NINTH”, “temperature”, “CDF”, “t”, 0, 1, 20, “z”, 0, 1, -1, “y”, 100, 4, 400, “x”, 300, 4, 600)
• dataRequestNatural(“FOAM_NINTH”, “temp”, “CDF”, “t”, “2004-06-01 00:00:00”, “2004-06-22 00:00:00”, “z”, “0”, “10”, “y”, “42”, “64”, “x”, “-26”, “9”)
• Returns URL to extracted dataset
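A request in natural units has to be translated into grid indices at some point. The slides do not describe how GADS does this, so the following Python sketch shows just one plausible approach under stated assumptions: the axis values are strictly increasing, both bounds are inclusive, and the function name and the example latitude axis are hypothetical. It does not attempt to interpret the argument order of the calls above.

```python
from bisect import bisect_left, bisect_right


def natural_to_index_range(axis_values: list[float], low: float, high: float) -> tuple[int, int]:
    """Return (start, count) for the grid points whose coordinate lies in [low, high].

    Assumes axis_values is strictly increasing and both bounds are inclusive.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    start = bisect_left(axis_values, low)    # first index with value >= low
    stop = bisect_right(axis_values, high)   # one past the last index with value <= high
    return start, stop - start


# Hypothetical latitude axis: 30, 32, ..., 68 degrees north.
latitudes = [30.0 + 2.0 * i for i in range(20)]
start, count = natural_to_index_range(latitudes, 42.0, 64.0)
```

The resulting start and count could then be fed into an index-based request with a stride of 1.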
Metadata manager (in progress)
• E.g. adding a dataset – can “harvest” metadata from netCDF file headers
Limitations
• Assumes one timestep per file
  • Hence doesn’t handle timeseries well
• Long queries can cause problems (synchronous)
  • Needs a queuing system
• Rotated grids a problem (esp. for dataRequestNatural())
• Could have richer metadata queries
Application: Search and Rescue
• Search And Rescue Information System (SARIS)
  • British Maritime Technology (BMT)
  • Used by Coastguard to locate people who have fallen overboard
  • Runs a model using wind and surface current data
  • Forecasts where person will be by the time rescue arrives
• By incorporating GADS, SARIS can consume up-to-date Met Office forecasts on demand
  • Should improve quality of prediction
Spatial Databases
• Database systems now include capability for storing geospatial data
  • IBM Informix, Oracle 10g, PostgreSQL, MySQL …
• ReSC is evaluating some of these
  • Informix with Grid DataBlade looks promising (www.barrodale.com)
• We need capability to store raster data (i.e. gridded data)
  • Many only store vector data
  • Gotcha – some vendors use “raster” to mean “photograph”, not “model data”
• We also need to store 3-D data
  • Some only have native understanding of 2-D data
Future plans
• Interact more with GIS community
  • There are already some relevant initiatives out there (e.g. MarineGIS)
  • Use of databases may help (some are OGC compliant)
  • But there is a problem: GIS tends to talk in 2-D
• Develop DODS (=OPeNDAP) layer
• Encourage others to install GADS
  • We don’t want to hold lots of data in Reading!
  • POL, Met Office, ECMWF have all expressed interest
  • Software needs “hardening” first…
• Find more applications!