Aggregation and Subsetting in ERDDAP (a middleman data server)
http://coastwatch.pfeg.noaa.gov/erddap
Bob Simons <bob.simons@noaa.gov>
NOAA NMFS SWFSC ERD
Aggregating Gridded Data
• Aggregating time points: 10,000s of data files, each holding sst[latitude][longitude], become one virtual dataset: sst[time][latitude][longitude]
• Aggregating variables: many files, each with one variable, become one virtual dataset with all of the variables
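Both kinds of gridded aggregation can be sketched in a few lines of Python. This is not ERDDAP's implementation; it is a minimal illustration that assumes each file has already been read into nested lists, and the class, function, and loader names (`VirtualTimeAggregation`, `aggregate_variables`, `fake_files`) are hypothetical. The time aggregation is "virtual" in the sense that a file is only loaded when its time index is requested.

```python
from typing import Callable, Dict, List, Sequence

Grid2D = List[List[float]]  # sst[latitude][longitude] from one file


class VirtualTimeAggregation:
    """Present many 2-D per-file grids as one 3-D grid: sst[time][lat][lon].

    Files are read lazily, only when a time index is actually requested.
    """

    def __init__(self, times: Sequence[str], loader: Callable[[int], Grid2D]):
        self.times = list(times)  # one time value per file, in order
        self._loader = loader     # hypothetical: reads the file for time index t
        self._cache: Dict[int, Grid2D] = {}

    def __len__(self) -> int:
        return len(self.times)

    def __getitem__(self, t: int) -> Grid2D:
        if t not in self._cache:
            self._cache[t] = self._loader(t)
        return self._cache[t]


def aggregate_variables(files: Sequence[Dict[str, object]]) -> Dict[str, object]:
    """Merge one-variable-per-file datasets into one dataset with all variables."""
    merged: Dict[str, object] = {}
    for f in files:
        for name, data in f.items():
            if name in merged:
                raise ValueError(f"variable {name!r} appears in more than one file")
            merged[name] = data
    return merged


# Hypothetical stand-in for three daily files, each a 2x2 sst grid.
fake_files = {i: [[20.0 + i, 21.0 + i], [22.0 + i, 23.0 + i]] for i in range(3)}
sst = VirtualTimeAggregation(["2012-08-10", "2012-08-11", "2012-08-12"],
                             fake_files.__getitem__)
print(len(sst), sst[2][0][1])  # 3 23.0
```

The caller sees one dataset indexed as `sst[time][latitude][longitude]` and never needs to know which file a time step came from.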
Subsetting Gridded Data
• OPeNDAP projection constraints (indices, [start:stride:stop]): sst[57:57][121:2:141][163:2:183]
  ERDDAP also accepts values in parentheses: sst[(2012-08-12)][(20):2:(40)][(-140):2:(-120)]
• Huge time-saver: a user can request just the data she needs (perhaps 1% of the dataset).
• Aggregated datasets need to be subset-able.
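The two projection forms above can be illustrated with a small sketch: parse an index-based `[start:stride:stop]` constraint (stop is inclusive), apply it to a nested-list grid, and translate an ERDDAP-style value such as `(20)` into the index of the closest axis value. The helper names and the 1-degree axes are assumptions for illustration; we also assume, as in ERDDAP's value syntax, that the stride stays an index step even when start and stop are given as values.

```python
import bisect
from typing import List, Sequence, Tuple


def parse_projection(expr: str) -> List[Tuple[int, int, int]]:
    """Parse '[57:57][121:2:141]' into (start, stride, stop) per dimension."""
    result = []
    for part in expr.strip("[]").split("]["):
        nums = [int(x) for x in part.split(":")]
        if len(nums) == 1:
            start, stride, stop = nums[0], 1, nums[0]
        elif len(nums) == 2:
            start, stride, stop = nums[0], 1, nums[1]
        else:
            start, stride, stop = nums
        result.append((start, stride, stop))
    return result


def subset(data, dims: List[Tuple[int, int, int]]):
    """Apply (start, stride, stop) constraints to nested lists; stop is inclusive."""
    if not dims:
        return data
    start, stride, stop = dims[0]
    return [subset(data[i], dims[1:]) for i in range(start, stop + 1, stride)]


def closest_index(axis: Sequence[float], value: float) -> int:
    """Index of the axis value closest to `value` (axis sorted ascending)."""
    i = bisect.bisect_left(axis, value)
    if i == 0:
        return 0
    if i == len(axis):
        return len(axis) - 1
    return i if axis[i] - value < value - axis[i - 1] else i - 1


# Hypothetical 1-degree axes: (20):2:(40) and (-140):2:(-120) become index ranges.
lat = [float(v) for v in range(-90, 91)]
lon = [float(v) for v in range(-180, 180)]
print(closest_index(lat, 20), closest_index(lat, 40),
      closest_index(lon, -140), closest_index(lon, -120))  # 110 130 40 60

# Tiny made-up grid[time][lat][lon] whose values encode their own indices.
grid = [[[t * 100 + y * 10 + x for x in range(5)] for y in range(4)] for t in range(3)]
print(subset(grid, parse_projection("[2:2][1:2:3][0:2:4]")))
# [[[210, 212, 214], [230, 232, 234]]]
```

Once values are converted to indices, both request styles reduce to the same strided slice of the aggregated grid.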
Aggregating In-Situ and Tabular Data
• A database-like table with rows and columns, e.g., one file has data for one buoy for one month. It isn't a multidimensional grid; there are no dimensions.
• Aggregating features and time points: features (stations, trajectories, profiles, ...) are appended into a giant virtual table.
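Appending tabular files can be sketched as a generator that walks the files in order and yields their rows, turning the feature identifier into an ordinary column. This is a hypothetical illustration (the buoy IDs, column names, and `virtual_table` helper are made up), not how ERDDAP stores or reads its files; using a generator keeps the combined table virtual, since no rows are copied until someone iterates.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def virtual_table(files: Iterable[Tuple[str, List[Row]]]) -> Iterator[Row]:
    """Yield the rows of many small tables as one long table.

    Each input is (station_id, rows), e.g. one buoy-month per file. The
    station ID is added as a column so every output row is self-describing.
    """
    for station, rows in files:
        for row in rows:
            yield {"station": station, **row}


# Hypothetical files: two months for buoy_A, one for buoy_B.
files = [
    ("buoy_A", [{"time": "2012-07-31T23:00:00Z", "sst": 14.2}]),
    ("buoy_A", [{"time": "2012-08-01T00:00:00Z", "sst": 14.5},
                {"time": "2012-08-01T01:00:00Z", "sst": 14.4}]),
    ("buoy_B", [{"time": "2012-08-01T00:00:00Z", "sst": 21.0}]),
]
table = list(virtual_table(files))
print(len(table), table[-1])
```

Because every row carries its own station and time, the combined table needs no dimensions, which matches the slide's point that tabular data isn't a grid.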
Subsetting In-Situ and Tabular Data
• OPeNDAP selection constraints (no indices, because there are no multidimensional grids): longitude,latitude,time,sst&sst>35
  Easy to create. Uses domain units (degC). Very flexible (based on a database's SQL SELECT).
• Huge time-saver: a user can request just the data she needs (perhaps 1% of the dataset).
• Aggregated datasets need to be subset-able.
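A selection-constraint query like the one above can be sketched as: split on `&`, treat the first part as the list of columns to return and each remaining part as a `name op value` test, then keep the rows that pass every test. This is a simplified, hypothetical parser (numeric and quoted-string values only, no regex operator `=~`) with made-up buoy rows; it is not ERDDAP's or any OPeNDAP server's code.

```python
import operator
import re
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
Test = Tuple[str, Callable[[object, object], bool], object]

_OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}
# Two-character operators are listed first so 'sst>=35' isn't read as '>'.
_CONSTRAINT = re.compile(r"^(\w+)(<=|>=|!=|=|<|>)(.+)$")


def parse_query(query: str) -> Tuple[List[str], List[Test]]:
    """Split 'a,b,c&a>1&b!=2' into (["a","b","c"], [(name, op, value), ...])."""
    projection, *constraints = query.split("&")
    columns = [c for c in projection.split(",") if c]
    tests: List[Test] = []
    for c in constraints:
        m = _CONSTRAINT.match(c)
        if m is None:
            raise ValueError(f"unsupported constraint: {c!r}")
        name, op, raw = m.groups()
        try:
            value: object = float(raw)
        except ValueError:
            value = raw.strip('"')
        tests.append((name, _OPS[op], value))
    return columns, tests


def select(table: Iterable[Row], query: str) -> Iterator[Row]:
    """Yield the requested columns of every row that passes all constraints."""
    columns, tests = parse_query(query)
    for row in table:
        if all(op(row[name], value) for name, op, value in tests):
            yield {c: row[c] for c in columns}


rows = [  # hypothetical buoy observations, sst in degC
    {"station": "buoy_A", "longitude": -130.0, "latitude": 25.0,
     "time": "2012-08-12T00:00:00Z", "sst": 36.1},
    {"station": "buoy_B", "longitude": -125.5, "latitude": 35.2,
     "time": "2012-08-12T00:00:00Z", "sst": 18.4},
    {"station": "buoy_C", "longitude": -150.0, "latitude": 10.0,
     "time": "2012-08-12T06:00:00Z", "sst": 35.0},
]
print(list(select(rows, "longitude,latitude,time,sst&sst>35")))
```

The constraint is written in domain units (degC) and never mentions a row number, which is what makes it easy to write and flexible.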
Don't Treat In-Situ/Tabular Data Like Gridded Data
• CF DSG (Climate and Forecast conventions, Discrete Sampling Geometries) stores in-situ data as gridded .nc files. Fine for storage, not for subsetting.
• Problem: Indices aren't domain units. How do you request sst>35 with indices?
• Problem: Indices aren't a real-world sequence. In a grid, lat[] is a sequence, so lat[42:53] has meaning. In a table, buoy number isn't: &lat>20&lat<40 selects buoys #2, 14, 26, 109, not buoy[42:53].
• Problem: there are 5 CF DSG data structures.
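The "indices aren't a real-world sequence" problem can be made concrete: apply the same latitude filter to a sorted grid axis and to a list of buoy latitudes, and check whether the matching indices form one contiguous range (something an index slice like `[start:stop]` can express). The latitudes below are hypothetical values chosen only for illustration.

```python
from typing import List


def matching_indices(values: List[float], lo: float, hi: float) -> List[int]:
    """Indices i where lo < values[i] < hi."""
    return [i for i, v in enumerate(values) if lo < v < hi]


def is_contiguous(indices: List[int]) -> bool:
    """True if the indices form one unbroken range, i.e. one index slice."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


# Grid axis: sorted, so a value range maps to one slice (here lat[23:25]).
grid_lat = [float(v) for v in range(-90, 91, 5)]
grid_hits = matching_indices(grid_lat, 20, 40)

# Buoy table: row order has nothing to do with latitude (hypothetical values).
buoy_lat = [45.1, 12.0, 33.8, 47.5, 21.3, 38.0, 8.9]
buoy_hits = matching_indices(buoy_lat, 20, 40)

print(grid_hits, is_contiguous(grid_hits))  # [23, 24, 25] True
print(buoy_hits, is_contiguous(buoy_hits))  # [2, 4, 5] False
```

For the table, no single index range returns exactly the buoys between 20 and 40 degrees north; only a value-based selection constraint does.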
Option: Treat Gridded Data Like Tabular Data
• Standard request: a time, latitude, longitude bounding box. What about unusual requests of gridded data, e.g., sst>35 ("select by value")?
• ERDDAP's EDDTableFromEDDGrid creates a giant virtual table from a gridded dataset.
  Columns: longitude, latitude, time, sst
  Query: e.g., longitude,latitude,time,sst&sst>35
  Response: a table (one data point per row)
• Risk: huge effort for the server.
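The idea of a virtual table built from a grid, and why it can be expensive, can be sketched as a generator that yields one row per grid cell. This is not EDDTableFromEDDGrid's actual implementation; `grid_as_table` and the tiny dataset are hypothetical. The point it shows is that a pure select-by-value request such as `sst>35` must look at every cell, so the work grows with time × latitude × longitude.

```python
from typing import Dict, Iterator, List, Sequence

Row = Dict[str, object]


def grid_as_table(times: Sequence[str], lats: Sequence[float],
                  lons: Sequence[float],
                  sst: List[List[List[float]]]) -> Iterator[Row]:
    """Yield one row per grid cell with columns longitude, latitude, time, sst."""
    for t, tv in enumerate(times):
        for y, yv in enumerate(lats):
            for x, xv in enumerate(lons):
                yield {"longitude": xv, "latitude": yv, "time": tv,
                       "sst": sst[t][y][x]}


# Hypothetical tiny dataset: sst[time][latitude][longitude] in degC.
times = ["2012-08-11", "2012-08-12"]
lats = [20.0, 21.0]
lons = [-140.0, -139.0]
sst = [[[30.0, 36.5], [34.0, 33.0]],
       [[35.5, 29.0], [31.0, 35.0]]]

# Equivalent of: longitude,latitude,time,sst&sst>35
scanned = 0
hot = []
for row in grid_as_table(times, lats, lons, sst):
    scanned += 1  # every cell is visited, matching or not
    if row["sst"] > 35:
        hot.append(row)

print(len(hot), "of", scanned, "cells matched")  # 2 of 8 cells matched
```

On a real global, multi-year grid the same loop would touch billions of cells, which is the "huge effort for the server" risk; adding time, latitude, and longitude constraints lets a server shrink the grid before scanning.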
Summary: Huge Advantages of Aggregation and Subsetting
• Users can find and deal with one aggregated dataset.
• Users can make one subset request to one aggregated dataset.
  Grids: indices to get a temporal and spatial subset.
  Tables (selection constraints): any subset you want.
  (Not: one subset request to each unaggregated file, or worse, using FTP to download lots of entire files.)
• Don't treat tabular/in-situ data like gridded data.