Experiences of an Earth Science Data User: Confessions of a Data Hoarder. Rob Carver, The Weather Company
“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” –Andrew S. Tanenbaum
Open Data and The Weather Company • Our business model is taking open data and using it to tell interesting stories that engage our users • Over the years, we’ve archived over 100 TB of data • Formats: GRIB1, GRIB2, NIDS, shapefiles, netCDF, HDF5 • Sources: NWS/NCEP, NCDC, FEMA, Census Bureau, NASA DAACs
Locating Data • Google and literature searches • ??? • Data!
100+ TB of Weather Models • Most data arrives through Unidata’s LDM and FTP pull scripts; ECMWF pushes data to our FTP site (all GRIB1/GRIB2) • Ingested into the forecast system; GrADS handles the model visualization • Archived to local disk arrays and Amazon S3
Level-III NIDS Archive • NCDC maintains an archive of the WSR-88D radar network’s products from 1995 to present (>10 TB) • Datasets must be ordered from a tape-based archive • Took two years to acquire using a set of PHP scripts • Easier to acquire the entire archive than to figure out which subset to acquire • Already had a NIDS parser for visualization
FEMA Flood Maps • Data Acquisition Method: DVD for each state • Format: ESRI Shapefiles (1 shapefile of a feature class per state) • Data Display: Split state shapefiles by county and then pre-render tiles for moderate to coarse zoom levels on a map mashup.
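The display step above (split by county, then pre-render tiles) can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: it assumes features are already loaded as GeoJSON-like dicts, that the county attribute is called `COUNTY` (a hypothetical field name), and that the map mashup uses the common Web Mercator XYZ tile scheme.

```python
import math
from collections import defaultdict


def split_by_county(features, key="COUNTY"):
    """Group GeoJSON-like feature dicts by a county attribute.

    `key` is a hypothetical attribute name; real FEMA layers may differ.
    """
    groups = defaultdict(list)
    for feature in features:
        groups[feature["properties"][key]].append(feature)
    return dict(groups)


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern/southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles that cover a bounding box."""
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left tile
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
```

Restricting the zoom argument to the moderate and coarse levels keeps the number of pre-rendered tiles small, since the tile count grows roughly fourfold with each additional zoom level.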
Suggestions • Data in a difficult/proprietary format just wastes disk space • Please use data formats that are well-supported by open-source software packages (e.g. GDAL/OGR) • netCDF, TIFF, ESRI shapefiles, HDF5, GeoJSON • Instead of complex CSV or fixed-width text files, use self-describing formats (JSON, XML, SQLite)
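To illustrate the last suggestion, here is a small sketch of turning a fixed-width record into a self-describing JSON record. The column layout, field names, and units are invented for the example; a real fixed-width product would need its own layout table taken from the provider's documentation.

```python
import json

# Hypothetical fixed-width layout: (field name, start, end, unit, type).
LAYOUT = [
    ("station_id", 0, 4, None, str),
    ("temperature", 4, 10, "degC", float),
    ("pressure", 10, 17, "hPa", float),
]


def fixed_width_to_json(line):
    """Convert one fixed-width line into a JSON string that names each
    field and carries its unit alongside the value."""
    record = {}
    for name, start, end, unit, cast in LAYOUT:
        value = cast(line[start:end].strip())
        record[name] = {"value": value, "unit": unit} if unit else value
    return json.dumps(record, sort_keys=True)
```

A reader of the JSON output no longer needs a separate column specification to know what each number means, which is the point of a self-describing format.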
Suggestions (cont.) • Data/Navigation files should use the same naming conventions/sequences • Don’t use overly large archive files • Data pools/FTP servers attached to large disk arrays are awesome data providers (as long as limits are in place) • For really large, static datasets (>10 GB), BitTorrent would be really useful
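One reason the shared naming convention matters: when data and navigation files use the same stem, a user can pair them mechanically. The sketch below assumes hypothetical `.dat` and `.nav` extensions and made-up file names; it is only meant to show how little code is needed when the names line up.

```python
from pathlib import PurePosixPath


def pair_data_and_nav(filenames, data_ext=".dat", nav_ext=".nav"):
    """Pair data and navigation files that share a stem.

    Returns (pairs, orphans): pairs maps stem -> (data file, nav file),
    orphans lists files whose counterpart is missing.
    """
    by_stem = {}
    for name in filenames:
        path = PurePosixPath(name)
        if path.suffix in (data_ext, nav_ext):
            by_stem.setdefault(path.stem, {})[path.suffix] = name

    pairs, orphans = {}, []
    for stem, found in sorted(by_stem.items()):
        if data_ext in found and nav_ext in found:
            pairs[stem] = (found[data_ext], found[nav_ext])
        else:
            orphans.extend(found.values())
    return pairs, sorted(orphans)
```

If the two file types used different sequences or naming schemes, this pairing would instead require a lookup table or opening every file to read its metadata.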
Questions/Comments/Answers? • rob.carver@weather.com