Confessions of a Data Hoarder

Experiences of an Earth Science Data User. Confessions of a Data Hoarder. Rob Carver, The Weather Company. “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” –Andrew S. Tanenbaum. Open Data and The Weather Company.


Presentation Transcript


  1. Experiences of an Earth Science Data User • Confessions of a Data Hoarder • Rob Carver, The Weather Company

  2. “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” • –Andrew S. Tanenbaum

  3. Open Data and The Weather Company • Our business model is taking open data and using it to tell interesting stories that engage our users. • Over the years, we’ve archived over 100 TB of data • Formats: GRIB1, GRIB2, NIDS, shapefiles, netCDF, HDF5 • Sources: NWS/NCEP, NCDC, FEMA, Census Bureau, NASA DAACs

  4. Locating Data • Google and literature searches • ??? • Data!

  5. 100+ TB of Weather Models • Most data arrives through Unidata’s LDM and FTP pull scripts. ECMWF pushes data to our FTP site. (All GRIB1/GRIB2) • Data is ingested into the forecast system, and GrADS handles the model visualization • Archived to local disk arrays and Amazon S3
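The "FTP pull script" step on slide 5 might look something like the sketch below. It is not the scripts used at The Weather Company: the function name `pull_new_files`, the directory names, and the host in the usage comment are all hypothetical, and the S3 archiving step is left out. It only shows the general pattern of listing a remote directory and fetching whatever is not yet on local disk.

```python
from pathlib import Path


def pull_new_files(ftp, remote_dir, local_dir):
    """Download every file in remote_dir that is not already in local_dir.

    `ftp` is an already-connected ftplib.FTP instance (or anything with the
    same cwd/nlst/retrbinary methods). Returns the names that were fetched.
    """
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)

    fetched = []
    for name in sorted(ftp.nlst()):
        target = local / name
        if target.exists():
            continue  # already archived on an earlier run
        # Write to a temporary name first so a dropped connection never
        # leaves a truncated file that looks complete.
        partial = target.with_name(target.name + ".part")
        with open(partial, "wb") as handle:
            ftp.retrbinary(f"RETR {name}", handle.write)
        partial.rename(target)
        fetched.append(name)
    return fetched


# Usage (hypothetical host and paths):
#   from ftplib import FTP
#   with FTP("ftp.example.org") as ftp:
#       ftp.login()
#       pull_new_files(ftp, "/pub/models", "archive/models")
```

Some servers return full paths from `nlst()` rather than bare file names, so a real script would likely need to normalize the listing before comparing it with the local directory.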

  6. Level-III NIDS Archive • NCDC maintains an archive of the WSR-88D radar network’s products from 1995 to present (>10 TB) • Datasets must be ordered from a tape-based archive • It took two years to acquire the archive using a set of PHP scripts • It was easier to acquire the entire archive than to figure out which subset to acquire • We already had a NIDS parser for visualization
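To put slide 6's numbers in perspective, the average transfer rate implied by "more than 10 TB in two years" can be worked out directly. The sketch below assumes decimal terabytes, treats 10 TB as a lower bound, and uses 365-day years; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_bytes_per_s(total_bytes, years):
    """Average throughput needed to move total_bytes in the given time."""
    return total_bytes / (years * SECONDS_PER_YEAR)


# Lower bound from the slide (">10 TB"), assuming 1 TB = 10**12 bytes.
archive_bytes = 10 * 10**12
rate = average_rate_bytes_per_s(archive_bytes, years=2)

print(f"{rate / 1e3:.0f} kB/s")       # kilobytes per second
print(f"{rate * 8 / 1e6:.2f} Mbit/s")  # megabits per second
```

Under those assumptions the order-and-download workflow averaged on the order of 160 kB/s, a bit over 1 Mbit/s.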

  7. FEMA Flood Maps • Data Acquisition Method: DVD for each state • Format: ESRI Shapefiles (1 shapefile of a feature class per state) • Data Display: Split state shapefiles by county and then pre-render tiles for moderate to coarse zoom levels on a map mashup.
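Pre-rendering tiles for one county, as on slide 7, starts with working out which tiles cover the county's bounding box at each zoom level. The slide does not say which tiling scheme the map mashup used; the sketch below assumes the common XYZ ("slippy map") scheme on Web Mercator, and the function names are hypothetical.

```python
import math


def tile_for(lat_deg, lon_deg, zoom):
    """Return the (x, y) index of the XYZ tile containing a point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so points on the far east/south edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List (zoom, x, y) for every tile touching a lon/lat bounding box."""
    x_min, y_min = tile_for(north, west, zoom)  # top-left corner
    x_max, y_max = tile_for(south, east, zoom)  # bottom-right corner
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]
```

The tile count grows by roughly a factor of four per zoom level, which is one plausible reason to pre-render only the moderate and coarse levels and leave finer zooms to on-demand rendering.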

  8. Suggestions • Data in a difficult or proprietary format just wastes disk space • Please use data formats that are well supported by open-source software packages (e.g., OGR/GDAL) • netCDF, TIFF, ESRI shapefiles, HDF5, GeoJSON • Instead of complex CSV or fixed-width text files, use self-describing formats (JSON, XML, SQLite)
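The last suggestion on slide 8 can be illustrated by comparing what a reader needs in each case. A fixed-width file is unreadable without an out-of-band column layout, while the JSON version carries its field names with it. The layout, field names, and sample values below are invented for the example.

```python
import json

# Hypothetical column layout: (field name, start, end, converter).
# With fixed-width text this table has to live in separate documentation.
LAYOUT = [
    ("station", 0, 5, str.strip),
    ("temp_c", 5, 11, float),
    ("wind_kt", 11, 15, int),
]


def parse_fixed_width(line):
    """Slice one fixed-width record into a dict using LAYOUT."""
    return {name: convert(line[start:end]) for name, start, end, convert in LAYOUT}


def to_json(lines):
    """Convert fixed-width records to a self-describing JSON array."""
    records = [parse_fixed_width(line) for line in lines if line.strip()]
    return json.dumps(records, indent=2)


sample = [
    "ST001  23.5  12",
    "ST002  -4.0   7",
]
print(to_json(sample))
```

Once the data is in JSON, a consumer can load it with `json.loads` and address fields by name, with no knowledge of the original column offsets.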

  9. Suggestions (cont.) • Data and navigation files should use the same naming conventions and sequences • Don’t use overly large archive files • Data pools and FTP servers attached to large disk arrays are awesome data providers (as long as limits are in place) • For really large, static datasets (>10 GB), BitTorrent would be really useful

  10. Questions/Comments/Answers? • rob.carver@weather.com
