Joe Sirott PMEL/NOAA Adventures in Web Services for Large Geophysical Datasets
Motivation • Zonal averages of precipitation trends, from Zhang et al., Nature 448, 461-465 (26 July 2007)
Seasonal zonal averages of Arctic temperature trends, from Graversen et al., Nature 451, 53-56 (3 Jan 2008)
Use case • Calculate zonally averaged seasonal temperature trends from the 20th century climate experiment for four climate models (NASA GISS, NCAR PCM and CCSM, GFDL CM2.1, and Hadley CM3) in the CMIP3 archives, from 30N to 90N • Total of 81 files, 36GB • Time period of interest: 1979-2000
Recipe is… • Regrid all model data to a common grid • Calculate seasonal ensemble means for all models for 30N-90N, 1979-2000 • Calculate zonal means from the seasonal ensemble means • Calculate seasonal trends from the zonal means • Plot/download results
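The last three steps of the recipe can be sketched in plain Python. This is a minimal illustration, not the code used for the figures: it assumes the grids have already been regridded to a common [lat][lon] layout, it uses an unweighted longitude average, and every function name (`ensemble_mean`, `zonal_mean`, `linear_trend`, `zonal_trends`) is hypothetical.

```python
from statistics import mean

def ensemble_mean(fields):
    """Average several [lat][lon] grids that already share a common grid."""
    n_lat, n_lon = len(fields[0]), len(fields[0][0])
    return [[mean(f[i][j] for f in fields) for j in range(n_lon)]
            for i in range(n_lat)]

def zonal_mean(field):
    """Average a [lat][lon] grid over longitude: one value per latitude."""
    return [mean(row) for row in field]

def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    y_bar, v_bar = mean(years), mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den

def zonal_trends(years, fields_by_year):
    """fields_by_year[k] holds one grid per model for years[k].

    Returns the trend at each latitude of the zonal-mean ensemble mean.
    """
    zonal = [zonal_mean(ensemble_mean(models)) for models in fields_by_year]
    n_lat = len(zonal[0])
    return [linear_trend(years, [z[i] for z in zonal]) for i in range(n_lat)]
```

For one season, `fields_by_year` would hold that season's mean grid from each model for each year from 1979 to 2000; the result is one trend value per latitude band.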
Traditional approach • Find datasets/variables of interest • Download individual data files or subset with OPeNDAP • Analyze data locally
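Subsetting with OPeNDAP means appending a constraint expression of index ranges to the dataset URL. The sketch below builds such a URL using the DAP2 `[start:stride:stop]` hyperslab syntax. The server URL, variable name, and index values are all hypothetical and would have to be looked up in the real dataset's metadata.

```python
def hyperslab(var, ranges):
    """Build a DAP2 projection such as 'tas[0:1:11][48:1:72]'.

    ranges is a list of (start, stride, stop) index triples, one per dimension.
    """
    return var + "".join(f"[{a}:{s}:{b}]" for a, s, b in ranges)

def subset_url(base_url, var, ranges, suffix=".dods"):
    """Return a request URL for a binary (.dods) subset of one variable."""
    return f"{base_url}{suffix}?{hyperslab(var, ranges)}"

# Hypothetical example: monthly data starting January 1900, so
# January 1979 is index (1979 - 1900) * 12 = 948 and December 2000 is 1211.
url = subset_url(
    "http://example.org/dap/model_tas.nc",  # placeholder server and file
    "tas",
    [(948, 1, 1211), (48, 1, 72), (0, 1, 143)],  # time, lat, lon (assumed)
)
```

Each model file has its own grid and time axis, so the index ranges differ per dataset.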
Problems with traditional approach • Awkward user interface(s) • Obscure UI naming conventions make it difficult to find variables of interest • Datasets often aren’t aggregated • Subsetting and/or aggregation services often fail with large datasets (e.g. out of memory errors) • Requires download of 36GB of data (file download) or ~2.5GB (OPeNDAP) to produce a final product of ~5KB
More modern approach • Aggregated data • Spatial or temporal subsetting • Meaningful variable and dataset names • Modern Web UI
Dapper (dapper.pmel.noaa.gov/dapper) • Web server that provides distributed access to in-situ or gridded data via OPeNDAP protocol • Aggregates local files, or remote datasets via HTTP or OPeNDAP • Streams data (no more “out of memory” errors)
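The idea behind streaming an aggregation, rather than assembling it in memory first, can be sketched with a Python generator. This only illustrates the general technique and says nothing about how Dapper itself is implemented; `stream_chunks` and its arguments are hypothetical.

```python
def stream_chunks(sources, chunk_size):
    """Yield values from several sources as fixed-size chunks.

    Only one chunk is held in memory at a time, so the total size of the
    aggregated sources does not matter.
    """
    chunk = []
    for source in sources:
        for value in source:
            chunk.append(value)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    if chunk:  # final partial chunk
        yield chunk

def streamed_mean(sources, chunk_size=1024):
    """Running mean over all sources without materializing them."""
    total, count = 0.0, 0
    for chunk in stream_chunks(sources, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count
```

Here each source stands in for one file or remote dataset in the aggregation; a server would write each chunk to the response as it is produced.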
DChart (dapper.pmel.noaa.gov/dchart) • Browser based tool for visualizing or downloading in-situ or gridded ocean or atmospheric data • Also aggregates data • AJAX based user interface • Access to ~3.5 TB of gridded data • Configurable UI
What’s missing? • Still requires download of ~2.5GB for final product ~5KB • Lots of clicking to download multiple datasets • BIG problem for AR5 data needs (>1PB)
Ideal analysis environment (scientist perspective) • Highly interactive (i.e. command line) • Scripting in familiar language of choice (bash, Python, Ruby, Matlab) • Access to multiple tools (Matlab, nco, cdo, GrADS, Ferret, gdal, … ) • Access to custom home-grown tools • Storage of intermediate products (anomalies, statistics, etc.)
Limitations of Web services • Users locked in to backend analysis software • Difficult to debug • Steep learning curve • How to handle long-lived operations? • Security problems • No (or limited) scripting capabilities • Not interactive
A cloud computing alternative • Upload data to cloud • Move computation to data • Boot VM preloaded with common analysis tools • Users can customize (and share) VM images and data • Users have full ssh access to Xen VM(s) running Linux with local access to data stored in cloud
Amazon AWS • Amazon EC2: uses customizable Linux Xen image; start 1-100 hosts in parallel; $0.10/instance-hour • Amazon S3: data storage service; $0.15/GB-month for storage; data transfer in $0.10/GB; data transfer out $0.18/GB
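The prices above allow a rough cost estimate for the use case. The sketch below is back-of-the-envelope arithmetic only: it uses the rates quoted on this slide, ignores any other charges, and the function name and the 10 instance-hours in the example are assumptions.

```python
# Rates as quoted on the slide (US dollars).
EC2_PER_INSTANCE_HOUR = 0.10
S3_STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.18

def estimate_cost(data_gb, months, instances, hours, out_gb):
    """Itemized cost of uploading, storing, analyzing, and downloading results."""
    costs = {
        "upload": data_gb * TRANSFER_IN_PER_GB,
        "storage": data_gb * months * S3_STORAGE_PER_GB_MONTH,
        "compute": instances * hours * EC2_PER_INSTANCE_HOUR,
        "download": out_gb * TRANSFER_OUT_PER_GB,
    }
    costs["total"] = sum(costs.values())
    return costs

# 36 GB archive stored for one month, one instance for an assumed
# 10 hours, and a ~5 KB final product downloaded.
costs = estimate_cost(data_gb=36, months=1, instances=1, hours=10, out_gb=5e-6)
```

At these rates the one-time upload of 36GB is 36 × $0.10 = $3.60 and storage is 36 × $0.15 = $5.40 per month. The download charge for a ~5KB product is negligible.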
Sample workflow (free version) • User authenticated via Web UI • EC2 instance booted with OPeNDAP access to datasets (stored on S3 or EC2 volumes) • User rpms installed (optional) • ssh access to instance using ssh keypair (generated when account issued) • User analyzes, downloads, visualizes, ... • Instance restored to pool after user done (or after period of inactivity)
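The pool bookkeeping in the last step of the workflow could look roughly like the sketch below. It is a toy in-memory model under stated assumptions: the class, its methods, and the instance ids are hypothetical, times are plain seconds passed in by the caller, and no real EC2 calls are made.

```python
class InstancePool:
    """Track which pre-booted instances are free and which are in use."""

    def __init__(self, instance_ids, idle_timeout):
        self.free = list(instance_ids)
        self.in_use = {}  # instance id -> (user, time of last activity)
        self.idle_timeout = idle_timeout

    def checkout(self, user, now):
        """Hand a free instance to an authenticated user."""
        if not self.free:
            raise RuntimeError("no free instances")
        instance = self.free.pop(0)
        self.in_use[instance] = (user, now)
        return instance

    def touch(self, instance, now):
        """Record user activity on an instance."""
        user, _ = self.in_use[instance]
        self.in_use[instance] = (user, now)

    def release(self, instance):
        """Return an instance to the pool when the user is done."""
        del self.in_use[instance]
        self.free.append(instance)

    def reap_idle(self, now):
        """Release every instance idle for at least idle_timeout seconds."""
        idle = [i for i, (_, last) in self.in_use.items()
                if now - last >= self.idle_timeout]
        for instance in idle:
            self.release(instance)
        return idle
```

A real service would also need to wipe or re-image an instance before handing it to the next user, which this sketch leaves out.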
Analysis cloud advantages • Scalable • Data lives in same network as software • No user software lock-in • Users can work in familiar environment • Security problems reduced • Interactive • Access to debugging tools BUT • Lots of details to work out!
More info • PMEL Dapper Server: http://dapper.pmel.noaa.gov/dapper • PMEL DChart: http://dapper.pmel.noaa.gov/dchart • Downloads, propaganda: http://www.epic.noaa.gov/epic/software/dapper/ and http://www.epic.noaa.gov/epic/software/dchart/ • Joe.Sirott@noaa.gov