Atmospheric Data Analysis on the Grid
Kevin Hodges, ESSC
Co-workers: Brian Hoskins, Lennart Bengtsson, Lizzie Froude
TRACK Software
• Objectively identify and track weather systems such as extra-tropical and tropical cyclones.
• Derive statistics for the distribution and properties of weather systems.
• Apply to data from climate models, reanalyses and numerical weather prediction (NWP).
• Uses: verify climate models, study the properties of storms and the impact of climate change, and explore how well storms are predicted by both deterministic and ensemble NWP.
Examples
[Figure panels: NH, DJF, 2002/2003, 850; SH, JJA, 1999, 850]
Statistics
[Figure, labelled 5.0 x 10^-5. Panels: NH, DJF, Trd.+Int., Int. c.i. 0.5 x 10^-5; SH, JJA, Trd.+Int., Int. c.i. 0.25 x 10^-5]
Why is the Grid Useful?
• Data sets are large and growing as resolution increases: IPCC models ~250 km; latest models (e.g. ECHAM5, HIGEM) and the new ECMWF reanalysis ~60 km; ECMWF deterministic forecasts ~30 km.
• Data can extend over multi-year periods, e.g. IPCC 1860-2100.
• Climate models and NWP forecasts are also run in ensemble mode: ECHAM5 IPCC 3-member ensemble; ECMWF EPS 50 members (10-14 day), twice daily.
• Tracking requires high temporal resolution, 6-hourly or better.
• Example: processing all the winters in a single 30-year period can take several days on a single machine, depending on the machine and the data resolution.
• Analysis is faster if the data is processed in parallel. Data is organised in individual files, e.g. one for each year, each season or each ensemble member.
• Confidence intervals and significance tests on the statistics require Monte Carlo methods (resampling).
TRACK on the Grid
• CONDOR manages the jobs on the campus grid.
• Vanilla universe: no linking to the CONDOR libraries and no need for checkpointing.
• Each job is a script which CONDOR submits to each machine. The script copies the data from ESSC (scp), runs the code with the options supplied, and copies the results back when finished.
• Turnaround is limited by the number of available machines and the time each job takes to run on a single machine. Examples: 30 winters take ~1-2 hrs on 30 machines; ECMWF EPS (50 members, 14 days) takes ~30 mins.
• Main problems: individual machine resources (disk, memory). The higher resolutions require sufficient space in /tmp and >1 GB of memory, which limits the available machines. Machine reboots mean jobs need to be resubmitted.
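The one-job-per-data-file pattern above could be sketched as follows. This is only an illustration, not the actual TRACK setup: the wrapper script name `run_track.sh`, the winter labels and the output file names are hypothetical, and the function simply writes a plain vanilla-universe submit description with one `queue` statement per job.

```python
def condor_submit_description(executable, job_args, universe="vanilla"):
    """Build submit-description text with one queued job per entry in job_args."""
    lines = [
        f"universe = {universe}",
        f"executable = {executable}",
        # $(Process) is expanded by Condor to the job number within the cluster.
        "output = job_$(Process).out",
        "error = job_$(Process).err",
        "log = jobs.log",
    ]
    for args in job_args:
        # Each job gets its own arguments, e.g. which winter to process.
        lines.append(f"arguments = {args}")
        lines.append("queue")
    return "\n".join(lines) + "\n"


# Hypothetical example: 30 winters, one job each.
winters = [f"{year}-{year + 1}" for year in range(1970, 2000)]
text = condor_submit_description("run_track.sh", winters)
print(text[:200])
```

The resulting text would be saved to a file and passed to `condor_submit`. The wrapper script itself (scp the data in, run the code, scp the results back) is left out because its details are site-specific.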
Statistics
• Because of the nature of the tracks and the way the statistics are computed, resampling methods are needed to determine confidence intervals (bootstrap) and significance tests for differences (permutation).
• Bootstrap: single sample, sampled with replacement.
• Permutation: two samples; the pooled data is sampled without replacement to form new pairs of samples.
• ~2000 samples are needed, which is impractical on a single machine.
• Using CONDOR, 2000 samples can be done in ~1 day on a pool of ~100 machines, depending on the sampling grid size.
• Compute sampling distributions and p-values.
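The two resampling schemes can be illustrated with a minimal sketch. It is a simplification: TRACK resamples tracks and recomputes gridded statistics, whereas here each sample is just a list of scalar values (say, one intensity per storm) and the statistic is the mean. The function names, the percentile-style interval and the default of 2000 resamples are assumptions made for the example.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval from a single sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample WITH replacement, same size as the original sample.
    stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def permutation_pvalue(a, b, stat=mean, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in stat between a and b."""
    rng = random.Random(seed)
    observed = abs(stat(a) - stat(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        # Shuffling and splitting draws from the pooled data WITHOUT replacement.
        rng.shuffle(pooled)
        diff = abs(stat(pooled[:len(a)]) - stat(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (count + 1) / (n_resamples + 1)


control = [float(x) for x in range(20)]
shifted = [x + 100.0 for x in control]
print(bootstrap_ci(control))
print(permutation_pvalue(control, shifted))
```

Each resample is independent of the others, which is why the ~2000 resamples split naturally across a pool of machines.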
Significance Example
[Figure panels: Track Density; Mean Intensity]
Remote Data Access
• Data is often stored remotely.
• Combine CONDOR with OPeNDAP: data on an OPeNDAP/DODS server can be accessed directly by the application via a URL, without having to download the data to disk.
• For data in netCDF format, re-link the application with curl-enabled netCDF libraries. Only the data required needs to be read (on-the-fly subsetting).
• The URL replaces the filename, e.g.
http://<proxy username>:<proxy passwd>@www.antcrc.utas.edu.au/dods/nph-dods/dods-ncep2/pressure/hgt/hgt.2002.nc?time,level[0:1:0],lat,lon,hgt[100:1:105][0:1:0][0:1:72][0:1:143]
• Data can be aggregated into a single data set, which is useful for reading across files.
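The subsetting part of such a URL is a constraint expression: a comma-separated list of variables, each optionally followed by one `[start:stride:stop]` index range per dimension. A small sketch of building it is below. The helper `opendap_url` is hypothetical, and the proxy credentials from the example URL are deliberately left out.

```python
def opendap_url(base_url, selections):
    """Append a constraint expression to a data set URL.

    selections is a list of (variable_name, hyperslabs) pairs, where hyperslabs
    is a list of (start, stride, stop) tuples, one per dimension. An empty list
    requests the whole variable.
    """
    terms = []
    for name, hyperslabs in selections:
        index = "".join(
            f"[{start}:{stride}:{stop}]" for start, stride, stop in hyperslabs
        )
        terms.append(name + index)
    return base_url + "?" + ",".join(terms)


base = "http://www.antcrc.utas.edu.au/dods/nph-dods/dods-ncep2/pressure/hgt/hgt.2002.nc"
url = opendap_url(base, [
    ("time", []),
    ("level", [(0, 1, 0)]),
    ("lat", []),
    ("lon", []),
    # hgt: six time steps, one level, all 73 latitudes, all 144 longitudes.
    ("hgt", [(100, 1, 105), (0, 1, 0), (0, 1, 72), (0, 1, 143)]),
])
print(url)
```

This reproduces the constraint in the example URL, so only six time steps at a single level are transferred rather than the whole yearly file.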