Climate-SDM (1)

A use case for automating and tracking climate analysis workflows whose steps run scripts in Ferret, CDAT, Matlab, and other tools. The same sequence of steps runs over many files, and each run writes a record to a database so that the results can be reproduced. Kepler is used for composing workflows and writing provenance to the database; VisTrails is used for tracking the evolution of workflows and their associated provenance data.

dgunter

Presentation Transcript


  1. Climate-SDM (1)
     • Climate analysis use case
       • Described by: Marcia Branstetter
     • Use case description
       • Data obtained from ESG
       • Analysis uses a sequence of steps, each running scripts in Ferret, CDAT, Matlab, etc.
       • Need to run the same sequence of steps over many files
         • Sometimes changing the scripts
         • Sometimes adding/removing a step
     • Problem: need a workflow to run and track the analysis process
       • Need to collect provenance
       • Provenance should be rich enough for another person to run the same analysis
       • Analysis scripts can use various codes, such as Ferret, CDAT, Matlab
       • Need to keep an audit trail, including interaction with external tools
     • Task: define the workflow steps, with their software versions, scripts, input files, etc.
     • Goal: construct workflows that can be run repeatedly. Each workflow run writes a record of itself into a database, so anyone can reproduce the results or add to them, not necessarily on the same machine.
     • Tools to be used
       • Kepler – for composing workflows and writing provenance to a database
       • VisTrails – for keeping track of the evolution of workflows and associated provenance data

  2. Climate-SDM (2)
     • Scaling analysis process
       • Described by: Marcia Branstetter
     • Use case description
       • Need to analyze 6-hourly data over 100 years for the atmosphere component
       • At T85 grid resolution, total volume is in the 10-100 TB range
       • Data resides on HPSS, order of 12,00 files, a few GBs each
       • A few TBs for the limited number of variables needed in the analysis
     • Problem: extracting one or a few of the variables from HPSS
       • Can this process be automated?
       • Task (longer term): automate the process using workflow tools
     • Problem: parallelize analysis of large data
       • Task: use parallel statistics tools
       • Goal: use Parallel R for such jobs
       • Task already in progress

  3. Climate-SDM (3)
     • Earth System Grid
       • Described by: Dean Williams and Don Middleton
     • Use case description
       • 2 modes of getting data to users
         • Sets of files (using DataMover-Lite (DML))
         • Using tools that perform aggregation on the server side (OPeNDAP, CDAT, GrADS, LAS)
       • Currently only simple statistics are needed on the server side
       • Aggregation: hiding file structures on gateway searches is essential
     • Future needs as data scales
       • Composite product across multiple data nodes
       • Aggregation over multiple data nodes
       • Compare model runs from different sites
       • Tracking of precise provenance of how data was generated is needed
     • Task: using PnetCDF
       • CCSM4 on top of PnetCDF (already taking place)
       • netCDF4 has new extended features, which may require similar features to be supported in PnetCDF
       • PnetCDF for post-processing (users still to be identified)
       • Other I/O bound groups?

  4. Climate-SDM (4)
     • Earth System Grid (cont’d)
       • Described by: Dean Williams and Don Middleton
     • Tasks: improve DML + SRM
       • Improve DML interface
       • Use of GridFTP-ssh in DML to speed transfers to client
       • Explore use of GridFTP-ssh for SRMs
     • Potential task: value-based searches
       • Very large communities performing impact studies
       • New community yet to be introduced to ESG
       • E.g., number of days of temp > 120 F in some region
       • Currently they use GIS tools on highly summarized data
       • Potential need to perform value-based searches on the server side as data scales
     • Potential task: compare simulated to observed data
       • Currently, ARM data is being converted to be CF (Climate and Forecast) convention compliant in order to be added to ESG holdings
       • Moving data to a single site for comparison will require large-scale automated data movement
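
The goal on the Climate-SDM (1) slide, where each workflow run writes a record of itself into a database, can be pictured with a small sketch. This is not Kepler's or VisTrails' provenance schema; it is a plain Python illustration using the standard `sqlite3` module. The table layout and the helper names (`open_store`, `record_step`, `file_digest`) are assumptions made for the example.

```python
import hashlib
import json
import sqlite3
import time


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def open_store(db_path):
    """Open (or create) the hypothetical provenance database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        " id INTEGER PRIMARY KEY,"
        " step TEXT, tool TEXT, tool_version TEXT,"
        " script_sha256 TEXT, inputs_json TEXT, started REAL)"
    )
    return conn


def record_step(conn, step, tool, tool_version, script_path, input_paths):
    """Store one workflow step: tool, version, script hash, input hashes."""
    inputs = {p: file_digest(p) for p in input_paths}
    cur = conn.execute(
        "INSERT INTO runs"
        " (step, tool, tool_version, script_sha256, inputs_json, started)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (step, tool, tool_version, file_digest(script_path),
         json.dumps(inputs, sort_keys=True), time.time()),
    )
    conn.commit()
    return cur.lastrowid
```

Hashing the script and the input files, instead of storing only their names, lets a later reader check that they are rerunning the same script on the same data, even on a different machine.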
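
The data volumes on the Climate-SDM (2) slide can be sanity-checked with a rough estimate. Only the 6-hourly frequency and the 100-year span come from the slide. The 256 x 128 grid (the usual T85 Gaussian grid), the 26 vertical levels, the 365-day calendar, and 4-byte values are assumptions for illustration, and file-format overhead is ignored.

```python
STEPS_PER_DAY = 4      # 6-hourly output (from the slide)
YEARS = 100            # from the slide
DAYS_PER_YEAR = 365    # assumption: no-leap model calendar
NLON, NLAT = 256, 128  # assumption: usual T85 Gaussian grid
NLEV = 26              # assumption: illustrative number of vertical levels
BYTES_PER_VALUE = 4    # assumption: single-precision floats


def field_bytes(levels=1):
    """Uncompressed size in bytes of one variable over the whole period."""
    steps = STEPS_PER_DAY * DAYS_PER_YEAR * YEARS
    return steps * levels * NLAT * NLON * BYTES_PER_VALUE


if __name__ == "__main__":
    print(f"2D variable: {field_bytes() / 1e9:.1f} GB")
    print(f"3D variable: {field_bytes(NLEV) / 1e9:.1f} GB")
```

Under these assumptions a single-level variable is about 19 GB and a 26-level variable about 500 GB, so a handful of 3D variables lands in the "few TBs" the slide mentions.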
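
The value-based search example on the Climate-SDM (4) slide (number of days with temperature above 120 F in some region) reduces to a threshold count over a gridded time series. The following is a minimal in-memory sketch, not an ESG service. The function names, the nested-list data layout, and the index-based region are assumptions for illustration, and temperatures are assumed to be stored in Kelvin.

```python
def kelvin_to_fahrenheit(k):
    """Convert a temperature from Kelvin to degrees Fahrenheit."""
    return (k - 273.15) * 9.0 / 5.0 + 32.0


def days_above(daily_max_k, region, threshold_f):
    """Count days on which any grid cell in the region exceeds threshold_f.

    daily_max_k: list of days, each a 2D grid [lat][lon] of Kelvin values.
    region: (lat0, lat1, lon0, lon1) as half-open index ranges.
    """
    lat0, lat1, lon0, lon1 = region
    count = 0
    for grid in daily_max_k:
        if any(kelvin_to_fahrenheit(v) > threshold_f
               for row in grid[lat0:lat1]
               for v in row[lon0:lon1]):
            count += 1
    return count


if __name__ == "__main__":
    # Three synthetic days on a 2 x 2 grid (made-up values, in Kelvin).
    days = [
        [[300.0, 323.0], [300.0, 300.0]],
        [[300.0, 300.0], [300.0, 300.0]],
        [[300.0, 300.0], [330.0, 300.0]],
    ]
    print(days_above(days, (0, 1, 0, 2), 120.0))  # top row only
    print(days_above(days, (0, 2, 0, 2), 120.0))  # whole grid
```

Running such a count on the server side returns one number per query instead of the full field, which is the appeal of value-based search as data volumes grow.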
