RISICO on the GRID architecture

Presentation Transcript


  1. RISICO on the GRID architecture: first implementation • Mirko D'Andrea, Stefano Dal Pra

  2. Outline of the presentation • Porting features; • Job management; • Implementation tests and results; • Conclusions and further development.

  3. Porting features • Implemented entirely in Python. • Uses the same executable as the RISICO system (no changes needed). • Easily configurable through a configuration file.

  4. The RISICO system • Italy: 310000 km^2. • Current system: 300k regular cells, 1 km side. • Gridification → Grid version: 30M regular cells, 0.1 km side.
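
The jump from 300k to 30M cells follows from the change in cell side alone. A minimal sketch of that arithmetic; the function name `refined_cell_count` is introduced here for illustration and does not come from the presentation:

```python
def refined_cell_count(cells: int, old_side_km: float, new_side_km: float) -> int:
    """Cells needed to cover the same area after shrinking the cell side."""
    # Each old cell splits into per_side x per_side new cells.
    per_side = round(old_side_km / new_side_km)
    return cells * per_side ** 2


current_cells = 300_000  # current system: 1 km cells
grid_cells = refined_cell_count(current_cells, 1.0, 0.1)
print(grid_cells)  # 30000000
```

Reducing the side by a factor of 10 multiplies the number of cells by 100, which is why the single-machine run is split into independent jobs.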

  5. RISICO vs GRID-RISICO • RISICO: Get input from database → Run RISICO → Write output to database. • GRID-RISICO (after gridification): Get input from database → Upload input into catalog → Create n jobs → Each job i (1 to n): get input from catalog, run RISICO on dataset i, write output i to catalog → Collect outputs from catalog → Write outputs to database.

  6. Job submission • A RISICO job is fully defined by a jdl (job description language) file and a parameter file. • Each submitted job must terminate successfully within a defined time. Job activity is monitored by a software module called JobMonitor. • The job submission procedure is handled by a JobSubmitter, which creates a set of jobs and associates a JobMonitor with each job.

  7. Job Monitoring • Each job is monitored by an instance of a module called JobMonitor. • The JobMonitor: • Checks the job status during execution; • Retrieves the job output from the catalog; • Tries to resubmit the job if it fails; • Logs the error if the job fails to run correctly.
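
The monitoring behaviour described in slides 6 and 7 (status polling, a time limit, resubmission, error logging) could look roughly like the sketch below. It is not the authors' code: the `submit` and `get_status` callables, the status strings `"DONE"` and `"FAILED"`, the logger name, and the default limits are all assumptions made for illustration, and output retrieval from the catalog is left out.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("risico.jobmonitor")  # hypothetical logger name


class JobMonitor:
    """Watches one job: polls its status, resubmits on failure or timeout."""

    def __init__(self, submit: Callable[[], str],
                 get_status: Callable[[str], str],
                 max_attempts: int = 3, timeout_s: float = 600.0,
                 poll_s: float = 10.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.submit = submit          # submits the job, returns a job id
        self.get_status = get_status  # maps a job id to a status string
        self.max_attempts = max_attempts
        self.timeout_s = timeout_s
        self.poll_s = poll_s
        self.sleep = sleep

    def run(self) -> bool:
        """Return True once the job completes, False if every attempt fails."""
        for attempt in range(1, self.max_attempts + 1):
            job_id = self.submit()
            deadline = time.monotonic() + self.timeout_s
            while time.monotonic() < deadline:
                status = self.get_status(job_id)
                if status == "DONE":
                    return True
                if status == "FAILED":
                    break  # stop polling and resubmit
                self.sleep(self.poll_s)
            log.warning("job %s did not complete (attempt %d of %d)",
                        job_id, attempt, self.max_attempts)
        log.error("job abandoned after %d attempts", self.max_attempts)
        return False
```

Passing the submission and status functions in as arguments keeps the sketch independent of any particular grid middleware command.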

  8. Workflow: job creation, submission and data collection • Downloads the input from the remote meteo-data database, creates an archive and uploads it to the catalog; • Creates a jdl file and a parameter file for each job; • Submits the jobs; • Waits for the job outputs; • Gets the job outputs from the catalog and aggregates them.
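
The submit, wait and aggregate steps above follow a fan-out/fan-in pattern. The sketch below shows only that shape: a local thread pool stands in for grid submission, and `run_workflow` and `run_job` are hypothetical names, not part of the RISICO port.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_workflow(n_jobs: int, run_job: Callable[[int], str]) -> Dict[int, str]:
    """Run n_jobs independent jobs and collect each output by job index."""
    job_ids = range(1, n_jobs + 1)
    # A thread pool replaces the grid here; each worker handles one job.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        outputs = list(pool.map(run_job, job_ids))  # blocks until all finish
    return dict(zip(job_ids, outputs))


# Usage: each stand-in job just reports the output name it would produce.
results = run_workflow(3, lambda i: f"output_{i:02d}")
print(results)  # {1: 'output_01', 2: 'output_02', 3: 'output_03'}
```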

  9. Job definition (1) • Each job works with a specific dataset defining a spatial domain (subset). • These subsets are created off-line and stored in the catalog. • A parameter file states the association between a job and a dataset. • Each job produces an output whose path in the catalog is known a priori.

  10. Job definition (2) • Job 1: • Domain: celle/celle_01.tar.bz2 • Status: celle/stato0_01.tar.bz2 • Input: input/input_20070119.tar.bz2 • Output: output/output_01_20071119.tar.bz2 • Each job has its own domain. • The job domain, status information and output all refer to the same geographical domain. • All jobs share the same input file.

  11. Job definition (3) • Catalog: • Job 1: • Domain: celle/celle_01.tar.bz2 • Status: celle/stato0_01.tar.bz2 • Input: input/input_20070119.tar.bz2 • Output: output/output_01_20071119.tar.bz2 • Job 2: • Domain: celle/celle_02.tar.bz2 • Status: celle/stato0_02.tar.bz2 • Input: input/input_20070119.tar.bz2 • Output: output/output_02_20071119.tar.bz2 • Job n: • Domain: celle/celle_nn.tar.bz2 • Status: celle/stato0_nn.tar.bz2 • Input: input/input_20070119.tar.bz2 • Output: output/output_nn_20071119.tar.bz2
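
Because every catalog path follows the same naming pattern, the per-job entries can be derived from the job index. A minimal sketch, assuming a two-digit index as in the slides; `JobDefinition` and `make_job` are illustrative names, and the input and output dates are kept as separate arguments because the slides show different date strings for them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobDefinition:
    domain: str   # cells of the job's spatial subset
    status: str   # state information for the same subset
    input: str    # input archive shared by all jobs
    output: str   # output path, known before the job runs


def make_job(index: int, input_date: str, output_date: str) -> JobDefinition:
    """Build the catalog paths for job number `index`."""
    tag = f"{index:02d}"
    return JobDefinition(
        domain=f"celle/celle_{tag}.tar.bz2",
        status=f"celle/stato0_{tag}.tar.bz2",
        input=f"input/input_{input_date}.tar.bz2",
        output=f"output/output_{tag}_{output_date}.tar.bz2",
    )


jobs = [make_job(i, "20070119", "20071119") for i in range(1, 4)]
print(jobs[1].output)  # output/output_02_20071119.tar.bz2
```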

  12. Final version • Estimated performance on the complete data set (30M cells): • Total CPU time: about 2 hours and 30 minutes; • Optimal number of jobs: about 30 (5-10 minutes of CPU time per job); • Storage: 30 GB per day.
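
The per-job figure can be checked against the total: dividing about 150 minutes of CPU time evenly gives 5 minutes per job with 30 jobs and 10 minutes with 15. A small sketch of that division, assuming the work splits evenly across jobs (the slide does not state this):

```python
TOTAL_CPU_MIN = 150  # about 2 hours and 30 minutes, from the slide


def cpu_minutes_per_job(n_jobs: int, total_min: float = TOTAL_CPU_MIN) -> float:
    """CPU minutes per job if the total is split evenly (an assumption)."""
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    return total_min / n_jobs


for n in (15, 30):
    print(n, cpu_minutes_per_job(n))  # 15 10.0, then 30 5.0
```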

  13. Test Results • The port was tested on a subset (1M cells) of the final RISICO working set. • 10 parallel jobs were used. • Performance: • Job CPU time: 30 seconds; • Grid overhead: 2 minutes.
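
With 30 seconds of CPU time against 2 minutes of grid overhead, the overhead dominates the test run. The sketch below quantifies this, and also projects the share for a 5-minute job under the assumption (not stated in the slides) that the overhead stays at 2 minutes; treating CPU time plus overhead as the elapsed time per job is likewise a simplification.

```python
def overhead_fraction(cpu_s: float, overhead_s: float) -> float:
    """Share of a job's elapsed time spent on grid overhead."""
    return overhead_s / (cpu_s + overhead_s)


test_run = overhead_fraction(cpu_s=30, overhead_s=120)       # 120 / 150
projected = overhead_fraction(cpu_s=5 * 60, overhead_s=120)  # 120 / 420

print(f"test run: {test_run:.0%}")             # test run: 80%
print(f"projected 5-min job: {projected:.0%}")  # projected 5-min job: 29%
```

Under these assumptions, longer jobs amortize the fixed overhead, which is consistent with the slides recommending a few minutes of CPU time per job rather than many very short jobs.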

  14. Conclusions • RISICO is a feasible and significant test case. • The Grid architecture provides valuable benefits to operational activities.