First implementation: RISICO on the GRID architecture • Mirko D'Andrea, Stefano Dal Pra
Outline of the presentation • Porting features; • Jobs management; • Implementation tests and results; • Conclusions and further development.
Porting features • Implemented entirely in Python. • Uses the same executable as the RISICO system (no changes needed). • Easily configurable through a configuration file.
The RISICO system • Italy: 310000 km^2 • Current system: 300k regular cells, 1 km side. • Grid version (after gridification): 30M regular cells, 0.1 km side.
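The cell counts on this slide follow from dividing the covered area by the area of a single cell. A quick check, assuming square cells and taking the area figure as quoted on the slide (the function name is illustrative only):

```python
# Rough cell-count check, assuming square cells tiling the quoted area.
AREA_KM2 = 310_000  # area figure as quoted on the slide


def cell_count(area_km2: float, side_km: float) -> int:
    """Number of square cells of the given side needed to cover the area."""
    return round(area_km2 / (side_km * side_km))


current = cell_count(AREA_KM2, 1.0)  # current system, 1 km cells
grid = cell_count(AREA_KM2, 0.1)     # grid version, 0.1 km cells

print(current, grid, grid // current)
```

Reducing the cell side by a factor of 10 multiplies the number of cells by 100, which is the jump from roughly 300k to roughly 30M cells.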
RISICO vs GRID-RISICO • RISICO: Get input from database → Run RISICO → Write output to database. • GRID-RISICO (after gridification): Get input from database → Upload input into catalog → Create n jobs → For each job i (1 to n): Get input from catalog → Run RISICO on dataset i → Write output i to catalog → Collect outputs from catalog → Write outputs to database.
Job submission • A RISICO job is fully defined by a JDL (Job Description Language) file and a parameter file. • Each submitted job must terminate successfully within a defined time. Job activity is monitored by a software module called JobMonitor. • The job submission procedure is handled by a JobSubmitter, which creates a set of jobs and associates a JobMonitor with each job.
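The slides do not show the JDL file itself. As a minimal sketch, the function below renders a JDL text using attribute names commonly found in gLite-style JDL (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox). The file names `run_risico.sh` and `params_01.txt` are hypothetical, and whether the RISICO port used exactly these attributes is an assumption.

```python
def make_jdl(executable: str, params_file: str) -> str:
    """Render a small JDL description for one job.

    Both arguments are hypothetical file names; they are shipped to the
    worker node through the input sandbox.
    """
    sandbox_in = ", ".join(f'"{name}"' for name in (executable, params_file))
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{params_file}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {{{sandbox_in}}};",
        'OutputSandbox = {"std.out", "std.err"};',
    ]
    return "\n".join(lines) + "\n"


jdl = make_jdl("run_risico.sh", "params_01.txt")
print(jdl)
```

Generating the JDL from a template keeps the per-job differences confined to the parameter file, which is what lets a JobSubmitter create many jobs in a loop.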
Job Monitoring • Each job is monitored by an instance of a module called JobMonitor. • The JobMonitor: • Checks the job status during execution; • Retrieves the job output from the catalog; • Tries to resubmit the job if it fails; • Logs the error if the job fails to run correctly.
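The monitoring behaviour described above (poll the status, enforce a time limit, resubmit on failure, log errors) could look roughly like the sketch below. It is not the actual JobMonitor code: the status strings, the `max_attempts` limit, and the `submit` / `get_status` callables are assumptions standing in for the real grid commands, and output retrieval from the catalog is left out.

```python
import time
from typing import Callable


def monitor_job(
    submit: Callable[[], str],
    get_status: Callable[[str], str],
    timeout_s: float,
    max_attempts: int = 3,
    poll_s: float = 30.0,
    log: Callable[[str], None] = print,
) -> bool:
    """Return True once a submission finishes, False after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        job_id = submit()
        deadline = time.monotonic() + timeout_s
        status = get_status(job_id)
        # Poll until the job leaves the running state or the time limit expires.
        while status == "RUNNING" and time.monotonic() < deadline:
            time.sleep(poll_s)
            status = get_status(job_id)
        if status == "DONE":
            return True
        # Failed or timed out: log it, then the loop resubmits.
        log(f"attempt {attempt}: job {job_id} ended with status {status}")
    return False


# Demo with fake grid calls: the first submission fails, the second succeeds.
fake_ids = iter(["job-a", "job-b"])
fake_statuses = iter(["FAILED", "RUNNING", "DONE"])
messages = []
ok = monitor_job(
    submit=lambda: next(fake_ids),
    get_status=lambda job_id: next(fake_statuses),
    timeout_s=5.0,
    poll_s=0.0,
    log=messages.append,
)
```

Passing the grid calls in as functions keeps the retry logic testable without access to a grid.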
Workflow: job creation, submission and data collection • Downloads the input from the remote meteo-data database, creates an archive and uploads it to the catalog; • Creates a JDL file and a parameters file for each job; • Submits the jobs; • Waits for the jobs' output; • Gets the jobs' output from the catalog and aggregates it.
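The steps above amount to a scatter/gather pattern. The sketch below mimics it locally, under loud assumptions: an in-memory dict stands in for the catalog, a thread pool stands in for the grid, and `run_model` is a hypothetical placeholder for the RISICO executable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(meteo_input, n_jobs, run_model):
    """Scatter one shared input to n_jobs jobs and gather their outputs."""
    catalog = {"input": meteo_input}  # 1. upload the input to the catalog
    # 2. one parameter set per job, naming its input and output entries
    params = [
        {"job": i, "input": "input", "output": f"output_{i:02d}"}
        for i in range(1, n_jobs + 1)
    ]

    def job(p):
        # Each job reads the shared input and writes its own output entry.
        catalog[p["output"]] = run_model(p["job"], catalog[p["input"]])

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        list(pool.map(job, params))  # 3. submit the jobs, 4. wait for them

    # 5. collect the outputs from the catalog, in job order
    return [catalog[p["output"]] for p in params]


# Demo: the "model" just scales the sum of the input by the job index.
result = run_workflow([1, 2, 3], 3, lambda i, data: sum(data) * i)
```

Because every output key is fixed before submission, the collection step needs no discovery: it simply reads the keys it created.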
Job definition (1) • Each job works on a specific dataset that defines a spatial domain (a subset of the cells). • These subsets are created off-line and stored in the catalog. • A parameters file states the association between a job and its dataset. • Each job produces an output whose path in the catalog is known a priori.
Job definition (2) • Job 1: • Domain: celle/celle_01.tar.bz2 • Status: celle/stato0_01.tar.bz2 • Input: input/input_20070119.tar.bz2 • Output: output/output_01_20071119.tar.bz2 • Each job has its own domain. • The job's domain, status information and output all refer to the same geographical domain. • All jobs share the same input file.
Job definition (3) • CATALOG • Job 1: • Domain: celle/celle_01.tar.bz2 • Status: celle/stato0_01.tar.bz2 • Input: input/input_20070119.tar.bz2 • Output: output/output_01_20071119.tar.bz2 • Job 2: • Domain: celle/celle_02.tar.bz2 • Status: celle/stato0_02.tar.bz2 • Input: input/input_20070119.tar.bz2 • Output: output/output_02_20071119.tar.bz2 • Job n: • Domain: celle/celle_nn.tar.bz2 • Status: celle/stato0_nn.tar.bz2 • Input: input/input_20070119.tar.bz2 • Output: output/output_nn_20071119.tar.bz2
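The naming scheme in the catalog listing is regular enough to generate from the job index. A minimal sketch, assuming two-digit zero-padded indices as in `01` and `02`; the function name is hypothetical. The date stamps of the input and output files differ on the slides, so they are passed in separately rather than guessed.

```python
def job_files(index: int, input_stamp: str, output_stamp: str) -> dict:
    """Catalog paths for one job, following the pattern shown on the slides."""
    tag = f"{index:02d}"  # two-digit job index, e.g. 1 -> "01"
    return {
        "domain": f"celle/celle_{tag}.tar.bz2",
        "status": f"celle/stato0_{tag}.tar.bz2",
        "input": f"input/input_{input_stamp}.tar.bz2",  # shared by all jobs
        "output": f"output/output_{tag}_{output_stamp}.tar.bz2",
    }


job1 = job_files(1, "20070119", "20071119")
job2 = job_files(2, "20070119", "20071119")
```

Since the output path depends only on the job index and the date stamp, the collector can compute it before the job has even started, which is what "known a priori" buys.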
Final version • Estimated performance on the complete data set (30M cells): • Total CPU time: about 2 hours and 30 minutes; • Optimal number of jobs: about 30 (5-10 minutes of CPU time per job); • Storage: 30 GByte/day.
Test Results • The port was tested on a subset (1M cells) of the final RISICO working set. • 10 parallel jobs were used. • Performance: • Job CPU time: 30 seconds; • Grid overhead: 2 minutes.
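The test figures appear consistent with the estimate for the full data set if CPU time is assumed to scale linearly with the number of cells. The check below rests on that assumption, and on reading "Job CPU time: 30 seconds" as the time per job; neither is stated on the slides.

```python
# Figures quoted on the slides.
TEST_CELLS = 1_000_000
TEST_JOBS = 10
TEST_JOB_CPU_S = 30      # read here as CPU seconds per job (assumption)
GRID_OVERHEAD_S = 120
FULL_CELLS = 30_000_000
FULL_JOBS = 30

# Linear extrapolation from the test subset to the full data set.
test_total_cpu_s = TEST_JOBS * TEST_JOB_CPU_S
full_total_cpu_s = test_total_cpu_s * FULL_CELLS // TEST_CELLS
full_total_cpu_min = full_total_cpu_s / 60
per_job_min = full_total_cpu_min / FULL_JOBS

# In the test, grid overhead relative to useful CPU time per job.
overhead_ratio = GRID_OVERHEAD_S / TEST_JOB_CPU_S

print(full_total_cpu_min, per_job_min, overhead_ratio)
```

Under these assumptions the extrapolation gives 150 minutes of total CPU time and 5 minutes per job with 30 jobs, in line with the estimate for the final version. It also shows why jobs should not be too small: in the test, the 2-minute grid overhead was four times the CPU time of each job.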
Conclusions • RISICO is a feasible and significant test case. • The Grid architecture provides valuable benefits to operational activities.