
WS-PGRADE: Supporting parameter sweep applications in workflows

WS-PGRADE: Supporting parameter sweep applications in workflows. Péter Kacsuk, Krisztián Karóczkai, Gábor Hermann, Gergely Sipos, and József Kovács, MTA SZTAKI. Content: Motivations • Lessons learnt from P-GRADE portal • Lessons learnt from CancerGrid • Workflow concept of gUSE/WS-PGRADE


Presentation Transcript


  1. WS-PGRADE: Supporting parameter sweep applications in workflows • Péter Kacsuk, Krisztián Karóczkai, Gábor Hermann, Gergely Sipos, and József Kovács • MTA SZTAKI

  2. Content • Motivations • Lessons learnt from P-GRADE portal • Lessons learnt from CancerGrid • Workflow concept of gUSE/WS-PGRADE • Parameter sweep support of gUSE • CancerGrid • Executing PS nodes of gUSE workflows in desktop grids • Conclusions

  3. Popularity of P-GRADE portal • It has been used in many EGEE and EGEE-related VOs: • GILDA, VOCE, SEE-GRID, BalticGrid, BioInfoGrid, EGRID, etc. • It has been used in many national grids: • UK NGS, Grid-Ireland, Turkish Grid, Croatian Grid, Grid Malaysia, etc. • It has been used as the GIN VO Resource Testing Portal • It became OSS at the beginning of January 2008: https://sourceforge.net/projects/pgportal/

  4. Download of OSS P-GRADE portal • 828 downloads so far

  5. Lessons learnt from P-GRADE portal • Popular because it provides • Easy-to-use but powerful workflow system (graphical editor, wf manager, etc.) • Easy-to-use parameter sweep concept support • Easy-to-use MPI program execution support • Grid virtualization: • Multi-grid/multi-VO access mechanism for LCG-2, gLite, GT2 and GT4

  6. Introducing three levels of parallelism • Parallel execution inside a workflow node: each job can be a parallel program • Parallel execution among workflow nodes: multiple jobs run in parallel • Parameter study execution of the workflow: multiple instances of the same workflow with different data files

  7. Parameter study workflow • GEN: grid job generates the input parameter space • SEQ (x4 in the figure): parameter sweep grid jobs; this could be any workflow • COLL: collector grid job evaluates the results of the simulation

  8. 3-phase PS execution in P-GRADE portal • First phase: executing all the Generators once • Second phase: executing all generated eWorkflows in parallel • Last phase: executing all the Collectors once
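
The three phases can be sketched in a few lines of Python. This is only an illustration of the ordering described on the slide, not the portal's implementation: the function `run_parameter_study`, its arguments, and the use of a thread pool in place of grid submissions are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parameter_study(generators, workflow, collectors, max_workers=4):
    """Hypothetical 3-phase parameter study: generate, sweep, collect."""
    # Phase 1: run every generator exactly once; each returns a list of
    # parameter sets that together form the input parameter space.
    parameter_sets = []
    for generate in generators:
        parameter_sets.extend(generate())

    # Phase 2: run one workflow instance per parameter set, in parallel.
    # A thread pool stands in for parallel grid job submission here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(workflow, parameter_sets))

    # Phase 3: run every collector exactly once over all results.
    return [collect(results) for collect in collectors]


summary = run_parameter_study(
    generators=[lambda: [1, 2, 3], lambda: [10, 20]],
    workflow=lambda x: x * x,
    collectors=[sum, max],
)
print(summary)
```

Because the phases are strictly ordered, no collector can start before the last workflow instance has finished, which is the ordering restriction the later slides set out to relax.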

  9. CancerGrid workflow needs more • Usage of generators and collectors at any node of the WF without any ordering restrictions • Usage of PS execution at node level at any node of the WF without any ordering restrictions

  10. CancerGrid workflow needs more • N = 30K, M = 100 => NxM = 3 million, about 0.5 year execution time • [Figure: workflow graph whose nodes are annotated with their invocation counts x1, xN (N = 30K) and NxM (3 million), including two generator jobs]
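
The invocation count on this slide follows directly from N and M. The short calculation below also derives an average time per job, but only under the assumption (not stated on the slide) that the 0.5 year figure refers to running all NxM jobs one after another on a single resource.

```python
N = 30_000  # number of invocations of the xN nodes (from the slide)
M = 100     # fan-out per invocation (from the slide)

total_jobs = N * M  # invocations of the NxM nodes

# Assumption for illustration only: the "about 0.5 year" estimate is a
# purely sequential run of all NxM jobs.
SECONDS_PER_YEAR = 365 * 24 * 3600
total_seconds = 0.5 * SECONDS_PER_YEAR
avg_seconds_per_job = total_seconds / total_jobs

print(total_jobs)
print(round(avg_seconds_per_job, 3))
```

Under that assumption each job takes only a few seconds, so the total run time comes from the number of invocations rather than from any single long-running job.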

  11. Solution of the problem • We need an environment where the user can develop and execute such a workflow • The environment should contain a broker that decides where to execute the nodes • MPI nodes on service grid (SG) clusters • Nodes with very short execution time on local resources • Seq. nodes with a small number of invocations on SGs • Seq. nodes called many times on desktop grids (DGs) • Such an environment for SGs is: • gUSE: provides a middleware based on a high-level service set • WS-PGRADE: provides a workflow user interface
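
The broker rules listed on this slide can be written as a small decision function. This is a sketch of the rules as stated, not the gUSE broker: the `Node` fields, both thresholds, and the order in which the rules are checked are assumptions, since the slide does not say what counts as "very short" or "many times" or which rule wins when several apply.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a workflow node seen by the broker."""
    name: str
    is_mpi: bool
    est_runtime_s: float
    invocations: int


# Both thresholds are illustrative assumptions, not values from the slides.
SHORT_RUNTIME_S = 10.0
MANY_INVOCATIONS = 1000


def choose_resource(node: Node) -> str:
    if node.is_mpi:
        return "service grid cluster"   # MPI nodes on SG clusters
    if node.est_runtime_s < SHORT_RUNTIME_S:
        return "local resource"         # very short nodes stay local
    if node.invocations >= MANY_INVOCATIONS:
        return "desktop grid"           # sequential, called many times
    return "service grid"               # sequential, few invocations


nodes = [
    Node("docking", is_mpi=True, est_runtime_s=3600, invocations=1),
    Node("format_check", is_mpi=False, est_runtime_s=2, invocations=1),
    Node("descriptor_calc", is_mpi=False, est_runtime_s=300, invocations=30_000),
    Node("model_build", is_mpi=False, est_runtime_s=900, invocations=5),
]
placement = {n.name: choose_resource(n) for n in nodes}
print(placement)
```

The node names in the usage example are made up; only the four placement categories come from the slide.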

  12. gUSE and WS-PGRADE • gUSE (grid User Support Environment) • is a grid virtualization environment • exposes the grid as a workflow • enables the execution of workflows simultaneously in many grids no matter what their middleware is • WS-PGRADE is the user interface to support • Editing, configuring, publishing workflows (as grid applications)

  13. PS workflow concept of WS-PGRADE • Any node of the workflow can be: • PS job • Generator • Collector • There are two kinds of relationship between input files of PS nodes: • Cross product • Dot product
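
The difference between the two relationships can be shown with two input ports holding file names. The helper names and file names below are hypothetical, and the dot product is shown only for ports with the same number of files.

```python
from itertools import product


def cross_product(*ports):
    """One job per possible combination of input files across the ports."""
    return list(product(*ports))


def dot_product(*ports):
    """One job per index: files with the same index are paired.

    Equal port lengths are assumed in this sketch.
    """
    return list(zip(*ports))


port_a = ["a0.dat", "a1.dat"]
port_b = ["b0.dat", "b1.dat"]

cross_jobs = cross_product(port_a, port_b)
dot_jobs = dot_product(port_a, port_b)
print(len(cross_jobs), cross_jobs)
print(len(dot_jobs), dot_jobs)
```

With two files on each port, the cross product yields four job submissions and the dot product yields two.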

  14. Workflow Graph Overview in WS-PGRADE • The Workflow Editor as it appears for the user • [Figure labels: Input Port; Node: job, service call (WS, legacy), wf; Output Port]

  15. Configuring the Workflow • Specify the number of input files on external input ports (m, n and h in the figure) • Generator job produces multiple data (*K in the figure) on the output port within one job submission step • Specify Dot or Cross product relation of input ports to define the number of job submissions • Specify a job to be a Collector by defining a Gathering Input Port. The job execution will be postponed until all input files have arrived at that port • [Figure legend: Cross Product, Dot Product]
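
The gathering behaviour of a Collector can be illustrated with a port that counts arrivals. The class `GatheringPort` and the function `run_collector` are hypothetical names for this sketch; how WS-PGRADE learns the expected number of files is not described on the slide, so it is simply passed in here.

```python
class GatheringPort:
    """Hypothetical input port that gathers files before the job may run."""

    def __init__(self, expected_files: int):
        self.expected_files = expected_files
        self.files = []

    def deliver(self, filename: str) -> None:
        self.files.append(filename)

    def is_complete(self) -> bool:
        return len(self.files) >= self.expected_files


def run_collector(port: GatheringPort, job):
    """Run the collector job only when every input file has arrived."""
    if not port.is_complete():
        return None  # execution is postponed
    return job(port.files)


port = GatheringPort(expected_files=3)
port.deliver("out_0.dat")
port.deliver("out_1.dat")
early = run_collector(port, len)   # still waiting for one file
port.deliver("out_2.dat")
final = run_collector(port, len)   # all files present, job runs once
print(early, final)
```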

  16. Animation: the number of generated output files • Generator job runs h times and each run generates K files on the output port (h*K files in total) • In case of dot product the job is submitted with input files having a common index number in each input port: S = max(m*n, h*K) • In case of cross product a separate job submission is generated for each possible input file combination: m*n*h*K • [Figure: workflow graph with ports annotated m, n, h, *K, m*n, h*K, m*n*h*K, S and 1]
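
The counts on this slide can be reproduced with a few helper functions. The function names and the sample values of m, n, h and K are made up for illustration; the formulas h*K, m*n*h*K and S = max(m*n, h*K) are the ones given on the slide.

```python
from math import prod


def generator_outputs(runs: int, files_per_run: int) -> int:
    """Files on a generator's output port: h runs times K files each."""
    return runs * files_per_run


def cross_submissions(*port_counts: int) -> int:
    """Cross product: one submission per combination of input files."""
    return prod(port_counts)


def dot_submissions(*port_counts: int) -> int:
    """Dot product: submissions follow the largest port, S = max(...)."""
    return max(port_counts)


m, n, h, K = 3, 2, 2, 4             # sample values, not from the slides

left = cross_submissions(m, n)      # m*n files reach one input port
right = generator_outputs(h, K)     # h*K files reach the other port

print(cross_submissions(left, right))  # m*n*h*K
print(dot_submissions(left, right))    # S = max(m*n, h*K)
```

For these sample values the cross product needs 48 submissions while the dot product needs 8, which shows how strongly the choice of relation affects the size of the sweep.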

  17. The user concern • I have a large workflow containing: • Sequential nodes to be executed once • Sequential nodes to be executed many times (PS) • MPI nodes to be executed once • MPI nodes to be executed many times (PS) • I want to execute this workflow as fast as possible using as many resources as possible

  18. [Figure: the CancerGrid workflow graph, with nodes annotated x1, xN (N = 30K) and NxM (3 million) and two generator jobs] Its nodes are mapped to four execution targets: • Execution in the private DG of CancerGrid project • Execution as Web Service • Execution in a local resource • Execution in EDGeS VO of EGEE

  19. Putting everything together • gUSE/WS-PGRADE provides transparent access to SGs/DGs • [Figure labels: Appl. Repository; WS-PGRADE; gUSE; Service Grid EGEE; Service Grid OSG; GlobalDEG; LocalDEG (x4); University DG; Volunteer DG]

  20. Family of P-GRADE products and their use • P-GRADE • Parallelizing applications for clusters and grids • P-GRADE portal • Creating simple workflow and parameter sweep applications for grids • P-GRADE/GEMLCA portal • Creating workflow applications using legacy codes and community codes from repository • gUSE/WS-PGRADE • Creating complex workflow and parameter sweep applications to run on clusters, service grids and desktop grids • Creating workflow applications using embedded workflows, legacy codes and community workflows from workflow repository

  21. Conclusions • gUSE and WS-PGRADE address the limitations of the P-GRADE portal: • Implementation of gUSE is highly scalable and can be distributed on a cluster or even across different grid sites • Stress tests show that it can simultaneously serve thousands of jobs (it currently manages ~100,000 jobs in CancerGrid) • Its workflow concept is much more expressive than that of the P-GRADE portal (recursive wf, generic PS support, etc.) • WS-PGRADE provides two user interfaces: • Developer (creates and exports WFs into the WF repository of gUSE) • End-user (imports and executes WFs from the WF repository) • gUSE provides grid virtualization at workflow level: nodes of a WF can be executed by • Web Services, local resources, service grids and desktop grids (see EDGeS project)
