Optimizing workflow execution on the Grid
Gaurang Mehta - gmehta@isi.edu
Based on “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman, Ewa Deelman. Submitted to HPDC-05.
Optimizing Workflows on the Grid
Introduction
• Use of workflows on the Grid is becoming widespread in scientific applications.
  • Astrophysics
  • High Energy Physics
  • Biology etc.
• Current focus is on
  • GUIs for composing workflows
  • Standardizing workflow specification languages
  • Mapping the tasks in the workflow to resources to optimize a system metric
  • Use of some workflow execution engine to execute the workflow (DAGMan, GRMS, Triana, Webflow etc.)
• Performance of the workflow execution engine has not received much attention.
Workflow Model
• Parse the workflow description
• Create a ready list of executable tasks
• Select tasks from the ready list
• Identify resources for the tasks
• Start the tasks on the resources
• Monitor task completion
• Dependency analysis
• Update the ready list (then select tasks from the ready list again)
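The cycle on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the engine used in the talk: the function name `run_workflow`, the `deps` mapping, and the `start_task` callback are all hypothetical, resource identification and task start are collapsed into the callback, and every task is assumed to complete as soon as it is started.

```python
from collections import defaultdict, deque


def run_workflow(deps, start_task):
    """Run tasks in dependency order using a ready list.

    deps maps every task name to the set of tasks it depends on
    (hypothetical input format chosen for this sketch).
    """
    # Parse the workflow description: copy parents, index children.
    remaining = {task: set(parents) for task, parents in deps.items()}
    children = defaultdict(list)
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    # Create the ready list: tasks with no unfinished parents.
    ready = deque(sorted(t for t, p in remaining.items() if not p))
    order = []
    while ready:
        task = ready.popleft()          # select a task from the ready list
        start_task(task)                # identify a resource and start the task
        order.append(task)              # assumed to complete immediately
        for child in children[task]:    # dependency analysis
            remaining[child].discard(task)
            if not remaining[child]:
                ready.append(child)     # update the ready list
    if len(order) != len(deps):
        raise ValueError("cycle or unknown dependency in workflow")
    return order


diamond = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(run_workflow(diamond, lambda task: None))
```

Each pass through the loop touches the ready list, the resource step, and the dependency bookkeeping once per task, which is where the per-task costs discussed in this talk arise.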
Workflow Model
• The costs of workflow execution are in
  • Creating and maintaining a ready list
  • Resource matching
  • Dispatching jobs to resources
• These costs can become significant for a fine granularity workflow (the runtimes of jobs are small) due to
  • Large number of jobs in the workflow
  • Dependencies between jobs
  • Distributed nature of resources
Condor as the Workflow Execution Engine
• We use Condor as the workflow execution engine.
• Condor-Glidein is used for provisioning the execution resources ahead of time.
• Provisioning resources in advance lets the experiments isolate and examine the workflow execution overheads.
• Based on the workflow execution costs described earlier, the factors that affect performance in the context of the Condor system are the following
  • Scheduling interval (schedd, negotiator)
  • Job dispatch rate (schedd)
  • Job submission rate (DAGMan, schedd)
Montage Workflow Structure
• 4500 total jobs
• 890 jobs at the top level
• 2600 jobs at the second level
• 10 minutes on 100 processors at 100% efficiency
Execution Environment
• Condor Pool
  • Central Manager: COLLECTOR, NEGOTIATOR
  • Submit Host: DAGMan, SCHEDD
  • 100 Worker Nodes from NCSA Teragrid cluster: STARTD
Baseline Condor Performance
Scheduling Interval
• The negotiation cycle is the process of identifying resources for jobs.
• The interval between two successive negotiation cycles is the scheduling interval.
• It can be controlled in a variety of ways
  • Fixed scheduling interval
  • Starting a negotiation cycle at the submission of each job, but no more often than once every 20 seconds
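As a rough illustration of the two policies, the sketch below computes when the next negotiation cycle would start for a job submitted at a given time. It is a simplified model, not Condor code: both function names are hypothetical, fixed cycles are assumed to start at times 0, T, 2T, and so on, and the 20 second minimum gap is taken from the slide.

```python
import math


def next_fixed_cycle(submit_time, interval):
    """First cycle at or after submit_time, with cycles at 0, interval, 2*interval, ..."""
    return math.ceil(submit_time / interval) * interval


def next_on_submit_cycle(submit_time, last_cycle_start, min_gap=20.0):
    """Cycle triggered by submission, but never sooner than min_gap after the last one."""
    return max(submit_time, last_cycle_start + min_gap)


# A job submitted 10 s after a cycle that started at t = 300 s:
print(next_fixed_cycle(310, 300) - 310)            # wait under a 5 minute fixed interval
print(next_on_submit_cycle(310, 300) - 310)        # wait when scheduling at submission
```

Under this model a job that just misses a fixed cycle waits almost a full interval, while scheduling at submission bounds the wait by the minimum gap.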
Scheduling at Job Submission
Panels: 30 seconds, 5 minutes, 10 minutes
Fixed Scheduling Interval
Panels: 30 seconds, 5 minutes, 10 minutes
Effect of Scheduling Interval
Job Dispatch Rate
• Dispatch rate is the rate at which the scheduler can start jobs on the remote resources.
• It is throttled using the JOB_START_DELAY configuration parameter.
• The default setting of 2 seconds limits the load on the submit machine and on the scheduler.
• This artificial delay can be expensive if the workflow contains many small jobs.
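A back-of-the-envelope calculation shows why the delay matters for the 4500-job Montage workflow used in this talk. The model is an assumption made for illustration: the scheduler starts one job at a time and waits JOB_START_DELAY seconds between consecutive starts, and nothing else is counted. The function name is hypothetical.

```python
def delay_overhead_seconds(num_jobs, job_start_delay):
    """Time spent purely waiting between consecutive job starts (simplified model)."""
    return max(num_jobs - 1, 0) * job_start_delay


for delay in (0, 1, 2):
    overhead = delay_overhead_seconds(4500, delay)
    print(f"JOB_START_DELAY={delay}s -> {overhead} s ({overhead / 60:.0f} min)")
```

With the 2 second default, the waiting alone comes to 8998 seconds, about 150 minutes, which dwarfs the 10 minutes quoted for the workflow on 100 processors at 100% efficiency.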
Job Dispatch Rate
Panels by JSD (JOB_START_DELAY): 0 seconds, 1 second, 2 seconds
Job Submission Rate
• Rate at which DAGMan submits jobs to the Condor queue.
• With a faster dispatch rate, the job submission rate becomes the limiting factor.
• Submission rate depends on the dependencies in a workflow.
• Restructuring a workflow to reduce dependencies can increase the submission rate.
Workflow Restructuring
DAGMan for each composite job
Panels: 1 cluster per level, 2 clusters per level
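One way to picture "clusters per level" is to assign each job a level in the dependency graph and then split every level into a fixed number of composite jobs. The sketch below does this with a simple round-robin split. It is an assumption about how such a restructuring could be done, not the procedure used for the results in this talk, and the name `cluster_by_level` and the `deps` input format are hypothetical.

```python
def cluster_by_level(deps, clusters_per_level):
    """Group jobs by DAG level, then split each level into composite jobs.

    deps maps every job name to the set of jobs it depends on.
    Returns {level: [list of jobs in composite job 1, ...]}.
    """
    memo = {}

    def level(job):
        # Level 1 has no parents; otherwise one more than the deepest parent.
        if job not in memo:
            memo[job] = 1 + max((level(p) for p in deps[job]), default=0)
        return memo[job]

    by_level = {}
    for job in sorted(deps):
        by_level.setdefault(level(job), []).append(job)

    clustered = {}
    for lvl, jobs in sorted(by_level.items()):
        count = min(clusters_per_level, len(jobs))
        # Round-robin split keeps the composite jobs close to equal in size.
        clustered[lvl] = [jobs[i::count] for i in range(count)]
    return clustered


deps = {
    "a1": set(), "a2": set(), "a3": set(),
    "b1": {"a1", "a2"}, "b2": {"a2", "a3"},
    "c": {"b1", "b2"},
}
print(cluster_by_level(deps, 1))
print(cluster_by_level(deps, 2))
```

Fewer composite jobs per level means fewer submissions and fewer dependency edges for the engine to track, at the cost of less parallelism within a level.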
Condor cluster for each composite job
Conclusion
• Condor is a high throughput system, and the default configuration works well for long running jobs.
• We are interested in high performance using Condor for fine granularity workflows.
• It is possible to improve performance by modifying the configuration parameters and using Condor features like clustering.
• A 90% reduction in workflow completion time was achieved for the fine granularity Montage workflow.
• The reduction possible depends on the workflow structure, the granularity, and the number of available resources.
Future Work
• Investigate the tradeoff between the resource requirements and the workflow completion time.
• Investigate the effect of granularity on the workflow performance.
• Read “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman, Ewa Deelman, submitted to HPDC-05, at http://pegasus.isi.edu/publications.htm