Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications Rohan Kurian, Pavan Balaji, P. Sadayappan The Ohio State University
Parameter Sweep Applications • An important class of applications • Set of independent tasks • MCell Application • 3D simulations for sub-cellular architecture/physiology • GTOMO (Parallel Tomography) Application • Multiple view-point simulation • Systems exist for scheduling on the Grid • Cluster-based Scheduling?
Application Level Schedulers • Manage the scheduling of applications • Break the application into appropriate chunks • APST (AppLeS Parameter Sweep Template) • NIMROD • Greedy approach to schedule PSA chunks
Presentation Roadmap • Job Scheduling in Clusters • Multi-Site Job Scheduling • PSA Scheduling Strategies • Multi-Site Scheduling of PSAs • Performance Evaluation • Conclusions
Job Scheduling in Clusters • Mapping arriving jobs to available resources • Multiple schemes for scheduling • First Come First Serve (FCFS) • Conservative Scheduling • Aggressive or EASY Scheduling • Fair-Share Constraints • A user cannot have more than ‘N’ queued jobs • Submitting the chunks of a PSA job separately • Violates Fair-Share constraints • Combine chunks to form a single parallel job
Formation of PSAs in Clusters • [Figure: small independent tasks combined into a single parallel Parameter Sweep Application]
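The chunk-combining step can be illustrated with a minimal sketch. The slides do not say how tasks are packed into the parallel job, so the longest-task-first packing below is an assumption, and the names `needs_combining` and `combine_chunks` are hypothetical.

```python
import heapq


def needs_combining(num_chunks, max_queued):
    """True if submitting every chunk separately would exceed the per-user
    queue limit 'N' (the fair-share constraint)."""
    return num_chunks > max_queued


def combine_chunks(task_runtimes, procs):
    """Pack independent tasks onto `procs` processors, longest task first.

    Assumption: greedy packing onto the least-loaded processor; the slides
    do not specify the packing rule.
    Returns (procs, estimated_runtime) for the single combined parallel job.
    """
    if procs < 1:
        raise ValueError("procs must be >= 1")
    loads = [0.0] * procs  # min-heap of per-processor finish times
    heapq.heapify(loads)
    for runtime in sorted(task_runtimes, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + runtime)
    return procs, max(loads)
```

For example, `combine_chunks([4, 3, 3, 2], 2)` describes a two-processor job whose estimated runtime is the larger of the two per-processor loads.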
Multi-Site Job Scheduling • Multiple Simultaneous Requests • Job submitted to multiple sites • Started on the earliest cluster • Existing schemes have limitations • Heterogeneous Clusters • Different Scheduling Schemes
[Figure: Multiple-simultaneous-requests. Each of Site 1, Site 2, and Site 3 has incoming jobs, a meta scheduler, and a local scheduler.]
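The multiple-simultaneous-requests idea can be sketched as follows: the job is submitted everywhere, runs on the site that can start it earliest, and the other requests are withdrawn. The function name and the dictionary of estimated start times are hypothetical, and withdrawing the losing requests is an assumption; the slides only state that the job starts on the earliest cluster.

```python
def multiple_simultaneous_requests(estimated_start):
    """Pick the site that can start the job earliest.

    estimated_start: dict mapping site name -> estimated start time.
    Returns (winning_site, sites_whose_request_is_withdrawn).
    """
    if not estimated_start:
        raise ValueError("at least one site is required")
    winner = min(estimated_start, key=estimated_start.get)
    withdrawn = sorted(site for site in estimated_start if site != winner)
    return winner, withdrawn
```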
PSA Scheduling Strategies • Flooding based Job Shredding • Submit all chunks in the PSA at once • Greedy approach • Improves User and System metrics • Doesn’t ensure fairness to Non-PSA jobs • Opportune Job Shredding • Uses an additional Application-Level Scheduler • Monitors the current schedule of the system • If no normal backfill is possible • Allow PSA jobs to shred and backfill
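The opportune rule above can be sketched as a single decision: PSA tasks are released only when no normal queued job can use the current backfill hole. This is a simplified illustration, not the authors' implementation. The `Job` type, the function names, and the description of a hole as `idle_procs` plus a `window` (time until the next reservation) are assumptions, and PSA tasks are assumed to need one processor each.

```python
from dataclasses import dataclass


@dataclass
class Job:
    procs: int      # processors requested
    runtime: float  # estimated runtime


def can_backfill(job, idle_procs, window):
    """Simplified backfill test: the job fits in the idle processors and
    finishes before the next reservation begins."""
    return job.procs <= idle_procs and job.runtime <= window


def opportune_shred(queued_jobs, psa_task_runtimes, idle_procs, window):
    """Return the runtimes of the PSA tasks to release now.

    Returns an empty list if any normal job can backfill, so that non-PSA
    jobs keep their backfill opportunities.
    """
    if any(can_backfill(job, idle_procs, window) for job in queued_jobs):
        return []
    fitting = [rt for rt in psa_task_runtimes if rt <= window]
    return fitting[:idle_procs]  # one single-processor task per idle processor
```

Flooding, by contrast, would correspond to releasing every task immediately regardless of what the normal jobs could have used.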
Multi-Site Scheduling for PSAs • Two-level Application Level Schedulers • No constraints on sites • Allowed to have different speeds • Allowed to have different scheduling policies • Similar to “Multiple Simultaneous Requests” • Simultaneous requests only for PSAs
Multi-Site Scheduling for PSAs • [Figure: a Meta Application-Level Scheduler above Site 1, Site 2, and Site 3; each site has an App-Level Scheduler, a Job Queue, and a Local Scheduler.]
Performance Metrics • Response Time • Completion Time – Submit Time • Slowdown • Response Time / Runtime • Loss of Capacity (LOC) • Instantaneous loss = min {processors requested by waiting jobs, idle processors} • T = time for which this state lasts • LOC = instantaneous loss × T, accumulated over all such states
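The three metrics above can be computed directly from their definitions. In this minimal sketch the function names and the `(waiting_procs, idle_procs, duration)` tuple layout are hypothetical. The slide shows no normalization of Loss of Capacity, so none is applied here.

```python
def response_time(submit, completion):
    """Response time = completion time - submit time."""
    return completion - submit


def slowdown(submit, completion, runtime):
    """Slowdown = response time / runtime."""
    return response_time(submit, completion) / runtime


def loss_of_capacity(states):
    """Accumulate min(waiting procs, idle procs) * duration over all states.

    states: iterable of (waiting_procs, idle_procs, duration) tuples, one
    per interval during which the system state does not change.
    """
    return sum(min(waiting, idle) * duration for waiting, idle, duration in states)
```

A state contributes nothing when either no job is waiting or no processor is idle, which is why the metric counts only capacity that was lost while work was available.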
Evaluation Scheme • Simulation based Approach • CTC trace from Feitelson’s archive • EASY backfilling used • For multi-site evaluation • CTC traces from 3 different months • Processing speeds in the ratio 2:1:3
Flooding Based Job Shredding • Up to 60% improvement for PSA Jobs • Up to 90% worse performance for Non-PSA Jobs
Flooding: Breakdown by Job Category • Narrow, short Non-PSA jobs suffer most • Loss of backfilling opportunities is the main reason
Flooding: Loss of Capacity • Up to 75% improvement in the Loss of Capacity
Opportune Job Shredding • Up to 70% improvement for PSA Jobs • Less than 2% worsening in performance for Non-PSA Jobs
Opportune: Breakdown by Job Category • No category of Non-PSA jobs suffers more than 7%
Opportune: Loss of Capacity • Up to 12% improvement in the Loss of Capacity
Opportune (Multi-Site) • Up to 95% improvement for PSA Jobs • No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Response Time • Up to 75% improvement for PSA Jobs • No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Slowdown • Up to 95% improvement for PSA Jobs • No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Loss of Capacity • Up to 45% improvement in the Loss of Capacity
Concluding Remarks • Opportune Job Shredding • Efficient Scheduling of PSAs • Single-Site and Multi-Site versions • Significant improvement for PSA jobs • Ensures that Non-PSA jobs are not significantly affected • Plan to integrate this with production schedulers