
Performance-responsive Scheduling for Grid Computing

Dr Stephen Jarvis, High Performance Systems Group, University of Warwick, UK.


Presentation Transcript


  1. High Performance Systems Group: Performance-responsive Scheduling for Grid Computing. Dr Stephen Jarvis, High Performance Systems Group, University of Warwick, UK

  2. Context
     • Funded by / collaborating with
       • UK e-Science Core Programme
       • IBM (Watson, Hursley)
       • NASA (Ames)
       • NEC Europe
       • Los Alamos National Laboratory
     • Integrate established performance tools into emerging grid middleware

  3. What do we mean by ‘scheduling’?
     • User’s view
       • Jobs run somewhere on the Grid
       • Notion of deadline
       • Execution is single domain (includes pre-staging)
     • Resource provider’s view
       • Don’t mind which jobs are run where
       • As long as resources are well/evenly used
       • Maintaining customers’ deadlines is important
     • System view
       • Jobs can run anywhere
       • Resources are heterogeneous
       • Throughput is important, as are scheduling overheads


  5. Managing through Middleware
     • Determine what resources are required (predict)
     • Determine what resources are available (discover)
     • Map requirements to available resources (schedule)
     • Maintain contract of performance (QoS)

  6. Performance Services
     • Intra-domain
       • Lab- / department-based
       • Shared resources under local administration
     • Multi-domain
       • Campus- / country-based
       • Wide-area resource and task management
       • Cross domain


  9. Performance Prediction
     • Performance prediction tools
     • Aim to predict
       • Execution time
       • Communication usage
       • Data and resource requirements
     • Provide a best guess as to how an application will execute on a given resource

  10. PACE [Diagram, built up over four slides. Labelled components: User, Application, Application Model, model parameters, Evaluation Engine, Resource Model, resource config., Resource.]
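
The diagram names the pieces of PACE but not how they combine. As a rough illustration only, the sketch below assumes a deliberately simple form for each piece: an application model that counts work and traffic per data item, a resource model that gives the cost of each, and an evaluation function that multiplies them out. All class names, fields, and the cost formula are hypothetical and are not PACE's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceModel:
    """Hypothetical resource description: cost of basic operations."""
    seconds_per_flop: float
    seconds_per_byte: float


@dataclass(frozen=True)
class ApplicationModel:
    """Hypothetical application description: work and traffic per data item."""
    flops_per_item: float
    bytes_per_item: float


def evaluate(app, resource, items, procs):
    """Toy 'evaluation engine': combine both models into a predicted runtime (s)."""
    # Computation is assumed to divide evenly across processors.
    compute = app.flops_per_item * items / procs * resource.seconds_per_flop
    # Assumption: communication only occurs when more than one processor is used.
    comm = 0.0 if procs == 1 else app.bytes_per_item * items * resource.seconds_per_byte
    return compute + comm


# Model parameters (problem size, processor count) are supplied by the user.
app = ApplicationModel(flops_per_item=100.0, bytes_per_item=8.0)
cluster = ResourceModel(seconds_per_flop=1e-9, seconds_per_byte=1e-8)
for procs in (1, 2, 4):
    print(procs, evaluate(app, cluster, items=1_000_000, procs=procs))
```

Keeping the two models separate is what lets the same application description be evaluated against several candidate resources without re-profiling the code.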

  14. Why is prediction useful?
      • Scaling properties
      • Compare runtime options with
        • deadline
        • available resources
        • priority / other jobs
        • etc.
      • Allows runtime scenarios to be explored before deployment
      [Figure; axis label: Run-time]
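
Exploring runtime scenarios before deployment can be pictured as evaluating a prediction for each candidate configuration and comparing it with the deadline. The sketch below assumes a made-up scaling model (serial part, divisible parallel part, per-processor overhead); the function names and constants are hypothetical and only stand in for whatever a real prediction tool would return.

```python
def predicted_runtime(procs, serial_s=600.0, parallel_s=3000.0, overhead_s=5.0):
    """Hypothetical scaling model: fixed serial part, divisible parallel part,
    plus a coordination overhead for each additional processor."""
    return serial_s + parallel_s / procs + overhead_s * (procs - 1)


def cheapest_option(deadline_s, available_procs):
    """Return the smallest processor count predicted to meet the deadline,
    or None if no available configuration is predicted to meet it."""
    for procs in range(1, available_procs + 1):
        if predicted_runtime(procs) <= deadline_s:
            return procs
    return None


# Compare options against a deadline and the resources currently available.
print(cheapest_option(deadline_s=1500.0, available_procs=16))
print(cheapest_option(deadline_s=500.0, available_procs=16))
```

Choosing the smallest configuration that meets the deadline leaves the remaining processors free for other jobs, which is one way the "priority / other jobs" comparison could be handled.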

  15. 1. Intra-Domain Co-Scheduling
      • Augment Condor scheduler with additional performance information
      • Scheduler driver, or co-scheduler (called Titan)
      • Use predictive data for system improvement
        • Time to complete tasks / utilisation of resources
        • QoS – ability to meet deadlines
      • Handle predictive and non-predictive tasks

  16. Intra-Domain Co-Scheduling [Diagram, built up over seven slides. Input: requests from users or other domain schedulers (non-predictive tasks; tasks with prediction data). Labelled components: Portal, Pre-Execution Engine, PACE, Schedule Queue, Matchmaker, GA, Cluster Connector, Titan, ClassAds, Condor, Resources.]
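
The diagram labels a GA component next to the schedule queue, but the slides do not describe how it works. The sketch below is a generic toy genetic algorithm, not Titan's actual algorithm: it assumes each task has a single predicted runtime, that hosts are identical, and that the only goal is to reduce the makespan (the time at which the busiest host finishes). All names and parameters are hypothetical, and at least two tasks are assumed.

```python
import random


def makespan(assignment, runtimes, hosts):
    """Time at which the busiest host finishes, given a task -> host assignment."""
    load = [0.0] * hosts
    for task, host in enumerate(assignment):
        load[host] += runtimes[task]
    return max(load)


def ga_schedule(runtimes, hosts, generations=200, pop_size=30, seed=0):
    """Toy genetic algorithm: evolve task-to-host assignments to reduce makespan."""
    rng = random.Random(seed)
    n = len(runtimes)  # assumed to be at least 2
    # Seed the population with a round-robin baseline plus random assignments.
    population = [[t % hosts for t in range(n)]]
    population += [[rng.randrange(hosts) for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda a: makespan(a, runtimes, hosts))
        survivors = population[: pop_size // 2]  # elitism: best half kept unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)            # single-point crossover
            child = mum[:cut] + dad[cut:]
            child[rng.randrange(n)] = rng.randrange(hosts)  # point mutation
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: makespan(a, runtimes, hosts))


runtimes = [9, 7, 6, 5, 4, 3, 2, 2]  # hypothetical predicted runtimes
best = ga_schedule(runtimes, hosts=3)
print(best, makespan(best, runtimes, 3))
```

Because the round-robin baseline is in the initial population and the best half always survives, the result is never worse than round-robin; a fitness function that also accounted for deadlines would be a natural extension.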

  23. Intra-Domain Deployment
      • Without co-scheduler: time to complete = 70.08m
      • With co-scheduler: time to complete = 35.19m

  24. 2. Multi-Domain Management
      • Publish intra-domain performance data through global information services (MDS)
      • Augment service with agent system
        • One agent per domain / VO
      • When a task is submitted
        • Agents query the information service (IS) and negotiate to discover the best domain to run the task
      • Scheme is tested on a 256-node experimental Grid
        • 16 resource domains; 6 architecture types
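
The selection step can be sketched as follows, assuming each domain's agent publishes a small advert and the submitting agent simply picks the earliest predicted completion time. The advert fields, the speed-factor model, and the function name are hypothetical; the negotiation protocol used by the real agent system is not shown in the slides and is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainAdvert:
    """Hypothetical record an agent publishes to the information service."""
    domain: str
    queue_wait_s: float   # predicted time until resources become free
    speed_factor: float   # relative speed of the domain's architecture


def best_domain(adverts, reference_runtime_s, deadline_s=None):
    """Pick the domain with the earliest predicted completion time.

    Returns (domain, predicted_completion_s), or None if no domain is
    predicted to meet the deadline.
    """
    offers = [
        (ad.queue_wait_s + reference_runtime_s / ad.speed_factor, ad.domain)
        for ad in adverts
    ]
    if deadline_s is not None:
        offers = [offer for offer in offers if offer[0] <= deadline_s]
    if not offers:
        return None
    completion, domain = min(offers)
    return domain, completion


adverts = [
    DomainAdvert("A", queue_wait_s=100.0, speed_factor=1.0),
    DomainAdvert("B", queue_wait_s=0.0, speed_factor=0.5),
    DomainAdvert("C", queue_wait_s=30.0, speed_factor=2.0),
]
print(best_domain(adverts, reference_runtime_s=120.0))
```

An idle domain is not automatically the best choice: in this example domain B has no queue but a slower architecture, so a busier, faster domain is predicted to finish first.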

  25. Multi-Domain Management [Figure, built up over three slides; axis label: time]

  28. Multi-Domain Management: time to complete = 2752s

  29. Multi-Domain Management: time to complete = 467s; an improvement of 83%
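
The 83% figure is the relative reduction in completion time from 2752s to 467s. A short check of that arithmetic, which also applies the same formula to the intra-domain figures (70.08 and 35.19, in that slide's units); the function name is ours:

```python
def improvement_percent(before, after):
    """Relative reduction in completion time, as a percentage of the original."""
    return (before - after) / before * 100.0


multi = improvement_percent(2752, 467)      # multi-domain figures, in seconds
intra = improvement_percent(70.08, 35.19)   # intra-domain figures
print(f"multi-domain: {multi:.1f}%  intra-domain: {intra:.1f}%")
# prints "multi-domain: 83.0%  intra-domain: 49.8%"
```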


  31. QoS: Ability to Meet Deadline [Figure; legend: active, inactive]

  32. Resource usage [Figure; legend: active, inactive]

  33. Other work
      • OGSA compatibility
      • Prediction
        • Accuracy
        • Other prediction techniques
      • Workflow (CCGrid 2003)
      • Reservation
      • V. 1.1, Condor/GT2-based
      • www.dcs.warwick.ac.uk/~hpsg
      • Documented at HPDC-12/GGF-8, FGCS
