
SProj 3

SProj 3. Libra: An Economy-Driven Cluster Scheduler Jahanzeb Sherwani Nosheen Ali Nausheen Lotia Zahra Hayat Project Advisor/Client: Rajkumar Buyya Faculty Advisor: Dr. Arif Zaman. Problem Statement. Implementing a computational-economy based user-centric scheduler for clusters.


Presentation Transcript


  1. SProj 3 Libra: An Economy-Driven Cluster Scheduler Jahanzeb Sherwani Nosheen Ali Nausheen Lotia Zahra Hayat Project Advisor/Client: Rajkumar Buyya Faculty Advisor: Dr. Arif Zaman

  2. Problem Statement Implementing a computational-economy based user-centric scheduler for clusters

  3. What is a cluster? • A collection of workstations interconnected via a network technology, in order to take advantage of combined computational power and resources • An integrated collection of resources that can provide a single system image spanning all its nodes: a virtual supercomputer • Used for computation-intensive applications such as AI expert systems, nuclear simulations, and scientific calculations

  4. Why clusters? • Cost-effectiveness: low cost-performance ratio compared to a specialized supercomputer • Increase in workstation performance • Increase in network bandwidth • Decrease in network latency • Scalability higher than that of a specialized supercomputer • Easier to integrate into an existing network than specialized supercomputers

  5. Computational Economy • Traditional system-centric performance metrics • CPU Throughput • Mean Response Time • Shortest Job First • Computational economy is the inclusion of user-specified quality of service parameters with jobs so that resource management is user-centric rather than system-centric

  6. Computational Economy (cont’d) • Project focus: to implement a scheduler that aims to maximize user utility • Job parameters most relevant to user-centric scheduling • Budget allocated to job by user • Deadline specified by user

  7. Computational Economy for Grids • What is a grid? • An infrastructure that couples resources such as computers (workstations or clusters), software (for special-purpose applications), and devices (printers, scanners) across the Internet and presents them as a single, unified, integrated resource that can be widely used • How a grid differs from a cluster • Wide geographical area • Non-dedicated resources • No centralized resource management

  8. Computational Economy for Grids • Managing resources and scheduling computations in a grid environment is complex because the resources are • geographically distributed • heterogeneous in nature • owned by different individuals or organizations • subject to different access and cost models • Resource discovery is required • Security issues must be handled • Computational economy has been implemented for grids: the Nimrod/G resource broker is a global resource management and scheduling system that supports deadline and economy-based computations in grid-computing environments

  9. Computational Economy for Clusters • Market-based Proportional Resource Sharing for Clusters: Brent Chun and David E. Culler, University of California at Berkeley, Computer Science Division • Presents a market-based approach, built on the notion of a computational economy, that optimizes for user value. It describes an architecture for market-based cluster resource management based on proportional sharing of basic computing resources. Cluster nodes act as independent sellers of computing resources, while user applications act as buyers who purchase resources. Users are allocated credits (tickets): the more tickets they have, the greater their CPU share. Tickets are allocated according to the amount the user is willing to pay, that is, his valuation of the job • Deadline not incorporated

  10. Cluster Architecture

  11. Cluster Management Software • Cluster Management Software is designed to administer and manage application jobs submitted to workstation clusters. • Creates a Single System Image • When a collection of interconnected computers appears to be a unified resource, we say it possesses a Single System Image • The benefit of a Single System Image is that the exact location where a process executes is entirely concealed from the user. The user is offered the illusion of a single powerful computer • Maintains centralized information about cluster status and resources

  12. Cluster Management Software • Commercial and Open-source Cluster Management Software • Open-source Cluster Management Software • DQS (Distributed Queuing System) • CONDOR • GNQS (Generalized Network Queuing System) • MOSIX • REXEC (Remote Execution) • SGE (Sun Grid Engine) • PBS (Portable Batch System)

  13. Cluster Management Software • Why SGE was rejected • lack of online support • lack of stability • Final choice of CMS: PBS (Portable Batch System)

  14. Pricing the Cluster Resources • Cost = a × (Job Execution Time) + b × (Job Execution Time / Deadline) • Cost of using the cluster depends on job length and job deadline: the longer the user is prepared to wait for the results, the lower his cost • Cost formula forces user to reveal his true deadline
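
As a rough illustration of the pricing formula on slide 14, the sketch below evaluates it for one job under a tight and a relaxed deadline. The slides do not give values for the coefficients a and b; the defaults used here (1.0 and 50.0), the function name `job_cost`, and the choice of seconds as the time unit are assumptions made only for this example.

```python
def job_cost(execution_time, deadline, a=1.0, b=50.0):
    """Cost = a * E + b * (E / D), following the slide's formula.

    a and b are hypothetical coefficients; the slides do not state them.
    """
    if execution_time <= 0 or deadline <= 0:
        raise ValueError("execution_time and deadline must be positive")
    return a * execution_time + b * (execution_time / deadline)


# Same 100-second job, two different deadlines.
tight = job_cost(execution_time=100, deadline=200)
relaxed = job_cost(execution_time=100, deadline=1000)

print(f"tight deadline:   {tight:.1f}")
print(f"relaxed deadline: {relaxed:.1f}")
```

With these assumed coefficients the relaxed deadline is cheaper, which matches the slide's point that a user who is prepared to wait longer pays less.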

  15. Scheduling Algorithm • How to meet budget and deadline constraints? • Ensuring low run-time for the algorithm • Greedy Algorithm • Complex solutions infeasible • Test run of algorithm: • 5 jobs, arriving at times t = 0, 5, 7, 9, 9, on a 3-node cluster

  16. LIBRA with PBS • Portable Batch System (PBS) as the Cluster Management Software (CMS) • Robust, portable, effective, extensible batch job queuing and resource management system • Supports different schedulers • Job accounting • Technical Support

  17. Setting up the PBS Cluster • Installation of Linux with Windows • Installation of SGE as well as PBS • Setting up a Network File System • Configuring GridSim in Java • Configuring PBSWeb • Setting up the Apache WebServer • PHP scripting for Apache • Setting up PostgreSQL • Setting up SSH

  18. PBS Overview • Main components of PBS • Job Server pbs_server • Job Scheduler pbs_sched • Job Executor & Resource Monitor pbs_mom • The server accepts commands and communicates with the daemons • qsub - submit a job • qstat - view queue and job status • qalter - change job’s attributes • qdel - delete a job

  19. Xpbs – GUI for PBS

  20. Xpbs --- GUI for PBS

  21. Job Scheduling in PBS

  22. The Libra Scheduler • Default FIFO Scheduler in PBS • FIFO: sort jobs by queuing time, running the earliest job first • Fair share: sort and schedule jobs based on past usage of the machine by the job owners • Round-robin: pick a job from each queue • By key: sort jobs by a set of keys: shortest_job_first, smallest_memory_first
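
The FIFO and by-key policies on slide 22 both come down to choosing a sort key. The sketch below shows that idea on three made-up jobs. The `Job` class, its field names, and the sample values are hypothetical and are not part of the PBS scheduler's own code or configuration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    queued_at: int    # submission time (arbitrary units)
    est_runtime: int  # requested run time
    memory_mb: int    # requested memory


jobs = [
    Job("a", queued_at=0, est_runtime=50, memory_mb=512),
    Job("b", queued_at=5, est_runtime=10, memory_mb=256),
    Job("c", queued_at=7, est_runtime=10, memory_mb=128),
]

# FIFO: earliest queued job runs first.
fifo_order = [j.name for j in sorted(jobs, key=lambda j: j.queued_at)]

# By key: shortest job first, ties broken by smallest memory first.
by_key_order = [
    j.name for j in sorted(jobs, key=lambda j: (j.est_runtime, j.memory_mb))
]

print(fifo_order)
print(by_key_order)
```

Neither ordering looks at a deadline or a budget, which is the gap the Libra scheduler is meant to fill.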

  23. The Libra Scheduler • Job Input Controller • Adding parameters at job submission time • deadline • budget • executionTime • Defining new attributes of job • Job Acceptance and Assignment Controller • Budget checked through cost function • Admission control through deadline scheduling • Execution host with the minimum load and ability to finish job on time selected • Equal Share instead of Minimum Share
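
The acceptance and assignment steps on slide 23 can be sketched as a single function. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes a job needs a CPU share of executionTime / deadline, that a node can take a job only while its committed shares stay at or below 1.0, and that the cost coefficients are the hypothetical values 1.0 and 50.0. The "Equal Share instead of Minimum Share" policy is not modelled here.

```python
def cost(execution_time, deadline, a=1.0, b=50.0):
    # Slide 14 formula; a and b are hypothetical coefficients.
    return a * execution_time + b * execution_time / deadline


def admit(execution_time, deadline, budget, node_loads):
    """Return (accepted, node_index, share) for one job.

    node_loads holds the CPU share already committed on each node (0.0 to 1.0)
    and is updated in place when the job is accepted.
    """
    # Budget check through the cost function.
    if cost(execution_time, deadline) > budget:
        return (False, None, 0.0)

    # Assumed share needed to finish exactly at the deadline.
    share = execution_time / deadline

    # Admission control: keep only nodes that can still meet the deadline.
    candidates = [i for i, load in enumerate(node_loads) if load + share <= 1.0]
    if not candidates:
        return (False, None, 0.0)

    # Pick the least-loaded node among those that can finish on time.
    best = min(candidates, key=lambda i: node_loads[i])
    node_loads[best] += share
    return (True, best, share)


loads = [0.6, 0.2, 0.9]
first = admit(100, 200, budget=200, node_loads=loads)   # fits on node 1
second = admit(100, 200, budget=100, node_loads=loads)  # cost exceeds budget
third = admit(100, 200, budget=200, node_loads=loads)   # no node has room left
print(first, second, third)
```

The return value mirrors the accept/reject, node number, and allocated share that the simulated scheduler is described as producing.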

  24. The Libra Scheduler • Job Execution Controller • Job run on the best node according to algorithm • Cluster and node status updated • runTime • cpuLoad • Job Querying Controller • Server, Scheduler, Exec Host, and Accounting Logs

  25. PBS-Libra Web --- Front-end for the Libra Engine

  26. PBS-Libra Web

  27. PBS-Libra Web

  28. PBS-Libra Web

  29. PBS-Libra Web

  30. PBS-Libra Web

  31. PBS-Libra Web

  32. PBS-Libra Web

  33. PBS-Libra Web

  34. Simulations • Goal: • Measure the performance of Libra Scheduler • Performance = ? • Maximize user satisfaction

  35. Simulations • Simulation Software • Alter GridSim (grid resource management simulation)

  36. GridSim Class Diagram

  37. Simulations • Methodology • Workload • 120 jobs with deadlines and budgets • Job lengths: 1000 to 10000 • Resources • 10-node, single-processor (MIPS rating: 100) homogeneous cluster
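
To give a feel for the workload on slide 37, the sketch below converts job lengths into run times on one dedicated node. It assumes the lengths are expressed in millions of instructions (the unit GridSim uses for job length) and that they are drawn uniformly from the stated range; the slides confirm neither. The seed and the function name `dedicated_runtime` are likewise illustrative.

```python
import random

MIPS_RATING = 100  # per-node rating from the slide
NUM_JOBS = 120     # workload size from the slide


def dedicated_runtime(length_mi, mips=MIPS_RATING):
    """Seconds needed if the job has a whole node to itself."""
    return length_mi / mips


rng = random.Random(0)  # fixed seed so the sketch is repeatable
lengths = [rng.randint(1000, 10000) for _ in range(NUM_JOBS)]
runtimes = [dedicated_runtime(n) for n in lengths]

# Bounds implied by the stated range: 1000/100 and 10000/100 seconds.
shortest_possible = dedicated_runtime(1000)
longest_possible = dedicated_runtime(10000)
print(shortest_possible, longest_possible)
```

Under these assumptions every job would need between 10 and 100 seconds of dedicated CPU time, so a deadline shorter than that could never be met.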

  38. Simulations • Assumptions • Strict deadlines • Ignores processing overhead due to scheduler and clock interrupt • Scheduler simulated as a function • Input: job size, deadline, budget • Output: accept/reject, node #, share allocated

  39. Simulations • Compared: • Proportional Share • FIFO • Experiments: • 120 jobs, 10 nodes • Increasing workload to 150 and 200 • Increasing cluster size to 20

  40. Simulation Results • 120 jobs, 20 did not meet budget

  41. 100 Jobs, 10 Nodes • FIFO: 23 rejected • Proportional Share: 14 rejected

  42. Simulation Results • Increase workload to 200 jobs on the same 10 node cluster

  43. 200 Jobs, 10 Nodes • FIFO: 105 rejected • Proportional Share: 93 rejected

  44. Simulation Results • Scale the cluster up to 20 nodes

  45. 200 Jobs, 20 Nodes • FIFO: 35 rejected • Proportional Share: 23 rejected
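
The rejection counts on slides 41, 43, and 45 are easier to compare as rates. The short sketch below does that arithmetic, taking the job totals exactly as they appear in the slide titles. Whether those totals are counted before or after any budget filtering is not stated, so the percentages are only as reliable as that reading.

```python
# (jobs, nodes) -> rejected jobs per policy, copied from the slides.
results = {
    (100, 10): {"FIFO": 23, "Proportional Share": 14},
    (200, 10): {"FIFO": 105, "Proportional Share": 93},
    (200, 20): {"FIFO": 35, "Proportional Share": 23},
}

rates = {
    config: {policy: rejected / config[0] for policy, rejected in counts.items()}
    for config, counts in results.items()
}

for (jobs, nodes), by_policy in rates.items():
    for policy, rate in by_policy.items():
        print(f"{jobs} jobs, {nodes} nodes, {policy}: {rate:.1%} rejected")
```

In each of the three configurations Proportional Share rejects fewer jobs than FIFO.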

  46. Simulation Results

  47. Simulation Results

  48. Simulation Results

  49. Simulation Results
