
Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad

Experiments on cost/power and failure aware scheduling for clouds and grids. Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad, Universidade do Porto, Faculdade de Engenharia, LIACC, Porto, Portugal, jbarbosa@fe.up.pt.


Presentation Transcript


  1. Experiments on cost/power and failure aware scheduling for clouds and grids. Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad, Universidade do Porto, Faculdade de Engenharia, LIACC, Porto, Portugal, jbarbosa@fe.up.pt

  2. Outline • Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks • A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters COST IC804 – IC805 Joint meeting, Tenerife, February 7-8 2013

  3. Outline • Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks • A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters

  4. Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks • Cloud computing paradigm • Dynamic provisioning of computing services. • Employs Virtual Machine (VM) technologies for consolidation and environment isolation. • Node failures can occur due to hardware or software problems. • Image source: http://www.commputation.kit.edu/92.php

  5. Characteristics • Dependability of the infrastructure • Distributed systems continue to grow in scale and complexity • Failures become the norm, which can lead to violations of the negotiated SLAs • The Mean Time Between Failures (MTBF) would be 1.25 h on a petaflop system (1) • Energy consumption • The CPU is the main contributor to energy consumption • Energy consumption dominates operational costs • [Figure: tasks 1..n, each running in its own VM (VM 1..n), hosted by virtual machine monitors (VMMs) on physical machines (PMs) 1..m] • (1) S. Fu, "Failure-aware resource management for high-availability computing clusters with distributed virtual machines," Journal of Parallel and Distributed Computing, vol. 70, April 2010, pp. 384-393, doi: 10.1016/j.jpdc.2010.01.002.

  6. Related Work • Dynamic allocation of VMs, considering PMs' reliability • Based on a failure predictor tool with 76.5% accuracy • Proposed architecture for reconfigurable distributed VMs (1) • Optimistic Best-Fit (OBFIT) algorithm: selects the PM with the minimum weighted available capacity and reliability. • Pessimistic Best-Fit (PBFIT) algorithm: also selects unreliable PMs in order to increase the job completion rate; it selects the unreliable PM p with capacity Cp such that Cavg + Cp is the smallest sum that still meets the required capacity, where Cavg is the average capacity of the reliable PMs.
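
The two related-work heuristics can be illustrated with a short sketch. It is only an approximation of the cited method under stated assumptions: the reliability threshold `min_reliability`, the weight `alpha`, and the "surplus over requirement" score used for OBFIT are hypothetical choices, since the slide does not give the exact weighting.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PM:
    name: str
    capacity: float     # available CPU capacity (hypothetical units)
    reliability: float  # hypothetical: estimated probability of surviving the task window


def obfit(pms: Sequence[PM], required: float, min_reliability: float = 0.9,
          alpha: float = 0.5) -> Optional[PM]:
    """Optimistic best fit (sketch): consider only reliable PMs with enough
    capacity and pick the one whose weighted surplus of capacity and
    reliability over the requirements is smallest (a best fit on both)."""
    candidates = [p for p in pms
                  if p.reliability >= min_reliability and p.capacity >= required]
    if not candidates:
        return None
    return min(candidates,
               key=lambda p: alpha * (p.capacity - required)
               + (1 - alpha) * (p.reliability - min_reliability))


def pbfit(pms: Sequence[PM], required: float,
          min_reliability: float = 0.9) -> Optional[PM]:
    """Pessimistic best fit (sketch): pick the unreliable PM p whose capacity Cp,
    added to the average capacity Cavg of reliable PMs, is the smallest sum
    that still meets the required capacity."""
    reliable = [p for p in pms if p.reliability >= min_reliability]
    unreliable = [p for p in pms if p.reliability < min_reliability]
    if not reliable or not unreliable:
        return None
    c_avg = sum(p.capacity for p in reliable) / len(reliable)
    candidates = [p for p in unreliable if c_avg + p.capacity >= required]
    if not candidates:
        return None
    return min(candidates, key=lambda p: c_avg + p.capacity)
```

Both functions return `None` when no PM satisfies the constraints, leaving the caller to queue or reject the VM.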

  7. Approach • The goal • Construct power- and failure-aware computing environments in order to maximize the rate of jobs completed by their deadlines • It is a best-effort approach, not an SLA-based approach; • Virtual-to-physical resource mapping decisions must consider both the power efficiency and the reliability of compute nodes; • Dynamic update of virtual-to-physical configurations (CPU usage and migration).

  8. Approach • Multi-objective scheduling problems are addressed in three ways: • 1- Finding the Pareto-optimal solutions and letting the user select the best one. • 2- Combining the two objective functions into a single objective function. • 3- Bicriteria scheduling, in which the user specifies a limit for one criterion (a power or budget constraint) and the algorithm tries to optimize the other criterion under this constraint.
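
The three strategies can be contrasted on a toy set of candidate schedules, each described by (label, makespan, cost). This is an illustrative sketch: the candidate values and the min-max normalization used in the weighted sum are assumptions, not part of the presented work.

```python
from typing import List, Optional, Tuple

# A candidate schedule: (label, makespan, cost). Values are illustrative only.
Schedule = Tuple[str, float, float]


def dominates(a: Schedule, b: Schedule) -> bool:
    """a dominates b if it is no worse in both objectives and better in one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(schedules: List[Schedule]) -> List[Schedule]:
    """Approach 1: keep the non-dominated schedules and let the user choose."""
    return [s for s in schedules if not any(dominates(o, s) for o in schedules)]


def weighted_sum(schedules: List[Schedule], w: float) -> Schedule:
    """Approach 2: fold both objectives into one score after min-max
    normalization; w weights makespan and (1 - w) weights cost."""
    times = [s[1] for s in schedules]
    costs = [s[2] for s in schedules]
    t_span = (max(times) - min(times)) or 1.0
    c_span = (max(costs) - min(costs)) or 1.0

    def score(s: Schedule) -> float:
        return (w * (s[1] - min(times)) / t_span
                + (1 - w) * (s[2] - min(costs)) / c_span)

    return min(schedules, key=score)


def bicriteria(schedules: List[Schedule], budget: float) -> Optional[Schedule]:
    """Approach 3: constrain cost by a budget, then minimize makespan."""
    feasible = [s for s in schedules if s[2] <= budget]
    return min(feasible, key=lambda s: s[1]) if feasible else None
```

For example, with candidates A (10, 5), B (8, 7), C (12, 4), and D (11, 6), schedule D is dominated by A, so the Pareto front is {A, B, C}; a budget of 6 selects A.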

  9. Approach • Leverage virtualization tools • Xen credit scheduler • Dynamically update the cap parameter • But enforcing work-conserving • Stop & copy migration • Faster VM migrations, preferable for proactive failure management • [Figure: power consumption versus CPU% (0 to 100%), increasing with CPU load; timeline of VMs on PM1, PM2 and PM3 showing a failure, a stop & copy migration, and the failure prediction accuracy]

  10. System Overview • Cloud architecture • Private cloud • Homogeneous PMs • A cluster coordinator manages users' jobs • VMs are created and destroyed dynamically • Users' jobs • A job is a set of independent tasks • A task runs in a single VM, whose CPU-intensive workload is known • The number of tasks per job and the task deadlines are defined by the user • [Figure: private cloud management architecture]

  11. Power Model • Linear power model $P = p_1 + p_2 \cdot \mathrm{CPU\%}$ • Power efficiency of P • Completion rate of users' jobs • Working efficiency: measures the quantity of useful work done (i.e., completed users' jobs) per unit of consumed power • [Figure: example power efficiency curve, p1 = 175 W, p2 = 75 W]
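
The linear power model and the slide's example constants (p1 = 175 W, p2 = 75 W) can be written down directly. The power-efficiency and working-efficiency formulas are not preserved on the slide, so the definitions below are assumptions: efficiency is taken as `cpu * P(1) / P(cpu)` (1.0 for a fully loaded PM) and working efficiency as completed jobs per kWh.

```python
P1_WATTS = 175.0  # idle power in the slide's example curve
P2_WATTS = 75.0   # extra power at 100% CPU in the slide's example curve


def power(cpu: float, p1: float = P1_WATTS, p2: float = P2_WATTS) -> float:
    """Linear power model P = p1 + p2 * CPU, with CPU utilization in [0, 1]."""
    if not 0.0 <= cpu <= 1.0:
        raise ValueError("CPU utilization must be in [0, 1]")
    return p1 + p2 * cpu


def power_efficiency(cpu: float, p1: float = P1_WATTS, p2: float = P2_WATTS) -> float:
    """Assumed definition: useful-work fraction of the power drawn, normalized
    so that a fully loaded PM scores 1.0, i.e. cpu * P(1) / P(cpu)."""
    return cpu * power(1.0, p1, p2) / power(cpu, p1, p2)


def working_efficiency(completed_jobs: int, energy_kwh: float) -> float:
    """Assumed form: completed users' jobs per kWh consumed."""
    if energy_kwh <= 0:
        raise ValueError("energy must be positive")
    return completed_jobs / energy_kwh
```

With these constants a PM at 50% CPU draws 212.5 W. Because the idle term p1 dominates, efficiency under this definition rises steeply with load, which is why consolidating VMs onto fewer, busier PMs pays off.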

  12. Proposed algorithms • Minimum Time Task Execution (MTTE) algorithm • Slack time to accomplish task t • PM i capacity constraints • Selects a PM if: • it guarantees the maximum processing power required by the VM (task); • it has higher reliability; • and it increases CPU power efficiency.
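
A minimal sketch of MTTE-style placement follows. It assumes the three selection criteria are applied lexicographically (capacity as a hard constraint, then reliability, then post-placement utilization as a proxy for power efficiency under the linear model). The `Host` fields and the reliability estimate are hypothetical; the slide gives the criteria but not the exact formulas.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Host:
    name: str
    total_mflops: float   # e.g. 800 MFLOPS per PM in the simulation setup
    free_mflops: float    # capacity not yet reserved by other VMs
    reliability: float    # hypothetical: P(no failure before the task deadline)


def slack_time(deadline_s: float, now_s: float, remaining_mflop: float,
               rate_mflops: float) -> float:
    """Time left before the deadline after running the task at rate_mflops."""
    return deadline_s - now_s - remaining_mflop / rate_mflops


def mtte_select(hosts: Sequence[Host], vm_max_mflops: float, remaining_mflop: float,
                deadline_s: float, now_s: float) -> Optional[Host]:
    """Give the VM its maximum processing power, then prefer higher reliability,
    then higher post-placement utilization (better power efficiency)."""
    if slack_time(deadline_s, now_s, remaining_mflop, vm_max_mflops) < 0:
        return None  # even at full speed the task misses its deadline
    feasible = [h for h in hosts if h.free_mflops >= vm_max_mflops]
    if not feasible:
        return None

    def utilization_after(h: Host) -> float:
        return (h.total_mflops - h.free_mflops + vm_max_mflops) / h.total_mflops

    return max(feasible, key=lambda h: (h.reliability, utilization_after(h)))
```

Between two equally reliable hosts, the sketch prefers the one that ends up busier, since a busier PM amortizes its idle power over more useful work.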

  13. Proposed algorithms • Relaxed Time Task Execution (RTTE) algorithm • [Figure: host CPU from 0% to 100%, with the VM's share limited by its cap] • Cap set in the Xen credit scheduler • Unlike MTTE, the RTTE algorithm always reserves for the VM the minimum amount of resources necessary to accomplish the task within its deadline
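
The RTTE reservation rule reduces to simple arithmetic: the required rate is the remaining work divided by the time left, expressed as a percentage of one physical CPU. The sketch assumes the VM receives exactly its capped share and that the host CPU rate is known (800 MFLOPS in the simulation setup); the function name is hypothetical. Recomputing the value periodically is one way to realize the dynamic cap update, and the result could be applied with Xen's `xl sched-credit -d <domain> -c <cap>` command.

```python
import math
from typing import Optional


def rtte_cap(remaining_mflop: float, deadline_s: float, now_s: float,
             host_mflops: float) -> Optional[int]:
    """Smallest cap (percent of one physical CPU) that still lets the task
    finish by its deadline, assuming the VM receives exactly its capped share."""
    time_left = deadline_s - now_s
    if time_left <= 0:
        return None
    required_mflops = remaining_mflop / time_left
    cap = math.ceil(100.0 * required_mflops / host_mflops)
    if cap > 100:
        return None  # the deadline cannot be met on a single core
    # In the Xen credit scheduler a cap of 0 means "no cap", so never return 0.
    return max(cap, 1)
```

For example, a task with 40,000 MFLOP left and 100 s to its deadline needs 400 MFLOPS, i.e. a cap of 50 on an 800-MFLOPS core. The remaining 50% of the core stays available for other VMs.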

  14. Performance Analysis • Simulation setup • 50 PMs, each modeled with one CPU core with performance equivalent to 800 MFLOPS; • VM stop & copy migration overhead is 12 s; • 30 synthetic jobs, each consisting of 5 tasks with CPU-intensive workloads; • Failed PMs stay unavailable for 60 s; • The predicted occurrence time of a failure precedes its actual occurrence time; • Failure instants, job arrival times, and task workload sizes follow a uniform distribution;
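
A synthetic workload matching this setup can be generated with Python's `random` module. The PM count, core speed, job and task counts, and the 60 s downtime come from the slide. The uniform ranges for arrival times and workload sizes, and the seeds, are hypothetical because the slide does not state them.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Parameters taken from the simulation setup slide.
NUM_PMS = 50
PM_MFLOPS = 800.0
NUM_JOBS = 30
TASKS_PER_JOB = 5
FAILURE_DOWNTIME_S = 60.0

# Hypothetical ranges: the slide states the distributions are uniform
# but does not give their bounds.
ARRIVAL_RANGE_S = (0.0, 3600.0)
WORKLOAD_RANGE_MFLOP = (1.0e5, 1.0e6)


@dataclass
class Task:
    job_id: int
    workload_mflop: float


@dataclass
class Job:
    job_id: int
    arrival_s: float
    tasks: List[Task]


def runtime_s(workload_mflop: float) -> float:
    """Execution time of a task on one dedicated 800-MFLOPS core."""
    return workload_mflop / PM_MFLOPS


def generate_jobs(seed: int = 0) -> List[Job]:
    """30 jobs of 5 independent CPU-bound tasks, with uniform arrivals and sizes."""
    rng = random.Random(seed)
    jobs = []
    for j in range(NUM_JOBS):
        arrival = rng.uniform(*ARRIVAL_RANGE_S)
        tasks = [Task(j, rng.uniform(*WORKLOAD_RANGE_MFLOP)) for _ in range(TASKS_PER_JOB)]
        jobs.append(Job(j, arrival, tasks))
    return sorted(jobs, key=lambda job: job.arrival_s)


def generate_failures(horizon_s: float, num_failures: int,
                      seed: int = 1) -> List[Tuple[float, int, float]]:
    """Uniform failure instants on random PMs: (fail_time, pm_index, recovery_time)."""
    rng = random.Random(seed)
    events = []
    for _ in range(num_failures):
        t = rng.uniform(0.0, horizon_s)
        events.append((t, rng.randrange(NUM_PMS), t + FAILURE_DOWNTIME_S))
    return sorted(events)
```

Using a dedicated `random.Random(seed)` instance per stream keeps job and failure traces reproducible and independent of each other.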

  15. Performance Analysis • Implementation considerations • Stabilization to avoid multiple migrations • Concurrency among cluster coordinators • Algorithms compared to ours • Common Best-Fit (CBFIT) • Selects the PM with the maximum power efficiency and does not consider resource reliability • Optimistic Best-Fit (OBFIT) • Pessimistic Best-Fit (PBFIT)

  16. Performance Analysis • Migrations due to proactive failure management only: • With the failure predictor's 76.5% accuracy, the RTTE algorithm presents the best results; • Working efficiency, as well as the job completion rate, decreases as failure prediction becomes less accurate.

  17. Performance Analysis • Migrations due to both proactive failure management and power efficiency: • Sliding window of 36 seconds with a threshold of 65% (a migration starts if CPU usage is below 65%); • RTTE returns the best results for 76.5% failure prediction accuracy; • Compared to the earlier results, the rate of completed jobs decreases, since the number of VM migrations increases.
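
The power-driven migration trigger can be sketched as a sliding-window average over CPU samples. Only the 36 s window and the 65% threshold come from the slide. The sampling times, the class name, and the rule of waiting for a full window of history before deciding (a simple stabilization guard) are assumptions.

```python
from collections import deque
from typing import Deque, Optional, Tuple

WINDOW_S = 36.0    # sliding window length from the slide
THRESHOLD = 0.65   # migrate away if mean CPU usage falls below 65%


class UtilizationMonitor:
    """Tracks CPU usage samples of one PM over a sliding time window."""

    def __init__(self, window_s: float = WINDOW_S, threshold: float = THRESHOLD):
        self.window_s = window_s
        self.threshold = threshold
        self.samples: Deque[Tuple[float, float]] = deque()  # (time_s, cpu in [0, 1])
        self.first_t: Optional[float] = None

    def add(self, t: float, cpu: float) -> None:
        if self.first_t is None:
            self.first_t = t
        self.samples.append((t, cpu))
        # Keep only samples inside the window (t - window_s, t].
        while self.samples[0][0] <= t - self.window_s:
            self.samples.popleft()

    def mean(self) -> float:
        return sum(c for _, c in self.samples) / len(self.samples)

    def should_migrate(self) -> bool:
        """True once a full window has been observed and its mean is below threshold."""
        if self.first_t is None:
            return False
        if self.samples[-1][0] - self.first_t < self.window_s:
            return False  # not enough history yet; avoids premature migrations
        return self.mean() < self.threshold
```

With one sample every 6 s at 50% CPU, the monitor first requests a migration at t = 36 s; a PM held at 90% never triggers it.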

  18. Performance Analysis • Number of migrations due to failure management and power efficiency • RTTE and MTTE have a stable number of migrations and respawns as failure prediction accuracy varies • Migrations due to proactive failure management only (75% accuracy) • RTTE and MTTE return the best working efficiency as the number of failures in the cloud infrastructure rises

  19. Conclusions (1) • Concluding remarks: • Power- and failure-aware dynamic allocation improves the job completion rate; • Dynamically adjusting the cap parameter of the Xen credit scheduler proved capable of obtaining a better job completion rate (RTTE); • An excessive number of VM migrations aimed at optimizing power efficiency reduces the job completion rate. • Future directions: • Dynamic allocation considering workload characteristics; • Data locality; • Scalability; • Compare with or integrate DVFS; • Improve PM consolidation (why a 65% threshold?); • Heterogeneous CPUs.

  20. Outline • Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks • A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters

  21. A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters • A job is represented by a workflow • A workflow is a Directed Acyclic Graph (DAG): a node is an individual task and an edge represents an inter-task dependency • [Figure: example DAG mapped onto CPU1, CPU2 and CPU3] • Workflow scheduling • Mapping tasks to resources • The main goal is to minimize the finish time of the exit task

  22. Introduction • Target platform: - Utility Grids that are maintained and managed by a service provider. - Based on user requirements, the provider finds a schedule that meets the user's constraints. • In utility Grids, QoS attributes other than execution time, such as economic cost or deadline, may be considered, which makes scheduling a multi-objective problem. • Multi-objective scheduling problems are addressed in three ways: 1- finding the Pareto-optimal solutions and letting the user select the best one; 2- combining the two objective functions into a single objective function; 3- bicriteria scheduling, in which the user specifies a limit for one criterion (a power or budget constraint) and the algorithm tries to optimize the other criterion under this constraint.

  23. Proposed Algorithm: Heterogeneous Budget Constraint Scheduling Algorithm (HBCS) • HBCS has two phases: • Task selection phase: • We use the upward rank to assign priorities to the tasks in the DAG • Processor selection phase: • We combine both objective functions (cost and time) into a single function; the processor that maximizes that function for the current task is selected.
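
The slide names the upward rank but does not give its formula. The sketch below uses the standard HEFT definition: a task's rank is its mean execution cost plus the largest (mean communication cost + rank) over its successors, and tasks are then scheduled in decreasing rank order. The dictionary-based inputs are an illustrative choice; the HBCS processor-selection function is not reproduced here.

```python
from typing import Dict, List, Tuple


def upward_ranks(avg_cost: Dict[str, float],
                 succ: Dict[str, List[str]],
                 avg_comm: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """rank_u(t) = avg_cost[t] + max over successors s of (avg_comm[(t, s)] + rank_u(s)).
    avg_cost holds mean execution times over processors; exit tasks have no successors."""
    ranks: Dict[str, float] = {}

    def rank(t: str) -> float:
        if t not in ranks:
            ranks[t] = avg_cost[t] + max(
                (avg_comm[(t, s)] + rank(s) for s in succ.get(t, [])), default=0.0)
        return ranks[t]

    for t in avg_cost:
        rank(t)
    return ranks


def task_order(ranks: Dict[str, float]) -> List[str]:
    """Task selection phase: highest upward rank first."""
    return sorted(ranks, key=lambda t: ranks[t], reverse=True)
```

For a diamond DAG A→B, A→C, B→D, C→D with mean costs A=3, B=2, C=4, D=1 and communication costs A→B=1, A→C=2, B→D=1, C→D=3, the ranks are D=1, B=4, C=8, A=13, giving the order A, C, B, D. Sorting by decreasing upward rank always places a task before its successors, so the order is a valid topological order.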

  24. Proposed Algorithm: Heterogeneous Budget Constraint Scheduling Algorithm (HBCS) • Objective function, with $0 \le k \le 1$

  25. Experimental Results • Workflow structure: • Synthetic DAG generation (www.loria.fr/~suter/dags.html) • Applications have between 30 and 50 tasks, generated randomly. • The total number of DAGs in our simulation is 1000. • Workflow budget: $\mathit{BUDGET} = C_{\mathrm{cheapest}} + k\,(C_{\mathrm{HEFT}} - C_{\mathrm{cheapest}})$, $0 \le k \le 1$ • Lowest budget (k = 0): cheapest schedule, higher makespan • Highest budget (k = 1): shortest makespan (HEFT schedule) • Performance metric:
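
The budget formula interpolates linearly between the cost of the cheapest schedule and the cost of the HEFT schedule. A direct transcription follows; the function name and the example costs are hypothetical.

```python
def workflow_budget(c_cheapest: float, c_heft: float, k: float) -> float:
    """BUDGET = C_cheapest + k * (C_HEFT - C_cheapest), with 0 <= k <= 1."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be in [0, 1]")
    return c_cheapest + k * (c_heft - c_cheapest)
```

With a cheapest-schedule cost of 10 and a HEFT cost of 30, k = 0, 0.25 and 1 give budgets of 10, 15 and 30. Sweeping k therefore covers the whole range from the cheapest schedule to the fastest one.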

  26. Experimental Results • Simulation platform: • We use SimGrid, which allows a realistic description of the infrastructure parameters. • We consider a bandwidth-sharing policy in which only one processor can send data over a given network link at a time. • We consider nodes of clusters from the Grid'5000 platform.
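
The one-sender-per-link policy means that transfers sharing a link are serialized rather than interleaved. The sketch below is a deliberately simplified FIFO model of that effect, not SimGrid's network model or API. Bandwidth in MB/s, the optional fixed latency, and first-come-first-served ordering are assumptions.

```python
from typing import List, Tuple


def serialize_transfers(transfers: List[Tuple[float, float]], bandwidth_mb_s: float,
                        latency_s: float = 0.0) -> List[float]:
    """transfers: (ready_time_s, size_mb) pairs that share one link.
    Only one transfer uses the link at a time, served in order of ready time.
    Returns each transfer's finish time, in the same order as the input."""
    order = sorted(range(len(transfers)), key=lambda i: transfers[i][0])
    finish = [0.0] * len(transfers)
    link_free_at = 0.0
    for i in order:
        ready, size_mb = transfers[i]
        start = max(ready, link_free_at)  # wait until the link is idle
        link_free_at = start + latency_s + size_mb / bandwidth_mb_s
        finish[i] = link_free_at
    return finish
```

On a 100 MB/s link, a 100 MB transfer ready at t = 0 finishes at 1.0 s. A 50 MB transfer ready at t = 1 then starts immediately and finishes at 1.5 s. A transfer ready while the link is busy simply waits, which is the contention effect that a realistic network model adds to the makespan.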

  27. Results • [Figures: results on the Sophia, Rennes and Grenoble clusters; HBCS time complexity]

  28. Conclusions (2) • Concluding remarks • We considered a realistic model of the infrastructure; • The HBCS algorithm achieves better performance (makespan and time complexity), in particular for lower budget values; • Future directions • Compare other combinations of cost and time factors in the objective function; • Data locality; • Multiple DAG scheduling.

  29. Thank you!
