ISPA 2008 APDCT Workshop
Reinforcement Learning applied to Meta-scheduling in grid environments
Bernardo Costa, Inês Dutra, Marta Mattoso
Outline • Introduction • Algorithms • Experiments • Conclusions and Future work
Introduction • Relevance: • Available grid schedulers usually do not employ a strategy that benefits either a single user or multiple users. • Some strategies employ performance-information-dependent algorithms (PIDA). • Most published results are obtained through simulation. • Difficulty: monitoring information is unreliable due to network latency.
Study of 2 Algorithms • (AG) A. Galstyan, K. Czajkowski, and K. Lerman. Resource allocation in the grid using reinforcement learning. In AAMAS, pages 1314–1315. IEEE, 2004. • (MQD) Y. C. Lee and A. Y. Zomaya. A grid scheduling algorithm for bag-of-tasks applications using multiple queues with duplication. In ICIS-COMSAR, pages 5–10, 2006.
What is reinforcement learning? • Machine learning technique used to learn behaviours from a series of temporal events. • Learning is not supervised: no labelled examples are given. • Based on the idea of rewards and punishments.
Algorithms • AG and MQD use reinforcement learning to associate an efficiency rank with each RMS. • Reinforcement learning is native to AG. • MQD was modified to use the technique to estimate the computational power of an RMS. • AG allocates RMSs greedily and probabilistically. • MQD allocates RMSs associatively and deterministically.
Algorithms • Calculating efficiency: • A reward is assigned to an RMS whose performance is better than average. • The reward can be negative (a punishment). • An RMS may also keep its efficiency value unchanged.
Algorithms • Calculating efficiency: • Parameters: α and λ. • α weights the importance of the time spent executing a task and thus affects the reward. • λ is a learning parameter. A sketch of such an update follows.
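A minimal sketch of how such an update could work, assuming a simple exponential-averaging rule; the function names and the exact reward formula are illustrative, not taken from the papers:

```python
# Hypothetical efficiency update for an RMS (illustrative; the papers'
# exact formulas may differ).

def reward(task_time: float, global_avg: float, alpha: float) -> float:
    """Positive when the RMS beat the global average, negative when slower.
    alpha weights how strongly execution time influences the reward."""
    return alpha * (global_avg - task_time) / global_avg

def update_efficiency(e: float, r: float, lam: float) -> float:
    """Blend the previous efficiency with the latest reward.
    lam is the learning parameter: 0 keeps e unchanged, 1 replaces it."""
    return (1 - lam) * e + lam * r
```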
Algorithms • AG: with high probability, assigns the job to the best available RMS; otherwise, selects an RMS at random. • MQD: groups of jobs sorted by execution time are assigned to RMSs; the most efficient RMS executes the heaviest jobs. An initial allocation is used to estimate the RMSs' efficiency. Both policies are sketched below.
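A minimal sketch of the two allocation policies just described; the exploration probability, the representation of jobs as (name, estimated time) pairs, and the even split into queues are assumptions for illustration:

```python
import random

def ag_select(efficiency: dict, explore_prob: float = 0.2) -> str:
    """AG-style choice: with high probability take the most efficient
    available RMS, otherwise pick one at random (exploration)."""
    if random.random() < explore_prob:
        return random.choice(list(efficiency))
    return max(efficiency, key=efficiency.get)

def mqd_group(jobs: list, rms_best_first: list) -> dict:
    """MQD-style grouping: sort jobs by estimated execution time and
    split them into one queue per RMS; the most efficient RMS receives
    the heaviest queue. Assumes len(jobs) divides evenly among RMSs."""
    ordered = sorted(jobs, key=lambda j: j[1])  # jobs are (name, time) pairs
    size = len(ordered) // len(rms_best_first)
    queues = [ordered[i * size:(i + 1) * size]
              for i in range(len(rms_best_first))]
    return dict(zip(rms_best_first, reversed(queues)))  # best RMS, heaviest queue
```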
Algorithm AG
Step 1 (initial state): R1 E = 0, R2 E = 0, R3 E = 0; pending jobs J1–J9
Step 2 (after the first round of rewards): R1 E = 0, R2 E = 0.3, R3 E = -0.3; pending jobs J4–J9
Step 3 (after the second round): R1 E = 0.3, R2 E = 0.057, R3 E = 0.51; pending jobs J7–J9
Algorithm MQD
Step 1 (initial state): R1 E = 0, R2 E = 0, R3 E = 0; jobs and execution times: J1 (40), J2 (15), J3 (50), J4 (30), J5 (10), J6 (70), J7 (20), J8 (20), J9 (40)
Step 2 (jobs sorted by execution time and grouped into queues): {J5 (10), J2 (15), J7 (20)}, {J8 (20), J4 (30), J9 (40)}, {J6 (70), J3 (50), J1 (40)}
Step 3 (after the first queue runs): R1 E = 0.3, R2 E = -0.3, R3 E = 0; pending: J8 (20), J4 (30), J9 (40), J6 (70), J3 (50), J1 (40)
Step 4: R1 E = 0.09, R2 E = -0.09, R3 E = -0.3; pending: J8 (20), J3 (50), J1 (40)
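Replaying the walkthrough above with the mqd_group sketch from the Algorithms section; the efficiency order of the RMSs is assumed for illustration:

```python
# Requires the mqd_group sketch defined earlier.
jobs = [("J1", 40), ("J2", 15), ("J3", 50), ("J4", 30), ("J5", 10),
        ("J6", 70), ("J7", 20), ("J8", 20), ("J9", 40)]
queues = mqd_group(jobs, rms_best_first=["R3", "R1", "R2"])  # assumed order
# R3 receives the heaviest queue and R2 the lightest; jobs with equal
# times (J1, J9) may fall into different queues than shown above.
```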
[Chart: average execution time per processor vs. the global average]
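The reward step needs both quantities in the chart; a minimal sketch of computing them, assuming task times are kept in a dict keyed by RMS:

```python
def averages(times_by_rms: dict) -> tuple:
    """Average task time per RMS and the global average over all tasks."""
    per_rms = {r: sum(ts) / len(ts) for r, ts in times_by_rms.items()}
    all_times = [t for ts in times_by_rms.values() for t in ts]
    return per_rms, sum(all_times) / len(all_times)
```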
Experiments • Gridbus Broker: • No need to install it on the other grid sites. • Only requirement: SSH access to a grid node. • Round-robin scheduler (RR). • Limitations: • Does not support job duplication. • Imposes a limit on the number of active jobs per RMS.
Experiments • Resources in 6 grid sites: • LabIA: 24 (Torque/Maui) • LCP: 28 (SGE) • Nacad: 16 (PBS Pro) • UERJ: 144 (Condor) • UFRGS: 4 (Torque) • LCC: 44 (Torque)
Experiments • Objective: study the performance of the algorithms in a real grid environment. • Application: bag-of-tasks. • CPU-intensive. • Task duration between 3 and 8 minutes.
Experiments • Evaluation criterion: makespan. • Makespan was normalized with respect to RR.
Experiments • Phase I: • Tuning of parameters α and λ. • 500 jobs. • Phase II: • Performance of re-scheduling. • Load later increased to 1000 jobs.
Experiments • One experiment is a run of consecutive executions of RR, AG and MQD. • A scenario is a set of experiments with fixed parameters. • For each scenario: 15 runs. • T-tests to verify the statistical difference between AG/MQD and RR, with 95% confidence (the results follow a normal distribution); see the sketch below.
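A minimal sketch of this evaluation, assuming SciPy is available and using made-up makespan values; the slides do not say whether the t-tests were paired, so the paired variant is only one plausible reading:

```python
from scipy import stats

# Illustrative makespans (seconds) from matching runs of one scenario.
rr = [1200.0, 1180.0, 1230.0, 1210.0, 1190.0]
ag = [1050.0, 1020.0, 1100.0, 1060.0, 1030.0]

normalized = [a / r for a, r in zip(ag, rr)]  # < 1.0 means AG beat RR

# Paired t-test at 95% confidence: AG and RR runs of one experiment
# share the same grid conditions.
t, p = stats.ttest_rel(ag, rr)
print(f"t = {t:.3f}, p = {p:.4f}, significant = {p < 0.05}")
```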
Experiments (Phase I)
Experiments (Phase II)
Conclusions and Future work • The results showed that it was possible to improve on RR with both AG and MQD. • The experiments validate MQD simulation results found in the literature. • Reinforcement learning is a promising technique for classifying resources in real grid environments.
Conclusions and Future work • Study the behavior of AG and MQD with other kinds of applications, e.g., data-intensive applications or applications with dependencies.
Questions?
Annex
Definitions • Resource manager: a system that manages the submission and execution of jobs within a specific domain. • Resource Management System (RMS): synonym for resource manager. • Batch job scheduler: the typical scheduler of an RMS. Examples: SGE, PBS/Torque.
Definitions • Meta-scheduler: a scheduler that has no direct access to the resources, only to the RMSs that manage them. • Reinforcement learning: a technique that induces an agent to make decisions through offered rewards. • Makespan: the total time a meta-scheduler takes to finish executing the set of jobs assigned to it.
Definitions • Job: an application submitted to the grid by a user, generally executed by an RMS. Examples of job types: • Bag-of-Tasks: jobs with no explicit dependency or precedence relations among them. • Parameter sweep (APST): jobs of the same executable that differ only in an input value that varies between executions.