Grid Differentiated Services: a Reinforcement Learning Approach Julien Perez Laboratoire de Recherche en Informatique Université Paris-Sud / CNRS
Grids must offer “interactive” computing Interactive: guaranteed low latency CCGrid08
Grids must offer “interactive” computing • To be more than a niche • Traditional demanding applications in physics, genomics,… • Clinical medical image analysis • Digital libraries: on-line complex queries • Disaster management, on-line instruments • Ubiquitous computing and ambient intelligence • Industry products for clusters • MathWorks is porting the DCT to EGEE/gLite CCGrid08
Institutional grids • Local control • Implicit scheduling policy as a result of the partially independent local decisions • Including fair share constraints • No time-slice CCGrid08
An example challenge • Dynamically reallocating resources to classes • Typical classes: VOs × simulation (best effort) and analysis (QoS) • Current status: independent “pull” strategies (pilot jobs, glide-in,…) • Manual reconfiguration excluded • Automatic reconfiguration: they tried hard… • A case for Autonomic Computing: “Computing systems that manage themselves in accordance with high-level objectives from humans.” (Kephart & Chess 2003) • Self-*: configuration, optimization, healing, protection CCGrid08
[Roadmap figure, 2003–2011: GÉANT upgraded to GÉANT2; grid infrastructures reinforced and upgraded; user involvement; data infrastructures; supercomputers; spanning FP5, FP6 and FP7] We need autonomic grids now! Ulf Dahlsten, European Commission, Information Society and Media Directorate-General, Directorate F - Emerging Technologies and Infrastructures CCGrid08
Reinforcement learning situations • Components • A stationary environment, unknown but observable • The available actions • Associated rewards • Goal: maximize the expected long-term benefit • Method: discover the optimal action-value function through directed trial and error • Interaction loop: the agent follows a policy π(s,a); in state s_t it takes action a_t and the environment, with transition probabilities P^a_{ss'} and expected rewards R^a_{ss'}, returns reward r_{t+1} • Return: R_t = Σ_k γ^k r_{t+k+1} • Action-value function: Q^π(s,a) = E_π[R_t | s_t = s, a_t = a] CCGrid08
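To make the loop concrete, here is a minimal Python sketch of the agent-environment interaction and of the discounted return defined above; the GridEnv-style methods (reset, step) and the policy callable are illustrative assumptions, not part of the actual system.

```python
# A minimal sketch of the agent-environment loop described on this slide.
# The env interface (reset, step) and the policy callable are assumptions for illustration.
def run_episode(env, policy, gamma=0.99):
    """Follow a policy pi, accumulating the discounted return R_t = sum_k gamma^k r_{t+k+1}."""
    state = env.reset()
    ret, discount, done = 0.0, 1.0, False
    while not done:
        action = policy(state)                  # a_t chosen from pi(s_t, .)
        state, reward, done = env.step(action)  # environment yields r_{t+1} and s_{t+1}
        ret += discount * reward
        discount *= gamma
    return ret
```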
Scheduling: State and Actions • State: set of real-valued variables measured in the system • Workload in the queue and in the cluster • VO distribution of jobs in the queue and the cluster • Resource status • No information about the arrival process • Action: job to schedule • Estimation of the action-value function Q • TD(0) temporal difference, with continuous estimation • Lookup tables would provide a poor approximation • Robust off-the-shelf non-linear approximation: neural network (Rummery & Niranjan 1994, Tesauro et al. 2007) • Re-training on each new example (vs. active learning) • Moving target: no guarantee of convergence • Off-line initial training with an earliest deadline first policy CCGrid08
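As a rough illustration of the state described on this slide, the following sketch assembles a real-valued state vector from queue, cluster and resource information; all field names (estimated_runtime, remaining_runtime, vo) are assumptions made for the example, not the paper's actual data structures.

```python
import numpy as np

def build_state(queue_jobs, running_jobs, n_servers, n_busy, vo_ids):
    """Assemble the real-valued state vector: workload, VO distribution, resource status."""
    def vo_histogram(jobs):
        counts = np.array([sum(1 for j in jobs if j["vo"] == vo) for vo in vo_ids], float)
        return counts / max(len(jobs), 1)

    queued_work = sum(j["estimated_runtime"] for j in queue_jobs)
    running_work = sum(j["remaining_runtime"] for j in running_jobs)
    return np.concatenate([
        [queued_work, running_work],   # workload in the queue and in the cluster
        vo_histogram(queue_jobs),      # VO distribution of queued jobs
        vo_histogram(running_jobs),    # VO distribution of running jobs
        [n_busy / n_servers],          # resource status (occupancy)
    ])
```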
Rewards model • Time/Utility functions (TUF) • Utility is a function of the execution delay • Service classes are associated with functions • Jensen et al. 1985, Tesauro & Kephart 2004, Vengerov 2007 • [Figure: time/utility curves for hard real-time, soft real-time and best-effort classes; utility is fixed or proportional with respect to the deadline; a: start date, d: relative deadline] CCGrid08
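The sketch below gives one possible encoding of such time/utility functions; the exact curve shapes and default values are assumptions for illustration only.

```python
def hard_realtime_utility(finish_time, start, deadline, value=1.0):
    """Full utility if the job finishes before its absolute deadline, zero otherwise."""
    return value if finish_time <= start + deadline else 0.0

def soft_realtime_utility(finish_time, start, deadline, value=1.0):
    """Utility decays proportionally with lateness past the deadline (never below zero)."""
    lateness = max(0.0, finish_time - (start + deadline))
    return max(0.0, value * (1.0 - lateness / deadline))

def best_effort_utility(finish_time, start, deadline, value=0.1):
    """Best-effort jobs get a small, delay-independent utility."""
    return value
```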
Rewards model • Fairness • Prescribed share w_k for each VO k; S_k is the share it actually receives • Deficit distance: D = max_k (w_k − S_k)^+ • Fairness utility: 1 − D / max_k(w_k) • Policy if some VOs request less than their share • Fair excess allocation? • “Greedy” allocation: use this slackness to favor responsiveness • The overall reward is the weighted sum of the time utility and the fairness utility CCGrid08
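A minimal sketch of the fairness utility and of the combined reward, assuming a weight alpha for mixing time utility and fairness utility (the actual weighting used in the paper is not shown here).

```python
import numpy as np

def fairness_utility(prescribed_shares, actual_shares):
    """Fairness utility 1 - D / max_k(w_k), with deficit distance D = max_k (w_k - S_k)^+."""
    w = np.asarray(prescribed_shares, float)
    s = np.asarray(actual_shares, float)
    deficit = np.maximum(w - s, 0.0).max()
    return 1.0 - deficit / w.max()

def overall_reward(time_utility, prescribed_shares, actual_shares, alpha=0.5):
    """Weighted sum of time utility and fairness utility; alpha is an assumed weight."""
    return alpha * time_utility + (1.0 - alpha) * fairness_utility(prescribed_shares, actual_shares)
```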
Outline • Motivation for grid Differentiated Services • The reinforcement learning framework • Experimental setup and results CCGrid08
Experimental setup • The simulation platform • Discrete event simulator • Plugin schedulers • Analysis tools • Matlab implementation • One step of the RL loop takes 1 to 10 ms • Synthetic and real (EGEE) workloads CCGrid08
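For illustration, a minimal discrete-event loop with a pluggable scheduler might look as follows; scheduler.select and the job fields are assumptions, and the actual platform is a Matlab implementation, not this Python sketch.

```python
import heapq, itertools

def simulate(initial_events, scheduler, horizon):
    """Minimal discrete-event loop with a plug-in scheduler (illustrative sketch only)."""
    counter = itertools.count()  # tie-breaker so heap entries stay comparable
    events = [(t, next(counter), kind, job) for (t, kind, job) in initial_events]
    heapq.heapify(events)
    clock, queue, running = 0.0, [], []
    while events and clock < horizon:
        clock, _, kind, job = heapq.heappop(events)
        if kind == "arrival":
            queue.append(job)
        else:  # "completion"
            running.remove(job)
        # the plug-in scheduler decides which queued jobs to start at this instant
        for job in list(scheduler.select(queue, running, clock)):
            queue.remove(job)
            running.append(job)
            heapq.heappush(events, (clock + job["runtime"], next(counter), "completion", job))
```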
Synthetic experiments More detail in the CCGrid’08 paper CCGrid08
The EGEE workload • Torque logs of the LAL cluster, 17–26 May 2006 • 100 servers (approximation) • VO distribution = (.20, .12, .12, .06, .06, .09, .35) • Heavily dominated by short jobs • Jobs shorter than 15 min are 62% of the total number of jobs, but less than 3% of the workload • An unknown proportion are SDJ • SDJ (Short Deadline Jobs) are executed immediately or rejected • Native scheduler: Maui/PBS with SDJ • The SDJ scheme cannot be outperformed • Challenge: get acceptable results for all interactive jobs CCGrid08
Performance CDF of the waiting time. Acceptable, but not competitive with the SDJ scheme CCGrid08
Performance CCGrid08
Performance [Plot: waiting time (native) − waiting time (RL)] Slow learning CCGrid08
Conclusion • Coping with the learning phase in unsteady systems: apprenticeship learning • Multi-objective multi-scale Reinforcement Learning CCGrid08
The classical temporal difference algorithm • Very naïve • TD(0) update: Q(s_t,a_t) ← Q(s_t,a_t) + α [ r_{t+1} + γ Q(s_{t+1},a*) − Q(s_t,a_t) ], i.e. new estimate = current estimate + step size × (target − current estimate), the bracketed term being the error • On-policy: a* is the actual action • Exploration-exploitation: ε-greedy CCGrid08
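A minimal tabular sketch of this update and of ε-greedy action selection, assuming a defaultdict-backed lookup table and illustrative values for the step size α and discount γ.

```python
import random
from collections import defaultdict

def td0_update(q, state, action, reward, next_state, next_action, alpha=0.1, gamma=0.99):
    """One on-policy TD(0) step: new estimate = current estimate + alpha * (target - current estimate)."""
    target = reward + gamma * q[(next_state, next_action)]
    error = target - q[(state, action)]
    q[(state, action)] += alpha * error
    return error

def epsilon_greedy(q, state, actions, epsilon=0.1):
    """Exploration-exploitation: random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

# q = defaultdict(float)  # lookup-table estimate of the action-value function
```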
Temporal difference algorithm • TD(0) • Continuous estimation of Q() • Lookup tables would provide a poor approximation • Robust off-the-shelf non-linear approximation: neural network (Rummery & Niranjan 1994, Tesauro et al. 2007) • Re-training on each new example (vs. active learning) • Moving target: no guarantee of convergence • Off-line initial training with an earliest deadline first policy CCGrid08
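Since lookup tables are ruled out, a continuous approximation of Q can be maintained by a small network re-trained on each new example; the sketch below uses a one-hidden-layer numpy network whose architecture and learning rate are assumptions, not the exact network of the paper.

```python
import numpy as np

class QNetwork:
    """One-hidden-layer approximator of Q(state-action features), re-trained on each new
    example toward the TD target (a sketch, not the paper's actual network)."""
    def __init__(self, n_inputs, n_hidden=20, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.1, size=n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, x):
        h = np.tanh(x @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

    def train_step(self, x, target):
        """One gradient step on the squared error between the prediction and the TD target."""
        h = np.tanh(x @ self.w1 + self.b1)
        err = (h @ self.w2 + self.b2) - target
        grad_h = err * self.w2 * (1.0 - h ** 2)  # backprop through the hidden layer
        self.w2 -= self.lr * err * h
        self.b2 -= self.lr * err
        self.w1 -= self.lr * np.outer(x, grad_h)
        self.b1 -= self.lr * grad_h
        return err
```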
Synthetic experiments • Load parameters • Poisson arrivals with rate λ • Execution times exponential with parameter μ • Utilization factor ρ = λ/μ • Maximum duration of interactive jobs w • Proportion of interactive jobs q • Number of servers P • Fair-share parameters • Target fair-share configuration • Actual distribution: the target may be feasible or not • Policies: FIFO and RL CCGrid08
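A minimal sketch of how such a synthetic workload could be generated; the field names and the VO share sampling are illustrative assumptions.

```python
import numpy as np

def synthetic_workload(n_jobs, lam, mu, max_interactive, frac_interactive, shares, seed=0):
    """Generate a synthetic workload: Poisson arrivals (rate lam), exponential runtimes
    (parameter mu), a fraction frac_interactive of interactive jobs capped at max_interactive,
    and VO labels drawn from the target share vector."""
    rng = np.random.default_rng(seed)
    arrivals = np.cumsum(rng.exponential(1.0 / lam, size=n_jobs))  # inter-arrivals ~ Exp(lam)
    runtimes = rng.exponential(1.0 / mu, size=n_jobs)
    interactive = rng.random(n_jobs) < frac_interactive
    runtimes[interactive] = np.minimum(runtimes[interactive], max_interactive)
    vos = rng.choice(len(shares), size=n_jobs, p=shares)
    return [{"arrival": a, "runtime": r, "interactive": bool(i), "vo": int(v)}
            for a, r, i, v in zip(arrivals, runtimes, interactive, vos)]
```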
Performance: feasible schedule CDF of the waiting time for interactive jobs. ρ = .99, P = 50, w = (.7, .2, .05, .05), feasible schedule, 5000 jobs. More than 90% do not wait more than 2 minutes CCGrid08
Performance: feasible schedule Mean and std of the waiting time RL does not starve batch jobs CCGrid08
Performance: feasible schedule Dynamics of the fair share 3% off the optimum Reasonably fast convergence at the grid time scale (fairness-wise) CCGrid08
Performance: unfeasible schedule Dynamics of the fair share. Target w = (.7, .2, .05, .05), actual w = (.4, .2, .2, .2). RL and FIFO very close to the optimum CCGrid08