
Grid Differentiated Services: a Reinforcement Learning Approach

Presentation Transcript


  1. Grid Differentiated Services: a Reinforcement Learning Approach Julien Perez Laboratoire de Recherche en Informatique Université Paris-Sud / CNRS

  2. Grids must offer “interactive” computing Interactive: guaranteed low latency CCGrid08

  3. Grids must offer “interactive” computing • To be more than a niche • Traditional demanding applications in physics, genomics,… • Clinical medical image analysis • Digital libraries: on-line complex queries • Disaster management, on-line instruments • Ubiquitous computing and ambient intelligence • Industry products for clusters • MathWorks is porting the DCT (Distributed Computing Toolbox) to EGEE/gLite CCGrid08

  4. Institutional grids • Local control • Implicit scheduling policy as a result of the partially independent local decisions • Including fair share constraints • No time-slice CCGrid08

  5. An example challenge • Dynamically reallocating resources to classes • Typical classes: VOs x simulation (best effort) and analysis (QoS) • Current status: independent “pull” strategies (pilot jobs, glide-in,…) • Manual reconfiguration excluded • Automatic reconfiguration: they tried hard… • A case for Autonomic Computing Computing systems that manage themselves in accordance with high-level objectives from humans. (Kephart & Chess 2003) • Self-*: configuration, optimization, healing, protection CCGrid08

  6. “We need autonomic grids now!” (Ulf Dahlsten, European Commission, Information Society and Media Directorate-General, Directorate F - Emerging Technologies and Infrastructures) [Timeline figure, 2003-2011: GÉANT upgraded to GÉANT2; grid infrastructures reinforced and upgraded; user involvement; data infrastructure; supercomputers; FP5, FP6 and FP7 framework programmes] CCGrid08

  7. Reinforcement learning situations • Components • A stationary environment, unknown but observable • The available actions • Associated rewards • Goal: maximize the expected long-term benefit • Method: discover the optimal action-value function through directed trial and error • Interaction loop: the agent follows a policy $\pi(s,a)$; in state $s_t$ it takes action $a_t$ and the environment, with dynamics $P^a_{ss'}$ and rewards $R^a_{ss'}$, returns reward $r_{t+1}$ • Return: $R_t = \sum_k \gamma^k r_{t+k+1}$ • Action-value function: $Q^\pi(s,a) = E_\pi(R_t \mid s_t = s, a_t = a)$ (a minimal sketch of this loop follows) CCGrid08
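
To make the framework concrete, here is a minimal sketch of the agent/environment loop and of the discounted return it optimizes. The toy `Environment` and `Agent` classes are illustrative assumptions, not the grid simulator used in the talk.

```python
import random

class Environment:
    """Toy stationary environment: the state is observable, the dynamics are unknown to the agent."""
    def __init__(self):
        self.state = 0
    def step(self, action):
        # Hidden dynamics: the agent only ever sees the next state and the reward.
        self.state = (self.state + action) % 5
        return self.state, (1.0 if self.state == 0 else 0.0)

class Agent:
    """Placeholder policy pi(s, a): here, uniform over two actions."""
    def act(self, state):
        return random.choice([1, 2])

def discounted_return(rewards, gamma=0.95):
    # R_t = sum_k gamma^k * r_{t+k+1}
    return sum(gamma ** k * r for k, r in enumerate(rewards))

env, agent = Environment(), Agent()
state, rewards = env.state, []
for _ in range(100):
    state, reward = env.step(agent.act(state))
    rewards.append(reward)
print("Sampled return R_0:", discounted_return(rewards))
```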

  8. Scheduling: State and Actions • State: set of real-valued (in ℝ) variables measured in the system • Workload in the queue and in the cluster • VO distribution of jobs in the queue and the cluster • Resource status • No information about the arrival process • Action: job to schedule • Estimation of the action-value function Q • TD(0) temporal difference, with continuous estimation • Lookup tables would provide a poor approximation • Robust off-the-shelf non-linear approximation: neural network (Rummery & Niranjan 1994; Tesauro et al. 2007) • Re-training on each new example (vs. active learning) • Moving target: no guarantee of convergence • Off-line initial training with the earliest-deadline-first policy (a sketch of this TD(0) scheme follows) CCGrid08
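
A minimal sketch of what slides 8 and 20 describe: Q is approximated by a small neural network and, after each scheduling decision, the network takes one gradient step toward the TD(0) target. The network size, learning rate and 8-dimensional feature encoding below are illustrative assumptions, not the settings of the paper.

```python
import numpy as np

class QNetwork:
    """One-hidden-layer approximation of Q(s, a) over a (state, action) feature vector."""
    def __init__(self, n_features, n_hidden=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return self.w2 @ h + self.b2

    def update(self, x, target):
        # One SGD step on the squared TD error (semi-gradient: the target is treated as a constant).
        h = np.tanh(self.W1 @ x + self.b1)
        error = (self.w2 @ h + self.b2) - target
        grad_h = error * self.w2 * (1.0 - h ** 2)
        self.w2 -= self.lr * error * h
        self.b2 -= self.lr * error
        self.W1 -= self.lr * np.outer(grad_h, x)
        self.b1 -= self.lr * grad_h
        return error

def td0_step(qnet, phi_sa, reward, phi_next_sa, gamma=0.95, terminal=False):
    """TD(0): move Q(s, a) toward r + gamma * Q(s', a') for the action actually taken next."""
    target = reward if terminal else reward + gamma * qnet.predict(phi_next_sa)
    return qnet.update(phi_sa, target)

# Usage with hypothetical 8-dimensional (state, action) features:
qnet = QNetwork(n_features=8)
phi, phi_next = np.random.rand(8), np.random.rand(8)
td_error = td0_step(qnet, phi, reward=0.7, phi_next_sa=phi_next)
```

Retraining on every new example keeps the estimate current, at the price of the moving target mentioned on the slide: with a non-linear approximator there is no convergence guarantee.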

  9. Rewards model • Time/Utility functions (TUF) • Utility is a function of the execution delay • Service classes are associated with utility functions (Jensen et al. 1985; Tesauro & Kephart 2004; Vengerov 2007) [Figure: time/utility functions for the hard real-time, soft real-time and best-effort classes as a function of time t, with fixed and proportional regions around the deadline; a: start date, d: relative deadline] (a sketch of such functions follows) CCGrid08
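
A hedged sketch of what such time/utility functions could look like; the exact shapes and constants of the paper are not given on the slide, so the three functions below only illustrate the classes it names.

```python
def hard_realtime_utility(delay, deadline, value=1.0):
    """Fixed utility if the job completes within its relative deadline, nothing afterwards."""
    return value if delay <= deadline else 0.0

def soft_realtime_utility(delay, deadline, value=1.0):
    """Full utility up to the deadline, then a proportional (linear) decay down to zero."""
    if delay <= deadline:
        return value
    return max(0.0, value * (1.0 - (delay - deadline) / deadline))

def best_effort_utility(delay, value=0.1):
    """Small, delay-independent utility: best-effort jobs carry no deadline."""
    return value

# Example: a job with a 10-minute relative deadline that waited 12 minutes.
print(hard_realtime_utility(12, 10), soft_realtime_utility(12, 10), best_effort_utility(12))
```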

  10. Rewards model • Fairness • Prescribed share $w$ • Deficit distance $D = \max_k (w_k - S_k)^+$, where $S_k$ is the share actually obtained by VO $k$ • Fairness utility $1 - D / \max_k(w_k)$ • Policy if some VOs request less than their share • Fair excess allocation? • “Greedy” allocation: use this slackness to favor responsiveness • The overall reward is the weighted sum of time utility and fairness utility (see the sketch below) CCGrid08
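
A minimal sketch of the fairness part of the reward, assuming $S_k$ is the share actually obtained by VO k (my reading of the slide); the weight alpha balancing time utility and fairness utility is an illustrative parameter.

```python
def fairness_utility(prescribed, achieved):
    """1 - D / max_k(w_k), with deficit distance D = max_k (w_k - S_k)^+."""
    deficit = max(max(w - s, 0.0) for w, s in zip(prescribed, achieved))
    return 1.0 - deficit / max(prescribed)

def overall_reward(time_utility, prescribed, achieved, alpha=0.5):
    """Weighted sum of time utility and fairness utility (alpha is an assumed weight)."""
    return alpha * time_utility + (1.0 - alpha) * fairness_utility(prescribed, achieved)

# Target shares vs. shares actually obtained by four VOs:
w = [0.7, 0.2, 0.05, 0.05]
s = [0.6, 0.25, 0.05, 0.10]
print(fairness_utility(w, s))      # deficit D = 0.1, so the utility is 1 - 0.1/0.7
print(overall_reward(0.8, w, s))
```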

  11. Outline • Motivation for grid Differentiated Services • The reinforcement learning framework • Experimental setup and results CCGrid08

  12. Experimental setup • The simulation platform • Discrete event simulator • Plugin schedulers • Analysis tools • Matlab implementation • One step of the RL loop takes 1 to 10 ms • Synthetic and real (EGEE) workloads (a minimal event-loop sketch follows) CCGrid08
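
A very small event-loop skeleton with a pluggable scheduler, only to illustrate the kind of platform the slide lists; the actual platform is a Matlab implementation, so this Python skeleton is an assumption about its overall shape, not its code.

```python
import heapq, itertools

class EventDrivenSimulator:
    """Minimal discrete-event loop: a priority queue of (time, seq, kind, job) events."""
    def __init__(self, scheduler, n_servers):
        self.events, self.now = [], 0.0
        self.seq = itertools.count()          # tie-breaker so jobs are never compared directly
        self.scheduler = scheduler            # plugin: picks the next job to start
        self.free_servers, self.queue = n_servers, []

    def push(self, time, kind, job):
        heapq.heappush(self.events, (time, next(self.seq), kind, job))

    def run(self):
        while self.events:
            self.now, _, kind, job = heapq.heappop(self.events)
            if kind == "arrival":
                self.queue.append(job)
            else:                              # "departure": a server becomes free
                self.free_servers += 1
            while self.free_servers and self.queue:
                chosen = self.scheduler(self.queue, self.now)   # the scheduling decision
                self.queue.remove(chosen)
                self.free_servers -= 1
                self.push(self.now + chosen["runtime"], "departure", chosen)

# A FIFO plugin scheduler; an RL plugin would rank the queued jobs with the learned Q function.
fifo = lambda queue, now: queue[0]
sim = EventDrivenSimulator(fifo, n_servers=2)
for i in range(5):
    sim.push(float(i), "arrival", {"id": i, "runtime": 3.0})
sim.run()
```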

  13. Synthetic experiments More detail in the CCGrid’08 paper CCGrid08

  14. The EGEE workload • Torque logs of the LAL cluster, 17-26 May 2006 • 100 servers (approximation) • VO distribution = (.20, .12, .12, .06, .06, .09, .35) • Heavily dominated by short jobs • Jobs shorter than 15 min are 62% of the total number of jobs, but less than 3% of the workload • An unknown proportion are SDJ • SDJ (Short Deadline Jobs) are executed immediately or rejected • Native scheduler: Maui/PBS with SDJ • The SDJ scheme cannot be outperformed • Challenge: get acceptable results for all interactive jobs CCGrid08

  15. Performance [Figure: CDF of the waiting time] Acceptable, but not competitive with the SDJ scheme CCGrid08

  16. Performance CCGrid08

  17. Performance [Figure: waiting time (native) − waiting time (RL)] Slow learning CCGrid08

  18. Conclusion • Coping with the learning phase in unsteady systems: apprenticeship learning • Multi-objective multi-scale Reinforcement Learning CCGrid08

  19. The classical temporal difference algorithm • Very naïve • TD(0): new estimate = current estimate + step size × error, i.e. $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \, [\, r_{t+1} + \gamma Q(s_{t+1}, a^*) - Q(s_t,a_t) \,]$, where the target is $r_{t+1} + \gamma Q(s_{t+1}, a^*)$ and the error is the bracketed difference between target and current estimate • On-policy: $a^*$ is the actual action • Exploration-exploitation: ε-greedy (see the sketch below) CCGrid08
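
A minimal tabular sketch of the ε-greedy exploration and of the on-policy TD(0) update written above; the paper replaces the table with a neural-network approximation, and the state/action names below are placeholders.

```python
import random
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)], zero-initialised
ACTIONS = ["job_A", "job_B"]      # placeholder action set
alpha, gamma, eps = 0.1, 0.95, 0.1

def epsilon_greedy(state):
    """Explore with probability eps, otherwise exploit the current estimate."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def td0_update(state, action, reward, next_state, next_action):
    """New estimate = current estimate + alpha * (target - current estimate)."""
    target = reward + gamma * Q[(next_state, next_action)]   # on-policy: the action actually taken
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# One transition of the learning loop:
s, a = "queue_short", epsilon_greedy("queue_short")
s2 = "queue_long"
td0_update(s, a, reward=0.8, next_state=s2, next_action=epsilon_greedy(s2))
```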

  20. Temporal difference algorithm • TD(0) • Continuous estimation of Q(s,a) • Lookup tables would provide a poor approximation • Robust off-the-shelf non-linear approximation: neural network (Rummery & Niranjan 1994; Tesauro et al. 2007) • Re-training on each new example (vs. active learning) • Moving target: no guarantee of convergence • Off-line initial training with the earliest-deadline-first policy CCGrid08

  21. Synthetic experiments • Load parameters • Poisson arrivals with rate λ • Execution times exponential with parameter μ • Utilization factor ρ = λ/μ • Maximum duration of interactive jobs w • Proportion of interactive jobs q • Number of servers P • Fair-share parameters • Target fair-share configuration • Actual distribution: the target may be feasible or not • Policies: FIFO and RL (a workload-generation sketch follows) CCGrid08
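
A hedged sketch of a synthetic workload generator matching the parameters on the slide: Poisson arrivals with rate λ (exponential inter-arrival times), exponential execution times with parameter μ, a fraction q of interactive jobs capped at a maximum duration, and a VO drawn from a target share vector. The function name, the cap parameter `max_interactive` and the numeric values are placeholders or examples taken from slides 22 and 25.

```python
import random

def synthetic_workload(n_jobs, lam, mu, q, max_interactive, shares, seed=0):
    """Yield (arrival_time, runtime, interactive, vo) tuples for a synthetic trace."""
    rng = random.Random(seed)
    t = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)              # Poisson process: exponential inter-arrival times
        interactive = rng.random() < q
        runtime = rng.expovariate(mu)
        if interactive:
            runtime = min(runtime, max_interactive)   # interactive jobs have a bounded duration
        vo = rng.choices(range(len(shares)), weights=shares)[0]
        yield t, runtime, interactive, vo

# Example: utilization rho = lam/mu close to 1, four VOs with the target shares of slides 22 and 25.
jobs = list(synthetic_workload(n_jobs=5000, lam=0.99, mu=1.0, q=0.3,
                               max_interactive=2.0, shares=[0.7, 0.2, 0.05, 0.05]))
print(jobs[0])
```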

  22. Performance: feasible schedule [Figure: CDF of the waiting time for interactive jobs; ρ = .99, P = 50, w = (.7, .2, .05, .05), feasible schedule, 5000 jobs] More than 90% do not wait more than 2 minutes CCGrid08

  23. Performance: feasible schedule [Figure: mean and standard deviation of the waiting time] RL does not starve batch jobs CCGrid08

  24. Performance: feasible schedule [Figure: dynamics of the fair share] 3% off the optimum; reasonably fast convergence at the grid time scale (fairness-wise) CCGrid08

  25. Performance: unfeasible schedule [Figure: dynamics of the fair share; target w = (.7, .2, .05, .05), actual w = (.4, .2, .2, .2)] RL and FIFO very close to the optimum CCGrid08
