A Multi-Agent Learning Approach to Online Distributed Resource Allocation
Chongjie Zhang, Victor Lesser, Prashant Shenoy
Computer Science Department, University of Massachusetts Amherst
Focus
• This paper presents a multi-agent learning (MAL) approach to address resource sharing in cluster networks.
  • Exploit unknown task arrival patterns
• Problem characteristics:
  • Realistic
  • Multiple agents
  • Partial observability
  • No global reward signal
  • Communication delay
• Two interacting learning problems
Increasing Computing Demands
• "Software as a service" is becoming a popular IT business model.
• It is challenging to build large computing infrastructures to host such widespread online services.
A Potentially Cost-Effective Solution
• Shared clusters
  • Built using commodity PCs or workstations
  • Run a number of applications significantly larger than the number of nodes
• [Figure: a dedicated cluster vs. a shared cluster, each managed by its own resource manager]
• [Arpaci-Dusseau and Culler, 1997; Aron et al., 2000; Urgaonkar and Shenoy, 2003]
Building Larger, Scalable Computing Infrastructures
• Centralized resource management limits the size of shared clusters.
• Organize shared clusters into a network and share resources across clusters.
• How can resources be shared efficiently within a cluster network?
• [Figure: shared clusters connected into a cluster network]
Outline
• Problem Formulation
• Fair Action Learning Algorithm
• Learning Distributed Resource Allocation
  • Local Allocation Decision
  • Task Routing Decision
• Experimental Results
• Summary
Problem Formulation
• A distributed sequential resource allocation problem (DSRAP) is denoted as a tuple <C, A, T, R, B>:
  • C = {C1, …, Cm} is a set of agents (or clusters)
  • A = {aij}m×m is the adjacency matrix of agents, where aij is the task transfer time from Ci to Cj
  • T = {t1, …, tl} is a set of task types
  • R = {R1, …, Rq} is a set of resource types
  • B = {Dij}l×m is the task arrival pattern, where Dij is the arrival distribution of tasks of type ti at Cj
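To make the tuple concrete, here is a minimal sketch of how a DSRAP instance could be represented; the class and field names (DSRAP, transfer_time, arrival_dists) and the toy values are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DSRAP:
    """A minimal encoding of a DSRAP instance <C, A, T, R, B> (illustrative)."""
    clusters: List[str]                                       # C = {C1, ..., Cm}
    transfer_time: Dict[Tuple[str, str], float]               # A: aij, task transfer time from Ci to Cj
    task_types: List[str]                                     # T = {t1, ..., tl}
    resource_types: List[str]                                 # R = {R1, ..., Rq}
    arrival_dists: Dict[Tuple[str, str], Callable[[], int]]   # B: Dij, sampler for arrivals of type ti at Cj

# Toy instance: two clusters, one task type, one resource type.
problem = DSRAP(
    clusters=["C1", "C2"],
    transfer_time={("C1", "C2"): 1.0, ("C2", "C1"): 1.0},
    task_types=["ordinary"],
    resource_types=["CPU"],
    arrival_dists={("ordinary", "C1"): lambda: 3, ("ordinary", "C2"): lambda: 1},
)
```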
Problem Description: Cluster Network
• [Figure: an example cluster network of clusters C1–C10 connected by links with transfer times aij; each cluster consists of computing nodes, and each node provides resources R1, R2, R3.]
Problem Description: Task
• A task is denoted as a tuple <t, u, w, d1, …, dq>, where
  • t is the task type
  • u is the utility rate of the task
  • w is the maximum waiting time before being allocated
  • di is the demand for resource i = 1, …, q
Problem Description: Task Type
• A task type characterizes a set of tasks, each of whose feature components follows a common distribution.
• A task type t is denoted as a tuple <Dts, Dtu, Dtw, Dtd1, …, Dtdq>, where
  • Dts is the task service time distribution
  • Dtu is the distribution of the utility rate
  • Dtw is the distribution of the maximum waiting time
  • Dtdi is the distribution of the demand for resource i = 1, …, q
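As an illustration of how individual tasks could be drawn from a task type, here is a small sketch; the sampler fields and the example distributions are assumptions for exposition, not the paper's implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TaskType:
    """Samplers standing in for the distributions <Dts, Dtu, Dtw, Dtd1, ..., Dtdq>."""
    service_time: Callable[[], float]
    utility_rate: Callable[[], float]
    max_wait: Callable[[], float]
    demands: List[Callable[[], float]]    # one sampler per resource type

@dataclass
class Task:
    """A concrete task <t, u, w, d1, ..., dq>; service_time is drawn from Dts."""
    type_name: str
    utility_rate: float
    max_wait: float
    demands: List[float]
    service_time: float

def sample_task(name: str, ttype: TaskType) -> Task:
    """Draw one task whose features follow the task type's distributions."""
    return Task(
        type_name=name,
        utility_rate=ttype.utility_rate(),
        max_wait=ttype.max_wait(),
        demands=[d() for d in ttype.demands],
        service_time=ttype.service_time(),
    )

# Example: an "ordinary" task type with exponential service times and uniform demands.
ordinary = TaskType(
    service_time=lambda: random.expovariate(1 / 10.0),
    utility_rate=lambda: random.uniform(0.5, 1.5),
    max_wait=lambda: random.uniform(5.0, 20.0),
    demands=[lambda: random.uniform(0.1, 0.4)],
)
task = sample_task("ordinary", ordinary)
```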
Individual Agent's Decision Making
• [Figure: each agent receives a task set T; local task allocation decision-making selects the tasks to allocate locally, which are handed to local resource scheduling (an existing cluster resource scheduling algorithm); tasks not allocated locally are passed to task routing decision-making.]
Problem Goal
• The main goal is to derive decision policies for each agent that maximize the average utility rate (AUR) of the whole system.
• Note that, due to its partial view of the system, each individual cluster can only observe its local utility rate, not the system's utility rate.
Multi-Agent Reinforcement Learning (MARL)
• In a multi-agent setting, all agents learn their policies concurrently.
  • The environment becomes non-stationary from the perspective of an individual agent.
  • Single-agent reinforcement learning algorithms may diverge due to the lack of synchronization.
• Several MARL algorithms have been proposed:
  • GIGA, GIGA-WoLF, WPL, etc.
Fair Action Learning (FAL) Algorithm
• In practical problems, we usually do not know the exact policy gradient used by GIGA.
• FAL is a direct policy search technique.
• FAL is a variant of GIGA that uses an easily calculable, approximate policy gradient followed by GIGA's normalization function.
• [Equation: the FAL policy update, with the policy gradient term and GIGA's normalization function annotated on the slide.]
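A minimal sketch of what such an update can look like for a single state, assuming the approximate gradient of an action is its advantage over the current policy's expected value; the exact formula and the simplex projection used by GIGA are simplified here and should be checked against the paper.

```python
from typing import Dict

def fal_update(policy: Dict[str, float], q_values: Dict[str, float], step: float) -> Dict[str, float]:
    """One FAL-style policy update for a single state (illustrative, not the paper's exact formula)."""
    # Assumed approximate gradient: Q(s, a) minus the policy's expected value at s.
    expected_value = sum(policy[a] * q_values[a] for a in policy)
    updated = {a: policy[a] + step * (q_values[a] - expected_value) for a in policy}
    return normalize(updated)

def normalize(probs: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for GIGA's normalization function.

    GIGA projects onto the probability simplex; this simpler clip-and-renormalize
    retraction keeps the sketch short.
    """
    clipped = {a: max(p, 0.0) for a, p in probs.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {a: 1.0 / len(probs) for a in probs}   # fall back to a uniform policy
    return {a: p / total for a, p in clipped.items()}
```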
Individual Agent's Decision Making
• [Figure repeated: the agent's decision-making flow, with local task allocation decision-making highlighted.]
Local Task Allocation Decision-Making
• Select a subset of received tasks to be allocated locally so as to maximize the agent's local utility rate.
  • Potentially improves the global utility rate.
• Use an incremental selection algorithm (flowchart on the slide, reconstructed below; a runnable sketch follows this slide):
  • selected := Ø
  • loop:
    • allocable := getAllocable(tasks)
    • t := selectTask(allocable)
    • if t = nil: return selected
    • selected := selected ∪ {t}
    • tasks := tasks \ {t}
  • learn()
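A minimal Python sketch of this incremental selection loop, assuming the Task class from the earlier sketch and a simple stochastic-policy interface (probability, learn); getAllocable and the policy stub are illustrative placeholders rather than the paper's implementation.

```python
import random
from typing import List, Optional

NIL = None  # sentinel meaning "select no further task"

class UniformPolicy:
    """Trivial stand-in for the learned stochastic allocation policy."""
    def probability(self, task) -> float:
        return 1.0
    def learn(self) -> None:
        pass

def get_allocable(tasks: List["Task"], free_resources: List[float]) -> List["Task"]:
    """Tasks whose resource demands still fit within the free local resources."""
    return [t for t in tasks if all(d <= f for d, f in zip(t.demands, free_resources))]

def select_task(allocable: List["Task"], policy) -> Optional["Task"]:
    """Pick one task, or NIL, according to the (stochastic) allocation policy."""
    if not allocable:
        return NIL
    candidates = allocable + [NIL]
    weights = [policy.probability(t) for t in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def allocate_locally(tasks: List["Task"], free_resources: List[float], policy) -> List["Task"]:
    """Incrementally select tasks until the policy chooses the nil task."""
    selected: List["Task"] = []
    while True:
        allocable = get_allocable(tasks, free_resources)
        t = select_task(allocable, policy)
        if t is NIL:
            policy.learn()     # update the allocation policy (FAL + Q-learning in the paper)
            return selected
        selected.append(t)
        tasks.remove(t)
        free_resources = [f - d for f, d in zip(free_resources, t.demands)]
```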
Learning Local Task Allocation
• Learning model:
  • State: features describing both the tasks to be allocated and the availability of local resources
  • Action: selecting a task
  • Reward for selecting task a at state s (equation on the slide)
• Due to partial observability, each agent uses FAL to learn a stochastic policy.
• Q-learning is used to update the value function.
Accelerating the Learning Process
• Reasons:
  • Extremely large policy search space
  • Non-stationary learning environment
  • Avoid poor initial policies in practical systems
• Techniques (a sketch follows below):
  • Initialize policies with a greedy allocation algorithm
  • Set a utilization threshold for conducting ε-greedy exploration
  • Limit the exploration rate for selecting the nil task
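The following sketch shows one way these heuristics might fit together; the threshold value, the ε value, the separate cap on exploring the nil action, and the direction of the utilization test (explore only when lightly loaded) are all illustrative assumptions, not parameters from the paper.

```python
import random

NIL = None                    # the "allocate nothing" action
UTILIZATION_THRESHOLD = 0.8   # assumed: explore only while utilization is below this
EPSILON = 0.1                 # assumed ε-greedy exploration rate
NIL_EXPLORATION_CAP = 0.02    # assumed cap on exploring the nil task

def choose_action(state, candidates, policy, utilization: float):
    """ε-greedy exploration gated by current resource utilization (illustrative)."""
    if not candidates:
        return NIL
    if utilization < UTILIZATION_THRESHOLD and random.random() < EPSILON:
        # Explore, but only rarely pick the nil task so real work is not withheld.
        if random.random() < NIL_EXPLORATION_CAP:
            return NIL
        return random.choice(candidates)
    # Exploit: sample from the learned policy (initialized from a greedy allocator).
    return policy.sample(state, candidates)
```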
Individual Agent's Decision Making
• [Figure repeated: the agent's decision-making flow, now turning to task routing decision-making.]
Task Routing Decision-Making
• To which neighbor should an agent forward an unallocated task so that it reaches an unsaturated cluster before it expires?
• Agents learn to route tasks by interacting with their neighbors.
• The learning objective is to maximize the probability that each task is allocated somewhere in the system.
• [Figure: a task forwarded hop by hop through the cluster network C1–C6.]
Learning Task Routing
• State sx is defined by the characteristics of the current task x that an agent is forwarding.
• Action j corresponds to choosing neighbor j for forwarding a task.
• Reward: the allocation probability of task x once forwarded to neighbor j (equation on the slide), which combines
  • the probability that j allocates x locally, and
  • the allocation probability of x being forwarded onward by j, weighted by j's routing policy.
Learning Task Routing (cont.)
• The local allocation probability (defined by an equation on the slide).
• Qi(sx, j) is the expected probability that task x will be allocated if agent i forwards it to its neighbor j.
• Q-learning is used to update the value function.
• FAL is used to learn the task routing policy.
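A sketch of how the routing reward and value update described above could be computed; the exact way the two probability terms are combined is an assumption based on the slide's annotations, and the names and step size are illustrative.

```python
from typing import Dict

ALPHA = 0.1   # assumed Q-learning step size

def routing_reward(p_local: float, neighbor_policy: Dict[str, float],
                   neighbor_q: Dict[str, float]) -> float:
    """Estimated allocation probability of a task forwarded to neighbor j (illustrative).

    Assumption: the task is allocated either locally at j (probability p_local)
    or, failing that, somewhere downstream after j forwards it according to its
    own routing policy.
    """
    onward = sum(neighbor_policy[k] * neighbor_q[k] for k in neighbor_policy)
    return p_local + (1.0 - p_local) * onward

def q_update(q: Dict[str, float], neighbor: str, reward: float) -> None:
    """One-step Q-learning update of Qi(sx, .) toward the observed reward."""
    q[neighbor] += ALPHA * (reward - q[neighbor])
```

In the actual system, a neighbor's policy and value estimates would reach the forwarding agent through (delayed) communication rather than by direct access.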
Dual Exploration [Kumar and Miikkulainen, 1999]
• [Figure: a task x travels from a source s toward a destination d through agents i and j; forward exploration propagates r(sx, j) to update Qi(sx, j) along the forwarding direction, while backward exploration propagates r(sx, i) to update Qj(sx, i) in the reverse direction.]
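A hedged sketch of dual exploration on a single forwarding step: when agent i hands task x to neighbor j, information flows in both directions so each side refines its estimate about the other. The Agent class, the piggybacked reward values, and the update form are illustrative assumptions, not taken from the paper or from Kumar and Miikkulainen's dual Q-routing.

```python
from collections import defaultdict
from typing import Dict

ALPHA = 0.1  # assumed step size

class Agent:
    """Minimal stand-in for a cluster agent's routing learner (illustrative)."""
    def __init__(self, name: str):
        self.name = name
        # q[task_state][neighbor] estimates the allocation probability via that neighbor.
        self.q: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def report_reward(self, task_state: str) -> float:
        # Placeholder: in the real system this would be the allocation-probability
        # estimate r(sx, .) computed as in the previous sketch.
        return max(self.q[task_state].values(), default=0.0)

def dual_exploration_step(i: Agent, j: Agent, task_state: str) -> None:
    """When i forwards a task to j, both directions learn (sketch)."""
    # Forward exploration: sender i learns about neighbor j from j's reported reward r(sx, j).
    r_j = j.report_reward(task_state)
    i.q[task_state][j.name] += ALPHA * (r_j - i.q[task_state][j.name])
    # Backward exploration: receiver j learns about sender i from i's reported reward r(sx, i).
    r_i = i.report_reward(task_state)
    j.q[task_state][i.name] += ALPHA * (r_i - j.q[task_state][i.name])
```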
Experiments: Compared Approaches
• Distributed approaches
• A centralized approach:
  • uses a best-first algorithm with a global view
  • ignores the communication delay
  • sometimes generates an optimal allocation
Experimental Setup
• Cluster network with heterogeneous clusters and heterogeneous computing nodes (1,024 nodes in total)
• Four types of tasks: ordinary, IO-intensive, compute-intensive, and demanding
• Two task arrival patterns: light load and heavy load
Summary
• This paper presents a multi-agent learning (MAL) approach to resource sharing in cluster networks for building large computing infrastructures.
• Experimental results are encouraging.
• This work suggests that MAL may be a promising approach to online optimization problems in distributed systems.