A Multi-Agent Learning Approach to Online Distributed Resource Allocation
Chongjie Zhang, Victor Lesser, Prashant Shenoy
Computer Science Department, University of Massachusetts Amherst
Focus
• This paper presents a multi-agent learning (MAL) approach to address resource sharing in cluster networks.
  • Exploit unknown task arrival patterns
• Problem characteristics:
  • Realistic
  • Multiple agents
  • Partial observability
  • No global reward signal
  • Communication delay
• Two interacting learning problems
Increasing Computing Demands
• "Software as a service" is becoming a popular IT business model.
• It is challenging to build large computing infrastructures to host such widespread online services.
A Potentially Cost-Effective Solution
• Shared clusters
  • Built using commodity PCs or workstations
  • Run a number of applications significantly larger than the number of nodes
• [Figure: a dedicated cluster vs. a shared cluster, each managed by its own resource manager]
• [Arpaci-Dusseau and Culler, 1997; Aron et al., 2000; Urgaonkar and Shenoy, 2003]
Building Larger, Scalable Computing Infrastructures
• Centralized resource management limits the size of shared clusters.
• Organize shared clusters into a network and share resources across clusters.
• How can resources be shared efficiently within a cluster network?
• [Figure: shared clusters connected into a cluster network]
Outline
• Problem Formulation
• Fair Action Learning Algorithm
• Learning Distributed Resource Allocation
  • Local Allocation Decision
  • Task Routing Decision
• Experimental Results
• Summary
Problem Formulation
• A distributed sequential resource allocation problem (DSRAP) is denoted as a tuple <C, A, T, R, B>:
  • C = {C1, …, Cm} is a set of agents (or clusters)
  • A = {aij}m×m is the adjacency matrix of agents, where aij is the task transfer time from Ci to Cj
  • T = {t1, …, tl} is a set of task types
  • R = {R1, …, Rq} is a set of resource types
  • B = {Dij}l×m is the task arrival pattern, where Dij is the arrival distribution of tasks of type ti at Cj
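To make the tuple concrete, here is a minimal sketch of how a DSRAP instance could be represented; the class and field names (DSRAP, transfer_time, arrival_dists) and the toy values are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DSRAP:
    """A minimal encoding of a DSRAP instance <C, A, T, R, B> (illustrative)."""
    clusters: List[str]                                       # C = {C1, ..., Cm}
    transfer_time: Dict[Tuple[str, str], float]               # A: aij, task transfer time from Ci to Cj
    task_types: List[str]                                     # T = {t1, ..., tl}
    resource_types: List[str]                                 # R = {R1, ..., Rq}
    arrival_dists: Dict[Tuple[str, str], Callable[[], int]]   # B: Dij, sampler for arrivals of type ti at Cj

# Toy instance: two clusters, one task type, one resource type.
problem = DSRAP(
    clusters=["C1", "C2"],
    transfer_time={("C1", "C2"): 1.0, ("C2", "C1"): 1.0},
    task_types=["ordinary"],
    resource_types=["CPU"],
    arrival_dists={("ordinary", "C1"): lambda: 3, ("ordinary", "C2"): lambda: 1},
)
```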
Problem Description: Cluster Network
• [Figure: an example cluster network of clusters C1–C10 connected by links with transfer times aij; each cluster consists of computing nodes, and each node provides resources R1, R2, R3.]
Problem Description: Task
• A task is denoted as a tuple <t, u, w, d1, …, dq>, where
  • t is the task type
  • u is the utility rate of the task
  • w is the maximum waiting time before being allocated
  • di is the demand for resource i = 1, …, q
Problem Description: Task Type
• A task type characterizes a set of tasks, each of whose feature components follows a common distribution.
• A task type t is denoted as a tuple <Dts, Dtu, Dtw, Dtd1, …, Dtdq>, where
  • Dts is the task service time distribution
  • Dtu is the distribution of the utility rate
  • Dtw is the distribution of the maximum waiting time
  • Dtdi is the distribution of the demand for resource i = 1, …, q
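As an illustration of how individual tasks could be drawn from a task type, here is a small sketch; the sampler fields and the example distributions are assumptions for exposition, not the paper's implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TaskType:
    """Samplers standing in for the distributions <Dts, Dtu, Dtw, Dtd1, ..., Dtdq>."""
    service_time: Callable[[], float]
    utility_rate: Callable[[], float]
    max_wait: Callable[[], float]
    demands: List[Callable[[], float]]    # one sampler per resource type

@dataclass
class Task:
    """A concrete task <t, u, w, d1, ..., dq>; service_time is drawn from Dts."""
    type_name: str
    utility_rate: float
    max_wait: float
    demands: List[float]
    service_time: float

def sample_task(name: str, ttype: TaskType) -> Task:
    """Draw one task whose features follow the task type's distributions."""
    return Task(
        type_name=name,
        utility_rate=ttype.utility_rate(),
        max_wait=ttype.max_wait(),
        demands=[d() for d in ttype.demands],
        service_time=ttype.service_time(),
    )

# Example: an "ordinary" task type with exponential service times and uniform demands.
ordinary = TaskType(
    service_time=lambda: random.expovariate(1 / 10.0),
    utility_rate=lambda: random.uniform(0.5, 1.5),
    max_wait=lambda: random.uniform(5.0, 20.0),
    demands=[lambda: random.uniform(0.1, 0.4)],
)
task = sample_task("ordinary", ordinary)
```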
Individual Agent's Decision Making
• [Figure: each agent receives a task set T; local task allocation decision-making selects the tasks to allocate locally, which are handed to local resource scheduling (an existing cluster resource scheduling algorithm); tasks not allocated locally are passed to task routing decision-making.]
Problem Goal
• The main goal is to derive decision policies for each agent that maximize the average utility rate (AUR) of the whole system.
• Note that, due to its partial view of the system, each individual cluster can only observe its local utility rate, not the system's utility rate.
Multi-Agent Reinforcement Learning (MARL)
• In a multi-agent setting, all agents learn their policies concurrently.
  • The environment becomes non-stationary from the perspective of an individual agent.
  • Single-agent reinforcement learning algorithms may diverge due to the lack of synchronization.
• Several MARL algorithms have been proposed:
  • GIGA, GIGA-WoLF, WPL, etc.
Fair Action Learning (FAL) Algorithm
• In practical problems, we usually do not know the exact policy gradient used by GIGA.
• FAL is a direct policy search technique.
• FAL is a variant of GIGA that uses an easily calculable, approximate policy gradient followed by GIGA's normalization function.
• [Equation: the FAL policy update, with the policy gradient term and GIGA's normalization function annotated on the slide.]
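A minimal sketch of what such an update can look like for a single state, assuming the approximate gradient of an action is its advantage over the current policy's expected value; the exact formula and the simplex projection used by GIGA are simplified here and should be checked against the paper.

```python
from typing import Dict

def fal_update(policy: Dict[str, float], q_values: Dict[str, float], step: float) -> Dict[str, float]:
    """One FAL-style policy update for a single state (illustrative, not the paper's exact formula)."""
    # Assumed approximate gradient: Q(s, a) minus the policy's expected value at s.
    expected_value = sum(policy[a] * q_values[a] for a in policy)
    updated = {a: policy[a] + step * (q_values[a] - expected_value) for a in policy}
    return normalize(updated)

def normalize(probs: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for GIGA's normalization function.

    GIGA projects onto the probability simplex; this simpler clip-and-renormalize
    retraction keeps the sketch short.
    """
    clipped = {a: max(p, 0.0) for a, p in probs.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {a: 1.0 / len(probs) for a in probs}   # fall back to a uniform policy
    return {a: p / total for a, p in clipped.items()}
```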
Individual Agent's Decision Making
• [Figure repeated: the agent's decision-making flow, with local task allocation decision-making highlighted.]
Local Task Allocation Decision-Making
• Select a subset of received tasks to be allocated locally so as to maximize the agent's local utility rate.
  • Potentially improves the global utility rate.
• Use an incremental selection algorithm (flowchart on the slide, reconstructed below; a runnable sketch follows this slide):
  • selected := Ø
  • loop:
    • allocable := getAllocable(tasks)
    • t := selectTask(allocable)
    • if t = nil: return selected
    • selected := selected ∪ {t}
    • tasks := tasks \ {t}
  • learn()
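A minimal Python sketch of this incremental selection loop, assuming the Task class from the earlier sketch and a simple stochastic-policy interface (probability, learn); getAllocable and the policy stub are illustrative placeholders rather than the paper's implementation.

```python
import random
from typing import List, Optional

NIL = None  # sentinel meaning "select no further task"

class UniformPolicy:
    """Trivial stand-in for the learned stochastic allocation policy."""
    def probability(self, task) -> float:
        return 1.0
    def learn(self) -> None:
        pass

def get_allocable(tasks: List["Task"], free_resources: List[float]) -> List["Task"]:
    """Tasks whose resource demands still fit within the free local resources."""
    return [t for t in tasks if all(d <= f for d, f in zip(t.demands, free_resources))]

def select_task(allocable: List["Task"], policy) -> Optional["Task"]:
    """Pick one task, or NIL, according to the (stochastic) allocation policy."""
    if not allocable:
        return NIL
    candidates = allocable + [NIL]
    weights = [policy.probability(t) for t in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def allocate_locally(tasks: List["Task"], free_resources: List[float], policy) -> List["Task"]:
    """Incrementally select tasks until the policy chooses the nil task."""
    selected: List["Task"] = []
    while True:
        allocable = get_allocable(tasks, free_resources)
        t = select_task(allocable, policy)
        if t is NIL:
            policy.learn()     # update the allocation policy (FAL + Q-learning in the paper)
            return selected
        selected.append(t)
        tasks.remove(t)
        free_resources = [f - d for f, d in zip(free_resources, t.demands)]
```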
Learning Local Task Allocation
• Learning model:
  • State: features describing both the tasks to be allocated and the availability of local resources
  • Action: selecting a task
  • Reward for selecting task a at state s (equation on the slide)
• Due to partial observability, each agent uses FAL to learn a stochastic policy.
• Q-learning is used to update the value function.
Accelerating the Learning Process
• Reasons:
  • Extremely large policy search space
  • Non-stationary learning environment
  • Avoid poor initial policies in practical systems
• Techniques (a sketch follows below):
  • Initialize policies with a greedy allocation algorithm
  • Set a utilization threshold for conducting ε-greedy exploration
  • Limit the exploration rate for selecting the nil task
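The following sketch shows one way these heuristics might fit together; the threshold value, the ε value, the separate cap on exploring the nil action, and the direction of the utilization test (explore only when lightly loaded) are all illustrative assumptions, not parameters from the paper.

```python
import random

NIL = None                    # the "allocate nothing" action
UTILIZATION_THRESHOLD = 0.8   # assumed: explore only while utilization is below this
EPSILON = 0.1                 # assumed ε-greedy exploration rate
NIL_EXPLORATION_CAP = 0.02    # assumed cap on exploring the nil task

def choose_action(state, candidates, policy, utilization: float):
    """ε-greedy exploration gated by current resource utilization (illustrative)."""
    if not candidates:
        return NIL
    if utilization < UTILIZATION_THRESHOLD and random.random() < EPSILON:
        # Explore, but only rarely pick the nil task so real work is not withheld.
        if random.random() < NIL_EXPLORATION_CAP:
            return NIL
        return random.choice(candidates)
    # Exploit: sample from the learned policy (initialized from a greedy allocator).
    return policy.sample(state, candidates)
```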
Individual Agent's Decision Making
• [Figure repeated: the agent's decision-making flow, now turning to task routing decision-making.]
Task Routing Decision-Making
• To which neighbor should an agent forward an unallocated task so that it reaches an unsaturated cluster before it expires?
• Agents learn to route tasks by interacting with their neighbors.
• The learning objective is to maximize the probability that each task is allocated somewhere in the system.
• [Figure: a task forwarded hop by hop through the cluster network C1–C6.]
Learning Task Routing
• State sx is defined by the characteristics of the current task x that an agent is forwarding.
• Action j corresponds to choosing neighbor j for forwarding a task.
• Reward: the allocation probability of task x once forwarded to neighbor j (equation on the slide), which combines
  • the probability that j allocates x locally, and
  • the allocation probability of x being forwarded onward by j, weighted by j's routing policy.
Learning Task Routing (cont.)
• The local allocation probability (defined by an equation on the slide).
• Qi(sx, j) is the expected probability that task x will be allocated if agent i forwards it to its neighbor j.
• Q-learning is used to update the value function.
• FAL is used to learn the task routing policy.
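A sketch of how the routing reward and value update described above could be computed; the exact way the two probability terms are combined is an assumption based on the slide's annotations, and the names and step size are illustrative.

```python
from typing import Dict

ALPHA = 0.1   # assumed Q-learning step size

def routing_reward(p_local: float, neighbor_policy: Dict[str, float],
                   neighbor_q: Dict[str, float]) -> float:
    """Estimated allocation probability of a task forwarded to neighbor j (illustrative).

    Assumption: the task is allocated either locally at j (probability p_local)
    or, failing that, somewhere downstream after j forwards it according to its
    own routing policy.
    """
    onward = sum(neighbor_policy[k] * neighbor_q[k] for k in neighbor_policy)
    return p_local + (1.0 - p_local) * onward

def q_update(q: Dict[str, float], neighbor: str, reward: float) -> None:
    """One-step Q-learning update of Qi(sx, .) toward the observed reward."""
    q[neighbor] += ALPHA * (reward - q[neighbor])
```

In the actual system, a neighbor's policy and value estimates would reach the forwarding agent through (delayed) communication rather than by direct access.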
Dual Exploration [Kumar and Miikkulainen, 1999]
• [Figure: a task x travels from a source s toward a destination d through agents i and j; forward exploration propagates r(sx, j) to update Qi(sx, j) along the forwarding direction, while backward exploration propagates r(sx, i) to update Qj(sx, i) in the reverse direction.]
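A hedged sketch of dual exploration on a single forwarding step: when agent i hands task x to neighbor j, information flows in both directions so each side refines its estimate about the other. The Agent class, the piggybacked reward values, and the update form are illustrative assumptions, not taken from the paper or from Kumar and Miikkulainen's dual Q-routing.

```python
from collections import defaultdict
from typing import Dict

ALPHA = 0.1  # assumed step size

class Agent:
    """Minimal stand-in for a cluster agent's routing learner (illustrative)."""
    def __init__(self, name: str):
        self.name = name
        # q[task_state][neighbor] estimates the allocation probability via that neighbor.
        self.q: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def report_reward(self, task_state: str) -> float:
        # Placeholder: in the real system this would be the allocation-probability
        # estimate r(sx, .) computed as in the previous sketch.
        return max(self.q[task_state].values(), default=0.0)

def dual_exploration_step(i: Agent, j: Agent, task_state: str) -> None:
    """When i forwards a task to j, both directions learn (sketch)."""
    # Forward exploration: sender i learns about neighbor j from j's reported reward r(sx, j).
    r_j = j.report_reward(task_state)
    i.q[task_state][j.name] += ALPHA * (r_j - i.q[task_state][j.name])
    # Backward exploration: receiver j learns about sender i from i's reported reward r(sx, i).
    r_i = i.report_reward(task_state)
    j.q[task_state][i.name] += ALPHA * (r_i - j.q[task_state][i.name])
```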
Experiments: Compared Approaches
• Distributed approaches
• A centralized approach:
  • uses a best-first algorithm with a global view
  • ignores the communication delay
  • sometimes generates an optimal allocation
Experimental Setup
• Cluster network with heterogeneous clusters and heterogeneous computing nodes (1,024 nodes in total)
• Four types of tasks: ordinary, IO-intensive, compute-intensive, and demanding
• Two task arrival patterns: light load and heavy load
Summary
• This paper presents a multi-agent learning (MAL) approach to resource sharing in cluster networks for building large computing infrastructures.
• Experimental results are encouraging.
• This work suggests that MAL may be a promising approach to online optimization problems in distributed systems.