This paper explores the impact of increased teamwork on multi-agent system outcomes, focusing on protocols and algorithms for distributed constraint optimization problems. The research delves into the complexities of teamwork in various domains, including meeting scheduling, traffic light coordination, and robotics. The study introduces the concept of "k-Optimality" and investigates the Distributed Coordination of Exploration and Exploitation (DCEE) algorithm, emphasizing the balance between exploration and exploitation for maximizing rewards in uncertain environments. Furthermore, the analysis of configuration hypercubes and the assessment of L-Movement provide insights into team coordination strategies. Overall, the study sheds light on the challenges and potential solutions for optimizing teamwork in multi-agent systems.
Lafayette College. Towards a Theoretic Understanding of DCEE. Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe. http://teamcore.usc.edu
Forward Pointer: When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty. Matthew E. Taylor, Manish Jain, Yanqin Jin, Makoto Yokoo, & Milind Tambe. Wednesday, 8:30 – 10:30, Coordination and Cooperation 1
Teamwork: Foundational MAS Concept • Joint actions improve the outcome • But they increase communication & computation • Over two decades of work • This paper: increased teamwork can harm the team • Even without considering communication & computation • Only considering team reward • Multiple algorithms, multiple settings • But why?
DCOPs: Distributed Constraint Optimization Problems • Multiple domains • Meeting scheduling • Traffic light coordination • RoboCup soccer • Multi-agent plan coordination • Sensor networks • Distributed • Robust to failure • Scalable • (In)Complete • Quality bounds
DCOP Framework [figure: constraint graph over agents a1, a2, a3] • Different “levels” of teamwork possible • Complete solution is NP-hard
DCEE: Distributed Coordination of Exploration and Exploitation • Environment may be unknown • Maximize on-line reward over some number of rounds • Exploration vs. exploitation • Demonstrated on a mobile ad-hoc network • Simulation [Released] & Robots [Released Soon]
DCOP: Distributed Constraint Optimization Problem
DCOP → DCEE Distributed Coordination of Exploration and Exploitation
DCEE Algorithm: SE-Optimistic (will build upon later) • Rewards on [1,200] • “If I move, I’d get R = 200” [figure: chain graph a1–a2–a3–a4 with current link rewards 50, 75, 99]
DCEE Algorithm: SE-Optimistic (will build upon later) • Rewards on [1,200] [figure: chain graph a1–a2–a3–a4 with current link rewards 50, 75, 99; per-agent estimates “If I move, I’d gain 275 / 251 / 101 / 125”] • Explore or exploit?
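The optimistic estimate can be sketched in a few lines (an assumption about the rule, not the authors' released code): each agent imagines every incident link jumping to the maximum possible reward of 200, so its estimated gain is its degree times 200 minus its current link rewards. With the chain rewards 50, 75, 99 this simple model gives gains of 150, 275, 226, and 101 for a1 through a4; the exact per-agent numbers on the slide depend on details of the original diagram that did not survive extraction.

```python
# Sketch of SE-Optimistic gain estimation on a chain a1-a2-a3-a4.
MAX_REWARD = 200

def se_optimistic_gain(agent, edges):
    """Estimated gain if `agent` moves: every incident link is optimistically
    assumed to jump to MAX_REWARD."""
    incident = [r for (u, v), r in edges.items() if agent in (u, v)]
    return len(incident) * MAX_REWARD - sum(incident)

# Chain graph with the slide's current link rewards.
edges = {("a1", "a2"): 50, ("a2", "a3"): 75, ("a3", "a4"): 99}

for a in ["a1", "a2", "a3", "a4"]:
    print(a, se_optimistic_gain(a, edges))  # 150, 275, 226, 101
```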
Success! [ATSN-09][IJCAI-09] • Both classes of (incomplete) algorithms • Simulation and on Robots • Ad hoc Wireless Network (Improvement if performance > 0)
k-Optimality • Increased coordination – originally DCOP formulation • In DCOP, increased k = increased team reward • Find groups of agents to change variables • Joint actions • Neighbors of moving group cannot move • Defines amount of teamwork (Higher communication & computation overheads)
“k-Optimality” in DCEE • k=1, 2, ... • Groups of size k form, those with the most to gain move (change the value of their variable) • A group can only move if no other agents in its neighborhood move
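One round of this group selection can be sketched as follows (a hypothetical greedy implementation under stated assumptions, not the paper's algorithm): candidate groups of size k are ranked by estimated joint gain, and a group is granted its move only if neither it nor its graph neighborhood overlaps an already-granted group.

```python
from itertools import combinations

def select_moving_groups(agents, neighbors, gains, k):
    """Greedily pick non-interfering groups of size k, highest gain first.

    `neighbors` maps each agent to its set of graph neighbors; `gains` maps
    each k-tuple of agents to its estimated joint gain.
    """
    groups = sorted(combinations(agents, k), key=lambda g: gains[g], reverse=True)
    blocked, chosen = set(), []
    for g in groups:
        # A group's "region" is its members plus their neighbors; it may move
        # only if that region is untouched by earlier winners.
        region = set(g) | {n for a in g for n in neighbors[a]}
        if region.isdisjoint(blocked):
            chosen.append(g)
            blocked |= region
    return chosen
```

On a chain a1–a2–a3–a4 with k = 2, granting (a1, a2) blocks a3 as well, so (a3, a4) cannot move in the same round.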
Example: SE-Optimistic-2 • Rewards on [1,200] [figure: chain graph a1–a2–a3–a4 with current link rewards 50, 75, 99; joint-gain calculations for candidate pairs, e.g. 275 + 250 − 150, 200 − 99, 251 + 275 − 150, 101 + 251 − 101, 125 + 275 − 125]
Sample coordination results • Omniscient: confirms DCOP result, as expected [figures: artificially supplied rewards (DCOP), complete graph vs. chain graph]
Physical Implementation • Create Robots • Mobile ad-hoc Wireless Network
Confirms Team Uncertainty Penalty • Averaged over 10 trials each • Trend confirmed! • (Huge standard error) [figure: total gain on chain and complete graphs]
Problem with “k-Optimal” • Unknown rewards • An agent cannot know whether moving will increase reward! • Define new term: L-Movement • The number of agents that can change variables per round • Independent of exploration algorithm • Graph dependent • An alternate measure of teamwork
L-Movement • Example: k = 1 algorithms • L is the size of the largest maximal independent set of the graph • NP-hard to calculate for a general graph • Harder for higher k • Consider ring & complete graphs, both with 5 vertices • Ring graph: maximal independent set is 2 • Complete graph: maximal independent set is 1 • For k = 1 • L = 1 for a complete graph • L = ⌊n/2⌋ for a ring graph with n vertices • General DCOP analysis tool?
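For the small graphs on this slide, L for k = 1 algorithms can be brute-forced directly (a sketch; the exponential search is only viable for tiny graphs, which is exactly the slide's point that the general problem is NP-hard):

```python
from itertools import combinations

def max_independent_set_size(vertices, edges):
    """Size of the largest independent set, by exhaustive search."""
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            # Independent: no edge has both endpoints inside the subset.
            if not any(u in subset and v in subset for u, v in edges):
                return size
    return 0

ring5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
complete5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]

print(max_independent_set_size(range(5), ring5))      # 2 = floor(5/2)
print(max_independent_set_size(range(5), complete5))  # 1
```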
Configuration Hypercube • No (partial) assignment is believed to be better than another • WLOG, agents can select their next value when exploring • Define configuration hypercube C • Each agent is a dimension • C[v1, …, vn] is the total reward when each agent i takes value vi • C cannot be calculated without exploration; values are drawn from a known reward distribution • Moving along an axis in the hypercube → an agent changing its value • Example: 3 agents (C is 3-dimensional) • Changing from C[a, b, c] to C[a, b, c′] → agent A3 changes from c to c′
How many agents can move? (1/2) • In a ring graph with 5 nodes • k = 1 : L = 2 • k = 2 : L = 3 • In a complete graph with 5 nodes • k = 1 : L = 1 • k = 2 : L = 2
How many agents can move? (2/2) • A configuration C[v1, …, vn] is reachable by an algorithm with movement L in s steps if and only if Σi vi ≤ s·L and maxi vi ≤ s • Example: C[2,2] is reachable for L = 1 iff s ≥ 4
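A plausible reading of this reachability condition (an assumption: a configuration whose coordinates count each agent's value changes is reachable iff the total number of changes fits in s rounds of L movers, and no single agent changes more than s times) can be checked directly:

```python
def reachable(config, L, s):
    """Is configuration C[v1, ..., vn] reachable in s steps with movement L?

    Assumes coordinate v_i counts how many times agent i has changed its
    value, starting from C[0, ..., 0].
    """
    return sum(config) <= s * L and max(config) <= s

print(reachable((2, 2), L=1, s=4))  # True: 4 total changes fit in 4 rounds
print(reachable((2, 2), L=1, s=3))  # False: only 3 single-agent moves fit
```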
L-Movement Experiments • For various DCEE problems, distributions, and L: • For steps s = 1…30: • Construct hypercube with s values per dimension • Find M, the max achievable reward in s steps, given L • Return average of 50 runs • Example: 2D hypercube [figure: two s × s grids] • Only half reachable if L = 1 • All locations reachable if L = 2
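One run of this experiment can be sketched as follows (an assumed reconstruction, not the released code; it takes the reachability test "sum of coordinates ≤ s·L and max coordinate ≤ s" as given, and draws i.i.d. uniform cell rewards):

```python
import itertools
import random

def random_cube(n_agents, s, rng):
    """Hypercube with s values per dimension; i.i.d. uniform cell rewards."""
    return {v: rng.random() for v in itertools.product(range(s), repeat=n_agents)}

def max_reachable_reward(cube, s, L):
    """Best reward among configurations reachable in s steps with movement L."""
    return max(r for v, r in cube.items() if sum(v) <= s * L and max(v) <= s)

rng = random.Random(0)
cube = random_cube(2, 5, rng)
# Larger L can only enlarge the reachable set, so its best reward is >=.
print(max_reachable_reward(cube, 5, 1) <= max_reachable_reward(cube, 5, 2))  # True
```

Averaging `max_reachable_reward` over many random cubes (50 in the slide) gives the curves compared on the next slides.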
Restricting to L-Movement: Complete Graph (L = 1 → 2) • k = 1 : L = 1 • k = 2 : L = 2 [figure: average maximum reward discovered]
Restricting to L-Movement: Ring Graph (L = 2 → 3) • k = 1 : L = 2 • k = 2 : L = 3 [figure: average maximum reward discovered]
Ring vs. Complete • Uniform distribution of rewards • 4 agents • Also with a different normal distribution [figures: results for ring and complete graphs]
k and L: 5-agent graphs • Increasing k changes L less in a ring than in a complete graph • The configuration hypercube is an upper bound • Posit a consistent negative effect • Suggests why increasing k has different effects: • Larger improvement in complete than in ring graphs when increasing k
L-Movement May Help Explain the Team Uncertainty Penalty • An algorithm with L = 2 can explore more of C than one with L = 1 • Independent of exploration algorithm! • Determined by k and graph structure • C is an upper bound – posit a constant negative effect • Any algorithm experiences diminishing returns as k increases • Consistent with DCOP results • L-Movement differs between k = 1 and k = 2 algorithms • Larger difference in graphs with more agents • For k = 1, L = 1 for a complete graph • For k = 1, L increases with the number of vertices in a ring graph
Thank you. Towards a Theoretic Understanding of DCEE. Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe. http://teamcore.usc.edu