SBIA 2004
Building Practical Agent Teams: A Hybrid Perspective
Milind Tambe tambe@usc.edu
Computer Science Dept, University of Southern California
Joint work with the TEAMCORE GROUP http://teamcore.usc.edu
Long-Term Research Goal
• Building large-scale heterogeneous teams
• Types of entities: agents, people, sensors, resources, robots, …
• Scale: 1000s or more
• Domains: highly uncertain, real-time, dynamic
• Activities: form teams, persist for long durations, coordinate, adapt, …
• Some applications: agent-facilitated human organizations, large-scale disaster rescue, large-area security
Domains and Motivations
[Chart: task & domain complexity (low / medium / high) vs. team scale & complexity (small-scale homogeneous, small-scale heterogeneous, large-scale heterogeneous)]
Motivation: BDI+POMDP Hybrids
[Diagram: team-oriented plan "Execute Rescue" carried out by a RAP team with a Teamcore proxy; sub-plans Extinguish Fires [fire company], Rescue Civilians [ambulance team], and Clear Roads, each with a team proxy]
• BDI approach
• Frameworks: Teamcore/Machinetta, GPGP, …
• +ve: Ease of use for human developers; coordinate large-scale teams
• -ve: Quantitative team evaluations difficult (given uncertainty/cost)
• Distributed POMDP approach: compute an optimal policy using distributed partially observable Markov decision processes (POMDPs)
• Frameworks: MTDP, DEC-MDP/DEC-POMDP, POIPSG, …
• +ve: Quantitative evaluation of team performance easy (with uncertainty)
• -ve: Scale-up difficult; difficult for human developers to program policies
BDI + POMDP Synergy
[Diagram: the same rescue team-oriented plan and proxies, with distributed POMDPs used for TOP & proxy analysis and refinement]
• Combine "traditional" TOP approaches with distributed POMDPs
• POMDPs improve TOP/proxies: e.g., improve role allocation
• TOP constrains POMDP policy search: orders of magnitude speedup
Overall Research Framework
• Distributed POMDP analysis: Multiagent Team Decision Problem (MTDP) (Nair et al. 03b, Nair et al. 04, Paruchuri et al. 04)
• Role allocation algorithms
• Communication algorithms
• Teamwork proxy infrastructure
Electric Elves: 24/7 from 6/00 to 12/00 (Chalupsky et al., IAAI'2001)
[Diagram: Teamcore proxies connecting users to an Interest Matcher, a Scheduler agent, and a Meet Maker agent; papers routed to the Interest Matcher]
• Reschedule meetings
• Decide presenters
• Order our meals
"More & more computers are ordering food, … we need to think about marketing [to these computers]" – local Subway owner
Modules within the Proxies: AA (Scerri, Pynadath and Tambe, JAIR'2002)
[Diagram: team-oriented program (a meeting plan; role: user arrives on time) served by a Teamcore proxy; proxy algorithms: communication, adjustable autonomy, role allocation]
• Adjustable autonomy: MDPs for transfer-of-control policies
• MDP policies: planned sequence of transfers of control and coordination changes (e.g., reschedule meetings)
• E.g., ADAH: ask, delay, ask, cancel
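To make the transfer-of-control idea concrete, here is a minimal sketch of backward induction (finite-horizon value iteration) for a toy transfer-of-control MDP; the states, probabilities, and rewards are illustrative assumptions, not the E-Elves values from the JAIR'02 paper.

```python
# Minimal sketch: finite-horizon value iteration for a toy transfer-of-control MDP.
# All numbers below are illustrative assumptions, not values from the JAIR'02 paper.

HORIZON = 4                        # decision points before the meeting deadline
ACTIONS = ["ask", "delay", "act"]  # ask the user, delay the meeting, act autonomously

P_USER_REPLIES = 0.4    # assumed chance the user answers when asked
R_USER_DECIDES = 10.0   # best outcome: the user makes the call
R_AGENT_DECIDES = 6.0   # autonomous decision: useful but riskier
R_DELAY = -1.0          # delaying the meeting has a small cost
R_MISS = -10.0          # deadline passes with no decision at all

def q_value(t, action, V):
    """Expected value of taking `action` at step t while control is unresolved."""
    if action == "ask":
        # With some probability the user replies (terminal); otherwise keep waiting.
        return P_USER_REPLIES * R_USER_DECIDES + (1 - P_USER_REPLIES) * V[t + 1]
    if action == "delay":
        return R_DELAY + V[t + 1]
    if action == "act":
        return R_AGENT_DECIDES      # terminal: the agent decides autonomously now
    raise ValueError(action)

# Backward induction: V[t] = value of still being unresolved at step t.
V = [0.0] * (HORIZON + 1)
V[HORIZON] = R_MISS
policy = []
for t in reversed(range(HORIZON)):
    q = {a: q_value(t, a, V) for a in ACTIONS}
    best = max(q, key=q.get)
    policy.insert(0, best)
    V[t] = q[best]

# Prints a planned sequence of transfers of control, e.g. "ask -> ask -> ask -> act".
print("transfer-of-control policy:", " -> ".join(policy), "| expected value:", round(V[0], 2))
```

With these particular numbers, the computed policy keeps asking the user while there is still time and falls back to an autonomous decision near the deadline, the same shape as the planned transfer-of-control sequences above.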
Motivation: Communication in Proxies
Proxy's heuristic "BDI" communication rules, for example (sketched in code below):
RULE1 ("joint intentions" [Levesque et al. 90]):
If (fact F ∈ agent's private state) AND (F matches a goal of the team's plan) AND (F ∉ team state)
Then possible communicative goal CG to communicate F
RULE2:
If (possible communicative goal CG) AND (miscoordination-cost > communication-cost)
Then communicate CG
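A minimal sketch of these two heuristic rules in code, assuming simple sets of facts and scalar costs; this is not the actual Teamcore proxy implementation, and all names are illustrative.

```python
# Minimal sketch of the two heuristic "BDI" communication rules above.
# Data structures (sets of facts, scalar costs) are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class TeamPlanState:
    team_goal_facts: set = field(default_factory=set)  # facts relevant to the team plan's goal
    mutually_known: set = field(default_factory=set)   # facts already in the team state

def rule1_candidate_goals(private_facts, team):
    """RULE1 (joint intentions): privately known, goal-relevant, not yet mutually known."""
    return {f for f in private_facts
            if f in team.team_goal_facts and f not in team.mutually_known}

def rule2_should_communicate(fact, miscoordination_cost, communication_cost):
    """RULE2: communicate only when miscoordination would cost more than communicating."""
    return miscoordination_cost[fact] > communication_cost

# Usage example with hypothetical facts and costs:
team = TeamPlanState(team_goal_facts={"enemy_seen", "landmark_reached"},
                     mutually_known={"landmark_reached"})
private = {"enemy_seen", "low_fuel"}
costs = {"enemy_seen": 8.0}
for cg in rule1_candidate_goals(private, team):
    if rule2_should_communicate(cg, costs, communication_cost=1.0):
        print("communicate:", cg)
```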
Motivation: Earlier BDI Evaluation
• Testing teamwork in RoboCup (Tambe et al., IJCAI'99)
• Testing communication selectivity in the helicopter domain (Pynadath & Tambe, JAAMAS'03)
• Quantitative analysis of optimality, or of the complexity of an optimal response, is difficult
• A challenge in domains with significant uncertainty and costs
Distributed POMDPs Si STATE COM-MTDP (Pynadath and Tambe, 02) RMTDP (Nair, Tambe, Marsella 03) • S: states of the world (e.g., helicopter position, enemy position) • Ai: Actions (Communicate action, domain action ) • P: State transition probabilities • R: Reward; sub-divided based on action types
COM-MTDP: Analysis of Communication
• Ωi: observations (e.g., E = enemy-on-radar, NE = enemy-not-on-radar; world states include Landmark1, Landmark2, …)
• O: probability of an observation given the destination state & past action (a table per state and previous action)
• B: belief state (each Bi is a history of observations and messages)
• Individual policies: πiA : Bi → Ai (domain action) and πiΣ : Bi → Σi (communication)
• Goal: find joint policies πA and πΣ that maximize total expected reward
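Gathering the elements above, the COM-MTDP model and its objective can be written roughly as follows. This is a sketch in standard distributed-POMDP notation; Σi denotes agent i's communication (message) alternatives, and the symbols may differ slightly from the original papers.

```latex
% COM-MTDP tuple (assumed notation following the slides above)
\[
\langle S,\ \{A_i\},\ \{\Sigma_i\},\ P,\ \{\Omega_i\},\ O,\ R \rangle,
\qquad P(s' \mid s, a), \qquad O(\omega \mid s', a)
\]
% Individual domain-action and communication policies over belief states
\[
\pi_i^{A} : B_i \rightarrow A_i, \qquad \pi_i^{\Sigma} : B_i \rightarrow \Sigma_i
\]
% Objective: joint policies maximizing total expected reward over horizon T
\[
\max_{\pi^{A},\, \pi^{\Sigma}} \;
\mathbb{E}\!\left[\ \sum_{t=0}^{T} R\!\left(s^{t}, a^{t}, \sigma^{t}\right) \right]
\]
```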
Complexity Results in COM-MTDP
The complexity of computing globally optimal joint policies motivates two approaches:
• Locally optimal solution (no global team optimality)
• Hybrid approach: POMDP + BDI
Approach I: Locally Optimal Policy (Nair et al. 03)
• Repeat until convergence to a local equilibrium (sketched below), for each agent K:
• Fix the policies of all agents except agent K
• Find the optimal response policy for agent K
Finding the optimal response policy for agent K, given fixed policies for the others:
• The problem becomes finding an optimal policy for a single-agent POMDP
• "Extended" state defined over the world state plus the other agents' observation histories, not just the world state
• Define a new transition function
• Define a new observation function
• Define a multiagent belief state
• Dynamic programming over belief states
• Significant speedup over exhaustive search, but problem size is limited
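A minimal sketch of the alternating best-response loop described above (the outer loop only; the per-agent best response would be computed by solving the induced single-agent POMDP, abstracted into a callback here). All names and the toy payoffs are illustrative assumptions.

```python
# Sketch of the alternating best-response loop: fix all agents' policies except
# one, compute that agent's best response, repeat until no agent can improve.
def alternating_best_response(agents, initial_policies, best_response, evaluate,
                              max_iters=100, eps=1e-9):
    """
    agents:            list of agent identifiers
    initial_policies:  dict agent -> policy
    best_response(k, policies): best policy for agent k given the others fixed
                                (e.g., by solving the induced single-agent POMDP)
    evaluate(policies): joint expected reward of a full policy profile
    """
    policies = dict(initial_policies)
    value = evaluate(policies)
    for _ in range(max_iters):
        improved = False
        for k in agents:
            candidate = dict(policies)
            candidate[k] = best_response(k, policies)
            candidate_value = evaluate(candidate)
            if candidate_value > value + eps:
                policies, value = candidate, candidate_value
                improved = True
        if not improved:   # local equilibrium: no single agent can do better alone
            break
    return policies, value

# Toy usage: two agents, two candidate policies each, payoff table as `evaluate`.
payoff = {("a", "a"): 1.0, ("a", "b"): 0.0, ("b", "a"): 0.0, ("b", "b"): 3.0}
agents = [0, 1]
def evaluate(p):
    return payoff[(p[0], p[1])]
def best_response(k, p):
    return max("ab", key=lambda cand: evaluate({**p, k: cand}))

# Starting from (a, a), the loop stops at (a, a) with value 1.0 even though
# (b, b) is jointly better -- illustrating "no global team optimality".
print(alternating_best_response(agents, {0: "a", 1: "a"}, best_response, evaluate))
```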
II: Hybrid BDI + POMDP
• COM-MTDP: evaluate alternate communication policies, building the distributed POMDP model from the team-oriented program and the domain (exploit the TOP)
• πA: fixed domain-action policy; π1, π2, π3: varied communication policies; πOptimal: derived locally or globally optimal communication policy
• Feedback for modifying the proxy communication algorithms (communication, adjustable autonomy, role allocation)
Compare Communication Policies over Different Domains
• Given a domain, for different observability conditions & communication costs:
• Evaluate Teamcore (RULE1 + RULE2), Jennings, and others; compare with optimal
• Optimal: O((|S|·|Ω|)^T)
Role Allocation: Illustration
• Task: move cargo from X to Y; large reward for cargo at the destination
• Three routes with varying length and failure rates
• Scouts make a route safe for transports
• Uncertainty: in actions and observations
• Scouts may fail along a route (and transports may replace scouts)
• Scouts' failure rate decreases if more scouts go to a route
• Scouts' failure may not be observable to transports
Team-Oriented Program
[Diagram: organization hierarchy and plan hierarchy]
• Best initial role allocation: how many helicopters in SctTeam A, B, C & Transport
• TOP: almost the entire RMTDP policy is completely fixed
• Policy gap only on step 1: best role allocation in the initial state for each agent
• Assume six helicopter agents: 84 combinations (84 RMTDP policies), as checked in the sketch below
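A quick check of the "84 combinations" figure: splitting six interchangeable helicopters across the three scouting teams and the transport team is a stars-and-bars count, C(6+4-1, 4-1) = C(9, 3) = 84. A small sketch (role names taken from the slide):

```python
# Ways to split six identical helicopters across SctTeam A, SctTeam B,
# SctTeam C, and Transport -- the 84 candidate initial role allocations.
from itertools import product
from math import comb

HELOS, ROLES = 6, 4
allocations = [alloc for alloc in product(range(HELOS + 1), repeat=ROLES)
               if sum(alloc) == HELOS]
print(len(allocations))                    # 84 candidate role allocations
print(comb(HELOS + ROLES - 1, ROLES - 1))  # stars-and-bars: C(9, 3) = 84
```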
Analyzing Role Allocation in Teamwork
• R-MTDP: evaluate alternate role-taking policies, building the distributed POMDP model from the team-oriented program and the domain
• Search the policy space for the optimal role-taking policy πOptRole-taking; the role-execution policy is fixed by the TOP
• Fill in the gaps in the policies (only the initial role allocation is open)
• Feedback for a specific role allocation in the TOP (proxy algorithms: role allocation, adjustable autonomy, communication)
RMTDP Policy Search: Efficiency Improvements
• Belief-based policy evaluation
• Not entire observation histories; only the beliefs required by the TOP
• E.g., the histories T=1:<Scout1 okay, Scout2 fail>; T=2:<Scout1 fail, Scout2 fail> and T=1:<Scout1 okay, Scout2 okay>; T=2:<Scout1 fail, Scout2 fail> both map to the TOP belief T=2:<CriticalFailure>
• Form hierarchical policy groups for branch-&-bound search (see the sketch after this list)
• Obtain an upper bound on the values of the policies within a policy group
• If an individual policy is higher valued than a group, prune the group
• Exploit the TOP for generating policy groups, and for upper bounds
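A minimal sketch of the branch-and-bound search over policy groups described above. The callbacks (upper bound, expansion, exact evaluation) stand in for the TOP-derived bounds and RMTDP evaluation; all names and the toy data are illustrative assumptions.

```python
# Branch-and-bound over hierarchical policy groups: prune any group whose
# upper bound is no better than the best fully evaluated policy so far.
def branch_and_bound(policy_groups, upper_bound, expand, evaluate):
    """
    policy_groups: iterable of coarse policy groups (e.g., grouped by role allocation)
    upper_bound(group):  optimistic value for every policy in the group
    expand(group):       concrete candidate policies inside the group
    evaluate(policy):    exact expected reward (e.g., RMTDP-style evaluation)
    """
    best_policy, best_value = None, float("-inf")
    # Visit the most promising groups first so good incumbents prune more.
    for group in sorted(policy_groups, key=upper_bound, reverse=True):
        if upper_bound(group) <= best_value:
            continue                        # whole group pruned
        for policy in expand(group):
            value = evaluate(policy)
            if value > best_value:
                best_policy, best_value = policy, value
    return best_policy, best_value

# Toy usage: groups are value ranges; "policies" are just numbers.
groups = [(0, 5), (3, 9), (1, 2)]
print(branch_and_bound(groups,
                       upper_bound=lambda g: g[1],
                       expand=lambda g: range(g[0], g[1] + 1),
                       evaluate=lambda p: p))
```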
MaxExp: Hierarchical Policy Groups
[Diagram: tree of role-allocation policy groups over the six helicopters, annotated with group values (e.g., 1926, 2773, 4167, 3420, 2926)]
MaxExp: Upper Bound on a Policy Group's Value
[Diagram: plan decomposed into DoScouting [Scout 2; Transport 4], DoTransport [transports from previous], RemainScouts [scouts from previous], with candidate allocations (e.g., SafeRoute=1, Transport=3; SafeRoute=2, Transport=4) and group sizes [84], [3300], [36]]
• Obtain the max for each component over all start states & observation histories (sketched below)
• If each component were independent: could evaluate each separately
• Dependence: the start of the next component is based on the end state of the previous one
• Why the speedup:
• No duplicate start states: multiple paths of the previous component merge
• No duplicate observation histories
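A sketch of a MaxExp-style bound under the stated simplification: sum, over plan components, the best value any policy in the group can achieve in that component over all of its possible start states, ignoring which end state the previous component actually produced (which is what makes it an over-estimate and hence an upper bound). All names and numbers are illustrative assumptions.

```python
# Assumed simplification of the MaxExp-style upper bound for a policy group.
def maxexp_upper_bound(components, group_policies, component_value, start_states):
    """
    components:      ordered plan components, e.g., ["DoScouting", "DoTransport", "RemainScouts"]
    group_policies:  candidate policies in the group
    component_value(policy, component, start): exact value of running `policy`
                     on `component` starting from `start`
    start_states(component): possible start states of the component (independent of
                     the previous component's end state -- the source of over-estimation)
    """
    bound = 0.0
    for comp in components:
        bound += max(component_value(p, comp, s)
                     for p in group_policies
                     for s in start_states(comp))
    return bound

# Toy usage with a hypothetical two-component plan:
comps = ["DoScouting", "DoTransport"]
vals = {("p1", "DoScouting"): 40, ("p2", "DoScouting"): 55,
        ("p1", "DoTransport"): 300, ("p2", "DoTransport"): 250}
print(maxexp_upper_bound(comps, ["p1", "p2"],
                         component_value=lambda p, c, s: vals[(p, c)],
                         start_states=lambda c: ["s0"]))   # 55 + 300 = 355
```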
Helicopter Domain: Computational Savings
• NOPRUNE-OBS: no pruning, maintain full observation histories
• NOPRUNE: no pruning, maintain beliefs rather than observation histories
• MAXEXP: pruning using the MAXEXP heuristic, using beliefs
• NOFAIL: MAXEXP enhanced with a "no failure" assumption for a quicker upper bound
Summary
[Diagram: team-oriented program (team plans, organizations, agents) with team proxies]
• COM-MTDP & R-MTDP: distributed POMDPs for analysis
• Combine "traditional" TOP approaches with distributed POMDPs
• Exploit POMDPs to improve TOP/Teamcore proxies
• Exploit TOP to constrain POMDP policy search
• Key policy-evaluation complexity results
Future Work
[Diagram: trainee interacting with agent-based simulation technology and visualization]
Thank You Contact: • Milind Tambe • tambe@usc.edu • http://teamcore.usc.edu/tambe • http://teamcore.usc.edu
Key Papers Cited in this Presentation
• Rajiv T. Maheswaran, Jonathan P. Pearce, and Milind Tambe. Distributed Algorithms for DCOP: A Graphical Game-Based Approach. Proceedings of the 17th International Conference on Parallel and Distributed Computing Systems (PDCS), 2004.
• Praveen Paruchuri, Milind Tambe, Fernando Ordonez, and Sarit Kraus. Towards a Formalization of Teamwork with Resource Constraints. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004.
• Ranjit Nair, Maayan Roth, Makoto Yokoo, and Milind Tambe. Communication for Improving Policy Computation in Distributed POMDPs. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004.
• Rajiv T. Maheswaran, Milind Tambe, Emma Bowring, Jonathan P. Pearce, and Pradeep Varakantham. Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Event Scheduling. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004.
• P. J. Modi, W. Shen, M. Tambe, and M. Yokoo. Solving Distributed Constraint Optimization Problems Optimally, Efficiently and Asynchronously. Artificial Intelligence Journal (accepted).
• D. V. Pynadath and M. Tambe. Automated Teamwork Among Heterogeneous Software Agents and Humans. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 7:71-100, 2003.
• R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella. Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2003.
• R. Nair, M. Tambe, and S. Marsella. Role Allocation and Reallocation in Multiagent Teams: Towards a Practical Analysis. Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2003.
• P. Scerri, L. Johnson, D. Pynadath, P. Rosenbloom, M. Si, N. Schurr, and M. Tambe. A Prototype Infrastructure for Distributed Robot, Agent, Person Teams. Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2003.
• P. Scerri, D. Pynadath, and M. Tambe. Towards Adjustable Autonomy for the Real World. Journal of AI Research (JAIR), 17:171-228, 2002.
• D. Pynadath and M. Tambe. The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models. Journal of AI Research (JAIR), 2002.
• G. Kaminka, D. Pynadath, and M. Tambe. Monitoring Teams by Overhearing: A Multiagent Plan-Recognition Approach. Journal of AI Research (JAIR), 2002.