Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Rosemary Emery-Montemerlo
Joint work with Geoff Gordon, Jeff Schneider and Sebastian Thrun
July 21, 2004, AAMAS 2004

Robot Teams

Robot Teams
• With limited communication, existing paradigms for decentralized robot control are not sufficient
• Game-theoretic methods are necessary for multi-robot coordination under these conditions

Decentralized Decision Making

Decentralized Decision Making
• A robot cannot choose actions based only on joint observations consistent with its own sensor readings
• It must consider all joint observations that are consistent with its possible sensor readings

Relationship Between Decision Theoretic Models
[Figure: diagram relating MDP, POMDP and an unnamed third model ("?"); labels: State Space, Belief Space, Distribution over Belief Space]

Models of Multi-Agent Systems
• Partially observable stochastic games
  • Generalization of stochastic games to partially observable worlds
• Related models
  • DEC-POMDP [Bernstein et al., 2000]
  • MTDP [Pynadath and Tambe, 2002]
  • I-POMDP [Gmytrasiewicz and Doshi, 2004]
  • POIPSG [Peshkin et al., 2000]

Partially Observable Stochastic Games
• POSG = {I, S, A, Z, T, R, O}
• I is the set of agents, I = {1,…,n}
• S is the set of states
• A is the set of actions, A = A_1 × … × A_n
• Z is the set of observations, Z = Z_1 × … × Z_n
• T is the transition function, T: S × A → S
• R is the reward function, R: S × A → ℝ
• O are the observation emission probabilities, O: S × Z × A → [0,1]
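The tuple above can be held in a small container. This is only a sketch: the class name `POSG`, its field names and the toy two-robot instance are illustrative assumptions, and T is stored here as a probability T(s, a, s') rather than as the map S × A → S written on the slide.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Sequence, Tuple

JointAction = Tuple[str, ...]
JointObs = Tuple[str, ...]


@dataclass(frozen=True)
class POSG:
    """Hypothetical container for {I, S, A, Z, T, R, O} with a common payoff."""
    agents: Sequence[int]                                     # I
    states: Sequence[str]                                     # S
    actions: Sequence[Sequence[str]]                          # A_i for each agent
    observations: Sequence[Sequence[str]]                     # Z_i for each agent
    transition: Callable[[str, JointAction, str], float]      # T(s, a, s')
    reward: Callable[[str, JointAction], float]               # R(s, a)
    emission: Callable[[str, JointObs, JointAction], float]   # O(s, z, a)

    def joint_actions(self):
        # A = A_1 x ... x A_n
        return list(product(*self.actions))

    def joint_observations(self):
        # Z = Z_1 x ... x Z_n
        return list(product(*self.observations))


# Toy instance with made-up dynamics: two robots, two states.
toy = POSG(
    agents=[1, 2],
    states=["free", "tagged"],
    actions=[["go", "wait"], ["go", "wait"]],
    observations=[["none", "sees"], ["none", "sees"]],
    transition=lambda s, a, s2: 1.0 if s2 == "tagged" else 0.0,
    reward=lambda s, a: 0.0 if s == "tagged" else -1.0,
    emission=lambda s, z, a: 0.25,
)
n_joint_actions = len(toy.joint_actions())
n_joint_observations = len(toy.joint_observations())
```

The joint action and observation sets grow as the product of the individual sets, which is one reason the full game becomes expensive as agents are added.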

Solving POSGs
• POSGs are computationally infeasible to solve

Solving POSGs
• We can approximate a POSG as a series of smaller Bayesian games
[Figure: the full POSG next to the one-step lookahead game at time t (a Bayesian game)]

Bayesian Games
• Private information relevant to game
  • Uncertainty in utility
• Type
  • Encapsulates private information
  • We limit ourselves to games with a finite number of types
• In robot example
  • Type 1: Robot doesn’t see anything
  • Type 2: Robot sees intruder at location x

Bayesian Games
• BG = {I, Θ, A, p(Θ), u}
• Θ is the joint type space, Θ = Θ_1 × … × Θ_n
• θ is a specific joint type, θ = {θ_1,…,θ_n}
• p(Θ) is common prior on the distribution over Θ
• u is the utility function, u = {u_1,…,u_n}
  • u_i(a_i, a_-i, (θ_i, θ_-i))
• σ_i is a strategy for player i
  • Defines what player i does for each of its possible types
  • Actions are individual actions, not joint actions
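To make the roles of the prior, the strategies and the utility concrete, here is a minimal sketch of evaluating one strategy profile in a two-robot common-payoff game. The type names, action names, prior values and the `utility` rule are all made up for illustration.

```python
# Hypothetical common prior p(theta) over joint types (values sum to 1).
# "none" = robot sees nothing, "sees" = robot sees the intruder.
prior = {
    ("none", "none"): 0.4,
    ("none", "sees"): 0.2,
    ("sees", "none"): 0.2,
    ("sees", "sees"): 0.2,
}


def utility(joint_action, joint_type):
    """Made-up common payoff: 1 if every robot picks the right action, else 0.

    The right action is "go" when at least one robot sees the intruder,
    and "wait" otherwise.
    """
    target = "go" if "sees" in joint_type else "wait"
    return 1.0 if all(a == target for a in joint_action) else 0.0


def expected_utility(strategies, prior, utility):
    """Expected common payoff of a strategy profile.

    strategies[i] maps agent i's own type to an individual action, so each
    agent acts on its private type only, never on the joint type.
    """
    total = 0.0
    for joint_type, prob in prior.items():
        joint_action = tuple(s[t] for s, t in zip(strategies, joint_type))
        total += prob * utility(joint_action, joint_type)
    return total


# Strategy profile sigma: both robots go only when they see the intruder.
sigma = [
    {"none": "wait", "sees": "go"},
    {"none": "wait", "sees": "go"},
]
value = expected_utility(sigma, prior, utility)
```

Because a strategy is indexed by the agent's own type, the two robots miscoordinate in this toy game whenever only one of them sees the intruder, and the expectation accounts for that.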

Bayesian-Nash Equilibrium
• Set of best response strategies
• Each agent tries to maximize its expected utility conditioned on its probability distribution over the other agents’ types p(Θ)
• Each agent has a policy σ_i that, given σ_-i, maximizes u_i(σ_i, σ_-i, θ_-i)
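One common way to reach such a set of mutual best responses in a common-payoff game is alternating maximization: hold all but one agent fixed, improve that agent's strategy type by type, and repeat until nothing changes. The sketch below is an assumption about how this could be done, not a statement of the solver used in this work; the toy game and all names are hypothetical.

```python
# Hypothetical two-robot game: types, actions, common prior, common payoff.
types = [("none", "sees"), ("none", "sees")]
actions = [("wait", "go"), ("wait", "go")]
prior = {
    ("none", "none"): 0.4, ("none", "sees"): 0.2,
    ("sees", "none"): 0.2, ("sees", "sees"): 0.2,
}


def utility(joint_action, joint_type):
    target = "go" if "sees" in joint_type else "wait"
    return 1.0 if all(a == target for a in joint_action) else 0.0


def best_response(i, strategies):
    """Best response of agent i, per own type, with the other agents fixed."""
    response = {}
    for own_type in types[i]:
        def score(action):
            # Unnormalized expected payoff over joint types matching own_type.
            total = 0.0
            for joint_type, prob in prior.items():
                if joint_type[i] != own_type:
                    continue
                joint_action = tuple(
                    action if k == i else strategies[k][joint_type[k]]
                    for k in range(len(strategies))
                )
                total += prob * utility(joint_action, joint_type)
            return total

        # Keep the current action unless another one is strictly better,
        # so every change raises the common payoff and the loop terminates.
        best = strategies[i][own_type]
        best_score = score(best)
        for action in actions[i]:
            candidate = score(action)
            if candidate > best_score + 1e-12:
                best, best_score = action, candidate
        response[own_type] = best
    return response


def alternating_maximization(strategies, max_sweeps=100):
    strategies = [dict(s) for s in strategies]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(strategies)):
            updated = best_response(i, strategies)
            if updated != strategies[i]:
                strategies[i] = updated
                changed = True
        if not changed:
            break  # no agent can improve alone
    return strategies


start = [{"none": "go", "sees": "wait"}, {"none": "wait", "sees": "go"}]
equilibrium = alternating_maximization(start)
```

The result depends on the starting profile. In this toy game, starting from "always wait" also stops immediately at an equilibrium, but one with a lower expected payoff, so the procedure finds a local rather than a global optimum.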

POSG to Bayesian Game Approximation
• {I, S, A, Z, T, R, O} to {I, Θ, A, p(Θ), u}^t
• I = I
• A = A
• Type space Θ_i^t = all possible histories of agent i’s actions and observations up to time t
• p(Θ)^t calculated from S_0, A, T, Z, O, Θ^{t-1}
• Prune low probability types
• Each joint type θ maps to a joint belief
• u given by heuristic and u_i = u_j
  • QMDP
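A QMDP-style utility can be sketched as follows: solve the fully observable MDP underlying the game for Q(s, a), then score a joint action at a joint type by averaging Q over that type's joint belief. The function names, the discount factor and the toy model are assumptions made for illustration.

```python
def q_values(states, joint_actions, T, R, gamma=0.95, sweeps=200):
    """Value iteration on the fully observable MDP underlying the POSG."""
    Q = {(s, a): 0.0 for s in states for a in joint_actions}
    for _ in range(sweeps):
        V = {s: max(Q[(s, a)] for a in joint_actions) for s in states}
        Q = {
            (s, a): R(s, a) + gamma * sum(T(s, a, s2) * V[s2] for s2 in states)
            for s in states
            for a in joint_actions
        }
    return Q


def qmdp_utility(belief, joint_action, Q):
    """u(a, theta) = sum over s of b_theta(s) * Q(s, a).

    `belief` is the joint belief that the joint type theta maps to.
    """
    return sum(p * Q[(s, joint_action)] for s, p in belief.items())


# Made-up model: the opponent is tagged only if both robots "go".
states = ["free", "tagged"]
joint_actions = [("go", "go"), ("go", "wait"), ("wait", "go"), ("wait", "wait")]


def T(s, a, s2):
    if s == "tagged":
        return 1.0 if s2 == "tagged" else 0.0   # absorbing
    caught = a == ("go", "go")
    return 1.0 if (s2 == "tagged") == caught else 0.0


def R(s, a):
    return 0.0 if s == "tagged" else -1.0        # cost of 1 per step until capture


Q = q_values(states, joint_actions, T, R)
belief = {"free": 0.5, "tagged": 0.5}
u_go = qmdp_utility(belief, ("go", "go"), Q)
u_wait = qmdp_utility(belief, ("wait", "wait"), Q)
```

QMDP assumes the state becomes fully observable after one step, so it tends to undervalue information-gathering actions; it is used here only as a cheap estimate of future value.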

Algorithm
Agent i and agent j run the same steps in parallel; the steps are shown for agent i, and agent j is identical with j in place of i.
• Initialize: t = 0, h_i = {}, p(Θ^0); σ^0 = solveGame(Θ^0, p(Θ^0))
• Make Observation: h_i = obs_i^t ∪ a_i^{t-1} ∪ h_i
• Determine Type: θ_i^t = bestMatch(h_i, Θ_i^t)
• Execute Action: a_i^t = σ_i^t(θ_i^t)
• Propagate Forward: Θ^{t+1}, p(Θ^{t+1})
• Find Policy for t+1: σ^{t+1} = solveGame(Θ^{t+1}, p(Θ^{t+1})); t = t+1
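The per-agent loop can be sketched as a skeleton with the game-specific pieces passed in as functions. The parameter names loosely follow the slide (`solve_game` for solveGame, `best_match` for bestMatch); their signatures, the data layout and the trivial stand-ins used to run the loop are assumptions, not the actual implementation.

```python
def run_agent(i, horizon, init_game, solve_game, best_match, propagate, observe, act):
    """Run the loop for `horizon` steps and return agent i's history h_i."""
    types, prior = init_game()                    # Theta^0, p(Theta^0)
    policy = solve_game(types, prior)             # sigma^0
    history = ()                                  # h_i = {}
    last_action = None
    for t in range(horizon):
        obs = observe(t)                                  # Make Observation
        history = history + ((last_action, obs),)         # add (a_i^{t-1}, obs_i^t)
        own_type = best_match(history, types[i])          # Determine Type
        last_action = policy[i][own_type]                 # Execute Action
        act(last_action)
        types, prior = propagate(types, prior, policy)    # Propagate Forward
        policy = solve_game(types, prior)                 # Find Policy for t+1
    return history


# Trivial stand-ins so the skeleton runs: one type per agent, a policy that
# always waits, and a type space that never changes.
executed = []
history = run_agent(
    i=0,
    horizon=3,
    init_game=lambda: ([["h0"], ["h0"]], {("h0", "h0"): 1.0}),
    solve_game=lambda types, prior: [{t: "wait" for t in ts} for ts in types],
    best_match=lambda history, candidates: candidates[0],
    propagate=lambda types, prior, policy: (types, prior),
    observe=lambda t: f"z{t}",
    act=executed.append,
)
```

Only `observe` and `best_match` use information private to agent i. Everything else depends on the shared type space and common prior, so each agent can run the same computation locally.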

Robotic Team Tag
• Version of Team Tag
• Environment is portion of Gates Hall
• Full teammate observability
• Opponent can be captured by a single robot in any state
• QMDP used as heuristic
• Two Pioneer-class robots

Robot Policies

Lady And The Tiger [Nair et al. 2003]

Contributions
• Algorithm for finding approximate solutions to POSGs with common payoffs
• Tractability achieved by modeling POSG as a sequence of Bayesian games
• Performs comparably to the full POSG for a small finite-horizon problem
• Improved performance over ‘blind’ application of utility heuristic in more complex problems
• Successful real-time game-theoretic controller for indoor robots

Questions?
• remery@cs.cmu.edu
• www.cs.cmu.edu/~remery

Back-Up Slides

Lady And The Tiger [Nair et al. 2003]

Robotic Team Tag
• I = {1,2}
• S = S_1 × S_2 × S_opponent
• S_i = {s_0,…,s_28}, S_opponent = {s_0,…,s_28, s_tagged}
• |S| = 25230
• A_i = {N, S, E, W, Tag}
• Z_i = [{s_i, -1}, s_-i, a_-i]
• T: adjacent cells
• O: see opponent if on same cell
• R: minimize capture time
• Modified from [Pineau et al. 2003]
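The state count follows directly from the product structure of S; a quick check, with variable names chosen here for illustration:

```python
robot_cells = 29                   # s_0 .. s_28 for each robot
opponent_states = robot_cells + 1  # the same 29 cells plus s_tagged

# |S| = |S_1| * |S_2| * |S_opponent|
num_states = robot_cells * robot_cells * opponent_states
```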

Environment

Robotic Team Tag Results
