Fifth International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-06)
Exact Solutions of Interactive POMDPs Using Behavioral Equivalence
Speaker: Prashant Doshi, University of Georgia
Authors: B. Rathnasabapathy, Prashant Doshi, and Piotr Gmytrasiewicz
Overview • I-POMDP – a framework for sequential decision making by an agent in a multi-agent setting • Takes the perspective of an individual agent in an interaction • Problem • The cardinality of the interactive state space is infinite • The other agent's models (including its beliefs) are part of an agent's state space (interactive epistemology) • Contribution: an algorithm for solving I-POMDPs exactly • Aggregate behaviorally equivalent models of the other agent
Background – Properties of POMDPs and I-POMDPs • Finitely nested • Beliefs are nested up to a finite strategic level l • Level 0 models are POMDPs • Value function of POMDP and finitely nested I-POMDP is piecewise linear and convex (PWLC) • Agents’ behaviors in POMDP and finitely nested I-POMDP can be represented using policy trees
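For concreteness, the PWLC property means the value function is a maximum over a finite set of linear functions of the belief. A sketch in standard POMDP notation (the alpha-vector set \Gamma_n is assumed notation, not taken from the slides):

    V_n(b) = \max_{\alpha \in \Gamma_n} \sum_{s \in S} b(s)\,\alpha(s)

Each alpha-vector corresponds to a conditional plan, which is why optimal behavior can be represented as a policy tree.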
Interactive POMDPs • Definition • Interactive state space: IS_i = S × M_j, with M_j = Θ_j ∪ SM_j • S: set of physical states • Θ_j: set of intentional models • SM_j: set of subintentional models • Intentional models contain the other agent's beliefs
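For reference, a reconstruction of the finitely nested I-POMDP tuple in the notation usual for this framework (a sketch; the slide's original symbols were lost in extraction):

    I\text{-}POMDP_{i,l} = \langle IS_{i,l},\, A,\, T_i,\, \Omega_i,\, O_i,\, R_i \rangle,
    \qquad IS_{i,l} = S \times M_{j,l-1},
    \qquad M_{j,l-1} = \Theta_{j,l-1} \cup SM_j

Here Θ_{j,l-1} denotes agent j's intentional models nested to level l-1 and SM_j its subintentional models.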
Example: Single-Agent Tiger Problem (figure) • Rewards: -100 for opening the door hiding the tiger, +10 for opening the door hiding the gold, -1 for listening
Behaviorally Equivalent Models • Equivalence classes of beliefs (figure: the other agent's belief simplex partitioned into regions P1, P2, P3, each inducing the same optimal behavior)
Equivalence Classes of Interactive States • Definition • Combination of a physical state and an equivalence class of the other agent's models (see the sketch below)
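A minimal Python sketch of these two ideas, assuming a hypothetical solve(model) routine that returns a model's optimal policy in hashable form (all names here are illustrative, not from the paper's implementation):

    from collections import defaultdict
    from itertools import product

    def behavioral_equivalence_classes(models, solve):
        # Group agent j's candidate models by the policy they induce.
        # `solve` is a placeholder for any solver mapping a model to its
        # optimal policy (returned in hashable form, e.g. a tuple).
        classes = defaultdict(list)
        for model in models:
            classes[solve(model)].append(model)
        return list(classes.values())  # each entry is one behavioral equivalence class

    def equivalence_classes_of_interactive_states(physical_states, model_classes):
        # An equivalence class of interactive states pairs a physical state
        # with one equivalence class of the other agent's models: S x {classes}.
        return list(product(physical_states, model_classes))

Two models land in the same class exactly when they prescribe the same behavior, so the candidate model space collapses to one class per distinct policy.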
Lossless Aggregation • In a finitely nested I-POMDP, a probability distribution over the interactive states (S × M_j) provides a sufficient statistic for the past history of i's observations • Transformation of the interactive state space into behavioral equivalence classes is value-preserving • The optimal policy of the transformed finitely nested I-POMDP remains unchanged
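A hedged paraphrase of the aggregation claim in equation form (my notation, not the slides'): if BE denotes a behavioral equivalence class of j's models and the aggregated belief sums the probability mass within each class, the optimal value is unchanged:

    \hat{b}_i(s, \mathrm{BE}) = \sum_{m_j \in \mathrm{BE}} b_i(s, m_j),
    \qquad V^*(b_i) = \hat{V}^*(\hat{b}_i)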
Solving I-POMDPs Exactly

    Procedure Solve-IPOMDP(AGENTi, Belief Nesting L) : Returns Policy
      If L = 0 Then
        Return { Policy := Solve-POMDP(AGENTi) }
      Else
        For all AGENTj ≠ AGENTi
          Policyj := Solve-IPOMDP(AGENTj, L-1)
        End
        Mj := Behavioral-Equivalence-Models(Policyj)
        ECISi := S × { ×j Mj }
        Policy := Modified-GIP(ECISi, Ai, Ti, Ωi, Oi, Ri)
        Return Policy
      End
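A minimal Python sketch of the recursion above, assuming solve_pomdp, behavioral_equivalence_models, and modified_gip are available as black boxes (placeholders for the level-0 solver, the model aggregation step, and the modified Generalized Incremental Pruning solver named on the slide; the agent_i.physical_states attribute is likewise an assumption):

    from itertools import product

    def solve_ipomdp(agent_i, level, agents, solve_pomdp,
                     behavioral_equivalence_models, modified_gip):
        # Recursive structure of the solver on the slide. The three trailing
        # arguments are assumed/hypothetical callables standing in for:
        #   solve_pomdp(agent)                 -> level-0 policy (plain POMDP)
        #   behavioral_equivalence_models(pol) -> equivalence classes of j's models
        #   modified_gip(ecis, agent)          -> policy over the aggregated space
        if level == 0:
            # At level 0 the other agents are folded into the environment.
            return solve_pomdp(agent_i)

        model_classes = []
        for agent_j in agents:
            if agent_j is agent_i:
                continue
            # Solve every other agent one level lower in the belief nesting.
            policy_j = solve_ipomdp(agent_j, level - 1, agents, solve_pomdp,
                                    behavioral_equivalence_models, modified_gip)
            model_classes.append(behavioral_equivalence_models(policy_j))

        # ECIS_i = S x (cross product of the other agents' model classes).
        ecis_i = list(product(agent_i.physical_states, *model_classes))
        return modified_gip(ecis_i, agent_i)

The recursion bottoms out at level 0, where the other agents' effects are treated as noise and a plain POMDP solver suffices.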
Multi-Agent Persistent-Tiger Problem (figure) • Rewards: -100 (tiger), +10 (gold) • Joint observations: {Growl Left, Growl Right} × {Creak Right, Creak Left, Silence}
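As a quick illustration of the joint observation space in this problem (a sketch; the observation labels follow the slide):

    from itertools import product

    growls = ["Growl Left", "Growl Right"]              # evidence about the tiger's location
    creaks = ["Creak Right", "Creak Left", "Silence"]   # noisy evidence of the other agent's action

    # Joint observation space: 2 x 3 = 6 combined observations per step.
    joint_observations = list(product(growls, creaks))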
Figures: agent i's beliefs over the equivalence classes of interactive states (ECIS); agent j's policy
Agent i's policy in the presence of another agent j (figure) • The policy becomes more diverse as i's ability to observe j's actions improves
Conclusions • A method that enables exact solution of finitely nested interactive POMDPs • Aggregate the other agent's models into behavioral equivalence classes • The resulting discretization of the model space is lossless • Interesting behaviors emerge in the multi-agent tiger problem
Thank You • Please Stop by My Poster • Questions?