Explore the complexity of achieving equilibrium in interactive Partially Observable Markov Decision Processes (I-POMDPs). Learn about subjective equilibrium, belief updates, policy computation, and practical challenges in decision-making scenarios involving multiple agents. Understand the formal definitions, properties, and conditions for achieving equilibrium in interactive settings. Discover key concepts such as interactive state space, Bayesian learning, and subjective rationality.
Twenty First National Conference on AI (AAAI 2006)
On the Difficulty of Achieving Equilibrium in Interactive POMDPs
Prashant Doshi, Dept. of Computer Science, University of Georgia, Athens, GA 30602
Piotr J. Gmytrasiewicz, Dept. of Computer Science, University of Illinois at Chicago, Chicago, IL 60607
Outline • Background on Interactive POMDPs • Subjective Equilibrium in I-POMDPs and Sufficient Conditions • Difficulty in Satisfying the Conditions
Interactive POMDPs • Background • Well-known framework for decision-making in single-agent partially observable settings: POMDP • Traditional analysis of multiagent interactions: game theory • Problem: “... there is currently no good way to combine game theoretic and POMDP control strategies.” - Russell and Norvig, AI: A Modern Approach, 2nd Ed.
Interactive POMDPs General Problem Setting (Diagram: two agents, each maintaining beliefs, exchanging actions and observations with the environment state) • Optimize an agent’s preferences given its beliefs
Interactive POMDPs Key ideas: • Integrate game-theoretic concepts into a decision-theoretic framework • Include possible models of other agents in your decision making: intentional models (types) and subintentional models • Address uncertainty by maintaining beliefs over the state and models of other agents: Bayesian learning • Beliefs over intentional models give rise to interactive belief systems: interactive epistemology, recursive modeling • Computable approximation of the interactive belief system: finitely nested belief system • Compute best responses to your beliefs: subjective rationality
Interactive POMDPs • Interactive state space: include models of other agents in the state space, so that agent i’s interactive states combine the physical state with models of agent j • Beliefs in I-POMDPs: distributions over the interactive state space, made computable by finitely nesting the beliefs
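The nesting described above can be illustrated with a minimal Python sketch. The state and model names below are shorthand invented for this example, not the paper's notation: a level-0 model holds a belief over physical states only, while a level-1 interactive belief ranges over (physical state, model of the other agent) pairs.

```python
# Minimal sketch of a finitely nested interactive state space.
# State and model names are illustrative shorthand, not the paper's notation.

S = ["tiger_left", "tiger_right"]  # physical states (tiger problem)

# Level-0 belief of agent j: a distribution over S only.
level0_model_j = {"tiger_left": 0.5, "tiger_right": 0.5}

# Level-1 interactive belief of agent i: a distribution over
# (physical state, model of j) pairs.
level1_belief_i = {
    ("tiger_left",  "j_uninformed"): 0.5,
    ("tiger_right", "j_uninformed"): 0.5,
}

# Any belief at any level must be a proper probability distribution.
def is_distribution(b, tol=1e-9):
    return all(p >= 0 for p in b.values()) and abs(sum(b.values()) - 1.0) < tol

print(is_distribution(level0_model_j))   # True
print(is_distribution(level1_belief_i))  # True
```

Deeper nesting repeats the same construction: a level-2 belief of j would range over pairs of physical states and level-1 models of i, and so on down to level 0.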
Interactive POMDPs Formal Definition and Relevant Properties • Belief Update: The belief update function for I-POMDPi involves: • Using the other agent’s model to predict its action(s) (prediction step) • Anticipating the other agent’s observations and how it updates its model • Using your own observations to correct your beliefs (correction step) • Policy Computation: analogous to POMDPs (given the new belief update)
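The prediction and correction steps above can be sketched in Python. This is a simplified illustration, not the paper's formal update: the interactive state is collapsed to the physical state, and the transition function `T`, observation function `O`, and action predictor `predict_a_j` are hypothetical stand-ins supplied by the caller.

```python
# Simplified two-stage I-POMDP-style belief update:
# prediction with the other agent's predicted action,
# then Bayesian correction with one's own observation.

def belief_update(belief, a_i, o_i, predict_a_j, T, O):
    """belief: dict state -> prob; a_i: i's action; o_i: i's observation.
    predict_a_j(s): dict of j's actions -> prob (from j's model).
    T(s, a_i, a_j): dict next-state -> prob.
    O(s_next, a_i, a_j, o_i): prob that i observes o_i."""
    new_belief = {}
    for s, p in belief.items():
        for a_j, pa in predict_a_j(s).items():          # prediction: j's action
            for s_next, pt in T(s, a_i, a_j).items():   # prediction: transition
                w = p * pa * pt * O(s_next, a_i, a_j, o_i)  # correction weight
                new_belief[s_next] = new_belief.get(s_next, 0.0) + w
    z = sum(new_belief.values())  # normalizer = Pr(o_i | belief, a_i)
    return {s: w / z for s, w in new_belief.items()}

# Illustrative single-agent-style tiger numbers (0.85 growl accuracy):
b0 = {"TL": 0.5, "TR": 0.5}
T = lambda s, ai, aj: {s: 1.0}                 # listening leaves the tiger in place
O = lambda s, ai, aj, o: 0.85 if (s == "TL") == (o == "GL") else 0.15
pj = lambda s: {"L": 1.0}                      # j is predicted to listen
b1 = belief_update(b0, "L", "GL", pj, T, O)
print(round(b1["TL"], 2))  # 0.85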
Example: Multiagent Tiger Problem (agents i and j) • Task: maximize collection of gold over a finite or infinite number of steps while avoiding the tiger • Each agent hears growls (GL or GR) as well as creaks (S, CL, or CR) • Each agent may open doors or listen (OL, OR, or L) • Each agent is unable to perceive the other’s observations
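The action and observation sets listed above can be written out directly; the names below are the slide's own shorthand, and the pairing of growls with creaks follows the slide's description of each agent's composite observation.

```python
# Action and observation sets of the multiagent tiger problem, as listed
# on the slide (OL/OR/L = open left, open right, listen; GL/GR = growl
# left/right; S/CL/CR = silence, creak left, creak right).
ACTIONS = ["OL", "OR", "L"]
GROWLS  = ["GL", "GR"]
CREAKS  = ["S", "CL", "CR"]

# Each agent's observation is a (growl, creak) pair; creaks carry noisy
# information about whether the other agent opened a door.
OBSERVATIONS = [(g, c) for g in GROWLS for c in CREAKS]
print(len(OBSERVATIONS))  # 6
```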
Subjective Equilibrium in I-POMDPs Theoretical Analysis: • Joint observation histories (paths of play) in the multiagent tiger problem
Subjective Equilibrium in I-POMDPs • Agents i and j’s joint policies induce a true distribution over the future observation sequences • Agent i’s beliefs over j’s models, together with its own policy, induce a subjective distribution over the future observation sequences
Subjective Equilibrium in I-POMDPs Absolute Continuity Condition (ACC) • The subjective distribution should not rule out observation histories considered possible by the true distribution • Cautious beliefs: the “grain of truth” assumption (the agent’s belief assigns positive probability to the other agent’s true model) • The “grain of truth” is sufficient but not necessary to satisfy the ACC
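Over a finite horizon, the ACC amounts to a support-containment check: every observation history with positive true probability must get positive subjective probability. The sketch below is an illustration with made-up numbers, not the measure-theoretic condition itself.

```python
# Finite-horizon illustration of the Absolute Continuity Condition (ACC):
# the subjective distribution must assign positive probability to every
# observation history the true distribution considers possible.
def satisfies_acc(true_dist, subjective_dist):
    return all(subjective_dist.get(h, 0.0) > 0.0
               for h, p in true_dist.items() if p > 0.0)

true_dist       = {("GL", "GL"): 0.7, ("GL", "GR"): 0.3}
cautious_belief = {("GL", "GL"): 0.6, ("GL", "GR"): 0.3, ("GR", "GR"): 0.1}
ruling_out      = {("GL", "GL"): 1.0}   # rules out ("GL", "GR")

print(satisfies_acc(true_dist, cautious_belief))  # True
print(satisfies_acc(true_dist, ruling_out))       # False
```

Note that the cautious belief may spread probability over histories the true distribution rules out; the ACC only forbids the opposite direction.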
Subjective Equilibrium in I-POMDPs • Proposition 1 (Convergence): Under the ACC, an agent’s belief over the other’s models, updated using the I-POMDP belief update, converges with probability 1 • Proof sketch: show that Bayesian learning in I-POMDPs is a martingale, then apply the Martingale Convergence Theorem (Doob 1953) • ε-closeness of distributions: two distributions are ε-close if the probabilities they assign to future events differ by at most ε
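The martingale property invoked in the proof sketch can be checked numerically: under Bayesian learning, the expected posterior probability of a model (expectation taken under the current predictive distribution) equals the prior. The two-model, two-observation numbers below are illustrative, not from the paper.

```python
# Numeric check of the martingale property behind Proposition 1:
# E[posterior] under the predictive distribution equals the prior.
def posterior(prior_m1, p_obs_m1, p_obs_m2):
    """Posterior prob. of model 1 after an observation with likelihoods
    p_obs_m1 = P(o | m1) and p_obs_m2 = P(o | m2)."""
    num = prior_m1 * p_obs_m1
    return num / (num + (1 - prior_m1) * p_obs_m2)

prior = 0.3
like_m1 = {"o1": 0.8, "o2": 0.2}   # observation likelihoods under model 1
like_m2 = {"o1": 0.4, "o2": 0.6}   # observation likelihoods under model 2

# Predictive probability of each observation under the current belief:
pred = {o: prior * like_m1[o] + (1 - prior) * like_m2[o] for o in like_m1}

# Martingale property: the expected posterior equals the prior.
expected_posterior = sum(pred[o] * posterior(prior, like_m1[o], like_m2[o])
                         for o in pred)
print(round(expected_posterior, 10))  # 0.3
```

Because the belief sequence is a bounded martingale, the Martingale Convergence Theorem guarantees it converges with probability 1.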
Subjective Equilibrium in I-POMDPs • Lemma (Blackwell & Dubins 1962): For all agents, if their initial beliefs satisfy the ACC, then after finite time T(ε), each of their beliefs is ε-close to the true distribution over the future observation paths • Subjective ε-Equilibrium (Kalai & Lehrer 1993): A profile of strategies, each of which is an exact best response to a belief that is ε-close to the true distribution over the observation history • Subjective equilibrium is stable under learning and optimization
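One concrete, simplified reading of ε-closeness on a finite set of observation histories is a sup-distance check, sketched below. Blackwell and Dubins' notion applies uniformly to all future events of the two measures; this finite check with made-up numbers is only an illustration.

```python
# Simplified epsilon-closeness check on a finite set of observation
# histories: the two distributions differ by at most eps on every history.
def eps_close(p, q, eps):
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys) <= eps

true_dist = {"h1": 0.50, "h2": 0.30, "h3": 0.20}
subj_dist = {"h1": 0.52, "h2": 0.29, "h3": 0.19}

print(eps_close(true_dist, subj_dist, 0.05))  # True
print(eps_close(true_dist, subj_dist, 0.01))  # False
```

In the lemma, ε shrinks as T grows: the longer the agents learn under the ACC, the tighter the agreement between subjective and true distributions.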
Subjective Equilibrium in I-POMDPs Main Result • Proposition 2: If agents’ beliefs within the I-POMDP framework satisfy the ACC, then after finite time T, their strategies are in subjective ε-equilibrium, where ε is a function of T • When ε = 0, subjective equilibrium obtains • Proof follows from the convergence of the I-POMDP belief update and (Blackwell & Dubins 1962) • The ACC is a sufficient condition, but not a necessary one
Computational Difficulties in Achieving Equilibrium • There exist computable strategies that admit no computable exact best responses (Nachbar & Zame 1996) • If possible strategies are assumed computable, then i’s best response may not be computable; therefore, j’s cautious beliefs over computable strategies cannot contain a grain of truth • Subtle tension between prediction and optimization • Strictness of the ACC
Computational Difficulties in Achieving Equilibrium • Proposition 3 (Impossibility): Within the finitely nested I-POMDP framework, the agents’ beliefs can never simultaneously satisfy the grain of truth assumption • Difficult to realize the equilibrium!
Summary • Absolute Continuity Condition (ACC) • More realistic: “grain of truth” condition • Grain of truth condition is stronger than ACC • Equilibria in I-POMDPs • Theoretical convergence to subjective equilibrium given ACC • Strictness of ACC • Impossible to simultaneously satisfy grain of truth • Computational obstacles to satisfying ACC • Future Work: Investigate the connection between subjective equilibrium and Nash equilibrium
Thank You. Questions?
Introduction Significance: Real-world applications • Robotics • Planetary exploration: surface mapping by rovers (e.g., Spirit and Opportunity); coordinate to explore a pre-defined region optimally; uncertainty due to sensors • Robot soccer (RoboCup competition): coordinate with teammates and deceive opponents; anticipate and track others’ actions
Interactive POMDPs Limitations of Nash Equilibrium • Not suitable for general control • Incomplete: does not prescribe behavior off the equilibrium path • Non-unique: multiple solutions, no way to choose among them • “…game theory has been used primarily to analyze environments that are at equilibrium, rather than to control agents within an environment.” - Russell and Norvig, AI: A Modern Approach, 2nd Ed.