Explore the complexity of achieving equilibrium in interactive Partially Observable Markov Decision Processes (I-POMDPs). Learn about subjective equilibrium, belief updates, policy computation, and practical challenges in decision-making scenarios involving multiple agents. Understand the formal definitions, properties, and conditions for achieving equilibrium in interactive settings. Discover key concepts such as interactive state space, Bayesian learning, and subjective rationality.
Twenty First National Conference on AI (AAAI 2006)
On the Difficulty of Achieving Equilibrium in Interactive POMDPs
Prashant Doshi, Dept. of Computer Science, University of Georgia, Athens, GA 30602
Piotr J. Gmytrasiewicz, Dept. of Computer Science, University of Illinois at Chicago, Chicago, IL 60607
Outline • Background on Interactive POMDPs • Subjective Equilibrium in I-POMDPs and Sufficient Conditions • Difficulty in Satisfying the Conditions
Interactive POMDPs • Background • Well-known framework for decision-making in single-agent partially observable settings: POMDP • Traditional analysis of multiagent interactions: game theory • Problem: “... there is currently no good way to combine game theoretic and POMDP control strategies.” - Russell and Norvig, AI: A Modern Approach, 2nd Ed.
Interactive POMDPs General Problem Setting (Diagram: two agents, each maintaining beliefs, exchanging actions and observations with the environment state) • Optimize an agent’s preferences given its beliefs
Interactive POMDPs Key ideas: • Integrate game-theoretic concepts into a decision-theoretic framework • Include possible models of other agents in your decision making: intentional models (types) and subintentional models • Address uncertainty by maintaining beliefs over the state and models of other agents: Bayesian learning • Beliefs over intentional models give rise to interactive belief systems: interactive epistemology, recursive modeling • Computable approximation of the interactive belief system: finitely nested belief system • Compute best responses to your beliefs: subjective rationality
Interactive POMDPs • Interactive state space: include models of other agents in the state space, so that agent i’s interactive states combine the physical state with models of agent j • Beliefs in I-POMDPs: distributions over the interactive state space, made computable by finitely nesting the beliefs
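The nesting described above can be illustrated with a minimal Python sketch. The state and model names below are shorthand invented for this example, not the paper's notation: a level-0 model holds a belief over physical states only, while a level-1 interactive belief ranges over (physical state, model of the other agent) pairs.

```python
# Minimal sketch of a finitely nested interactive state space.
# State and model names are illustrative shorthand, not the paper's notation.

S = ["tiger_left", "tiger_right"]  # physical states (tiger problem)

# Level-0 belief of agent j: a distribution over S only.
level0_model_j = {"tiger_left": 0.5, "tiger_right": 0.5}

# Level-1 interactive belief of agent i: a distribution over
# (physical state, model of j) pairs.
level1_belief_i = {
    ("tiger_left",  "j_uninformed"): 0.5,
    ("tiger_right", "j_uninformed"): 0.5,
}

# Any belief at any level must be a proper probability distribution.
def is_distribution(b, tol=1e-9):
    return all(p >= 0 for p in b.values()) and abs(sum(b.values()) - 1.0) < tol

print(is_distribution(level0_model_j))   # True
print(is_distribution(level1_belief_i))  # True
```

Deeper nesting repeats the same construction: a level-2 belief of j would range over pairs of physical states and level-1 models of i, and so on down to level 0.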
Interactive POMDPs Formal Definition and Relevant Properties • Belief Update: The belief update function for I-POMDPi involves: • Using the other agent’s model to predict its action(s) (prediction step) • Anticipating the other agent’s observations and how it updates its model • Using your own observations to correct your beliefs (correction step) • Policy Computation: analogous to POMDPs (given the new belief update)
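The prediction and correction steps above can be sketched in Python. This is a simplified illustration, not the paper's formal update: the interactive state is collapsed to the physical state, and the transition function `T`, observation function `O`, and action predictor `predict_a_j` are hypothetical stand-ins supplied by the caller.

```python
# Simplified two-stage I-POMDP-style belief update:
# prediction with the other agent's predicted action,
# then Bayesian correction with one's own observation.

def belief_update(belief, a_i, o_i, predict_a_j, T, O):
    """belief: dict state -> prob; a_i: i's action; o_i: i's observation.
    predict_a_j(s): dict of j's actions -> prob (from j's model).
    T(s, a_i, a_j): dict next-state -> prob.
    O(s_next, a_i, a_j, o_i): prob that i observes o_i."""
    new_belief = {}
    for s, p in belief.items():
        for a_j, pa in predict_a_j(s).items():          # prediction: j's action
            for s_next, pt in T(s, a_i, a_j).items():   # prediction: transition
                w = p * pa * pt * O(s_next, a_i, a_j, o_i)  # correction weight
                new_belief[s_next] = new_belief.get(s_next, 0.0) + w
    z = sum(new_belief.values())  # normalizer = Pr(o_i | belief, a_i)
    return {s: w / z for s, w in new_belief.items()}

# Illustrative single-agent-style tiger numbers (0.85 growl accuracy):
b0 = {"TL": 0.5, "TR": 0.5}
T = lambda s, ai, aj: {s: 1.0}                 # listening leaves the tiger in place
O = lambda s, ai, aj, o: 0.85 if (s == "TL") == (o == "GL") else 0.15
pj = lambda s: {"L": 1.0}                      # j is predicted to listen
b1 = belief_update(b0, "L", "GL", pj, T, O)
print(round(b1["TL"], 2))  # 0.85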
Example: Multiagent Tiger Problem (agents i and j) • Task: maximize collection of gold over a finite or infinite number of steps while avoiding the tiger • Each agent hears growls (GL or GR) as well as creaks (S, CL, or CR) • Each agent may open doors or listen (OL, OR, or L) • Each agent is unable to perceive the other’s observations
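The action and observation sets listed above can be written out directly; the names below are the slide's own shorthand, and the pairing of growls with creaks follows the slide's description of each agent's composite observation.

```python
# Action and observation sets of the multiagent tiger problem, as listed
# on the slide (OL/OR/L = open left, open right, listen; GL/GR = growl
# left/right; S/CL/CR = silence, creak left, creak right).
ACTIONS = ["OL", "OR", "L"]
GROWLS  = ["GL", "GR"]
CREAKS  = ["S", "CL", "CR"]

# Each agent's observation is a (growl, creak) pair; creaks carry noisy
# information about whether the other agent opened a door.
OBSERVATIONS = [(g, c) for g in GROWLS for c in CREAKS]
print(len(OBSERVATIONS))  # 6
```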
Subjective Equilibrium in I-POMDPs Theoretical Analysis: • Joint observation histories (paths of play) in the multiagent tiger problem
Subjective Equilibrium in I-POMDPs • Agents i and j’s joint policies induce a true distribution over the future observation sequences • Agent i’s beliefs over j’s models, together with its own policy, induce a subjective distribution over the future observation sequences
Subjective Equilibrium in I-POMDPs Absolute Continuity Condition (ACC) • The subjective distribution should not rule out observation histories considered possible by the true distribution • Cautious beliefs: the “grain of truth” assumption (the agent’s belief assigns positive probability to the other agent’s true model) • The “grain of truth” is sufficient but not necessary to satisfy the ACC
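Over a finite horizon, the ACC amounts to a support-containment check: every observation history with positive true probability must get positive subjective probability. The sketch below is an illustration with made-up numbers, not the measure-theoretic condition itself.

```python
# Finite-horizon illustration of the Absolute Continuity Condition (ACC):
# the subjective distribution must assign positive probability to every
# observation history the true distribution considers possible.
def satisfies_acc(true_dist, subjective_dist):
    return all(subjective_dist.get(h, 0.0) > 0.0
               for h, p in true_dist.items() if p > 0.0)

true_dist       = {("GL", "GL"): 0.7, ("GL", "GR"): 0.3}
cautious_belief = {("GL", "GL"): 0.6, ("GL", "GR"): 0.3, ("GR", "GR"): 0.1}
ruling_out      = {("GL", "GL"): 1.0}   # rules out ("GL", "GR")

print(satisfies_acc(true_dist, cautious_belief))  # True
print(satisfies_acc(true_dist, ruling_out))       # False
```

Note that the cautious belief may spread probability over histories the true distribution rules out; the ACC only forbids the opposite direction.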
Subjective Equilibrium in I-POMDPs • Proposition 1 (Convergence): Under the ACC, an agent’s belief over the other’s models, updated using the I-POMDP belief update, converges with probability 1 • Proof sketch: show that Bayesian learning in I-POMDPs is a martingale, then apply the Martingale Convergence Theorem (Doob 1953) • ε-closeness of distributions: two distributions are ε-close if the probabilities they assign to future events differ by at most ε
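The martingale property invoked in the proof sketch can be checked numerically: under Bayesian learning, the expected posterior probability of a model (expectation taken under the current predictive distribution) equals the prior. The two-model, two-observation numbers below are illustrative, not from the paper.

```python
# Numeric check of the martingale property behind Proposition 1:
# E[posterior] under the predictive distribution equals the prior.
def posterior(prior_m1, p_obs_m1, p_obs_m2):
    """Posterior prob. of model 1 after an observation with likelihoods
    p_obs_m1 = P(o | m1) and p_obs_m2 = P(o | m2)."""
    num = prior_m1 * p_obs_m1
    return num / (num + (1 - prior_m1) * p_obs_m2)

prior = 0.3
like_m1 = {"o1": 0.8, "o2": 0.2}   # observation likelihoods under model 1
like_m2 = {"o1": 0.4, "o2": 0.6}   # observation likelihoods under model 2

# Predictive probability of each observation under the current belief:
pred = {o: prior * like_m1[o] + (1 - prior) * like_m2[o] for o in like_m1}

# Martingale property: the expected posterior equals the prior.
expected_posterior = sum(pred[o] * posterior(prior, like_m1[o], like_m2[o])
                         for o in pred)
print(round(expected_posterior, 10))  # 0.3
```

Because the belief sequence is a bounded martingale, the Martingale Convergence Theorem guarantees it converges with probability 1.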
Subjective Equilibrium in I-POMDPs • Lemma (Blackwell & Dubins 1962): For all agents, if their initial beliefs satisfy the ACC, then after finite time T(ε), each of their beliefs is ε-close to the true distribution over the future observation paths • Subjective ε-Equilibrium (Kalai & Lehrer 1993): A profile of strategies, each of which is an exact best response to a belief that is ε-close to the true distribution over the observation history • Subjective equilibrium is stable under learning and optimization
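One concrete, simplified reading of ε-closeness on a finite set of observation histories is a sup-distance check, sketched below. Blackwell and Dubins' notion applies uniformly to all future events of the two measures; this finite check with made-up numbers is only an illustration.

```python
# Simplified epsilon-closeness check on a finite set of observation
# histories: the two distributions differ by at most eps on every history.
def eps_close(p, q, eps):
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys) <= eps

true_dist = {"h1": 0.50, "h2": 0.30, "h3": 0.20}
subj_dist = {"h1": 0.52, "h2": 0.29, "h3": 0.19}

print(eps_close(true_dist, subj_dist, 0.05))  # True
print(eps_close(true_dist, subj_dist, 0.01))  # False
```

In the lemma, ε shrinks as T grows: the longer the agents learn under the ACC, the tighter the agreement between subjective and true distributions.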
Subjective Equilibrium in I-POMDPs Main Result • Proposition 2: If agents’ beliefs within the I-POMDP framework satisfy the ACC, then after finite time T, their strategies are in subjective ε-equilibrium, where ε is a function of T • When ε = 0, subjective equilibrium obtains • Proof follows from the convergence of the I-POMDP belief update and (Blackwell & Dubins 1962) • The ACC is a sufficient condition, but not a necessary one
Computational Difficulties in Achieving Equilibrium • There exist computable strategies that admit no computable exact best responses (Nachbar & Zame 1996) • If possible strategies are assumed computable, then i’s best response may not be computable; therefore, j’s cautious beliefs over computable strategies cannot contain a grain of truth • Subtle tension between prediction and optimization • Strictness of the ACC
Computational Difficulties in Achieving Equilibrium • Proposition 3 (Impossibility): Within the finitely nested I-POMDP framework, the agents’ beliefs can never simultaneously satisfy the grain of truth assumption • Difficult to realize the equilibrium!
Summary • Absolute Continuity Condition (ACC) • More realistic: “grain of truth” condition • Grain of truth condition is stronger than ACC • Equilibria in I-POMDPs • Theoretical convergence to subjective equilibrium given ACC • Strictness of ACC • Impossible to simultaneously satisfy grain of truth • Computational obstacles to satisfying ACC • Future Work: Investigate the connection between subjective equilibrium and Nash equilibrium
Thank You. Questions?
Introduction Significance: Real-world applications • Robotics • Planetary exploration: surface mapping by rovers (e.g., Spirit and Opportunity); coordinate to explore a pre-defined region optimally; uncertainty due to sensors • Robot soccer (RoboCup competition): coordinate with teammates and deceive opponents; anticipate and track others’ actions
Interactive POMDPs Limitations of Nash Equilibrium • Not suitable for general control • Incomplete: does not prescribe behavior off the equilibrium path • Non-unique: multiple solutions, no way to choose among them • “…game theory has been used primarily to analyze environments that are at equilibrium, rather than to control agents within an environment.” - Russell and Norvig, AI: A Modern Approach, 2nd Ed.