
On the Difficulty of Achieving Equilibrium in Interactive POMDPs

Twenty First National Conference on AI (AAAI 2006). On the Difficulty of Achieving Equilibrium in Interactive POMDPs. Prashant Doshi Dept. of Computer Science University of Georgia Athens, GA 30602. Piotr J. Gmytrasiewicz Dept. of Computer Science University of Illinois at Chicago





Presentation Transcript


  1. Twenty First National Conference on AI (AAAI 2006) On the Difficulty of Achieving Equilibrium in Interactive POMDPs Prashant Doshi Dept. of Computer Science University of Georgia Athens, GA 30602 Piotr J. Gmytrasiewicz Dept. of Computer Science University of Illinois at Chicago Chicago, IL 60607

  2. Outline • Background on Interactive POMDPs • Subjective Equilibrium in I-POMDPs and Sufficient Conditions • Difficulty in Satisfying the Conditions

  3. Interactive POMDPs • Background • Well-known framework for decision-making in single-agent, partially observable settings: POMDP • Traditional analysis of multiagent interactions: game theory • Problem: “... there is currently no good way to combine game theoretic and POMDP control strategies.” - Russell and Norvig, AI: A Modern Approach, 2nd Ed.

  4. Interactive POMDPs General Problem Setting [Diagram: two agents, each maintaining beliefs, exchange actions and observations with the environment state] Optimize an agent’s preferences given beliefs

  5. Interactive POMDPs Key ideas: • Integrate game-theoretic concepts into a decision-theoretic framework • Include possible models of other agents in your decision making → intentional (types) and subintentional models • Address uncertainty by maintaining beliefs over the state and models of other agents → Bayesian learning • Beliefs over intentional models give rise to interactive belief systems → interactive epistemology, recursive modeling • Computable approximation of the interactive belief system → finitely nested belief system • Compute best responses to your beliefs → subjective rationality

  6. Interactive POMDPs • Interactive state space • Include models of other agents in the state space • Beliefs in I-POMDPs (computable): distributions over the interactive states
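The interactive-state and belief equations were images in the original slides. A hedged reconstruction from the standard published I-POMDP definitions:

```latex
% Interactive states: physical states paired with models of agent j,
% where j's models split into intentional models (types) and
% subintentional models
IS_i = S \times M_j, \qquad M_j = \Theta_j \cup SM_j

% A belief is a distribution over interactive states; restricting the
% nesting of models to a finite level l keeps beliefs computable
b_{i,l} \in \Delta(IS_{i,l}), \qquad IS_{i,l} = S \times M_{j,l-1}
```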

  7. Interactive POMDPs Formal Definition and Relevant Properties • Belief Update: the belief update function for I-POMDPi involves: • Use the other agent’s model to predict its action(s) • Anticipate the other agent’s observations and how it updates its model • Use your own observations to correct your beliefs (a prediction step followed by a correction step) • Policy Computation • Analogous to POMDPs (given the new belief update)
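The prediction and correction equations did not survive the transcript. As a rough sketch of their structure only, here is a simplified update in which j's candidate models are fixed stochastic policies (so the step that revises j's own beliefs is omitted); all names and numbers are illustrative, not the paper's notation:

```python
# Simplified sketch of the I-POMDP prediction/correction belief update.
# Assumption (not from the paper): j's models are fixed stochastic
# policies, so the anticipation of j's observations is left out.

def ipomdp_belief_update(belief, a_i, o_i, T, O, policies):
    """belief: {(s, m_j): prob}; T[s][a_i][a_j]: {s2: prob};
    O[s2][a_i][a_j]: {o_i: prob}; policies[m_j]: {a_j: prob}."""
    new_belief = {}
    for (s, m_j), p in belief.items():
        for a_j, p_aj in policies[m_j].items():      # predict j's action
            for s2, p_t in T[s][a_i][a_j].items():   # predict the transition
                p_o = O[s2][a_i][a_j].get(o_i, 0.0)  # correct with i's observation
                new_belief[(s2, m_j)] = (new_belief.get((s2, m_j), 0.0)
                                         + p * p_aj * p_t * p_o)
    z = sum(new_belief.values())                     # normalize
    return {k: v / z for k, v in new_belief.items()}

# Tiny tiger-style example: both agents listen ("L"), listening leaves the
# state unchanged, and growls are 85% accurate; hearing GL shifts i's
# belief toward tiger-left (TL).
T = {"TL": {"L": {"L": {"TL": 1.0}}}, "TR": {"L": {"L": {"TR": 1.0}}}}
O = {"TL": {"L": {"L": {"GL": 0.85, "GR": 0.15}}},
     "TR": {"L": {"L": {"GL": 0.15, "GR": 0.85}}}}
policies = {"m": {"L": 1.0}}                 # one candidate model of j
b0 = {("TL", "m"): 0.5, ("TR", "m"): 0.5}
b1 = ipomdp_belief_update(b0, "L", "GL", T, O, policies)
print(round(b1[("TL", "m")], 2))  # 0.85
```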

  8. Example: Multiagent Tiger Problem (Agents i and j) • Task: maximize the collection of gold over a finite or infinite number of steps while avoiding the tiger • Each agent hears growls (GL or GR) as well as creaks (S, CL, or CR) • Each agent may open doors or listen (OL, OR, or L) • Each agent is unable to perceive the other’s observation
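The action and observation sets above can be written down directly; this shorthand encoding is my own, not the paper's notation:

```python
# Illustrative encoding of the multiagent tiger domain described above.
ACTIONS = ["OL", "OR", "L"]   # open left door, open right door, listen
GROWLS = ["GL", "GR"]         # growl heard from the left / the right
CREAKS = ["S", "CL", "CR"]    # silence, creak left, creak right
# Each agent's observation pairs a growl with a creak (the creak carries
# noisy information about the other agent's door-opening actions).
OBSERVATIONS = [(g, c) for g in GROWLS for c in CREAKS]
print(len(OBSERVATIONS))  # 6 observations per agent
```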

  9. Subjective Equilibrium and Conditions for Achieving It

  10. Subjective Equilibrium in I-POMDPs Theoretical Analysis: • Joint observation histories (paths of play) in the multiagent tiger problem

  11. Subjective Equilibrium in I-POMDPs • Agents i and j’s joint policies induce a true distribution over the future observation sequences (the true distribution over observation histories) • Agent i’s beliefs over j’s models, together with its own policy, induce a subjective distribution over the future observation sequences (i’s subjective distribution over observation histories)

  12. Subjective Equilibrium in I-POMDPs Absolute Continuity Condition (ACC) • The subjective distribution should not rule out the observation histories considered possible by the true distribution • Cautious beliefs → the “grain of truth” assumption • The “grain of truth” is sufficient but not necessary to satisfy the ACC
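The slide's formula did not survive the transcript. In measure-theoretic terms, writing \(\mu\) for the true distribution over observation histories and \(\tilde{\mu}_i\) for agent i's subjective one, a hedged reconstruction based on the standard Kalai and Lehrer formulation:

```latex
% ACC: the true distribution is absolutely continuous with respect to
% agent i's subjective distribution
\mu \ll \tilde{\mu}_i
  \quad\Longleftrightarrow\quad
  \forall E:\; \tilde{\mu}_i(E) = 0 \;\Rightarrow\; \mu(E) = 0

% Grain of truth (sufficient, not necessary, for ACC): the subjective
% distribution contains the true one as a mixture component with
% positive weight
\tilde{\mu}_i = \alpha\,\mu + (1-\alpha)\,\rho, \qquad \alpha \in (0,1]
```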

  13. Subjective Equilibrium in I-POMDPs • Proposition 1 (Convergence): under ACC, an agent’s belief over the other’s models, updated using the I-POMDP belief update, converges with probability 1 • Proof sketch: show that Bayesian learning in I-POMDPs is a martingale, then apply the Martingale Convergence Theorem (Doob 1953) • ε-closeness of distributions: μ̃ is ε-close to μ if there is a measurable set Q with μ(Q) ≥ 1 − ε and μ̃(Q) ≥ 1 − ε, such that (1 − ε) μ(A) ≤ μ̃(A) ≤ (1 + ε) μ(A) for every measurable A ⊆ Q
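A minimal toy illustration of the convergence claim: Bayesian updating of a belief over two candidate models of the other agent, where the true model is in the support of the prior. The model names and probabilities are invented for the demo:

```python
# Toy demo: Bayesian learning over candidate models of the other agent
# converges onto the true model when the prior has a grain of truth.
import random

random.seed(0)
models = {"M1": 0.85, "M2": 0.15}  # Pr(growl-left) under each model of j
true_model = "M1"                  # grain of truth: M1 has positive prior
belief = {"M1": 0.5, "M2": 0.5}    # cautious prior over j's models

for t in range(500):
    obs = random.random() < models[true_model]  # sample an observation
    # Bayes rule: reweight each model by the observation's likelihood
    belief = {m: belief[m] * (p if obs else 1 - p)
              for m, p in models.items()}
    z = sum(belief.values())
    belief = {m: v / z for m, v in belief.items()}

print(round(belief["M1"], 4))  # posterior mass on the true model -> ~1.0
```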

  14. Subjective Equilibrium in I-POMDPs • Lemma (Blackwell & Dubins 1962): for all agents, if their initial beliefs satisfy ACC, then after a finite time T(ε) each of their beliefs is ε-close to the true distribution over the future observation paths • Subjective ε-equilibrium (Kalai & Lehrer 1993): a profile of strategies, each of which is an exact best response to a belief that is ε-close to the true distribution over observation histories • Subjective equilibrium is stable under learning and optimization

  15. Subjective Equilibrium in I-POMDPs Main Result • Proposition 2: if agents’ beliefs within the I-POMDP framework satisfy the ACC, then after finite time T, their strategies are in subjective ε-equilibrium, where ε is a function of T • When ε = 0, subjective equilibrium obtains • The proof follows from the convergence of the I-POMDP belief update and (Blackwell & Dubins 1962) • ACC is a sufficient condition, but not a necessary one

  16. Difficulty in Practically Satisfying the Conditions

  17. Computational Difficulties in Achieving Equilibrium • There exist computable strategies that admit no computable exact best responses (Nachbar & Zame 1996) • If possible strategies are assumed computable, then i’s best response may not be computable; therefore j’s cautious beliefs need not contain a grain of truth • Subtle tension between prediction and optimization • Strictness of ACC

  18. Computational Difficulties in Achieving Equilibrium • Proposition 3 (Impossibility): within the finitely nested I-POMDP framework, the agents’ beliefs can never simultaneously satisfy the grain-of-truth assumption • The equilibrium is therefore difficult to realize!

  19. Summary • Absolute Continuity Condition (ACC) • More realistic: the “grain of truth” condition • The grain-of-truth condition is stronger than ACC • Equilibria in I-POMDPs • Theoretical convergence to subjective equilibrium given ACC • Strictness of ACC • Impossible to simultaneously satisfy the grain of truth • Computational obstacles to satisfying ACC • Future Work: investigate the connection between subjective equilibrium and Nash equilibrium

  20. Thank You. Questions?

  21. Introduction Significance: real-world applications • Robotics • Planetary exploration: surface mapping by rovers (e.g., Spirit and Opportunity); coordinate to explore a pre-defined region optimally; uncertainty due to sensors • Robot soccer (RoboCup competition): coordinate with teammates and deceive opponents; anticipate and track others’ actions

  22. Interactive POMDPs Limitations of Nash Equilibrium • Not suitable for general control • Incomplete: does not prescribe behavior off the equilibrium path • Non-unique: multiple solutions, with no way to choose among them • “…game theory has been used primarily to analyze environments that are at equilibrium, rather than to control agents within an environment.” - Russell and Norvig, AI: A Modern Approach, 2nd Ed.
