Multi-Agent Learning Mini-Tutorial


  1. Multi-Agent Learning Mini-Tutorial Gerry Tesauro IBM T.J.Watson Research Center http://www.research.ibm.com/infoecon http://www.research.ibm.com/massdist

  2. Outline • Statement of the problem • Tools and concepts from RL & game theory • “Naïve” approaches to multi-agent learning • ordinary single-agent RL; no-regret learning • fictitious play • evolutionary game theory • “Sophisticated” approaches • minimax-Q (Littman), Nash-Q (Hu & Wellman) • tinkering with learning rates: WoLF (Bowling), Multiple-timescale Q-learning (Leslie & Collins) • “strategic teaching” (Camerer talk) • Challenges and Opportunities

  3. Normal single-agent learning • Assume that environment has observable states, characterizable expected rewards and state transitions, and all of the above is stationary (MDP-ish) • Non-learning, theoretical solution to fully specified problem: DP formalism • Learning: solve by trial and error without a full specification: RL + exploration, Monte Carlo, ...

  4. Multi-Agent Learning Problem: • Agent tries to solve its learning problem, while other agents in the environment also are trying to solve their own learning problems. • Non-learning, theoretical solution to fully specified problem: game theory

  5. Basics of game theory • A game is specified by: players (1…N), actions, and payoff matrices (functions of joint actions) • [Figure: payoff matrix indexed by A’s action and B’s action; each cell gives A’s payoff and B’s payoff] • If payoff matrices are identical, game is cooperative, else non-cooperative (zero-sum = purely competitive)

  6. Basic lingo…(2) • Games with no states: (bi)-matrix games • Games with states: stochastic games, Markov games; (state transitions are functions of joint actions) • Games with simultaneous moves: normal form • Games with alternating turns: extensive form • No. of rounds = 1: one-shot game • No. of rounds > 1: repeated game • deterministic action choice: pure strategy • non-deterministic action choice: mixed strategy

  7. Basic Analysis • A joint strategy x is Pareto-optimal if there is no x’ that improves everybody’s payoffs • An agent’s x_i is a dominant strategy if it is always best regardless of others’ actions • x_i is a best response to others’ x_{-i} if it maximizes the agent’s payoff given x_{-i} • A joint strategy x is an equilibrium if each agent’s strategy is simultaneously a best response to everyone else’s strategy, i.e. no agent has an incentive to deviate (Nash, correlated) • A Nash equilibrium always exists (in mixed strategies, for finite games), but there may be exponentially many of them, and they are not easy to compute
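
The definitions on this slide can be checked mechanically on a small bimatrix game. The following is a minimal sketch, assuming a Prisoner's Dilemma with illustrative payoff numbers that are not taken from the slides; the function names are hypothetical, and only pure strategies are examined.

```python
from itertools import product

# Prisoner's Dilemma with illustrative payoffs (an assumption, not from the slides).
# Actions: 0 = cooperate, 1 = defect.
# A[i][j] is the row player's payoff and B[i][j] the column player's payoff
# when row plays i and column plays j.
A = [[3, 0],
     [5, 1]]
B = [[3, 5],
     [0, 1]]


def row_best_responses(A, j):
    """Row actions that maximize the row player's payoff against column action j."""
    column = [A[i][j] for i in range(len(A))]
    best = max(column)
    return {i for i, v in enumerate(column) if v == best}


def col_best_responses(B, i):
    """Column actions that maximize the column player's payoff against row action i."""
    best = max(B[i])
    return {j for j, v in enumerate(B[i]) if v == best}


def pure_nash_equilibria(A, B):
    """Joint actions where each action is a best response to the other."""
    cells = product(range(len(A)), range(len(A[0])))
    return [(i, j) for i, j in cells
            if i in row_best_responses(A, j) and j in col_best_responses(B, i)]


def is_pareto_optimal(A, B, i, j):
    """True if no other joint action improves both players' payoffs."""
    cells = product(range(len(A)), range(len(A[0])))
    return not any(A[k][l] > A[i][j] and B[k][l] > B[i][j] for k, l in cells)


print(pure_nash_equilibria(A, B))
print(is_pareto_optimal(A, B, 1, 1), is_pareto_optimal(A, B, 0, 0))
```

In this example the only pure equilibrium is mutual defection, which is not Pareto-optimal, while mutual cooperation is Pareto-optimal but not an equilibrium.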

  8. What about imperfect information games? • Nash eqm. requires knowledge of all payoffs. For imperfect info. games, corresponding concept is Bayes-Nash equilibrium (Nash plus Bayesian inference over hidden information). Even more intractable than regular Nash.

  9. Can we make game theory more tractable? • Active area of research • Symmetric games: payoffs are invariant under swapping of player labels.  Can look for symmetric equilibria, where all agents play same mixed strategy. • Network games: agent payoffs only depend on interactions with a small # of neighbors • Summarization games: payoffs are simple summarization functions of population joint actions (e.g. voting)

  10. Summary: pros and cons of game theory • Game theory provides a nice conceptual/theoretical framework for thinking about multi-agent learning. • Game theory is appropriate provided that: • Game is stationary and fully specified; • Enough computer power to compute equilibrium; • Can assume other agents are also game theorists; • Can solve equilibrium coordination problem. • Above conditions rarely hold in real applications • Multi-agent learning is not only a fascinating problem, it may be the only viable option.

  11. Naïve Approaches to Multi-Agent Learning • Basic idea: agent adapts, ignoring non-stationarity of other agents’ strategies • 1. Fictitious play: Agent observes the time-average frequency of other players’ action choices and models them as playing those empirical frequencies; the agent then plays a best response to this model • Variants of fictitious play: exponential recency weighting, “smoothed” best response (~softmax), small adjustment toward best response, ...

  12. What if all agents use fictitious play? • Strict Nash equilibria are absorbing points for fictitious play • Typical result is limit-cycle behavior of strategies, with increasing period as N → ∞ • In certain cases, the product of empirical distributions converges to Nash even though actual play cycles (matching pennies example)
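
The matching pennies case can be reproduced with a few lines of code. This is a minimal sketch, assuming two players, simultaneous moves, deterministic tie-breaking toward the lowest action index, and empty initial counts; the function and variable names are hypothetical.

```python
def best_response(payoff, opponent_counts):
    """Action with the highest payoff against the opponent's observed action counts.

    payoff[a][b] is this player's payoff for its action a against opponent action b.
    Ties go to the lowest action index.
    """
    scores = [sum(p * c for p, c in zip(row, opponent_counts)) for row in payoff]
    return scores.index(max(scores))


def fictitious_play(row_payoff, col_payoff, steps):
    """Both players best-respond to the other's empirical action counts."""
    row_counts = [0] * len(row_payoff)   # how often the row player chose each action
    col_counts = [0] * len(col_payoff)   # same for the column player
    history = []
    for _ in range(steps):
        i = best_response(row_payoff, col_counts)
        j = best_response(col_payoff, row_counts)
        row_counts[i] += 1
        col_counts[j] += 1
        history.append((i, j))
    return row_counts, col_counts, history


# Matching pennies: the row player wins on a match, the column player on a mismatch.
ROW_PAYOFF = [[1, -1], [-1, 1]]
COL_PAYOFF = [[-1, 1], [1, -1]]   # indexed [column action][row action]

STEPS = 10_000
row_counts, col_counts, history = fictitious_play(ROW_PAYOFF, COL_PAYOFF, STEPS)
print("row frequencies:", [c / STEPS for c in row_counts])
print("col frequencies:", [c / STEPS for c in col_counts])
print("last joint actions:", history[-5:])
```

The printed frequencies should sit close to the mixed equilibrium (1/2, 1/2), while the history shows long runs of the same joint action: the empirical distribution settles even though actual play keeps cycling.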

  13. More Naïve Approaches… • 2. Evolutionary game theory: “Replicator Dynamics” models: a large population of agents uses different strategies, and the fittest agents breed more copies. • Let x = population strategy vector, and x_k = fraction of population playing strategy k. Growth rate then: $\dot{x}_k = x_k \left[ u(e_k, x) - u(x, x) \right]$, where u(e_k, x) is the payoff of pure strategy k against the population and u(x, x) is the population-average payoff • The above equation can also be derived from an “imitation” model • NE are fixed points of the above equation, but not necessarily attractors (they may be unstable or neutrally stable)
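
A small simulation makes the replicator dynamics concrete. This sketch assumes the standard form of the equation, in which the share of strategy k grows at a rate equal to its payoff against the population minus the population-average payoff, and integrates it with a plain Euler step. The Hawk-Dove payoffs, the step size and the function name are illustrative assumptions, not taken from the slides.

```python
def replicator_step(x, payoff, dt):
    """One Euler step of dx_k/dt = x_k * (fitness_k - average fitness).

    x[k] is the population share of strategy k; payoff[k][j] is the payoff
    of strategy k when it meets strategy j.
    """
    n = len(x)
    fitness = [sum(payoff[k][j] * x[j] for j in range(n)) for k in range(n)]
    average = sum(x[k] * fitness[k] for k in range(n))
    return [x[k] + dt * x[k] * (fitness[k] - average) for k in range(n)]


# Hawk-Dove game with illustrative payoffs (resource value 2, fight cost 4).
# Strategy 0 = hawk, strategy 1 = dove.
HAWK_DOVE = [[-1.0, 2.0],
             [0.0, 1.0]]

x = [0.1, 0.9]                      # start with 10% hawks
for _ in range(5000):
    x = replicator_step(x, HAWK_DOVE, dt=0.01)
print(x)
```

With these payoffs the population drifts toward an even mix of hawks and doves, which is the mixed equilibrium of the game and, here, an attractor. A population made up entirely of hawks also stays put under this update, since an absent strategy cannot grow, which illustrates that fixed points of the dynamics need not be stable.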

  14. Many possible dynamic behaviors... • [Figure panels: limit cycles; attractors; unstable fixed points] • Also saddle points, chaotic orbits, ...

  15. Replicator dynamics: auction bidding strategies

  16. More Naïve Approaches… • 3. Infinitesimal Gradient Ascent (IGA) (Singh, Kearns and Mansour): again a myopic adaptation to the other players’ current strategy; each player adjusts its mixed strategy along the gradient of its own expected payoff u • Coupled system of linear equations: u is linear in x_i and x_{-i} • Analysis for two-player, two-action games: play either converges to a Nash fixed point on the boundary (at least one pure strategy), or enters a limit cycle
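
For a two-player, two-action game the gradient update can be written out directly. The sketch below uses a finite step size with clipping to [0, 1] rather than the infinitesimal limit analysed in the paper, so it is only an approximation of that setting. The Prisoner's Dilemma payoffs, the step size and the function names are illustrative assumptions.

```python
def clip01(v):
    """Keep a probability inside [0, 1]."""
    return max(0.0, min(1.0, v))


def gradient_step(alpha, beta, R, C, eta):
    """One simultaneous gradient step for both players of a 2x2 game.

    alpha: probability that the row player picks action 0; beta: same for column.
    R[i][j], C[i][j]: row and column payoffs for joint action (i, j).
    Each expected payoff is linear in the player's own probability, so the
    partial derivative depends only on the other player's probability.
    """
    d_alpha = beta * (R[0][0] - R[0][1] - R[1][0] + R[1][1]) + (R[0][1] - R[1][1])
    d_beta = alpha * (C[0][0] - C[0][1] - C[1][0] + C[1][1]) + (C[1][0] - C[1][1])
    return clip01(alpha + eta * d_alpha), clip01(beta + eta * d_beta)


# Prisoner's Dilemma with illustrative payoffs: action 0 = cooperate, 1 = defect.
R = [[3, 0], [5, 1]]
C = [[3, 5], [0, 1]]

alpha, beta = 0.5, 0.5
for _ in range(200):
    alpha, beta = gradient_step(alpha, beta, R, C, eta=0.01)
print(alpha, beta)
```

Here both gradients are negative everywhere, so both probabilities of cooperating are driven to the boundary and play ends at the pure defect-defect equilibrium. In a game such as matching pennies, the same update tends to circle the mixed equilibrium instead of settling on it.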

  17. Further Naïve Approaches… • 4. Dumb Single-Agent Learning: Use a single-agent algorithm in a multi-agent problem & hope that it works • No-regret learning by pricebots (Greenwald & Kephart) • Simultaneous Q-learning by pricebots (Tesauro & Kephart) • In many cases, this actually works: learners converge either exactly or approximately to self-consistent optimal strategies

  18. “Sophisticated” approaches • Take into account the possibility that other agents’ strategies might change. • 5. Multi-Agent Q-learning: • Minimax-Q (Littman): convergent algorithm for two-player zero-sum stochastic games • Nash-Q (Hu & Wellman): convergent algorithm for two-player general-sum stochastic games; requires use of Nash equilibrium solver
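
The core of minimax-Q is a Q-learning update whose bootstrap term is the minimax value of the next state's Q-matrix instead of a plain max. The sketch below is a simplification under stated assumptions: it handles only two actions per player, so the matrix-game value can be computed in closed form, whereas the general algorithm solves a linear program at this step. The function names, the dictionary layout of Q and the default parameters are hypothetical.

```python
def matrix_game_value_2x2(M):
    """Value of a 2x2 zero-sum matrix game for the maximizing row player.

    M[a][o] is the row player's payoff for own action a and opponent action o.
    """
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))   # best guaranteed pure-strategy payoff
    minimax = min(max(a, c), max(b, d))   # opponent's best pure-strategy cap
    if maximin == minimax:                # pure saddle point
        return maximin
    # No saddle point: both players mix, and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def minimax_q_update(Q, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """Move Q[s][a][o] toward r + gamma * (minimax value of the next state)."""
    target = r + gamma * matrix_game_value_2x2(Q[s_next])
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * target


# Two hypothetical states: "s" is untrained, "t" currently looks like matching pennies.
Q = {"s": [[0.0, 0.0], [0.0, 0.0]],
     "t": [[1.0, -1.0], [-1.0, 1.0]]}
minimax_q_update(Q, "s", a=0, o=1, r=1.0, s_next="t", alpha=0.5, gamma=0.9)
print(Q["s"])
```

Because the value of the next state is what the agent can guarantee against a worst-case opponent, the update does not depend on what the opponent actually does next, which is what makes the zero-sum case tractable.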

  19. More sophisticated approaches... • 6. Varying learning rates • WoLF: “Win or Learn Fast” (Bowling): agent reduces its learning rate when performing well, and increases it when doing badly. Improves convergence of IGA and policy hill-climbing • Multi-timescale Q-Learning (Leslie & Collins): different agents use different power laws t^{-n} for learning rate decay: achieves simultaneous convergence where ordinary Q-learning doesn’t
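
The WoLF idea can be sketched as a rule for choosing the policy step size. The version below assumes a policy hill-climbing setting in which the agent counts as "winning" when its current policy has a higher expected value, under its own Q-estimates, than its long-run average policy. The two step sizes, the function names and the example numbers are illustrative assumptions.

```python
def wolf_step_size(policy, avg_policy, q_values, delta_win=0.01, delta_lose=0.04):
    """Small step when winning, large step when losing."""
    current_value = sum(p * q for p, q in zip(policy, q_values))
    average_value = sum(p * q for p, q in zip(avg_policy, q_values))
    return delta_win if current_value > average_value else delta_lose


def hill_climb_step(policy, q_values, delta):
    """Shift up to delta of probability mass toward the highest-valued action."""
    best = q_values.index(max(q_values))
    n = len(policy)
    new_policy = list(policy)
    for a in range(n):
        if a != best:
            shift = min(new_policy[a], delta / (n - 1))
            new_policy[a] -= shift
            new_policy[best] += shift
    return new_policy


q_values = [1.0, 0.0]
policy = [0.2, 0.8]          # currently worse than the average policy
avg_policy = [0.5, 0.5]
delta = wolf_step_size(policy, avg_policy, q_values)
policy = hill_climb_step(policy, q_values, delta)
print(delta, policy)
```

In the example the agent is losing, so it takes the larger step toward the action it currently values most; once its policy outperforms the average policy, the same rule slows it down.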

  20. More sophisticated approaches... • 7. “Strategic Teaching:” recognizes that other players’ strategies are adaptive • “A strategic teacher may play a strategy which is not myopically optimal (such as cooperating in Prisoner’s Dilemma) in the hope that it induces adaptive players to expect that strategy in the future, which triggers a best-response that benefits the teacher.” (Camerer, Ho and Chong)

  21. Theoretical Research Challenges • Proper theoretical formulation? • “No short-cut” hypothesis: Massive on-line search a la Deep Blue to maximize expected long-term reward • (Bayesian) Model and predict behavior of other players, including how they learn based on my actions (beware of infinite model recursion) • trial-and-error exploration • continual Bayesian inference using all evidence over all uncertainties (Boutilier: Bayesian exploration) • When can you get away with simpler methods?

  22. Real-World Opportunities • Multi-agent systems where you can’t do game theory (covers everything :-)) • Electronic marketplaces (Kephart) • Mobile networks (Chang) • Self-managing computer systems (Kephart) • Teams of robots (Bowling, Stone) • Video games • Military/counter-terrorism applications
