190 likes | 219 Views
Bayesian and non-Bayesian Learning in Games. Ehud Lehrer. Tel Aviv University, School of Mathematical Sciences. Including joint works with: Ehud Kalai, Rann Smorodinsky, Eioln Solan. Learning in Games. Informal definition of learning : a decentralized process that
E N D
Bayesian and non-Bayesian Learning in Games Ehud Lehrer Tel Aviv University, School of Mathematical Sciences Including joint works with: Ehud Kalai, Rann Smorodinsky, Eioln Solan.
Learning in Games Informal definition of learning: a decentralized process that converges (in some sense) to (some) equilibrium. • Bayesian (rational) learning: Players do not start in equilibrium, but • they have some initial belief about other players’ strategies • they are rational: they maximize their payoffs • they take into account future payoffs • Convergence in REPEATED GAME • Non-Bayesian learning: Players • don’t have any initial belief about other players’ strategies • don’t maximize their payoffs • don’t take into account future payoffs • Convergence (of the empirical frequency) to an equilibrium of the ONE-SHOT GAME
Bayesian vs. non-Bayesian Bayesian learning: Players do not start in equilibrium, but they start with a “grain” of idea about what other players do. Nature of results: players eventually play something close to an equilibrium of the repeated game. Non-Bayesian learning: Players have no idea about other players’ actions. They don’t care to maximize payoffs. Nature of results: the statistics of past actions looks like an equilibrium of the one-shot game.
Important tools • Bayesian learning: merging of two probability measures along a a filtration (an increasing sequence of - fields) Non-Bayesian learning: approachability Both were initiated by Blackwell (the first with Dubins)
A set F is excludable by player 2 if there is a strategy s.t. Repeated Games with Vector Payoffs • I = finite set of actions of player 1. • J = finite set of actions of player 2. • M = (mi,j) = a payoff matrix. Entries are vectors in Rd. A set F is approachable by player 1 if there is a strategy s.t. There are sets which are neither approachable nor excludable.
Approachability • Applications (a sample): • No-regret (Hannan) • Repeated games with incomplete information (Aumann-Maschler) • Learning (Foster-Vohra, Hart-Mas Colell) • Manipulation of calibration tests (Foster-Vohra, Lehrer, • Smorodinsky-Sandroni-Vohra) • Generating generalized normal-number (Lehrer)
Characterization of Approachable Sets H(p0) x y the line xy F the hyperplane perpendicular to xy that passes through y mp,q = i,j pi mi,j qj H(p) = { mp,q , q (I) } • A closed set F Rd is a B-set if for every xFthere is y F that satisfies: • y is a closest point in F to x. • The hyperplane perpendicular to the line xy that passes through y separates between x and H(p), for some p (I).
Characterization of Approachable Sets Theorem [Blackwell, 1956]: every B-set F is approachable. The approaching strategy plays at each stage n the mixed action p such that H(p) and x are separated by the hyperplane connecting xand a closest point to x in F. With this strategy: Theorem [Blackwell, 1956]: every convex set is either approachable or excludable. Theorem [Hou, 1971; Spinat, 2002]: every minimal (w.r.t. set inclusion) approachable set is a B-set. Or: A set is approachable if and only if it contains a B-set.
Bounded Computational Capacity A strategy is k-bounded-recall if it depends only on the last k pairs of actions (and it does not depend on previously played actions). • A (non-deterministic) automaton is given by: • A finite state space. • A probability distribution over states, according to which the initial state is chosen. • A set of inputs (say, the set I× J of action pairs). • A set of outputs (say, I , the set of player 1’s actions). • A rule that assigns to each state a probability distribution over outputs. • A transition rule that assigns to every state and every input a probability distribution over the next state.
Approachability and Bounded Capacity A set F is approachable with bounded-recall strategies by player 1 if for every >0, the set B(F, ) := { y : d(y, F) } is approachable by some bounded-recall strategy. A set F is excludable against bounded-recall strategies by player 2 if player 2 has a strategy such that • Theorem (w/ Eilon Solan): The following statements are equivalent. • The set F is approachable with bounded-recall strategies. • The set F is approachable with automata. • The set Fcontains a convex approachable set. • The set F is not excludable against bounded-recall strategies. 4 points to note
Main Theorem • Theorem: The following statements are equivalent for closed sets. • The set F is approachable with bounded-recall strategies. • The set F is approachable with automata. • The set F contains a convex approachable set. • The set F is not excludable against bounded-recall strategies. • A set is approachable with automata if and only if it is approachable by bounded-recall strategies. 2. A complete characterization of sets that are approachable with bounded-recall strategies. 3. A set which is not approachable with bounded-recall strategies, is excludable against all bounded-recall strategies. 4. We do not know whether the same holds for automata.
Example On board Good news: in applications target sets are convex ( a point or a whole -- positive or negative -- orthant).
Approachability in Hilbert space • I = finite set of actions of player 1. • J = finite set of actions of player 2. • M = (mi,j) = a payoff matrix. Entries are points in HS • (random variables). • All may change with the stage n. A set F is approachable by player 1 if there is a strategy s.t. Advantage: allows for infinitely many constraints Theorem: Suppose that at stage n, the average payoff is and y is a closest point in F to . If the hyperplane perpendicular to the line that passes through y separates between and H(p), for some p (I), then F is approachable.
Approachability and law of large numbers are uncorrelated r.v.’s with . is the dot product. F is At any stage n, .
F The game: each players has only one action. The payoff at stage n is . Thus, F is approachable. This is the strong law of large numbers. (When the payoffs are not uniformly bounded, there is an additional boundedness condition.)
Activeness function H is (even over a finite probability space). At stage n the characteristic function indicates which coordinates are active and which are not. The average payoff at stage n is Applications: 1. repeated games with incomplete information – different games are active on different times 2. construction of normal numbers 3. manipulability of many calibration tests 4. general no-regret theorem (against many replacing schemes) 5. convergence to correlated eq. along many sequences
Activeness function – cont. Theorem: suppose that F is convex. Let be the closest point in F to the average payoff at time n, . If the hyperplane perpendicular to the line that passes through separates between and H(p), for some p (I), then F is approachable.