
Regret Minimization in Stochastic Games



Presentation Transcript


  1. Regret Minimization in Stochastic Games Shie Mannor and Nahum Shimkin Technion, Israel Institute of Technology Dept. of Electrical Engineering UAI 2000

  2. Introduction • Modeling of a dynamic decision process as a stochastic game: • Non-stationarity of the environment • Environments are not (necessarily) hostile • Looking for the best possible strategy in light of the environment’s actions. UAI 2000

  3. Repeated Matrix Games • The sets of single-stage strategies P and Q are simplicial. • Rewards are defined by a reward matrix G: r(p,q) = pGq • Reward criterion - the average reward, which need not converge since stationarity is not assumed UAI 2000
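As a concrete illustration (my own sketch, not part of the slides), the single-stage reward r(p,q) = pGq and the best-reply value against a mixed strategy q can be computed directly from the matrix; the 2x2 matrix G below is a made-up example.

```python
import numpy as np

# Hypothetical 2x2 reward matrix for P1 (the row player); any matrix would do.
G = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def reward(p, q, G):
    """Expected single-stage reward r(p,q) = p G q for mixed strategies p and q."""
    return p @ G @ q

def bayes_reward(q, G):
    """Best-reply value max_p p G q; the maximum is attained at a pure strategy."""
    return (G @ q).max()

p = np.array([0.5, 0.5])
q = np.array([0.3, 0.7])
print(reward(p, q, G))       # 0.5
print(bayes_reward(q, G))    # 0.7
```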

  4. Regret for Repeated Matrix Games • Suppose that by time t the average reward obtained is r̄t and the opponent’s empirical strategy is qt. • The regret is defined as Lt = r*(qt) - r̄t, where r*(q) = max over p∈P of r(p,q) is the best-reply (Bayes) reward against q. • A policy is called regret minimizing if lim sup Lt ≤ 0 (almost surely) against every strategy of the opponent. UAI 2000
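A minimal sketch (again my own, reusing the toy matrix above) of computing the regret Lt from a finite history of pure actions:

```python
import numpy as np

def regret(G, p1_actions, p2_actions):
    """Regret after t rounds: L_t = r*(q_t) - rbar_t, where q_t is P2's
    empirical strategy and rbar_t is P1's realized average reward."""
    t = len(p1_actions)
    rbar = np.mean([G[a, b] for a, b in zip(p1_actions, p2_actions)])
    q_emp = np.bincount(p2_actions, minlength=G.shape[1]) / t   # empirical q_t
    return (G @ q_emp).max() - rbar                             # r*(q_t) - rbar_t

G = np.array([[1.0, 0.0],
              [0.0, 1.0]])
print(regret(G, p1_actions=[0, 0, 1], p2_actions=[0, 1, 1]))    # 0.0
```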

  5. Regret Minimization for Repeated Matrix Games • Such policies do exist (Hannan, 56) • A proof using approachability theory (Blackwell, 56) • Also for games with partial observation (Auer et al., 1995; Rustichini, 1999) UAI 2000

  6. Stochastic Games • Formal Model: S = {1,…,s} - the state space; A = A(s) - actions of the regret-minimizing player, P1; B = B(s) - actions of the “environment”, P2; r - reward function, r(s,a,b); P - transition kernel, P(s'|s,a,b) • The expected average reward for p∈P, q∈Q is r(p,q) • Single state recurrence assumption UAI 2000
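A hedged sketch of how this formal model could be held in code (the array shapes and toy sizes are assumptions, not from the talk): the reward function and the transition kernel become arrays indexed by (state, P1 action, P2 action).

```python
import numpy as np

n_states, n_a, n_b = 2, 2, 2          # |S|, |A|, |B| -- toy sizes

# r[s, a, b]: reward when P1 plays a and P2 plays b in state s
r = np.random.rand(n_states, n_a, n_b)

# P[s, a, b, s']: transition kernel P(s'|s,a,b); normalized so rows sum to 1 over s'
P = np.random.rand(n_states, n_a, n_b, n_states)
P /= P.sum(axis=-1, keepdims=True)

def step(s, a, b, rng=np.random.default_rng()):
    """Sample one stage of the stochastic game: return the reward and the next state."""
    s_next = rng.choice(n_states, p=P[s, a, b])
    return r[s, a, b], s_next
```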

  7. Bayes Reward in Strategy Space • For every stationary strategy q∈Q, the Bayes reward is defined as r*(q) = max over p∈P of r(p,q), the best expected average reward P1 can achieve against q. • Problems: • P2’s strategy is not completely observed • P1’s observations may depend on the strategies of both players UAI 2000

  8. Bayes Reward in State-Action Space • Let psb be the observed frequency of P2’s action b in state s. • A natural estimate of q is the empirical conditional frequency q̂(b|s) = psb / Σb' psb'. • The associated Bayes envelope BE is the set of (state-action frequency, reward) pairs whose reward is at least the Bayes reward against the induced estimate q̂. UAI 2000
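An illustrative sketch (mine; the uniform fallback for unvisited states is an assumption) of forming the estimate q̂(b|s) from the observed state-action counts:

```python
import numpy as np

def estimate_q(counts):
    """counts[s, b] = number of times P2 played action b while the state was s.

    Returns q_hat[s, b], the empirical conditional frequency q_hat(b|s);
    states that were never visited get a uniform estimate.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.where(totals > 0,
                    counts / np.maximum(totals, 1.0),
                    1.0 / counts.shape[1])

counts = np.array([[8, 2],     # state 0: b=0 observed 8 times, b=1 twice
                   [0, 5]])    # state 1: b=1 observed 5 times
print(estimate_q(counts))      # [[0.8, 0.2], [0.0, 1.0]]
```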

  9. Approachability Theory • A standard tool in the theory of repeated matrix games (Blackwell, 1956) • Applies to games with vector-valued rewards under the average-reward criterion • A set C is approachable by P1 with a policy σ if the average vector-valued reward converges to C (its distance from C tends to 0) almost surely, for every strategy of P2 • Was extended to recurrent stochastic games (Shimkin and Shwartz, 1993) UAI 2000

  10. The Convex Bayes Envelope • In general BE is not approachable. • Define CBE = co(BE), that is, the envelope obtained by replacing r* with its lower convex hull over the estimated strategies. • Theorem: CBE is approachable. • In particular, every point of CBE has reward at least val, the value of the game. UAI 2000
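To make the convexification step concrete, here is a sketch (illustrative only: the one-dimensional parametrization of the estimated strategy and the synthetic profile standing in for r* are assumptions) that computes the lower convex hull of sampled values by a monotone-chain pass:

```python
import numpy as np

def lower_convex_hull(xs, ys):
    """Indices of the lower convex hull of the points (xs[i], ys[i]); xs sorted ascending."""
    def cross(o, a, b):
        return (xs[a] - xs[o]) * (ys[b] - ys[o]) - (ys[a] - ys[o]) * (xs[b] - xs[o])
    hull = []
    for i in range(len(xs)):
        # drop points that would make the hull non-convex from below
        while len(hull) >= 2 and cross(hull[-2], hull[-1], i) <= 0:
            hull.pop()
        hull.append(i)
    return hull

# Synthetic, non-convex profile standing in for r* on a grid of estimated strategies.
ts = np.linspace(0.0, 1.0, 11)
r_star = 4 * (ts - 0.5) ** 2 + 0.3 * np.sin(6 * ts)
idx = lower_convex_hull(ts, r_star)
print(list(zip(ts[idx].round(2), r_star[idx].round(3))))
```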

  11. Single Controller Games • Theorem: Assume that P2 alone controls the transitions, i.e., P(s'|s,a,b) = P(s'|s,b) for all s, a, b; then BE itself is approachable. UAI 2000

  12. An Application to Prediction with Expert Advice • Given a channel and a set of experts • At each time epoch every expert states his prediction of the next symbol, and P1 has to choose his own prediction a • Then a letter b appears in the channel and P1 receives the prediction reward r(a,b) • The problem can be formulated as a stochastic game in which P2 stands for all the experts and the channel UAI 2000
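To illustrate what regret means in this application, here is a sketch of the standard expert-advice accounting, comparing P1 with the best single expert in hindsight (the toy data are made up, and this bookkeeping is simpler than the state-action formulation used in the talk):

```python
import numpy as np

def expert_regret(r, p1_preds, expert_preds, letters):
    """Best expert's total prediction reward minus P1's, in hindsight.

    r[a, b] is the prediction reward for predicting a when letter b appears;
    expert_preds[i][t] is expert i's prediction at epoch t.
    """
    p1_total = sum(r[a, b] for a, b in zip(p1_preds, letters))
    best_expert = max(sum(r[a, b] for a, b in zip(preds, letters))
                      for preds in expert_preds)
    return best_expert - p1_total

r = np.eye(2)                       # reward 1 iff the prediction matches the letter
letters  = [0, 1, 1, 0]
p1_preds = [0, 0, 1, 0]
experts  = [[0, 1, 1, 1],           # expert 1
            [1, 1, 0, 0]]           # expert 2
print(expert_regret(r, p1_preds, experts, letters))   # 3.0 - 3.0 = 0.0
```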

  13. Example (cont’) • [State diagram of the prediction game: states labeled (0,0,0), …, (k-1,k,k), (k,k,k); “Expert recommendation” steps carry reward r = 0 and “Prediction” steps carry reward r(a,b).] • Theorem: P1 has a zero-regret strategy. UAI 2000

  14. An example in which BE is not approachable • [Figure: a two-state game with states S0 and S1; P1’s actions a = 0 and a = 1, action sets B(0) = B(1) = {-1,1}, reward r = b in both states, and transitions occurring with probability P = 0.99.] • It can be proved that BE for the above game is not approachable. UAI 2000

  15. Example (cont’) • In r*(q) space the envelopes are: [figure of BE and CBE in r*(q) space] UAI 2000

  16. Open questions • Characterization of minimal approachable sets in the reward-state-action space • On-line learning schemes for stochastic games with unknown parameters • Other ways of formulating optimality with respect to observed state-action frequencies UAI 2000

  17. Conclusions • The problem of regret minimization for stochastic games was considered. • The proposed solution concept, CBE, is based on convexification of the Bayes envelope in the natural state-action space. • The CBE concept ensures an average reward that is higher than the value of the game when the opponent plays suboptimally. UAI 2000

  18. Regret Minimization in Stochastic Games Shie Mannor and Nahum Shimkin Technion, Israel Institute of Technology Dept. of Electrical Engineering UAI 2000

  19. Approachability Theory • Let m(p,q) be the average vector-valued reward in the game when P1 and P2 play p and q • Define m̄t as the running average of the vector-valued rewards obtained up to time t • Theorem [Blackwell 56]: A convex set C is approachable if and only if for every q∈Q there exists p∈P such that m(p,q)∈C • Extended to stochastic games (Shimkin and Shwartz, 1993) UAI 2000
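A rough numerical sketch of Blackwell's condition (my own construction; the two-dimensional vector payoffs, the strategy grids, and the box-shaped target set C are assumptions): for every P2 mixed strategy q on a grid, search for some reply p with m(p,q) in C.

```python
import numpy as np

# M[a, b] is the (2-dimensional) vector payoff for pure actions a of P1 and b of P2.
M = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])

def m(p, q):
    """Expected vector reward m(p,q) = sum_{a,b} p[a] q[b] M[a,b]."""
    return np.einsum('a,b,abk->k', p, q, M)

def in_box(x, lo, hi):
    return np.all(x >= lo) and np.all(x <= hi)

def blackwell_condition(lo, hi, grid=51):
    """Check: for every q on a grid, does some p satisfy m(p,q) in the box [lo, hi]^2?"""
    ts = np.linspace(0, 1, grid)
    for tq in ts:
        q = np.array([1 - tq, tq])
        if not any(in_box(m(np.array([1 - tp, tp]), q), lo, hi) for tp in ts):
            return False
    return True

# The box {x : 0.4 <= x_i <= 0.6} is approachable here (p = (1/2, 1/2) always lands in it).
print(blackwell_condition(lo=0.4, hi=0.6))   # True
```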

  20. A Related Vector-Valued Game • Define the following vector-valued game: the reward vector has one coordinate for every state-action pair (s,b) of P2, plus one coordinate for the scalar reward. • If in state s action b is played by P2 and a reward r is gained, then the vector-valued reward mt has a 1 in the coordinate of (s,b), the reward r in the last coordinate, and 0 elsewhere, so its running average records the empirical state-action frequencies together with the average reward. UAI 2000
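For concreteness, a small sketch (mine, with made-up sizes and history) of this vector-valued reward and its running average, which recovers the empirical state-action frequencies together with the average scalar reward:

```python
import numpy as np

n_states, n_b = 2, 2               # toy sizes for S and B(s)

def vector_reward(s, b, r):
    """m_t: indicator of the observed state-action pair (s, b), plus the scalar reward."""
    m = np.zeros(n_states * n_b + 1)
    m[s * n_b + b] = 1.0           # coordinate of the pair (s, b)
    m[-1] = r                      # last coordinate carries the reward
    return m

# Running average over a made-up history of (state, P2 action, reward) triples.
history = [(0, 1, 0.5), (1, 0, -1.0), (0, 1, 0.5), (0, 0, 1.0)]
m_bar = np.mean([vector_reward(s, b, r) for s, b, r in history], axis=0)

print(m_bar[:-1].reshape(n_states, n_b))   # empirical state-action frequencies
print(m_bar[-1])                           # average reward: 0.25
```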
