Information, Control and Games Lecture 5-2

1. Information, Control and Games, Lecture 5-2
• Game Tree: A More Formal Analysis
• Infinite Nash Games
• Open-Loop Nash Equilibria
• Feedback Nash Equilibria
• Closed-Loop Nash Equilibria

2. Game Tree: A More Formal Analysis
Major Ingredients
• A root node and a set of intermediate and terminal nodes
• A partition of all nodes other than leaf nodes into player sets
• A subpartition of the player sets into information sets ~ A DM knows which information set he/she is in, but not the particular node within that information set
• The chance player (or nature) has an assigned probability for each chance node
• A strategy γi for DM i is a mapping from DM i's information sets to his/her choices ~ Also called a decision rule
• Each terminal node has a payoff Ji to maximize (or a cost to minimize) for each DM i

3. A Few Terminologies
• Perfect information: each information set is a singleton
• Payoff:
  • If there is no chance player, (γ1, γ2, .., γN) ⇒ a terminal node
    Ji(γ1, γ2, .., γN) is uniquely determined
  • If there is a chance player, (γ1, γ2, .., γN) can lead to multiple terminal nodes with calculable probabilities
    Ji(γ1, γ2, .., γN) is the expected payoff
• A game tree can thus be converted to a game in normal form, although there are pros and cons to doing so
• Zero sum: Σi Ji = 0 at all terminal nodes
• Constant sum: Σi Ji = C at all terminal nodes
• Nonzero-sum: Σi Ji ≠ 0 at some terminal nodes

4. If the number of strategies for each DM is finite, then we have a finite game
• A finite game can be converted into a matrix game in normal form
• Backward induction can be extended to general multi-act Nash problems (a sketch follows below)
Proposition. A finite N-person Nash problem in extensive form with perfect information has a pure Nash equilibrium solution
Proof. By induction, working backward from the terminal nodes
Theorem. There exists an infinite game with no chance player and with perfect information that has no pure Nash equilibrium solution ~ Shown by counterexample
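To make the induction behind the Proposition concrete, here is a minimal Python sketch of backward induction on a perfect-information game tree; the Node class and the small two-DM example tree are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    player: int = 0                          # index of the DM who moves here
    children: Optional[List["Node"]] = None  # None marks a terminal node
    payoffs: Tuple[float, ...] = ()          # payoff vector at a terminal node

def backward_induction(node: Node) -> Tuple[float, ...]:
    """Payoff vector reached under subgame-perfect play from `node`."""
    if node.children is None:                # terminal node: payoffs are given
        return node.payoffs
    outcomes = [backward_induction(c) for c in node.children]
    return max(outcomes, key=lambda o: o[node.player])  # mover maximizes own payoff

# Illustrative two-DM, two-stage tree: DM 0 moves first, then DM 1.
leaf = lambda *p: Node(children=None, payoffs=p)
tree = Node(player=0, children=[
    Node(player=1, children=[leaf(2, 1), leaf(0, 0)]),
    Node(player=1, children=[leaf(3, 0), leaf(1, 2)]),
])
print(backward_induction(tree))              # -> (2, 1)
```

Ties are broken arbitrarily by max; a chance node would instead average its children's payoff vectors with the assigned probabilities.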

5. Infinite Nash Games
Mathematical Formulation
• System dynamics: xk+1 = fk(xk, u1k, u2k, .., uNk), k ∈ K ≜ {0, 1, .., K-1}, with K and x0 given
  unk ∈ Unk ⊂ R^mn (control of DM n at time k)
  un ≜ (un0^T, un1^T, .., un,K-1^T)^T (control trajectory of DM n)
• Cost function to be minimized: Jn(u) ≜ Σk=0..K-1 gnk(xk, u1k, u2k, .., uNk) + gnK(xK) ~ Stage-wise additive
• Information structures:
  • Open-loop: ηnk = {x0}
  • Memoryless perfect state information (feedback): ηnk = {xk}
  • Closed-loop: ηnk = {x0, x1, .., xk} ~ Different from feedback
Q. How to solve the problem?
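As a small illustration of this stage-wise additive structure, here is a hedged Python sketch of evaluating one DM's cost along a trajectory; the names f, g, gK are placeholders for the fk, gnk, gnK above.

```python
# Sketch: DM n's stage-additive cost for given control trajectories.
# f(k, x, uk) plays the role of f_k, g(n, k, x, uk) of g_nk, gK(n, x) of g_nK;
# uk is the tuple (u_1k, .., u_Nk). All names are illustrative.
def J(n, x0, u, f, g, gK):
    x, total = x0, 0.0
    for k, uk in enumerate(u):        # k = 0, 1, .., K-1
        total += g(n, k, x, uk)       # accumulate the stage cost g_nk
        x = f(k, x, uk)               # propagate the dynamics x_{k+1} = f_k(...)
    return total + gK(n, x)           # add the terminal cost g_nK(x_K)
```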

6. General Approaches
• Use "person-by-person optimality"
• Suppose that (γ1*, γ2*, .., γN*) is a Nash solution
• Then min over γi of Ji(γ1*, γ2*, .., γi-1*, γi, γi+1*, .., γN*) yields γi*
• This is an optimal control problem, and can be solved by using, e.g., the minimum principle or dynamic programming
Q. Is the overall problem easy?
• The problem is quite involved
• How to get the initial guess (γ1^0, γ2^0, .., γN^0)?
• How to update to (γ1^1, γ2^1, .., γN^1)? ~ Lack of general guidelines
• No concept of gradient as in nonlinear programming
• We shall present the ideas by using variations of an example (a best-response iteration sketch follows below)
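One natural (if unguaranteed) scheme is Gauss-Seidel best-response iteration. Below is a sketch applied to the three-person example introduced on the next slide, using the closed-form best responses implied by its necessary conditions; the iteration happens to converge here, but, as noted above, no general guarantee exists.

```python
# Person-by-person (best-response) iteration for the example of slides 7-8.
# Each update is the exact minimizer of that DM's cost with the others fixed.
x0 = 1.0
u1 = u2 = u3 = 0.0                 # initial guess; no general guideline exists
for _ in range(50):
    u1 = -(x0 + u2 + u3) / 2       # argmin over u1 of (x0+u1+u2+u3)^2 + u1^2
    u2 = x0 + u1 + u3              # argmin over u2 of -(x0+u1+u2+u3)^2 + 2 u2^2
    u3 = -(x0 + u1 + u2) / 2       # argmin over u3 of (x0+u1+u2+u3)^2 + u3^2
print(u1, u2, u3)                  # -> -0.4, 0.2, -0.4 = (-2/5, 1/5, -2/5) x0
```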

7. Open-Loop Nash Equilibria
[Block diagram: u3 drives x0 → x1; u1 and u2 drive x1 → x2]
Example: Three-Person, Single-Act Game
• System dynamics: x1 = x0 + u3; x2 = x1 + u1 + u2
• Open-loop information ~ η1 = η2 = η3 = {x0}
• Cost functions: J1 = x2^2 + u1^2; J2 = -x2^2 + 2u2^2 - u3^2 = -J3
Q. How to find the Nash equilibrium solution?
Solution Methodology: Substitute out the system dynamics and perform person-by-person optimization
x1 = x0 + u3; x2 = x1 + u1 + u2 = x0 + u1 + u2 + u3
J1 = x2^2 + u1^2 = (x0 + u1 + u2 + u3)^2 + u1^2
J2 = -x2^2 + 2u2^2 - u3^2 = -(x0 + u1 + u2 + u3)^2 + 2u2^2 - u3^2 = -J3
Q. Then what?

8. Necessary Conditions
0.5 ∂J1/∂u1 = (x0 + u1 + u2 + u3) + u1 = 0
0.5 ∂J2/∂u2 = -(x0 + u1 + u2 + u3) + 2u2 = 0
0.5 ∂J3/∂u3 = (x0 + u1 + u2 + u3) + u3 = 0
• Consequently (u1*, u2*, u3*) = (-2/5 x0, 1/5 x0, -2/5 x0) ~ Open-loop solution
Resulting States and Costs
• Substituting the above into the system dynamics and cost functions:
x1 = x0 + u3 = 3/5 x0; x2 = x1 + u1 + u2 = 2/5 x0
J1OL = x2^2 + u1^2 = 8/25 x0^2
J2OL = -x2^2 + 2u2^2 - u3^2 = -6/25 x0^2 = -J3OL
More General Cases:
• Have to use the minimum principle intelligently
• For details, see Basar and Olsder, Subsection 6.2.1
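The three stationarity conditions are linear in (u1, u2, u3), so the solution is easy to check numerically; a quick sketch with numpy, setting x0 = 1 (by linearity, everything scales with x0).

```python
import numpy as np

# Stationarity conditions of slide 8 as a linear system A @ u = b, with x0 = 1.
A = np.array([[ 2.0, 1.0, 1.0],    # 0.5 dJ1/du1 = 0:  2u1 + u2 + u3 = -x0
              [-1.0, 1.0, -1.0],   # 0.5 dJ2/du2 = 0: -u1 + u2 - u3 =  x0
              [ 1.0, 1.0, 2.0]])   # 0.5 dJ3/du3 = 0:  u1 + u2 + 2u3 = -x0
b = np.array([-1.0, 1.0, -1.0])
u1, u2, u3 = np.linalg.solve(A, b)
print(u1, u2, u3)                  # -> -0.4, 0.2, -0.4 = (-2/5, 1/5, -2/5) x0
x2 = 1.0 + u1 + u2 + u3            # = 2/5
print(x2**2 + u1**2)               # J1OL = 8/25 = 0.32
print(-x2**2 + 2*u2**2 - u3**2)    # J2OL = -6/25 = -0.24
```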

9. Feedback Nash Equilibria
• State feedback information without memory: ηnk = {xk}
• Related to the feedback games discussed earlier
Definition. For a K-stage discrete-time game, (γ1*, γ2*, .., γN*) is a feedback Nash equilibrium solution iff (conditions listed for DM1; those for the other DMs are analogous)
• At stage K-1: J1(γ1,1, .., γ1,K-2, γ1,K-1*; γ2,1, .., γ2,K-2, γ2,K-1*; ...) ≤ J1(γ1,1, .., γ1,K-2, γ1,K-1; γ2,1, .., γ2,K-2, γ2,K-1*; ...) for all γ1,K-1 and all earlier strategies
• At stage K-2: J1(γ1,1, .., γ1,K-2*, γ1,K-1*; γ2,1, .., γ2,K-2*, γ2,K-1*; ...) ≤ J1(γ1,1, .., γ1,K-2, γ1,K-1*; γ2,1, .., γ2,K-2*, γ2,K-1*; ...) for all γ1,K-2
• ...
• At stage 1: J1(γ1,1*, γ1,2*, .., γ1,K-1*; γ2*; ..; γN*) ≤ J1(γ1,1, γ1,2*, .., γ1,K-1*; γ2*; ..; γN*) for all γ1,1
• Satisfying Nash at each stage ~ Subgame perfect equilibrium

10. [Diagram: u3 = γ3(x0) drives x0 → x1; u1 = γ1(x1) and u2 = γ2(x1) drive x1 → x2]
• To solve the problem, work backward from the terminal stage, fold back, and solve a single-stage problem at each stage ~ Similar to dynamic programming
Example. A Three-Person, Single-Act Game (Continued)
• System dynamics: x1 = x0 + u3; x2 = x1 + u1 + u2 ~ As before
• Information structure: η3 = {x0}, η1 = η2 = {x1} ~ Feedback
• Cost functions: J1 = x2^2 + u1^2; J2 = -x2^2 + 2u2^2 - u3^2 = -J3 ~ As before
Q. How to solve the problem?
Solution Methodology:
• Solve the second-stage problem for DM1 and DM2 with x1 given
• Substitute the above results back into the first-stage problem of DM3

11. Second Stage:
J1 = x2^2 + u1^2 = (x1 + u1 + u2)^2 + u1^2
J2 = -x2^2 + 2u2^2 - u3^2 = -(x1 + u1 + u2)^2 + 2u2^2 - u3^2
• Necessary Conditions
0.5 ∂J1/∂u1 = (x1 + u1 + u2) + u1 = 0
0.5 ∂J2/∂u2 = -(x1 + u1 + u2) + 2u2 = 0
• Consequently (u1*, u2*) = (-2/3 x1, 1/3 x1) ~ Feedback solution
First Stage:
x2 = x1 + u1 + u2 = 2/3 x1
J3 = (2/3 x1)^2 - 2(1/3 x1)^2 + u3^2 = 2/9 (x0 + u3)^2 + u3^2
• Necessary Condition
0.5 ∂J3/∂u3 = 2/9 (x0 + u3) + u3 = 0 ⇒ u3 = -2/11 x0

12. Resulting States and Costs
x1 = x0 + u3 = 9/11 x0; x2 = x1 + u1 + u2 = 2/3 x1 = 6/11 x0
J1FB = x2^2 + u1^2 = 72/121 x0^2
J2FB = -x2^2 + 2u2^2 - u3^2 = -2/11 x0^2 = -J3FB
Compared with the Open-Loop Results:
J1OL = x2^2 + u1^2 = 8/25 x0^2 < J1FB
J2OL = -x2^2 + 2u2^2 - u3^2 = -6/25 x0^2 < J2FB
J3OL = 6/25 x0^2 > J3FB
• More information may not help (as is the case here for DM1 and DM2)
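These numbers are straightforward to verify; a quick numerical sketch with x0 = 1:

```python
# Numerical check of the feedback equilibrium and the OL/FB comparison (x0 = 1).
x0 = 1.0
u3 = -2/11 * x0                    # first-stage solution from slide 11
x1 = x0 + u3                       # = 9/11
u1, u2 = -2/3 * x1, 1/3 * x1       # second-stage feedback strategies at x1
x2 = x1 + u1 + u2                  # = 2/3 x1 = 6/11
print(x2**2 + u1**2)               # J1FB = 72/121 ≈ 0.595 > J1OL = 8/25 = 0.32
print(-x2**2 + 2*u2**2 - u3**2)    # J2FB = -2/11 ≈ -0.182 > J2OL = -6/25 = -0.24
```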

13. General Results
Proposition 6.1. Under the memoryless perfect state information pattern, every feedback Nash equilibrium solution is a Nash equilibrium solution
Theorem 6.6. The set of strategies {γnk*, n ∈ [1, N], k ∈ [0, K]} is a feedback Nash equilibrium solution (or a subgame perfect equilibrium solution) iff there exist "optimal cost-to-go functions" {Vnk(xk)} s.t.
V1k(xk) = min over u1k of {g1k(xk, u1k, u2k*, .., uNk*) + V1,k+1(fk(xk, u1k, u2k*, .., uNk*))}
(and similarly for the other DMs)
• The problem is to be solved by backward induction, similar to dynamic programming (a symbolic sketch on the running example follows below)
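As a hedged illustration of the Theorem 6.6 recursion, the running three-person example can be folded back symbolically with sympy (the variable names are illustrative).

```python
import sympy as sp

# Cost-to-go recursion of Theorem 6.6 applied to the running example.
x0, x1, u1, u2, u3 = sp.symbols('x0 x1 u1 u2 u3')

# Last stage: static Nash between DM1 and DM2, with x1 as the state.
# (The -u3^2 term of J2 is constant at this stage and is omitted.)
x2 = x1 + u1 + u2
J1, J2 = x2**2 + u1**2, -x2**2 + 2*u2**2
sol = sp.solve([sp.diff(J1, u1), sp.diff(J2, u2)], [u1, u2])
print(sol)                                  # {u1: -2*x1/3, u2: x1/3}

# DM3's cost-to-go at x1: V3(x1) = x2*^2 - 2 u2*^2 = 2/9 x1^2.
V3 = (x2**2 - 2*u2**2).subs(sol)
# First stage: DM3 minimizes u3^2 + V3(x0 + u3) over u3.
print(sp.solve(sp.diff(V3.subs(x1, x0 + u3) + u3**2, u3), u3))  # [-2*x0/11]
```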

14. Closed-Loop Nash Equilibria
[Diagram: u3 = γ3(x0) drives x0 → x1; u1 = γ1(x0, x1) and u2 = γ2(x0, x1) drive x1 → x2]
Example. Same as before, except with η3 = {x0} and η1 = η2 = {x0, x1}
Q. What are the major differences between this and the game with feedback information?
• Second Stage: Identical to the feedback case
0.5 ∂J1/∂u1 = (x1 + u1 + u2) + u1 = 0
0.5 ∂J2/∂u2 = -(x1 + u1 + u2) + 2u2 = 0
• Consequently (u1*, u2*) = (-2/3 x1, 1/3 x1)
• First Stage: Also identical to the feedback case
0.5 ∂J3/∂u3 = 2/9 (x0 + u3) + u3 = 0 ⇒ u3 = -2/11 x0
⇒ The previous feedback solution is also a closed-loop solution
Q. Do other solutions exist? Are they good or bad for the DMs?

15. Other Possibilities
• Let x̄1(x0) ≜ x0 + γ3*(x0) ~ A function of x0 only
• Design the following strategies:
  u1 = -2/3 x1 + p[x1 - x̄1(x0)], and u2 = 1/3 x1 + q[x1 - x̄1(x0)]
• Legitimate strategies ~ η1 = η2 = {x0, x1}
• After substituting the above strategies into x2 and x1, and then into J3, the resulting u3 generally depends on p and q:
x2 = x1 + u1 + u2 = 2/3 x1 + p[x1 - x̄1(x0)] + q[x1 - x̄1(x0)]
J3 = x2^2 - 2u2^2 + u3^2 = {2/3 (x0 + u3) + p[u3 - γ3*(x0)] + q[u3 - γ3*(x0)]}^2 - 2{1/3 (x0 + u3) + q[u3 - γ3*(x0)]}^2 + u3^2
• DM1 and DM2 can thus select p and q to their own benefit

16. Necessary Condition for the First Stage with p and q
0.5 ∂J3/∂u3 = {2/3 (x0 + u3) + p[u3 - γ3*(x0)] + q[u3 - γ3*(x0)]}[2/3 + p + q] - 2{1/3 (x0 + u3) + q[u3 - γ3*(x0)]}[1/3 + q] + u3
= (11/9 + 2/3 p)u3 + (2/9 + 2/3 p)x0 = 0 (after imposing the equilibrium consistency condition γ3*(x0) = u3)
⇒ u3 = -(2 + 6p)x0/(11 + 6p)
• A function of p (and, in general, of both p and q) ~ An infinite number of solutions for various combinations of p and q
• If p = 0, then u3 = -2/11 x0 as before
• Second-Order Condition
• It can be shown that 0.5 ∂^2J3/∂u3^2 = (2/3 + p)^2 + 2pq - q^2 + 7/9
• It is positive for properly selected p and q (a symbolic check follows below)
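Both the simplified first-order condition and the second-order expression can be verified symbolically; a sketch with sympy, where g3 stands for γ3*(x0):

```python
import sympy as sp

# Symbolic check of the first- and second-order conditions on slide 16.
u3, x0, p, q, g3 = sp.symbols('u3 x0 p q g3')    # g3 stands for gamma3*(x0)
J3 = (sp.Rational(2, 3)*(x0 + u3) + (p + q)*(u3 - g3))**2 \
     - 2*(sp.Rational(1, 3)*(x0 + u3) + q*(u3 - g3))**2 + u3**2
# Differentiate first, then impose the consistency condition g3 = u3 and solve.
print(sp.solve(sp.diff(J3, u3).subs(g3, u3), u3))
# -> u3 = -(2 + 6p) x0 / (11 + 6p); q drops out, as claimed.
print(sp.expand(sp.diff(J3, u3, 2) / 2))
# -> p**2 + 2*p*q + 4*p/3 - q**2 + 11/9, i.e. (2/3 + p)^2 + 2pq - q^2 + 7/9.
```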

17. Resulting States and Controls
• Substituting the above into the system dynamics and controls:
x1 = x0 + u3 = 9x0/(11 + 6p) = x̄1(x0)
u1 = -2/3 x1 + p[x1 - 9x0/(11 + 6p)] ~ γ1*(x0, x1)
  (= -2/3 x1 = -6x0/(11 + 6p) ~ Realized value)
u2 = 1/3 x1 + q[x1 - 9x0/(11 + 6p)] ~ γ2*(x0, x1)
  (= 1/3 x1 = 3x0/(11 + 6p) ~ Realized value)
x2 = x1 + u1 + u2 = 2/3 x1 = 6x0/(11 + 6p)
• "Information non-uniqueness" (having both x0 and x1) created this "solution non-uniqueness"!
• The feedback game (having x1 only) is "informationally inferior" to the closed-loop game
Q. How to select p and q in general? Have to look into the costs

18. Resulting Costs
• Substituting the above into the cost functions:
J1CL = x2^2 + u1^2 = 72x0^2/(11 + 6p)^2
~ DM1 wants to select a large p to minimize J1CL
~ A problem at a higher level, subject to the second-order condition
J2CL = -J3CL = -(22 + 24p + 36p^2)x0^2/(11 + 6p)^2
~ Unfortunately, q doesn't play any role
• Compared with the Feedback Results:
J1FB = x2^2 + u1^2 = 72/121 x0^2
J2FB = -x2^2 + 2u2^2 - u3^2 = -2/11 x0^2
• Whether JiCL is larger or smaller than JiFB depends on p (a symbolic check follows below)
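A short symbolic sketch recovers both closed-loop costs as functions of p (x0 = 1; everything scales with x0^2):

```python
import sympy as sp

# Closed-loop costs as functions of p (slide 18), with x0 = 1.
p = sp.symbols('p')
u3 = -(2 + 6*p) / (11 + 6*p)                  # first-stage solution (slide 16)
x1 = 1 + u3                                   # = 9/(11 + 6p) = x1bar, so the
u1 = -sp.Rational(2, 3) * x1                  # bracket terms vanish and the
u2 = sp.Rational(1, 3) * x1                   # realized controls are as on slide 17
x2 = x1 + u1 + u2                             # = 6/(11 + 6p)
print(sp.simplify(x2**2 + u1**2))             # J1CL = 72/(6p + 11)^2
print(sp.simplify(-x2**2 + 2*u2**2 - u3**2))  # J2CL = -(36p^2 + 24p + 22)/(6p + 11)^2
```

At p = 0 these reduce to 72/121 and -2/11, the feedback values.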

19. Some General Results
Definition 6.3
• Let G1 and G2 be two Nash games having the same extensive form description except for the information structure
• If ηnk^1 ⊆ ηnk^2 for all n and k (whatever a DM knows at each stage of G1, he/she also knows at the corresponding stage of G2, but not vice versa; e.g., G1 feedback and G2 closed-loop),
• then G1 is informationally inferior to G2
Proposition 6.6
• If G1 is informationally inferior to G2, then
• Any Nash equilibrium solution for G1 is also a Nash solution for G2
• Any Nash equilibrium solution for G2 that is admissible in G1 is also a Nash solution for G1

20. Summary
• Game Tree: A More Formal Analysis
• Infinite Nash Games
• Open-Loop Nash Equilibria
• Feedback Nash Equilibria
• Closed-Loop Nash Equilibria
Next Time
• Introduction to Experimental Games
