Information, Control and Games Lecture 5-2

1. Information, Control and Games, Lecture 5-2
• Game Tree: A More Formal Analysis
• Infinite Nash Games
• Open-Loop Nash Equilibria
• Feedback Nash Equilibria
• Closed-Loop Nash Equilibria

2. Game Tree: A More Formal Analysis
Major Ingredients
• A root node and a set of intermediate and terminal nodes
• A partition of all nodes other than leaf nodes into player sets
• A subpartition of the player sets into information sets ~ A DM knows which information set he/she is in, but not the particular node within that information set
• The chance player (or nature) has an assigned probability for each chance node
• A strategy γi for DM i is a mapping from DM i's information sets to his/her choices ~ Also called a decision rule
• Each terminal node has a payoff Ji to maximize (or a cost to minimize) for each DM i

3. A Few Terminologies
• Perfect information: each information set is a singleton
• Payoff:
  • If there is no chance player, (γ1, γ2, .., γN) ⇒ a terminal node
    Ji(γ1, γ2, .., γN) is uniquely determined
  • If there is a chance player, (γ1, γ2, .., γN) can lead to multiple terminal nodes with calculable probabilities
    Ji(γ1, γ2, .., γN) is the expected payoff
• A game tree can thus be converted to a game in normal form, although there are pros and cons to doing so
• Zero sum: Σi Ji = 0 at all terminal nodes
• Constant sum: Σi Ji = C at all terminal nodes
• Nonzero-sum: Σi Ji ≠ 0 at some terminal nodes

4. If the number of strategies for each DM is finite, then we have a finite game
• A finite game can be converted into a matrix game in normal form
• Backward induction can be extended to general multi-act Nash problems (a sketch follows below)
Proposition. A finite N-person Nash problem in extensive form with perfect information has a pure Nash equilibrium solution
Proof. By induction, working backward from the terminal nodes
Theorem. There exists an infinite game with no chance player and with perfect information that has no pure Nash equilibrium solution ~ Shown by counterexample
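To make the induction behind the Proposition concrete, here is a minimal Python sketch of backward induction on a perfect-information game tree; the Node class and the small two-DM example tree are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    player: int = 0                          # index of the DM who moves here
    children: Optional[List["Node"]] = None  # None marks a terminal node
    payoffs: Tuple[float, ...] = ()          # payoff vector at a terminal node

def backward_induction(node: Node) -> Tuple[float, ...]:
    """Payoff vector reached under subgame-perfect play from `node`."""
    if node.children is None:                # terminal node: payoffs are given
        return node.payoffs
    outcomes = [backward_induction(c) for c in node.children]
    return max(outcomes, key=lambda o: o[node.player])  # mover maximizes own payoff

# Illustrative two-DM, two-stage tree: DM 0 moves first, then DM 1.
leaf = lambda *p: Node(children=None, payoffs=p)
tree = Node(player=0, children=[
    Node(player=1, children=[leaf(2, 1), leaf(0, 0)]),
    Node(player=1, children=[leaf(3, 0), leaf(1, 2)]),
])
print(backward_induction(tree))              # -> (2, 1)
```

Ties are broken arbitrarily by max; a chance node would instead average its children's payoff vectors with the assigned probabilities.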

5. Infinite Nash Games
Mathematical Formulation
• System dynamics: xk+1 = fk(xk, u1k, u2k, .., uNk), k ∈ K ≜ {0, 1, .., K-1}, with K and x0 given
  unk ∈ Unk ⊂ R^mn (control of DM n at time k)
  un ≜ (un0^T, un1^T, .., un,K-1^T)^T (control trajectory of DM n)
• Cost function to be minimized: Jn(u) ≜ Σk=0..K-1 gnk(xk, u1k, u2k, .., uNk) + gnK(xK) ~ Stage-wise additive
• Information structures:
  • Open-loop: ηnk = {x0}
  • Memoryless perfect state information (feedback): ηnk = {xk}
  • Closed-loop: ηnk = {x0, x1, .., xk} ~ Different from feedback
Q. How to solve the problem?
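As a small illustration of this stage-wise additive structure, here is a hedged Python sketch of evaluating one DM's cost along a trajectory; the names f, g, gK are placeholders for the fk, gnk, gnK above.

```python
# Sketch: DM n's stage-additive cost for given control trajectories.
# f(k, x, uk) plays the role of f_k, g(n, k, x, uk) of g_nk, gK(n, x) of g_nK;
# uk is the tuple (u_1k, .., u_Nk). All names are illustrative.
def J(n, x0, u, f, g, gK):
    x, total = x0, 0.0
    for k, uk in enumerate(u):        # k = 0, 1, .., K-1
        total += g(n, k, x, uk)       # accumulate the stage cost g_nk
        x = f(k, x, uk)               # propagate the dynamics x_{k+1} = f_k(...)
    return total + gK(n, x)           # add the terminal cost g_nK(x_K)
```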

6. General Approaches
• Use "person-by-person optimality"
• Suppose that (γ1*, γ2*, .., γN*) is a Nash solution
• Then min over γi of Ji(γ1*, γ2*, .., γi-1*, γi, γi+1*, .., γN*) yields γi*
• This is an optimal control problem, and can be solved by using, e.g., the minimum principle or dynamic programming
Q. Is the overall problem easy?
• The problem is quite involved
• How to get the initial guess (γ1^0, γ2^0, .., γN^0)?
• How to update to (γ1^1, γ2^1, .., γN^1)? ~ Lack of general guidelines
• No concept of gradient as in nonlinear programming
• We shall present the ideas by using variations of an example (a best-response iteration sketch follows below)
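One natural (if unguaranteed) scheme is Gauss-Seidel best-response iteration. Below is a sketch applied to the three-person example introduced on the next slide, using the closed-form best responses implied by its necessary conditions; the iteration happens to converge here, but, as noted above, no general guarantee exists.

```python
# Person-by-person (best-response) iteration for the example of slides 7-8.
# Each update is the exact minimizer of that DM's cost with the others fixed.
x0 = 1.0
u1 = u2 = u3 = 0.0                 # initial guess; no general guideline exists
for _ in range(50):
    u1 = -(x0 + u2 + u3) / 2       # argmin over u1 of (x0+u1+u2+u3)^2 + u1^2
    u2 = x0 + u1 + u3              # argmin over u2 of -(x0+u1+u2+u3)^2 + 2 u2^2
    u3 = -(x0 + u1 + u2) / 2       # argmin over u3 of (x0+u1+u2+u3)^2 + u3^2
print(u1, u2, u3)                  # -> -0.4, 0.2, -0.4 = (-2/5, 1/5, -2/5) x0
```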

7. Open-Loop Nash Equilibria
[Block diagram: u3 drives x0 → x1; u1 and u2 drive x1 → x2]
Example: Three-Person, Single-Act Game
• System dynamics: x1 = x0 + u3; x2 = x1 + u1 + u2
• Open-loop information ~ η1 = η2 = η3 = {x0}
• Cost functions: J1 = x2^2 + u1^2; J2 = -x2^2 + 2u2^2 - u3^2 = -J3
Q. How to find the Nash equilibrium solution?
Solution Methodology: Substitute out the system dynamics and perform person-by-person optimization
x1 = x0 + u3; x2 = x1 + u1 + u2 = x0 + u1 + u2 + u3
J1 = x2^2 + u1^2 = (x0 + u1 + u2 + u3)^2 + u1^2
J2 = -x2^2 + 2u2^2 - u3^2 = -(x0 + u1 + u2 + u3)^2 + 2u2^2 - u3^2 = -J3
Q. Then what?

8. Necessary Conditions
0.5 ∂J1/∂u1 = (x0 + u1 + u2 + u3) + u1 = 0
0.5 ∂J2/∂u2 = -(x0 + u1 + u2 + u3) + 2u2 = 0
0.5 ∂J3/∂u3 = (x0 + u1 + u2 + u3) + u3 = 0
• Consequently (u1*, u2*, u3*) = (-2/5 x0, 1/5 x0, -2/5 x0) ~ Open-loop solution
Resulting States and Costs
• Substituting the above into the system dynamics and cost functions:
x1 = x0 + u3 = 3/5 x0; x2 = x1 + u1 + u2 = 2/5 x0
J1OL = x2^2 + u1^2 = 8/25 x0^2
J2OL = -x2^2 + 2u2^2 - u3^2 = -6/25 x0^2 = -J3OL
More General Cases:
• Have to use the minimum principle intelligently
• For details, see Basar and Olsder, Subsection 6.2.1
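The three stationarity conditions are linear in (u1, u2, u3), so the solution is easy to check numerically; a quick sketch with numpy, setting x0 = 1 (by linearity, everything scales with x0).

```python
import numpy as np

# Stationarity conditions of slide 8 as a linear system A @ u = b, with x0 = 1.
A = np.array([[ 2.0, 1.0, 1.0],    # 0.5 dJ1/du1 = 0:  2u1 + u2 + u3 = -x0
              [-1.0, 1.0, -1.0],   # 0.5 dJ2/du2 = 0: -u1 + u2 - u3 =  x0
              [ 1.0, 1.0, 2.0]])   # 0.5 dJ3/du3 = 0:  u1 + u2 + 2u3 = -x0
b = np.array([-1.0, 1.0, -1.0])
u1, u2, u3 = np.linalg.solve(A, b)
print(u1, u2, u3)                  # -> -0.4, 0.2, -0.4 = (-2/5, 1/5, -2/5) x0
x2 = 1.0 + u1 + u2 + u3            # = 2/5
print(x2**2 + u1**2)               # J1OL = 8/25 = 0.32
print(-x2**2 + 2*u2**2 - u3**2)    # J2OL = -6/25 = -0.24
```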

9. Feedback Nash Equilibria
• State feedback information without memory: ηnk = {xk}
• Related to the feedback games discussed earlier
Definition. For a K-stage discrete-time game, (γ1*, γ2*, .., γN*) is a feedback Nash equilibrium solution iff (conditions listed for DM1; those for the other DMs are analogous)
• At stage K-1: J1(γ1,1, .., γ1,K-2, γ1,K-1*; γ2,1, .., γ2,K-2, γ2,K-1*; ...) ≤ J1(γ1,1, .., γ1,K-2, γ1,K-1; γ2,1, .., γ2,K-2, γ2,K-1*; ...) for all γ1,K-1 and all earlier strategies
• At stage K-2: J1(γ1,1, .., γ1,K-2*, γ1,K-1*; γ2,1, .., γ2,K-2*, γ2,K-1*; ...) ≤ J1(γ1,1, .., γ1,K-2, γ1,K-1*; γ2,1, .., γ2,K-2*, γ2,K-1*; ...) for all γ1,K-2
• ...
• At stage 1: J1(γ1,1*, γ1,2*, .., γ1,K-1*; γ2*; ..; γN*) ≤ J1(γ1,1, γ1,2*, .., γ1,K-1*; γ2*; ..; γN*) for all γ1,1
• Satisfying Nash at each stage ~ Subgame perfect equilibrium

10. [Diagram: u3 = γ3(x0) drives x0 → x1; u1 = γ1(x1) and u2 = γ2(x1) drive x1 → x2]
• To solve the problem, work backward from the terminal stage, fold back, and solve a single-stage problem at each stage ~ Similar to dynamic programming
Example. A Three-Person, Single-Act Game (Continued)
• System dynamics: x1 = x0 + u3; x2 = x1 + u1 + u2 ~ As before
• Information structure: η3 = {x0}, η1 = η2 = {x1} ~ Feedback
• Cost functions: J1 = x2^2 + u1^2; J2 = -x2^2 + 2u2^2 - u3^2 = -J3 ~ As before
Q. How to solve the problem?
Solution Methodology:
• Solve the second-stage problem for DM1 and DM2 with x1 given
• Substitute the above results back into the first-stage problem of DM3

11. Second Stage:
J1 = x2^2 + u1^2 = (x1 + u1 + u2)^2 + u1^2
J2 = -x2^2 + 2u2^2 - u3^2 = -(x1 + u1 + u2)^2 + 2u2^2 - u3^2
• Necessary Conditions
0.5 ∂J1/∂u1 = (x1 + u1 + u2) + u1 = 0
0.5 ∂J2/∂u2 = -(x1 + u1 + u2) + 2u2 = 0
• Consequently (u1*, u2*) = (-2/3 x1, 1/3 x1) ~ Feedback solution
First Stage:
x2 = x1 + u1 + u2 = 2/3 x1
J3 = (2/3 x1)^2 - 2(1/3 x1)^2 + u3^2 = 2/9 (x0 + u3)^2 + u3^2
• Necessary Condition
0.5 ∂J3/∂u3 = 2/9 (x0 + u3) + u3 = 0 ⇒ u3 = -2/11 x0

12. Resulting States and Costs
x1 = x0 + u3 = 9/11 x0; x2 = x1 + u1 + u2 = 2/3 x1 = 6/11 x0
J1FB = x2^2 + u1^2 = 72/121 x0^2
J2FB = -x2^2 + 2u2^2 - u3^2 = -2/11 x0^2 = -J3FB
Compared with the Open-Loop Results:
J1OL = x2^2 + u1^2 = 8/25 x0^2 < J1FB
J2OL = -x2^2 + 2u2^2 - u3^2 = -6/25 x0^2 < J2FB
J3OL = 6/25 x0^2 > J3FB
• More information may not help (as is the case here for DM1 and DM2)
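These numbers are straightforward to verify; a quick numerical sketch with x0 = 1:

```python
# Numerical check of the feedback equilibrium and the OL/FB comparison (x0 = 1).
x0 = 1.0
u3 = -2/11 * x0                    # first-stage solution from slide 11
x1 = x0 + u3                       # = 9/11
u1, u2 = -2/3 * x1, 1/3 * x1       # second-stage feedback strategies at x1
x2 = x1 + u1 + u2                  # = 2/3 x1 = 6/11
print(x2**2 + u1**2)               # J1FB = 72/121 ≈ 0.595 > J1OL = 8/25 = 0.32
print(-x2**2 + 2*u2**2 - u3**2)    # J2FB = -2/11 ≈ -0.182 > J2OL = -6/25 = -0.24
```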

13. General Results
Proposition 6.1. Under the memoryless perfect state information pattern, every feedback Nash equilibrium solution is a Nash equilibrium solution
Theorem 6.6. The set of strategies {γnk*, n ∈ [1, N], k ∈ [0, K]} is a feedback Nash equilibrium solution (or a subgame perfect equilibrium solution) iff there exist "optimal cost-to-go functions" {Vnk(xk)} s.t.
V1k(xk) = min over u1k of {g1k(xk, u1k, u2k*, .., uNk*) + V1,k+1(fk(xk, u1k, u2k*, .., uNk*))}
(and similarly for the other DMs)
• The problem is to be solved by backward induction, similar to dynamic programming (a symbolic sketch on the running example follows below)
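As a hedged illustration of the Theorem 6.6 recursion, the running three-person example can be folded back symbolically with sympy (the variable names are illustrative).

```python
import sympy as sp

# Cost-to-go recursion of Theorem 6.6 applied to the running example.
x0, x1, u1, u2, u3 = sp.symbols('x0 x1 u1 u2 u3')

# Last stage: static Nash between DM1 and DM2, with x1 as the state.
# (The -u3^2 term of J2 is constant at this stage and is omitted.)
x2 = x1 + u1 + u2
J1, J2 = x2**2 + u1**2, -x2**2 + 2*u2**2
sol = sp.solve([sp.diff(J1, u1), sp.diff(J2, u2)], [u1, u2])
print(sol)                                  # {u1: -2*x1/3, u2: x1/3}

# DM3's cost-to-go at x1: V3(x1) = x2*^2 - 2 u2*^2 = 2/9 x1^2.
V3 = (x2**2 - 2*u2**2).subs(sol)
# First stage: DM3 minimizes u3^2 + V3(x0 + u3) over u3.
print(sp.solve(sp.diff(V3.subs(x1, x0 + u3) + u3**2, u3), u3))  # [-2*x0/11]
```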

14. Closed-Loop Nash Equilibria
[Diagram: u3 = γ3(x0) drives x0 → x1; u1 = γ1(x0, x1) and u2 = γ2(x0, x1) drive x1 → x2]
Example. Same as before, except with η3 = {x0} and η1 = η2 = {x0, x1}
Q. What are the major differences between this and the game with feedback information?
• Second Stage: Identical to the feedback case
0.5 ∂J1/∂u1 = (x1 + u1 + u2) + u1 = 0
0.5 ∂J2/∂u2 = -(x1 + u1 + u2) + 2u2 = 0
• Consequently (u1*, u2*) = (-2/3 x1, 1/3 x1)
• First Stage: Also identical to the feedback case
0.5 ∂J3/∂u3 = 2/9 (x0 + u3) + u3 = 0 ⇒ u3 = -2/11 x0
⇒ The previous feedback solution is also a closed-loop solution
Q. Do other solutions exist? Are they good or bad for the DMs?

15. Other Possibilities
• Let x̄1(x0) ≜ x0 + γ3*(x0) ~ A function of x0 only
• Design the following strategies:
  u1 = -2/3 x1 + p[x1 - x̄1(x0)], and u2 = 1/3 x1 + q[x1 - x̄1(x0)]
• Legitimate strategies ~ η1 = η2 = {x0, x1}
• After substituting the above strategies into x2 and x1, and then into J3, the resulting u3 generally depends on p and q:
x2 = x1 + u1 + u2 = 2/3 x1 + p[x1 - x̄1(x0)] + q[x1 - x̄1(x0)]
J3 = x2^2 - 2u2^2 + u3^2 = {2/3 (x0 + u3) + p[u3 - γ3*(x0)] + q[u3 - γ3*(x0)]}^2 - 2{1/3 (x0 + u3) + q[u3 - γ3*(x0)]}^2 + u3^2
• DM1 and DM2 can thus select p and q to their own benefit

16. Necessary Condition for the First Stage with p and q
0.5 ∂J3/∂u3 = {2/3 (x0 + u3) + p[u3 - γ3*(x0)] + q[u3 - γ3*(x0)]}[2/3 + p + q] - 2{1/3 (x0 + u3) + q[u3 - γ3*(x0)]}[1/3 + q] + u3
= (11/9 + 2/3 p)u3 + (2/9 + 2/3 p)x0 = 0 (after imposing the equilibrium consistency condition γ3*(x0) = u3)
⇒ u3 = -(2 + 6p)x0/(11 + 6p)
• A function of p (and, in general, of both p and q) ~ An infinite number of solutions for various combinations of p and q
• If p = 0, then u3 = -2/11 x0 as before
• Second-Order Condition
• It can be shown that 0.5 ∂^2J3/∂u3^2 = (2/3 + p)^2 + 2pq - q^2 + 7/9
• It is positive for properly selected p and q (a symbolic check follows below)
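Both the simplified first-order condition and the second-order expression can be verified symbolically; a sketch with sympy, where g3 stands for γ3*(x0):

```python
import sympy as sp

# Symbolic check of the first- and second-order conditions on slide 16.
u3, x0, p, q, g3 = sp.symbols('u3 x0 p q g3')    # g3 stands for gamma3*(x0)
J3 = (sp.Rational(2, 3)*(x0 + u3) + (p + q)*(u3 - g3))**2 \
     - 2*(sp.Rational(1, 3)*(x0 + u3) + q*(u3 - g3))**2 + u3**2
# Differentiate first, then impose the consistency condition g3 = u3 and solve.
print(sp.solve(sp.diff(J3, u3).subs(g3, u3), u3))
# -> u3 = -(2 + 6p) x0 / (11 + 6p); q drops out, as claimed.
print(sp.expand(sp.diff(J3, u3, 2) / 2))
# -> p**2 + 2*p*q + 4*p/3 - q**2 + 11/9, i.e. (2/3 + p)^2 + 2pq - q^2 + 7/9.
```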

17. Resulting States and Controls
• Substituting the above into the system dynamics and controls:
x1 = x0 + u3 = 9x0/(11 + 6p) = x̄1(x0)
u1 = -2/3 x1 + p[x1 - 9x0/(11 + 6p)] ~ γ1*(x0, x1)
  (= -2/3 x1 = -6x0/(11 + 6p) ~ Realized value)
u2 = 1/3 x1 + q[x1 - 9x0/(11 + 6p)] ~ γ2*(x0, x1)
  (= 1/3 x1 = 3x0/(11 + 6p) ~ Realized value)
x2 = x1 + u1 + u2 = 2/3 x1 = 6x0/(11 + 6p)
• "Information non-uniqueness" (having both x0 and x1) created this "solution non-uniqueness"!
• The feedback game (having x1 only) is "informationally inferior" to the closed-loop game
Q. How to select p and q in general? Have to look into the costs

18. Resulting Costs
• Substituting the above into the cost functions:
J1CL = x2^2 + u1^2 = 72x0^2/(11 + 6p)^2
~ DM1 wants to select a large p to minimize J1CL
~ A problem at a higher level, subject to the second-order condition
J2CL = -J3CL = -(22 + 24p + 36p^2)x0^2/(11 + 6p)^2
~ Unfortunately, q doesn't play any role
• Compared with the Feedback Results:
J1FB = x2^2 + u1^2 = 72/121 x0^2
J2FB = -x2^2 + 2u2^2 - u3^2 = -2/11 x0^2
• Whether JiCL is larger or smaller than JiFB depends on p (a symbolic check follows below)
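A short symbolic sketch recovers both closed-loop costs as functions of p (x0 = 1; everything scales with x0^2):

```python
import sympy as sp

# Closed-loop costs as functions of p (slide 18), with x0 = 1.
p = sp.symbols('p')
u3 = -(2 + 6*p) / (11 + 6*p)                  # first-stage solution (slide 16)
x1 = 1 + u3                                   # = 9/(11 + 6p) = x1bar, so the
u1 = -sp.Rational(2, 3) * x1                  # bracket terms vanish and the
u2 = sp.Rational(1, 3) * x1                   # realized controls are as on slide 17
x2 = x1 + u1 + u2                             # = 6/(11 + 6p)
print(sp.simplify(x2**2 + u1**2))             # J1CL = 72/(6p + 11)^2
print(sp.simplify(-x2**2 + 2*u2**2 - u3**2))  # J2CL = -(36p^2 + 24p + 22)/(6p + 11)^2
```

At p = 0 these reduce to 72/121 and -2/11, the feedback values.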

19. Some General Results
Definition 6.3
• Let G1 and G2 be two Nash games having the same extensive form description except for the information structure
• If ηnk^1 ⊆ ηnk^2 for all n and k (whatever a DM knows at each stage of G1, he/she also knows at the corresponding stage of G2, but not vice versa; e.g., G1 feedback and G2 closed-loop),
• then G1 is informationally inferior to G2
Proposition 6.6
• If G1 is informationally inferior to G2, then
• Any Nash equilibrium solution for G1 is also a Nash solution for G2
• Any Nash equilibrium solution for G2 that is admissible in G1 is also a Nash solution for G1

20. Summary
• Game Tree: A More Formal Analysis
• Infinite Nash Games
• Open-Loop Nash Equilibria
• Feedback Nash Equilibria
• Closed-Loop Nash Equilibria
Next Time
• Introduction to Experimental Games
