Building Agents for the Lemonade Game Using a Cognitive Hierarchy Population Model. Michael Wunder, Michael Kaisers, Michael Littman, John Yaros
Overview of Method • In the Lemonade-Stand Game (LG), players are rewarded for finding a partner quickly, to avoid becoming the odd man out • As a result, complicated prediction-optimization learners are at a disadvantage • Utilizing heuristics, an agent can identify (and attract) potential partners • Population-based models are useful to determine the best heuristics in the LG
Example: p-beauty contest • Keynes proposed that the stock market is like a beauty contest in which judges try to guess the contestant (or stock, or strategy) that the other judges like • n players each submit a number x_i between 0 and 100, and the winner is the player closest to a fraction p of the average guess, $p \cdot \frac{1}{n} \sum_i x_i$ • p is a fraction between 0 and 1, e.g. 2/3
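To make the scoring rule concrete, here is a minimal Python sketch of a single round. The function name `beauty_contest_winner` and the tie handling (every closest player counts as a winner) are illustrative assumptions, not something the slides specify.

```python
def beauty_contest_winner(guesses, p=2/3):
    """Return (target, winners) for one round of the p-beauty contest.

    guesses: list of numbers between 0 and 100, one per player.
    winners: indices of all players tied for closest to the target.
    """
    if not guesses:
        raise ValueError("need at least one guess")
    # Target is p times the average guess.
    target = p * sum(guesses) / len(guesses)
    best = min(abs(x - target) for x in guesses)
    winners = [i for i, x in enumerate(guesses) if abs(x - target) == best]
    return target, winners
```

For example, with guesses 0, 30 and 60 and p = 2/3, the average is 30, so the target is about 20 and the player who guessed 30 is closest.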
P-beauty game explained • The Nash strategy is to play 0: for p < 1, once everyone plays 0, no player can get closer to the target by changing their guess • However, first-time players do not reach this outcome…why? (from Behavioral Game Theory by Colin Camerer)
How a Cognitive Hierarchy Works • Level k: reacts to Level k-1 • “Poor predictable Bart, always picks Rock.” • … • Level 1: reacts to the base strategy at Level 0 only • “Good ol’ Rock. Homer can’t beat that.” • “Good ol’ Rock. Nothing beats that.” • Level 0: no reasoning, random action or simple rule
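The rock-paper-scissors illustration can be written as a short recursion. This is a sketch under the assumption that Level 0 always plays one fixed action (Rock here) and that each higher level simply plays the action that beats the level below. The names `BEATS` and `level_k_action` are hypothetical.

```python
# Maps each action to the action that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def level_k_action(k, base_action="rock"):
    """Action of a level-k player when Level 0 always plays base_action."""
    action = base_action
    for _ in range(k):
        # Each step of reasoning best-responds to the level directly below.
        action = BEATS[action]
    return action
```

In this toy setting the levels cycle: Level 1 plays Paper, Level 2 plays Scissors, and Level 3 is back to Rock.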
Population-based Reasoning • Steps of the cognitive hierarchy (CH) technique: • Identify base strategies (random, static) • Derive processes for steps of reasoning • A step of reasoning, in this case, is the strategy that can exploit the one before • Recursively apply steps to each level k • These levels form the “hierarchy” according to some distribution f(k) • Select a strategy that does well against the desired population
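As an illustration, the steps above can be sketched on the p-beauty contest. Several choices here are assumptions rather than anything stated in the slides: f(k) is taken to be a truncated Poisson distribution with a hypothetical parameter `tau`, Level 0 is summarized by an average guess of 50, and a player's own effect on the average is ignored, so the reply to a guess g is approximated by p times g.

```python
import math

def poisson_weights(tau, max_level):
    """Assumed f(k): Poisson weights truncated at max_level and renormalized."""
    raw = [math.exp(-tau) * tau**k / math.factorial(k) for k in range(max_level + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def level_guesses(p, max_level, base_guess=50.0):
    """Level 0 guesses base_guess on average; level k plays p times level k-1's guess."""
    return [base_guess * p**k for k in range(max_level + 1)]

def guess_against_population(p, weights, guesses):
    """Aim at p times the population's expected guess (own influence ignored)."""
    expected = sum(w * g for w, g in zip(weights, guesses))
    return p * expected
```

A call such as `guess_against_population(2/3, poisson_weights(1.5, 5), level_guesses(2/3, 5))` then gives one candidate guess for that assumed population; changing `tau` shifts the population toward lower or higher levels.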
Lemonade-Stand Game Levels • LG yields elegant level heuristics • L0-U: Uniformly random action • L0-C: Constant action • L0-X: Constant with probability X, otherwise choose randomly • L1: Move Across from the most stable player (the one with the highest X). This is also optimal against L1. This move is the cooperative equilibrium.
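A rough sketch of the L1 heuristic follows. It assumes 12 positions arranged like a clock face, so that "Across" means six spots away, and it estimates X crudely as the fraction of rounds in which a player repeated its previous position. Both choices, along with the names `stability` and `level1_move`, are illustrative assumptions rather than the authors' implementation.

```python
NUM_SPOTS = 12  # assumed number of positions around the circle

def stability(history):
    """Fraction of rounds in which the player repeated its previous position."""
    if len(history) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(history, history[1:]) if prev == cur)
    return repeats / (len(history) - 1)

def level1_move(opponent_histories):
    """Pick the spot directly across from the most stable opponent's last position.

    opponent_histories: one non-empty list of past positions per opponent.
    """
    most_stable = max(opponent_histories, key=stability)
    return (most_stable[-1] + NUM_SPOTS // 2) % NUM_SPOTS
```

For instance, if one opponent has stayed at position 3 while the other keeps moving, `level1_move([[3, 3, 3, 3], [1, 7, 2, 9]])` picks position 9.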
Lemonade Game Levels, Cont’d. • L2: Stay Constant for at least one turn, in case opponents are two L1s. If the current location is disadvantageous, move somewhere else, perhaps Across from a good partner. • L3: With another L3, “Sandwich” a constant or L2 player, and become Across from each other if it moves. • Can we classify contestants by level?
Actual Competition Results • Using idealized agents from each of these levels, find the score of each contestant against populations of adjacent levels
Actual Competition Results • The x-axis is the mix of adjacent levels: Level 1.2 denotes a population of 80% L1 and 20% L2
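The fractional-level notation can be turned into population weights with a few lines of Python. The sketch below also lists the chance of facing each pair of opponent levels, assuming a contestant's two opponents are drawn independently from the mix. That sampling scheme and the function names are assumptions made for illustration.

```python
import math
from itertools import product

def level_mix(x):
    """Split a fractional level such as 1.2 into weights on the two adjacent levels."""
    lower = math.floor(x)
    frac = x - lower
    if frac == 0:
        return {lower: 1.0}
    return {lower: 1.0 - frac, lower + 1: frac}

def opponent_pair_probs(x):
    """Probability of each sorted pair of opponent levels under independent draws."""
    mix = level_mix(x)
    probs = {}
    for (a, wa), (b, wb) in product(mix.items(), repeat=2):
        pair = tuple(sorted((a, b)))
        probs[pair] = probs.get(pair, 0.0) + wa * wb
    return probs
```

At Level 1.2 this gives weights of roughly 0.8 on L1 and 0.2 on L2, and under the independence assumption a contestant would face two L1 opponents about 64% of the time.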
Actual Competition Results • This population construction method allows for clear distinctions between levels, but other possibilities exist
Conclusion • Our agent (RL3) contains elements of all three levels, which is not optimal against this population of competitors • The model that emerges from LG does predict the outcome fairly well • The model predicts that subsequent repetitions would generally move the population “up” the hierarchy • CH has implications for larger games (e.g. TAC)