From bounded rationality to learning
Bernard WALLISER (Paris School of Economics)
Rationality, Heuristics and Motivation in Decision Making, Pisa, November 12-14, 2010
Introduction (1)
¤ Simon’s decision problem considered as a reasoning process
+ limited capacities for gathering and treating information
→ bounded rationality procedures (satisficing, …)
But how precisely are these procedures related to the assumptions?
¤ meta-optimization paradox
- if the decision-maker optimizes the gathering of information (Winter) or the choice of the best procedure (Mongin-Walliser), he needs to have prior information on the available information and to take the computing costs into account
→ meta-optimization is engaged in an infinite regress, which turns out to be a vicious one (Mongin-Walliser)
Introduction (2)
¤ bounded rationality was initially defined statically:
- fixed environment
- fixed beliefs and preferences
- fixed choice rule (combining beliefs and preferences)
¤ learning processes later introduced partial dynamics:
- non-stationary environment (game)
- beliefs revised when new information comes in
- preferences that become endogenous as more experience is acquired
- adaptive choice rule
→ comparison by distinguishing information gathering, information treatment and choice
→ treatment in various contexts (epistemic logics, decision theory)
Statics: limited information on context
¤ uncertainty on context
- plainly probabilistic
- hierarchical but always probabilistic: ambiguity
- non-probabilistic: qualitative probabilities, belief functions
Ex: Choquet utility maximization (Gilboa-Schmeidler)
¤ unawareness (not knowing, and not knowing that one does not know)
- treated in epistemic logics
Ex: precautionary principle
¤ limited crossed beliefs
- p-accuracy reasoning
- k-level reasoning
- crossed awareness (Meier et al.)
Ex: cognitive hierarchy model (Camerer)
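Choquet utility replaces the probability measure with a capacity. A capacity is a monotone set function that need not be additive. The following is a minimal Python sketch of the Choquet integral over a finite set of states. The capacity built by distorting a uniform probability (`distorted_uniform_capacity`) is a hypothetical illustration, as are the rain/sun payoffs; neither is taken from the slides.

```python
from typing import Callable, Dict, FrozenSet, Hashable

Capacity = Callable[[FrozenSet[Hashable]], float]


def choquet_utility(utilities: Dict[Hashable, float], capacity: Capacity) -> float:
    """Choquet integral of state-contingent utilities w.r.t. a normalized capacity.

    States are ranked from best to worst; each utility step is weighted by the
    capacity of the event "utility is at least this high". Assumes the capacity
    of the whole state space equals 1.
    """
    ranked = sorted(utilities, key=utilities.get, reverse=True)
    total = 0.0
    upper_event: FrozenSet[Hashable] = frozenset()
    for i, state in enumerate(ranked):
        upper_event = upper_event | {state}
        # utility of the next-best state (0 after the worst one)
        next_u = utilities[ranked[i + 1]] if i + 1 < len(ranked) else 0.0
        total += (utilities[state] - next_u) * capacity(upper_event)
    return total


def distorted_uniform_capacity(n_states: int,
                               distortion: Callable[[float], float]) -> Capacity:
    """Hypothetical capacity: a distortion of the uniform probability of an event."""
    return lambda event: distortion(len(event) / n_states)


payoffs = {"rain": 10.0, "sun": 0.0}
additive = distorted_uniform_capacity(2, lambda p: p)          # ordinary probability
pessimistic = distorted_uniform_capacity(2, lambda p: p ** 2)  # convex capacity
print(choquet_utility(payoffs, additive))     # 5.0 (plain expected utility)
print(choquet_utility(payoffs, pessimistic))  # 2.5
```

With a convex capacity, the same act is valued below its expected utility. This is one standard way of representing aversion to ambiguity.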
Statics: limited information on preferences
¤ multidimensional preferences
- preferences are decomposed and simplified
Ex: satisficing (Simon), elimination by aspects (Tversky)
¤ random preferences
- preferences correspond to alternative ‘moods’
Ex: discrete choice model, quantal response model
¤ context-dependent preferences
- situated preferences
Ex: context (history)-dependent aspiration levels for satisficing
- reference points (status quo, norm)
Ex: reference point separating gains from losses in EU
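The ‘random preferences’ item can be made concrete with the multinomial logit rule. This rule follows from adding i.i.d. extreme-value (Gumbel) noise to utilities. It is also the usual functional form of the quantal response model. In the sketch below, the function name and the precision parameter λ (`precision`) are our own illustrative choices.

```python
import math
from typing import List, Sequence


def logit_probabilities(utilities: Sequence[float], precision: float) -> List[float]:
    """Multinomial logit (quantal) choice: P(a) proportional to exp(precision * u(a)).

    precision = 0 gives uniformly random choice; a large precision approaches
    strict utility maximization. Assumes precision >= 0.
    """
    best = max(utilities)  # shift by the maximum to avoid overflow in exp
    weights = [math.exp(precision * (u - best)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


utilities = [1.0, 2.0, 0.0]
for lam in (0.0, 1.0, 10.0):
    print(lam, [round(p, 3) for p in logit_probabilities(utilities, lam)])
```

Raising the precision moves the rule continuously from pure noise towards optimization.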
Statics: simplified choice rule
¤ limited logical omniscience
- treated in epistemic logics
Ex: satisficing (sequential examination of actions)?
¤ finite number of internal states
- simple expression of ‘computational complexity’
Ex: finite automata
¤ computation costs
- approximate cost of mental calculus
Ex: basic operations
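The following sketch treats satisficing as a sequential examination of actions. It counts evaluations as a crude stand-in for computation cost. Three features are assumptions of this illustration: the order of examination, the fixed aspiration level, and the returned counter. The sketch does not model a moving aspiration level (dynamic satisficing).

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

A = TypeVar("A")


def satisfice(actions: Iterable[A], evaluate: Callable[[A], float],
              aspiration: float) -> Tuple[Optional[A], int]:
    """Examine actions in the given order and stop at the first one whose value
    reaches the aspiration level.

    Returns the chosen action (None if no action is satisfactory) together with
    the number of evaluations performed, a crude proxy for computation cost.
    """
    examined = 0
    for action in actions:
        examined += 1
        if evaluate(action) >= aspiration:
            return action, examined
    return None, examined


values = {"a": 3.0, "b": 7.0, "c": 9.0, "d": 5.0}
print(satisfice(values, values.get, aspiration=6.0))  # ('b', 2)
print(max(values, key=values.get))                    # c (an optimizer needs all 4 evaluations)
```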
Dynamics: information search on context
¤ exogenous information
- purchased from specialized institutes and characterized by its value (as opposed to its cost)
Ex: signals about the actual state (correlated with it)
→ limited relevance
¤ endogenous free information
- results from repeated observation
Ex: observation of the other’s actions in fictitious play
→ memory constraints
→ scope constraints (information neighbourhood)
¤ endogenously induced information
- results from a voluntary (suboptimal) action and is characterized by its value (as opposed to the loss of utility)
Ex: search procedures
→ ambiguous interpretation
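The value of exogenous information can be computed in two steps. First, find the best expected payoff when the decision-maker observes the signal and updates by Bayes' rule. Then subtract the best expected payoff without the signal. The sketch assumes a finite state space, known signal likelihoods and risk-neutral payoffs; the payoff table is hypothetical. The signal is worth buying only if this value exceeds its price. Note that computing the value already presupposes knowing the likelihoods, which is where the meta-optimization regress starts.

```python
from typing import Dict, Sequence


def best_expected_payoff(payoffs: Dict[str, Sequence[float]],
                         beliefs: Sequence[float]) -> float:
    """Highest expected payoff over actions, given beliefs over states."""
    return max(sum(u * p for u, p in zip(row, beliefs)) for row in payoffs.values())


def value_of_information(payoffs: Dict[str, Sequence[float]], prior: Sequence[float],
                         likelihood: Sequence[Sequence[float]]) -> float:
    """Ex ante value of observing a signal before choosing.

    likelihood[s][m] is the probability of message m in state s. The decision-maker
    updates by Bayes' rule and then picks the best action for each message.
    """
    n_states, n_messages = len(prior), len(likelihood[0])
    with_signal = 0.0
    for m in range(n_messages):
        joint = [prior[s] * likelihood[s][m] for s in range(n_states)]
        p_m = sum(joint)
        if p_m > 0:
            posterior = [j / p_m for j in joint]
            with_signal += p_m * best_expected_payoff(payoffs, posterior)
    return with_signal - best_expected_payoff(payoffs, prior)


# Hypothetical example: two equally likely states, a risky and a safe action.
payoffs = {"invest": [-10.0, 10.0], "safe": [0.0, 0.0]}
prior = [0.5, 0.5]
accurate_80 = [[0.8, 0.2], [0.2, 0.8]]  # signal is correct with probability 0.8
print(value_of_information(payoffs, prior, accurate_80))  # about 3.0
```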
Dynamics: information search on one’s own preferences
¤ observation of one’s own past utility of actions
- assuming that choice utility = (expected) felt utility
Ex: CPR model
→ partial preferences (incompleteness)
¤ observation of others’ utility of actions
- assuming that others’ utility = one’s own utility (in the same situations)
Ex: imitation of successful opponents
→ biased preferences
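Imitation of successful opponents is often formalized as an ‘imitate the best’ rule. The sketch below is one such rule; the name `imitate_best` and the strict-improvement condition are assumptions. The rule takes others' realized payoffs as the payoffs one would have obtained in their place. This is exactly the assumption that makes the resulting preferences biased.

```python
from typing import Hashable, Iterable, Tuple


def imitate_best(own_action: Hashable, own_payoff: float,
                 observed: Iterable[Tuple[Hashable, float]]) -> Hashable:
    """'Imitate the best' rule: adopt the action of the observed agent with the
    highest payoff if it did strictly better than oneself, else keep one's action.

    Others' realized payoffs are treated as if they were one's own potential
    payoffs (same situation assumption).
    """
    best_action, best_payoff = own_action, own_payoff
    for action, payoff in observed:
        if payoff > best_payoff:
            best_action, best_payoff = action, payoff
    return best_action


neighbours = [("defect", 3.0), ("cooperate", 1.0)]
print(imitate_best("cooperate", 2.0, neighbours))  # defect
print(imitate_best("cooperate", 4.0, neighbours))  # cooperate
```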
Dynamics: treatment of information about context
¤ expectation process
- especially of the other’s strategy
- stationarity assumption → extrapolative expectations
Ex: fictitious play (probability = frequency of past actions)
¤ belief revision procedure
- 3 contexts: updating, revising, focusing
- possible contradiction between initial belief and message
→ simplified or distorted Bayes rule (judgment biases)
Ex: weight between initial belief and message
¤ reconstruction of structural information
- 3 types of information: factual (past), structural (constant), strategic (future)
- pattern recognition (trends, cycles)
- revelation of others’ preferences (abductive process)
Ex: reputation effect
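One way to make the ‘weight between initial belief and message’ precise is to raise the prior and the likelihood to separate powers before normalizing. This parametrization (`alpha`, `beta`) is an assumption chosen for illustration. With both weights equal to 1, it reduces to Bayes' rule.

```python
from typing import List, Sequence


def distorted_bayes(prior: Sequence[float], likelihood: Sequence[float],
                    alpha: float = 1.0, beta: float = 1.0) -> List[float]:
    """Posterior proportional to prior**alpha * likelihood**beta.

    alpha = beta = 1 is Bayes' rule; alpha < 1 under-weights the initial belief
    (base-rate neglect); beta < 1 under-weights the message (conservatism).
    """
    weights = [p ** alpha * lik ** beta for p, lik in zip(prior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]


prior = [0.8, 0.2]       # initial belief over two states
likelihood = [0.3, 0.9]  # probability of the observed message in each state
print(distorted_bayes(prior, likelihood))             # Bayes: about [0.571, 0.429]
print(distorted_bayes(prior, likelihood, beta=0.5))   # conservative: closer to the prior
print(distorted_bayes(prior, likelihood, alpha=0.0))  # prior ignored: about [0.25, 0.75]
```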
Dynamics: treatment of information about one’s own preferences
¤ performance indices
- (average or cumulative) index for each action
- stationarity assumption → proxy for the utility function
Ex: CPR rule
¤ adaptation of aspiration levels
- adaptive level for a global index (for instance, best past utility)
→ proxy for the utility level
Ex: dynamic satisficing (Simon)
¤ reconstruction of structural information
- design of relative vs absolute preferences
Ex: regret matching (unconditional regret index: the difference, over past play, between the utility a given strategy would have yielded against the others’ implemented strategies and the utility actually obtained)
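Regret matching works in two steps. First, compute each action's average unconditional regret. Then play each action with probability proportional to its positive regret. Several details are assumptions of this sketch: the rock-paper-scissors payoffs, the single-opponent history format, and the uniform fallback when no regret is positive.

```python
from typing import Callable, List, Sequence, Tuple


def average_regrets(history: Sequence[Tuple[int, int]], n_actions: int,
                    payoff: Callable[[int, int], float]) -> List[float]:
    """Unconditional regret of each action k: average over past rounds of
    payoff(k, others_t) - payoff(own_t, others_t)."""
    if not history:
        return [0.0] * n_actions
    totals = [0.0] * n_actions
    for own, others in history:
        realized = payoff(own, others)
        for k in range(n_actions):
            totals[k] += payoff(k, others) - realized
    return [t / len(history) for t in totals]


def regret_matching(regrets: Sequence[float]) -> List[float]:
    """Mixed strategy proportional to positive regrets (uniform if none is
    positive, an assumption of this sketch)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def rps_payoff(a: int, b: int) -> float:
    """Rock-paper-scissors with 0 = rock, 1 = paper, 2 = scissors."""
    return 0.0 if a == b else (1.0 if (a - b) % 3 == 1 else -1.0)


history = [(0, 1)] * 3  # we played rock three times, the opponent played paper
r = average_regrets(history, 3, rps_payoff)
print(r)                   # [0.0, 1.0, 2.0]
print(regret_matching(r))  # about [0.0, 0.333, 0.667]
```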
Dynamics: adaptive choice rule (1)
¤ inertial behaviour
- repeat an action if its past payoff was sufficient
Ex: reaction to aspiration levels (continue if the levels are reached)
¤ exploration behaviour
- random exploration, fixed or decreasing
- directed exploration
Ex: randomized fictitious play
¤ exploitation behaviour
- quasi-optimizing behaviour
Ex: fictitious play
¤ stochastic reinforcement behaviour
- noisy best response
- stochastic matching (choice probability monotonic in utility)
→ implicit exploration-exploitation dilemma
Ex: CPR (decreasing exploration)
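The CPR rule, read here as cumulative proportional reinforcement, can be sketched as a probability-proportional draw over cumulative payoff indices. Exploration decreases automatically: as the indices grow, each new payoff shifts the choice probabilities less. Three elements are hypothetical: the class name, the initial weight of 1, and the two-armed example.

```python
import random
from typing import List, Optional


class CPRLearner:
    """Cumulative proportional reinforcement (sketch).

    Each action keeps an index equal to an initial weight plus the sum of the
    payoffs it has earned; actions are drawn with probability proportional to
    their index. Payoffs are assumed strictly positive.
    """

    def __init__(self, n_actions: int, initial_weight: float = 1.0,
                 rng: Optional[random.Random] = None) -> None:
        self.index = [initial_weight] * n_actions
        self.rng = rng if rng is not None else random.Random()

    def probabilities(self) -> List[float]:
        total = sum(self.index)
        return [w / total for w in self.index]

    def choose(self) -> int:
        return self.rng.choices(range(len(self.index)), weights=self.index)[0]

    def reinforce(self, action: int, payoff: float) -> None:
        if payoff <= 0:
            raise ValueError("this sketch assumes strictly positive payoffs")
        self.index[action] += payoff


# Hypothetical two-armed problem: action 1 always pays 2, action 0 always pays 1.
learner = CPRLearner(2, rng=random.Random(42))
for _ in range(1000):
    a = learner.choose()
    learner.reinforce(a, 2.0 if a == 1 else 1.0)
print(learner.probabilities())
```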
Dynamics: adaptive choice rule (2)
¤ imitation
- grounded on complementary preferences (preferential mimetism)
- grounded on information differences (informational mimetism)
- grounded on better experience (experienced mimetism)
Ex: plain diffusion model; imitation of successful opponents
¤ analogy-based reasoning
- previous contexts (case-based reasoning)
- repetitive game structures
Ex: case-based rule (Gilboa-Schmeidler); analogical equilibrium (Jehiel)
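The case-based rule can be illustrated by scoring each action with a similarity-weighted sum of the utilities it obtained in past cases, in the spirit of Gilboa and Schmeidler's case-based decision theory. Several choices here are hypothetical: problems described by a single number, the exponential similarity function, and the memory contents.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Case = Tuple[float, Hashable, float]  # (past problem, action taken, utility obtained)


def cbdt_scores(problem: float, actions: Sequence[Hashable], memory: List[Case],
                similarity: Callable[[float, float], float]) -> Dict[Hashable, float]:
    """Case-based score of each action: similarity-weighted sum of the utilities
    obtained when the action was used in past problems (0 if never used)."""
    scores = {a: 0.0 for a in actions}
    for past_problem, action, utility in memory:
        if action in scores:
            scores[action] += similarity(problem, past_problem) * utility
    return scores


def cbdt_choice(problem: float, actions: Sequence[Hashable], memory: List[Case],
                similarity: Callable[[float, float], float]) -> Hashable:
    """Pick the action with the highest case-based score."""
    scores = cbdt_scores(problem, actions, memory, similarity)
    return max(actions, key=scores.__getitem__)


def similarity(p: float, q: float) -> float:
    """Hypothetical similarity: decays exponentially with distance."""
    return math.exp(-abs(p - q))


# Hypothetical setting: problems are temperatures.
memory = [(30.0, "swim", 5.0), (10.0, "swim", -3.0), (12.0, "read", 2.0)]
print(cbdt_choice(28.0, ["swim", "read"], memory, similarity))  # swim
print(cbdt_choice(11.0, ["swim", "read"], memory, similarity))  # read
```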
Dynamics: adaptive choice rule (3)
¤ restricted choice rules
- specific action set, for instance unidimensional
Ex: stubborn rule (Laslier-Walliser)
- specific beliefs, for instance objective probabilities
Ex: stopping rules in search
- specific preferences, for instance multicriteria choice
Ex: pairwise (2 by 2) choice + synthesis
¤ context-adaptive choice rules
- parlour games: chess, cards, Cluedo
Ex: keep pawns tight
- sports
Ex: throwing a ball at a constant angle
- labyrinths, puzzles
Ex: keep-right rule
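Here is a stopping rule in search with objective probabilities. Suppose offers are i.i.d. uniform on [0, 1] and each draw costs c. The classical optimal rule for a risk-neutral searcher without discounting is a reservation-value rule: accept the first offer at or above r, where c = (1 - r)^2 / 2. The uniform distribution and the cost value are assumptions of this sketch.

```python
import math
import random
from typing import Tuple


def reservation_value_uniform(cost: float) -> float:
    """Reservation value r for offers X ~ U[0, 1] drawn at a cost per draw.

    r solves cost = E[max(X - r, 0)] = (1 - r)**2 / 2, the indifference between
    stopping with r and paying for one more draw. Requires 0 < cost <= 1/2.
    """
    if not 0.0 < cost <= 0.5:
        raise ValueError("cost must lie in (0, 1/2]")
    return 1.0 - math.sqrt(2.0 * cost)


def search(cost: float, rng: random.Random, max_draws: int = 10_000) -> Tuple[float, int]:
    """Draw offers until one reaches the reservation value; return (offer, draws)."""
    r = reservation_value_uniform(cost)
    for draws in range(1, max_draws + 1):
        offer = rng.random()
        if offer >= r:
            return offer, draws
    raise RuntimeError("no acceptable offer within max_draws")


offer, draws = search(0.02, random.Random(1))
print(reservation_value_uniform(0.02))     # about 0.8
print(offer, draws, offer - 0.02 * draws)  # accepted offer, draws, net payoff
```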
Asymptotic results
¤ system’s trajectory
- transitory state
- asymptotic state (speed of convergence)
→ different time scales (role of random shocks)
¤ convergence of expectations
- towards locally rational ones
¤ convergence of actions (or strategies)
- elimination of (strictly) dominated strategies
- notions of convergence towards equilibrium states
- convergence in time average or action by action
Ex: fictitious play
- convergence towards a unique equilibrium or one of multiple (point) equilibria (selection when exploration vanishes)
- cyclical and chaotic attractors
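Fictitious play on matching pennies shows convergence in time average without convergence action by action. Each round's action is a pure best response that keeps cycling. Yet the empirical frequencies approach the mixed equilibrium (1/2, 1/2), as the classical convergence result for two-player zero-sum games predicts. This sketch makes its own choices for the payoff matrices, the lowest-index tie-breaking and the number of rounds.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def best_response(payoffs: Matrix, opponent_counts: Sequence[int]) -> int:
    """Action maximizing expected payoff against the opponent's empirical
    frequencies (raw counts give the same ranking); ties go to the lowest index."""
    scores = [sum(u * c for u, c in zip(row, opponent_counts)) for row in payoffs]
    return max(range(len(scores)), key=scores.__getitem__)


def fictitious_play(payoffs_1: Matrix, payoffs_2: Matrix,
                    rounds: int) -> Tuple[List[float], List[float]]:
    """Simultaneous fictitious play; payoffs_i[own_action][opponent_action].
    Returns each player's time-average (empirical) frequencies of play."""
    counts_1 = [0] * len(payoffs_1)
    counts_2 = [0] * len(payoffs_2)
    for _ in range(rounds):
        a1 = best_response(payoffs_1, counts_2)
        a2 = best_response(payoffs_2, counts_1)  # both respond to past play only
        counts_1[a1] += 1
        counts_2[a2] += 1
    return [c / rounds for c in counts_1], [c / rounds for c in counts_2]


# Matching pennies: player 1 wants to match, player 2 wants to mismatch.
matcher = [[1.0, -1.0], [-1.0, 1.0]]
mismatcher = [[-1.0, 1.0], [1.0, -1.0]]
freq_1, freq_2 = fictitious_play(matcher, mismatcher, 10_000)
print(freq_1, freq_2)
```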
Conclusion (1)
¤ dispersed models, even if there are two main classes (grounded on cognitive capacities)
- belief-based learning
Ex: fictitious play
- reinforcement learning
Ex: CPR
¤ combination of models
- models depending on choice context and results
- hybrid models
- models with heterogeneous agents
¤ need to consider the precise reasoning modes followed by agents:
- counterfactual reasoning (simulation of opponents)
- abductive reasoning (detection of structural or behavioral regularities)
- analogical reasoning (situations treated as similar)
- taxonomical reasoning (categorization)
Conclusion (2)
¤ possibility of meta-learning
- belief revision rule
Ex: parameter trading off initial belief and message in an extended Bayes rule
- preferences
Ex: degree of altruism in individual preferences
- choice rule
Ex: parameter in the logit rule
→ learning levels again give rise to an infinite regress (to stop it, the highest level has to be given)
¤ infinite regress stopped by an evolutionary process, but
- a mix of an evolutionary process (capacities and constraints imposed by evolution) and a cultural process (capacities and constraints conditioned by society)
- a very slow time scale (compared with a fluctuating environment)
- the concrete mechanism is not exhibited