Review of the classical study of learning in Psychology
Ido Erev, Technion and Univ of Warwick
(mostly based on the chapter "Learning and the Economics of Small Decisions," written with Ernan Haruvy for the 2nd vol. of the Handbook of Experimental Economics, edited by Kagel & Roth: http://www.utdallas.edu/~eeh017200/papers/LearningChapter.pdf)

Mainstream analyses of economic behavior assume that incentives shape behavior even when individual agents have limited understanding of the environment. The shaping process in these cases is indirect: the economic incentives determine the agents' experience, and this experience in turn drives future behavior. Consider, for example, an agent who has to decide whether to cross the road at a particular location and time. The agent is not likely to understand the exact incentive structure and compute the implied equilibria. Rather, she (he or it) is likely to respond to past experiences. The current class reviews experimental studies that explore this shaping process.
The classical study of learning in psychology (operant and classical conditioning) used many paradigms and different species:
• Thorndike's puzzle box
• Skinner box
• The card flipping task
• Probability learning
• The clicking paradigm
Two measures are taken here to clarify the observed results:
• A focus on pure decisions from experience.
• A focus on replications under a "standard" paradigm (Hertwig & Ortmann, 2002) with monetary payoffs.
The clicking paradigm
Instructions: "The current experiment includes many trials. Your task, in each trial, is to click on one of the two keys presented on the screen. Each click will be followed by the presentation of the keys' payoffs. Your payoff for the trial is the payoff of the selected key."
Example feedback screen (payoffs 0 and 1): "You selected Right. Your payoff in this trial is 1. Had you selected Left, your payoff would be 0."
This is not a test of rational economic theory; in this setting, the rationality assumption is not even wrong.
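A minimal sketch of one trial of the paradigm, assuming the two-key layout and the 0/1 payoffs shown above (the function name and structure are illustrative, not the original experiment software):

```python
import random

def run_trial(left_payoffs, right_payoffs, choice):
    """One trial of the clicking paradigm with full (obtained + forgone) feedback.
    The payoff lists are illustrative placeholders."""
    left = random.choice(left_payoffs)
    right = random.choice(right_payoffs)
    obtained = right if choice == "Right" else left
    forgone = left if choice == "Right" else right
    print(f"You selected {choice}. Your payoff in this trial is {obtained}.")
    print(f"Had you selected the other key, your payoff would be {forgone}.")
    return obtained, forgone

run_trial(left_payoffs=[0], right_payoffs=[1], choice="Right")
```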
Relationship to the "SEU correction project" (Gigerenzer & Selten, 2001)
Most studies in behavioral economics use the rational (or SEU) model as a benchmark, highlight deviations, and try to correct the model. To clarify violations, this research focuses on decisions from description (Allais, 1953; Kahneman & Tversky, 1979) and/or framing effects (Ariely, 2008). Behavioral psychologists take a different approach.
Decisions from description:
A: 5 with certainty
B: 5000 with p = 1/1000; 0 otherwise
(Note that the two options have the same expected value: 5000 × 1/1000 = 5.)
Decisions from experience: the same choice, but the payoff distributions are learned from feedback rather than described.
1. Underweighting of rare events (Barron & Erev, 2003)
[Figure: P(R), the proportion of risky choices, showing both risk seeking and risk aversion.]
• Experience-description gap: occurs in one-shot decisions from sampling (Hertwig et al., 2004).
• Implies a reversed Allais paradox: (4, .8) > 3, but (4, .2) ~ (3, .25).
• Robust to prior information (Lejarraga & Gonzalez, 2011).
• Occurs with full and with partial feedback; robust to the conversion method.
• A similar pattern is found in honey bees (Shafir et al., 2008).
• Related to Taleb's Black Swan effect.
• Sensitivity to magnitude: -20 vs. -10 (Ert & Erev, 2013).
• Consistent with Skinner's (1953) shaping procedure.
2. The payoff variability effect (Myers & Sadler, 1960; Busemeyer & Townsend, 1993)
Risk aversion? Or loss aversion? Neither: high payoff variability moves choice toward random behavior rather than consistently away from risk or losses.
3. The Big Eye effect (Ben Zion et al., 2010; Grosskopf et al., 2006)
x ~ N(0, 300), y ~ N(0, 300)
R1: x
R2: y
M: Mean(R1, R2) + 5
• A deviation from maximization, risk aversion, and loss aversion: M has the higher expected value and lower variance, yet it is under-chosen.
• Implies under-diversification.
• Robust to prior information.
4. The hot stove effect (Hogarth & Einhorn, 1992; March and Denrell, 2002): bad experiences with an option reduce its choice rate, and with it the chance to learn that those experiences were atypical.
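A sketch of the hot stove mechanism under one simple assumption (mine, not the cited papers'): the agent chooses the option with the higher mean of its own observed payoffs, with no forgone feedback, so an early bad draw from the risky option cuts off further sampling of it:

```python
import random

def hot_stove(trials=100, runs=2000):
    """Share of risky choices when choice follows observed means only.
    Risky pays +4 or -4 (mean 0); Safe pays 0. Payoffs are illustrative."""
    risky_rate = 0.0
    for _ in range(runs):
        obs = {"safe": [0.0], "risky": [random.choice([4.0, -4.0])]}
        for _ in range(trials):
            means = {k: sum(v) / len(v) for k, v in obs.items()}
            pick = "risky" if means["risky"] > means["safe"] else "safe"
            payoff = random.choice([4.0, -4.0]) if pick == "risky" else 0.0
            obs[pick].append(payoff)       # only the chosen option is updated
            risky_rate += (pick == "risky") / (trials * runs)
    return risky_rate

# The two options have the same expected value, yet the learner ends up
# avoiding the risky one: bad draws suppress the very sampling that could
# correct the impression they leave.
print(hot_stove())
```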
The reliance on small samples: summary
The main properties of decisions from experience can be captured with the assertion that people rely on small samples of experiences. The value of this assumption is supported by studies that examine free sampling (Hertwig et al., 2004) and by two open choice prediction competitions (Erev, Ert & Roth, 2010a; 2010b). A minimal implementation of the assertion is sketched below.
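In this sketch the payoffs (+32 with p = .1 vs. a sure 3) are illustrative, not a specific problem from the cited studies; on each trial the agent recalls k random experiences with the risky option and chooses it only if the sample mean beats the sure payoff:

```python
import random

def risky_rate(k, trials=10000):
    """P(choose risky) when the agent compares a k-draw sample mean to a sure 3.
    Risky: +32 with p = .1, else 0 (EV = 3.2). Illustrative payoffs."""
    count = 0
    for _ in range(trials):
        sample = [32 if random.random() < 0.1 else 0 for _ in range(k)]
        count += sum(sample) / k > 3
    return count / trials

for k in (1, 5, 25):
    print(k, risky_rate(k))
# Small k: the rare +32 is usually absent from the sample, so the risky option
# is underweighted despite its higher expected value; larger samples move
# choice toward maximization.
```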
But why would people rely on small samples (Plonsky & Erev, 2013)?
• Cognitive limitations (Hertwig et al., 2004; Fox & Hadar, 2006).
• Cognitive sophistication in a dynamic (agile) environment: assuming that the payoff distributions can change from trial to trial, reliance on past experiences in similar situations ("the contingencies of reinforcement," Skinner, 1953; Gilboa & Schmeidler, 1995) approximates the optimal strategy but implies reliance on small samples. The implementation of this idea at Amazon.com and Google (where it is known as collaborative filtering) has saved my life more than once.
Similarity-Based Learning Strategies
Payoffs of the two keys over nine trials (key "0" always pays 0; key "1" pays +1 or -1):

t:       1    2    3    4    5    6    7    8    9
Key 0:   0    0    0    0    0    0    0    0    ?
Key 1:  +1   +1   -1   +1   +1   -1   +1   +1    ?

A similarity-based strategy conditions the choice at trial 9 on the last two outcomes: in this sequence, every pair of consecutive gains (+1, +1) was followed by a loss (-1), so the strategy selects key 0 at trial 9 (see the sketch below).
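A sketch of such a strategy under one concrete reading of the rule (the function and key names are mine): choose the nonzero key only if the payoffs that followed the same last-two-outcome context in the past were positive on average:

```python
def similarity_choice(history, default="1"):
    """Choose between key "0" (always pays 0) and key "1", based on the
    experiences that followed the same last-two outcomes of key "1".
    history: list of key "1" payoffs so far. An illustrative sketch, not a
    specific published model."""
    if len(history) < 3:
        return default
    context = tuple(history[-2:])
    # What did key "1" pay right after each past occurrence of this context?
    similar = [history[i + 2] for i in range(len(history) - 2)
               if tuple(history[i:i + 2]) == context]
    if not similar:
        return default
    return "1" if sum(similar) / len(similar) > 0 else "0"

history = [1, 1, -1, 1, 1, -1, 1, 1]   # the sequence from the table above
print(similarity_choice(history))       # "0": after two gains, a loss follows
```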
Why use them? They work!
• In agile environments with unknown dynamics, similarity-based strategies achieve near-optimal performance (99.4% optimal choices).
But they are a double-edged sword:
• In a subset of rigid environments they imply underweighting of rare events.
Agile problem: the choice is between a key paying 0 ("Always 0") and a key paying +8 or -1; sensitivity to the last two outcomes tracks the payoff pattern.
Rigid problem: the same payoffs, but the distribution is fixed, so a fixed decision is optimal.
Agile problem: sensitivity to the last TEN outcomes.
Rigid problem: fixed decision. The choice is again between a key paying 0 ("Always 0") and a key paying +1 or -10.
More phenomena and more similarity-based reasoning
Probability matching and the effect of experience
Many of the early attempts to study decisions from experience used the probability learning paradigm (see Estes, 1951). The basic task in each trial is to guess whether event E (with P(E) > .5) will occur, and correct guesses are rewarded. Since P(E) > .5, the optimal response is H (predicting E) in all trials. The early studies show that after 60 trials the H-rate implies probability matching; for example, when P(E) = 0.70, the H-rate was close to 70%. Longer experience moves behavior toward maximization (an H-rate of 90% after 1000 trials; Edwards, 1960; and see Vulkan, 2000; Shanks et al., 2002). Blavatskyy (2006) shows that probability matching implies reliance on a sample of size 1. The results suggest that the sample size increases with experience, as the sketch below illustrates.
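A sketch of Blavatskyy's observation under a simple majority-vote assumption (the parameters are illustrative): the agent predicts the event that occurred in the majority of k recalled trials. With k = 1 this reproduces probability matching; a sample of roughly nine trials already yields the 90% H-rate reported after long experience:

```python
import random

def h_rate(k, p=0.7, trials=20000):
    """H-rate when the agent predicts the majority event of k recalled trials.
    k = 1 reproduces probability matching; larger k approaches maximization."""
    h = 0
    for _ in range(trials):
        sample = sum(random.random() < p for _ in range(k))
        h += sample > k / 2            # guess E when E is the sample majority
    return h / trials

for k in (1, 9, 99):
    print(k, h_rate(k))   # roughly .70, .90, and near 1.0 for p = .70
```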
Surprise-triggers-change
Evaluation of the sequential dependencies in 2-alternative studies reveals a four-fold recency pattern.
Wavy impact curves (Erev & Teodorescu, 2013)
Choice between 0 and (+10, .1; -1). The impact of a gain of +10 at trial t is initially positive (at t+1), then negative (around t+3), then positive again with a peak around t+15, after which it diminishes.
Two types of similarities:
• The peak at t+1 reflects temporal similarity.
• The dip at t+3 is explained by the observation that when wins (W) are rare, there are fewer past experiences after the sequence WLL than after LLL. Thus, a rare outcome leads to reliance on a smaller set of past experiences in the following trials, and for that reason increases underweighting of rare events. The count below illustrates the asymmetry.
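A quick illustration of the stated mechanism, simulating the (+10, .1; -1) payoff rule above: contexts ending in a rare win are themselves rare, so the set of similar past experiences available after WLL is much smaller than after LLL:

```python
import random

# Simulate the risky key: +10 with p = .1, -1 otherwise.
seq = [10 if random.random() < 0.1 else -1 for _ in range(100_000)]

def context_count(ctx):
    """Number of past experiences whose preceding three outcomes match ctx."""
    return sum(tuple(seq[i:i + 3]) == ctx for i in range(len(seq) - 3))

print("experiences after W,L,L:", context_count((10, -1, -1)))   # rare context
print("experiences after L,L,L:", context_count((-1, -1, -1)))   # ~9x larger
```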
The partial reinforcement extinction effect (Humphreys, 1938; Hochman & Erev, 2013)
Relationship to popular learning models
The assumption that people select the option that led to good outcomes in similar situations in the past implies that predicting choice behavior requires an abstraction of similarity (or of the implications of this concept). Previous research has tried to address (or justify ignoring) this observation in several ways:
1. Basic models (reinforcement learning, fictitious play, and EWA) focus on static settings and assume that only temporal similarity is important.
2. Sampling models assume that when the environment is nearly static, the most important implication of similarity-based decisions is reliance on (nearly random) small samples. One example is I-SAW; a simplified sketch in that spirit appears below.
3. Instance-based models (Gonzalez et al., 2003) assume a general similarity function that combines temporal and other forms of similarity.
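A highly simplified sketch in the spirit of the sampling-model family; it is not the published I-SAW specification, and the inertia rule, sample size, and example payoffs are illustrative assumptions:

```python
import random

def sampled_choice(past, last_choice, p_inertia=0.4, k=3):
    """One decision in the spirit of sampling models such as I-SAW.
    past: dict mapping each option to its previously observed payoffs
    (obtained and forgone). Parameters are illustrative placeholders."""
    if last_choice is not None and random.random() < p_inertia:
        return last_choice                    # inertia: repeat previous choice
    means = {}
    for option, payoffs in past.items():
        # Draw a small, nearly random sample of past experiences.
        sample = random.choices(payoffs, k=min(k, len(payoffs)))
        means[option] = sum(sample) / len(sample)
    return max(means, key=means.get)          # follow the better sample mean

past = {"R": [4, 0, 0, 4, 0], "S": [3, 3, 3, 3, 3]}
print(sampled_choice(past, last_choice="S"))
```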
Two recent choice prediction competitions highlight the advantage of the "small samples" approach. Yet this approach fails when the environment is dynamic and people learn to respond adaptively to correlations and patterns (as in the Plonsky & Erev studies). One promising direction is a machine-learning-like assumption that people consider many possible similarity functions in parallel, and start each choice by trying to select the best function.
For example, consider the decision at trial 7 of a decision maker who has experienced a sequence of outcomes from the nonzero key. The model assumes that the agent considers several similarity-based rules in parallel and follows the rule with the best past predictive record.
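A sketch of this direction under stated assumptions (the rule set, which conditions on the last j outcomes for j = 0, 1, 2, and the squared-error scoring are mine): score each candidate similarity rule by its past one-step-ahead accuracy and let the currently best rule drive the choice:

```python
def rule_prediction(history, j):
    """Predict the nonzero key's next payoff from past trials whose last-j
    outcome context matches the current one (j = 0 uses all past trials)."""
    context = tuple(history[len(history) - j:]) if j else ()
    matches = [history[i + j] for i in range(len(history) - j)
               if tuple(history[i:i + j]) == context]
    return sum(matches) / len(matches) if matches else 0.0

def best_rule_choice(history, rule_set=(0, 1, 2)):
    """Score each rule by past one-step-ahead squared error; follow the best."""
    def score(j):
        errs = [(rule_prediction(history[:t], j) - history[t]) ** 2
                for t in range(j + 1, len(history))]
        return sum(errs) / len(errs) if errs else float("inf")
    best = min(rule_set, key=score)
    return "1" if rule_prediction(history, best) > 0 else "0"

print(best_rule_choice([1, 1, -1, 1, 1, -1, 1, 1]))  # the pattern favors key "0"
```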
Our (Erev, Roth, Ert & Plonsky) next choice prediction competition will focus on dynamic choice problems.
Safe: 0 with certainty.
Risky: a gain in one class of states of nature; a loss in the other states. The state will be determined by a multi-state Markov chain. The feedback will be limited to the obtained and forgone payoffs (so the states will be only partially observable).
An estimation study with 60 different problems, and the call for the competition, will be published in January 2014. The submission deadline will be around May 2014.
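A sketch of the planned problem class under hypothetical parameters (the number of states, transition probabilities, and payoffs below are placeholders; the actual problems were to come from the estimation study):

```python
import random

# Hypothetical instance: the risky option's payoff depends on a hidden
# Markov state; the safe option always pays 0.
TRANSITION = {"good": {"good": 0.9, "bad": 0.1},   # placeholder probabilities
              "bad":  {"good": 0.2, "bad": 0.8}}
PAYOFF = {"good": 1, "bad": -5}                    # placeholder payoffs

def simulate(trials=10, state="good"):
    for t in range(trials):
        risky = PAYOFF[state]
        # Feedback: obtained and forgone payoffs only; the state stays hidden.
        print(f"t={t}: Risky={risky}, Safe=0")
        state = random.choices(list(TRANSITION[state]),
                               weights=list(TRANSITION[state].values()))[0]

simulate()
```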
Multiple alternatives (Teodorescu & Erev, 2013)
Many behavioral problems appear to reflect insufficient exploration, but in other cases people appear to exhibit over-exploration.
Problem Rare Disasters: 10% disasters (-10), 90% rewards (+1).
Problem Rare Treasures: 10% treasures (+10), 90% disappointments (-1).
The two-stage explanation: an initial choice of whether to explore (based partially on a small sample), and then a choice among the alternatives. A sketch of the first stage appears below.
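A sketch of the first stage under one concrete assumption (mine): the agent explores only if a small sample of past exploration outcomes looks better than the known status quo of 0. With the payoffs of the two problems above, this single mechanism yields under-exploration for rare treasures (EV of exploring = +0.1) and over-exploration for rare disasters (EV = -0.1):

```python
import random

def explore_rate(payoffs, p_rare, k=2, trials=10000):
    """Stage 1: explore iff the mean of a k-draw sample of past exploration
    outcomes beats the status quo (0). payoffs = (rare, common); the small
    sample size k is an illustrative choice."""
    count = 0
    for _ in range(trials):
        sample = [payoffs[0] if random.random() < p_rare else payoffs[1]
                  for _ in range(k)]
        count += sum(sample) / k > 0
    return count / trials

print("rare treasures:", explore_rate((+10, -1), 0.1))  # ~.19: too little
print("rare disasters:", explore_rate((-10, +1), 0.1))  # ~.81: too much
```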
Two explanations:
1. Learning that the agent has no control.
2. Learning that exploration is not reinforcing.
Recent competitions (Erev et al., 2010a, 2010b)
1. Individual choice tasks: http://tx.technion.ac.il/~eyalert/Comp.html
The task: predicting the proportion of risky choices in binary choice tasks in the clicking paradigm, without information concerning forgone payoffs. Two studies (estimation and competition), each with 60 conditions. We published the estimation study and challenged other researchers to predict the results of the second. The models were ranked based on their squared error. The best baseline is a predecessor of I-SAW. The winning submission, by Stewart, West & Lebiere, is based on a similar instance-based ("episodic") logic (with a quantification in ACT-R). Reinforcement learning and similar "semantic" models did not do well.
2. Market entry games: http://sites.google.com/site/gpredcomp
The task: predicting behavior in a repeated 4-person market entry game with complete feedback. In each trial each player has to choose between:
R: Entering a risky market (expected payoff decreasing with the number of entrants).
S: Staying out (a safer option).
Two studies (estimation and competition), each with 40 conditions. We published the estimation study and challenged other researchers to predict the results of the second. The models were ranked based on their squared error. The best baseline is I-SAW. The winner, by Chen et al., is a variant of I-SAW. The runner-up, by Gonzalez, Dutt & Lejarraga, uses a similar instance-based ("episodic") logic.