Instrumental Conditioning

Instrumental conditioning is also called operant conditioning.

Instrumental Conditioning Procedures
• Positive Reinforcement: response increases
• Punishment: response decreases
• Omission Training: response decreases
• Negative Reinforcement: response increases
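The four procedures can be sketched as a small lookup table. The slide gives only the procedure names and the direction of the behavior change; the two keys used here (whether the response produces or prevents the outcome, and whether the outcome is appetitive or aversive) are filled in from the standard textbook definitions, and the names `PROCEDURES` and `classify_procedure` are hypothetical, chosen for illustration.

```python
# A minimal sketch of the 2x2 classification of instrumental procedures.
# ASSUMPTION: the keys (contingency, outcome type) come from the standard
# textbook definitions, not from the slide itself.
#   "produces" = the response produces the outcome
#   "prevents" = the response removes or prevents the outcome
PROCEDURES = {
    ("produces", "appetitive"): ("positive reinforcement", "increases"),
    ("produces", "aversive"): ("punishment", "decreases"),
    ("prevents", "appetitive"): ("omission training", "decreases"),
    ("prevents", "aversive"): ("negative reinforcement", "increases"),
}


def classify_procedure(contingency, outcome):
    """Return (procedure name, direction of change in responding)."""
    return PROCEDURES[(contingency, outcome)]


if __name__ == "__main__":
    for (contingency, outcome), (name, effect) in PROCEDURES.items():
        print(f"response {contingency} {outcome} outcome: {name}, response {effect}")
```

Note that both reinforcement procedures increase responding and both of the other procedures decrease it; "positive" and "negative" describe the contingency, not whether the outcome is pleasant.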

Instrumental Conditioning involves three key elements:
• a response (R)
• an outcome (O), the reinforcer
• a relation, or contingency, between R and O

The Instrumental Response
• usually an arbitrary motor response
• for example, bar-pressing has nothing to do with eating food
• there are limits on the types of responses that can be modified by instrumental conditioning
• relevance, or belongingness, is an issue in instrumental conditioning as well as in Pavlovian conditioning

Relevance, or Belongingness, in Instrumental Conditioning
• Certain responses naturally ‘belong with’ the reinforcer because of the animal’s evolutionary history
• Just as not all CSs are equally associable with all USs, not all responses are equally conditionable with all reinforcers

Shettleworth tried to condition various behaviors with food reward in hamsters
• used a number of different behaviors, e.g., digging and face-washing
• some responses are more relevant to food reward than others
• behaviors such as digging increase the chances of coming into contact with food
• face-washing won’t increase the chances of coming into contact with food; it may even interfere with food-related behaviors

The Brelands trained many different species to perform tricks for ads, movies, etc. (e.g., pigs putting coins in a piggy bank).

Instinctive Drift
• The Brelands often found that once the response was trained, it would deteriorate; other “instinctive” behaviors (e.g., rooting the coins) would “drift” in and interfere with performance of the operant response.
• The pigs treated the coins as if they were food, and these food-related behaviors interfered with the response the Brelands were trying to condition.

The Instrumental Reinforcer
Increases in the quantity or quality of the reinforcer can increase the rate of responding
• Experiment by Hutt (1954), described in the text
• In runway experiments, animals will run faster for a bigger reward
Responding to a particular reward also depends on an animal’s past experience with other reinforcers
• Experiment by Mellgren (1972), described in the text
• Experiment by Crespi (1942)

Experiment by Crespi (1942)
3 groups of rats were given 20 trials to run down an alleyway for food
• Group 1: large reward (64 pellets)
• Group 2: medium reward (16 pellets)
• Group 3: small reward (4 pellets)

Crespi (1942)
In phase 2, the reward level was switched for 2 groups
• Group 1: 64 pellets → 16 pellets; negative contrast
• Group 2: 16 pellets → 16 pellets; unshifted comparison group
• Group 3: 4 pellets → 16 pellets; positive contrast
Crespi compared the groups that were switched to 16 pellets from a large or small reward with the group consistently given 16 pellets
• Positive contrast (4 → 16 pellets): ran faster than the unshifted group
• Negative contrast (64 → 16 pellets): ran slower than the unshifted group

Positive and negative contrast indicate that behavior is not affected by current conditions alone; performance is also affected by previous reward conditions.
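The logic of the shift design can be sketched in a few lines. This is only a qualitative illustration of the comparison described above: the pellet counts come from the slides, the slides report no running speeds so none are invented here, and the function name `classify_shift` is hypothetical.

```python
# A minimal sketch of the reward-shift comparison in Crespi's design.
# Pellet counts are taken from the slides; the predicted direction is
# qualitative only (relative to the group kept at 16 pellets throughout).
PHASE_1_PELLETS = {"Group 1": 64, "Group 2": 16, "Group 3": 4}
PHASE_2_PELLETS = 16  # every group receives 16 pellets in phase 2


def classify_shift(previous_pellets, current_pellets):
    """Label a reward shift and the expected running speed vs. the unshifted group."""
    if current_pellets > previous_pellets:
        return ("positive contrast", "faster")
    if current_pellets < previous_pellets:
        return ("negative contrast", "slower")
    return ("no shift", "baseline")


predictions = {
    group: classify_shift(pellets, PHASE_2_PELLETS)
    for group, pellets in PHASE_1_PELLETS.items()
}

if __name__ == "__main__":
    for group, (label, speed) in predictions.items():
        print(f"{group}: {PHASE_1_PELLETS[group]} -> {PHASE_2_PELLETS} pellets, {label}, runs {speed}")
```

The point of the sketch is that the prediction depends on the previous reward as well as the current one: all three groups receive the same 16 pellets in phase 2, yet the expected performance differs.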

The Response–Reinforcer Relation
Two types of relationships exist between a response and a reinforcer:
• temporal relationship: temporal contiguity refers to the delivery of the reinforcer immediately after the response
• causal relationship: response-reinforcer contingency refers to the extent to which the response is necessary and sufficient for the occurrence of the reinforcer
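One common way to put a number on response-reinforcer contingency is to compare the probability of the outcome when the response occurs with its probability when the response does not occur. The slides do not give a formula, so the sketch below is an assumption added for illustration: the difference P(O | R) − P(O | no R) is a widely used contingency index, and the function name and the data format (one pair of booleans per observation interval) are hypothetical.

```python
# A minimal sketch of a contingency index, ASSUMING the difference
# P(O | R) - P(O | no R) as the measure (not stated in the slides).
# A high P(O | R) means the response is close to sufficient for the outcome;
# a low P(O | no R) means the response is close to necessary for it.
def contingency(intervals):
    """intervals: list of (responded, reinforced) booleans, one per interval.

    Returns (P(O | R), P(O | no R), difference).
    """
    with_response = [o for r, o in intervals if r]
    without_response = [o for r, o in intervals if not r]
    if not with_response or not without_response:
        raise ValueError("need intervals both with and without a response")
    p_o_given_r = sum(with_response) / len(with_response)
    p_o_given_no_r = sum(without_response) / len(without_response)
    return p_o_given_r, p_o_given_no_r, p_o_given_r - p_o_given_no_r


if __name__ == "__main__":
    # Reinforcer occurs only after a response: perfect contingency.
    perfect = [(True, True)] * 4 + [(False, False)] * 4
    # Reinforcer is equally likely with or without a response: no contingency.
    none = [(True, True), (True, False), (False, True), (False, False)]
    print(contingency(perfect))
    print(contingency(none))
```

Under this index, a reinforcer delivered only after responses gives the maximum value of 1, and a reinforcer delivered independently of responding gives 0.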

Effects of temporal contiguity
Instrumental learning is disrupted by delaying the reinforcer after the response
Dickinson et al. (1992)
• rats were reinforced for lever-pressing
• the delay between occurrence of the response and delivery of the reinforcer was varied

Why is instrumental conditioning so sensitive to a delay of reinforcement?
Delay makes it difficult to figure out which response is being reinforced
There are ways to overcome the problem:
1. Provide a secondary, or conditioned, reinforcer immediately after the response, even if the primary reinforcer does not occur until later
• A secondary, or conditioned, reinforcer is a conditioned stimulus that was previously associated with the primary reinforcer
• Conditioned reinforcers can serve to ‘bridge’ a delay between the response and the primary reinforcer

2. Another technique that facilitates learning with delayed reinforcement is to mark the target response to distinguish it from other responses
• The marking procedure was demonstrated by Lieberman et al. (1979)
• They tested whether rats could learn a correct turn or choice in a maze despite a long delay of reward

Subjects were placed in the start box and allowed to choose one of two alleyways (white was correct)
Three groups:
• Group 1 (Light): after making a choice, rats in this group received a 2 s light (regardless of choice) and were allowed to go to the delay box
• Group 2 (Noise): treated the same, except with a 2 s noise
• Group 3 (Control): no stimulus; went directly to the delay box after the choice
All rats were confined to the delay box for 2 min, then allowed to go to the goal box. Food was given, but only if they had chosen white.

Results:
• Control group stayed at approximately 50% correct
• Light and Noise groups learned the discrimination (i.e., learned to choose white over black)

So why did the light and noise improve discrimination learning?
• the cues helped to mark the choice response in memory
• after making a choice and receiving the light or noise, subjects more effectively rehearsed the choice they had just made
• when reward was given later on, after the 2 min delay, the memory for the previous choice was stronger
• these effects of marking cannot be explained in terms of secondary or conditioned reinforcement because the marking stimulus was presented after both correct and incorrect choices
