Instrumental or operant conditioning

The instrumental paradigm • S --- R --- SR • In a stimulus situation (S), a response (R) is followed by a reinforcing stimulus (SR). • Comparison with classical conditioning: • Classical: CR is elicited. Operant: R is emitted. • Classical: CR prepares for the US. Operant: R is made to get the SR. • Classical: Learns contingency. Operant: Learns contingency.

More similarities • Practice effects • Extinction and spontaneous recovery • Delay effects • Contingency-dependent • Blocking and configural learning • Generalization and discrimination • Associative bias

What is associated? • S - R? • Thorndike’s Law of effect: Contiguity • But Tinklepaugh (1928) showed that organisms learn to expect particular reinforcers: Monkeys given a leaf of lettuce showed disappointment when they were expecting a slice of banana. So: • R - SR ? Or • S - R - SR ? Colwill and Rescorla (1985...)

Colwill and Rescorla studies • Phase 1: R1 (bar press) --> SR1 (salty chow) and R2 (chain pull) --> SR2 (tasty chow) • Phase 2: Feed with SR1 (salty chow) outside the experimental chamber. • Test: The animal pulls the chain (R2) rather than pressing the bar (R1). • Analogous results were found for SR devaluation. • Clearly, the SR is involved in the learning process.
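The logic of this design can be sketched as a toy comparison of two accounts. This is only an illustration, not the authors' analysis: the function name, the equal habit strengths, and the outcome values after satiation are all hypothetical numbers chosen to make the contrast visible. A pure S - R account ignores the current value of the reinforcer, while an account that includes the SR weights each response by the value of its own outcome.

```python
# Toy contrast between an S-R account and an account that includes the SR.
# All numbers are hypothetical; only the ordering of the predictions matters.

def response_strengths(outcome_values, use_outcome_value):
    """Return the predicted strength of each response at test."""
    habit = {"bar press": 1.0, "chain pull": 1.0}  # assumed equal training
    outcome_of = {"bar press": "SR1", "chain pull": "SR2"}
    if not use_outcome_value:
        # S-R account: strength depends only on past training.
        return dict(habit)
    # Account including the SR: strength scales with the current outcome value.
    return {r: habit[r] * outcome_values[outcome_of[r]] for r in habit}

# Assumed values after free feeding on SR1 (salty chow) outside the chamber.
values_after_satiation = {"SR1": 0.2, "SR2": 1.0}

s_r_account = response_strengths(values_after_satiation, use_outcome_value=False)
sr_account = response_strengths(values_after_satiation, use_outcome_value=True)
preferred = max(sr_account, key=sr_account.get)

print("S-R account:", s_r_account)
print("Account including SR:", sr_account, "-> preferred:", preferred)
```

Only the second account predicts a preference for the chain pull, which is the pattern the slide reports.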

Learning without reinforcement (cf. sensory preconditioning) • The noise of a lever press is learned in free exploration of a Skinner box. • Then the noise -- but no lever -- is paired with an SR of food. • When the lever is presented, the rats press it. • Such inferential learning of neutral stimuli is essential to learning a chain of responses leading to terminal reinforcement.

Secondary reinforcement (cf. second-order conditioning) • Learn that noise of bar press ---> food • Learn that bar press ---> noise • Animal will press bar to hear noise • An association is learned between noise and food. • Humans -- and chimpanzees -- can learn such an association between primary reinforcers and secondary reinforcers such as money or tokens, or grades.

Effects of secondary reinforcers • Secondary reinforcers help bridge delays. • Secondary reinforcers provide feedback.

The CS in instrumental learning • Organisms can generalize and discriminate from the CS in instrumental/operant conditioning as well. • The peak shift: If CS+ and CS- are both trained in an operant discrimination experiment, the maximum rate of responding will not be to CS+ but to a stimulus displaced from CS+ in the direction away from CS-, where the inhibition generalized from CS- is weaker.

The peak shift • Figure: Size of response plotted against the stimulus dimension, showing the generalization of excitation around CS+, the generalization of inhibition around CS-, and the resulting peak shift.
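The summation of the two gradients can be sketched numerically. The following is a minimal illustration under stated assumptions, not a model taken from the slides: both gradients are assumed to be Gaussian with the same width, and the stimulus values, width, and inhibition strength are hypothetical. Net responding is taken as excitation minus inhibition, floored at zero.

```python
import math

# Hypothetical stimulus values on an arbitrary dimension (assumptions).
CS_PLUS = 550.0
CS_MINUS = 560.0
WIDTH = 10.0                 # assumed spread of both gradients
INHIBITION_STRENGTH = 0.6    # assumed height of the inhibitory gradient

def gradient(stimulus, center, height):
    """Gaussian generalization gradient centered on a trained stimulus."""
    return height * math.exp(-((stimulus - center) ** 2) / (2 * WIDTH ** 2))

def response(stimulus):
    """Net responding: excitation around CS+ minus inhibition around CS-."""
    excitation = gradient(stimulus, CS_PLUS, 1.0)
    inhibition = gradient(stimulus, CS_MINUS, INHIBITION_STRENGTH)
    return max(0.0, excitation - inhibition)

# Test stimuli spanning both trained values.
test_stimuli = range(500, 601)
peak = max(test_stimuli, key=response)

print("Peak of responding at:", peak)
print("Response at peak:", round(response(peak), 3))
print("Response at CS+: ", round(response(CS_PLUS), 3))
```

With these assumed values the maximum falls on the side of CS+ away from CS-, because inhibition subtracts more from responding at CS+ than from responding at stimuli farther from CS-.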

More observations on the CS • The Gestalt psychologists and relational responding • Terrace’s errorless discrimination learning • Effectively teaches discrimination when relying on species-specific traits • Reduces emotional responses to CS-

Discrimination learning makes a dimension relevant • Figure: Two panels, each plotting response rate against pitch. • Simple generalization: Pitch is not relevant. • After discrimination training: Pitch is relevant.

Concept learning: Dimensions • Dimensional learning or attentional learning • Discrimination makes a dimension relevant • Processing capacity is limited, so learning depends on which dimension is made relevant • Except for very young children, reversal shift learning is easier than extradimensional (nonreversal) shift learning. • Reversal shift requires inhibition.

Shift learning • Training: positive and negative stimuli [figure] • Reversal shift: positive and negative stimuli [figure]

Shift learning • Training: positive and negative stimuli [figure] • Nonreversal shift: positive and negative stimuli [figure]

Shift learning • Training: positive and negative stimuli [figure] • Intradimensional shift: positive and negative stimuli [figure]

Shift learning • Training: positive and negative stimuli [figure] • Extradimensional shift: positive and negative stimuli [figure]

Reversal shift is intradimensional; nonreversal shift is extradimensional (changes dimension)
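The difference between the two shifts can be made concrete with a small sketch. The stimuli here are hypothetical (two colors crossed with two shapes) and are not the ones shown on the slides; the rule functions and helper names are also assumptions for illustration. The sketch checks which dimension predicts reward under each rule and how many stimuli change their reward status after each shift.

```python
from itertools import product

# Hypothetical stimuli: every combination of two colors and two shapes.
COLORS = ("black", "white")
SHAPES = ("square", "circle")
STIMULI = list(product(COLORS, SHAPES))
DIMENSIONS = ("color", "shape")

# Each rule returns True for a positive (rewarded) stimulus.
def training(stim):
    return stim[0] == "black"

def reversal(stim):        # same dimension, values swapped
    return stim[0] == "white"

def nonreversal(stim):     # a different dimension becomes relevant
    return stim[1] == "square"

def relevant_dimension(rule):
    """Return the dimension whose values each map to a single outcome."""
    for index, name in enumerate(DIMENSIONS):
        outcomes_by_value = {}
        for stim in STIMULI:
            outcomes_by_value.setdefault(stim[index], set()).add(rule(stim))
        if all(len(outcomes) == 1 for outcomes in outcomes_by_value.values()):
            return name
    return None

def stimuli_changed(old_rule, new_rule):
    """Count stimuli whose positive/negative status differs between rules."""
    return sum(old_rule(s) != new_rule(s) for s in STIMULI)

print("Training:   ", relevant_dimension(training))
print("Reversal:   ", relevant_dimension(reversal), stimuli_changed(training, reversal))
print("Nonreversal:", relevant_dimension(nonreversal), stimuli_changed(training, nonreversal))
```

In this design the reversal shift keeps the relevant dimension but changes the status of every stimulus, whereas the nonreversal shift changes the relevant dimension while half of the stimuli keep their old status.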

Concept learning: Categories • Pigeons can learn to peck at slides containing trees and not at slides that do not contain trees. • Pigeons can learn to peck at one key for pictures of cats and at another key for pictures of flowers. • However, pigeons find it hard to discriminate between one set of cats and another, or one set of flowers and another.

Category learning in humans • Like the pigeons, people find it easy to discriminate between categories. • Also like the pigeons, people find it hard to discriminate within categories. Examples? • And yet it is easy to discriminate within categories of which we are members. • The ease of discrimination between categories biases us toward categorical thinking.

What is the CR? • Response equivalence based on effects • Place learning: Shift strategies and foraging. • Shift strategies are harder to learn if some food remains in the first place visited. • Species-specific responding: Instinctive drift • Behavior systems analysis (Timberlake)

Predator learning • Predators have a UR of approaching and contacting the US of prey characteristics. • But they must learn effective biting and to attack motionless prey (Eibl-Eibesfeldt, 1970). • Is the learning due to instrumental conditioning? Young predators improve even if their attacks are unsuccessful. • Removing the CS for predators: Tiger attacks in northern India.

Autoshaping or sign tracking • Pigeons will learn to peck a lighted key that predicts food, even though the food is not contingent on pecking the key (Brown & Jenkins, 1968) • But the key peck is not an operant: • Pigeons autoshape even if the response is prevented by a plexiglass barrier. • Pigeons peck the key differently for food and water reinforcers (Jenkins & Moore, 1973). • Male pigeons learn to court a light that predicts access to their mates (Rackham, 1971).

Contiguity vs. contingency, again • Are R and SR connected because they are contiguous or contingent? • Hammond’s (1980) contingency study: • Phase 1: Reinforce only 5% of responses • Result: 3000 bar presses per hour • Phase 2: Continue phase 1 reinforcement, but add reward 5% of the time when no response was made. • Result: Response rate trailed off to near zero • Phases 3 & 4 repeated 1 & 2 (ABAB design)

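Hammond’s (1980) results • Figure: Rate of responding across training periods, Phases 1 to 4. • Contingency by phase, given as p(SR) after a response - p(SR) after no response: • Phase 1: .05 - 0 • Phase 2: .05 - .05 • Phase 3: .05 - 0 • Phase 4: .05 - .05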
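The phase values can be turned into a single contingency number. One common measure, assumed here rather than taken from the slides, is the difference between the probability of the SR given a response and the probability of the SR given no response. The function and variable names below are hypothetical; the probabilities are the ones listed for the four phases.

```python
def delta_p(p_sr_given_response, p_sr_given_no_response):
    """Contingency as the difference between the two conditional probabilities."""
    return p_sr_given_response - p_sr_given_no_response

# (p(SR | response), p(SR | no response)) for each phase of the ABAB design.
phases = {
    1: (0.05, 0.0),
    2: (0.05, 0.05),
    3: (0.05, 0.0),
    4: (0.05, 0.05),
}

contingency = {phase: delta_p(*probs) for phase, probs in phases.items()}

for phase, value in contingency.items():
    print(f"Phase {phase}: p(SR | R) = {phases[phase][0]}, contingency = {value}")
```

The probability of the SR following a response is the same in every phase, so R - SR contiguity is held constant; only the contingency drops to zero in Phases 2 and 4, which is where responding declined.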

More contingency-contiguity research • Partial reinforcement, contingency learning and the partial reinforcement extinction effect • Superstitious learning: • Skinner and contiguity explanations • Staddon and Simmelhag: • Interim behaviors • Terminal behaviors

Learned helplessness • Seligman and Maier’s classic study (1967) • Latent inhibition explanations: Learned irrelevance of responding • Treatment by forced success experiences • Inoculation by early success experiences • Childhood competence vs. inferiority • Classroom success in early childhood education • What about early sexual experiences? Abuse?

Associative bias • Behaviors that occur with hunger are easily conditioned with food rewards; grooming behaviors are not • Species-specific defense reactions (SSDRs): • Rats learn to flee more readily than to press a bar to terminate shock • Rats learn to press a bar more readily than to flee to obtain food • Human examples of SSDRs: Skiing downhill, driving through a skid
