Schedules of reinforcement

Schedules of reinforcement Simple schedules of reinforcement • CRF • FR • VR • FI • VI Response-rate schedules of reinforcement • DRL • DRH

Why do ratio schedules produce higher rates of responding than interval schedules? Inter-response time (IRT)

Francis sells jewelry to a local gift shop. Each time he completes 10 pairs of earrings, the shopkeeper pays him for them. This is an example of a ______ schedule of reinforcement. A. Fixed ratio B. Variable ratio C. Fixed interval D. Variable interval

Vernon is practicing his golf putting. On average, it takes him four tries before the ball goes in the hole. This is an example of a ______ schedule of reinforcement. A. Fixed ratio B. Variable ratio C. Fixed interval D. Variable interval

Sandra’s mail is delivered every day at 10:00. She checks her mailbox several times each morning, but only finds mail the first time she checks after 10:00. This is an example of a ______ schedule of reinforcement. A. Fixed ratio B. Variable ratio C. Fixed interval D. Variable interval

Paula is an eager third-grader, and loves to be called on by her teacher. Her teacher calls on her approximately once each period, although Paula is never sure when her turn will come. This is an example of a ______ schedule of reinforcement. A. Fixed ratio B. Variable ratio C. Fixed interval D. Variable interval
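
The four simple schedules differ only in what has to happen before the next response pays off: a count of responses (ratio) or a passage of time (interval), either constant (fixed) or varying around a mean (variable). A minimal Python sketch of that idea follows. The class and function names are illustrative, not from the course material, and the uniform distributions used for the variable schedules are an assumption chosen only because their mean is easy to see.

```python
import random


class RatioSchedule:
    """Reinforces a response once a required number of responses is reached."""

    def __init__(self, draw_requirement):
        self.draw = draw_requirement      # returns the next response requirement
        self.required = self.draw()
        self.count = 0

    def respond(self, t):
        # Time t is ignored: only the number of responses matters.
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.draw()
            return True
        return False


class IntervalSchedule:
    """Reinforces the first response made after a required time has elapsed."""

    def __init__(self, draw_interval):
        self.draw = draw_interval         # returns the next required wait
        self.wait = self.draw()
        self.last = 0.0                   # time of the last reinforcer

    def respond(self, t):
        if t - self.last >= self.wait:
            self.last = t
            self.wait = self.draw()
            return True
        return False


def fixed_ratio(n):
    return RatioSchedule(lambda: n)


def variable_ratio(n, rng):
    # Assumption: requirement uniform on 1..2n-1, which has mean n.
    return RatioSchedule(lambda: rng.randint(1, 2 * n - 1))


def fixed_interval(s):
    return IntervalSchedule(lambda: s)


def variable_interval(s, rng):
    # Assumption: wait uniform on [0, 2s], which has mean s.
    return IntervalSchedule(lambda: rng.uniform(0, 2 * s))


def crf():
    # Continuous reinforcement is the special case FR 1.
    return fixed_ratio(1)


# Earrings example: payment after every 10th completed pair (FR 10).
fr10 = fixed_ratio(10)
paid = [fr10.respond(t) for t in range(1, 31)]

# Mailbox example: checks at these times, mail available from t = 10 (FI 10).
fi10 = fixed_interval(10)
found_mail = [fi10.respond(t) for t in [3, 9, 10.5, 11]]
```

On the ratio schedules, extra responses bring the next reinforcer closer. On the interval schedules, responses made before the wait has elapsed earn nothing.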

Concurrent schedules of reinforcement Two schedules are in effect at the same time and the subject is free to switch from one response alternative to the other

Choice Behavior and the Matching Law The Matching Law is a mathematical statement describing the relationship between the rate of responding and the rate of reward • developed by Herrnstein • relative rate of responding on a particular lever equals the relative rate of reinforcement on that lever

The Matching Law Formula: Ra / (Ra + Rb) = Fa / (Fa + Fb), where Ra and Rb = # of responses on schedules a and b, and Fa and Fb = # (frequency) of reinforcers received as a consequence of responding on schedules a and b
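
As a quick check on how the formula is used, the sketch below computes the share of responses the matching law predicts for schedule a from the reinforcer counts, and the observed share from the response counts. The function names and the session numbers are invented for illustration.

```python
def predicted_response_share(f_a, f_b):
    """Share of responses on schedule a predicted by matching: Fa / (Fa + Fb)."""
    total = f_a + f_b
    if total == 0:
        raise ValueError("at least one reinforcer is needed")
    return f_a / total


def observed_response_share(r_a, r_b):
    """Observed share of responses on schedule a: Ra / (Ra + Rb)."""
    total = r_a + r_b
    if total == 0:
        raise ValueError("at least one response is needed")
    return r_a / total


# Hypothetical session: 40 reinforcers earned on key a, 20 on key b,
# with 1300 pecks on key a and 700 pecks on key b.
predicted = predicted_response_share(40, 20)
observed = observed_response_share(1300, 700)
deviation = observed - predicted   # negative: fewer responses on a than predicted
```

Perfect matching would give a deviation of zero. Here the predicted share is two thirds and the observed share is 0.65, so the hypothetical animal responds slightly less on the richer key than the law predicts.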

The Matching Law Herrnstein found that pigeons matched their responses on a given key to the relative frequency of reinforcement for that key. That is, the # of pecks on Key A relative to the # of pecks on Key B matched the # of rewards earned on schedule A relative to the # of rewards earned on schedule B. Similar formulas, with similar results, apply to: - magnitude of reward - immediacy/delay of reward

Evaluation of the Matching Law The matching law provides an accurate description of choice behavior in many situations, but there are exceptions and problems • overmatching • undermatching • bias • ratio versus interval schedules

Overmatching • higher rate of responding for the better of the two schedules than the matching law predicts • overmatching occurs when it is costly for a subject to switch to the less preferred response alternative (e.g., when the two levers are far apart)

Undermatching • occurs when the subject responds less than predicted on the advantageous schedule • absolute versus relative value of the amount or frequency of reward • for example, the matching law predicts subjects should make the same choice when reward magnitudes are 5 versus 3 as when the magnitudes are 10 versus 6, or 100 versus 60 • however, when absolute values are increased, the matching law is not always accurate

Experiment by Logue & Chavarro (1987) • varied absolute reward magnitude but kept the ratio at 3:1 for the left key and the right key • the authors found that the proportion of responses devoted to the better choice declined as the absolute values of the reward increased • response on left key = 3 grains/pellets of food • response on right key = 1 grain/pellet of food • the matching law worked in this example, but then they increased the absolute value of reward • response on left key = 30 grains/pellets of food • response on right key = 10 grains/pellets of food • in this example the animals responded more on the right key than the matching law would predict
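
The arithmetic behind this result can be written out. The block below assumes the magnitude version of the matching law has the same form as the frequency version, as the notes state. The symbols R_L, R_R (responses on the left and right keys) and M_L, M_R (reward magnitudes on each key) are labels introduced here for illustration.

```latex
\[
\frac{R_L}{R_L + R_R} = \frac{M_L}{M_L + M_R}
\]
% Small rewards: 3 versus 1
\[
\frac{M_L}{M_L + M_R} = \frac{3}{3 + 1} = 0.75
\]
% Large rewards: 30 versus 10
\[
\frac{M_L}{M_L + M_R} = \frac{30}{30 + 10} = 0.75
\]
```

Because only the ratio of magnitudes enters the formula, the predicted share of left-key responses is 0.75 in both conditions. An observed share below 0.75 in the 30 versus 10 condition is therefore a departure from matching, in the direction of undermatching.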

Bias • subject may have a special affinity or preference for one of the choices • a rat may prefer the R lever over the L lever or a pigeon may prefer a red key over a green key

Ratio versus interval schedules • animals do not match when given concurrent ratio schedules
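
One common way to summarize overmatching, undermatching, and bias in a single expression is the generalized matching law, usually credited to Baum (1974). It is not part of these notes and is included only as an illustration: Ra / Rb = b × (Fa / Fb)^s, where s is a sensitivity parameter and b is a bias parameter. The function and parameter names below are hypothetical.

```python
def generalized_share(f_a, f_b, sensitivity=1.0, bias=1.0):
    """Share of responses on schedule a under Ra/Rb = bias * (Fa/Fb) ** sensitivity.

    sensitivity = 1 and bias = 1 reproduce strict matching.
    sensitivity > 1 gives overmatching, sensitivity < 1 gives undermatching.
    bias != 1 shifts preference toward one alternative regardless of payoff.
    """
    ratio = bias * (f_a / f_b) ** sensitivity
    return ratio / (1 + ratio)


# Schedule a pays off three times as often as schedule b.
strict = generalized_share(30, 10)                     # strict matching
over = generalized_share(30, 10, sensitivity=2.0)      # more than matching predicts
under = generalized_share(30, 10, sensitivity=0.5)     # less than matching predicts

# Equal payoffs, but a built-in preference for alternative a.
biased = generalized_share(10, 10, bias=2.0)
```

With equal payoffs strict matching predicts a share of 0.5, so any other value in that condition reflects bias rather than sensitivity to reward.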

Theories of Matching • the matching law is merely a description of behavior • it does not say why a subject behaves the way it does • there are two main explanations of why animals match • maximization • melioration

Maximization • subjects attempt to maximize the rate of reinforcement • animals have evolved to perform in a manner that yields the greatest rate of reinforcement • can explain why subjects match with concurrent VI-VI schedules but not with concurrent ratio schedules • molecular and molar maximizing theories • according to molecular theories, animals choose whichever response alternative is most likely to be reinforced at that time • according to molar theories, animals distribute their choices to maximize reward over the long run
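
The concurrent-ratio case can be illustrated with a small calculation. On a ratio schedule the number of reinforcers depends only on the number of responses made, so the total payoff is a straight-line function of how responses are split between the two alternatives. The sketch below assumes two hypothetical ratio schedules (10 and 20 responses per reinforcer) and a fixed budget of 1000 responses. All names and numbers are illustrative.

```python
def reinforcers_earned(total_responses, share_a, ratio_a, ratio_b):
    """Expected reinforcers when a share of responses goes to ratio schedule a."""
    on_a = total_responses * share_a
    on_b = total_responses - on_a
    return on_a / ratio_a + on_b / ratio_b


# Try every split from 0% to 100% of responses on schedule a.
shares = [i / 10 for i in range(11)]
payoffs = {s: reinforcers_earned(1000, s, ratio_a=10, ratio_b=20) for s in shares}

# The split that yields the most reinforcers.
best = max(payoffs, key=payoffs.get)
```

Under these assumptions the best split puts every response on the richer ratio schedule, so a maximizing subject would show exclusive preference rather than distributing responses across both alternatives.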

Melioration • ‘make better’ • melioration mechanisms work on a time scale that is not molecular or molar • matching behavior occurs because the subject is continuously choosing the more promising option – that is, the schedule with the momentarily higher rate of reinforcement • subjects are continuously attempting to better their current chances of receiving reward by switching to the other choice

Choice with Commitment In a standard concurrent schedule of reinforcement, two (or more) response alternatives are available at the same time and the subject is free to switch from one to the other at any time. However, in some (real-life) situations, choosing one alternative makes other alternatives unavailable. In these cases, the choice may involve assessing complex, long-range goals. These situations can be studied in the lab using a concurrent-chain schedule of reinforcement

Concurrent-chain schedule Pecking the left key in the choice link puts into effect reinforcement schedule A in the terminal link. Pecking the right key in the choice link puts into effect reinforcement schedule B in the terminal link.

Self-Control Concurrent-chain schedules have been used to study ‘self-control’ in the lab, e.g., choosing a large delayed reward over an immediate small reward. With direct-choice procedures, animals often lack self-control. That is, they choose the immediate, but smaller reward. With concurrent-chain procedures, animals do show self-control. That is, they choose the larger, but delayed reward

Direct-choice procedure: pigeon chooses the immediate, small reward. Concurrent-chain procedure: pigeon chooses the schedule with the delayed, larger reward.

Chapter 7 The Associative Structure of Instrumental Conditioning

Instrumental conditioning permits the development of several types of associations The instrumental response (R) occurs in the presence of distinctive stimuli (S) and results in the delivery of the outcome (O) • S-R • S-O • R-O

The S-R Association and the Law of Effect According to Thorndike, animals form an S-R association • an association between the stimuli present in the experimental situation and the instrumental response Law of Effect • according to the law of effect, the role of the reinforcer (or response outcome) is to ‘stamp in’ an association between the contextual cues (S) and the instrumental response (R) • an important implication of the Law of Effect is that instrumental conditioning does not involve learning about the reinforcer

Expectancy of Reward and the S-O Association It seems intuitive to think that instrumental conditioning would involve the subject learning to expect the reinforcer. However, Thorndike and Skinner did not talk about the cognitive notion of an expectancy. The idea that reward expectancy may motivate instrumental behavior came from developments in Pavlovian conditioning. In Pavlovian conditioning, animals learn about stimuli that signal some important event. One way to look for reward expectancy is to consider how Pavlovian processes might be involved in instrumental conditioning

Modern Two-Process Theory The instrumental response is motivated by two factors • first, the presence of S comes to evoke the response directly, through a Thorndikian S-R association • second, the instrumental response comes to be made in response to the expectancy of reward because of an S-O association • through the S-O association, S comes to motivate the instrumental behavior by activating a central emotional state • the implication is that the rate of an instrumental response will be modified by the presentation of a classically conditioned stimulus

Modern Two-Process Theory Studies that evaluate modern two-process theory employ a transfer-of-control experimental design • phase 1 = operant conditioning • phase 2 = Pavlovian conditioning • phase 3 = transfer phase • the subjects are allowed to engage in the instrumental response and the CS from phase 2 is periodically presented to observe its effect on the rate of the instrumental response • where have we seen this before??? • CER (Conditioned Emotional Response) procedure

Evidence of R-O Associations Neither the S-R nor the S-O association involves a direct link between the R and the outcome, but an R-O association intuitively makes sense. A common technique for assessing R-O associations involves devaluing the reinforcer after conditioning to see if this decreases the instrumental response. Read the experiment by Colwill & Rescorla (1986) described on pp. 197-98 of the textbook
