PSY 402 Theories of Learning Chapter 7 – Behavior & Its Consequences Instrumental & Operant Learning
Stimulus Control • Skinner discovered that stimuli (cues) provide information about the opportunity for reinforcement (reward). • The stimulus sets the occasion for the behavior. • Fading – gradually transferring stimulus control from a simple stimulus to a more complex one. • Operant behavior is controlled by both stimuli and reinforcers.
Discriminative Stimuli • Discriminative stimuli act like “occasion setters” in classical conditioning (see Chap 5). • The stimulus that signals the opportunity for responding and gaining a reward is SD. • The stimulus that signals the absence of that opportunity is SΔ (S-delta).
Types of Reinforcers • Primary reinforcer – stimuli or events that reinforce because of their intrinsic properties: • Food, water, sex • Secondary reinforcer – stimuli or events that reinforce because of their association with a primary reinforcer: • Money, praise, grades, sounds (clicks) • Called conditioned reinforcers.
Behavior Chains • Secondary (conditioned) reinforcers reward intermediate steps in a chain of behavior leading to a primary reinforcer. • Secondary reinforcers can also be discriminative stimuli that set the occasion for more responding. • Classical conditioning is the glue that holds together chains of behavior leading to a goal.
Schedules of Reinforcement • Behavior is recorded continuously by a cumulative recorder (a rotating drum). • The cumulative graph shows the rate of responding over time. • The steepness of the line indicates how quickly the rat is responding. • Hash marks indicate when a reward was given. • CRF (continuous reinforcement) – the rat is rewarded every time it does the behavior.
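The cumulative record is easy to reproduce from a raw event record. The sketch below is not from the lecture; the data and the one-observation-per-second assumption are made up for illustration. It builds the running total and plots it with matplotlib, so the slope of the line is the response rate.

```python
import matplotlib.pyplot as plt

# Hypothetical event record: 1 = bar press during that second, 0 = no press.
presses = [0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1]

# The cumulative record is the running total of responses; its slope at any
# point is the response rate (steeper = faster responding).
cumulative = []
total = 0
for p in presses:
    total += p
    cumulative.append(total)

plt.step(range(len(presses)), cumulative, where="post")
plt.xlabel("Time (s)")
plt.ylabel("Cumulative responses")
plt.title("Cumulative record")
plt.show()
```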
Ratio Schedules • Fixed Ratio (FR) – there is a fixed ratio between responding and reward. • The rat is rewarded for every xth behavior. • FR-15 means the rat gets one reward for every 15 behaviors (e.g., bar presses). • Variable Ratio (VR) – the number of responses needed varies, but averages out to a particular ratio. • VR-15 – the ratio varies but averages to 1:15.
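As a rough illustration of the two ratio schedules (a sketch under illustrative assumptions; the helper functions are hypothetical, not from the text), the code below decides whether a given bar press earns a reward under FR-15 and VR-15.

```python
import random

def fr_reward(response_count, ratio=15):
    # Fixed Ratio: every `ratio`-th response is rewarded (FR-15 by default).
    return response_count % ratio == 0

def vr_reward(ratio=15):
    # Variable Ratio: each response pays off with probability 1/ratio, so the
    # number of responses per reward varies but averages out to `ratio`.
    return random.random() < 1.0 / ratio

# Rewards earned over 300 bar presses on each schedule.
fr_total = sum(fr_reward(n) for n in range(1, 301))
vr_total = sum(vr_reward() for _ in range(300))
print(f"FR-15: {fr_total} rewards; VR-15: about {vr_total} rewards")
```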
Interval Schedules • Fixed Interval (FI) – rewards are given for the first response after a given amount of time has passed. • FI-15 means one reward is given after 15 minutes, but only if the rat does the behavior. • Variable Interval (VI) – rewards are given after varying amounts of time that average to a particular interval. • VI-15 means one reward after average of 15 min.
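The interval schedules can be sketched the same way. Again, the helper is hypothetical and the assumptions are illustrative: the animal is taken to respond once per minute, and the VI intervals are drawn from an exponential distribution averaging the programmed value.

```python
import random

def simulate_interval(interval, variable=False, session_minutes=120, seed=None):
    """Count rewards on an FI or VI schedule, assuming one response per minute."""
    rng = random.Random(seed)
    rewards, next_ready = 0, interval
    for minute in range(1, session_minutes + 1):   # the animal responds each minute
        if minute >= next_ready:                   # first response after the interval ends
            rewards += 1
            wait = rng.expovariate(1 / interval) if variable else interval
            next_ready = minute + wait
    return rewards

print("FI-15:", simulate_interval(15))                 # roughly 120 / 15 = 8 rewards
print("VI-15:", simulate_interval(15, variable=True))  # also about 8, at irregular times
```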
Effects of Schedule on Behavior • FR leads to steady responding, but post-reinforcement pauses occur after each reward. • VR leads to a high rate of responding with no pauses – the animal never knows when the next reward will occur. • FI leads to responding concentrated right before the end of each interval, with goofing off in between. • Scallops appear in the cumulative record. • VI leads to slow but steady responding – the lowest rate of the four schedules.
Compound Schedules • Multiple schedules – two or more schedules alternate, each signaled by a different SD. • Mixed schedules – schedules alternate but no stimulus signals which type is being used. • Chained schedules – completion of one schedule leads to the beginning of a new schedule (with an SD). • Tandem schedules – like chained but with no SD. • Differential reinforcement of high/low rates of responding – specifies both the behavior and a deadline (interval).
Choice • Concurrent schedules – two different types of behavior are offered, each with its own schedule of reinforcement. • Behavior on concurrent schedules follows Herrnstein’s Matching Law. • The proportion of behavior allocated to a choice is the same as the proportion of reward offered. • B1 / (B1 + B2) = R1 / (R1 + R2) or • B1 / B2 = R1 / R2
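The Matching Law translates directly into a one-line prediction. In the sketch below the function name and the reinforcement rates are illustrative, not from the text:

```python
def matching_allocation(r1, r2):
    """Herrnstein's Matching Law: the proportion of behavior allocated to
    choice 1 equals the proportion of reinforcement it provides,
    B1 / (B1 + B2) = R1 / (R1 + R2)."""
    return r1 / (r1 + r2)

# Example: choice 1 pays off 30 times per hour, choice 2 pays off 10 times.
share = matching_allocation(30, 10)
print(f"Predicted share of responses on choice 1: {share:.0%}")  # 75%
```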
The Law Works for Reward Size • The amount of responding is proportional to the relative reward sizes. • If V1 and V2 are the two reward sizes, then • B1 / B2 = V1 / V2 • The Matching Law says nothing about what people or rats are thinking. • Melioration – a strategy of shifting between the two choices until their local rates of reward are equal.
A Law for One Choice • If the total amount of behavior (B1 + B2) is K, then the rate of responding to a single choice (B1) is: • B1 = K x R1 / (R1 + Ro) • Ro is the reinforcement rate for some other choice (the reward for doing something else). • This is called the Quantitative Law of Effect because it predicts the amount of responding.
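A small numeric sketch of the Quantitative Law of Effect (the function name, K, and the rates are illustrative): holding R1 fixed, raising the reinforcement available elsewhere (Ro) lowers responding on the measured behavior, which is exactly the implication drawn on the next slide.

```python
def response_rate(r1, r_other, k=100.0):
    """Quantitative Law of Effect: B1 = K * R1 / (R1 + Ro), where K is the
    total amount of behavior available and Ro is reinforcement from
    everything else the organism could be doing."""
    return k * r1 / (r1 + r_other)

# As reinforcement from other activities (Ro) grows, responding on B1 falls.
for r_other in (10, 40, 160):
    print(f"Ro = {r_other:3d} -> B1 = {response_rate(60, r_other):.1f}")
```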
Implications of the Law • According to the Law, a particular behavior can be weakened by providing rewards for other behaviors in the environment. • Drug abuse is more likely for people who have little other reward in their lives. • Problems can be prevented by making sure there are reinforcers for pro-social behaviors. • More positive environments can be built.
Impulsiveness • Delayed gratification – the willingness to set aside an immediate reward in favor of a long-term, larger reward. • People find this difficult to do. • Self-control = delaying gratification. • Impulsive behavior is more likely when small rewards are imminent (immediate, salient).
Hot and Cold Thoughts • Imagining the desirable qualities of an immediate reward undermines self-control. • Distraction by thinking about something unrelated supports self-control. • Drug abusers have difficulty with self-control. • Impulsivity may be domain-specific (depending on the kind of reward involved). • Although mentalistic, “self-control” is defined in terms of specific behaviors and choices.
Behavioral Economics • Not all reinforcers are alike – substitutability is a continuum (varies). • Demand curve – does consumption vary with price? • Elastic commodities do, inelastic ones (necessities) do not. • Reinforcers can be substitutes, independents, or complements, depending on their demand curves.
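One common simplification of the demand-curve idea (an assumption for illustration, not necessarily the model in the text) is a constant-elasticity curve: consumption of an elastic commodity falls steeply as its price (the effort or cost of obtaining it) rises, while an inelastic necessity barely changes.

```python
def demand(price, intensity=100.0, elasticity=0.5):
    """Constant-elasticity demand: consumption = intensity * price ** (-elasticity)."""
    return intensity * price ** (-elasticity)

# Elastic commodity (high elasticity) vs. inelastic necessity (low elasticity).
for price in (1, 2, 4, 8):
    print(f"price {price}: inelastic {demand(price, elasticity=0.2):5.1f}, "
          f"elastic {demand(price, elasticity=1.5):5.1f}")
```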
Theories of Reinforcement • Drive Reduction – Hull • Reinforcement occurs when the consequence of behavior reduces a drive (hunger, thirst). • Not everything reinforcing reduces a drive, and some reinforcers increase drives (stimulation). • Premack’s Principle – behaviors can be reinforcers (not just stimuli such as food). • The chance to do a preferred behavior is a reward.
Problems with Premack’s Principle • Prior preferences are important to the theory, but how can they be determined in advance? • Restricting a behavior creates a void in which the person must do something – this, rather than the reward, may account for the observed increase. • Access to even a less-preferred but restricted behavior can be reinforcing. • The reinforcer need not be a preferred behavior.
Behavioral Regulation Theory • Response deprivation theory – every behavior has a natural level (the amount someone wants to do if there are no restrictions). • A behavior will be rewarding if restricted below the natural level. • Also called behavioral regulation theory.
Blisspoint • The blisspoint is the amount of each of two behaviors someone would do if unrestricted. • Minimum distance model – someone will do enough of each of two behaviors to get as close as possible to the blisspoint. • When the two behaviors are made contingent on one another, behavior settles at the point on the schedule line that is the shortest (perpendicular) distance from the blisspoint.
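The minimum distance model has a simple geometric reading. In the sketch below (hypothetical function and illustrative numbers), the schedule allows `ratio` units of behavior 2 for each unit of behavior 1, so the contingency line passes through the origin, and the predicted mix of behaviors is the orthogonal projection of the blisspoint onto that line:

```python
def predicted_behavior(bliss, ratio):
    """Minimum distance model: with a contingency line B2 = ratio * B1,
    behavior settles at the point on that line closest to the blisspoint."""
    b1_bliss, b2_bliss = bliss
    # Orthogonal projection of the blisspoint onto the line with direction (1, ratio).
    t = (b1_bliss + ratio * b2_bliss) / (1 + ratio ** 2)
    return t, ratio * t

# Example: unrestricted, a student would do 20 min of studying and 60 min of TV;
# the schedule allows 1 min of TV per 1 min of studying (ratio = 1).
print(predicted_behavior((20, 60), ratio=1))  # (40.0, 40.0): more studying, less TV
```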
Selection by Consequences • Reinforcers select behaviors by weeding out the ones that are less efficient in obtaining rewards. • Skinner called this “selection by consequences.” • A process similar to evolution encourages some behaviors and leads to extinction of others, shaped by consequences of actions.