Chapter 5: Instrumental Learning & Operant Reinforcement
Operant Learning • Stimulus • Response • Outcome
Classical vs. Operant • Classical • Requires reflex action • Neutral stimulus associated with US • Outside of subject’s control • Operant • Strengthening/weakening of “voluntary” action • Subject responds or doesn’t • Can operate together
What’s in a Name? • Operant learning: subject operates on environment • Instrumental conditioning: subject is instrumental in obtaining outcome
Trial and Error Learning • E.L. Thorndike • Animal intelligence • Maze studies
Puzzle Box • Cats • Cage with mechanism to open door • Escape latency • Discrete trial procedure
Law of Effect • Any behaviour followed by an appetitive stimulus will increase in frequency
Terms • Operant (response): any behaviour that operates on the environment to produce an effect • Reinforcer: any event that increases the frequency of a behaviour • Punisher: any event that decreases the frequency of a behaviour
Operant Learning • B.F. Skinner • Operant chamber • Free operant procedure
Discrete Trial & Free Operant • Discrete • One trial at a time • “Apparatus” must be re-set • Measure some behaviour • e.g., mazes • Free • Operant can occur at any time • Operant can occur repeatedly • Response rate • e.g., operant chamber
Four Contingencies • Positive reinforcement • Negative reinforcement • Positive punishment • Negative punishment
Positive and Negative • Positive: presents some stimulus • Negative: removes some stimulus
Reinforcers and Punishers • Reinforcer: increases a behaviour • Punisher: decreases a behaviour
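Contingencies • Response causes stimulus to be presented; response rate increases: positive reinforcement (lever press --> food) • Response causes stimulus to be presented; response rate decreases: positive punishment (lever press --> shock) • Response causes stimulus to be removed; response rate increases: negative reinforcement (lever press --> shock off) • Response causes stimulus to be removed; response rate decreases: negative punishment (lever press --> food removed)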
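The four contingencies can be read as a two-way lookup: what the response does to the stimulus, and what happens to response rate. The following is a minimal sketch of that lookup; the function name `classify_contingency` and the string labels are hypothetical choices made for illustration.

```python
def classify_contingency(stimulus_change, rate_change):
    """Name the contingency.

    stimulus_change: "presented" or "removed" (what the response does to the stimulus)
    rate_change:     "increases" or "decreases" (what happens to response rate)
    """
    # Positive/negative describes the stimulus, not whether the outcome is pleasant.
    sign = {"presented": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement/punishment describes the effect on behaviour.
    kind = {"increases": "reinforcement", "decreases": "punishment"}[rate_change]
    return f"{sign} {kind}"


# The lever-press cases from the slide.
lever_press_cases = {
    "food": ("presented", "increases"),
    "shock": ("presented", "decreases"),
    "shock off": ("removed", "increases"),
    "food removed": ("removed", "decreases"),
}

for outcome, (stimulus_change, rate_change) in lever_press_cases.items():
    print(f"Lever press --> {outcome}: {classify_contingency(stimulus_change, rate_change)}")
```

Keeping the two dimensions separate makes the common confusion explicit: "negative" never means "punishment"; it only means a stimulus was removed.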
Types of Reinforcers • Primary • Not dependent on an association with other reinforcers • Secondary • Initially neutral stimulus • Paired with primary reinforcer • “Conditioned Reinforcer”
Secondary Reinforcers • “Bridging”, “clicker” • Secondary reinforcers extinguish without periodic pairings with primary • Generally weaker than primary • Generalized reinforcer • Paired with many other kinds of reinforcers • e.g., money
Strength of Operant Learning • Can condition practically any behaviour • Shaping (successive approximations)
Shaping a Lever Press • Gradual process • Reinforce more appropriate/precise responses • Feedback
Response Chains • Sequences of behaviours in specific order • Objective: primary reinforcer • Conditioned reinforcers • Discriminative stimuli
Forward Chaining • Start with first response in sequence, then work through to last response in additive steps
Backwards Chaining • Often used with “complex” training • Start with last response in chain • Next, second last response • Third last, etc.
Contingency • Correlation between behaviour & outcome • Strong contingency --> better learning • Random contingency --> no learning • Both reinforcement and punishment
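One common way to put a number on the behaviour-outcome correlation is the difference between the probability of the outcome after a response and the probability of the outcome with no response, often written ΔP. The slide does not name a measure, so ΔP, the function name `delta_p`, and the trial counts below are assumptions used only for illustration.

```python
def delta_p(resp_outcome, resp_no_outcome, noresp_outcome, noresp_no_outcome):
    """Contingency as P(outcome | response) - P(outcome | no response).

    Each argument is a count of trials in one cell of a 2x2 table.
    """
    p_outcome_given_resp = resp_outcome / (resp_outcome + resp_no_outcome)
    p_outcome_given_noresp = noresp_outcome / (noresp_outcome + noresp_no_outcome)
    return p_outcome_given_resp - p_outcome_given_noresp


# Hypothetical counts: outcome almost always follows the response.
strong = delta_p(18, 2, 1, 19)

# Hypothetical counts: outcome is equally likely with or without the response.
random_contingency = delta_p(10, 10, 10, 10)

print(round(strong, 2), random_contingency)
```

A value near 1 corresponds to the strong contingency on the slide; a value of 0 corresponds to the random contingency, where responding gives no information about the outcome.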
Contiguity • Time between behaviour & outcome • Shorter = better learning • Delays let other behaviours occur, forgetting, extinction (behaviour w/o reinforcement) • Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?) • More important for punishment
Reinforcer Characteristics • Larger reinforcers --> stronger learning • Not a linear effect • Qualitative differences in reinforcers and punishers • Species & individual differences • Intensity of punisher
Task Characteristics • Some tasks easier to learn than others • Species & individual differences • Innate and/or prior conditioning
Deprivation Levels • Generally, the greater the deprivation, the more effective the reinforcer • Reinforcers can satiate • Deprivation can provide motivation to engage in punishable behaviours
Extinction • Behaviour does not lead to same outcome • Response no longer produces same outcome • Extinction burst (with reinforcement) • Variability of behaviour • Aggression and frustration • Spontaneous recovery • Resurgence
Hull’s Drive Reduction Theory • Animals have motivational states (drives) • Necessary for survival • Reinforcers are things that reduce drives • Physiological value • Reduce physiological state
Drive Reduction Reinforcers • Works well with primary reinforcers • Many secondary reinforcers have no physiological value • Hull: association links secondary to drive • Some reinforcers hard to classify as primary or secondary • Some increase a physiological state • Some necessities undetectable • Roller coasters • Vitamins • Saccharin
Relative Value Theory & Premack Principle • Treat reinforcers as behaviours • Is it the food, or the behaviour of eating that is the reinforcer? • Behavioural probability scale • Greater or lesser value of behaviours relative to one another • No distinction between primary and secondary
Premack Principle • One behaviour will reinforce a second behaviour • High probability behaviour reinforces low probability behaviour • Baseline probability scale • Time • Rank order • Reinforcement relativity • No absolutes • Probability of response = time spent on response / total time
Example • Behaviours • Eat ice cream (I), play video game (V), read book (B) • Baseline (30 minutes) • Student 1: I (2min), V (8min), B (20min) • Scale: I -- V -- B • Student 2: I (8min), V (20min), B (2min) • Scale: B -- I -- V • Student 1: V reinforces I, B reinforces V & I • Student 2: I reinforces B, V reinforces I & B
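The example can be checked by computing each behaviour's baseline probability (time spent on it divided by total baseline time) and then listing every pair in which a higher-probability behaviour could reinforce a lower-probability one. This is a minimal sketch; the function names `baseline_probabilities` and `reinforcement_pairs` are hypothetical, and the minutes are the ones given on the slide.

```python
def baseline_probabilities(minutes):
    """Map each behaviour to time spent on it / total baseline time."""
    total = sum(minutes.values())
    return {behaviour: t / total for behaviour, t in minutes.items()}


def reinforcement_pairs(minutes):
    """Return (reinforcer, reinforced) pairs: higher probability reinforces lower."""
    probs = baseline_probabilities(minutes)
    return {
        (high, low)
        for high in probs
        for low in probs
        if probs[high] > probs[low]
    }


# I = eat ice cream, V = play video game, B = read book (30-minute baseline).
student1 = {"I": 2, "V": 8, "B": 20}
student2 = {"I": 8, "V": 20, "B": 2}

for name, minutes in (("Student 1", student1), ("Student 2", student2)):
    pairs = sorted(reinforcement_pairs(minutes))
    print(name, [f"{high} reinforces {low}" for high, low in pairs])
```

The same three behaviours produce different reinforcement relations for the two students, which is the point of "reinforcement relativity": the ranking comes from each individual's baseline, not from the behaviours themselves.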
Problems • Baseline phase • Fair rating? • How to compare very different behaviours • Time problems • What if time not important to behaviour? • Behaviour duration? • Length of baseline period?
Response Deprivation Theory • Deprived behaviours = reinforcing behaviours • Drop below baseline level of performance • Not relative frequency of one behaviour compared to another (i.e., Premack) • Level of deprivation for a behaviour • Praise? “Yes”?
Definitions • Escape • Get away from aversive stimulus that is in progress • Avoidance • Get away from aversive stimulus before it begins
Shuttle Box • Solomon & Wynne (1953) • Dogs • Chamber with barrier; Shock • Light off as signal
Two-Process Theory • Classical and operant conditioning • Shock = US • Fear/pain/jump/twitch/squeal = UR • Darkness = CS • Fear of dark = CR • Fear: heart rate, breathing, stomach cramps, etc. • Negative reinforcement • Removal of fear (CR) • Escape from CS, not avoidance of shock
Support for Two-Process Theory • Rescorla & LoLordo (1965) • Dog in shuttlebox • No signal • Response gives “safe time” • Pair tone with shock • Tone increases rate of response • CS can amplify avoidance • Conditioned inhibition can reduce avoidance
Problems with Two-Process Theory • Avoidance without observable fear • Heart rate • Not consistent • Fear diminishes with avoidance learning
Measuring Fear • Kamin, Brimer, and Black (1963) • Lever press ---> food • Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row • CS in Skinner box; check for suppression of lever press
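Suppression of lever pressing is commonly summarized as a suppression ratio: responses during the CS divided by the sum of responses during the CS and responses in an equal period just before it. The slide does not state which measure was used, so the ratio, the function name `suppression_ratio`, and the response counts below are assumptions for illustration only.

```python
def suppression_ratio(cs_responses, pre_cs_responses):
    """Responses during the CS / (responses during CS + responses before CS).

    0.5 means the CS did not change responding (no measurable fear);
    0.0 means responding stopped completely during the CS (strong fear).
    """
    total = cs_responses + pre_cs_responses
    if total == 0:
        raise ValueError("no responses in either period; ratio is undefined")
    return cs_responses / total


# Hypothetical lever-press counts, not data from the study.
print(suppression_ratio(20, 20))  # CS has no effect on lever pressing
print(suppression_ratio(5, 15))   # partial suppression
print(suppression_ratio(0, 30))   # complete suppression
```

Under this measure, a ratio that drifts back toward 0.5 for the groups with more avoidances in a row would correspond to fear decreasing with extended avoidance training.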
Results • [Figure: responding as a function of number of avoidance responses in a row (1, 3, 9, 27)] • Fear decreases during extended avoidance training • But, avoidance still strong • Even low fear is enough?
Extinction in Avoidance Behaviour • [Figure: successful avoidance and # of US received across trials] • Odd prediction from two-process theory • “Yo-yo” effect • Avoidance should toggle • But! Avoidance is extremely persistent
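The "yo-yo" prediction follows from two steps: each successful avoidance presents the CS without the US, so fear should extinguish; once fear is too weak to motivate the response, the shock is received and fear is reconditioned. The toy model below is a sketch of that reasoning only. The threshold, the learning rates, and the function name `simulate_two_process` are invented for illustration and are not taken from any study.

```python
def simulate_two_process(trials, threshold=0.5, acquisition=0.5, extinction=0.3):
    """Return a list of booleans, one per trial: True if the subject avoided.

    Fear is a number between 0 and 1. The subject avoids only when fear
    is at or above the threshold (all parameter values are hypothetical).
    """
    fear = 0.0
    avoided = []
    for _ in range(trials):
        if fear >= threshold:
            avoided.append(True)
            # CS occurred without shock, so fear partly extinguishes.
            fear *= 1 - extinction
        else:
            avoided.append(False)
            # Shock received, so the CS-US pairing strengthens fear.
            fear += acquisition * (1 - fear)
    return avoided


avoided = simulate_two_process(20)
print("".join("A" if a else "S" for a in avoided))  # A = avoided, S = shocked
```

In this sketch avoidance keeps switching on and off indefinitely, which is the toggling two-process theory appears to predict and which contrasts with the persistent avoidance actually observed.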
One-Process Theory • Classical conditioning component unnecessary • Avoidance, not fear reduction, is reinforcer • “Safety”
Sidman Avoidance Task • Free-operant avoidance • Can avoidance be learned if no warning CS? • Shock at random intervals • Response gives safe time • Extensive training --> learn avoidance • But, usually never perfect • High variability across subjects • Two-process theory suggests: • Time becomes a CS (time elicits fear)
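The logic of the task (unsignalled shocks, with each response buying a period of safe time) can be sketched as a schedule of shock times from which a response cancels any shock falling inside its safe window. This is a simplified illustration: the session length, number of shocks, safe time, response patterns, and the function name `delivered_shocks` are all assumptions, not parameters from the original procedure.

```python
import random


def delivered_shocks(shock_times, response_times, safe_time):
    """Return the scheduled shocks that are not cancelled by a response.

    A shock at time s is cancelled if some response r satisfies
    r <= s < r + safe_time.
    """
    delivered = []
    for s in shock_times:
        protected = any(r <= s < r + safe_time for r in response_times)
        if not protected:
            delivered.append(s)
    return delivered


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# 30 unsignalled shocks scheduled at random times in a 600-second session.
shock_times = sorted(rng.uniform(0, 600) for _ in range(30))

no_responding = delivered_shocks(shock_times, [], safe_time=20)
# Responding every 15 s with 20 s of safe time leaves no unprotected gap.
steady_responding = delivered_shocks(shock_times, range(0, 600, 15), safe_time=20)
# Responding every 60 s protects only part of the session.
sparse_responding = delivered_shocks(shock_times, range(0, 600, 60), safe_time=20)

print(len(no_responding), len(sparse_responding), len(steady_responding))
```

Perfectly regular responding removes every shock in this sketch; real subjects respond far less regularly, which is consistent with avoidance that is learned but "usually never perfect".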
Herrnstein & Hineline (1966) • Rapid and slow shock rate schedules • Lever press switches schedules • Shocks presented randomly, no signal • Responses give shock reduction • Reduction in shock is reinforcer
Learned Helplessness • Behaviour has no effect on situation • Generalizes • Laboratory • Give inescapable shocks • Shuttle box • Will not switch sides • Expectation that behaviour has no effect
Learned Helplessness in Humans • Depression • Situations beyond your control • Three dimensions • Situation: specific or global • Attribute: internal or external • Time: short-term or long-term