Delve into the world of instrumental learning and operant reinforcement theories. Review key terms such as stimulus, response, and outcome, and compare classical and operant conditioning. Learn about trial-and-error learning, the Law of Effect, and B.F. Skinner's contributions to operant learning. Discover the principles of discrete-trial and free-operant procedures, and explore contingencies such as positive reinforcement and punishment. Unravel the concepts of response chains, forward and backward chaining, and factors influencing operant learning such as contingency and contiguity. Dive into reinforcer characteristics, task characteristics, deprivation levels, and extinction phenomena. Explore Hull's Drive Reduction Theory and understand how reinforcers reduce motivational states.
Chapter 5: Instrumental Learning & Operant Reinforcement
Operant Learning • Stimulus • Response • Outcome
Classical vs. Operant • Classical • Requires reflex action • Neutral stimulus associated with US • Outside of subject’s control • Operant • Strengthening/weakening of “voluntary” action • Subject responds or doesn’t • Can operate together
What’s in a Name? • Operant learning: subject operates on environment • Instrumental conditioning: subject is instrumental in obtaining outcome
Trial and Error Learning • E.L. Thorndike • Animal intelligence • Maze studies
Puzzle Box • Cats • Cage with mechanism to open door • Escape latency • Discrete trial procedure
Law of Effect • Any behaviour followed by an appetitive stimulus will increase in frequency
Terms • Operant (response): any behaviour that operates on the environment to produce an effect • Reinforcer: any event that increases the frequency of a behaviour • Punisher: any event that decreases the frequency of a behaviour
Operant Learning • B.F. Skinner • Operant chamber • Free operant procedure
Discrete Trial & Free Operant • Discrete • One trial at a time • “Apparatus” must be re-set • Measure some behaviour • e.g., mazes • Free • Operant can occur at any time • Operant can occur repeatedly • Response rate • e.g., operant chamber
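The two procedures call for different measures: a discrete-trial study records something per trial (e.g., escape latency in the puzzle box), while a free-operant study records how often the operant occurs over a session. A minimal sketch of both measures; the function names and the data are hypothetical.

```python
def escape_latencies(trial_starts, escape_times):
    """Discrete-trial measure: time from the start of each trial to the response."""
    return [escape - start for start, escape in zip(trial_starts, escape_times)]


def response_rate(response_times, session_minutes):
    """Free-operant measure: responses per minute over the whole session."""
    return len(response_times) / session_minutes


# Hypothetical data: three puzzle-box trials (seconds) and a 10-minute lever session.
print(escape_latencies([0, 300, 600], [160, 390, 630]))  # [160, 90, 30]
print(response_rate(list(range(0, 600, 20)), 10))        # 3.0
```

Note that the discrete measure needs a trial start time (the apparatus is re-set each time), whereas the free-operant rate needs only the response timestamps and the session length.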
Four Contingencies • Positive reinforcement • Negative reinforcement • Positive punishment • Negative punishment
Positive and Negative • Positive: presents some stimulus • Negative: removes some stimulus
Reinforcers and Punishers • Reinforcer: increases a behaviour • Punisher: decreases a behaviour
Contingencies
Response causes stimulus to be: | Response rate increases | Response rate decreases
Presented | Positive Reinforcement (Lever press --> Food) | Positive Punishment (Lever press --> Shock)
Removed | Negative Reinforcement (Lever press --> Shock off) | Negative Punishment (Lever press --> Food removed)
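The two-by-two table of contingencies can be read as a lookup on two questions: is a stimulus presented or removed, and does the response rate go up or down? A minimal sketch; `classify_contingency` and its string arguments are hypothetical names chosen for illustration.

```python
def classify_contingency(stimulus_change: str, response_rate: str) -> str:
    """Name the operant contingency from the two dimensions of the table.

    stimulus_change: "presented" (positive) or "removed" (negative)
    response_rate:   "increases" (reinforcement) or "decreases" (punishment)
    """
    sign = {"presented": "Positive", "removed": "Negative"}
    effect = {"increases": "reinforcement", "decreases": "punishment"}
    if stimulus_change not in sign or response_rate not in effect:
        raise ValueError("unknown stimulus change or response-rate effect")
    return f"{sign[stimulus_change]} {effect[response_rate]}"


# The lever-press examples from the table.
examples = {
    "Lever press --> Food": ("presented", "increases"),
    "Lever press --> Shock": ("presented", "decreases"),
    "Lever press --> Shock off": ("removed", "increases"),
    "Lever press --> Food removed": ("removed", "decreases"),
}
for label, (change, rate) in examples.items():
    print(f"{label}: {classify_contingency(change, rate)}")
```

Keeping "positive/negative" and "reinforcement/punishment" as separate inputs mirrors the point of the slides: the sign describes what happens to the stimulus, not whether the outcome is pleasant.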
Types of Reinforcers • Primary • Not dependent on an association with other reinforcers • Secondary • Initially neutral stimulus • Paired with primary reinforcer • “Conditioned Reinforcer”
Secondary Reinforcers • “Bridging”, “clicker” • Extinguish without periodic pairings with primary • Generally weaker than primary • Generalized reinforcer • Paired with many other kinds of reinforcers • e.g., money
Strength of Operant Learning • Can condition practically any behaviour • Shaping (successive approximations)
Shaping a Lever Press • Gradual process • Reinforce more appropriate/precise responses • Feedback
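Shaping by successive approximations can be pictured as a reinforcement criterion that tightens after each success. A minimal sketch, assuming each response is scored as a whole number from 0 (unrelated behaviour) to 100 (a full lever press) and that the criterion rises by a fixed amount after every reinforced response; the scoring scale, `start`, `step`, and `shape` itself are hypothetical, and real shaping relies on the trainer's judgement rather than a fixed rule.

```python
def shape(responses, start=20, step=20, target=100):
    """Reinforce successive approximations to a target response.

    responses: scores for each response, 0 (unrelated) to 100 (full lever press).
    Returns a list of (score, reinforced, criterion_in_effect) tuples.
    """
    criterion = start
    log = []
    for score in responses:
        reinforced = score >= criterion
        log.append((score, reinforced, criterion))
        if reinforced:
            # Raise the bar toward the target behaviour after each success.
            criterion = min(target, criterion + step)
    return log


for score, reinforced, criterion in shape([10, 25, 30, 45, 60, 55, 85, 100]):
    print(f"score={score:3d} criterion={criterion:3d} reinforced={reinforced}")
```

A response that would have earned food early on (e.g., 30) is no longer reinforced once the criterion has moved past it, which is what pushes behaviour toward the more precise response.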
Response Chains • Sequences of behaviours in specific order • Objective: primary reinforcer • Conditioned reinforcers • Discriminative stimuli
Forward Chaining • Start with first response in sequence, then work through to last response in additive steps
Backwards Chaining • Often used with “complex” training • Start with last response in chain • Next, second last response • Third last, etc.
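The two chaining procedures differ only in the order in which links are added to training. A minimal sketch listing the training stages for a hypothetical three-link chain; the link names and function names are illustrative only.

```python
def forward_chaining_stages(chain):
    """Stage k trains the first k links, adding one link at the end each time."""
    return [chain[:k] for k in range(1, len(chain) + 1)]


def backward_chaining_stages(chain):
    """Stage k trains the last k links, so every stage ends with the final response."""
    return [chain[-k:] for k in range(1, len(chain) + 1)]


chain = ["climb ladder", "cross bridge", "press lever"]  # hypothetical chain
for stage in forward_chaining_stages(chain):
    print("forward: ", " -> ".join(stage))
for stage in backward_chaining_stages(chain):
    print("backward:", " -> ".join(stage))
```

In the backward version every stage finishes with the response that produces the primary reinforcer, so each newly added earlier link leads into stimuli that have already been paired with reinforcement.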
Contingency • Correlation between behaviour & outcome • Strong contingency --> better learning • Random contingency --> no learning • Both reinforcement and punishment
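The slide does not say how the behaviour-outcome correlation is measured. One common way to quantify it is the difference between two conditional probabilities, ΔP = P(outcome | response) − P(outcome | no response); treat this as an assumed operationalisation, and the counts below as hypothetical.

```python
def delta_p(outcome_with_response, trials_with_response,
            outcome_without_response, trials_without_response):
    """Contingency as P(outcome | response) - P(outcome | no response).

    > 0: response makes the outcome more likely (positive contingency)
      0: outcome is equally likely either way (random contingency)
    < 0: response makes the outcome less likely (e.g., avoidance)
    """
    p_given_r = outcome_with_response / trials_with_response
    p_given_not_r = outcome_without_response / trials_without_response
    return p_given_r - p_given_not_r


# Hypothetical counts: 100 periods with a lever press and 100 without.
strong = delta_p(90, 100, 5, 100)   # food almost only after presses
chance = delta_p(50, 100, 50, 100)  # food equally likely either way
print(f"strong contingency: {strong:.2f}")
print(f"random contingency: {chance:.2f}")
```

A ΔP of zero corresponds to the "random contingency" case on the slide: the outcome happens just as often whether or not the animal responds, so the response carries no information about it.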
Contiguity • Time between behaviour & outcome • Shorter = better learning • Delays let other behaviours occur, forgetting, extinction (behaviour w/o reinforcement) • Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?) • More important for punishment
Reinforcer Characteristics • Larger reinforcers --> stronger learning • Not a linear effect • Qualitative differences in reinforcers and punishers • Species & individual differences • Intensity of punisher
Task Characteristics • Some tasks easier to learn than others • Species & individual differences • Innate and/or prior conditioning
Deprivation Levels • Generally, the greater the deprivation, the more effective the reinforcer • Reinforcers can satiate • Deprivation can provide motivation to engage in punishable behaviours
Extinction • Behaviour (response) no longer produces the same outcome • Extinction burst (temporary increase in responding when reinforcement stops) • Variability of behaviour • Aggression and frustration • Spontaneous recovery • Resurgence
Hull’s Drive Reduction Theory • Animals have motivational states (drives) • Necessary for survival • Reinforcers are things that reduce drives • Physiological value • Reduce physiological state
Drive Reduction Reinforcers • Works well with primary reinforcers • Many secondary reinforcers have no physiological value • Hull: association links secondary to drive • Some reinforcers hard to classify as primary or secondary • Some increase a physiological state • Some necessities undetectable • Roller coasters • Vitamins • Saccharin
Relative Value Theory & Premack Principle • Treat reinforcers as behaviours • Is it the food, or the behaviour of eating that is the reinforcer? • Behavioural probability scale • Greater or lesser value of behaviours relative to one another • No distinction between primary and secondary
Premack Principle • One behaviour will reinforce a second behaviour • High probability behaviour reinforces low probability behaviour • Baseline probability scale • Time • Rank order • Reinforcement relativity • No absolutes • Probability of response = Time spent on response / Total time
Example • Behaviours • Eat ice cream (I), play video game (V), read book (B) • Baseline (30 minutes) • Student 1: I (2min), V (8min), B (20min) • Scale: I -- V -- B • Student 2: I (8min), V (20min), B (2min) • Scale: B -- I -- V • Student 1: V reinforces I, B reinforces V & I • Student 2: I reinforces B, V reinforces I & B
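The arithmetic in the example can be reproduced directly: each behaviour's probability is its share of the 30-minute baseline, and any behaviour higher on the scale can reinforce any behaviour lower on it. A minimal sketch; `baseline_probabilities` and `premack_pairs` are hypothetical names, and treating tied behaviours as not reinforcing each other is an assumption.

```python
def baseline_probabilities(minutes):
    """Probability of response = time spent on response / total time."""
    total = sum(minutes.values())
    return {b: t / total for b, t in minutes.items()}


def premack_pairs(minutes):
    """Return (reinforcer, reinforced) pairs: each higher-probability
    behaviour reinforces every lower-probability behaviour."""
    probs = baseline_probabilities(minutes)
    return sorted(
        (high, low)
        for high in probs
        for low in probs
        if probs[high] > probs[low]
    )


student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}
print(premack_pairs(student_1))  # [('B', 'I'), ('B', 'V'), ('V', 'I')]
print(premack_pairs(student_2))  # [('I', 'B'), ('V', 'B'), ('V', 'I')]
```

The same three activities produce different reinforcer relationships for the two students, which is the "reinforcement relativity" point: nothing is a reinforcer in absolute terms.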
Problems • Baseline phase • Fair rating? • How to compare very different behaviours • Time problems • What if time not important to behaviour? • Behaviour duration? • Length of baseline period?
Response Deprivation Theory • Deprived behaviours = reinforcing behaviours • Drop below baseline level of performance • Not relative frequency of one behaviour compared to another (i.e., Premack) • Level of deprivation for a behaviour • Praise? “Yes”?
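Response deprivation is often written as an inequality, following the formulation usually attributed to Timberlake and Allison: if a schedule requires I units of the instrumental behaviour to earn C units of the contingent behaviour, and O_i and O_c are their baseline levels, the contingent behaviour is deprived (and should act as a reinforcer) when I/C > O_i/O_c. The inequality is not on the slide, so treat it as an assumption; the schedules below are hypothetical and reuse Student 1's baseline from the Premack example.

```python
from fractions import Fraction


def is_response_deprived(instrumental_required, contingent_earned,
                         baseline_instrumental, baseline_contingent):
    """True if the schedule holds the contingent behaviour below its baseline.

    Schedule: `instrumental_required` units of the instrumental behaviour
    earn `contingent_earned` units of the contingent behaviour.
    Deprivation condition (assumed formulation): I / C > O_i / O_c.
    """
    schedule_ratio = Fraction(instrumental_required, contingent_earned)
    baseline_ratio = Fraction(baseline_instrumental, baseline_contingent)
    return schedule_ratio > baseline_ratio


# Student 1 baseline: reading (B) 20 min, ice cream (I) 2 min.
# 20 min of reading earns 1 min of ice cream: ice cream is restricted.
print(is_response_deprived(20, 1, 20, 2))  # True
# 5 min of reading earns 1 min of ice cream: baseline ice cream is easy to reach.
print(is_response_deprived(5, 1, 20, 2))   # False
```

In the first schedule the low-probability behaviour (ice cream) is predicted to reinforce the high-probability one (reading), something the Premack rank order alone would not allow; what matters is the level of deprivation, not relative frequency.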
Definitions • Escape • Get away from aversive stimulus that is in progress • Avoidance • Get away from aversive stimulus before it begins
Shuttle Box • Solomon & Wynne (1953) • Dogs • Chamber with barrier; Shock • Light off as signal
Two-Process Theory • Classical and operant conditioning • Shock = US • Fear/pain/jump/twitch/squeal = UR • Darkness = CS • Fear of dark = CR • Fear: heart rate, breathing, stomach cramps, etc. • Negative reinforcement • Removal of fear (CR) • Escape of CS, not avoidance of shock
Support for Two-Process Theory • Rescorla & LoLordo (1965) • Dog in shuttlebox • No signal • Response gives “safe time” • Pair tone with shock • Tone increases rate of response • CS can amplify avoidance • Conditioned inhibition can reduce avoidance
Problems with Two-Process Theory • Avoidance without observable fear • Heart rate • Not consistent • Fear diminishes with avoidance learning
Measuring Fear • Kamin, Brimer, and Black (1963) • Lever press ---> food • Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row • CS in Skinner box; check for suppression of lever press
[Figure: responding (lever pressing in the presence of the CS) vs. number of consecutive avoidance responses: 1, 3, 9, 27]
Results • Fear decreases during extended avoidance training • But, avoidance still strong • Even low fear is enough?
[Figure: successful avoidance and number of USs received across trials]
Extinction in Avoidance Behaviour • Odd prediction from two-process theory • “Yo-yo” effect • Avoidance should toggle • But! Avoidance is extremely persistent
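The "yo-yo" prediction follows from running the two processes in sequence: successful avoidance means no shock, so fear of the CS extinguishes; once fear is too weak to motivate the response, the animal stays, is shocked, fear is reconditioned, and avoidance returns. A toy simulation of that logic, with hypothetical threshold and learning-rate values that are not fitted to any data, shows the predicted toggling, which is exactly what persistent real avoidance fails to show.

```python
def two_process_prediction(trials=12, fear=1.0, threshold=0.5,
                           extinction_rate=0.3, conditioning_gain=1.0):
    """Toy model of the two-process account of avoidance.

    Each trial: the subject avoids if fear of the CS exceeds `threshold`.
    Avoidance -> no shock -> fear shrinks by `extinction_rate` (proportionally).
    No avoidance -> shock -> fear is reconditioned back up toward 1.0.
    Returns a list of booleans, True where the subject avoided.
    """
    avoided = []
    for _ in range(trials):
        avoid = fear > threshold
        avoided.append(avoid)
        if avoid:
            fear *= 1 - extinction_rate                # CS without US: extinction
        else:
            fear += conditioning_gain * (1.0 - fear)   # CS paired with US: fear returns
    return avoided


pattern = two_process_prediction()
print("".join("A" if a else "-" for a in pattern))  # AA-AA-AA-AA-
```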
One-Process Theory • Classical conditioning component unnecessary • Avoidance, not fear reduction, is reinforcer • “Safety”
Sidman Avoidance Task • Free-operant avoidance • Can avoidance be learned if no warning CS? • Shock at random intervals • Response gives safe time • Extensive training --> learn avoidance • But, usually never perfect • High variability across subjects • Two-process theory suggests: • Time becomes a CS (time elicits fear)
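Free-operant avoidance is commonly described with two timers: a shock-shock (S-S) interval that spaces shocks when the subject does not respond, and a response-shock (R-S) interval by which each response postpones the next shock (the "safe time"). A minimal sketch of that timing logic; the interval values are hypothetical, `sidman_shock_times` is an illustrative name, and the fixed S-S timer is a simplification of the random shock intervals mentioned on the slide.

```python
def sidman_shock_times(response_times, session_length, ss_interval=5, rs_interval=20):
    """Free-operant (Sidman) avoidance with no warning signal.

    Without responding, a shock arrives every `ss_interval` seconds.
    Each response postpones the next shock until `rs_interval` seconds after it.
    Returns the times (in seconds) at which shocks are delivered.
    """
    responses = sorted(response_times)
    shocks = []
    next_shock = ss_interval
    i = 0
    while next_shock <= session_length:
        if i < len(responses) and responses[i] < next_shock:
            # A response before the scheduled shock resets the timer (safe time).
            next_shock = responses[i] + rs_interval
            i += 1
        else:
            shocks.append(next_shock)
            next_shock += ss_interval
    return shocks


print(sidman_shock_times([], 30))               # [5, 10, 15, 20, 25, 30]
print(sidman_shock_times([3, 18, 33, 48], 60))  # []
print(sidman_shock_times([12], 40))             # [5, 10, 32, 37]
```

Responding slightly faster than the R-S interval avoids every shock, while a single late response buys only one period of safety; nothing in the procedure signals when a shock is due, which is why two-process theory has to treat the passage of time itself as the CS.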
Herrnstein & Hineline (1966) • Rapid and slow shock rate schedules • Lever press switches schedules • Shocks presented randomly, no signal • Responses give shock reduction • Reduction in shock is reinforcer
Learned Helplessness • Behaviour has no effect on situation • Generalizes • Laboratory • Give inescapable shocks • Shuttle box • Will not switch sides • Expectation that behaviour has no effect
Learned Helplessness in Humans • Depression • Situations beyond your control • Three dimensions • Situation: specific or global • Attribute: internal or external • Time: short-term or long-term