1. Chapters 5 and 7: Operant Learning
2. Operant (Instrumental) Learning
Stimulus
Response
Outcome
3. Classical vs. Operant
Classical
Reflex action
Neutral stimulus associated with US
Outside of subject’s control
Operant
Strengthens/weakens “voluntary” action
Subject does/doesn’t respond
Can occur together
4. Edward Thorndike
Animal intelligence
Comparative psychology
5. Experiments
Chicks, cats, dogs
Single animals
Observational learning
6. Puzzle Box
7. Trial-and-Error
8. Law of Effect
“When particular stimulus-response sequences are followed by pleasure, those responses tend to be ‘stamped in’; responses followed by pain tend to be ‘stamped out’.” (Thorndike 1911)
Reinforced
Punished
9. Methodology
Subjects
Apparatus
Escape latency
Time-curves
11. Theory
Incremental learning
S-R
Direct experience
12. Revision
Scientific method
Observational learning in non-humans
14. B.F. Skinner
Operant response
The unit of behaviour
Effect it has on environment
Skinner’s approach (video)
Operant chamber (video)
15. Discrete Trial & Free Operant
Discrete
One trial at a time
Re-set apparatus
Measure a behaviour
Latency, running speed, reduction in errors
E.g., maze
Free
Automatic repeat
Less disruptive for subject
Response rate
E.g., operant chamber
16. Three-Term Contingency
Contingency: Y iff X
1. Discriminative stimulus (SD)
2. Operant response (R)
3. Outcome (O)
Appetitive or aversive
17. Outcomes and Effects
Positive
Something is delivered
Negative
Something is removed
Reinforcer
Causes behaviour to increase
Punisher
Causes behaviour to decrease
Whether an outcome counts as a “reinforcer” or a “punisher” is defined by its effect on behaviour
18. Four Basic Operant Relations
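Crossing the two dimensions defined on the previous slide (something delivered vs. removed, behaviour increases vs. decreases) gives the four relations. A minimal Python sketch of that crossing; the function name and the argument labels are chosen here for illustration only:

```python
def classify_operant_relation(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant relation from what happens to the stimulus and to the behaviour.

    stimulus_change:  "delivered" or "removed"    (illustrative labels)
    behaviour_change: "increases" or "decreases"  (illustrative labels)
    """
    sign = {"delivered": "positive", "removed": "negative"}[stimulus_change]
    effect = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {effect}"


# Print the full 2 x 2 table.
for stimulus_change in ("delivered", "removed"):
    for behaviour_change in ("increases", "decreases"):
        relation = classify_operant_relation(stimulus_change, behaviour_change)
        print(f"Something {stimulus_change}, behaviour {behaviour_change}: {relation}")
```

Note that "positive" and "negative" describe only whether something is delivered or removed, not whether the outcome is pleasant.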
19. Types of Reinforcers
Primary
Not dependent on an association with other reinforcers
Secondary (“Conditioned Reinforcer”)
Neutral stimulus paired with primary reinforcer
20. Secondary Reinforcers
“Bridging”, “clicker”
Secondary reinforcer extinguishes without periodic pairings with primary
Generally weaker than primary
Less prone to satiation
Generalized reinforcer
Paired with many other kinds of reinforcers
21. Neurobiology of Reinforcement
Pleasure centres of brain (reward pathway)
Electrical stimulation of brain (ESB)
Dopamine
Major neurotransmitter
Released by appetitive stimuli
22. Dopamine Release
Different amounts of dopamine released
Unexpected reinforcement --> more dopamine release
Decreasing learning curve
Rescorla-Wagner
Less “surprising” the more you’ve learned; less dopamine released; less reinforcing
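The link between surprise and a flattening learning curve can be sketched with the Rescorla-Wagner update, in which the change in associative strength on each trial is proportional to the prediction error (lambda minus V). The code below is only an illustration: the learning rate of 0.3 and the asymptote of 1.0 are arbitrary, and treating the prediction error as a stand-in for the amount of dopamine released is an assumption made here, not a claim from the slides.

```python
def prediction_errors(n_trials: int, learning_rate: float = 0.3, lam: float = 1.0) -> list[float]:
    """Return the prediction error (lam - V) on each reinforced trial.

    learning_rate stands for the combined alpha * beta term; the value is illustrative.
    """
    v = 0.0  # associative strength before any learning
    errors = []
    for _ in range(n_trials):
        error = lam - v              # how "surprising" the reinforcer is on this trial
        errors.append(error)
        v += learning_rate * error   # Rescorla-Wagner update
    return errors


for trial, error in enumerate(prediction_errors(8), start=1):
    print(f"trial {trial}: surprise = {error:.3f}")
```

The error shrinks on every trial, so each additional reinforcer adds less learning than the one before it.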
23. Addictive
Internal/external drugs
Orgasm, cocaine, crack
Dopamine very addictive
Dopamine is a precursor of norepinephrine, which converts to epinephrine (adrenaline)
“Thrill junkies”
Tolerance develops
24. Strength of Operant Learning
Condition practically any behaviour
Shaping (successive approximations)
25. Shaping a Lever Press
Gradual process
Reinforce more appropriate/precise responses
Feedback
26. Response Chains
Sequences of behaviours in specific order
Objective: primary reinforcer
Conditioned reinforcers
Discriminative stimuli
27. Backwards Chaining
Often used with “complex” training
Start with last response in chain
Next, second last response
Third last, etc.
28. Chaining
29. Forward Chaining
Start with first response
Add additional links in chain
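The difference between the two training orders can be made concrete by listing what the subject must perform at each training stage. A minimal sketch; the three-link chain is a made-up example, not one from the slides:

```python
def backward_chaining_stages(chain: list[str]) -> list[list[str]]:
    """Stage 1 trains only the last response; each later stage adds the preceding link."""
    return [chain[i:] for i in range(len(chain) - 1, -1, -1)]


def forward_chaining_stages(chain: list[str]) -> list[list[str]]:
    """Stage 1 trains only the first response; each later stage adds the following link."""
    return [chain[:i] for i in range(1, len(chain) + 1)]


# Hypothetical chain, for illustration only.
chain = ["pull cord", "climb ladder", "press lever"]

print("Backward chaining:")
for stage in backward_chaining_stages(chain):
    print("  ", " -> ".join(stage))

print("Forward chaining:")
for stage in forward_chaining_stages(chain):
    print("  ", " -> ".join(stage))
```

In backward chaining every stage ends with the final response, so every stage ends at the point where the primary reinforcer is delivered.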
30. Factors in Operant Learning
31. Contiguity
Time between behaviour & outcome
Delays allow other behaviours to intervene, forgetting, and extinction (behaviour without reinforcement)
Learning is still possible with a delay if a stimulus “placeholder” is provided (conditioned reinforcer?)
Especially important for punishment
32. Contingency
Correlation between behaviour & outcome
Strong vs. random contingency
Both reinforcement and punishment
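One common way to put a number on the behaviour-outcome correlation is the difference between the probability of the outcome given a response and the probability of the outcome given no response. This index is not named in the slides, so treat its use here as an assumption; the counts in the example are invented.

```python
def contingency(outcomes_after_response: int, responses: int,
                outcomes_without_response: int, non_responses: int) -> float:
    """Return P(outcome | response) - P(outcome | no response).

    Values near 1 mean a strong contingency; values near 0 mean the outcome
    is unrelated to the behaviour (a random contingency).
    """
    if responses == 0 or non_responses == 0:
        raise ValueError("need at least one response and one non-response opportunity")
    return outcomes_after_response / responses - outcomes_without_response / non_responses


# Invented counts: outcome follows 18 of 20 responses but only 2 of 20 non-responses.
print(f"strong contingency: {contingency(18, 20, 2, 20):.2f}")
# Invented counts: outcome is equally likely whether or not the response occurs.
print(f"random contingency: {contingency(10, 20, 10, 20):.2f}")
```

The same calculation applies whether the outcome is a reinforcer or a punisher.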
33. Outcome Characteristics
Larger reinforcers/punishers --> stronger learning
Not a linear effect
Qualitative differences in reinforcers and punishers
Species & individual differences
Intensity of punisher
Tolerance
34. Task Characteristics
Some tasks easier to learn than others
Species & individual differences
Innate and/or prior conditioning
35. Deprivation Levels
Generally, the greater the deprivation, the more effective the reinforcer
Reinforcer satiation
Deprivation can motivate punishable responses
36. Reinforcers in Punishment
What maintains undesired behaviour?
Benefit?
Alternative sources of reinforcement
Find other ways to provide acceptable reinforcement
37. Latent Learning
Motivation
Learning behaviour
Performing behaviour
38. Tolman & Honzik (1930)
39. Extinction
Response no longer produces same outcome
Extinction burst
Variability of behaviour
Aggression and frustration
Spontaneous recovery
40. Behaviour Modification
Also “behaviour analysis”
Alter behaviour via operant conditioning
Therapy
Reinforcement vs. punishment
41. Problems with Punishment in Behaviour Modification
Application of the punisher
Incorrect use of punishment
Creates issues or exacerbates punishment consequences
Tolerance
Start with strong punisher
Gradually reduce
General reluctance to administer
42. Possible Consequences of Punishment
Escape
Aggression, violence
At punisher, self, other
Apathy
General suppression of other behaviours
Abuse
Permanent damage
Imitation
43. Alternatives to Using Punishment
44. Response Prevention
Make it impossible to do punishable behaviour
Circumvention
Younger children
45. Extinction
Identify reinforcer of behaviour
Withhold reinforcer
Difficult to ID reinforcer
Extinction bursts
Slow
46. Differential Reinforcement
Differential reinforcement of low rates of responding (DRL)
Only reinforce behaviour when response occurs at low frequency
Differential reinforcement of zero responses (DR0)
Reinforcement contingent on not performing behaviour at all (in some time period)
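The two rules on this slide can be sketched as simple checks over a list of response times. The versions below are deliberate simplifications and rest on assumptions not stated in the slides: the DRL rule reinforces a response only when enough time has passed since the previous response (and never reinforces the very first one), and the zero-response rule uses fixed, back-to-back intervals that do not reset when a response occurs. All times and thresholds are invented.

```python
def drl_reinforced(response_times: list[int], min_gap: int) -> list[int]:
    """Return the responses that earn a reinforcer under a simplified DRL rule.

    A response is reinforced only if at least min_gap time units have passed
    since the previous response, which keeps the overall response rate low.
    """
    reinforced = []
    previous = None
    for t in response_times:
        if previous is not None and t - previous >= min_gap:
            reinforced.append(t)
        previous = t
    return reinforced


def zero_response_reinforcer_times(response_times: list[int], interval: int,
                                   session_length: int) -> list[int]:
    """Return the times a reinforcer is delivered under a simplified zero-response rule.

    A reinforcer is delivered at the end of every fixed interval in which
    the target behaviour did not occur at all.
    """
    delivered = []
    start = 0
    while start + interval <= session_length:
        end = start + interval
        if not any(start <= t < end for t in response_times):
            delivered.append(end)
        start = end
    return delivered


print(drl_reinforced([1, 3, 10, 12, 25], min_gap=5))
print(zero_response_reinforcer_times([2, 4, 27], interval=10, session_length=30))
```

Under the first rule the behaviour persists at a low rate; under the second, reinforcement is earned only by not performing the behaviour.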
47. Differential reinforcement of alternative behaviour (DRA)
Reinforcer gained from undesired behaviour now only available when some alternative behaviour done
Differential reinforcement of incompatible behaviour (DRI)
Reinforce behaviour completely incompatible with undesired response
48. Noncontingent Reinforcement
Provide desired reinforcer on regular basis regardless of what is being done
No correlation between response and outcome
May work because subject gets reinforcer for “free”
Problems if reinforcer comes after some other undesired behaviour (new acquisition)
49. Negative Punishment
Removal of pleasant stimulus
Time-out
Popular in human behaviour modification
50. Other Techniques for Behavioural Deceleration
Overcorrection
Repetitions of alternate, desired behaviour
Restitution
Positive practice
Technically, punishment
Stimulus satiation
51. Escape and Avoidance
52. Definitions
Escape
Get away from aversive stimulus that is in progress
Avoidance
Get away from aversive stimulus before it begins
53. Shuttle Box
Solomon & Wynne (1953)
Dogs
Chamber with barrier; Shock
Light off as signal
54. Theory Issues
For escape, no ambiguity
Aversive removed, behaviour increases = negative reinforcement
What about avoidance?
Shuttles before shock
Behaviour increases
Nothing obvious removed or delivered
Mowrer & Lamoreaux (1942)
“…not getting something can hardly, in and of itself, qualify as rewarding.”
55. Two-Process Theory
Classical and operant conditioning
Shock = US
Fear/pain/jump/twitch/squeal = UR
Darkness = CS
Fear of dark = CR
Fear: heart rate, breathing, stomach cramps, etc.
Negative reinforcement
Removal of fear (CR)
Escape from CS, not avoidance of shock
Two-process treats avoidance as just another type of escape behaviour
56. Support for Two-Process Theory
Rescorla & LoLordo (1965)
Dog in shuttlebox
No signal
Response gives “safe time”
Pair tone with shock
Tone increases rate of response
CS can amplify avoidance
Conditioned inhibition can reduce avoidance
57. Problems with Two-Process Theory
Avoidance without observable fear
Heart rate
Not consistent
Fear diminishes with avoidance learning
58. Measuring Fear
Kamin, Brimer, and Black (1963)
Lever press --> food
Auditory CS --> avoidance training in shuttle box until 1, 3, 9, or 27 avoidances in a row
CS in operant chamber; check for suppression of lever press
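Suppression of lever pressing is commonly summarised as a suppression ratio: presses during the CS divided by presses during the CS plus presses in an equal period just before it. Whether the slide's figure used exactly this index is an assumption, and the counts below are invented for illustration.

```python
def suppression_ratio(cs_responses: int, pre_cs_responses: int) -> float:
    """Return CS responses / (CS responses + pre-CS responses).

    0.5 means no suppression (the CS did not change lever pressing, so little fear);
    0.0 means complete suppression (no pressing during the CS, so strong fear).
    """
    total = cs_responses + pre_cs_responses
    if total == 0:
        raise ValueError("no lever presses recorded in either period")
    return cs_responses / total


# Invented counts for illustration.
print(suppression_ratio(cs_responses=2, pre_cs_responses=18))
print(suppression_ratio(cs_responses=15, pre_cs_responses=15))
```

A ratio that drifts back toward 0.5 across groups trained to more avoidances in a row would indicate that fear of the CS is declining.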
59. Results
Fear decreases during extended avoidance training
But, avoidance still strong
Even low fear is enough?
60. Extinction in Avoidance Behaviour
Odd prediction from two-process theory
“Yo-yo” effect
Avoidance should toggle
But! Avoidance is extremely persistent
61. One-Process Theory
Classical conditioning component unnecessary
Two interpretations of reinforcer
Molar vs. molecular
Negative reinforcement: Overall reduction in exposure to punishers is reinforcer (text interpretation)
Positive reinforcement: Avoidance itself is reinforcer; subject gets reinforced by “safety” on a trial
62. Sidman Avoidance Task
Free-operant avoidance
Can avoidance be learned if no warning CS?
Shock at random intervals
Response gives safe time
Extensive training --> learn avoidance
But, usually never perfect
High variability across subjects
Two-process theory suggests:
Time becomes a CS (time elicits fear)
63. Herrnstein & Hineline (1966)
Rapid and slow shock rate schedules
Response switches schedules
Shocks presented randomly, no signal
Responses give shock reduction
Reduction in shock frequency is reinforcer
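A rough simulation shows why responding pays off even though no single shock is reliably avoided. The sketch below assumes a simplified version of the procedure: time runs in discrete steps, a response moves the subject onto the slow schedule, and the next shock moves it back onto the rapid schedule. The shock probabilities (0.3 and 0.1 per step), the step count, and the seed are illustrative values, not figures taken from the slides.

```python
import random


def count_shocks(response_probability: float, steps: int = 10_000,
                 p_rapid: float = 0.3, p_slow: float = 0.1, seed: int = 0) -> int:
    """Count shocks received by a subject that responds with a fixed probability per step."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    on_slow_schedule = False
    shocks = 0
    for _ in range(steps):
        # A response (only needed while on the rapid schedule) switches to the slow schedule.
        if not on_slow_schedule and rng.random() < response_probability:
            on_slow_schedule = True
        # Shocks arrive at random with no warning signal.
        p_shock = p_slow if on_slow_schedule else p_rapid
        if rng.random() < p_shock:
            shocks += 1
            on_slow_schedule = False  # a shock returns the subject to the rapid schedule
    return shocks


for p in (0.0, 0.5, 1.0):
    print(f"response probability {p}: {count_shocks(p)} shocks in 10,000 steps")
```

Responding never guarantees safety on a given step; it only lowers the overall shock rate, which is the molar reinforcer the slide describes.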
64. Learned Helplessness
Behaviour has no effect on situation
Generalizes
Laboratory
Give inescapable shocks
Shuttle box
Will not switch sides
Expectation that behaviour has no effect
65. Learned Helplessness in Humans
Depression
Situations beyond your control
Three dimensions
Situation: specific or global
Attribute: internal or external
Time: short-term or long-term
66. Therapeutic Application
Confidence building (“can not fail”)
Implementation issues
Tasks that can be successfully completed
Produces immunization
Escapable condition … inescapable condition
Learned helplessness less likely to develop
67. Theories of Operant Conditioning
68. Hull’s Drive Reduction Theory
Animals have motivational states (drives)
Necessary for survival
Reinforcers are things that reduce drives
Physiological value
Reduce physiological state
69. Drive Reduction Reinforcers
Works well with primary reinforcers
Many secondary reinforcers have no physiological value
Hull: association links secondary to drive
Some reinforcers hard to classify as primary or secondary
Some increase a physiological state
Some necessities undetectable
Roller coasters
Vitamins
Saccharin
70. Relative Value Theory & Premack Principle
Treat reinforcers as behaviours
Is it the food, or the behaviour of eating that is the reinforcer?
Behavioural probability scale
Greater or lesser value of behaviours relative to one another
No distinction between primary and secondary
71. Premack Principle
One behaviour will reinforce a second behaviour
High probability behaviour reinforces low probability behaviour
Baseline probability scale
Time
Rank order
Reinforcement relativity
No absolutes
72. Example
Behaviours
Eat ice cream (I), play video game (V), read book (B)
Baseline (30 minutes)
Student 1: I (2min), V (8min), B (20min)
Scale: I -- V -- B
Student 2: I (8min), V (20min), B (2min)
Scale: B -- I -- V
Student 1: V reinforces I, B reinforces V & I
Student 2: I reinforces B, V reinforces I & B
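The scales and reinforcement relations above follow mechanically from the baseline times: rank the behaviours from least to most time spent, and each behaviour can reinforce everything ranked below it. A minimal sketch using the slide's numbers; the function name is chosen here for illustration and ties between behaviours are not handled:

```python
def premack_relations(baseline_minutes: dict[str, int]) -> tuple[str, dict[str, list[str]]]:
    """Return the low-to-high probability scale and what each behaviour can reinforce."""
    ranked = sorted(baseline_minutes, key=baseline_minutes.get)  # least to most time
    scale = " -- ".join(ranked)
    # Each behaviour can reinforce every behaviour ranked lower on the scale.
    relations = {higher: ranked[:i] for i, higher in enumerate(ranked) if i > 0}
    return scale, relations


students = {
    "Student 1": {"I": 2, "V": 8, "B": 20},
    "Student 2": {"I": 8, "V": 20, "B": 2},
}

for name, baseline in students.items():
    scale, relations = premack_relations(baseline)
    print(f"{name} scale: {scale}")
    for reinforcer, targets in relations.items():
        print(f"  {reinforcer} reinforces {' & '.join(targets)}")
```

The same behaviour (reading a book) sits at the top of one student's scale and the bottom of the other's, which is the point of reinforcement relativity.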
73. Problems
Baseline phase
Fair rating?
How to compare very different behaviours
Time problems
What if time not important to behaviour?
Behaviour duration?
Length of baseline period?
74. Response Deprivation Theory
Deprived behaviours = reinforcing behaviours
Drop below baseline level of performance
Not relative frequency of one behaviour compared to another (i.e., Premack)
Level of deprivation for a behaviour
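The contrast with the Premack principle can be sketched as a single comparison against the behaviour's own baseline, ignoring where it ranks relative to other behaviours. This is a simplified reading of the theory; the baseline figures reuse Student 1 from the earlier example, and the restricted amounts are invented.

```python
def is_response_deprived(baseline_minutes: float, allowed_minutes: float) -> bool:
    """A behaviour is deprived when access is held below its own baseline level."""
    return allowed_minutes < baseline_minutes


# Student 1's baseline from the Premack example: ice cream 2 min, video game 8 min.
# Ice cream is the lowest-probability behaviour, yet restricting it below 2 min
# would still make access to it reinforcing under this theory.
print(is_response_deprived(baseline_minutes=2, allowed_minutes=0.5))
# Video game access above baseline is not deprived, so it should not act as a reinforcer.
print(is_response_deprived(baseline_minutes=8, allowed_minutes=10))
```

On this view even a low-probability behaviour can serve as a reinforcer, provided it has been pushed below its baseline.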