Chapters 5

waldo

Presentation Transcript


  1. Chapters 5 Instrumental Learning & Operant Reinforcement

  2. Operant Learning • Stimulus • Response • Outcome

  3. Classical vs. Operant • Classical • Requires reflex action • Neutral stimulus associated with US • Outside of subject’s control • Operant • Strengthening/weakening of “voluntary” action • Subject responds or doesn’t • Can operate together

  4. What’s in a Name? • Operant learning: subject operates on environment • Instrumental conditioning: subject is instrumental in obtaining outcome

  5. Trial and Error Learning • E.L. Thorndike • Animal intelligence • Maze studies

  6. Puzzle Box • Cats • Cage with mechanism to open door • Escape latency • Discrete trial procedure

  7. Law of Effect • Any behaviour followed by an appetitive stimulus will increase in frequency

  8. Terms • Operant (response): any behaviour that operates on the environment to produce an effect • Reinforcer: any event that increases the frequency of a behaviour • Punisher: any event that decreases the frequency of a behaviour

  9. Operant Learning • B.F. Skinner • Operant chamber • Free operant procedure

  10. Discrete Trial & Free Operant • Discrete • One trial at a time • “Apparatus” must be re-set • Measure some behaviour • e.g., mazes • Free • Operant can occur at any time • Operant can occur repeatedly • Response rate • e.g., operant chamber

  11. Four Contingencies • Positive reinforcement • Negative reinforcement • Positive punishment • Negative punishment

  12. Positive and Negative • Positive: presents some stimulus • Negative: removes some stimulus

  13. Reinforcers and Punishers • Reinforcer: increases a behaviour • Punisher: decreases a behaviour

  14. Contingencies • Response causes stimulus to be presented • Response rate increases: Positive Reinforcement (Lever press --> Food) • Response rate decreases: Positive Punishment (Lever press --> Shock) • Response causes stimulus to be removed • Response rate increases: Negative Reinforcement (Lever press --> Shock off) • Response rate decreases: Negative Punishment (Lever press --> Food removed)
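The four contingencies on this slide come from crossing two questions: is the stimulus presented or removed, and does the response rate increase or decrease? A minimal Python sketch of that lookup follows. The function name `classify_contingency` and its string arguments are illustrative choices, not terminology from the slides.

```python
def classify_contingency(stimulus_change: str, rate_change: str) -> str:
    """Name the contingency.

    stimulus_change: "presented" or "removed" (what the response does to the stimulus)
    rate_change:     "increases" or "decreases" (what happens to response rate)
    """
    # Positive/negative describes the stimulus, not whether the outcome is pleasant.
    sign = {"presented": "Positive", "removed": "Negative"}[stimulus_change]
    # Reinforcement/punishment describes the effect on behaviour.
    kind = {"increases": "Reinforcement", "decreases": "Punishment"}[rate_change]
    return f"{sign} {kind}"


# The four lever-press examples from the slide.
examples = [
    ("Lever press --> Food", "presented", "increases"),
    ("Lever press --> Shock", "presented", "decreases"),
    ("Lever press --> Shock off", "removed", "increases"),
    ("Lever press --> Food removed", "removed", "decreases"),
]

for label, stimulus_change, rate_change in examples:
    print(f"{label}: {classify_contingency(stimulus_change, rate_change)}")
```

Keeping the two lookups separate mirrors the point of slides 12 and 13: "positive/negative" and "reinforcer/punisher" are independent dimensions.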

  15. Types of Reinforcers • Primary • Not dependent on an association with other reinforcers • Secondary • Initially neutral stimulus • Paired with primary reinforcer • “Conditioned Reinforcer”

  16. Secondary Reinforcers • “Bridging”, “clicker” • Secondary extinction without periodic pairings with primary • Generally weaker than primary • Generalized reinforcer • Paired with many other kinds of reinforcers • e.g., money

  17. Strength of Operant Learning • Can condition practically any behaviour • Shaping (successive approximations)

  18. Shaping a Lever Press • Gradual process • Reinforce more appropriate/precise responses • Feedback

  19. Response Chains • Sequences of behaviours in specific order • Objective: primary reinforcer • Conditioned reinforcers • Discriminative stimuli

  20. Forward Chaining • Start with first response in sequence, then work through to last response in additive steps

  21. Backwards Chaining • Often used with “complex” training • Start with last response in chain • Next, second last response • Third last, etc.
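The difference between forward and backward chaining is the order in which links of the response chain are added during training. The sketch below lists the training steps for each method. The three-response chain is a hypothetical example, and the function names are illustrative.

```python
def forward_chaining_steps(chain):
    """Start with the first response, then add the next response at each step."""
    return [chain[:i] for i in range(1, len(chain) + 1)]


def backward_chaining_steps(chain):
    """Start with the last response, then prepend the preceding response at each step."""
    return [chain[-i:] for i in range(1, len(chain) + 1)]


# Hypothetical response chain; the final response produces the primary reinforcer.
chain = ["pull cord", "climb ladder", "press lever"]

print("Forward chaining:")
for step in forward_chaining_steps(chain):
    print("  train:", " -> ".join(step))

print("Backward chaining:")
for step in backward_chaining_steps(chain):
    print("  train:", " -> ".join(step))
```

In the backward version every training step ends with the response that earns the primary reinforcer, which is one plausible reason it is favoured for "complex" training.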

  22. Factors in Operant Learning

  23. Contingency • Correlation between behaviour & outcome • Strong contingency --> better learning • Random contingency --> no learning • Both reinforcement and punishment

  24. Contiguity • Time between behaviour & outcome • Shorter = better learning • Delays let other behaviours occur, forgetting, extinction (behaviour w/o reinforcement) • Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?) • More important for punishment

  25. Reinforcer Characteristics • Larger reinforcers --> stronger learning • Not a linear effect • Qualitative differences in reinforcers and punishers • Species & individual differences • Intensity of punisher

  26. Task Characteristics • Some tasks easier to learn than others • Species & individual differences • Innate and/or prior conditioning

  27. Deprivation Levels • Generally, the greater the deprivation, the more effective the reinforcer • Reinforcers can satiate • Deprivation can provide motivation to engage in punishable behaviours

  28. Extinction • Behaviour does not lead to same outcome • Response no longer produces same outcome • Extinction burst (with reinforcement) • Variability of behaviour • Aggression and frustration • Spontaneous recovery • Resurgence

  29. Theories of Reinforcement

  30. Hull’s Drive Reduction Theory • Animals have motivational states (drives) • Necessary for survival • Reinforcers are things that reduce drives • Physiological value • Reduce physiological state

  31. Drive Reduction Reinforcers • Works well with primary reinforcers • Many secondary reinforcers have no physiological value • Hull: association links secondary to drive • Some reinforcers hard to classify as primary or secondary • Some increase a physiological state • Some necessities undetectable • Roller coasters • Vitamins • Saccharin

  32. Relative Value Theory & Premack Principle • Treat reinforcers as behaviours • Is it the food, or the behaviour of eating that is the reinforcer? • Behavioural probability scale • Greater or lesser value of behaviours relative to one another • No distinction between primary and secondary

  33. Premack Principle • One behaviour will reinforce a second behaviour • High probability behaviour reinforces low probability behaviour • Baseline probability scale • Time • Rank order • Reinforcement relativity • No absolutes • Probability of response = Time spent on response / Total time

  34. Example • Behaviours • Eat ice cream (I), play video game (V), read book (B) • Baseline (30 minutes) • Student 1: I (2min), V (8min), B (20min) • Scale: I -- V -- B • Student 2: I (8min), V (20min), B (2min) • Scale: B -- I -- V • Student 1: V reinforces I, B reinforces V & I • Student 2: I reinforces B, V reinforces I & B
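The example above can be worked through in a few lines of Python. This sketch uses the baseline minutes from the slide, computes each behaviour's probability as time spent divided by total baseline time, and lists which behaviour can reinforce which under the Premack principle. The function names are illustrative, not part of the original material.

```python
def response_probabilities(minutes, total_minutes):
    """Probability of each response = time spent on it / total baseline time."""
    return {behaviour: t / total_minutes for behaviour, t in minutes.items()}


def reinforcement_pairs(minutes):
    """Return (reinforcer, reinforced) pairs: a higher-probability behaviour
    reinforces any lower-probability behaviour."""
    return [
        (high, low)
        for high in minutes
        for low in minutes
        if minutes[high] > minutes[low]
    ]


BASELINE_MINUTES = 30
# I = eat ice cream, V = play video game, B = read book (values from the slide).
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

for name, minutes in [("Student 1", student_1), ("Student 2", student_2)]:
    probabilities = response_probabilities(minutes, BASELINE_MINUTES)
    scale = sorted(minutes, key=minutes.get)  # lowest to highest probability
    print(name, "scale:", " -- ".join(scale))
    for behaviour in scale:
        print(f"  P({behaviour}) = {probabilities[behaviour]:.2f}")
    for reinforcer, reinforced in reinforcement_pairs(minutes):
        print(f"  {reinforcer} reinforces {reinforced}")
```

The same behaviour (reading) ends up at the top of one student's scale and the bottom of the other's, which is the "no absolutes" point from the previous slide.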

  35. Problems • Baseline phase • Fair rating? • How to compare very different behaviours • Time problems • What if time not important to behaviour? • Behaviour duration? • Length of baseline period?

  36. Response Deprivation Theory • Deprived behaviours = reinforcing behaviours • Drop below baseline level of performance • Not relative frequency of one behaviour compared to another (i.e., Premack) • Level of deprivation for a behaviour • Praise? “Yes”?
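A small sketch of the response deprivation idea: a behaviour can serve as a reinforcer when access to it is held below its own baseline level, regardless of where it ranks against other behaviours. The baseline value reuses Student 2's reading time from slide 34; the restricted access levels are hypothetical, as is the function name.

```python
def is_deprived(baseline_minutes, allowed_minutes):
    """A behaviour is deprived when access is restricted below its baseline level."""
    return allowed_minutes < baseline_minutes


# Student 2 read for 2 minutes at baseline (slide 34), the lowest-probability
# behaviour, so under the Premack principle it could not reinforce the others.
baseline_reading = 2

# Hypothetical access levels imposed by the experimenter.
for allowed in (1, 2, 5):
    status = "deprived, can reinforce" if is_deprived(baseline_reading, allowed) else "not deprived"
    print(f"Reading limited to {allowed} min (baseline {baseline_reading} min): {status}")
```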

  37. Escape and Avoidance

  38. Definitions • Escape • Get away from aversive stimulus that is in progress • Avoidance • Get away from aversive stimulus before it begins

  39. Shuttle Box • Solomon & Wynne (1953) • Dogs • Chamber with barrier; Shock • Light off as signal

  40. Two-Process Theory • Classical and operant conditioning • Shock = US • Fear/pain/jump/twitch/squeal = UR • Darkness = CS • Fear of dark = CR • Fear: heart rate, breathing, stomach cramps, etc. • Negative reinforcement • Removal of fear (CR) • Escape from CS, not avoidance of shock

  41. Support for Two-Process Theory • Rescorla & LoLordo (1965) • Dog in shuttlebox • No signal • Response gives “safe time” • Pair tone with shock • Tone increases rate of response • CS can amplify avoidance • Conditioned inhibition can reduce avoidance

  42. Problems with Two-Process Theory • Avoidance without observable fear • Heart rate • Not consistent • Fear diminishes with avoidance learning

  43. Measuring Fear • Kamin, Brimer, and Black (1963) • Lever press ---> food • Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row • CS in Skinner box; check for suppression of lever press

  44. Results • [Figure: responding plotted against number of avoidance responses (1, 3, 9, 27)] • Fear decreases during extended avoidance training • But, avoidance still strong • Even low fear is enough?

  45. Extinction in Avoidance Behaviour • [Figure: successful avoidance and # of US received, plotted across trials] • Odd prediction from two-process theory • “Yo-yo” effect • Avoidance should toggle • But! Avoidance is extremely persistent

  46. One-Process Theory • Classical conditioning component unnecessary • Avoidance, not fear reduction, is reinforcer • “Safety”

  47. Sidman Avoidance Task • Free-operant avoidance • Can avoidance be learned if no warning CS? • Shock at random intervals • Response gives safe time • Extensive training --> learn avoidance • But, usually never perfect • High variability across subjects • Two-process theory suggests: • Time becomes a CS (time elicits fear)

  48. Herrnstein & Hineline (1966) • Rapid and slow shock rate schedules • Lever press switches schedules • Shocks presented randomly, no signal • Responses give shock reduction • Reduction in shock is reinforcer

  49. Learned Helplessness • Behaviour has no effect on situation • Generalizes • Laboratory • Give inescapable shocks • Shuttle box • Will not switch sides • Expectation that behaviour has no effect

  50. Learned Helplessness in Humans • Depression • Situations beyond your control • Three dimensions • Situation: specific or global • Attribute: internal or external • Time: short-term or long-term
