
Chapters 5 and 7


reynold

Presentation Transcript


    1. Chapters 5 and 7: Operant Learning

    2. Operant (Instrumental) Learning: Stimulus --> Response --> Outcome

    3. Classical vs. Operant. Classical: reflex action; neutral stimulus associated with US; outside of subject’s control. Operant: strengthens/weakens “voluntary” action; subject does/doesn’t respond. The two can occur together.

    4. Edward Thorndike Animal intelligence Comparative psychology

    5. Experiments Chicks, cats, dogs Single animals Observational learning

    6. Puzzle Box

    7. Trial-and-Error

    8. Law of Effect: “When particular stimulus-response sequences are followed by pleasure, those responses tend to be ‘stamped in’; responses followed by pain tend to be ‘stamped out’.” (Thorndike 1911). Stamped in = reinforced; stamped out = punished.

    9. Methodology Subjects Apparatus Escape latency Time-curves

    11. Theory Incremental learning S-R Direct experience

    12. Revision Scientific method Observational learning in non-humans

    14. B.F. Skinner. Operant response: the unit of behaviour, defined by the effect it has on the environment. Skinner’s approach (video). Operant chamber (video).

    15. Discrete Trial & Free Operant. Discrete trial: one trial at a time; re-set apparatus between trials; measure a behaviour (latency, running speed, reduction in errors); e.g., maze. Free operant: response can repeat automatically; less disruptive for subject; measure response rate; e.g., operant chamber.

    16. Three-Term Contingency. Contingency: Y iff X. (1) Discriminative stimulus (SD); (2) operant response (R); (3) outcome (O), appetitive or aversive.

    17. Outcomes and Effects. Positive: something is delivered. Negative: something is removed. Reinforcer: causes behaviour to increase. Punisher: causes behaviour to decrease. The effect on behaviour determines whether an outcome is called a “reinforcer” or a “punisher”.

    18. Four Basic Operant Relations
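
The four relations follow from crossing the two distinctions on slide 17: whether something is delivered or removed, and whether the behaviour increases or decreases. A minimal Python sketch of that 2 x 2 naming scheme follows; the function name and string labels are illustrative choices, not part of the slides.

```python
def operant_relation(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant relation for one combination.

    stimulus_change:  "delivered" or "removed"
    behaviour_change: "increase" or "decrease"
    """
    # Positive/negative describes what happens to the stimulus.
    sign = {"delivered": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement/punishment describes the effect on behaviour.
    effect = {"increase": "reinforcement", "decrease": "punishment"}[behaviour_change]
    return f"{sign} {effect}"


for stimulus in ("delivered", "removed"):
    for behaviour in ("increase", "decrease"):
        print(f"{stimulus:9} / behaviour {behaviour:8} -> "
              f"{operant_relation(stimulus, behaviour)}")
```

Note that "negative" never means "bad" here: removing an aversive stimulus so that behaviour increases is negative reinforcement, as in the escape case on slide 54.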

    19. Types of Reinforcers. Primary: not dependent on an association with other reinforcers. Secondary (“conditioned reinforcer”): neutral stimulus paired with primary reinforcer.

    20. Secondary Reinforcers. “Bridging”, “clicker”. Secondary reinforcer extinguishes without periodic pairings with primary. Generally weaker than primary; less prone to satiation. Generalized reinforcer: paired with many other kinds of reinforcers.

    21. Neurobiology of Reinforcement. Pleasure centres of brain (reward pathway); electrical stimulation of brain (ESB). Dopamine: major neurotransmitter; released by appetitive stimuli.

    22. Dopamine Release. Different amounts of dopamine are released: unexpected reinforcement --> more dopamine release. Decreasing learning curve (Rescorla-Wagner): the more you’ve learned, the less “surprising” the outcome; less dopamine released; less reinforcing.
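
The link between "surprise" and the decreasing learning curve can be illustrated with the Rescorla-Wagner update, in which associative strength V changes on each trial by a fraction of the prediction error (lambda - V). The sketch below is only an illustration: the learning-rate value 0.3 and the asymptote 1.0 are arbitrary assumptions, and reading the prediction error as a stand-in for the size of the dopamine response is the slide's interpretation, not something the code demonstrates.

```python
def rescorla_wagner(n_trials, alpha_beta=0.3, lam=1.0, v0=0.0):
    """Return a list of (V before trial, prediction error) pairs.

    alpha_beta: combined learning-rate parameter (assumed value)
    lam:        maximum associative strength the outcome supports (assumed)
    """
    v = v0
    history = []
    for _ in range(n_trials):
        error = lam - v          # how "surprising" the outcome is
        history.append((v, error))
        v += alpha_beta * error  # learning step shrinks as error shrinks
    return history


for trial, (v, error) in enumerate(rescorla_wagner(8), start=1):
    print(f"trial {trial}: V = {v:.3f}, surprise = {error:.3f}")
```

Each trial's error is smaller than the last, so the increments in V get smaller too, which produces the negatively accelerated learning curve the slide refers to.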

    23. Addictive. Internal/external drugs: orgasm, cocaine, crack. Dopamine very addictive. Dopamine is converted (via norepinephrine) to epinephrine (adrenaline): “thrill junkies”. Tolerance develops.

    24. Strength of Operant Learning Condition practically any behaviour Shaping (successive approximations)

    25. Shaping a Lever Press Gradual process Reinforce more appropriate/precise responses Feedback

    26. Response Chains Sequences of behaviours in specific order Objective: primary reinforcer Conditioned reinforcers Discriminative stimuli

    27. Backwards Chaining Often used with “complex” training Start with last response in chain Next, second last response Third last, etc.

    28. Chaining

    29. Forward Chaining Start with first response Add additional links in chain
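
The difference between the two chaining procedures is only the order in which links are added to what the subject must perform. The sketch below lists the training stages for each direction; the three-link chain is a made-up example, and the function name is hypothetical.

```python
def training_stages(chain, direction="backward"):
    """List the sub-chain trained at each stage.

    backward: start with the last response, then prepend earlier ones.
    forward:  start with the first response, then append later ones.
    """
    if direction not in ("backward", "forward"):
        raise ValueError("direction must be 'backward' or 'forward'")
    stages = []
    for n in range(1, len(chain) + 1):
        stages.append(chain[-n:] if direction == "backward" else chain[:n])
    return stages


# Hypothetical chain ending in the response that earns the primary reinforcer.
chain = ["climb ladder", "cross bridge", "press lever"]

for direction in ("backward", "forward"):
    print(direction)
    for stage in training_stages(chain, direction):
        print("  ", " -> ".join(stage))
```

In the backward case every stage ends with the response that produces the primary reinforcer, so each newly added link is followed by an already-trained link (which can act as a conditioned reinforcer, as on slide 26).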

    30. Factors in Operant Learning

    31. Contiguity: time between behaviour & outcome. Delays let other behaviours occur and allow forgetting and extinction (behaviour w/o reinforcement). Learning can occur with a delay if a stimulus “placeholder” is provided (conditioned reinforcer?). Important re: punishment.

    32. Contingency Correlation between behaviour & outcome Strong vs. random contingency Both reinforcement and punishment

    33. Outcome Characteristics. Larger reinforcers/punishers --> stronger learning, but not a linear effect. Qualitative differences in reinforcers and punishers: species & individual differences. Intensity of punisher: tolerance.

    34. Task Characteristics Some tasks easier to learn than others Species & individual differences Innate and/or prior conditioning

    35. Deprivation Levels Generally, the greater the deprivation, the more effective the reinforcer Reinforcer satiation Deprivation can motivate punishable responses

    36. Reinforcers in Punishment. What maintains undesired behaviour? Benefit? Alternative sources of reinforcement: find other ways to provide acceptable reinforcement.

    37. Latent Learning Motivation Learning behaviour Performing behaviour

    38. Tolman & Honzik (1930)

    39. Extinction. Response no longer produces same outcome. Extinction burst; variability of behaviour; aggression and frustration; spontaneous recovery.

    40. Behaviour Modification Also “behaviour analysis” Alter behaviour via operant conditioning Therapy Reinforcement vs. punishment

    41. Problems with Punishment in Behaviour Modification. Application of the punisher: incorrect use of punishment creates issues or exacerbates punishment consequences. Tolerance: start with strong punisher, gradually reduce. General reluctance to administer.

    42. Possible Consequences of Punishment. Escape. Aggression, violence: at punisher, self, other. Apathy: general suppression of other behaviours. Abuse: permanent damage. Imitation.

    43. Alternatives to Using Punishment

    44. Response Prevention Make it impossible to do punishable behaviour Circumvention Younger children

    45. Extinction. Identify reinforcer of behaviour; withhold reinforcer. Drawbacks: difficult to ID reinforcer; extinction bursts; slow.

    46. Differential Reinforcement. Differential reinforcement of low rates of responding (DRL): only reinforce behaviour when response occurs at low frequency. Differential reinforcement of zero responding (DR0): reinforcement contingent on not performing behaviour at all (in some time period).

    47. Differential reinforcement of alternative behaviour (DRA): reinforcer gained from undesired behaviour is now only available when some alternative behaviour is done. Differential reinforcement of incompatible behaviour (DRI): reinforce behaviour completely incompatible with undesired response.
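
The DRL and DR0 rules on slide 46 can be written as simple timing checks. The sketch below is one possible reading, under stated assumptions: DRL is implemented as a minimum gap between responses, DR0 as fixed, non-resetting intervals, the session clock starts at 0, and all names and numbers are hypothetical.

```python
def drl_reinforced(response_times, min_gap):
    """DRL: a response is reinforced only if at least `min_gap` time
    units have passed since the previous response (or session start)."""
    outcomes = []
    last = 0  # assumption: the first gap is measured from session start
    for t in response_times:
        outcomes.append(t - last >= min_gap)
        last = t
    return outcomes


def dro_reinforcers(response_times, interval, session_length):
    """DR0: deliver a reinforcer at the end of every interval in which
    the target behaviour did not occur at all."""
    delivered = []
    start = 0
    while start + interval <= session_length:
        end = start + interval
        if not any(start <= t < end for t in response_times):
            delivered.append(end)
        start = end
    return delivered


responses = [2, 3, 10, 11, 20]           # hypothetical response times (s)
print(drl_reinforced(responses, 5))      # which responses earn a reinforcer
print(dro_reinforcers(responses, 5, 25)) # when "no response" is rewarded
```

The two rules pull in different directions: DRL still pays for the behaviour as long as it is spaced out, whereas DR0 pays only for its complete absence within an interval.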

    48. Noncontingent Reinforcement. Provide desired reinforcer on regular basis regardless of what is being done: no correlation between response and outcome. May work because subject gets reinforcer for “free”. Problems if reinforcer comes after some other undesired behaviour (new acquisition).

    49. Negative Punishment. Removal of pleasant stimulus; e.g., time-out. Popular in human behaviour modification.

    50. Other Techniques for Behavioural Deceleration. Overcorrection: repetitions of alternate, desired behaviour; restitution; positive practice; technically, punishment. Stimulus satiation.

    51. Escape and Avoidance

    52. Definitions. Escape: get away from aversive stimulus that is in progress. Avoidance: get away from aversive stimulus before it begins.

    53. Shuttle Box. Solomon & Wynne (1953): dogs; chamber with barrier; shock; light off as signal.

    54. Theory Issues. For escape, no ambiguity: aversive removed, behaviour increases = negative reinforcement. What about avoidance? Subject shuttles before shock; behaviour increases; nothing obvious removed or delivered. Mowrer & Lamoreaux (1942): “…not getting something can hardly, in and of itself, qualify as rewarding.”

    55. Two-Process Theory: classical and operant conditioning. Classical: shock = US; fear/pain/jump/twitch/squeal = UR; darkness = CS; fear of dark = CR (fear: heart rate, breathing, stomach cramps, etc.). Operant: negative reinforcement by removal of fear (CR); escape from CS, not avoidance of shock. Two-process theory treats avoidance as just another type of escape behaviour.

    56. Support for Two-Process Theory. Rescorla & LoLordo (1965): dog in shuttlebox; no signal; response gives “safe time”. Pair tone with shock; tone increases rate of response. CS can amplify avoidance; conditioned inhibition can reduce avoidance.

    57. Problems with Two-Process Theory. Avoidance without observable fear: heart rate not consistent. Fear diminishes with avoidance learning.

    58. Measuring Fear. Kamin, Brimer, and Black (1963): lever press --> food. Auditory CS --> avoidance in shuttle box until 1, 3, 9, or 27 avoidances in a row. Then CS presented in operant chamber; check for suppression of lever press.
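
Suppression of lever pressing is commonly quantified with a suppression ratio that compares responding during the CS with responding in an equal period just before it. The slides do not say which measure was used, so the formula below is an assumption (the standard conditioned-suppression ratio), and the response counts are invented for illustration only.

```python
def suppression_ratio(cs_responses, pre_cs_responses):
    """Ratio = CS / (CS + pre-CS).

    0.0 -> responding stops completely during the CS (strong fear)
    0.5 -> responding is unchanged by the CS (no measurable fear)
    """
    total = cs_responses + pre_cs_responses
    if total == 0:
        raise ValueError("no responses in either period; ratio undefined")
    return cs_responses / total


# Hypothetical lever-press counts: (during CS, before CS)
for label, (cs, pre) in {"strong suppression": (2, 38),
                         "weak suppression": (15, 25),
                         "no suppression": (20, 20)}.items():
    print(f"{label}: {suppression_ratio(cs, pre):.2f}")
```

On this scale, a ratio drifting back toward 0.5 after many avoidances in a row would correspond to the reduced fear reported on the next slide.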

    59. Results Fear decreases during extended avoidance training But, avoidance still strong Even low fear is enough?

    60. Extinction in Avoidance Behaviour. Odd prediction from two-process theory: “yo-yo” effect; avoidance should toggle. But! Avoidance is extremely persistent.

    61. One-Process Theory. Classical conditioning component unnecessary. Two interpretations of reinforcer (molar vs. molecular). Negative reinforcement: overall reduction in exposure to punishers is reinforcer (text interpretation). Positive reinforcement: avoidance itself is reinforcer; subject gets reinforced by “safety” on a trial.

    62. Sidman Avoidance Task. Free-operant avoidance: can avoidance be learned if no warning CS? Shock at random intervals; response gives safe time. Extensive training --> learn avoidance, but usually never perfect; high variability across subjects. Two-process theory suggests time becomes a CS (time elicits fear).

    63. Herrnstein & Hineline (1966). Rapid and slow shock rate schedules; response switches schedules. Shocks presented randomly, no signal. Responses give shock reduction: reduction in shock frequency is reinforcer.

    64. Learned Helplessness. Behaviour has no effect on situation; generalizes. Laboratory: give inescapable shocks; then, in shuttle box, subject will not switch sides. Expectation that behaviour has no effect.

    65. Learned Helplessness in Humans. Depression; situations beyond your control. Three dimensions: situation (specific or global), attribute (internal or external), time (short-term or long-term).

    66. Therapeutic Application. Confidence building (“can not fail”). Implementation issues: tasks that can be successfully completed. Produces immunization: escapable condition … inescapable condition; learned helplessness less likely to develop.

    67. Theories of Operant Conditioning

    68. Hull’s Drive Reduction Theory. Animals have motivational states (drives), necessary for survival. Reinforcers are things that reduce drives: physiological value; reduce physiological state.

    69. Drive Reduction Reinforcers. Works well with primary reinforcers. Many secondary reinforcers have no physiological value; Hull: association links secondary to drive. Some reinforcers hard to classify as primary or secondary: some increase a physiological state; some necessities undetectable. Examples: roller coasters, vitamins, saccharin.

    70. Relative Value Theory & Premack Principle. Treat reinforcers as behaviours: is it the food, or the behaviour of eating, that is the reinforcer? Behavioural probability scale: greater or lesser value of behaviours relative to one another. No distinction between primary and secondary.

    71. Premack Principle. One behaviour will reinforce a second behaviour: high probability behaviour reinforces low probability behaviour. Baseline probability scale: time; rank order. Reinforcement relativity: no absolutes.

    72. Example. Behaviours: eat ice cream (I), play video game (V), read book (B). Baseline (30 minutes). Student 1: I (2 min), V (8 min), B (20 min); scale: I -- V -- B; V reinforces I, B reinforces V & I. Student 2: I (8 min), V (20 min), B (2 min); scale: B -- I -- V; I reinforces B, V reinforces I & B.
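
The worked example can be reproduced mechanically: rank each student's behaviours by baseline time, then let every behaviour reinforce all behaviours ranked below it. The sketch below uses the baseline minutes from the slide; the function names are illustrative.

```python
# Baseline minutes out of a 30-minute observation (from the slide).
baseline = {
    "Student 1": {"I": 2, "V": 8, "B": 20},
    "Student 2": {"I": 8, "V": 20, "B": 2},
}


def premack_scale(minutes):
    """Behaviours ordered from lowest to highest baseline probability."""
    return sorted(minutes, key=minutes.get)


def reinforces(minutes):
    """Map each behaviour to the lower-probability behaviours it can
    reinforce according to the Premack principle."""
    return {
        high: [low for low in minutes if minutes[low] < minutes[high]]
        for high in minutes
    }


for student, minutes in baseline.items():
    print(student, "scale:", " -- ".join(premack_scale(minutes)))
    for behaviour, targets in reinforces(minutes).items():
        if targets:
            print(f"  {behaviour} reinforces {' & '.join(targets)}")
```

The same behaviour (reading) sits at the top of one student's scale and the bottom of the other's, which is the "reinforcement relativity" point of slide 71.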

    73. Problems. Baseline phase: fair rating? How to compare very different behaviours? Time problems: what if time is not important to the behaviour? Behaviour duration? Length of baseline period?

    74. Response Deprivation Theory. Deprived behaviours = reinforcing behaviours: access to the behaviour drops below its baseline level of performance. Not the relative frequency of one behaviour compared to another (i.e., Premack); what matters is the level of deprivation for a behaviour.
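
One common way to state the deprivation condition (usually attributed to Timberlake and Allison; it is not given in the slides, so treat it as an assumption) is that a schedule makes the contingent behaviour reinforcing when performing the instrumental behaviour at its baseline level would earn less of the contingent behaviour than baseline. The sketch below checks that condition using Student 1's baseline from slide 72; the two schedules are hypothetical.

```python
def is_deprived(instrumental_required, contingent_allowed,
                baseline_instrumental, baseline_contingent):
    """True if the schedule pushes the contingent behaviour below baseline.

    The schedule grants `contingent_allowed` units of the contingent
    behaviour per `instrumental_required` units of the instrumental one.
    """
    earned_at_baseline = (baseline_instrumental * contingent_allowed
                          / instrumental_required)
    return earned_at_baseline < baseline_contingent


# Student 1 baseline: reading (B) 20 min, ice cream (I) 2 min.
# Can ice cream, the LOW-probability behaviour, reinforce reading?
print(is_deprived(10, 1, baseline_instrumental=20, baseline_contingent=2))
print(is_deprived(20, 1, baseline_instrumental=20, baseline_contingent=2))
```

With 10 minutes of reading per minute of ice cream, baseline reading still earns the full 2 minutes, so there is no deprivation; with 20 minutes per minute it earns only 1, so ice cream is predicted to reinforce extra reading, a case the Premack ranking alone would not predict.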
