PSYC2011 Exam Notes. Instrumental conditioning




Instrumental conditioning
Also called operant conditioning
Response learning
- Stimulus -> Response -> Outcome
- Learning about the consequences of your actions; behaviour change
Distinct from classical (Pavlovian) conditioning
- Classical: Conditioned Stimulus (CS) -> Unconditioned Stimulus (US)
- Instrumental: the response changes the outcome
The subject's behaviour determines the presentation of outcomes only in instrumental conditioning

Thorndike's Law of Effect
If an animal behaves in a certain way and receives some form of satisfaction, it is more likely to behave in that way again in the same situation
Behaviours that are closely followed by punishment are less likely to occur in the same situation
Cat in the puzzle box
- No insight or point where the cat realised that the lever needs to be pushed to escape
- Trial and error led to success; the time taken to escape diminished over trials
- Learning is a continuous, incremental process
Response -> Satisfying outcome -> Increase response
Response -> Frustrating outcome -> Decrease response

Reinforcement
The relation between some event (a reinforcer) and a preceding response increases the strength of the response
Reinforcers are defined by their observed effect on behaviour and not by their subjective qualities
Positive contingency: response results in outcome
Negative contingency: response prevents outcome
Positive reinforcement (reward): good outcome increases response
Negative reinforcement (avoidance): removal of a bad outcome increases response
Punishment: bad outcome decreases response
Omission: removal of a good outcome decreases response

Secondary reinforcement
Previously neutral stimuli may acquire reinforcing properties
- Reinforcement can transfer to other stimuli
- e.g. lever retracting = food coming, sound of food dispenser, signal marking reinforcement (lights, etc.), other stimuli present in chamber (context)
- These things are loosely associated with the delivery of reinforcement
Most rewarding stimuli in our lives are secondary reinforcers
Very useful in animal training (e.g. clicker training)
- Immediate reinforcer the very second the animal performs the task; signals that food is coming

Factors affecting instrumental conditioning

Temporal contiguity: the amount of time between the response and the delivery of the reinforcer
- Strong temporal contiguity means the reinforcer is delivered close in time to the response = more effective conditioning
- Memory decay over time? By the time the reinforcer is delivered, the memory of the response is weak; this leads to weaker conditioning
- Interference from other events? The animal has done other things in the meantime, so the reinforcer could strengthen one of those actions instead of the desired action
- Small/no interval produces stronger learning in (almost) all cases of instrumental and classical conditioning (exception: conditioned taste aversion [alcohol, chemotherapy drugs, etc.])
Contingency: describes the statistical relation between the two events
- Does performing the action lead to reinforcement?
- Strong = response/reward, response/reward
- Weak = response/reward/reward, response/reward/reward/reward
- Conditioning is most effective when the response is a necessary requirement for getting the reward

Shaping
Problem: complex behaviours are unlikely to occur spontaneously
Behaviour evolves through reinforcement of successive approximations of a desired response
The term "behaviour shaping" was popularised by behaviourists (especially Skinner)
Can sometimes occur inadvertently (e.g. mother rewarding child's tantrum by comforting them)
To be effective, behaviour shaping must adhere to the basic principles of reinforcement
- Close temporal contiguity between response and reinforcement
- Avoid giving spurious reinforcement; this degrades contingency
- Avoid reinforcing the wrong behaviour, which leads to the development of superstitious behaviour

Response chaining
Many complex behaviours can be thought of as a series of simple responses
Response chaining involves shaping a sequence of responses
- e.g. dancing, driving a manual
- Sight of lever (stimulus) -> approach lever (response) -> feel of lever (s) -> press lever (r) -> sound of lever (s) -> approach magazine (r) -> food (s) -> leave magazine (r)
The most effective way of doing this is to start with the last response in the chain and move backwards to the first response

Schedules of reinforcement
In animal training and real life, primary rewards are rarely guaranteed 100% of the time
Partial reinforcement or secondary reinforcement
- Often desirable for practical reasons
- Produces slower but more persistent responding
Fixed ratio (e.g. FR5: reinforcement is delivered once every 5 responses)
Fixed interval (e.g. FI5: reinforcement is delivered on the first response after 5 seconds have elapsed since the last reinforcement)
Variable ratio (e.g. VR5: reinforcement is delivered on average once every 5 responses)
Variable interval (e.g. VI5: reinforcement is delivered on the first response after a variable time (mean = 5 seconds) has elapsed since the last reinforcement)
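The four schedules above can be made concrete with a small simulation. This is only a rough sketch under stated assumptions, not part of the original notes: the function name `simulate`, the evenly spaced responding, the uniform draw used for VR and the exponential draw used for VI are all illustrative choices.

```python
import random


def simulate(kind, value, n_responses, seconds_per_response=1.0, seed=0):
    """Count reinforcers earned by n_responses evenly spaced responses.

    kind  : "FR", "VR", "FI" or "VI"
    value : responses required (ratio) or seconds required (interval)
    """
    rng = random.Random(seed)

    def next_requirement():
        if kind in ("FR", "FI"):
            return value                      # fixed schedules never change
        if kind == "VR":
            # Assumption: uniform integer requirement with mean `value`
            return rng.randint(1, 2 * value - 1)
        # VI (assumption): exponential waiting time with mean `value` seconds
        return rng.expovariate(1.0 / value)

    requirement = next_requirement()
    progress = 0.0    # responses (ratio) or seconds (interval) since last reinforcer
    reinforcers = 0
    for _ in range(n_responses):
        progress += 1 if kind in ("FR", "VR") else seconds_per_response
        if progress >= requirement:           # this response is reinforced
            reinforcers += 1
            progress = 0.0
            requirement = next_requirement()
    return reinforcers


if __name__ == "__main__":
    for kind in ("FR", "VR", "FI", "VI"):
        print(kind + "5:", simulate(kind, 5, 1000))
```

On ratio schedules, responding twice as often in the same period earns twice as many reinforcers (compare `simulate("FR", 5, 100)` with `simulate("FR", 5, 200)`). On interval schedules the extra responses earn nothing more (compare `simulate("FI", 5, 100, 1.0)` with `simulate("FI", 5, 200, 0.5)`, which both cover 100 seconds).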

Extinction
Availability of reinforcement is removed
- Zero contingency between response and reinforcer
Established response tends to decline
Observed in instrumental and classical conditioning
Omission training works on a similar basis
- The omission of an expected reward
- Negative contingency between response and reinforcement
- Negative punishment

Partial reinforcement extinction effect (PREE)
Responding acquired with partial reinforcement (PRF) persists when non-reinforced to a greater extent than responding acquired with continuous reinforcement (CRF)
Partial reinforcement produces more persistent responding, although the rate of response is relatively slow at the beginning
The less reliably a response is reinforced, the more persistent it is during extinction

Discriminative stimuli
SD (or S+) vs. SΔ (or S-)
- In the presence of SD, the response is reinforced
- In the presence of SΔ, the response is not reinforced
Reinforcement stamps in a connection between SD and response (Thorndike)
- SD -> response -> reinforcement
- Habit formation: the next time you see the SD you will produce the response without deliberation
Too simplistic in some cases?
- Responding in the presence of SD is sensitive to the value of reinforcement
- SD and SΔ act to facilitate and inhibit the response-reinforcement association
In experiments, discriminative stimuli are usually discrete events (lights, tones, etc.)
But the following might also serve as SD/SΔ:
- Contexts
- Emotional/physiological states
- The passage of time
- The reinforcer itself
The discrete trial is made up of:
- The SD (instruction or stimulus given)
- A response or prompt
- Reinforcement or correction
Example: explanation of PREE?
CRF is very distinguishable from extinction whereas PRF is less so:
- CRF -> extinction: response/reward, response/reward, response/nothing, response/nothing, etc.
- PRF -> extinction: response/reward, response/nothing, response/nothing, response/reward, response/nothing, etc.
- Much less noticeable shift in context from PRF to extinction
CRF vs. extinction serve as distinguishable markers
New learning facilitated by the different contexts (more effective discriminative stimuli)
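The discriminability argument above can be illustrated with a toy calculation. This is a hypothetical heuristic, not a model from the notes: it assumes the subject can only notice that extinction has begun once it has made more consecutive non-reinforced responses than it ever experienced in training. The function names and the two training sequences are invented for illustration.

```python
def longest_unreinforced_run(outcomes):
    """Longest run of consecutive non-reinforced responses (False entries)."""
    longest = current = 0
    for reinforced in outcomes:
        current = 0 if reinforced else current + 1
        longest = max(longest, current)
    return longest


def responses_until_change_detectable(training):
    """Non-reinforced responses needed in extinction before the run is
    longer than anything seen in training (toy assumption, see lead-in)."""
    return longest_unreinforced_run(training) + 1


# True = response/reward, False = response/nothing (illustrative sequences)
crf_training = [True, True, True, True, True]
prf_training = [True, False, False, True, False]

if __name__ == "__main__":
    print("CRF:", responses_until_change_detectable(crf_training))
    print("PRF:", responses_until_change_detectable(prf_training))
```

Under this assumption, the very first non-reinforced response after CRF training is already novel. After the PRF sequence, several non-reinforced responses are needed before anything looks different from training.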

Is extinction unlearning?
Evidence for the original association re-emerges under some circumstances:
- Spontaneous recovery: after an extinction session has finished and time has passed, responding resumes as if extinction had never occurred
- Reinstatement: previously extinguished association returns after the unsignalled presentation of an unconditioned stimulus
- Rapid reacquisition: the response is acquired faster upon retraining; original learning still present?
- Renewal: a subtle change in context can renew the original response; extinction is context-specific?
All of these effects point toward the context serving as a cue (SD)
Context plays a critical role in extinction
Extinction as new learning:
- Inhibitory learning specific to the context in which extinction occurs?
- Context acts as a discriminative stimulus?

Stimulus control
Discriminative stimuli control behaviour
- Behaviour is observably different in the presence vs. the absence of a particular stimulus
- Stimulus control is acquired through differential reinforcement
A particular stimulus feature or stimulus dimension can control behaviour
- Variations in response rate when the feature is manipulated (e.g. colour, size, orientation)

Generalisation
If reinforcement is delivered in the presence of a stimulus (SD/S+), learning tends to generalise to similar stimuli
Generalisation gradient (across a stimulus continuum):
- The closer a stimulus is to the original stimulus, the more generalisation occurs
- The less similar a stimulus is to what was presented in training, the less responding you'll see

Discrimination
Discriminating between stimuli means behaving differently towards them
Discrimination applies in cases where:
- The stimuli are easy to tell apart (obviously different along some dimension, e.g. colour)
- The stimuli are confusable (the difference between them is not obvious)

Discrimination learning
Generalisation as failure to discriminate?
- Organism cannot discriminate (sensory limitation)
- Organism doesn't discriminate (lack of stimulus control)
Finer discriminations can be learned through reinforcement
The content of what is learned is critical for generalisation and discrimination in similar situations
Transposition: relational learning? e.g. Köhler (1918)
- Trained chickens to peck at a darker stimulus for reward
- Changed the colours to see which stimulus they would peck at

- Saw a preference for the darker stimulus when colours had been changed
- Evidence of learning a relationship between two stimuli?

Spence's theory
Excitatory conditioning to SD/S+ generalises to similar values
Inhibitory conditioning to SΔ/S- generalises to similar values
Spence (1936): gradient summation theory of discrimination learning
Feature-based conditioning can explain transposition
Predicts that relational choices will have clear physical limitations

Peak shift
Displacement of the peak of the gradient away from S+ in the direction opposite S-
Spence's theory provides an explanation

Discrimination and categorisation
Animals can learn to discriminate between complex stimuli, even on seemingly conceptual grounds
- e.g. categorisation of complex scenes by pigeons
- Pigeons conditioned using a large set of stimuli
- Often over diverse physical features (e.g. trees change with the seasons)
- Perform above chance on new category members
- Indicative of the formation of a prototype (a representation of the typical category member)
Features common to one category are more strongly reinforced
Features common to both categories are not as strongly reinforced
What looks like the learning of a prototype or category might be learning about the features that category members share
The formation of a concept?
- The most common features (e.g. leg shapes) are most strongly reinforced and become the best discriminative stimuli

Motivation
Conditioned behaviour:
- Variable but
- Persistent
Deprivation and satiation:
- Affect activity
- Affect preferences
What is the role of motivation in:
- Instrumental conditioning?
- Performing a conditioned response?

Motivation and performance
Internal states can affect performance of previously learned responses
e.g. Frustration: a motivational response to the omission of an expected reward
Frustration can produce a paradoxical reward effect
- Responding seemingly strengthened by the omission of a reward
- This is temporary
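Returning briefly to Spence's gradient summation account of peak shift from the discrimination notes: the idea can be sketched numerically by subtracting an inhibitory gradient centred on S- from an excitatory gradient centred on S+. The Gaussian shape, the gradient width, the inhibition height of 0.6 and the stimulus values 550 and 570 are all assumptions chosen for illustration; none of them come from the notes.

```python
import math


def gaussian(x, centre, height, width):
    """Bell-shaped generalisation gradient (assumed shape)."""
    return height * math.exp(-((x - centre) ** 2) / (2 * width ** 2))


def net_strength(x, s_plus, s_minus, width=20.0, inhibition=0.6):
    """Excitation generalising from S+ minus inhibition generalising from S-."""
    excitation = gaussian(x, s_plus, 1.0, width)
    inhibitory = gaussian(x, s_minus, inhibition, width)
    return excitation - inhibitory


def peak(s_plus, s_minus, low, high, inhibition=0.6):
    """Stimulus value (integer steps) with the highest net response strength."""
    return max(
        range(low, high + 1),
        key=lambda x: net_strength(x, s_plus, s_minus, inhibition=inhibition),
    )


if __name__ == "__main__":
    s_plus, s_minus = 550, 570   # hypothetical values on one stimulus dimension
    print("peak without S- training:", peak(s_plus, s_minus, 450, 650, inhibition=0.0))
    print("peak with S- training:   ", peak(s_plus, s_minus, 450, 650))
```

With no inhibition the peak sits at S+. Once the inhibitory gradient is subtracted, the peak moves to a value below S+, on the side away from S-, which is the displacement described under peak shift.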

Frustration in extinction?
Omission of reward generates frustration, driving a brief spurt of activity (spontaneous recovery)?
Explains the PREE:
- Partial reinforcement = reinforcement in the presence of frustration
- Responding more resilient to frustration than in CRF

The role of motivation in learning
Thorndike's Law of Effect
Motivational properties of the reinforcer are critical for learning
Satisfaction results in stimulus-response learning
No learning without the reinforcing outcome

Latent learning
Tolman:
- Maze learning with rats
- Rats that received food at the end of the maze learned better, making fewer errors in the maze
- After the groups were swapped, errors dropped dramatically in the rats given food for the first time, whereas the rats whose food was removed made drastically more errors
- Without food, no strong motivation to navigate the maze without making errors
Learning occurs without reinforcement
- Learning without behaviour (in the absence of reinforcement) (latent learning)
- Reinforcement provides impetus to perform

Circularity in the Law of Effect
Skinner:
- What is reinforcement? Increase in response when paired with a reinforcer
- What is a reinforcer? Stimulus/event that causes reinforcement
- Explanatory value = 0

Better definitions of reinforcement
Hull:
- Biological needs (e.g. for food, water, sleep, sex) motivate behaviour
- Drives
- Behaviour organised to satisfy needs (reduce drives):
- Behaviour = habit x drive (in other words, learning x motivation)
- Reinforcement = drive reduction
- Reinforcer = a stimulus that reduces a drive
Premack (1959):
- Reinforcement involves behaviour of its own (e.g. consumption)
- Reinforcement = increasing access to preferred behaviours
- Providing the opportunity to perform a preferred behaviour (e.g. eating)
The Premack Principle: (given sufficient freedom) what behaviour is an individual most likely to engage in?
- High probability behaviour (more preferred)
- Low probability behaviour (less preferred)
- Relative behavioural property
- Reinforcement depends on current preference of the individual (reinforcement is dynamic)

- According to this principle, a behaviour that happens reliably without interference by a researcher (e.g. a child watching TV) can be used as a reinforcer for a behaviour that occurs less reliably (e.g. a child doing the dishes)

Instrumental conditioning: what is learned?
Stimulus-response theory (e.g. Thorndike, Hull)
- Motivating outcome reinforces the stimulus-response association
- Insensitive to changes in motivation for the outcome
- Habitual
- Strong links between habitual behaviour and automaticity
- Habitual responses are not sensitive to motivational changes that are specific to the outcome
- But they are sensitive to the general motivational state of the organism
- A stimulus that elicits a habitual response primes us to respond in a certain way
- There may be subtle biases in conscious decision and action that can be described as being habitual
But discriminative stimuli influence motivational states
e.g. Cigarette craving in smokers (Dar et al., 2010)
- Craving goes up toward the end of flights; knowing that they will be allowed to smoke soon increases the craving ratings
- Lower ratings at the beginning of flights, as the reward is not available
Two-process theory (stimulus-outcome learning)
- As the stimulus is associated with the outcome, it elicits an emotional state
- Sensitive to central emotional states elicited by the stimulus
- Excitement or fear determines the type of response given
- Goal-directed action/behaviour

Outcome devaluation
A (negative) change in the motivational significance of the outcome (the reinforcer)
Through pairing the outcome with an aversive outcome (e.g. poisoning), or through satiation (e.g. free feeding, long exposure)
- Conditioned taste aversion, pairing with other negative events, satiation
Used to determine whether a subject is capable of choosing an action based on their current goals
Sensitivity to the current value of the reward even though not experiencing the reward
Need to retrieve from memory that you don't like that reward after devaluation and choose the alternative
Stimulus activates knowledge of the devalued relationship (cognitive)
Apparent in some animals and most humans
Stimulus (response-outcome) learning
Stimulus acts as an occasion setter
- A stimulus that signifies that there is now a relationship between response and reinforcer
- Different to having direct associations with the response or the outcome
Sensitive to the specific appraisal of the expected outcome: will the outcome be satisfying?
Goal-directed

Punishment

A situation where responding decreases because of a contingency between the response and a bad outcome
Involves the delivery of an aversive stimulus (shock, loud noise, physical action, physical irritation, reprimand, time-out [sensory deprivation], overcorrection [performing the incorrect action over and over again], monetary fines)
Omission is also a form of punishment (negative punishment): performing the act results in a lower probability of something nice happening, preventing yourself from receiving a reward (negative contingency)
Punishment is contentious:
- Is punishment cruel? Is it unnecessary?
Physical punishment:
- In schools
- In public
In contrast, exaggeration of the risk of aversive outcomes in media is rife:
- Heightened perceived threat
- Avoidance learning receives little attention

Early studies
If a response is met with a frustrating outcome, the response is diminished
Thorndike, the negative Law of Effect
- Dropped this from the law as he couldn't get it to work in the lab
Punishment is ineffective?
- Thorndike (with humans): response met with being told it is wrong
- Skinner (with rats): response met with a slap on the paw
- But: response met with an electric shock is very effective

Factors affecting punishment
Yerkes and Dodson (1908)
- Rats need to learn to discriminate between two chambers
- One of the chambers is electrified and will give an electric shock when the rat runs through it
- Looked at the number of trials it takes before the rat learns this perfectly and doesn't make any errors
- The stronger the shock, the faster the rat learns
- In chambers where it is harder to discriminate, a strong shock also slows learning: there is an optimal point of learning
Intensity determines effectiveness
- Yerkes-Dodson law
- Depends on difficulty
- If you are teaching someone and they are making errors, punishing them too severely will make their performance worse rather than better
Stimulus control
- Reduction of response for SD but not SΔ
Path dependence
- Weaker -> stronger = ineffective (e.g. electric shock building up over time)
- If you start with a strong shock and make it weaker over time, this is sufficient to sustain change in behaviour
- Resistance/habituation

Delay
- Shorter better than longer
- Temporal contiguity
Reinforcement schedule
- CRF better than PRF for punishment
- But what will happen in extinction? (Effect diminishes faster)
Contingency between response and punishment

Punishment and reinforcement
Punishment of a reinforced response?
- Trade-off between reward and aversive outcome
Punishment affects responding to Interval and Ratio schedules differently
- Steady rate vs. bouts of behaviour
- Punishment can increase a reinforced response
Availability of other responses
- Must be alternative ways to achieve goal
- Having alternative things to do increases efficacy of a punisher (even a very mild one)
Punishment seeking behaviour
- Brown et al. (1964): an animal model of masochistic behaviour?
- e.g. Avoidance learning
- Persistent, self-punitive
- Vicious circle of behaviour

Explaining effects of punishment
The (negative) Law of Effect
- Thorndike abandoned idea
Premack principle still applicable:
- If more-preferred behaviour leads to having to perform less-preferred behaviour, more-preferred behaviour would diminish
Conditioned emotional response
- Suppression through fear conditioning
- Instrumental or classical?
Avoidance learning
- Learning of an incompatible (competing) response
- Learning to perform in a certain way in order to avoid an aversive outcome
- Unpleasant event avoided by performing alternative response

Side effects
Punishment seems to be effective but:
- Neurotic symptoms
- Aggression (elicited by pain, frustration, modelling of behaviour)
- Fear/anxiety (response -> shock -> fear)
- Fear conditioning not specific to the undesirable response (can relate to context, punisher, the whole situation, etc.)

Fear conditioning
Generalisation of fear

Little Albert
- J. B. Watson
- Fear of rat due to loud noise generalised to stuffed animals, coats, rabbits, etc.

Alternatives to punishment
Extinction
- Undesirable behaviour -> nothing
Differential reinforcement of other behaviours (DRO)
- Other behaviour -> reward

Effective punishment is:
- Immediate
- Consistent
- Contingent on undesirable response
- Delivered under variety of conditions
- Sufficiently aversive from the outset
- Not too severe
- Delivered in the presence of alternative responses
- And (in the case of humans) accompanied by a rational explanation

Instrumental avoidance
Public advertising
- e.g. From the RTA
- Trying to get you to change your behaviour because of the threat of something bad happening
- Bechterev: classical conditioning in humans?
- Brogden et al. (1938): running/activity, motivated to continue running on the basis of an absent event (no electric shock)

Avoidance learning
Negative reinforcement
Response is encouraged because a negative outcome is avoided
Two types of response:
- Escape (response terminates the shock), early in training
- Avoidance (response avoids future shock), later in training
- Signalled or discriminative avoidance: a signal is present to let the participant know a shock is coming

Problem
No response -> shock
Response -> nothing
Avoidance involves something not happening
How can this be considered reinforcing?
- Learning about absent events?
- Shock is not the only thing that doesn't happen