Operant Conditioning & Cognitive Learning
by Dr. Ilija Gallego

Traditional Learning
Learning based on operating on the environment.
Learning based on Stimulus > Response > Consequence

Responses in Operant Conditioning
Responses are external
Responses are active
Responses are goal oriented
Responses are purposeful
Responses are voluntary
Responses operate on the environment
Responses are made to gain a reward
Responses are initially brand new to the learner and must be learned

Early Work on Operant Conditioning
B.F. Skinner used rats and pigeons in a specially designed "Skinner Box" where the animals could learn to press a bar or peck a disk for food.
Thorndike developed the Law of Effect: Consequences predict future responses.

Thorndike's Law of Effect
If you do something (make a response) and the consequence is good, you're likely to make that response again.
If you do something (make a response) and the consequence is bad, you're less likely to do it again.

Skinner Box
Stimulus = Bar to press for food
Response = To press the bar
Consequence = Pellet of food
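The Law of Effect can be sketched as a simple update rule. The short Python example below is only an illustration: the function name, the 0 to 100 "tendency" scale, and the step size of 10 are assumptions made for this sketch, not part of Thorndike's or Skinner's work.

```python
def apply_law_of_effect(tendency, consequence_is_good, step=10):
    """Return the updated tendency (0 to 100) to repeat a response.

    Hypothetical model: a good consequence raises the tendency by `step`,
    a bad consequence lowers it by `step`, clamped to the 0..100 range.
    """
    change = step if consequence_is_good else -step
    return min(100, max(0, tendency + change))


# Skinner Box: the rat presses the bar (response) and receives a pellet
# of food (a good consequence) three times in a row.
tendency = 20
for _ in range(3):
    tendency = apply_law_of_effect(tendency, consequence_is_good=True)
print(tendency)  # 50
```

Under these assumptions, each rewarded bar press makes the next press more likely, while an unpleasant consequence would push the tendency back down.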

Acquiring Operantly Conditioned Responses
Learner must initially be taught to make the new, external, voluntary, goal-oriented response.
Learner will be focused on the consequence.
Learner will be active.
Both simple and complex responses can be learned through operant conditioning.

Response Acquisition: Learning a New Response
Unique to operant conditioning
Steps for teaching a learner a new response:
Wait for response to occur coincidentally
Increase the learner's motivation
Limit other possible actions
Shaping: rewarding successive approximations to the behavior you are attempting to shape.

Superstitious Behavior
Misunderstanding which response is leading to a consequence
Skinner's pigeons' misunderstanding: Turning in a circle and pecking disk gets a pellet of food.
Sports figure's misunderstanding: They must follow a certain ritual to perform well in a game.

Extinction
To eliminate an operantly conditioned response: Eliminate the consequence
Unique phenomenon of extinction in operant conditioning: Behavior/Response increases before it decreases

Generalization
Applying what you have learned to stimuli other than the one you learned on.
Examples:
You can tell time on a clock other than the one you learned on.
You can drive a car you have never driven.
Generalization allows us to apply what we learn and interact easily with the environment.

Discrimination
Making a specific response to a particular stimulus.
Examples:
Putting a key in a door and turning the key and knob to get in, but remembering that this particular door sticks, so you must force it open with a shove.
Discriminating how to start one car from how one starts most cars when that car has an anti-theft device (you must turn on the lights before turning on the car).

Discriminative Stimulus
A stimulus that tells us:
If a certain response will have the consequence we expect
When a certain response is likely to have the consequence we expect
Which response we should make to get the consequence we want (Traffic signal tells us to stop or go in order to get through the intersection safely.)
Unique to operant conditioning because it helps the learner decide which response to make. In classical conditioning the learner doesn't think about his/her responses; they are involuntary.
Examples:
Light in the Skinner box tells the rat when to press the bar for food. Light off = no reward for response.
An "Out of order" sign tells you not to put your money in a vending machine, because you will not get the consequence you want.
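One way to picture a discriminative stimulus is as a lookup from the signal to the response it calls for. The Python sketch below is illustrative only; the dictionary, its labels, and the function name are hypothetical and simply restate the examples above.

```python
# Hypothetical table: each discriminative stimulus signals which response
# is likely to lead to the consequence the learner wants.
DISCRIMINATIVE_STIMULI = {
    "skinner box light on": "press the bar",
    "skinner box light off": "do not press the bar",
    "traffic signal green": "go",
    "traffic signal red": "stop",
    "out of order sign": "do not insert money",
}


def choose_response(stimulus):
    """Return the learned response for a stimulus, if one has been learned."""
    return DISCRIMINATIVE_STIMULI.get(stimulus, "no learned response")


print(choose_response("skinner box light on"))  # press the bar
print(choose_response("out of order sign"))     # do not insert money
```

A stimulus the learner has never met returns "no learned response", which loosely corresponds to a response that has not yet been acquired.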

Reinforcement
Any consequence that increases a response in the future.

Primary & Secondary Reinforcement
Primary Reinforcement: In and of itself reinforcing
Examples: Attention, Fulfillment of a Drive
Secondary Reinforcement: Used to get a more primary reinforcer
Examples: Money, Academic grades

Schedules of Reinforcement
So far we have assumed that we are reinforcing our learner for every response s/he makes. This is called continuous reinforcement.
We can also reinforce our learner for only some responses. This is called partial reinforcement.

Ratio Schedules of Reinforcement
The partial schedule we choose to use to reinforce our learner may be based on the number of responses the learner makes. This is called a ratio schedule of reinforcement.
Example: We give a rat a pellet every time s/he presses the bar 5 times.

Interval Schedules of Reinforcement
The partial schedule we choose to use to reinforce our learner may require our learner to make one response within a certain period of time. This is called an interval schedule of reinforcement.
Example: We give a rat a pellet for pressing the bar once within each 2-minute interval.

Fixed and Variable Schedules
Whether we are using a ratio or interval schedule to reinforce our learner, we can apply the schedule in a fixed or variable pattern.
In a fixed schedule the ratio or interval always stays the same.
In a variable schedule the ratio or interval varies for each learning trial.

Examples of the Schedules of Reinforcement
Fixed Ratio: The reinforcement will be given for every set number of responses, and that number will stay the same for each trial.
Example: Getting paid by the unit. You are given a bonus for every 3 health club memberships you sell.

More about Fixed Ratio Schedules
These schedules tend to make the learner have a high response rate and feel in control of their reinforcements.
The learner knows that the harder s/he works, the more reinforcements s/he will get.
Reinforcement is based on the learner's performance.

Variable Ratio: The reinforcement will be given based on the number of responses, but the number of responses needed to get a reinforcer will change with each trial.
Example: Gambling on a slot machine. You win more for making more responses (putting more coins into the machine increases your chances of winning), but you don't know how many quarters will be required before you win.

More about Variable Ratio Schedules
These schedules tend to produce the highest response rate of any schedule.
The learner knows that the harder s/he works, the more reinforcements s/he will get.
Reinforcement is based on the learner's performance, but some responses are reinforced while others are not.
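As a rough illustration of the two ratio schedules just described, the Python sketch below marks which responses earn a reinforcer. The function names are hypothetical, and the rule used to vary the ratio (a uniform draw between 1 and 2 x mean - 1, so the average stays near the mean) is an assumption made for this example.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Return the response numbers reinforced when every `ratio`-th response pays off."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(n_responses, mean_ratio, rng):
    """Return the response numbers reinforced when the required count varies.

    Assumption: each required count is drawn uniformly from
    1 .. 2 * mean_ratio - 1, which averages out to mean_ratio.
    """
    reinforced = []
    next_at = rng.randint(1, 2 * mean_ratio - 1)
    for i in range(1, n_responses + 1):
        if i == next_at:
            reinforced.append(i)
            next_at = i + rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


# Bonus for every 3 health club memberships sold (fixed ratio of 3).
print(fixed_ratio(9, 3))  # [3, 6, 9]

# Slot machine: the learner cannot tell which pull will pay off.
print(variable_ratio(30, 5, random.Random()))
```

In both functions more responses mean more reinforcers, which matches the point that ratio schedules reward the learner's performance; only the variable version hides which response will be the one that pays.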
Fixed Interval: The reinforcement will be given for making one response (or minimal response) within a set period of time.
Example: Getting paid by the hour, week, month or year. You get paid for each hour of work, as long as you are making at least a minimal effort to work.

More about Fixed Interval Schedules
These schedules produce the lowest response rate of any schedule.
The learner knows that working harder will not lead to more reinforcement - so why work hard?
Reinforcement does not come faster when the learner works faster.
Reinforcement does not increase when the learner works harder.

Variable Interval: The reinforcement will be given for making a minimal response within a varying period of time.
Example: Getting a pop quiz. You do not know when the quiz is coming, but you know to study for it so that you can be ready when it does come.

More about Variable Interval Schedules
The learner knows that working harder will not lead to more reinforcement - so why work hard?
Reinforcement does not come faster when the learner works faster, and does not increase when the learner works harder.
Why press an elevator button more than once when pressing it more (response) will not make the elevator arrive faster (consequence)?
Learner does learn to make the response right after receiving a consequence to let the teacher know, "I'm ready for another reinforcer."
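The point that extra effort earns nothing extra on an interval schedule can be checked with a small Python sketch. It follows the description above (one reinforcer per period in which at least one response occurs); the function name, the use of seconds, and the sample response times are assumptions made for illustration.

```python
from bisect import bisect_right


def interval_reinforcers(response_times, boundaries):
    """Count reinforcers earned on an interval schedule.

    `boundaries` is a sorted list of the end time of each period, measured
    from time 0. Evenly spaced boundaries give a fixed interval schedule;
    unevenly spaced ones give a variable interval schedule.
    At most one reinforcer is earned per period.
    """
    rewarded_periods = set()
    for t in response_times:
        if 0 <= t < boundaries[-1]:
            rewarded_periods.add(bisect_right(boundaries, t))
    return len(rewarded_periods)


fixed = [120, 240, 360]      # a period ends every 2 minutes
variable = [60, 250, 360]    # periods of varying length

minimal_effort = [10, 130, 250]          # one bar press per period
hard_worker = list(range(0, 360, 10))    # a bar press every 10 seconds

print(interval_reinforcers(minimal_effort, fixed))     # 3
print(interval_reinforcers(hard_worker, fixed))        # 3
print(interval_reinforcers(minimal_effort, variable))  # 3
print(interval_reinforcers(hard_worker, variable))     # 3
```

Thirty-six responses earn the same three reinforcers as three well-timed ones, which is the sense in which reinforcement here does not increase when the learner works harder.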

Determining the Effectiveness of the Schedules of Reinforcement
Both ratio schedules are more effective (produce more responses from the learner) than either interval schedule.
Variable schedules are more effective than fixed schedules.
Most effective schedules: variable ratio, then fixed ratio
Least effective schedule: fixed interval

Punishment
Any consequence that eliminates or decreases responses in the future.

Factors Influencing the Effectiveness of Punishment
Timing: Punishment should be given immediately following the response to be eliminated
Intensity: Punishment should be intense, something the learner really dislikes
Consistency: Punishment should be given each time a response is made

Undesirable Effects of Punishment
Primarily motivates learner to avoid punishment.
Behavior is suppressed but not eliminated.
Learner does not unlearn the response.
No alternative behavior is learned.
May cause anger and aggression in learner.
May cause learner to stop making attempts to perform well.

Two Types of Learning Based on Punishment
Escape Learning
Learning to make a response that allows you to escape from a punishment that has already begun.
Stimulus (punishment) > Response > Consequence (stop punishment)
Example: Dog learns to jump partition in cage to get away from electric shock.
Avoidance Learning
Learning to make a response that allows you to avoid being punished.
Stimulus (signal of punishment) > Response > Consequence (avoid punishment)
Example: Dog learns to jump partition in cage when it hears a bell. This allows the dog to avoid an electric shock that will soon follow the bell.

Positive & Negative Reinforcement & Punishment
Positive vs. Negative
In learning, positive means adding something or giving something
In learning, negative means taking something away or removing something
Following a Response
Reinforcement increases the response in the future.
Punishment decreases the response in the future.

Positive Reinforcement
A consequence that gives or adds something to a situation in order to make the response it followed likely to increase in the future.
The learner makes a response, and something is given so they will tend to repeat that response.
Examples: Giving praise, Giving a reward

Negative Reinforcement
A consequence that takes away something from a situation in order to make the response it followed likely to increase in the future.
The learner makes a response, and something is taken away so they will tend to repeat that response.
Examples: Lifting a restriction (you may play after you do your homework); A's are exempt from the final (if you keep an A average, the final will be removed)

Positive Punishment
A consequence that gives or adds something to a situation in order to make the response it followed likely to decrease in the future.
The learner makes a response, and something is given so they will not tend to repeat that response.
Examples: A fine is imposed, A spanking is given

Negative Punishment
A consequence that takes away something from a situation in order to make the response it followed likely to decrease in the future.
The learner makes a response, and something is taken away so they will not tend to repeat that response.
Examples: Taking away a privilege, Grounding a child from an activity they enjoy

Summary of Operant Conditioning
Responses learned are external, voluntary and goal-oriented.
Learning a brand new response.
Association is made between a response and its consequence.
Learner is active and focused on the consequence.
Law of effect: consequences received can predict which responses will be made in the future.

Applications of Classical & Operant Conditioning
Token Economy
A secondary reinforcer (a token) is given for good behavior.
Learner turns the token in for something more primary.
Based on operant conditioning.

Time Out (from positive reinforcement)
When a learner misbehaves, his/her positive reinforcement is removed.
You must have been giving positive reinforcement, so that you can remove it when the learner misbehaves.
Based on operant conditioning.

Flooding (Exposure)
Used to remove fears.
The learner is flooded with whatever they fear.
Based on classical conditioning.

Systematic Desensitization
Used to remove fears.
A stimulus that causes fear is paired with relaxation.
Learner makes a hierarchy of steps involved in the fear.
Each step of the hierarchy is paired with relaxation, first cognitively (covert desensitization), and then in real life (in vivo).
Based on classical conditioning.

Cognitive Learning
Latent Learning (Tolman)
Hidden learning
We can know how to do something, yet not show that we know it (not make the response).
Observational Learning (Bandura)
Learning by observing the consequences others receive
Modeling or vicarious learning