An Animal Trainer's Introduction
To
Operant and Classical Conditioning:
Part Two
Positive reinforcement is possibly the easiest, most effective consequence
for a trainer to control (and easy to understand, too!). It
means starting or adding Something Good, something the animal likes or enjoys.
Because the animal wants to gain that Good Thing again, it will repeat the behavior
that seems to cause that consequence.
Examples of positive reinforcement:
The dolphin gets a fish for doing a trick. The worker gets a paycheck for working.
The dog gets a piece of liver for returning when called. The cat gets comfort
for sleeping on the bed. The wolf gets a meal for hunting the deer. The child
gets dessert for eating her vegetables. The dog gets attention from his people
when he barks. The elephant seal gets a chance to mate for fighting off rivals.
The child gets ice cream for begging incessantly. The toddler gets picked up
and comforted for screaming. The dog gets to play in the park for pulling her
owner there. The snacker gets a candy bar for putting money in the machine.
Secondary Positive Reinforcers and Bridges
A primary positive reinforcer is something that the animal does not
have to learn to like. It comes naturally, no experience necessary. Primary
R+s usually include food and water. They often also include sex (the chance to
mate), the chance to engage in instinctive behaviors, and, for social animals,
the chance to interact with others.
A secondary positive reinforcer is something that the animal has to
learn to like. The learning can be accomplished through Classical Conditioning
or through some other method. A paycheck is a secondary reinforcer - just try
writing a check to reward a young child for potty training!
Animal trainers will often create a special secondary reinforcer they call
a bridge. A bridge is a stimulus that has
been associated with a primary reinforcer through classical conditioning. This process creates a
conditioned positive reinforcer, often called
a conditioned reinforcer or CR for short. Animals that have learned a bridge react to
it almost as they would to the reward that follows (animals that have learned
what clicker training is all about may sometimes prefer the CR that tells them
they got it right over the actual "reward").
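One way to picture how a bridge gets "charged" is a small Python sketch. It
assumes a simple learning-curve update (the same form used in the
Rescorla-Wagner model of classical conditioning), where each click-then-treat
pairing closes a fixed fraction of the gap to full strength. The function name,
the 0-to-1 strength scale, and the learning rate of 0.3 are all illustrative
assumptions, not measurements.

```python
def pair_bridge_with_treat(strength, learning_rate=0.3):
    """One click-then-treat pairing.

    strength runs from 0.0 (the click means nothing yet) toward 1.0
    (the click is a fully conditioned reinforcer). Assumed update rule:
    move a fixed fraction of the remaining distance to 1.0.
    """
    return strength + learning_rate * (1.0 - strength)


strength = 0.0
for pairing in range(10):
    strength = pair_bridge_with_treat(strength)

print(round(strength, 3))  # 0.972
```

The early pairings do most of the work in this sketch; later pairings add less
and less.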
Schedules of Reinforcement, and Extinction
A schedule of reinforcement determines how often a behavior is going to result
in a reward. There are five kinds: fixed interval, variable interval, fixed
ratio, variable ratio, and random.
A fixed interval means that a reward will occur
after a fixed amount of time, for example every five minutes. Paychecks work
on this schedule - every two weeks I got one. A parent can reward a toddler
for dry diapers every 30 minutes.
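Here is a minimal Python sketch of a fixed interval schedule. It assumes time
is counted in minutes from zero, and that the first behavior (or check) at least
one full interval after the previous reward is the one that gets rewarded. The
function name and that "first behavior after the interval" rule are assumptions
for illustration.

```python
def fixed_interval(behavior_times, interval):
    """Return the behavior times (in minutes) that earn a reward.

    Assumption: a behavior is rewarded if at least `interval` minutes
    have passed since the last reward (or since time 0).
    """
    rewarded = []
    last_reward = 0
    for t in sorted(behavior_times):
        if t - last_reward >= interval:
            rewarded.append(t)
            last_reward = t
    return rewarded


# Dry-diaper checks at these minutes, rewarding on a 30-minute interval
print(fixed_interval([10, 30, 35, 62, 70], 30))  # [30, 62]
```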
A variable interval schedule means that reinforcers
will be distributed after a varying amount of time. Sometimes it will be five
minutes, sometimes three, sometimes seven, sometimes one. My e-mail account
works on this system - at varying intervals I get new mail (for me, email is
generally a Good Thing!).
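A variable interval schedule can be sketched the same way in Python. The waits
below (five, three, seven, and one minute) are the ones from the paragraph
above. The helper that draws random waits assumes a uniform spread between zero
and twice the average, which is just one convenient choice; both function names
are made up for this example.

```python
import random
from itertools import accumulate


def reward_times(waits):
    """Turn a list of waits (minutes) into the clock times of each reward."""
    return list(accumulate(waits))


def random_waits(mean_wait, count, seed=0):
    """Draw `count` waits that average out to roughly `mean_wait`.

    Assumption: waits are spread uniformly between 0 and 2 * mean_wait.
    """
    rng = random.Random(seed)
    return [rng.uniform(0, 2 * mean_wait) for _ in range(count)]


print(reward_times([5, 3, 7, 1]))  # [5, 8, 15, 16]
```

The four waits average four minutes, but the animal (or the e-mail reader) can't
tell in advance which wait is coming next.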
A fixed ratio means that if a behavior is performed
X number of times, there will be one reinforcement on the Xth performance. For
a fixed ratio of 1:3, every third behavior will be rewarded. This type of ratio
tends to lead to lousy performance with some animals and people, since they
know that the first two performances will not be rewarded, and the third one
will be no matter what. Some assembly-line production systems work on this schedule
- the worker gets paid for every 10 widgets she makes. A fixed ratio of 1:1
means that every correct performance of a behavior will be rewarded.
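A fixed ratio is simple enough to write down as a counting rule. This Python
sketch marks which performances get rewarded; the function name is an
illustrative choice.

```python
def fixed_ratio(num_behaviors, ratio):
    """Return one flag per behavior: True where that behavior is rewarded.

    With ratio=3, every third behavior is rewarded (a 1:3 schedule).
    With ratio=1, every behavior is rewarded (a 1:1 schedule).
    """
    return [i % ratio == 0 for i in range(1, num_behaviors + 1)]


print(fixed_ratio(6, 3))  # [False, False, True, False, False, True]
print(fixed_ratio(3, 1))  # [True, True, True]
```

The pattern is completely predictable, which is exactly why some animals and
people coast through the unrewarded performances.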
A variable ratio schedule means that reinforcers
are distributed based on the average number of correct behaviors. A variable
ratio of 1:3 means that on average, one out of every three behaviors
will be rewarded. It might be the first. It might be the third. It might even
be the fourth, as long as it averages out to one in three. This is often referred
to as a variable schedule of reinforcement or VSR (in other words,
it's often assumed that when someone writes "VSR" they are referring to a variable
ratio schedule of reinforcement).
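A rough Python sketch of a variable ratio schedule follows. It assumes that the
number of behaviors required for the next reward is drawn fresh each time from 1
to (2 x ratio - 1), so a 1:3 schedule asks for anywhere from one to five
behaviors and averages three. That particular range, and the function name, are
assumptions for illustration; real trainers vary the count by feel.

```python
import random


def variable_ratio(num_behaviors, mean_ratio, seed=0):
    """Return one flag per behavior: True where that behavior is rewarded.

    Assumption: the number of behaviors needed for each reward is drawn
    uniformly from 1 .. 2 * mean_ratio - 1, which averages mean_ratio.
    """
    rng = random.Random(seed)
    flags = []
    needed = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(num_behaviors):
        count += 1
        if count == needed:
            flags.append(True)
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)
        else:
            flags.append(False)
    return flags


flags = variable_ratio(3000, 3)
print(sum(flags) / len(flags))  # close to 1/3
```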
With a random schedule, there is no correlation
between the animal's behavior and the consequence. This is how Fate works.
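In code, "no correlation" means the reward decision never looks at what the
animal did. This Python sketch assumes time is cut into steps and a reward
arrives on each step with a fixed probability; the function name and parameters
are hypothetical.

```python
import random


def random_schedule(behaved, p_reward, seed=0):
    """Return one reward flag per time step.

    `behaved` holds one boolean per step (did the animal do the behavior?).
    It is used only for its length: the reward draw ignores its contents.
    """
    rng = random.Random(seed)
    return [rng.random() < p_reward for _ in behaved]


always = random_schedule([True] * 10, 0.3, seed=1)
never = random_schedule([False] * 10, 0.3, seed=1)
print(always == never)  # True
```

Whether the animal performs on every step or on none of them, the same rewards
arrive.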
If reinforcement fails to occur after a behavior that has been reinforced in
the past, the behavior might extinguish. This process is called extinction. A variable ratio schedule of reinforcement
makes the behavior less vulnerable to extinction. If you're not expecting to
gain a reward every time you accomplish a behavior, you are not likely to stop
the first few times your action fails to generate the desired consequence. This
is the principle that slot machines are based on. "OK, I didn't win this time,
but next time I'm almost sure to win!"
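To see why a variable ratio buys resistance to extinction, here is a toy Python
model. It assumes, purely for illustration, that the animal gives up once it has
gone unrewarded for longer than the longest dry spell it ever sat through while
the behavior was still being reinforced. Real animals are not this tidy, and
both function names are made up.

```python
def longest_dry_run(history):
    """Longest run of unrewarded behaviors in a training history."""
    longest = current = 0
    for rewarded in history:
        current = 0 if rewarded else current + 1
        longest = max(longest, current)
    return longest


def responses_before_quitting(history):
    """Toy assumption: quit once a dry run beats anything seen in training."""
    return longest_dry_run(history) + 1


continuous = [True] * 12  # rewarded every time (1:1)
variable = [False, True, False, False, False, True,
            False, True, False, False, False, True]  # 4 rewards in 12 (1:3)

print(responses_before_quitting(continuous))  # 1
print(responses_before_quitting(variable))    # 4
```

Under this assumption the animal trained on 1:1 notices the very first missing
reward, while the animal trained on the variable schedule keeps trying.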
When a behavior that has been strongly reinforced in the past no longer gains
a reinforcement, you might experience what's called an extinction burst. This is when the animal performs the
behavior over and over again, in a burst of activity. Extinction bursts
are something for trainers to watch out for!
See some nice graphs of various schedules here (but
skip the table of "Outcomes of Conditioning" - it's misleading. This author
uses "positive" to mean both "added" and "nice" - confusing!)
Recently Bob Bailey has cautioned against
needlessly using variable schedules. Most useful behaviors, he points out, will
get some sort of reinforcement every time. You might not always click and
treat your dog for sitting on cue, but you will always reward it with some
recognition and praise ("Good dog!"). If there are some circumstances where you
will be unable to deliver any reinforcement (during a long sequence
of behaviors, or when the animal is out of contact), then you will need to build
a buffer against extinction with a VSR. Otherwise, don't bother.
Cautions in using positive reinforcement
The timing must be good. If the animal did a great "stay" and you reward after
the release, you are rewarding getting up.
The reward has to be sufficient to motivate a repetition. Mild praise won't
be enough for some animals. Others require the richest of food rewards, etc.
Reinforcements can become associated with the person giving them. If the animal
realizes that he can't get any rewards without you present, he will not be motivated
to act.
Animals can get sated with the reward you're offering when they've had enough,
and it will no longer be motivating.
Reinforcers increase behavior. If you don't want your animal actively trying
out new behaviors ("throwing behaviors at the trainer"), don't use positive
reinforcement. Use positive reinforcement to train an animal to do
something.
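The timing caution at the top of this list can be pictured with a small Python
sketch. As a simplification, it assumes a reward attaches to whatever behavior
is under way at the moment the reward arrives. The function name and the
timeline format (behavior, start second, end second) are made up for
illustration.

```python
def reinforced_behavior(timeline, reward_time):
    """Return the behavior under way when the reward lands, or None.

    timeline: list of (behavior, start, end) tuples, times in seconds.
    """
    for behavior, start, end in timeline:
        if start <= reward_time < end:
            return behavior
    return None


# The dog stays for 30 seconds, then is released and gets up.
timeline = [("stay", 0, 30), ("get up", 30, 33)]

print(reinforced_behavior(timeline, 29))  # stay
print(reinforced_behavior(timeline, 31))  # get up
```

A reward delivered two seconds late lands on the wrong behavior.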
Negative punishment is reducing behavior by taking away Something Good. If
the animal was enjoying or depending on Something Good, she will work to avoid
having it taken away. She is less likely to repeat a behavior that results
in the loss of a Good Thing. This type of consequence is a little harder to
control.
Examples
The child has his crayons taken away for fighting with his sister. The window
looking into the other monkey's enclosure is shut when the first monkey bites
the trainer. "This car isn't getting any closer to Disneyland while you kids
are fighting!" The dog is put on leash and taken from the park for coming to
the owner when the owner called (this causes the unintentional result of the
dog being less likely to respond to the recall). The teenager is grounded for
misbehavior. The dolphin trainer walks away with the fish bucket when the dolphin
acts aggressive. "I'm not talking to you after what you did!" Xena The Warrior
Princess cuts off the air of an opponent who refuses to tell her what she wants.
Secondary Negative Punishers
Trainers seldom go to the trouble of associating a particular cue with negative
punishment. Such a cue is sometimes called a "delta", from SD or discriminative
stimulus. Some dog owners make the mistake of calling their dogs in the park
and then using the negative punishment of taking the dog away from the fun.
"Fido, come!" then becomes a conditioned negative punisher. My mom conditioned
a similar CP- as "Time to go!".
Positive punishment is something that is applied to reduce a behavior. The
term "positive" often confuses people, because in common terms "positive" means
something good, upbeat, happy, pleasant, rewarding. Remember, this is technical
terminology we're using, though, so here "positive" means "added" or "started".
Also keep in mind that in these terms, it is not the animal that is
"punished" (treated badly to pay for some moral wrong), but the behavior
that is "punished" (in other words, reduced). Positive punishment,
when applied correctly, is the most effective way to stop unwanted behaviors.
Its main flaw is that it does not teach specific alternative behaviors.
Examples
Our society seems to have a great fondness for positive punishment, in spite
of all the problems associated with it (see below). The peeing on the rug (by
a puppy) is punished with a swat of the newspaper. A dog's barking is punished
with a startling squirt of citronella. The driver's speeding results in a ticket
and a fine. The baby's hand is burned when she touches the hot stove. Walking
straight through low doorways is punished with a bonk on the head. In all of
these cases, the consequence (the positive punishment) reduces the behavior's
future occurrences.
Secondary Positive Punishers
Because a positive punisher, like other consequences, must follow a behavior
immediately or be clearly connected to the behavior to be effective, a secondary
positive punisher is very important. (This is especially true if the punisher
is going to be something highly aversive or painful). Many dog trainers actively
condition the word "No!" with some punisher, to form an association between
the word and the consequence. The conditioned punisher
(CP+) is an important part of training with Operant Conditioning.
Cautions in using Positive Punishment
Behaviors are usually motivated by the expectation of some reward, and even
with a punishment, the motivation of the reward is often still there. For example,
a predator must face some considerable risk and pain in order to catch food.
Wild dogs must run over rough ground and through bushes, and face the hooves,
claws, teeth, and/or horns of their prey animals. They might be painfully injured
in their pursuit. In spite of this, they continue to pursue prey. In this case,
the motivation and the reward far outweigh the punishments, even when they are
dramatic.
The timing of a positive punishment must be exquisite. It must correspond exactly
with the behavior for it to have an effect. (If a conditioned punisher is used,
the CP+ must occur precisely with the behavior). If you catch your dog chewing
on the furniture and you hit him when he comes to you, you are suppressing coming
to you. The dog will not make the connection between the punishment
and the chewing (no matter how much you point at the furniture).
The aversive must be sufficient to stop the behavior in its tracks - and must
be greater than the reward. The more experience the animal has with a rewarding
consequence for the behavior, the greater the aversive has to be to stop or
decrease the behavior. If you start with a small aversive (mild electric shock
or a stern talking-to) and build up to a greater one (strong shock or full-on
yelling), your trainee may become adjusted to the aversive and it will not have
any greater effect.
Punishments may become associated with the person supplying them.
The dog who was hit after chewing on the furniture may still chew on the furniture,
but he certainly won't do it when you're around!
Physical punishments can cause physical damage, and mental punishments can
cause mental damage. You should only apply as much of an aversive as it takes
to stop the behavior. If you find you have to apply a punishment more than three
times for one behavior, without any decrease in the behavior, you are not "reducing
the behavior", you are harassing (or abusing) the trainee.
Punishers suppress behaviors. Use positive punishment to train an
animal not to do something.
Negative reinforcement increases a behavior by ending or taking away Something
Bad or aversive. By making the animal's circumstances better, you are rewarding
it and increasing the likelihood that it will repeat the behavior that was occurring
when you ended the Bad Thing.
In order to use negative reinforcement, the trainer must be able to control
the Bad Thing that is being taken away. This often means that the trainer must
also apply the Bad Thing. And applying a Bad Thing might reduce whatever behavior
was going on when the Bad Thing was applied. And reducing a behavior by applying
a Bad Thing is positive punishment. So when you start your Bad Thing
that you're going to end as a negative reinforcer, you run the risk of punishing
some other behavior.
One of the major results of taking away Something Bad is often relief.
So another way to think of negative reinforcement is that you are providing
relief to the animal. But of course, this makes it an example of positive
reinforcement - you are providing Something Good - relief. Confusing?
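If the four terms are hard to keep straight, it may help to treat them as two
separate questions: was something added or taken away, and did the behavior
increase or decrease? Here is a minimal Python sketch of that lookup. The
function name and the string labels are illustrative choices.

```python
def classify_consequence(stimulus_change, behavior_effect):
    """Name the consequence from two facts about it.

    stimulus_change: "added" or "removed"
    behavior_effect: "increases" or "decreases"
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_effect]
    return f"{sign} {kind}"


# Fish is added, and the trick happens more often
print(classify_consequence("added", "increases"))    # positive reinforcement
# Crayons are removed, and the fighting happens less often
print(classify_consequence("removed", "decreases"))  # negative punishment
# A swat is added, and the peeing on the rug happens less often
print(classify_consequence("added", "decreases"))    # positive punishment
# The ear pinch is removed, and taking the dumbbell happens more often
print(classify_consequence("removed", "increases"))  # negative reinforcement
```

Notice that "good" and "bad" never appear in the lookup. Only the direction of
the change and its effect on the behavior matter.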
Examples
The choke collar is loosened when the dog moves closer to the trainer. The
ear pinch stops when the dog takes the dumbbell. The reins are loosened when
the horse slows down. The car buzzer turns off when you put on your seatbelt.
Dad continues driving towards Disneyland when the kids are quiet. "I'm not talking
to you until you apologize!" The hostage is released when the ransom is paid.
The torture is stopped when the victim confesses. "Why do I keep hitting my
head against the wall? 'Cause it feels so good when I stop!" The baby stops
crying when his mom feeds him.
Secondary Negative Reinforcers
Trainers seldom go to the trouble of associating a particular cue with negative
reinforcement. You can still go ahead and do it.
Internal Reinforcers and Punishers
Trainers cannot control all reinforcers and punishers, unfortunately. There
are a number of environmental factors that are going to affect the animal's
behavior that you have no control over, but which will still be a significant
consequence for your trainee.
Some of these come from the animal's internal environment - its own reactions.
Relief from stress, pain, or boredom is a common reinforcer, and some "self-reinforcing"
behaviors are actually maintained because of this. Examples are a dog barking
because it relieves boredom, or a person chewing on her fingers or smoking a
cigarette because it relieves stress. Drivers speed because it is fun. Guilt
is an internal punisher that some people experience.
"No Reward Markers" and "Keep Going Signals"
There's actually a fifth possible consequence to any behavior: nothing. You
push the button and nothing happens. You raise your hand and the teacher doesn't
call on you. You get no response to your e-mail, your proposal, or your job
application. The question you then have is, did no one notice your behavior?
Or was it just not worthy of a reinforcement?
To differentiate between these two possibilities, a trainer can use a no
reward marker (NRM). The NRM tells the animal that its behavior
will not gain it a reinforcer. A lot of dog trainers use "Nope!" "Wrong!" "Uh-uh!"
or "Try again" as NRMs. For example, if you're teaching your dog to sit in response
to the cue "sit" (it's not as obvious to the dog as it is to you; after all,
dogs don't have the experience of spoken words being labels for actions), and
the dog lies down or barks, you can give an NRM. The purpose of the NRM is to
get the animal to try something different. It is not a conditioned punisher
and should not be used when the dog does something you don't want it to ever
do. It's for when a behavior might be correct in a different circumstance but
not in this one.
Some trainers also have developed a keep going signal
(KGS). This signal tells the animal that it's on the right track, that its behavior
is leading to something that will gain it a reinforcer. For example, if you're
teaching a dog to roll over and it lies on its side, you can use a KGS to
tell it that it's close to a behavior that will get it a reward, but not there
yet. Read more on the KGS here.
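The choice between a bridge, a KGS, and an NRM can be laid out as a short
decision rule. This Python sketch is one possible reading of the paragraphs
above, not a standard procedure; the function name, the argument names, and the
idea of listing "partial steps" and "never wanted" behaviors ahead of time are
all assumptions for illustration.

```python
def choose_signal(behavior, target, partial_steps=(), never_wanted=()):
    """Pick the signal a trainer might give for an offered behavior.

    Returns "bridge", "KGS", "NRM", or None (no marker at all).
    """
    if behavior == target:
        return "bridge"  # got it right: mark it, then reward
    if behavior in never_wanted:
        return None      # an NRM is not meant for behaviors you never want
    if behavior in partial_steps:
        return "KGS"     # on the right track, but not there yet
    return "NRM"         # fine elsewhere, just not what was asked for


print(choose_signal("sit", "sit"))                                        # bridge
print(choose_signal("lie down", "sit"))                                   # NRM
print(choose_signal("lie on side", "roll over", partial_steps={"lie on side"}))  # KGS
print(choose_signal("bite", "sit", never_wanted={"bite"}))                # None
```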
Operant Conditioning works on all animals!