The Quadrants Aren’t Squares Anymore

We know that all animals learn via the ABCs (Antecedent – Behavior – Consequence), which determine the likelihood of a behavior happening again under the same set of antecedents (the same scenario). The learning quadrants as we know them neatly divide how animals learn into a set of four possible consequences. These consequences are simple, yet cover pretty much anything that can happen as a result of a chosen action – a stimulus is added or subtracted. If a stimulus the learner likes (appetitive) is added, the behavior will likely maintain or increase in frequency (be reinforced) [Positive Reinforcement]. If a stimulus the learner dislikes (aversive) is added, the behavior will likely decrease in frequency (be punished) [Positive Punishment]. If a stimulus the learner dislikes (aversive) is removed, the behavior will likely maintain or increase in frequency (be reinforced) [Negative Reinforcement]. If a stimulus the learner likes (appetitive) is removed, the behavior will likely decrease in frequency (be punished) [Negative Punishment].
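For readers who like to see the mapping spelled out, the four quadrants boil down to a two-question lookup: is the stimulus appetitive or aversive to the learner, and is it being added or removed? Here is a small illustrative sketch of that lookup (just a teaching aid, not a formal model):

```python
# The four operant conditioning quadrants as a simple lookup table.
# Keys: (how the learner perceives the stimulus, whether it is added or removed).
QUADRANTS = {
    ("appetitive", "added"):   ("Positive Reinforcement", "behavior maintains or increases"),
    ("aversive",   "added"):   ("Positive Punishment",    "behavior decreases"),
    ("aversive",   "removed"): ("Negative Reinforcement", "behavior maintains or increases"),
    ("appetitive", "removed"): ("Negative Punishment",    "behavior decreases"),
}

def classify(valence, change):
    """Return the quadrant name and the likely effect on the behavior."""
    return QUADRANTS[(valence, change)]

print(classify("appetitive", "added"))
# ('Positive Reinforcement', 'behavior maintains or increases')
```

Notice that the table only asks two yes/no-style questions – which is exactly why, as the next sections argue, it leaves out the value of the stimulus.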

These four neat little boxes cleanly wrap up the four possible consequences to any behavior. There is only one slight problem – they fail to take into account the salience, the value, of the stimulus being added or subtracted. If I turned a comfortable room up only one degree, you may not even notice, while if I turned it up 5–10 degrees, you will feel it! If you are satiated and I gave you a cookie crumb, you will likely not be terribly impressed, compared to the large cookie I am eating. The truth is, value matters when discussing the impact a stimulus will have on behavior – especially when we compare competing factors. If the room is already very, very hot and I turn it up another degree, you may find that additional degree aversive, while before it didn’t matter. If you are starving, that crumb of a cookie may be extremely valuable – no better than the rest of the cookie, but this small portion suddenly has value.

We need to consider this when training our animals as well – thinking about the strength of our reinforcers and punishers (if we choose to use them) and how strong they will be when competing stimuli are around. For example, my horses may comply with energy and enthusiasm working for hay pellets in the winter but not in the summer. Why? Because in the summer we have grass at their feet that acts as a competing reinforcer. Why trot for some hay pellets when they could stay still for some grass? Not only is the nature of the stimulus determined by the learner (whether it is appetitive or aversive), but so is the value. The learner may not find hay pellets more valuable than grass, but mixing in some apple chunks or Delicious Horse Treats* may be enough to outweigh the competing motivator of the grass. Conditioning also comes into play when discussing value. My hay pellets may be of moderate value in the winter and low value on grass, but if I’ve trained a behavior with a long and strong reinforcement history off grass, the behavior will be strongly conditioned and more likely to happen when there is competition. This is why it’s vital not only to use appropriately matched reinforcers for the moment, but also to maintain strong conditioning outside of the necessary times. This is why we spend a great deal of time practicing for veterinary procedures with high rates of reinforcement and high-value reinforcers – this way the behavior will be strong enough to outweigh the aversive nature of the procedure.

Another thing the classic quadrants fail to take into account is that often when adding one thing we are subtracting another, and vice versa. Are we adding food or subtracting hunger? Are we adding pain or subtracting safety/wellbeing? Are we adding heat or removing cold? Are we adding water or subtracting thirst? And remember how value fluctuates? The inherent value of a resource fluctuates with how readily available that resource is and how the learner is currently feeling. If they’re hungry, food will carry more value than water, while if they’re dehydrated, the values may switch. If water is only available a few minutes a day, they may drink even if they aren’t very thirsty – because they may not get water again soon. If food is available 24/7 it will reduce in value; they can eat whenever they like. If you’re starving, even the crumb of a cookie would be very valuable, but if you’re satiated, just a crumb may not be terribly enticing. If you’re stuffed full, a whole cookie may not hold much value – but this depends on the learner too. I can always eat more ice cream!!
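One way to picture this fluctuation is a toy “effective value” calculation, where the very same resource scores differently depending on the learner’s current need and how freely available the resource is. The formula and numbers below are purely illustrative – they just capture the direction of the two effects described above:

```python
# Purely illustrative toy model: a resource's effective value rises with the
# learner's current need for it and falls as the resource becomes more available.
def effective_value(base_value, need, availability):
    # need: 0.0 (fully satiated) .. 1.0 (desperate)
    # availability: 0.0 (almost never available) .. 1.0 (available 24/7)
    return base_value * (0.5 + need) * (1.5 - availability)

# The same cookie crumb, in two different states of the learner:
crumb = 1.0
starving_and_scarce = effective_value(crumb, need=1.0, availability=0.1)
satiated_and_abundant = effective_value(crumb, need=0.0, availability=1.0)
print(starving_and_scarce > satiated_and_abundant)  # the crumb matters far more when starving
```

The point isn’t the arithmetic – it’s that “value” is a moving target set by the learner’s state, not a fixed property of the stimulus.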

This updated chart takes into account the value and strength of the stimuli added and subtracted. It also takes into account the fact that these quadrants are tied together: when adding one thing we are subtracting another, and vice versa.

So we want to look at the nature of the animal we are working with. Dogs tend to eat medium-sized, nutrient-rich meals only once or twice a day – meaning for training we need to divide their food into small quantities of high value. A pellet of kibble, a pinch of cheese, something small but rich. If we fed a cup of kibble at each click, the dog would likely reach satiation and the value of the food would decrease as their stomach filled. This doesn’t make for practical training.

When using food to train snakes, for example, while we can divide a mouse into a few small bites, it’s natural for a snake to eat only one large meal every week or so. So we wouldn’t get very many clicks in before reaching satiation, and we may want to look for another reinforcer – such as heat. While a cold snake may be willing to do anything for a warm rock to lie on, it wouldn’t be humane or ethical to leave your snake without heat. Yet again, we need to look at satiation level: a comfortable snake may still be happy to work for a warm rock, without first being deprived of safety and comfort.

This same concept applies to horses. Looking at their nature, they spend their days working for huge quantities of low-value food, and we can match this in our training. Large quantities of low-value reinforcers match what a horse is prepared for in nature – however, if the horse feels as though they are starving, it may be hard to find a low-value reinforcer. Even if your horse is obese, they may feel as though they are starving if they have gone more than a few hours without food – because, remember, they are designed to consume a lot of food, low in nutrition, over many hours. So it may be important to satiate your horse before beginning training, to lower the value of the food you are training with.
We can also provide competition to help lower the value of the reinforcer we are using – like the warm snake working for more heat, we can have hay available while we work to reduce the value of our hay pellets (which are usually only a little better than plain hay). Knowing there is another option can help reduce the value of what we’re using.

We also want to take conditioning into consideration. If a behavior has been strongly reinforced for a long time, it has a strong history, giving it a higher value and a higher probability of occurrence than a behavior that is newer or has not been reinforced much. Other stimuli can be conditioned as well. We tend to use primary reinforcers when training – food, water, and other things the learner inherently needs to survive and thrive. We can also use secondary reinforcers, things conditioned to be good – scratches, praise, play, or a specific behavior that is highly conditioned. These secondary reinforcers tend to be lower value and to fluctuate heavily in value compared to primary reinforcers, which remain more stable and predictable – which is why we tend to train with primaries. This applies to aversives as well. A stronger aversive will be a more effective punisher or negative reinforcer. Primary punishers are things that threaten a horse’s safety, wellbeing, or access to necessary resources. But punishers can also be conditioned: a signal from a hand or rope can be conditioned to predict the natural aversive. Again, these conditioned aversives need to be maintained, just like conditioned appetitives (reinforcers).

We need to know how to effectively increase and decrease the value of our reinforcers to ensure the comfort, safety, and effectiveness of our training. If our horse is starving and we are using small quantities of high-value food, we will likely have a horse who is very over-threshold and not able to think or focus on behavior, because they feel desperate. We need to lower that value to have a thinking learner. On the other hand, if we are working against strong competition (grass), we may need to know how to increase the value of what we are using – larger quantities or tastier options.

Another thing this chart takes into consideration is that when the stimulus added or subtracted is of low enough value, it will have little effect on the behavior. If there is no inherent value to the behavior, it can easily be extinguished or fall behind more salient behaviors. I wish we had another word for this concept. We call it extinction, when a behavior fades because the value of the stimulus added or subtracted is not strong enough to maintain or increase the frequency of the behavior. But this is more than just that. No behavior is ever truly extinct if the learner is still capable of it; it may appear again when other options fail or the learner is confused or desperate enough to give that old behavior another shot. We call this spontaneous recovery. Sometimes a change in environment can reignite this previously lost behavior. Perhaps “lost” is a better word, for both interpretations: the behavior could lose out to a stronger behavior, or the behavior may become lost in the environment.

An example of this may be a horse who kicks their stall door in hopes of a food reinforcer. This behavior could become “lost” because the horse has been put on 24/7 turnout and there is no longer a door to kick. It may also “lose out” to a stronger behavior: being taught to station earns a food reinforcer with a higher rate of success than kicking. But it could recur if the winning behavior stops being as effective, or if the turned-out horse is put back in a stall. This becomes a competition of values. So while no behavior is ever truly extinct, its value can diminish to almost nothing.

This occurs even with punished behaviors. A behavior may have been strongly punished in the past – but if the value of the punisher decreases, the behavior may reappear. We see this often when a horse is sent to a strong and harsh trainer who uses high-value punishers, but the behaviors recur when the horse is returned to their kindly owner who only uses mild punishers – showing that the reinforcement value of the behavior outweighs its punishing value. This happens a lot with behaviors that are self-reinforcing: behaviors that are reinforced without our interference. Pawing can feel good to a frustrated learner (I have terrible restless legs – I think I would definitely be a pawer if I were a horse!). Pinning ears works all day to provide safety and space from other horses and animals, so why not try it on humans? Bucking may effectively remove the annoyance of a rider. Breaking the stall guard or door may lead to earning food and mental enrichment. While annoying for us, these behaviors work for the learner. Remember, animals don’t choose behaviors because they believe they are “right” or “wrong”; they choose behaviors based on what “works” or “doesn’t work”. So they may know that breaking a stall guard doesn’t “work” when a human is there to provide a punisher, but it does work when no human is around. This is not being sneaky or fresh, but effective. Behaviors only fade when they are ineffective, so the value of the reinforcers needs to be low. Think of it as a cost/benefit analysis of behaviors.
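That cost/benefit framing can be made concrete with a tiny sketch: the learner gravitates toward whichever available behavior has the best expected payoff, where payoff is the value of the reinforcer weighted by how reliably it arrives, minus any punisher. The behaviors and numbers below are invented for illustration (using the stall-kicking example above):

```python
# Illustrative sketch of "behavior as a cost/benefit analysis": each candidate
# behavior has a reinforcer value, a success rate, and a possible punisher cost.
behaviors = {
    "kick stall door": {"reinforcer_value": 2.0, "success_rate": 0.2, "punisher_cost": 1.0},
    "station calmly":  {"reinforcer_value": 2.0, "success_rate": 0.9, "punisher_cost": 0.0},
}

def expected_payoff(b):
    # Value of the reinforcer, discounted by how often it actually arrives,
    # minus the cost of any punisher the behavior tends to produce.
    return b["reinforcer_value"] * b["success_rate"] - b["punisher_cost"]

best = max(behaviors, key=lambda name: expected_payoff(behaviors[name]))
print(best)  # stationing wins while it pays off more reliably than kicking
```

Shift the numbers – say stationing stops being reinforced – and the “lost” kicking behavior wins the comparison again, which is exactly the spontaneous recovery described above.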

This being said, we also have to consider extinction bursts. These happen when a behavior has a strong reinforcement history but is now not being reinforced, or is being punished. The learner will often exaggerate the behavior – trying it bigger, better, or more often – before the behavior begins to fade. The behavior has worked in the past, so rather than throwing it away because it’s no longer working, they will try to see what they may be doing wrong, trying close approximations of that previously working behavior or amplified versions of it. If pawing wasn’t enough, maybe kicking will be? If nibbling wasn’t enough, maybe biting will be? It’s not “bad”, it’s just an attempt to make the behavior work again. We do this as well. Ever get a stuck key on your keyboard? You don’t click it once, find it doesn’t work, and never use that letter again. You will likely hit it again, hit it harder, hit it repeatedly, even pop the key off to clean under it and try again! This behavior has worked to get the desired result; if it stops working, you try to fix it – you don’t just give up right away. But if all of that stops working, and maybe you’ve made a new button do that job, you create a new habit. I have one friend who has been using 8 instead of B for years now because of one faulty computer!