Agent Simulation of Hull's Drive Theory

Nick Schmansky
Department of Cognitive and Neural Systems, Boston University
March 7, 2004

Abstract

A computer simulation was conducted of an agent attempting to survive in a 2-D environment containing food, water, and shock objects. The behavior of the agent was governed by Grossberg's [3] neural model of instrumental and classical conditioning, which captures the concepts of Hull's drive theory. The drives of pain, hunger, and thirst were simulated, and acted as reinforcement signals to successfully learn approach-avoidance behavior.

1 Introduction

In 1943, Hull [4] proposed a theory of motivated behavior in which internal drive states direct action to reduce the level of the primary drive in the human or animal. Hull expressed his theory in the form of an equation, where the potential for reaction to a particular stimulus is a multiplicative law combining habits and drive. Reward and inhibition factors were added later; the result is equation (1):

    sER = D x K x sHR - I    (1)

In this equation, sER is the likelihood that a given behavior will occur, D represents drive level, K represents reward, sHR is the habit strength, and I represents inhibition. The important idea behind the equation is that all of these factors must work synergistically for a behavior to have a high probability of being emitted, resulting in selection of the most appropriate behavior in a given situation while less adequate behaviors are suppressed. A habit hierarchy emerges [2].

Later, Grossberg [3] proposed a neural model of instrumental and classical conditioning that captures the concepts of Hull's drive theory (among many other theories of motivated behavior). In the model, neurons simulate sensory stimuli, internal drive signals, and motor effectors. Reinforcement acts to focus
attention on important sensory stimuli, and learning is captured in the neuron connection weights. Drives and incentives combine in a multiplicative manner.

Grossberg's model is schematized in figure 1. Each box represents a collection of neurons. Modifiable connections (subject to learning) are shown as filled half-circles. Of importance is the gating function (depicted as an AND-gate symbol) whereby drive units, such as pain, hunger, or thirst, control whether stimuli may pass through to a modulation block (depicted as a filled box with an arrow through it), which drives the behavior units. Notice that stimuli may take direct control over behavior, shown as the UCS-UCR pathway. Also note that UCS are directed toward the drive units, where they may pair with CS as the basis of the learning process.

Figure 1: Schematic of Grossberg's neural model of classical and operant conditioning. The blocks are: Conditioned Stimuli (CS), Gated Conditioned Stimuli, Drive Units (Reward/Punishment), Unconditioned Stimuli (UCS), and Behavior Generation (UCR/CR).

To explore Hull's drive theory within the context of Grossberg's neural model, a computer simulation of an agent surviving in a 2-D environment was conducted. Observations were made on whether the agent's behavior was properly modulated by its internal drive states, and whether the agent could adapt, by learning associations, to the stimuli encountered in its environment. It was expected that the agent would learn to avoid shock (painful) stimuli and to seek food and water (satiating) stimuli.
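As a concrete illustration of equation (1), the following Python sketch (the simulation itself was written in MATLAB; all numeric values here are invented for illustration) computes the reaction potential for two competing behaviors and emits the one with the larger sER, yielding a habit hierarchy:

```python
def reaction_potential(drive, incentive, habit, inhibition):
    """Hull's reaction potential: sER = D * K * sHR - I."""
    return drive * incentive * habit - inhibition

# Two competing behaviors under the same drive and incentive levels,
# differing only in habit strength (illustrative values):
approach_food = reaction_potential(drive=0.8, incentive=0.9, habit=0.7, inhibition=0.1)
approach_water = reaction_potential(drive=0.8, incentive=0.9, habit=0.2, inhibition=0.1)

# The habit hierarchy emerges: the behavior with the larger sER is emitted.
assert approach_food > approach_water
```

Note that because D, K, and sHR combine multiplicatively, a zero drive level silences a behavior no matter how strong the habit, which is the synergy the equation is meant to capture.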
2 Methods

The MATLAB toolkit was used to develop and execute the simulation. The simulated components included a bounded 2-D arena (5x5 pixels in size) containing three object types and the agent. A snapshot of the arena is shown in figure 2. The objects include a shock device (red Xs), food (green circles), and water (blue squares). A random number of each object type is distributed on the arena at startup. To provide variability in the environment, at random intervals the simulation re-activates and moves consumed objects (food and water), as well as moving a small number of shock objects.

Figure 2: Simulated agent and environment (agent: +, food: O, water: [ ], shock: X). The agent has a 60-degree field of view. A water object is the closest object in this snapshot.

The brain of the agent is a simple implementation of Grossberg's [3] neural model shown in figure 1. The factors of Hull's drive equation (1) are implicitly encoded in Grossberg's model and this implementation. A schematic of the implemented model is shown in figure 3. Referring to figure 3, the agent is endowed with vision receptors for red, green, and blue stimuli. Its visual field (in its direction of gaze) is 60 degrees wide (as shown in figure 2). There are 15 neurons
associated with each color, each encoding 4 degrees of the visual field. The activity of each neuron correlates with range to the object: close objects produce high activity in the neuron upon which they fall in the visual field, while distant objects induce lower activity. The activity within each color channel (a bank of 15 neurons per color) is normalized. A plot of receptor intensity levels from a sample run is shown in figure 4. The center unit (#8 in figure 4) corresponds to the direction of gaze; lower-numbered units code a progressively leftward field of view, and higher-numbered units code a progressively rightward field of view.

Figure 3: Implemented form of the Grossberg neural model. Components: a three-channel (R, G, B) vision receptor; a max channel selector; drive learning units (weights) for pain, hunger, and thirst; hunger and thirst level inputs; a max drive unit selector; a shock/food/water contact detector; and turn-angle and forward-motion control.

The agent possesses an object contact and sense detector; when it is within a pixel of an object, that object induces an unconditioned stimulus (US). Red objects produce shock, inducing pain. Green objects represent food, inducing hunger satiation. Blue objects represent water, inducing thirst satiation.
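The receptor scheme described above can be sketched as follows. This is a Python approximation (the simulation used MATLAB); the function name, the inverse-range activity coding, and the maximum range are assumptions, but the geometry (15 units per channel, 4 degrees per unit, center unit #8) follows the description:

```python
import numpy as np

N_UNITS = 15                    # receptor neurons per color channel
FIELD_DEG = 60.0                # width of the visual field
BIN_DEG = FIELD_DEG / N_UNITS   # 4 degrees per unit

def receptor_response(objects, max_range=50.0):
    """objects: list of (bearing_deg, range) relative to the gaze direction,
    bearing in [-30, +30].  Returns one normalized channel of 15 activities,
    where closer objects produce higher activity."""
    act = np.zeros(N_UNITS)
    for bearing, rng in objects:
        unit = int((bearing + FIELD_DEG / 2) // BIN_DEG)
        unit = min(max(unit, 0), N_UNITS - 1)
        act[unit] = max(act[unit], 1.0 - rng / max_range)  # closer => stronger
    total = act.sum()
    return act / total if total > 0 else act

# A close object dead ahead and a distant one off to the left:
chan = receptor_response([(0.0, 5.0), (-20.0, 40.0)])
assert chan.argmax() == 7   # index 7 is unit #8, the direction of gaze
```

Normalizing within each channel keeps the three color channels comparable regardless of how many objects of each color are in view.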
Figure 4: Simulated visual field receptors (red, green, and blue intensity). The center unit (8) corresponds to the direction of gaze, spreading outward right and left to the edge units, which detect the edges of the visual field. The field is 60 degrees wide. In this instance, the receptors indicate the objects detected in the arena shown in figure 2.

The agent is endowed with three drives: pain, hunger, and thirst. The pain drive is always present; that is, pain (delivered as a shock in the environment) is always aversive to the agent. The hunger and thirst drives are initially zero and increase monotonically. Once a threshold is reached on either drive, the agent is motivated to behave. The hunger threshold is three times greater than the threshold for thirst. A graph of these drives from a sample run is shown in figure 5.

The agent's innate behavior (akin to its unconditioned responses) is to seek the closest object in its field of view, and to make a small random turn if no object is sighted. Upon contact with food, its hunger drive is reset to zero. Upon contact with water, its thirst drive is reset to zero. When a food or water object is consumed, its color changes to yellow, indicating it is no longer a consumable (an arena mechanism periodically re-activates consumables). The agent will not consume food if its hunger drive is below threshold, nor will it consume water if its thirst drive is below threshold. Upon contact with a shock object, the agent makes a sharp turn and moves a few pixels away.

In addition to the innate behavior, the agent contains modifiable weights which factor into the control of the direction of gaze. There are three sets of weights, corresponding to pain, hunger, and thirst. Each set is composed of three banks of 15 neurons, which map to the color channels of the vision receptor.
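The drive dynamics described above can be sketched as follows. This is an illustrative Python fragment; the constant names, growth rate, and absolute threshold values are assumptions, though the 3:1 ratio of hunger to thirst thresholds follows the description:

```python
# Illustrative drive-state dynamics (names and values are assumptions):
HUNGER_THRESH = 3.0   # three times greater than the thirst threshold
THIRST_THRESH = 1.0
PAIN_DRIVE = 1.0      # pain is always present

def step_drives(hunger, thirst, rate=0.01):
    """Hunger and thirst increase monotonically each timestep."""
    return hunger + rate, thirst + rate

def motivated(hunger, thirst):
    """The agent is motivated to behave once either drive crosses threshold."""
    return hunger >= HUNGER_THRESH or thirst >= THIRST_THRESH

hunger = thirst = 0.0
while not motivated(hunger, thirst):
    hunger, thirst = step_drives(hunger, thirst)

# Thirst, having the lower threshold, is the first drive to trigger behavior,
# so a freshly started agent seeks water before it seeks food.
assert thirst >= THIRST_THRESH and hunger < HUNGER_THRESH
```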
Upon contact with an object, depending on the drive associated with the contacted stimulus, the weights are increased by the intensity detected by the vision receptor. When shocked, the pain drive weights are modified. Having eaten, the hunger drive weights are modified. Having drunk, the thirst drive weights are modified.

Figure 5: Internal hunger and thirst drive states of the agent.

The conditioned response learned for shock stimuli is to turn away from whatever the receptor field detected at the time of the shock. When food or water is contacted, the learned conditioned response is to turn toward whatever the receptor field detected at the time. Thus, classic CS-US pairing takes place, based on the color stimulus of the object in the visual field. This is the mechanism by which the agent is expected to learn avoidance of objects producing shock (irrespective of color) and approach toward food and water objects (again, irrespective of color). The agent initially has no knowledge of the content of each colored object; it must learn the meaning of each object through reinforced experience.

In a direct implementation of Hull's theory, drives are nonspecific; that is, activation of the thirst drive may elicit approach of a food object. However, this agent's implementation allows for specific drive activation: if the agent is not hungry, then the hunger drive unit's learned cues are not activated. Observations during experimentation were made to determine whether this mechanism holds true.

Lastly, at some predetermined timestep within a session (once it appears that a fair degree of learning has taken place), the simulation performs a contingency change, in which red objects deliver food, green objects water, and blue objects a shock. In this way, observations can be made on how the agent responds to this change in stimulus/response characteristics.
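The CS-US pairing step can be sketched as a simple additive weight update. This Python fragment (the simulation used MATLAB) is an assumption consistent with the description: on contact, the receptor snapshot at that moment is added to the weight bank of the drive associated with the contacted object:

```python
import numpy as np

N_UNITS = 15
COLORS = ("red", "green", "blue")

# One 3-channel weight bank per drive, all zero at start:
weights = {d: {c: np.zeros(N_UNITS) for c in COLORS}
           for d in ("pain", "hunger", "thirst")}

def reinforce(drive, receptor):
    """On object contact, add the current receptor activity to the weights
    of the drive associated with the contacted object (CS-US pairing)."""
    for c in COLORS:
        weights[drive][c] += receptor[c]

# A shock delivered while a red object dominates the center of the visual
# field (unit index 7, i.e. unit #8 in the paper's numbering):
receptor = {"red": np.eye(N_UNITS)[7] * 0.9,
            "green": np.zeros(N_UNITS),
            "blue": np.zeros(N_UNITS)}
reinforce("pain", receptor)
assert weights["pain"]["red"][7] > 0 and weights["pain"]["green"].sum() == 0
```

Because whatever is in view at contact time gets added, distant objects of other colors also acquire small association values, which is exactly the side effect reported in the Results.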
3 Results

The simulation was successful in adequately modelling the innate tendency of the agent to move toward the closest object, and to make random turns if nothing was sighted. Movement was initiated only once the hunger or thirst drive had incremented to its threshold. This behavior is not surprising, however, as it is reflected in straightforward coding within the agent.

Learning, via repeated weight adjustment following contact with an object, proved adequate to teach the agent to turn away from red objects in future encounters. Conversely, for food and water objects, the agent learned to turn toward the proper stimulus color. Figure 6 depicts the weight changes after a period of activity, starting from initial weights of zero. The changes clearly indicate a learned tendency to direct gaze toward the center (the middle units) given the appropriately colored stimulus. For each drive, the weight for the stimulus color corresponding to the object primarily affecting that drive has increased (red: pain, green: food, blue: water).

Two effects are evident in these plots. Firstly, the position of the stimulus at the time of learning is slightly off the center axis (node #8) in each case. This is due to the randomness of the approach angle to each object. Secondly, at the time of object contact (when stimulus learning takes place), random numbers of other object types are inevitably in the distance, and some small association value is often included. This is evident in each plot in figure 6.

Along these same lines, an interesting observation made in some scenarios was that if a food or water object happened to be very close and within the field of view when contact was made with a red shock object, the agent would strongly associate green stimuli (food objects), or blue stimuli (water objects, as the case may be), with pain, and would avoid food or water objects (green or blue stimuli) in the future.
Only by chance could this weighting be undone by positive pairings between food (water) and green (blue) stimuli. Another observation was that, given a particular grouping of shock objects, upon learning to avoid such objects the agent could get caught in a corner of the arena, or swing away from any food or water in its field of view if a shock object was also closely in view. Thus, the agent could be scared to the point of not being able to eat or drink.

Unexpected behavior emerged in the contingency change experiment. Recall that in this experiment, after a predetermined number of timesteps, the red objects contain food instead of shock, green objects deliver water instead of food, and blue objects deliver a shock instead of water. It would therefore be expected that, following the initial period of learning (enough for the agent to learn to associate red with shock, green with food, and blue with water), the agent would either go hungry, because the (red) food objects would be avoided, or that the incorrect associations would quickly be unlearned. What was observed was that the agent indeed went hungry, rarely touching a food object, due to avoiding the red food objects. Figure 7, a plot of the hunger and thirst drive levels over time, clearly demonstrates the lack of hunger drive fulfillment following the contingency change.
Figures 8, 9, and 10 are plots of the changes in drive weights before and after the contingency change. In each, the top plot, taken before the change, indicates that the agent had learned the normal color stimulus/drive association. The bottom plots were taken well after the change, allowing an adequate amount of time to retrain. The bottom plot in figure 8 reveals why the agent was unable to feed: the learned red stimulus pattern is still fairly strong alongside the newly learned aversion to blue stimuli. Comparison of the plots in figure 9 shows that very little, if any, new learning has taken place to unlearn the former association of green with food. Figure 10 reveals that some learning of the green-to-water association has taken place, but it tends to be off-axis (to the agent's left). Notice that the former association of blue with water is still present; thus the agent remains attracted to blue objects. Given this situation, it is doubtful whether, even given ample training time, this agent could ever regain the proper associations.
Figure 6: Weights controlling motor units associated with the pain, hunger, and thirst drive units following a period of activity. For each drive, the weight for the stimulus color (shown as a solid line) corresponding to the object primarily affecting that drive has increased (red: pain, green: food, blue: water). The middle units encode the agent's view directly in front of it.
Figure 7: Internal hunger and thirst drive states of the agent during the course of normal learning and following the contingency change. After the change, the hunger drive rarely achieves satiation.
Figure 8: Effect of the contingency change on the pain drive weights. The top plot is before the change, the bottom plot is after. The plots are scaled to clearly show the relative weight changes between objects. They show that the agent has learned the new blue-to-shock association, but retains a fairly strong aversion to red (formerly a shock object, now a necessary food object).
Figure 9: Effect of the contingency change on the hunger drive weights. The top plot is before the change, the bottom plot is after. The plots are scaled to clearly show the relative weight changes between objects. They show that the hunger drive retains the green-to-food association, whereas it should have learned a new red-to-food association.
Figure 10: Effect of the contingency change on the thirst drive weights. The top plot is before the change, the bottom plot is after. The plots are scaled to clearly show the relative weight changes between objects. They indicate some learning of an association between green and water, but somewhat off-axis. This had the effect of not undoing the old blue-to-water association, which accounts for the agent's retained attraction to blue (and subsequent shock).
4 Discussion

It is difficult to say whether the simulation provided any new insight into whether Hull's drive theory, embedded in Grossberg's neural model, is a sufficient model of human or animal survival or learning function. The reason is that the simulation essentially encoded all of the necessary behavior, and the adaptable weights were somewhat transparent in their operation. A model of this type is really more appropriate in mobile robotics (Chang [1]). Future work on this simulation could incorporate finer resolution across each component, allowing some possibility of making predictions, or of observing even more unusual behaviors. However, the danger is always present that incorporating functionality may inadvertently build in the very features or predictions sought.

References

[1] C. Chang. Biomimetic robotics: Application of biological learning theories to mobile robot behaviors, 1999.

[2] C. Dorman and P. Gaudiano. Motivation. In M.A. Arbib, editor, Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, MA, 1994.

[3] S. Grossberg. A psychophysiological theory of reinforcement, drive, motivation and attention. Journal of Theoretical Neurobiology, 1:286-369, 1982.

[4] C. Hull. Principles of Behavior. Appleton-Century-Crofts, New York, 1943.