An Integrate-and-fire Model of Prefrontal Cortex Neuronal Activity during Performance of Goal-directed Decision Making

Cerebral Cortex doi: /cercor/bhi072

Randal A. Koene and Michael E. Hasselmo

Center for Memory and Brain, Department of Psychology and Program in Neuroscience, Boston University, 64 Cummington Street, Boston, MA 02215, USA

The orbital frontal cortex appears to be involved in learning the rules of goal-directed behavior necessary to perform the correct actions based on perception to accomplish different tasks. The activity of orbitofrontal neurons changes dependent upon the specific task or goal involved, but the functional role of this activity in performance of specific tasks has not been fully determined. Here we present a model of prefrontal cortex function using networks of integrate-and-fire neurons arranged in minicolumns. This network model forms associations between representations of sensory input and motor actions, and uses these associations to guide goal-directed behavior. The selection of goal-directed actions involves convergence of the spread of activity from the goal representation with the spread of activity from the current state. This spiking network model provides a biological implementation of the action selection process used in reinforcement learning theory. The spiking activity shows properties similar to recordings of orbitofrontal neurons during task performance.

Keywords: learning, minicolumns, orbitofrontal, reinforcement, selective activity

Introduction

The orbitofrontal cortex plays an important role in goal-directed behavior (Wallis et al., 2001). Lesions of the orbitofrontal cortex impair the ability of animals to learn which stimuli are associated with reward (Bechara et al., 1994, 1997; Frey and Petrides, 1997; Miller and Cohen, 2001; Pears et al., 2003; Izquierdo and Murray, 2004).
Recordings from orbitofrontal cortex neurons demonstrate that spiking activity in response to sensory stimuli changes dependent upon the association of a stimulus with reward in both non-human primates (Thorpe et al., 1983; Schultz et al., 2000) and rats (Mulder et al., 2003; Schoenbaum et al., 2003). The orbitofrontal cortex appears to be particularly important when the generation of specific actions depends upon the context of particular sensory stimuli (Miller and Cohen, 2001). Here we focus on behavior directed toward a specific goal; we do not yet deal with decisions about the relative value of different goals (Balleine and Dickinson, 1998; Tremblay and Schultz, 1999). Here we present a computational model that is applicable to multiple regions of the prefrontal cortex (PFC), demonstrating how populations of spiking neurons could mediate goal-directed behavior. In particular, we demonstrate how representations of specific motor actions can be used for goal-directed behavior in multiple different circumstances, dependent upon the context of specific sensory stimuli. This modeling effectively simulates the behavior and pattern of activity of orbitofrontal cortex neurons described in an experiment by Schultz et al. (2000): neurons that show a response to sensory stimuli, to reward and to expectation of reward. This task involves the differential generation of Go versus NoGo responses to randomly presented visual cues. Recordings demonstrated that some neurons in the orbitofrontal cortex do indeed fire selectively for the transition from one specific state to another. Schultz et al. (2000) identified these neurons, labeling them as selective for the instruction that initiates a specific trial, as well as predictive for a specific action. Previous models of frontal cortex function have used neurons with sigmoid input-output functions which represent firing of populations of neurons (Cohen and Servan-Schreiber, 1992; O'Reilly and Munakata, 2000).
In order to model the patterns of spiking activity more directly during behavioral tasks, we use integrate-and-fire neurons (Stein, 1967; Gerstner, 2002; Gerstner and Kistler, 2002) with Hebbian spike-timing-dependent synaptic plasticity (STDP) (Levy and Stewart, 1983). Integrate-and-fire neurons simulate the membrane potential response to the build-up of synaptic input over time and emit a spike when the potential crosses a threshold. The model shows how integrate-and-fire neurons can perform the functions described in equations for a circuit model of the PFC (Hasselmo, 2005). The structure of the model was motivated by anatomical evidence suggesting the organization of neural circuits into minicolumns (Lund et al., 1993), cell assemblies of highly interconnected neurons found in the PFC. In our model, different minicolumns responded to both sensory input and motor actions, consistent with evidence (Fuster, 1973, 2000; Fuster et al., 1982; Funahashi et al., 1989; Quintana and Fuster, 1992) that activity in the PFC represents two types of perception: (i) the perception of past sensory stimuli available due to short-term buffers and current sensory stimuli; and (ii) the proprioceptive sensation and prediction of motor actions. The organization into minicolumns was motivated by evidence for strong excitatory and inhibitory connectivity within local circuits of cortical neurons (Mountcastle, 1997; Lübke and von der Malsburg, 2004). The rapid strengthening of associations between sensory states, motor actions and reward is motivated by studies showing rapid changes in functional interactions between populations of prefrontal neurons during learning (Thorpe et al., 1983; Schoenbaum et al., 2000; Mulder et al., 2003). The structure of this model closely resembles features of reinforcement learning (Schultz et al., 1997; Sutton and Barto, 1998), so we will commonly refer to sensory information from the environment as state. We will refer to motor output as actions and to the desired goal as reward.
However, this model does not focus on the temporal difference learning rule (Sutton, 1988), a rule that uses the difference between successive outputs as an error measure. Instead it focuses on mechanisms of action selection associated with specific sensory states and reward. This demonstrates how integrate-and-fire neurons can

perform the circuit mechanism of action selection proposed in a more abstract model of the PFC (Hasselmo, 2005). In the following sections we simulate the proposed mechanism of the prefrontal minicolumn circuitry and apply that to the delayed Go/NoGo task with its reward protocol for different stimuli. We focus on explaining selective neuronal activity, as recorded by Schultz et al., with our model.

[Figure 1A panel labels: Srm, GO; Srnm, NO GO; Surm, GO]

Materials and Methods

This model focused on replicating neuronal activity and behavior in the experiments by Schultz et al. In these experiments, an initial visual stimulus indicates one of three possible trials (Fig. 1A): (i) a rewarded movement stimulus (Srm), whereby reward is given if the monkey presses a key; (ii) a rewarded non-movement stimulus (Srnm), whereby reward is given if the monkey chooses not to press the key; (iii) an unrewarded movement stimulus (Surm), whereby the reward is not given but the key press is still required. Unless the movement is performed in the Surm trial, another unrewarded Surm trial follows. The decision to move or not to move followed a delay of 2 s, when a trigger signal was given, which was identical in each trial. Schultz et al. found that orbitofrontal neurons that showed task-related activity fired selectively. Some responded with increased firing rates to a specific instruction cue, some responded with increased firing rates predictive of a Go/NoGo choice according to the expectation of reward, and some responded with increased firing rates to reward received. We propose that goal-directed behavior is learned by associating states and actions that are separately represented by the population of neurons of individual minicolumns. A state is indicated by the perception of specific sensory stimuli or the perception of reward received, while an action is indicated by proprioceptive input about motor activity.
According to our hypothesis, the initial states Srm, Srnm and Surm, as well as the Reward state, are represented by activity in individual minicolumns in the PFC, while activity in a further two minicolumns represents action selections Go (move to press a key) or NoGo. During learning of goal-directed behavior, STDP strengthens connections within and between minicolumns so that state and action representations are associated. Because activity that corresponds to consecutive states and actions may appear at arbitrary time intervals, a short-term buffer based on persistent spiking due to after-depolarization (ADP) of membrane potential (Andrade, 1991; Klink and Alonso, 1997b) is used to enable encoding with STDP (Lisman and Idiart, 1995; Jensen et al., 1996; Koene et al., 2003). We propose that the retrieval of goal-directed behavior depends on the spread of activity through strengthened connections from a minicolumn that represents the reward state and from the specific state minicolumn activated by current input. Consistent with this hypothesis, experimental evidence indicates that retrieval in the PFC produces goal-directed activity that is initiated by the desire for a goal (Schultz, 1998; Schultz and Dickinson, 2000; Miller and Cohen, 2001). In our model, the spread of activity from the representation of a current state is gated by the spread from a desired goal. When the gated spread produces output from the minicolumn that represents the current state, the correct next action is selected. Hence, the convergence of activity from a current state representation and from a goal representation governs goal-directed behavioral responses. Given the representation of states and actions, the transition from one state to another state via a specific action can be encoded uniquely if there is specific neural activity that occurs only for that action and only when the action is initiated in a particular state.
This requirement leads to the presupposition that a functional minicolumn contains populations of input neurons and populations of output neurons that form connections with other minicolumns, and that the neurons in those populations are connected in a structured manner to other minicolumns (in this simulation to exactly one). Since the combination of activity at a specific input neuron and a specific output neuron of an action minicolumn represents the transition from a preceding state to a following state, that information gives the model the Markov property (Sutton and Barto, 1998). With this property, one-step dynamics enable us to predict the next state and expected reward for a specific action.

Figure 1. (A) Summary of the Schultz et al. task. Three visual stimuli indicated above (fractal images) and different types of behavioral trials as in the simulation. (B) Design of the simulation. The simulation includes the experimental environment of the operant task in terms of the task protocol, visual stimuli and motor actions. (a) The output of that simulation goes into the perceptual segment of the simulation. Perceptual stimuli are represented by spike trains, which are processed to produce spike pairs that are used as an internal representation. (b) The resulting neuronal spikes cause activity in a simulation of minicolumns in the PFC that includes specifics of relevant neurophysiology and neuroanatomy. (c) Feedback from the output of the simulated PFC directs motor action in the operant task. [Panel B labels: environment and motor control simulation; perceptual processing simulation; PFC circuitry simulation; reward; unrewarded.]

The functions of integrate-and-fire neurons and other essential components were implemented in Catacomb2. We developed simulations of the Schultz et al. task with Catacomb2 (Cannon et al., 2003) that replicated the actions of an agent (monkey) within an environment, as well as integrate-and-fire neuron dynamics in PFC.
With our approach (which we call design-based modeling), data from a simulated operant task protocol was linked with simulated neuronal circuitry for sensory processing and functions of the PFC (see Fig. 1B). Further details of the neurophysiology were modeled explicitly where needed for specific functional requirements, such as the after-depolarization experienced by specific neuron populations that may enable persistent firing. The integrate-and-fire neurons in our model of PFC minicolumns have a resting and reset potential of −60 mV and an exponential decay time constant of 10 ms. The firing threshold is −50 mV and action potentials have a duration of 1 ms, followed by a 2 ms refractory period and a subsequent strong after-hyperpolarization with reversal potential −90 mV and exponential decay time constant 30 ms. We used dual-exponential functions for the responses of synaptic conductances. Unless the description of a specific synaptic connection indicates otherwise, the time constant for the rise of the dual-exponential response function was 2 ms and the time constant for the fall was 4 ms. Excitatory synaptic connections had a reversal potential of 0 mV and inhibitory synaptic connections had a reversal potential of −70 mV. In the simulation of the operant task environment, stimuli produced by visual cues and reward, as well as proprioceptive sensation of motor activity, are conveyed as spike trains (top of Fig. 2) that are produced by specific neurons [signal pathway (a) in Fig. 1B]. The simulation of perceptual processing circuitry receives those spike trains and transforms them into reliable sequences of state-action spike pairs (bottom of Fig. 2). Every time that a spike train corresponding to a new state or new motor action is detected, a pair of spikes is generated that represents the most recent state and the most recent action.
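The neuron parameters quoted above can be put together in a minimal sketch of one integrate-and-fire unit driven by a dual-exponential excitatory conductance. This is not the Catacomb2 implementation. The names `dual_exp` and `simulate` are hypothetical, the synaptic strength is expressed relative to an assumed leak conductance (the text gives conductances in nS but no leak or capacitance), forward-Euler integration is an assumption, and the after-hyperpolarization is omitted because its amplitude is not stated.

```python
import math

# Values quoted in the text
V_REST = -60.0        # mV, resting and reset potential
V_THRESH = -50.0      # mV, firing threshold
TAU_M = 10.0          # ms, membrane decay time constant
T_SPIKE = 1.0         # ms, action potential duration
T_REFRACTORY = 2.0    # ms, refractory period
E_EXC = 0.0           # mV, excitatory reversal potential
TAU_RISE, TAU_FALL = 2.0, 4.0   # ms, default synaptic time constants


def dual_exp(t, tau_rise=TAU_RISE, tau_fall=TAU_FALL):
    """Dual-exponential time course, normalised so that its peak equals 1."""
    if t <= 0.0:
        return 0.0
    t_peak = (tau_rise * tau_fall / (tau_fall - tau_rise)) * math.log(tau_fall / tau_rise)
    norm = math.exp(-t_peak / tau_fall) - math.exp(-t_peak / tau_rise)
    return (math.exp(-t / tau_fall) - math.exp(-t / tau_rise)) / norm


def simulate(input_times, g_peak_rel, t_end=100.0, dt=0.1):
    """Return spike times (ms) of one neuron.

    g_peak_rel is the peak synaptic conductance divided by the leak
    conductance (an assumed, dimensionless stand-in for the nS values).
    """
    v = V_REST
    refractory_until = -1.0
    spikes = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        if t < refractory_until:
            v = V_REST          # held at reset during spike and refractory period
            continue
        g = g_peak_rel * sum(dual_exp(t - t0) for t0 in input_times)
        dv = (-(v - V_REST) - g * (v - E_EXC)) / TAU_M
        v += dt * dv
        if v >= V_THRESH:
            spikes.append(t)
            v = V_REST
            refractory_until = t + T_SPIKE + T_REFRACTORY
    return spikes


strong = simulate([10.0], g_peak_rel=2.0)    # suprathreshold input
weak = simulate([10.0], g_peak_rel=0.05)     # subthreshold input
```

With these assumed strengths, the strong input drives the unit over threshold while the weak input only depolarizes it, which is the suprathreshold versus subthreshold distinction the minicolumn circuit relies on.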
The individual spike times of a state-action spike pair are separated by several cycles of theta rhythm to ensure that persistent spiking of the most recent two spike inputs to the short-term buffer occurs over a sufficient duration to achieve strong associative connections through

STDP. To simplify the readability of the graphs, an identity matrix is used for input connections to the set of PFC minicolumns instead of a learned mapping [signal pathway (b) in Fig. 1B]. Motor action in the operant task is driven by the output of prefrontal minicolumns [signal pathway (c) in Fig. 1B]. In this manner, the seven trials shown in Figure 2 are simulated during encoding so that all relevant rules are learned in the network of prefrontal minicolumns.

Figure 2. Input spike trains of sensory input (top) and membrane potential showing spike pairs that are the internal representation of changes of state or action (bottom). Vertical lines separate trials (after which buffers are cleared). Rules are learned by exposure to both rewarded and non-rewarded conditions in seven different trials: (1) NoGo following Surm does not lead to reward; (2 & 5) Go following Surm leads to a rewarded trial; (3) Srm and Go leads to reward; (4) Srm and NoGo does not lead to reward; (6) Srnm and NoGo leads to reward; (7) Srnm and Go does not lead to reward.

Specific Neuron Populations within Prefrontal Minicolumns Achieve the Gating of the Forward Spread of Activity by Spread from the Goal

Retrieval and encoding of associations between prefrontal minicolumns that represent states and actions are assumed to take place in opposite phase intervals of a rhythmic modulation at 8 Hz (Hasselmo et al., 2002) that represents theta rhythm found in the PFC and hippocampus (Manns et al., 2000). This enables both to occur at any time during a task. The modulation supports different dynamics in the two modes. We will therefore discuss the distinct functions of encoding and retrieval separately, even though they alternate continuously during a simulated task. The modulating rhythm also serves to ensure that activity in different simulated brain regions is properly synchronized, as described in our previous work (Koene et al., 2003). The plot of membrane potential for the buffer neuron buf (Rew) in Figure 6B provides an example of the modulation by theta rhythm and clearly demonstrates rhythmic changes at 125 ms intervals.
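The alternation of encoding and retrieval within each theta cycle can be sketched as a simple phase test. The 8 Hz frequency and the resulting 125 ms period come from the text; the function name `theta_mode`, the equal split of the cycle, and the choice that encoding occupies the first half are assumptions made only for illustration.

```python
THETA_HZ = 8.0
THETA_PERIOD_MS = 1000.0 / THETA_HZ   # 125 ms per cycle, as stated in the text


def theta_mode(t_ms, encoding_fraction=0.5):
    """Return which mode is active at time t_ms.

    encoding_fraction is an assumed share of each cycle devoted to encoding;
    the text only says the two modes occupy opposite phase intervals.
    """
    phase = (t_ms % THETA_PERIOD_MS) / THETA_PERIOD_MS
    return "encoding" if phase < encoding_fraction else "retrieval"


modes = [theta_mode(t) for t in (0.0, 70.0, 130.0)]
```

Because the mode depends only on the phase of the shared rhythm, every simulated region that reads the same clock switches modes together, which is the synchronizing role the text attributes to the modulation.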
As shown in Figure 3, we distinguish five populations of pyramidal neurons in each presupposed functional minicolumn of PFC: a, g_i, g_o, c_i and c_o. Of these, each neuron a connects exclusively to other neurons within the same minicolumn and plays an important role during encoding of associations between minicolumns. These neurons represent neurons that receive thalamic input in layer IV of PFC. The neurons of a population labeled g_o experience suprathreshold depolarization during encoding in response to input from a (with a fixed conductance of 5.2 nS and time constants 1 ms for the rise and 2 ms for the fall of the synaptic response), but during retrieval g_o is inhibited by an interneuron network that is driven by a. A spike in a during encoding also provides subthreshold depolarization to all neurons of a population labeled g_i (with a fixed conductance of 1.0 nS and time constants 12 ms for the rise and 20 ms for the fall of the synaptic response). The output of each neuron in the g_o population projects to one of the other minicolumns in the PFC network. In the g_i population, each neuron receives one connection from a g_o neuron located in another minicolumn. Synaptic weights are modifiable on these connections between different minicolumns and are the elements of a matrix W_g. When strengthened, the W_g connection can fire a unit g_i if the presynaptic unit g_o is active. Such a connection indicates that a rule was learned that expresses the knowledge that activity in the minicolumn containing the postsynaptic neuron g_i preceded activity in the minicolumn of the connected g_o neuron. Similarly, each neuron of a population c_o makes one connection to a neuron in a c_i population of another minicolumn, so that activity in the c_o population can target any one of the other minicolumns specifically. Again, the synaptic strengths of such connections are modifiable and make up elements in a matrix W_c.
Unlike the effect of synaptic weights in W_g, postsynaptic depolarization due to input through a connection with the maximum strength in W_c is subthreshold, so that spiking in c_i remains dependent on additional input. The additional input to neurons in c_i, which can elevate their membrane potential over threshold, is supplied by one-to-one connections (an identity matrix) from neurons in g_o (with a conductance of 2.5 nS and time constants 1 ms for the rise and 2 ms for the fall of the synaptic response). The activity of g_o therefore fulfills a gating role with regard to spike propagation to c_i. Within a minicolumn, every neuron in g_i connects to every neuron in g_o through modifiable synapses with weights in W_ig, while every neuron in c_i connects to every neuron in c_o through modifiable synapses with weights in W_ic. The maximum depolarization caused by a connection encoded in W_ig is suprathreshold, while depolarization caused by strengthened connections in W_ic is limited to subthreshold values.

Additional depolarization is provided to c_o by one-to-one connections from neurons in g_i (with a conductance of 2.5 nS and time constants 1 ms for the rise and 2 ms for the fall of the synaptic response). This provides a gating function for decisions about which action is selected based on convergence. The fan-out of connections within a minicolumn between g_i and g_o and between c_i and c_o enables the encoding of multiple routes between minicolumns. The following sections will first describe the retrieval process and then describe encoding.

Figure 3. During training, associations are learned between state and action minicolumns. The network of minicolumns (A) is shown with the connections between them. Activity spreads along associations directed both from the minicolumn representing the goal (dashed arrows) and forward from the minicolumn representing the current state (dotted arrows). To simplify the schematic, populations of neurons a, g_i, g_o, c_i and c_o as shown in the Surm minicolumn were reduced in the other minicolumns to display only those neurons that are involved in encoded associations. The numbers in brackets correspond to the marked training trials in Figure 2, in which an associative connection is established by STDP. Here, activity in the neuronal populations of the minicolumns is indicated by shaded neurons. This is shown for retrieval of the correct action that leads to reward from a current state, Srm, in which the rewarded move stimulus was perceived. Neurons that spike are circles shaded gray. A separate diagram (B) shows a linear representation of the associative connections that are strengthened during rule learning (numbers in brackets again correspond to training trials in Fig. 2). The Go and Reward minicolumns each fulfill two roles in the encoded rules.

Retrieving Behavioral Rules in The PFC

Miller and Cohen propose that the top-down processing in which behavior is guided by internal states or intentions (cognitive control) stems from the active maintenance of patterns of activity in PFC that represent goals and the means to achieve them. They suggest that these patterns provide a bias that guides activity affecting behavior, a gating function, and support their theory with neurobiological, neuroimaging and computational studies (Miller and Cohen, 2001). In our simulation, associations that form known rules are encoded in PFC. A desire for reward then elicits a spread of activity from the minicolumn representing that reward state (see dashed lines in Fig. 3 and left arrows in Fig. 3B). The neurons of the g_o population within that Reward minicolumn spike simultaneously in response to rhythmic input at an 8 Hz theta frequency. Those spikes propagate along connections with strengthened synaptic weights in W_g and produce a spike in the targeted g_i neurons of minicolumns that immediately preceded the Reward minicolumn in a known rule. Within such a preceding minicolumn (a minicolumn that represents an action) a spike elicited at a neuron in the g_i population fans out across strengthened connections to neurons in the g_o population of that minicolumn. Through those connections with strengthened synaptic weights in W_ig, suprathreshold depolarization is elicited at the target g_o neuron. This same process is repeated in other consecutive minicolumns to spread activity through the g_i and g_o populations of consecutive action and state minicolumns. As the spread branches out, it follows multiple reverse paths through connections that associate states and actions. Once the spread of activity reaches the minicolumn that represents the current state, the convergence of current state and goal spread allows selection of an action.
In addition, spikes in g_o neurons are inhibited ("end-stopping") by the synchronous activity of interneurons (with time constants 1 ms for the rise and 10 ms for the fall of the synaptic response of the input) elicited by input that identifies the current state. The selection of an action is indicated by an interaction of the goal spread with current state. The input that identifies the current state also targets the neurons in the c_o population of the same current state minicolumn. The excitatory input produces a subthreshold depolarization of c_o neurons. In addition to this input, the spiking of neurons in the c_o population is gated by population g_i activity in the same minicolumn due to the spread of activity from the goal. Those c_o neurons that receive additional depolarization from spiking neurons in the g_i population fire. The present simulation uses only the first step of the forward spread to determine output that controls goal-directed behavior in the task, so the forward gating only has an effect on the c_o of the minicolumn representing current state. The output of neurons in the c_o populations of state minicolumns that target action minicolumns is connected to the motor circuitry of the simulation. A spike in c_o thereby drives motor output of the corresponding action (thick black arrow in Fig. 3). A spike in c_o also causes spiking in interneurons that provide lateral inhibition to the remaining neurons in c_o, so that a clear winner-takes-all behavioral response is obtained. For other applications, the minicolumn model also enables forward spread of activity for known associations encoded in the PFC (see dotted lines in Fig. 3 and right arrows in Fig. 3B). The spikes that propagate through connections with strengthened synaptic weights in W_c cause a subthreshold depolarization of a c_i neuron in the associated action minicolumns. Again, forward spread of activity is gated by the spread from the goal, since a neuron in the c_i population needs additional depolarization from a corresponding neuron in the g_o population to fire.
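At an abstract level, the retrieval process described above amounts to a reverse spread from the goal that gates the output of the current state. The following sketch replaces spiking populations with a graph search, so it illustrates the logic only and is not the spiking implementation. The rule list follows the three retrieval paths named in the text (Reward-Go-Srm, Reward-Go-Surm, Reward-NoGo-Srnm), collapsing each action minicolumn into a labeled edge. The names `RULES` and `select_action`, and the choice to let the earliest-arriving spread win as a stand-in for winner-takes-all inhibition, are assumptions.

```python
from collections import deque

# (state, action, following state) triples standing in for learned associations
RULES = [
    ("Srm", "Go", "Reward"),
    ("Surm", "Go", "Reward"),
    ("Srnm", "NoGo", "Reward"),
]


def select_action(current_state, goal="Reward", rules=RULES):
    """Spread activity backward from the goal and gate the current state's output."""
    gated = {}                 # state -> action whose gating input arrived first
    reached = {goal}
    frontier = deque([goal])
    while frontier:
        target = frontier.popleft()
        if target == current_state:
            continue           # end-stopping: no further spread from the current state
        for state, action, following in rules:
            if following == target:
                # First arrival wins (assumed analogue of lateral inhibition)
                gated.setdefault(state, action)
                if state not in reached:
                    reached.add(state)
                    frontier.append(state)
    # Output occurs only where goal spread and current-state input converge
    return gated.get(current_state)


choices = {s: select_action(s) for s in ("Srm", "Srnm", "Surm")}
```

A state with no learned path to the goal receives no gating input, so `select_action` returns `None` for it, mirroring the requirement that the c_o neurons stay subthreshold without the spread from the goal.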
The spike of a c_i neuron fans out through connections with strengthened synaptic weights in W_ic to c_o neurons that are gated by the dependence on activity in g_i neurons in the same minicolumn. Figure 3 includes an example of rule retrieval in a rewarded move trial. Neurons that spike as activity spreads are represented by gray circles. The example points out the importance of neuron populations g_i, g_o, c_i and c_o, in which individual neurons make connections with other minicolumns. As shown in Figure 3, a desire for reward causes all neurons in the g_o population of the Reward minicolumn to fire. The activity then spreads to associated minicolumns, including Go, NoGo and all sensory input minicolumns. In the same trial, when the Srm stimulus is perceived, the c_o population of the Srm minicolumn is depolarized. In the Srm minicolumn, the specific depolarized c_o neuron that corresponds with a spiking neuron of the g_i population fires, so that activity spreads forward along a route from minicolumn Srm to minicolumn Go. The firing of the c_o neuron is used to generate the Go response. An analogous approach would be to use the spikes of a c_i neuron in the Go minicolumns to generate the Go response. During this process, the g_o population of the Srm minicolumn is inhibited

(end-stopping). Figure 3 shows that the spread of activity from the goal is stopped there. In the example, spreading activity from the Reward minicolumn involves two different known paths that include the Go minicolumn. One path retrieves the associated items Reward-Go-Srm, the other retrieves the associated items Reward-Go-Surm and a separate path through NoGo retrieves Reward-NoGo-Srnm. [The retrieval of rules resembles the sequence of transitions in a finite state machine (Harel, 1987) and the recurrent connections that lead to two visits of the Go minicolumn in trials initiated by the Surm stimulus are reminiscent of connectionist Elman networks (Elman, 1990, 1991).] Since the spread of activity through different known paths elicits spikes at separate g_i neurons, they do not interfere with each other. And since the neurons in c_i and c_o populations also maintain separate connections with other minicolumns, the activity in g_i correctly allows the gated forward spread to propagate only on a path from a state receiving current input. Thus, the structure of our model allows mapping through the same action from different states. While retrieval activity spreads forward along known paths to reward, those spikes elicited in the c_o population of the current state minicolumn that target action minicolumns also trigger the output of PFC. In Figure 3, the spike propagation through the connection from minicolumn Srm to minicolumn Go is therefore marked as a thick black arrow. This output generates the correct Go response, thereby guiding successful goal-directed behavior.

Encoding Behavioral Rules in The PFC

The above section described retrieval. This section describes encoding. During encoding, the neuron labeled a in the model of a minicolumn fires when input that matches the item represented by the minicolumn is received. For example, when an input spike indicates that a rewarded-move stimulus, Srm, is detected, that input causes neuron a(Srm) to spike. Here, it is assumed that stimuli activate minicolumn n after minicolumn n − 1.
Encoding is achieved by STDP (Levy and Stewart, 1983; Markram et al., 1997; Bi and Poo, 1998) that corresponds to the long-term potentiation (LTP) of synaptic responses (Bliss and Lømo, 1973; Bliss and Collingridge, 1993). The four steps described below take place sequentially in each encoding cycle.

Reverse Associations between Minicolumns are Encoded in Weight Matrix W_g at Synapses from g_o(n) onto g_i(n − 1)

A short-term memory (STM) buffer maintains spiking that corresponds with the two most recent inputs to the network of minicolumns. During this reactivation in encoding phases of PFC minicolumns, a(n) spikes less than 20 ms after a(n − 1). As shown in Figure 4a, the neuron a(n − 1) provides subthreshold depolarization to all the neurons of the g_i population in minicolumn n − 1. And all neurons in the g_o population in minicolumn n receive suprathreshold depolarization through synapses from a(n). As the neurons in g_o(n) spike, that neuron in the g_i population of minicolumn n − 1 which is connected to a neuron in g_o(n) receives subthreshold depolarization, due to the initial value of synaptic strengths in weight matrix W_g. The neuron in g_i(n − 1) that receives input from both a(n − 1) and g_o(n) spikes a few milliseconds later than the presynaptic neuron in g_o(n), so that STDP is elicited. Thus, the amplitude of the corresponding synaptic response is increased in W_g. After several repetitions in the STM buffer, encoding establishes a suprathreshold connection between g_o(n) and g_i(n − 1) (Fig. 4a).

Forward Associations between Minicolumns are Encoded in Weight Matrix W_c at Synapses from c_o(n − 1) onto c_i(n)

Rhythmic input modulates the membrane potential of neurons in c_o. During the encoding phase, the rhythmic depolarization of neurons in c_o(n − 1) is such that excitatory input through one-to-one connections from g_i(n − 1) in the same minicolumn causes postsynaptic spiking.
The spiking in g_i(n − 1) that is described in the encoding step above therefore drives spiking in c_o(n − 1), as shown in Figure 4b. The neurons in c_i(n) receive subthreshold (gating) depolarization through one-to-one input from neurons in g_o(n). In the presence of rhythmic depolarization as above and given small initial values in W_c, the neuron in c_i(n) that is connected to a neuron in the c_o population of minicolumn n − 1 spikes due to the combined subthreshold inputs from both g_o(n) and c_o(n − 1). Again, STDP is elicited, since the postsynaptic neuron in c_i(n) spikes a few milliseconds after it receives input from the presynaptic neuron in c_o(n − 1). After repetition, a subthreshold connection is established between c_o(n − 1) and c_i(n), which propagates spikes if input is received from the corresponding neuron in the gating g_o(n) population, even when rhythmic depolarization is absent in retrieval phases.

Figure 4. The four steps, (a-d), of rule encoding in the PFC. Rectangles indicate the nth minicolumn that activates and the one that precedes it at n − 1. Thin arrows indicate connections between neuron populations (lowercase letters within the rectangles) that may result in subthreshold postsynaptic depolarization (marked sub), while thick arrows indicate connections that may result in suprathreshold depolarization (marked SUP). The matrix of synaptic weights that is updated in an encoding step is indicated by W_g, W_c, W_ic and W_ig below an arrow that represents connections with synapses that are being modified.

Rules that Associate Preceding with Possible Ensuing Activity are Encoded within a Minicolumn by the Weight Matrix W_ic at Synapses from c_i(n − 1) onto c_o(n − 1)

During encoding, the activity of the c_i population is driven by an STM buffer that maintains the activity of c_i populations of the two most recently active minicolumns. [The buffer holds two items so that the buffered activity c_i(n) can replace c_i(n − 1) as the memory of preceding activity in c_i when the next association with a minicolumn n + 1 is encoded.]
As Figure 4c shows, neurons in c_i(n − 1) spike several milliseconds before spiking of neurons in c_o(n − 1) is driven by

corresponding spikes in population g_i(n − 1) (with a synaptic conductance of 6.0 nS), as described above. STDP is elicited and repetition increases synaptic strengths in W_ic from initial values near zero to subthreshold amplitudes.

Associations that Enable the Spread of Activity from the Representation of a Goal are Encoded by the Weight Matrix W_ig at Synapses from g_i(n − 1) onto g_o(n − 1) within a Minicolumn

During encoding, spiking in a subpopulation of g_o that is identified as g_o^specific in minicolumn n − 1 is driven by input from c_i(n − 1), as shown in Figure 4d. A delay in the synaptic transmission from c_i(n − 1) ensures that the spikes at g_o^specific occur several milliseconds after spiking in g_i(n − 1). At connections that repeatedly experience STDP due to this sequence of spiking, the synaptic strength in W_ig is increased from near zero to suprathreshold values. The population g_o^specific and a population of neurons known as g_o^diffuse provide separate encoding functions, but as shown in Figure 5, they act together as g_o during retrieval. In the retrieval mode, transmission from c_i(n − 1) to neurons in g_o^specific is suppressed, while input from g_i is received through connections with synaptic strengths W_ig. The pattern of spikes in g_i and suprathreshold synaptic strengths established in W_ig therefore determines retrieval spiking in g_o^specific. That spiking is duplicated in g_o^diffuse during retrieval, since transmission is then enabled through strong one-to-one input connections from g_o^specific. By contrast, all neurons in the g_o^diffuse population of a minicolumn are driven by a during encoding modes, so that they provide the diffuse output of g_o(n) that is used to encode W_g and W_c, as described above. In this manner, the two sub-populations of g_o can spike in separate patterns that satisfy the different needs of encoding protocols for synapses within a minicolumn (W_ig) and between minicolumns (W_g and W_c). This function could alternatively be obtained by very tightly regulating the activity of g_o at different phases.
Figure 5. Subdivision of the g_o population into functional g_o^diffuse and g_o^specific neuron populations. Neurons in g_o^diffuse all spike in response to activity in a, while the spiking of neurons in g_o^specific reflects the specific patterns of spikes received through one-to-one connections from c_i(n − 1). Spiking in the filter population relies on rhythmic depolarization, so that only c_i(n − 1) activity in the short-term memory buffer of c_i drives g_o^specific during encoding. This way, the strength of unique connections in W_g to other minicolumns is encoded separately from the encoding of W_ig in accordance with the mapping of a pattern of spikes in g_i to a pattern of spikes in g_o^specific. During retrieval, strong one-to-one connections from g_o^specific to g_o^diffuse drive the entire g_o population as one.

Short-term Memory Based on Persistent Spiking Enabled Spike Timing Dependent Potentiation to Encode Associations

As described, encoding in our model of the PFC depends on STDP in W_g, W_c, W_ig and W_ic, and on the buffered activity of populations a and c_i. A Hebbian model of STDP that is based on the long-term potentiation observed at many synapses requires multiple instances in which presynaptic spiking precedes postsynaptic spiking by <40 ms (Levy and Stewart, 1983; Markram et al., 1997; Bi and Poo, 1998), while input to the PFC may arrive with arbitrarily large time intervals. As mentioned previously, we therefore presuppose that firing patterns may be reactivated in a persistent manner by intrinsic neuronal mechanisms, such as after-depolarization (ADP) of membrane potential (Fig. 6A), caused by calcium sensitive cation currents that are induced by muscarinic receptor activation (Andrade, 1991; Klink and Alonso, 1997). We also presuppose that a common brain rhythm may produce oscillatory modulation in different regions that provides synchronization of activity.
The reactivation of firing patterns by ADP in one population of neurons at specific phases of the brain rhythm can thereby reliably provide input to other populations in the PFC where STDP can occur in an encoding mode (Fig. 6B). Using rhythmic modulation and ADP, we provide a short-term memory (STM) in a manner similar to the STM model first proposed by Lisman and Idiart (1995) and Jensen and Lisman (1996). Recurrent inhibition within such a buffer separates the reactivation of sequential items to maintain their order. The STM may reside in the PFC or may be provided by input from the entorhinal cortex. The membrane potentials of three neurons of an STM buffer are plotted in Figure 6B. In the hippocampus, regular activity originating in the septum (Brazhnik and Fox, 1999) is believed to cause 8 Hz oscillations of the membrane potential by modulating the GABAergic inhibition of pyramidal cells via networks of interneurons (Alonso et al., 1987; Stewart and Fox, 1990). A similar mechanism appears to cause theta rhythm oscillations in limbic cortices due to rhythmic activity of basal forebrain neurons (Manns et al., 2000). Those oscillations define two functional phases of the buffer neurons. We call the phase interval of greatest rhythmic depolarization the reactivation phase of STM and the remaining interval the input phase of STM. The plots show that spiking produced by afferent activity during the input phase of the buffer is reactivated by the ADP during subsequent repetition phases. The duration of the rise of the ADP matches the period of oscillation. This means that the ADP of the earliest neuron to spike in one cycle allows that neuron to reach threshold first in the following cycle. The order of spikes is maintained during reactivation in STM. As spikes caused by the buffer occur in pre- and postsynaptic neurons of modifiable connections in the PFC, an asymmetric function of spike-timing dependent potentiation takes into account the order of spikes. This ensures that STDP is elicited in specific connections so that a direction of causality is inferred during rule learning.
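The asymmetric timing rule can be illustrated with a minimal sketch. Only two properties are taken from the text: potentiation requires the presynaptic spike to precede the postsynaptic spike by less than 40 ms, and repeated pairings raise a synaptic strength from a value near zero. The exponential shape of the window, the time constant `tau_ms`, the `rate`, and the ceiling `w_max` are assumptions chosen for illustration, not parameters of the simulation.

```python
import math


def stdp_potentiation(dt_ms, window_ms=40.0, tau_ms=20.0, rate=0.25):
    """Fractional potentiation for one spike pair, with dt_ms = t_post - t_pre.

    Only pre-before-post pairs inside the window potentiate (asymmetric rule).
    The exponential decay with tau_ms is an assumed window shape.
    """
    if 0.0 < dt_ms < window_ms:
        return rate * math.exp(-dt_ms / tau_ms)
    return 0.0


def encode(pairings_ms, w_init=0.01, w_max=1.0):
    """Apply a sequence of spike pairings to one synapse and return its strength."""
    w = w_init
    for dt_ms in pairings_ms:
        # Each pairing closes a fraction of the remaining gap to the ceiling.
        w += stdp_potentiation(dt_ms) * (w_max - w)
    return w


causal = encode([5.0] * 20)     # pre leads post by 5 ms in 20 buffer cycles
reversed_ = encode([-5.0] * 20)  # post leads pre: no potentiation
too_late = encode([50.0] * 20)   # outside the 40 ms window: no potentiation
```

Because the buffer replays spikes in the same order on every cycle, the causal pairing recurs many times and the strength approaches the ceiling, whereas the reversed and widely separated pairings leave the synapse at its initial value.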
Furthermore, the separation of consecutive spikes is maintained in STM by recurrent inhibition that is caused by the activation of an interneuronal network (Bragin et al., 1995) each time a buffer neuron spikes. In the absence of input, the contents of an STM buffer decay gradually, due to noise and slow after-hyperpolarization (AHP). But when a full buffer receives new input, such as when rule learning involves a long sequence of states and actions, the earliest item in the buffer needs to retire so that the new item is maintained. The item replacement must also avoid changing the order of items. To achieve this, we propose that the appearance of a new item leads to inhibition at a specific phase of the rhythmic oscillation (see dashed box in Fig. 6C). Inhibition at that specific phase suppresses the reactivation of the first item (Koene et al., 2003) until its ADP has subsided, as shown in Figure 6C. The new item, represented by action potentials in the plot of the membrane potential of the third cell, assumes the last position in the sequence of reactivation. Each neuron in an STM buffer projects output to a corresponding target neuron in a or c_i. Current and preceding activity are therefore available for encoding, as shown in Figure 7 for the membrane potential of a neurons throughout the network. The activity in a corresponds to current and preceding input, as pairs of state and action spikes are received in PFC during the seven simulated encoding trials of rule learning (Fig. 2).

Figure 6. (A) ADP supports persistent firing. Each spike causes initial AHP of the membrane, which is followed by slow ADP. That depolarization can ultimately lead to another spike. (B) A buffer based on persistent firing receives afferent input during one phase of its rhythmic cycle and reactivates items (separated by competitive inhibition) in order in each cycle. (C) First-in-first-out (FIFO) item replacement. In a full buffer, afferent input plus retrieval activity elicit inhibition synchronized to suppress reactivation of the first item. The input is added at the end of the sequence.

Figure 7. The membrane potential of neurons in the a population, responding to input from the short-term memory buffer during the training stage (encoding) of the visual discrimination task in a sequence of six trials.

Results

The network described above effectively encoded the different rules of the task and showed effective behavioral performance when tested with different stimuli, generating a Go response to Srm, a NoGo response to Srnm and a Go response to Surm stimuli. This behavior was guided by spiking activity that matches the data obtained by Schultz et al. (2000). In the seven training trials (Fig. 2), the necessary associations for stimulus gated selection of action were encoded with strengthening of connections using STDP at synapses in W_g, W_c, W_ig and W_ic. Six trials were used to test performance with all possible initial stimuli. For these trials, the spike trains that represent the sensation of the initial stimulus were provided as input and the model-generated motor commands that led to behavioral responses and the sensation of reward received were observed. The network showed the correct behavior in the task. The correct action followed each initial state during tests of task performance. Inspection of individual neuronal responses reveals that the three main types of responses observed by Schultz et al. were also found in the present simulations: (i) neurons that respond selectively to a trial-specific initial stimulus; (ii) neurons that respond prior to reward in a specific trial and may indicate a chosen course of action; and (iii) neurons that respond selectively to predicted and obtained reward. In addition to these, several more specialized responses were observed, providing predictions of the model.

During performance of the operant task, desire for reward begins at the onset of every trial in the form of regular suprathreshold input to all neurons of the g_o population of the minicolumn that represents the goal. When trial input stimuli appear in different trials they are maintained as persistent spikes of buffer neurons that cause the spiking of a(Srm), a(Srnm) and a(Surm) in Figure 8. These input stimuli also provide subthreshold input to the c_o population of the minicolumn that represents the current state. Converging with the spread of activity from the goal minicolumn, spiking c_o neurons drive goal-directed behavior, resulting in the generation of output which in turn causes proprioceptive feedback of the correct action in each sequence in Figure 8, as well as the perception of reward received.

Activity Underlying Selective Responses in the Model

Membrane potentials of those neurons within a minicolumn that are involved in the choice of action demonstrate the decision process that is based on a forward spread of activity that is gated by the spread of activity from the goal.
This is shown in Figure 9, in which membrane potentials of relevant a, g_i and c_o neurons in the minicolumn that represents the Surm instruction state are plotted during an interval within an Surm trial (the convergence looks the same for the Srm example in Fig. 3). The plots show that neurons in the c_o population of that minicolumn experience subthreshold depolarization due to current state input from a. This contribution is joined by converging input from a specific neuron in the g_i population that spikes due to the spread of activity from the minicolumn that represents the goal (dashed arrows in Fig. 3). When the inputs converge, a neuron of the c_o population fires (bottom of Fig. 9). Activity in c_o was gated by activity in g_i, and recurrent inhibition assured that only the first spike in c_o led to a behavioral response. The chosen behavior was determined by the minicolumn that was targeted by that spike, in this example a Go motor command for the simulated task environment.

For the six test trials, the spike trains that represent the sensation of the initial stimulus, motor commands that led to behavioral responses and the sensation of reward received are shown in Figure 10. The spike trains show that Srm stimuli were followed by Go responses and reward, Srnm was followed by NoGo responses and reward, and Go action responses followed Surm stimuli and led to subsequent rewarded trials. The network can perform correctly regardless of the order of presented test stimuli.

Schultz et al. plotted the recorded spikes of three orbitofrontal neurons during many rewarded move (Srm) and unrewarded move (Surm) trials. We compare our simulation results with those of the experiment by Schultz et al. by displaying results for the three main categories of neuronal responses described by Schultz et al. side by side in Figure 11. These plots show spikes in individual trials (short vertical lines) aligned to specific parts of the task. As in the Schultz et al. results, our results showed that individual neurons activate specifically when one of the three cue stimuli is perceived. In our model, this is caused by the current state response of the a population (Fig. 11A,D). We also found individual neurons that activate for a chosen behavioral response. This activity results when neurons of the c_o population in the current state minicolumn receive gating activity from g_i neurons due to the spread of activity from the goal minicolumn (Fig. 11B,E). We also found neurons that activate specifically when reward is received. This activity is caused by the current state activation of the a neuron in the goal minicolumn in our model (Fig. 11C,F).

As in the Schultz et al. data, there is spiking in Figure 11E during Srm and Surm trials, but the spike rate is higher during the Go action in Srm. Both the data and the output of our model show a quantitative difference in the amount of firing between Srm and Surm trials before reward is received. In our model, this is explained because c_o(Srm/Go) is activated in encoding phases in both trials when a(Go) is maintained by the STM buffer, since strengthened connections from g_o(Go/Srm) to g_i(Srm)Go) propagate the activity.

Figure 8. The membrane potential of neurons in the a population, responding to input from the STM buffer during behavioral performance (retrieval) of the visual discrimination task in a sequence of six trials. A change of context between rewarded movement (RM), rewarded non-movement (RNM) and unrewarded movement (URM) trials causes the STM buffer to clear during those intervals.

Figure 9. Selected membrane potentials during converging forward spread and spread from the goal in an unrewarded move (Surm) trial. The forward spread is initiated in state Surm, as represented by the action potential of the a neuron (top). The spread of activity from the goal reaches the Surm minicolumn when an action potential appears at a specific g_i neuron (middle). A resulting action potential that directs a Go action appears at a specific c_o neuron of the same minicolumn (bottom).
Additionally, c_o(Srm/Go) is activated specifically in the Srm trial when the goal spread causes spiking in the gating g_i(Srm)Go) neuron, while current state input depolarizes the c_o(Srm) population. The appearance of similar activity at the trigger time during URM trials in Figure 11B suggests that the activity is not merely background noise and supports the possible explanation provided by our model. A smaller temporal overlap of activity similar to that in the Schultz et al. results is achieved if the intervals between instruction stimulus, action trigger and reward delivery are increased in the model to match the data, for a trial length of 6-8 s instead of 1500 ms in the simulation. The shorter intervals in the model significantly reduced the time needed to compute each simulation run without affecting resulting behavior.

Some Neurons in the PFC are Active in Multiple Behaviors

In addition to the results above, we found that some neurons in the simulation activate selectively for a specific phase of two different trials. As shown in Figure 12A, the a(Go) neuron in the minicolumn that represents a movement response spikes in rewarded movement and unrewarded movement trials. Similarly, the a(Rew) neuron in the minicolumn that represents the perception of reward spikes in rewarded movement and rewarded non-movement trials.

Figure 10. The guided output (Go or NoGo) of the model in response to test stimuli (Srm, Srnm and Surm). Spike trains produced in the motor circuitry of the simulated operant task environment during trials that test task performance (testing retrieval). The spike trains represent sensory stimuli received during the trials (separated by vertical lines), as well as behavioral Go and NoGo motor responses and the sensation of reward received.

Figure 11. A side-by-side comparison of neuronal activity recorded by Schultz et al. (A-C; figure reproduced from Schultz et al., 2000) and that produced by our simulation of PFC minicolumns (D-F). Figures in (A-C) display spikes of three different orbitofrontal neurons. For each, the activity in rewarded and unrewarded movement trials is shown side by side. Every row within the borders of a graph represents the activity of that neuron during a separate trial. The time course of the data and of the model output are aligned to specific task events. Labels below a horizontal time axis indicate stages of the operant task: instruction stimulus, action trigger, reward. Above each graph, a histogram shows the sum of spikes in each bin of time, i.e. in a corresponding column over all trials. (D) The spike responses of an a population neuron in the RM (rewarded move) minicolumn aligned to instruction. (E) A c_o population neuron in the RM minicolumn with output connections to the GO minicolumn aligned to reward. (F) An a population neuron in the REW (Reward) minicolumn. Again, the spikes (short vertical lines) of each neuron are shown side by side in both rewarded and unrewarded movement trials. Rows within each figure show the results of separate simulation runs, while the cumulative spike rate is plotted above each figure by counting the number of spikes within an interval around t. The three neurons in (D-F) replicate the experimental results by Schultz et al. in the corresponding categories (A-C).
In Figure 12B, we show that specific neurons in the g_i and g_o populations of minicolumns that are involved in the retrieval of associations with a goal generated a spike in every trial of that specific task. The neurons that activate throughout each trial correspond to those involved in the learned associations for the spread of activity from the goal during retrieval, as shown in Figure 3. Thus, even neurons with very extensive response properties are important for performance of this task. Activity of the a population in the current state produces end-stopping of activity through the g_o population in the same minicolumn. Therefore, the onset of a rewarded move (RM) trial produces end-stopping at g_o(Srm) cells, but, due to the associations from Reward to Srnm via NoGo and from Srnm to Surm via Go, a neuron in g_i(Surm) also spikes during that trial. Similarly, g_i(Surm) spikes during rewarded non-movement trials due to the alternate path for the spread of retrieval activity from the goal via the Srm minicolumn. Thus, we predict a correlation of neuronal firing during Surm and Srm trials (strong Go involvement in both), and a lesser correlation of neuronal firing during Surm and Srnm trials, as shown in rows 1 and 3 of Figure 12B. Activity in Figure 12C demonstrates the end-stopping function proposed in the minicolumn model.

Figure 12. (A) Spike activity of a neurons in the Go and Reward minicolumns during performance trials. Neurons a(Go) that predict a movement response are active in rewarded movement and unrewarded movement trials. Neurons a(Rew) that spike when reward is received are active in rewarded movement and rewarded non-movement trials. (B) Retrieval activity in the simulation shows that specific g_i and g_o population neurons spike in all trials. Regular spike trains span trials for each of the neurons shown. In our example, 24 neurons in g_i and g_o populations were found to spike regularly during retrieval in all trials of the performance stage of the specific task, ~7% of the total of 328 neurons involved in retrieval functions. (C) The g_o^diffuse(Srm Go) neuron of the Srm minicolumn shows the end-stopping function. The neuron spikes throughout trials due to the spread of activity from the goal minicolumn, but in rewarded movement (RM) trials spiking stops as soon as reward is received (indicated by arrows with the label R). The overlap between spike trains of the a(Go) neuron and the g_o^diffuse(Srm Go) neuron shows the period during which both Srm and Go minicolumn activity are maintained in an STM buffer for encoding.
During rewarded movement trials, the neuron g_o^diffuse(Srm)Go) is active until reward is received. As soon as the perception of reward becomes the current state of the PFC network, the neuron is no longer active. This is not the case in rewarded non-movement and unrewarded movement trials. In rewarded movement (RM) trials, end-stopping prevents the spread of activity from the goal to the g_o population of the Srm minicolumn. During these trials, the g_o^diffuse(Srm)Go) neuron is active in encoding modes of each rhythmic cycle while maintained in the STM buffer. When reward is perceived, the Go-Reward pair replaces the Srm-Go pair in the buffer, as seen in the bottom two rows of Figure 12B. End-stopping appears in Srm (RM) trials and Srnm (RNM) trials, but not Surm (URM) trials, since two associative paths can be taken from the goal minicolumn to the Surm minicolumn.

Schultz et al. point out that some neurons activated less selectively, namely in a manner that was selective for the instruction cue regardless of trial type and expected reward. Similarly, our simulation shows that a neuron of the c_i(Srm/Go) population in the Go minicolumn that receives input from the Srm minicolumn exhibits retrieval spikes in both Srm and Surm trials during instruction activity in the Srm or Surm minicolumns. Those retrieval spikes disappear once the Go minicolumn receives proprioceptive input about a key press movement in the environment and spikes begin to occur in the encoding phase of the theta modulated network. This produces a 180° phase shift of firing at the time of the movement generation. The Go minicolumn c_i(Surm/Go) neuron that receives input from the Surm minicolumn exhibits the same transition of spiking from the retrieval to the encoding phase, but its retrieval spiking is more selective and appears only during an Surm trial, since no sequence exists that involves the Surm minicolumn in other trials. Schultz et al. provide a quantitative assessment of the trial and phase selective responses recorded.
Of 505 neural responses identified at recording sites, 188 exhibited task related activity: 99 responses showed selective activity at the instruction phase of trials. Of those, 63 reflected the type of reinforcer or trial (38 active during RM, RNM or both trial types, 22 active only during URM trials and three active during RM and URM trials). Fifty-one responses showed selective activity at the trial phase preceding reward (41 during both RM and RNM trials, six during RM or RNM trials and four during URM trials). Sixty-seven responses showed selective activity at the reinforcer delivery phase of trials (62 during both RM and RNM trials, two during only RM trials and three during URM trials).

Before comparison of these numbers with the model, some caveats should be raised. The sample sizes, in terms of the number of sites recorded by Schultz et al. and the number of neurons simulated in the model, are too small to allow a statistical comparison. Also, the number of selective model responses in a specific category depends on the arbitrary number of neurons chosen as a cell assembly within a population of neurons in each minicolumn. When the model was minimized so that individual functions of the minicolumn are performed by the smallest number of neurons, the following quantitative assessment of responses was obtained. In the simulation, the neural circuitry of the model prefrontal minicolumns consisted of 328 neurons (excluding neurons that form short-term buffers and circuitry to process prefrontal input and output). From those neurons, 169 task related responses were recorded: 37 responses showed selective activity at the instruction phase of trials. Of those, 34 reflected the type of reinforcer or trial (21 active during RM or RNM trials, 10 active only during URM trials and three active during RM and URM trials). Seventy-five responses showed selective activity at the trial phase preceding reward (40 during RM or RNM or both trial types, 11 during only URM trials, 17 during RM and URM trials and seven unselective for trial type).
Fifty-seven responses showed selective activity at the reinforcer delivery phase of trials (20 during both RM and RNM trials, 14 during only RM trials, 21 during RNM trials and two unselective for trial type). These results support a correlation during the instruction phase between RM and RNM trials seen in both data and model. The absence of a correlation between URM and RNM during the trial phase preceding reward is also consistent with the data. The number of responses for both RM and URM trials is rather higher than the data, as is the response activity for only RNM trials. Both differences may reflect a difference in the model or merely statistical variability.

Discussion

Our model replicates goal-directed behavior in a visual discrimination task based on a hypothesis about the functional connectivity of PFC circuits (Hasselmo, 2005). Behavioral responses and reward associations to visual cues are encoded in synaptic strengths between neuronal networks representing cortical minicolumns. The goal-directed behavior is retrieved by means of a converging spread of activity from a representation of desired reward and the spread of activity from the current state. Our results specifically replicate the qualitative findings by Schultz et al. (2000) in terms of individual neuronal responses, while suggesting a possible neural mechanism for learning and retrieval. We use the model to propose explanations for the selective responses of individual neurons in orbitofrontal cortex during goal-directed behavior.

The model provides a framework for the context/stimulus dependent change in action selection, as proposed by Miller and Cohen (2001). In particular, it provides a spiking neuron implementation of context effects similar to those of Cohen and Servan-Schreiber (1992). We show how populations of spiking neurons could interact to allow selection of specific actions based on the context of specific sensory input (states) and the desire for reward.
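The retrieval principle summarized above (activity spreading from the goal until it converges with input from the current state) can be sketched as a search over learned transitions. This is a deliberately coarse abstraction: each minicolumn is treated as a single graph node, so the within-minicolumn mappings W_ig and W_ic and all spike timing are ignored. The transition list is an assumption read off the associations named in the text and the g_i labels of Figure 12B, and the function name `select_action` is hypothetical.

```python
from collections import deque

# Learned minicolumn-to-minicolumn transitions (assumed for illustration).
TRANSITIONS = [
    ("Srm", "Go"), ("Srnm", "NoGo"), ("Surm", "Go"),
    ("Go", "Rew"), ("NoGo", "Rew"), ("Go", "Srm"), ("Go", "Srnm"),
]


def select_action(current, goal, transitions=TRANSITIONS):
    """Return the minicolumn targeted from `current` on the first path found
    by spreading backward from `goal`, or None if the spread never arrives."""
    incoming = {}
    for src, dst in transitions:
        incoming.setdefault(dst, []).append(src)

    reached = {goal}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for src in incoming.get(node, []):
            if src == current:
                # Convergence: the goal spread meets current-state input,
                # so the transition current -> node is selected.
                return node
            if src not in reached:
                reached.add(src)
                queue.append(src)
    return None


choices = {state: select_action(state, "Rew") for state in ("Srm", "Srnm", "Surm")}
```

The breadth-first order means the first arrival at the current state decides the response, loosely analogous to the statement that only the first spike in c_o leads to a behavioral response. With the assumed transitions, the sketch reproduces the Go, NoGo and Go responses reported for Srm, Srnm and Surm.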
Because activity in a specific minicolumn (Fuster, 2000) that represents such a state or action may play a role in different contexts that require its association with different state-action-state transitions, we presuppose separate populations of neurons within a minicolumn for input from and output to other minicolumns (Hasselmo, 2005). For example, the Go and Reward minicolumns in the experimental task fulfill such multiple roles, as shown in Figures 3 and 12A. We show what functional role the individual neurons in these populations could play in the performance of the task by replicating essential features of the Schultz et al. experiment. We used similar learning and retrieval protocols and replicated individual neuronal responses that are selective for a specific state in a specific trial (see Fig. 11). These selective responses may be understood in the context of a neuron's function in the minicolumn model.

In addition to these explanations, the model generates predictions for this task about what other types of responses should appear in the PFC, including neuronal responses which would look rather complex and might therefore not normally be classified. One set of complex responses is shown in Figure 12B. The model predicts that some neurons will spike throughout all trials of a goal-directed task, not just for a specific state, due to the spreading activity from a goal representation. If encoding and retrieval alternate continuously as modeled, then such responses that are indicative of spreading activity should be recorded during stages of novel learning as well as task performance. Our results also propose that end-stopping implemented in the retrieval function of the model may be detected as shown in Figure 12C. Evidence that supports a possible end-stopping of spreading activity is provided by the termination of recorded spikes in Schultz et al. (2000), where neuronal activity that is selective for Srm or Srnm instruction stimuli and for action preceding reward terminates as soon as reward is received.
Predictions of the model suggest experiments that test the validity of two of its central tenets: convergence of activity through representations that may be associated in multiple ways (Sutton and Barto, 1981) and the need for a short-term buffer. The structure of the model uses a progressive backward spread of activity from the goal. This suggests an experiment that could test this feature, in which associations are formed sequentially between states and actions leading to a particular goal. Imagine an operant task, in which specific sequences of lever presses result in reward. For example, pressing levers in the sequence A-B-C should result in reward. If the levers are pressed randomly, eventually the correct sequence will occur, in a learning paradigm analogous to the one used in experiments by Terrace et al. (2003). In the model, this will initially lead to an association between the final action "press C" and reward (note that this action involves being at a specific state in front of lever C and generating the action "press"). Upon further accidental production of the sequence, it will lead to association of press B, then press C with reward, and finally press A, press B, press C with reward. The activity of the g_i and g_o neurons in the model would initially show activity only for reward, then would show a persistent increase when the association is first formed with press C, followed by increases in separate populations when the association is formed for press B and finally for press A. Thus, the overall population of neurons firing during the task would show a progressive increase as the specific sequence is learned.

During encoding, our model depends on the function of STM buffers, and data by Andrade show sustained currents that may support such a function in the PFC (Andrade, 1991). However, those buffers need not reside in the PFC. A plausible alternative source of buffered perceptual spike patterns is in the entorhinal cortex, in which neurons that exhibit intrinsic persistent spiking have been found (Klink and Alonso, 1997b). In either case, it is possible that STM function may be disrupted without impairing decision making for known tasks. The function of short-term buffers may be blocked by pharmacological agents. For example, the muscarinic antagonist scopolamine will block the ADP which provides one mechanism for sustained spiking of cortical neurons (Andrade, 1991; Klink and Alonso, 1997b; Fransen et al., 2002). Without working short-term buffers in the PFC, the model predicts correct retrieval function for learned tasks, but an inability or impairment to learn new tasks. This may underlie the impairment of task rule shifting seen with cholinergic lesions (McGaughy et al., 2004; J. McGaughy et al., unpublished data). Cholinergic blockade does cause impairment of short-term delayed matching function (Bartus and Johnson, 1976; Penetar and McDonough, 1977).

Critical Variables of the Simulation

The successful results obtained with the simulation depend on several critical variables.
Within the model of a prefrontal minicolumn, a specific set of connections must have conductances that lead to subthreshold excitation of postsynaptic neurons and another set must have conductances that lead to suprathreshold excitation and therefore drive spiking in postsynaptic neurons. The set of subthreshold connections consists of the connections from a to g_i and the connections from g_o to c_i. The set of suprathreshold connections consists of the connections from a to g_o, from g_i to c_o, and from c_i to g_o (as shown in Fig. 4). For goal-directed prefrontal output, it is necessary that current state input to a c_o neuron population does not achieve spiking, except at those neurons that also receive gating input from neurons activated in the g_i population by the spread from the goal representation. Synapses at modifiable connections W_g, W_ig, W_c and W_ic are initialized with small subthreshold conductances. There is no need to adjust the learning rate during encoding, since a specific maximum conductance is achieved in strengthened connections. That maximum is set to provide suprathreshold excitation through the goal-spread connections W_g and W_ig, and subthreshold excitation through W_c and W_ic (where the spiking of neurons in c_i is gated by g_o, the spiking of neurons in c_o is gated by g_i during retrieval). The excitation of a neuron in c_i by individual input from g_o or through W_c and the excitation of a neuron in c_o by individual input from g_i or through W_ic is insufficient to elicit a spike. When two subthreshold inputs combine at a neuron in c_i (one from g_o and one through W_c), or when two subthreshold inputs combine at a neuron in c_o (one from g_i and one through W_ic), then a spike is elicited. Another critical variable is the modulation of specific connection strengths in the minicolumn model by theta input (Hasselmo et al., 2002). Theta modulation allows g_o^specific to drive g_o^diffuse through suprathreshold excitation and act as one population g_o for the spread of activity from a goal representation during retrieval phases.
During encoding phases the connection is weakened and the two populations are treated separately as shown in Figure 5. Differential modulation of excitatory input from a to g_o^diffuse (see Fig. 5) and of input from a to an interneuron population that sends inhibitory input to g_o^diffuse switches from suprathreshold excitatory input from a during encoding phases to providing the end-stopping function during retrieval phases. During encoding phases, theta modulation enables input from buffered activity in c_i(n − 1) to g_o^specific (Fig. 5). Input from g_i to c_o is modulated so that suprathreshold excitation drives c_o during encoding phases, but subthreshold excitation performs the gating function during retrieval phases. Lastly, critical variables are involved in the timing of short-term buffers (Lisman and Idiart, 1995; Jensen et al., 1996; Koene et al., 2003). A working buffer requires that the rise time of ADP matches the period of a theta cycle (Fransen et al., 2002) and that recurrent inhibition separates consecutive spikes sufficiently to retain their order, but within a time interval that enables STDP between neurons that spike in response to the buffer output. For the first-in-first-out replacement of spikes maintained in a buffer, inhibitory input presented due to the combination of new input to the buffer and the last spike in the buffer must cause hyperpolarization at the phase of first spike reactivation (see Fig. 6C). Theta oscillations achieve the necessary synchronization of reactivation cycles in the STM buffers and encoding and retrieval phases in the minicolumns.

Correspondence of Simulation Results and Data

As mentioned in the results, the present study does not attempt to attribute meaning to the quantitative assessment of numbers of responses that belong to any specific category of responses that are selective for trial type and phase of that trial.
For a quantitative comparison of that sort, an experimental study would have to record from a larger sample of neurons and the simulation would have to include a rationale for the number of cells in assemblies that correspond to each functional unit of the prefrontal model. The model effectively matches the data in many ways, in addition to successfully learning the goal-directed behavior for the visual discrimination task. Our results show that the simulations replicate trial and phase-of-trial selective activity in individual neurons. A direct comparison between the selective activity recorded by Schultz et al. and that produced in the simulation (Fig. 11) demonstrates the correspondence between the two sets of results. Both the Schultz et al. data and our simulation results show individual neurons that are selective for the presentation of a visual cue, the period preceding a potential reward in which a decision for motor action may be made, or the receipt of a reward. That selectivity is specific to a particular trial type: rewarded movement, rewarded non-movement or unrewarded movement. Significantly, both the data and the simulation results show that selectivity for exactly one specific trial type (RM, RNM or URM) was typical of responses that showed selective activity during the instruction phase of a trial, and atypical for responses that showed selective activity during a later phase of a trial. This correspondence supports the idea that those minicolumns that represent specific actions or rewards may be associated with multiple trial types. Another significant feature of the model is the absence of neurons that respond in both RNM and URM trials, which also corresponds with the data. Some properties of neuronal responses in the model are important for function, but may not be tested by the analysis procedures of the experiment. In particular, the analysis of experimental data did not specifically search for neurons which turned on continuously during task performance without showing specificity, and did not search for neurons which terminated activity at a specific time. The model produced background spiking activity that appears unselective for trial and phase throughout the task in 38 neurons. For the purpose of response categorization, this background spiking rate was subtracted to identify selective spike trains in those responses. The cells with this background activity are those that are involved in the spread of activity from the goal through associated minicolumns. Note that many such cells may have been deemed not task related by Schultz et al., while they clearly perform an important function in the model. One indication of such background activity in the report by Schultz et al. comes in the form of neurons with task specific activity that appeared prior to the instruction stimulus. Schultz et al. evaluated activity in 188 out of 505 neurons. As specified in Tremblay and Schultz (2000), they did find 14 neurons that activated unselectively for all familiar instruction types in the task. Yet Schultz et al. evaluated neurons that activated selectively for one or two phases of specific task trials, since responses demonstrating activity throughout a trial may have been discarded by the one-tailed Wilcoxon test of the evaluation software that they used to assess task related activity. The simulation results identified significant periods of inactivity in addition to the detection of selective activity.
Some of the cells with background spiking throughout the trials of the task exhibit periods of inactivity that correspond directly with their involvement in the retrieval of a known association that determines goal-directed behavior in a specific trial. At such a simulated cell, inhibition (end-stopping) of the spread of activity from the goal representation causes the period of inactivity. Schultz et al. did not report a specific evaluation of the times at which the activity of some neurons ends, while other responses with rhythmic background activity during the same trial continue. Schultz et al. mention neurons that remain active throughout the instruction-trigger delay, but do not quantify the number of such cases. Cases reported in the data in which neural activity within a trial turns off immediately at the onset of a following phase may be indicative of end-stopping. The simulation results show some differences compared to the data obtained by Schultz et al. One that is immediately apparent is the precise and reproducible nature of specific intervals of spiking and of silence for each neuron in the model. This is a feature caused by the absence of noise in the simulated physiological functions. A greater proportion of the responses recorded by Schultz et al. showed selective activity prior to reward or during reward in both RM and RNM trials than in only one of those two trial types. The proportions were reversed in the results obtained with the model, where more neurons responded to both, but these differences may not be meaningful due to the sample size issue outlined above. The model responses contained a larger proportion of cells that respond selectively during both RM and URM trials than that reported by Schultz et al. In the trial phase preceding the reinforcer, this was a category not reported by Schultz et al. and a prediction of the model that further experiments with recordings at a greater number of sites may verify.
Relation to Other Physiological Studies

This study shows how neuronal responses that guide behavior could reflect a conjunction of forward spread (stimulus-dependent spread) and backward spread from a goal (goal-dependent spread). The latter relates to responses obtained by Thorpe et al. (1983), where the change in reward contingency demonstrates evidence for a reward-dependent response. The Schultz et al. experiments replicated here were an extension of the work by Thorpe and Rolls, who recorded single unit activity of orbitofrontal neurons in primates during a Go/NoGo operant task. In that task, monkeys learned to associate a reward or an aversive outcome with movement following a specific stimulus. The meaning of a stimulus was reversed during this task. Thorpe and Rolls showed that most neurons responded selectively to specific stimuli and that the responses were also selective to whether the stimulus indicated reward in a specific trial. Simulation of Thorpe et al. using our model would require changes in reward contingency in the task, and the use of some mechanism of long-term depression in the model to replicate a decrease in response to previously rewarded stimuli. Tetrode recordings by Jung et al. (1998) showed that the correlation of activity in neurons in the PFC does not map directly to sensory information such as location in spatial tasks. Rather, the activity correlates with behavioral requirements that are task specific, as shown with other simulations of a virtual rat in spatial tasks (Hasselmo, 2005). The present experimental results also relate to response data obtained by Schoenbaum et al. (1998), where changes in reward contingency were also shown to influence neuronal responses in rats. These responses were recorded in brain areas that communicate with orbitofrontal cortex through reciprocal connections, such as the basolateral amygdala, which may provide feedback of an error function to avoid an aversive outcome.
In order to encode the specific components of a task and to encode predictive relationships by associating those components, the connections between neurons in networks of minicolumns and connections with the areas that provide input and receive output must be easily modifiable. Experimental evidence has been found for rapid change in functional connectivity in terms of modifications of the strength of connections in orbitofrontal cortex and between orbitofrontal cortex and related areas such as the basolateral amygdala (Schoenbaum et al., 2000; Mulder et al., 2003). In those experiments, observed changes in odor selectivity were closely matched by changes in correlated firing activity during initial learning that led to accurate performance on a discrimination problem.

Relation to Reinforcement Learning Theory: Biological Implementation of Reinforcement Learning

Rules that govern successful behavior are discovered by learning how a specific action taken in one circumstance is followed by another circumstance. In other words, a causal effect is inferred from the results of a possible action that is explored while in a perceived state. In machine learning, algorithms for this are known as reinforcement learning (Sutton and Barto, 1998). In reinforcement learning, goals are explicit and formally represented by reward value. The reinforcement learning framework has also been related to cognitive neural processes (Barto, 1995a,b; Montague et al., 1996; Schultz et al., 1997). Reinforcement learning defines a state signal as any information that is available about the environment at a given time, which may be pre-processed sensory input and may include some memory of preceding states. The state signal has what is known as the Markov property if it contains a representation of all the information about current and preceding states and actions that are relevant to future decisions (White, 1969; Ross, 1983; Bertsekas, 1995). A state signal with the Markov property may be evaluated independent of the states and actions that precede it. Reinforcement learning algorithms do not provide instruction about correct actions. Instead, an action is given a value by learning its consequences. Yet, reinforcement learning allows a range of different algorithms for learning these values. A popular algorithm for reinforcement learning is temporal difference (TD) learning (Sutton, 1988), which is related to models of conditioning (Konorski, 1948; Rescorla and Wagner, 1972). This algorithm learns from raw experience by updating predictive associations using reward value at the time of update. TD learning is useful, since it requires no information prior to exploration about the probabilities of transitions between states in an environment. In addition, TD learning methods with Hebbian mechanisms (Hancock et al., 1991; Montague et al., 1993; Montague and Sejnowski, 1994; Rao and Sejnowski, 2001) have been proposed for the canonical circuit of neocortex (Douglas et al., 1989; Artola et al., 1990). One approach to TD learning, known as SARSA (state-action-reward-state-action), is notable for learning the value of actions in transitions between state-action pairs instead of the value of a state in transitions from state to state (Sutton, 1996; Sutton and Barto, 1998, ch. 7.5).
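For reference, the SARSA update mentioned above can be written as a short sketch. This is the standard tabular rule from the reinforcement learning literature and is not part of the spiking model presented here. The function name `sarsa_update`, the learning rate `alpha`, the discount factor `gamma` and the state and action labels are illustrative assumptions.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one tabular SARSA update in place and return the TD error.

    q maps (state, action) pairs to action values. The update is
    Q(s, a) <- Q(s, a) + alpha * [r + gamma * Q(s', a') - Q(s, a)].
    """
    td_error = (reward + gamma * q[(next_state, next_action)]
                - q[(state, action)])
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical single transition: a movement after a cue leads to reward.
q_values = defaultdict(float)   # all action values start at zero
first_error = sarsa_update(q_values, "cue", "move", 1.0, "reward", "none")
```

The value is attached to the state-action pair rather than to the state alone, which is the property that the learning method of the present model shares with SARSA.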
The learning method in this paper assumes state-action pairs, as in the SARSA approach, although it is not derived from SARSA or TD learning. The present model focuses on selection of actions on the basis of action value. It does not require the use of TD learning to create the action value function, because the constrained nature of training ensured that it learned effective action value functions. Further modification will be needed to allow effective learning with random generation of actions during exploration, using a mechanism analogous to TD learning (Hasselmo, 2005). The model nevertheless provides a neural implementation of the action selection process in the reinforcement learning framework that does not depend on lookup tables. In the model, encoding of behavioral rules requires that PFC contains unique representations of specific states and actions. Fuster (2000) presented evidence that activity in the PFC is representative of two types of perception, one that correlates with the sensory state evoked by past and current stimuli and one related to proprioceptive sensation and prediction of motor actions. Given the representation of states and actions, the transition from one state to another state via a specific action can be encoded uniquely if there is specific neural activity that occurs only for that action and only when the action is initiated in a particular state. This requirement leads to the presupposition that a functional minicolumn contains populations of input neurons and populations of output neurons that form connections with other minicolumns, and that the neurons in those populations are connected in a structured manner to other minicolumns, in this simulation to exactly one. The internal weight matrices of an action minicolumn, W_ig and W_ic, act as second-order conditional transition matrices from one state to another. A functionally similar pattern of connectivity could be learned by self-organization.
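The logic of selecting an action where backward spread from the goal converges with forward spread from the current state can be sketched abstractly as follows. This is a graph-level illustration under stated assumptions and not the spiking implementation: learned associations are represented as hypothetical (state, action, next state) triples, the spread from the goal is approximated by a breadth-first search over reversed transitions, and the names `goal_distances`, `select_action` and the trial labels are invented for the example.

```python
from collections import deque


def goal_distances(transitions, goal_state):
    """Spread backward from the goal and return steps-to-goal per state."""
    predecessors = {}
    for state, _action, next_state in transitions:
        predecessors.setdefault(next_state, set()).add(state)
    distance = {goal_state: 0}
    queue = deque([goal_state])
    while queue:
        reached = queue.popleft()
        for previous in predecessors.get(reached, ()):
            if previous not in distance:
                distance[previous] = distance[reached] + 1
                queue.append(previous)
    return distance


def select_action(transitions, current_state, goal_state):
    """Pick the action at which goal spread and current state converge.

    Only transitions that start in the current state (forward spread) and
    end in a state reached by the goal spread are candidates. The candidate
    closest to the goal wins; None is returned if nothing converges.
    """
    distance = goal_distances(transitions, goal_state)
    candidates = [(distance[next_state], action)
                  for state, action, next_state in transitions
                  if state == current_state and next_state in distance]
    return min(candidates)[1] if candidates else None


# Hypothetical encoding of two trial types of the discrimination task.
transitions = [
    ("cue_RM", "move", "reward"),
    ("cue_RM", "withhold", "no_reward"),
    ("cue_RNM", "withhold", "reward"),
    ("cue_RNM", "move", "no_reward"),
]
action_rm = select_action(transitions, "cue_RM", "reward")
action_rnm = select_action(transitions, "cue_RNM", "reward")
```

Each triple corresponds to the pairing of one input and one output of an action minicolumn, so no table of action values is consulted: the choice follows from which transitions receive both sources of activity.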
Since the combination of activity at a specific input neuron and a specific output neuron of an action minicolumn represents the transition from a preceding state to a following state, that information gives the model the Markov property (e.g. Sutton and Barto, 1998, ch. 3.5). This property means that one-step dynamics enable us to predict the next state and expected reward for a specific action. Our model therefore provides a means of extending principles of reinforcement learning to biological circuits and the spiking responses of neurons.

Relation to Anatomical Data on Minicolumns

The successive neuronal layers in a canonical circuit of the neocortex, as described by Douglas et al. (1989), can be represented by the individual networks at the branch nodes of a hierarchical network (Felleman and Van Essen, 1991). Categorizing the parts of our model in such a hierarchy, the motor output (by populations c_i and c_o) corresponds to the activity of the infragranular layer of the neocortex. Since sensory input is received in layer IV, its function may correspond to that of neurons designated a. The supragranular layer has many extensive and long-range excitatory connections with other regions, so that it can perform the function of our minicolumn model populations g_i and g_o. This function, which achieves the convergence of goal spread with current state input, depends on the lateral connectivity within the neocortex. In studies of the visual cortex, the lateral connectivity has been associated (Kawato et al., 1993; Dayan and Hinton, 1996) with a necessary role in the interpretation of input and its translation into a complex hierarchical model. The generation of visual receptive fields that are tuned to recognize different orientations (Somers et al., 1995; Yishai et al., 1995) was related to this proposed role. Lateral connectivity in the prefrontal region of neocortex includes short- and long-range excitatory connections, as well as short-range inhibitory connections (Barbas and Pandya, 1989; Barbas, 2000).
The result is a patchy lateral layout of cells that are highly interconnected within a column of cortical layers, the so-called neocortical minicolumn. It has been shown that strong local connectivity in a minicolumn can sustain activity during delayed response tasks such as long-term goal-directed behavior for which a subject must be able to maintain information regarding a stimulus (Gutkin et al., 2000; Wood and Grafman, 2003). Local circuits that may exhibit the function of the proposed minicolumns were identified in the lateral connectivity of the PFC, and Constantinidis and Goldman-Rakic (2002) showed that the activity of interneurons within such ensembles is strongly correlated. The correlated firing does not extend to distant areas or modules, and the activity of spatially proximate excitatory cells is less correlated than that of interneurons. In fact, spiking of different pyramidal cells responsible for the long-range propagation of activity is largely independent. Lund et al. (1993) proposed a means by which such local circuits may arise during development. Analogous connectivity was described for the middle temporal visual area (Maunsell and Van Essen, 1983), and a model for similar local circuit development was proposed by Grossberg and Williamson (2001) for visual cortex areas V1 and V2. While our model resembles the interaction of feedback and feedforward used in Grossberg and Williamson (2001), the visual models focus on top-down spread mediating global feature detection rather than reward contingencies. Our model more closely resembles the proposal by Mumford (Mumford, 1992) for bottom-up and top-down interactions. If goal-directed behavior is to emerge in the PFC, its neuroanatomy must support activity that interprets sensory and proprioceptive motor input, and it must enable subsequent output that affects behavior. Previous surveys of the neuronal architecture of neocortex show that dual pathways between cortical areas could implement the necessary pathways for the analysis of input and the synthesis of output that guides behavior (Mumford, 1991, 1992, 1994). In the framework presented here, neuronal populations that correspond to cells in layer IV of neocortex are identified as input neurons for bottom-up cortical processing. Their ability to analyze input is represented by consequent activity of input neurons in a specific minicolumn. The associative connections between minicolumns lead to a synthesis of activity that represents goal-directed output. While the model is intended to be applicable to the function of prefrontal minicolumns in general and not specific to orbitofrontal cortex, the encoding of reward found in orbitofrontal cortex for the Schultz et al. task led to a minicolumn representation of a reward state. In other (e.g. spatial) tasks where multiple routes can achieve a goal, specific reward value may be encoded by differential strengthening of associations between reward and specific goal-directed strategies. When a task includes multiple goals or strategies with different reward values, a mechanism must exist to select one goal over another and to direct behavior accordingly. The recruitment of distinct regions of orbitofrontal cortex has been observed during incentive judgements and goal selection.
Lateral orbitofrontal activity has been observed selectively when a task required that responses to alternative desirable items must be suppressed (Arana et al., 2003). As implemented in the present model, gating by the spread of activity from one goal would compete with that of another goal at neuronal populations where goal spread and forward spread from a current state converge. Successful neuronal firing suppresses the selection of other possibilities through recurrent inhibition.

Notes

The CATACOMB simulations described here and information about CATACOMB are available on our Computational Neurophysiology website. Supported by NIH R01 grants DA16454, MH60013 and MH61492 to M.E.H. and by Conte Center Grant MH60450, as part of the NSF/NIH Collaborative Research in Computational Neuroscience Program.

Address correspondence to M.E. Hasselmo, Center for Memory and Brain, Department of Psychology and Program in Neuroscience, Boston University, 64 Cummington Street, Boston, MA 02215, USA. Email: hasselmo@bu.edu.

References

Alonso A, Gaztelu J, Bruno W Jr, Garcia-Austt E (1987) Crosscorrelation analysis of septohippocampal neurons during theta-rhythm. Brain Res 413:
Andrade R (1991) The effect of carbachol which affects muscarinic receptors was investigated in prefrontal layer V neurons. Brain Res 541:
Arana F, Parkinson JEH, Holland A, Owen A, Roberts A (2003) Dissociable contributions of the human amygdala and orbitofrontal cortex to incentive motivation and goal selection. J Neurosci 23:
Artola A, Brocher S, Singer W (1990) Different voltage-dependent thresholds for inducing long-term depression and long-term potentiation in slices of rat visual cortex. Nature 347:
Balleine B, Dickinson A (1998) Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37:
Barbas H (2000) Connections underlying the synthesis of cognition, memory, and emotion in primate prefrontal cortices. Brain Res Bull 52:
Barbas H, Pandya D (1989) Architecture and intrinsic connections of the prefrontal cortex in the rhesus monkey. J Comp Neurol 286(3):
Barto A (1995a) Adaptive critics and the basal ganglia. In: Models of information processing in the basal ganglia (Houk J, Davis JL, Beiser DG, eds), pp Cambridge, MA: MIT Press.
Barto A (1995b) Reinforcement learning. In: Handbook of brain theory and neural networks (Arbib M, ed.), pp Cambridge, MA: MIT Press.
Bartus R, Johnson H (1976) Short-term memory in rhesus monkey: disruption from the anti-cholinergic scopolamine. Pharmacol Biochem Behav 5:
Bechara A, Damasio A, Damasio H, Anderson S (1994) Insensitivity to future consequences following damage to human prefrontal cortex. Cognition 50:
Bechara A, Damasio H, Tranel D, Damasio A (1997) Deciding advantageously before knowing the advantageous strategy. Science 275:
Bertsekas D (1995) Dynamic programming and optimal control. Belmont, MA: Athena.
Bi G, Poo M (1998) Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neurosci 18:
Bliss T, Collingridge G (1993) A synaptic model of memory: long-term potentiation in the hippocampus. Nature 361:
Bliss T, Lømo T (1973) Long-lasting potentiation of synaptic transmission in the dentate area of the anesthetized rabbit following stimulation of the perforant path. J Physiol 232:
Bragin A, Jando G, Nadasdy Z, Hetke J (1995) Gamma ( Hz) oscillation in the hippocampus of the behaving rat. J Neurosci 15:
Brazhnik E, Fox S (1999) Action potentials and relations to theta rhythm of septohippocampal neurons in vivo. Exp Brain Res 127:
Cannon R, Hasselmo M, Koene R (2003) From biophysics to behaviour: Catacomb2 and the design of biologically plausible models for spatial navigation. Neuroinformatics 1: 1:
Cohen J, Servan-Schreiber D (1992) Context, cortex and dopamine: a connectionist approach to behavior and biology in schizophrenia. Psychol Rev 99:
Constantinidis C, Goldman-Rakic P (2002) Correlated discharges among putative pyramidal neurons and interneurons in the primate prefrontal cortex. J Neurophysiol 88:
Dayan P, Hinton G (1996) Varieties of Helmholtz machine. Neural Networks 9:
Douglas R, Martin K, Whitteridge D (1989) A canonical microcircuit for neocortex. Neural Comput 1:
Elman J (1990) Finding structure in time. Cogn Sci 14:
Elman J (1991) Distributed representations, simple recurrent networks, and grammatical structure. Machine Learn 7:
Felleman D, Van Essen D (1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex 1:
Fransen E, Alonso A, Hasselmo M (2002) Simulations of the role of the muscarinic activated calcium-sensitive nonspecific cation current INCM in entorhinal neuronal activity during delayed matching tasks. J Neurosci 22:

Frey S, Petrides M (1997) Orbitofrontal cortex: a key prefrontal region for encoding information. Proc Natl Acad Sci USA 15:
Funahashi S, Bruce C, Goldman-Rakic P (1989) Mnemonic coding of visual space by neurons in the monkey's dorsolateral prefrontal cortex revealed by an oculomotor delayed response task. J Neurophysiol 61:
Fuster J (1973) Unit activity in prefrontal cortex during delayed-response performance: neuronal correlates of transient memory. J Neurophysiol 36:
Fuster J (2000) Prefrontal neurons in networks of executive memory. Brain Res Bull 52:
Fuster J, Bauer R, Jervey J (1982) Cellular discharge in the dorsolateral prefrontal cortex of the monkey in cognitive tasks. Exp Neurol 77:
Gerstner W (2002) Integrate-and-fire neurons and networks. Cambridge, MA: MIT Press.
Gerstner W, Kistler W (2002) Spiking neuron models: single neurons, populations, plasticity. Cambridge, UK: Cambridge University Press.
Grossberg S, Williamson J (2001) A neural model of how horizontal and interlaminar connections of visual cortex develop into adult circuits that carry out perceptual grouping and learning. Cereb Cortex 11:
Gutkin B, Ermentrout G, O'Sullivan J (2000) Layer 3 patchy recurrent excitatory connections may determine the spatial organization of sustained activity in the primate prefrontal cortex. Neurocomputing :
Hancock P, Smith L, Phillips W (1991) A biologically supported error-correcting learning rule. Neural Comput 3:
Harel D (1987) Statecharts: a visual formalism for complex systems. Sci Comput Prog 8:
Hasselmo M (2005) A model of prefrontal cortical mechanisms for goal directed behavior. J Cogn Neurosci (in press).
Hasselmo M, Bodelon C, Wyble B (2002) A proposed function for hippocampal theta rhythm: separate phases of encoding and retrieval enhance reversal of prior learning. Neural Comput 14:
Izquierdo A, Murray E (2004) Combined unilateral lesions of the amygdala and orbital prefrontal cortex impair affective processing in rhesus monkeys. J Neurophysiol 91:
Jensen O, Lisman J (1996) Novel lists of 7±2 known items can be reliably stored in an oscillatory short-term memory network: interaction with long-term memory. Learn Mem 3:
Jensen O, Idiart M, Lisman J (1996) Physiologically realistic formation of autoassociative memory in networks with theta/gamma oscillations: role of fast NMDA channels. Learn Mem 3:
Jung M, Qin Y, McNaughton B, Barnes C (1998) Firing characteristics of deep layer neurons in prefrontal cortex in rats performing spatial working memory tasks. Cereb Cortex 8:
Kawato M, Hayakawa H, Inui T (1993) A forward-inverse optics model of reciprocal connections between visual cortical areas. Network 4:
Klink R, Alonso A (1997a) Morphological characteristics of layer II projection neurons in the rat medial entorhinal cortex. Hippocampus 7:
Klink R, Alonso A (1997b) Muscarinic modulation of the oscillatory and repetitive firing properties of entorhinal cortex layer II neurons. J Neurophysiol 77:
Koene R, Gorchetchnikov A, Cannon R, Hasselmo M (2003) Modeling goal-directed spatial navigation in the rat based on physiological data from the hippocampal formation. Neural Networks 16:
Konorski J (1948) Conditioned reflexes and neuron organization. Cambridge: Cambridge University Press.
Levy W, Stewart D (1983) Temporal contiguity requirements for long-term associative potentiation/depression in the hippocampus. Neuroscience 8:
Lisman J, Idiart M (1995) Storage of 7±2 short-term memories in oscillatory subcycles. Science 267:
Lübke J, von der Malsburg C (2004) Rapid processing and unsupervised learning in a model of the cortical macrocolumn. Neural Comput 16:
Lund J, Yoshioka T, Levitt J (1993) Comparison of intrinsic connectivity in different areas of macaque monkey cerebral cortex. Cereb Cortex 3:
Manns I, Alonso A, Jones B (2000) Discharge properties of juxtacellularly labeled and immunohistochemically identified cholinergic basal forebrain neurons recorded in association with the electroencephalogram in anesthetized rats. J Neurosci 20:
Markram H, Lübke J, Frotscher M, Sakmann B (1997) Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 225:
Maunsell J, Van Essen D (1983) The connections of the middle temporal visual area (MT) and their relationship to a cortical hierarchy in the macaque monkey. J Neurosci 3:
McGaughy J, Koene R, Eichenbaum H, Hasselmo M (2004) Effects of cholinergic deafferentation of prefrontal cortex on working memory: a convergence of behavioral and modeling results. In: Proceedings of the 2004 Annual Meeting of the Society for Neuroscience, San Diego, CA.
Miller E, Cohen J (2001) An integrative theory of prefrontal cortex function. Annu Rev Neurosci 24:
Montague P, Sejnowski T (1994) The predictive brain: temporal coincidence and temporal order in synaptic learning mechanisms. Learn Mem 1:
Montague P, Dayan P, Nowlan S, Pouget A, Sejnowski T (1993) Using aperiodic reinforcement for directed self-organization. In: Advances in neural information processing systems (Giles C, Hanson S, Cowan J, eds), vol. 5, pp San Mateo, CA: Morgan Kaufmann.
Montague P, Dayan P, Sejnowski T (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16:
Mountcastle V (1997) The columnar organization of the neocortex. Brain 120:
Mulder A, Nordquist R, Örgüt O, Pennartz C (2003) Learning-related changes in response patterns of prefrontal neurons during instrumental conditioning. Behav Brain Res 146:
Mumford D (1991) On the computational architecture of the neocortex. I. The role of the thalamo-cortical loop. Biol Cybernet 65:
Mumford D (1992) On the computational architecture of the neocortex. II. The role of cortico-cortical loops. Biol Cybernet 66:
Mumford D (1994) Neuronal architectures for pattern-theoretic problems. In: Large-scale neuronal theories of the brain (Koch C, Davis J, eds), pp Cambridge, MA: MIT Press.
O'Reilly R, Munakata Y (2000) Computational explorations in cognitive neuroscience: understanding the mind by simulating the brain. Cambridge, MA: MIT Press.
Pears A, Parkinson A, Hopewell L, Everitt B, Roberts A (2003) Lesions of the orbitofrontal but not medial prefrontal cortex disrupt conditioned reinforcement in primates. J Neurosci 23:
Penetar D, McDonough J Jr (1977) Effects of cholinergic drugs on delayed match-to-sample performance of rhesus monkeys. Pharmacol Biochem Behav 19:
Quintana J, Fuster J (1992) Mnemonic and predictive functions of cortical neurons in a memory task. Neuroreport 3:
Rao R, Sejnowski T (2001) Spike-timing-dependent Hebbian plasticity as temporal difference learning. Neural Comput 13:
Rescorla R, Wagner A (1972) A theory of Pavlovian conditioning: the effectiveness of reinforcement and non-reinforcement. In: Classical conditioning. II. Current research and theory (Black A, Prokasy W, eds), pp New York: Appleton-Century-Crofts.
Rolls E (1999) The functions of the orbitofrontal cortex. Neurocase 5:
Ross S (1983) Introduction to stochastic dynamic programming. New York: Academic Press.
Schoenbaum G, Eichenbaum H (1995a) Information coding in the rodent prefrontal cortex. I. Single-neuron activity in orbitofrontal cortex compared with that in pyriform cortex. J Neurophysiol 74:
Schoenbaum G, Eichenbaum H (1995b) Information coding in the rodent prefrontal cortex. II. Ensemble activity in orbitofrontal cortex. J Neurophysiol 74:

Schoenbaum G, Chiba A, Gallagher M (1998) Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning. Nat Neurosci 1:
Schoenbaum G, Chiba A, Gallagher M (2000) Rapid changes in functional connectivity in orbitofrontal cortex and basolateral amygdala during learning and reversal. J Neurosci 20:
Schoenbaum G, Setlow B, Ramus S (2003) A systems approach to orbitofrontal cortex function: recordings in rat orbitofrontal cortex reveal interactions with different learning systems. Behav Brain Res 146:
Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80:
Schultz W, Dickinson A (2000) Neuronal coding of prediction errors. Annu Rev Neurosci 23:
Schultz W, Dayan P, Montague P (1997) A neural substrate of prediction and reward. Science 275:
Schultz W, Tremblay L, Hollerman J (2000) Reward processing in primate orbitofrontal cortex and basal ganglia. Cereb Cortex 10:
Somers D, Nelson S, Sur M (1995) An emergent model of orientation selectivity in cat visual cortical simple cells. J Neurosci 15:
Stein R (1967) Some models of neuronal variability. Biophys J 7:
Stewart M, Fox S (1990) Do septal neurons pace the hippocampal theta rhythm? Neuron 13:
Sutton R (1988) Learning to predict by the methods of temporal difference. Machine Learn 3:
Sutton R (1996) Generalization in reinforcement learning: successful examples using sparse coarse coding. In: Advances in neural information processing systems 8. Cambridge, MA: MIT Press.
Sutton R, Barto A (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88:
Sutton R, Barto A (1998) Reinforcement learning: an introduction. Cambridge, MA: MIT Press.
Terrace H, Son L, Brannon E (2003) Serial expertise of rhesus macaques. Psychol Sci 14:
Thorpe S, Rolls E, Maddison S (1983) The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp Brain Res 49:
Tremblay L, Schultz W (1999) Relative reward preference in primate orbitofrontal cortex. Nature 398:
Tremblay L, Schultz W (2000) Reward related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J Neurophysiol 83:
Wallis J, Anderson K, Miller E (2001) Single neurons in prefrontal cortex encode abstract rules. Nature 411:
Wallis J, Miller E (2003) Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. Eur J Neurosci 18:
White D (1969) Dynamic programming. San Francisco, CA: Holden-Day.
Wood J, Grafman J (2003) Human prefrontal cortex: processing and representational perspectives. Nat Rev Neurosci 4:
Yishai B, Baror R, Sompolinsky H (1995) Theory of orientation tuning in visual cortex. Proc Natl Acad Sci USA 92: